In this blog post, we give a high-level overview of how we structure our services around Moleculer. We then look at how we use Moleculer to collect critical insights and metrics that help us debug and optimize the infrastructure serving millions of minutes of live video calls every month. So let’s get started!
An Overview of Dyte’s Backend Architecture 📐
Before getting into Moleculer’s awesome features, let me briefly introduce how Moleculer fits into Dyte’s infrastructure. Dyte’s REST API enables you to interact with the platform programmatically and lets you do useful tasks, like:
- Create/Update meetings
- Add participants to meetings
- Customize meeting UI components
- Customize participant permissions and roles
- Trigger/Query meeting recordings
- Register Webhooks to receive notifications from meetings
- Get meeting analytics/statistics and much more
The diagram above gives a high-level overview of how Dyte’s backend servers are designed around the microservice pattern. Each API request is mapped to one of the API backend replicas running on our cluster, which then uses a Moleculer action (e.g., session.get()) to route the call to the appropriate microservice. Under the hood, Moleculer implements message passing between services using a transporter (in our case, Redis). Moleculer actions are robust and have inbuilt support for load balancing. The cool part about this feature is that it has zero extra infra cost: Moleculer maintains state in the transporter itself to balance your action invocations between replicas. Some key benefits that we are realizing through Moleculer are:
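Before going through those benefits, here is a rough sketch of the request path described above: an API replica creates a broker that talks to Redis and forwards a request to the session service. The node ID, the Redis URL, and the `id` parameter are assumptions made for illustration, not Dyte's actual configuration. `RoundRobin` is Moleculer's default balancing strategy and is spelled out only for clarity.

```typescript
import { ServiceBroker } from "moleculer";

// Hypothetical values: nodeID and the Redis URL are placeholders.
// The Redis transporter also needs the `ioredis` package installed.
const broker = new ServiceBroker({
  nodeID: "api-backend-1",
  transporter: "redis://localhost:6379",
  // Built-in load balancing between replicas of the target service.
  registry: { strategy: "RoundRobin" },
});

// Connect to the transporter; call this once before handling requests.
export async function start(): Promise<void> {
  await broker.start();
}

// Called from an HTTP handler. Moleculer picks a replica of the
// "session" service and delivers the call to it; if that replica runs
// on another node, the message travels through the transporter.
export async function getSession(id: string): Promise<unknown> {
  return broker.call("session.get", { id });
}
```

The caller only names the action. It does not need to know which node hosts the session service or how many replicas exist.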
Ease of development and debugging 👨‍💻
At Dyte, we use the moleculer-decorators npm package to make it even simpler to develop microservices. Want to convert a class into a microservice and expose its methods as actions on the service mesh?
Just use @Service() and @Action(), and you are all set!
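A minimal sketch of what such a class can look like is shown here. The service name, the `id` parameter, and the returned object are hypothetical. The example assumes `moleculer` and `moleculer-decorators` are installed and that `experimentalDecorators` is enabled in `tsconfig.json`.

```typescript
import { Context, Service as MoleculerService, ServiceBroker } from "moleculer";
import { Action, Service } from "moleculer-decorators";

// Hypothetical service: the name, params, and return value are illustrative.
@Service({ name: "session" })
class SessionService extends MoleculerService {
  // Exposed on the service mesh as "session.get".
  // `params` is a fastest-validator schema that is checked at run time.
  @Action({ params: { id: "string" } })
  get(ctx: Context<{ id: string }>) {
    return { id: ctx.params.id, status: "LIVE" };
  }
}

async function main(): Promise<void> {
  // No transporter configured: everything runs on a single local node.
  const broker = new ServiceBroker({ logger: false });
  broker.createService(SessionService);
  await broker.start();

  console.log(await broker.call("session.get", { id: "abc" }));

  await broker.stop();
}

main().catch(console.error);
```

Calling `session.get` with a non-string `id` should be rejected by the validator before the handler runs.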
Action parameters are type-checked not only at compile time but at run time as well. Internally, Moleculer uses the powerful fastest-validator package to perform run-time data validation, which helps prevent vulnerabilities and crashes caused by users passing in unexpected inputs.
Another valuable Moleculer feature that we use widely at Dyte during local development is moleculer-cli, which lets you trigger actions/events and view actions/services quickly.
A large part of our infrastructure is event-driven, i.e., services trigger events, and other services listen for these events and act on them. For example:
- A participant joins a meeting, and then the “participant joined” event is triggered.
- The stats service notes down joining times and records them in the database.
- The auto-scaling service updates the participant count for the room to make better decisions about allocating future rooms to servers… and so on.
Moleculer has inbuilt support for events. One of its best features is that these are load-balanced across the replicas of a service, i.e., each event is delivered to only one replica of each listening service.
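The delivery semantics can be easier to see in code. What follows is a toy, dependency-free model of balanced events: every listening service gets the event once, and only one of its replicas handles it. This is not Moleculer's implementation; the `BalancedEventBus` class, the `participant.joined` event name, and the service names are all hypothetical.

```typescript
// Toy model of balanced event delivery; not Moleculer's implementation.
type Handler = (payload: unknown) => void;

class BalancedEventBus {
  // event name -> service name -> one handler per replica
  private listeners = new Map<string, Map<string, Handler[]>>();
  // round-robin cursor per "event:service" pair
  private cursors = new Map<string, number>();

  subscribe(event: string, service: string, handler: Handler): void {
    const groups = this.listeners.get(event) ?? new Map<string, Handler[]>();
    const replicas = groups.get(service) ?? [];
    replicas.push(handler);
    groups.set(service, replicas);
    this.listeners.set(event, groups);
  }

  // Deliver the event once per listening service, to a single replica of it.
  emit(event: string, payload: unknown): void {
    const groups = this.listeners.get(event);
    if (!groups) return;
    groups.forEach((replicas, service) => {
      const key = `${event}:${service}`;
      const cursor = this.cursors.get(key) ?? 0;
      replicas[cursor % replicas.length](payload);
      this.cursors.set(key, cursor + 1);
    });
  }
}

const bus = new BalancedEventBus();
const received: string[] = [];

// Two replicas of a stats service and one replica of an auto-scaling service.
bus.subscribe("participant.joined", "stats", () => received.push("stats#1"));
bus.subscribe("participant.joined", "stats", () => received.push("stats#2"));
bus.subscribe("participant.joined", "autoscaler", () => received.push("autoscaler#1"));

bus.emit("participant.joined", { participantId: "p1" });
bus.emit("participant.joined", { participantId: "p2" });

console.log(received);
```

Each emit reaches both services, but the two stats replicas take turns instead of both writing the same join record.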
Fault Tolerance ⚠️
At Dyte, we often encounter service failures in production. For example, some actions require a large number of DB calls or heavy CPU processing. A high load on these actions can increase response times and sometimes even saturate the physical node on which the service runs, affecting other services and leading to cascading failures. Moleculer has two helpful features to mitigate these issues:
- Retry Policies: These let us configure timeouts for action calls, along with rules to retry actions on timeouts with configurable backoff times and maximum limits.
- Circuit Breakers: In some cases, it is more beneficial to cut off calls to an action to prevent cascading failures in other system components. This feature allows us to configure the failure threshold beyond which all calls to the action are blocked.
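As a hedged illustration, both features are switched on through broker options. The option names below follow Moleculer's documented broker options, but every number is an assumption chosen for the example, not a value Dyte runs in production.

```typescript
// Fault-tolerance fragment to merge into the options passed to
// `new ServiceBroker(...)`. All numbers are illustrative assumptions.
export const faultToleranceOptions = {
  // Fail an action call that takes longer than this (milliseconds).
  requestTimeout: 5000,

  retryPolicy: {
    enabled: true,
    retries: 3,     // maximum number of retries per call
    delay: 100,     // first backoff delay in milliseconds
    maxDelay: 2000, // upper bound for the backoff delay in milliseconds
    factor: 2,      // backoff multiplier applied after each attempt
  },

  circuitBreaker: {
    enabled: true,
    threshold: 0.5,      // trip when half of the calls in the window fail
    minRequestCount: 20, // do not trip before this many calls were seen
    windowTime: 60,      // length of the measurement window in seconds
    halfOpenTime: 10000, // milliseconds to wait before letting a trial call through
  },
};
```

Note that `threshold` is a failure ratio rather than an absolute count, so `minRequestCount` guards against tripping the breaker on a handful of early errors.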
Metrics, metrics, and more metrics! 📊
Moleculer exposes many metrics that are crucial for Dyte to monitor how well our system performs and to make optimizations accordingly. The best part is that these metrics can be exported in Prometheus format, which means we could easily plug them into our existing monitoring system. The complete list of metrics that are exposed can be found here, but the most useful ones we found are:
- Action response times (p50, p90, p99)
- Action requests per minute
- Total errors/error rate
- CPU/Memory Utilization
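For reference, enabling the Prometheus reporter is a small piece of broker configuration. This is a minimal sketch: the port and path shown are assumptions for the example, and a Prometheus server would be pointed at that endpoint to scrape it.

```typescript
// Metrics fragment to merge into the options passed to
// `new ServiceBroker(...)`. Port and path are illustrative assumptions.
export const metricsOptions = {
  metrics: {
    enabled: true,
    reporter: [
      {
        type: "Prometheus",
        options: {
          port: 3030,       // HTTP port the reporter listens on
          path: "/metrics", // endpoint scraped by Prometheus
        },
      },
    ],
  },
};
```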
At Dyte, we integrated these statistics with Last9's powerful dashboards and anomaly detection alerting features. Here is an example of the response times from one of the actions on our internal monitoring panel:
Under 30-40ms, that looks a-okay!
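To make the p50/p90/p99 figures concrete, here is a small sketch that computes them from raw samples with the nearest-rank method. The sample values are made up, and monitoring systems usually estimate percentiles from histogram buckets instead of sorting raw samples, so treat this as an explanation of the idea only.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical response times in milliseconds.
const responseTimesMs = [12, 15, 18, 20, 22, 25, 28, 31, 36, 95];

const p50 = percentile(responseTimesMs, 50); // 22
const p90 = percentile(responseTimesMs, 90); // 36
const p99 = percentile(responseTimesMs, 99); // 95

console.log({ p50, p90, p99 });
```

The single 95 ms outlier barely moves p50 and p90 but dominates p99, which is why we watch all three.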
Request Tracing with Jaeger 🔬
A feature of Moleculer we recently discovered is its ability to export tracing-related information to other popular services like Jaeger. Though the Prometheus metrics provide us with valuable high-level overviews of how our actions are performing, tracing gives us an in-depth analysis of each action call in a chain of sequential calls. This lets us identify which actions contribute most to the total response time of an API endpoint, thus enabling us to make better decisions on the architectural changes we must make to mitigate these bottlenecks.
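A minimal sketch of the corresponding broker configuration is shown here. The host and port are assumptions that match a Jaeger agent running locally, and the exporter additionally needs the `jaeger-client` package installed.

```typescript
// Tracing fragment to merge into the options passed to
// `new ServiceBroker(...)`. Host and port are illustrative assumptions.
export const tracingOptions = {
  tracing: {
    enabled: true,
    exporter: {
      type: "Jaeger",
      options: {
        host: "127.0.0.1", // Jaeger agent host
        port: 6832,        // Jaeger agent UDP port
        // Constant sampler: the same sampling decision for every request.
        sampler: { type: "Const", options: {} },
      },
    },
  },
};
```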
Final Thoughts 🥳
In our next blog post on this topic, we will compare Moleculer with other popular microservice technologies like Kafka, Istio, etc. Make sure you don't miss that. Till then, adios! 👋
If you haven’t heard about Dyte yet, head over to https://dyte.io to learn how we are revolutionizing live video calling through our SDKs and libraries, and how you can get started quickly with your 10,000 free minutes, which renew every month. If you have any questions, you can reach us at firstname.lastname@example.org or ask our developer community.