At my previous company, our MongoDB server became very slow, which affected all our queries. Fortunately, we were on Atlas, so the Profiler was available to help us analyse what was going on.
Here are some things we looked at.
High Operation Time
Recurring DB queries
CPU Usage
Aggregation Pipelines
Performance Advisor for recommended indexes (though its suggestions are not always useful)
Optimisation
Add Cache Layer
After looking at the metrics, we reduced recurring DB queries by adding a caching layer between the application and the database. This worked because the underlying data changed far less often than we were running the expensive queries against it.
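As a rough illustration of the idea, here is a minimal in-process cache with a time-to-live, written in Python. It is only a sketch: the `TTLCache` class, the `get_or_load` method, and the 60-second TTL are hypothetical names and values chosen for this example, not what we actually ran in production (a shared cache such as Redis would be the more likely choice when there are several application instances).

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache that keeps each value for ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the database is not touched
        value = loader()  # miss or expired: run the expensive query once
        self._store[key] = (now + self._ttl, value)
        return value


# Usage sketch: `run_expensive_query` stands in for a real MongoDB call.
def run_expensive_query() -> dict:
    return {"orders": 1200}


cache = TTLCache(ttl_seconds=60)
report = cache.get_or_load("daily-report", run_expensive_query)
report_again = cache.get_or_load("daily-report", run_expensive_query)  # served from cache
```

The TTL is the knob to tune here: the less often the data changes, the longer it can safely be.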
Improving Pipelines
First, we added indexes on a couple of fields that were heavily used in our pipelines.
Second, we moved our $match filter stage before the $lookup, so that fewer documents reached the lookup and fewer lookups were performed.
Finally, we used $project to limit the amount of data passed from one stage to the next.
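The pipeline changes can be sketched roughly as follows, in Python. The collection names (`orders`, `customers`) and field names (`status`, `customerId`, `total`) are made up for illustration and are not our real schema. The function only builds the list of stages, which you would then pass to your driver's aggregate call.

```python
from typing import Any, Dict, List


def build_orders_pipeline(status: str) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline with the filter placed before the join."""
    return [
        # 1. Filter first, so only matching documents reach the $lookup.
        #    An index on the matched field (here the hypothetical `status`)
        #    lets this stage avoid a full collection scan.
        {"$match": {"status": status}},
        # 2. Join against the hypothetical `customers` collection.
        {
            "$lookup": {
                "from": "customers",
                "localField": "customerId",
                "foreignField": "_id",
                "as": "customer",
            }
        },
        # 3. Keep only the fields later stages or the caller actually need.
        {"$project": {"_id": 0, "total": 1, "customer.name": 1}},
    ]


pipeline = build_orders_pipeline("shipped")
# With pymongo this would be used roughly as:
#   db.orders.create_index([("status", 1)])
#   results = list(db.orders.aggregate(pipeline))
```

Had the $match come after the $lookup, every order would have been joined to its customer first and most of that work thrown away by the filter.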
Scale Hardware
As a last step, we also had to increase the CPU and RAM of our infrastructure to handle the growing volume of queries hitting the DB.