The Aggregation Pipeline: SQL Power in MongoDB
Welcome to Week 3! 📊 This week we unlock the real power of MongoDB: The Aggregation Framework.
If find() is like selecting rows in Excel, aggregate() is like Pivot Tables and Macros on steroids.
1. The Pipeline Concept
Think of it like a factory assembly line. Documents go in one end -> pass through a series of Stages -> Result comes out the other end.
[ Stage 1 ] -> [ Stage 2 ] -> [ Stage 3 ]
2. Basic Stages
A. $match (Filter)
It’s just like find(), but inside a pipeline. Always put this FIRST to reduce the amount of data processing.
{ $match: { status: "active" } }
B. $group (Summarize)
This is your GROUP BY. You can calculate sums, averages, counts, etc.
{
$group: {
_id: "$category", // Group by Category
totalSales: { $sum: "$amount" }, // Sum the prices
avgPrice: { $avg: "$price" } // Average price
}
}
C. $sort (Order)
Sorts the result.
{ $sort: { totalSales: -1 } } // Highest sales first
D. $project (Reshape)
Selects which fields to keep, rename, or calculate new ones.
{
$project: {
_id: 0,
categoryName: "$_id",
revenue: "$totalSales"
}
}
3. Putting it Together
Let’s say we have a Orders collection. We want to find the Top 3 Selling Products that are “Shipped”.
db.orders.aggregate([
// Stage 1: Filter only shipped orders
{ $match: { status: "shipped" } },
// Stage 2: Group by Product and Sum the Quantity
{
$group: {
_id: "$product_name",
totalQuantity: { $sum: "$quantity" }
}
},
// Stage 3: Sort by Quantity Descending
{ $sort: { totalQuantity: -1 } },
// Stage 4: Limit to top 3
{ $limit: 3 }
])
4. Why not just use MapReduce?
MapReduce is deprecated in MongoDB. The Aggregation Framework is written in C++ and is much, much faster.
đź§ Daily Challenge
- Open up Compass or Shell.
- Run an aggregation on your
Usercollection. - Group by
address.cityand count how many users live in each city.{ $group: { _id: "$address.city", count: { $sum: 1 } } }
See you on Day 2 for Lookups (Joins) and more complex pipelines! đź”—