The aggregation framework in MongoDB is a powerful tool for performing data processing and analysis on documents within a collection. It enables complex queries and transformations using a series of pipeline stages, each applying specific operations on the data. This chapter will provide a comprehensive overview of the aggregation framework, covering basic concepts, common stages, and advanced techniques. Each section will include detailed explanations and code examples.
The aggregation framework processes data records and returns computed results. It is often used for tasks such as data transformation, statistical analysis, and data summarization. Aggregations are built on an extensible expression language and can perform a variety of operations, including filtering, grouping, sorting, and reshaping documents.
The aggregation pipeline is a series of stages through which documents pass. Each stage transforms the documents in some way and passes the results to the next stage. The stages can filter, group, sort, and reshape documents.
A pipeline stage is a MongoDB operator that performs an operation on the documents passed to it. Common stages include $match
, $group
, $sort
, and $project
.
Filters documents to pass only those that match the specified condition.
db.collection.aggregate([
{ $match: { status: "A" } }
]);
$match
: Filters documents where status
is “A”
// Output
[
{ _id: 1, status: "A", score: 85 },
{ _id: 2, status: "A", score: 90 }
]
Shapes the documents by including, excluding, or adding new fields.
db.collection.aggregate([
{ $project: { item: 1, total: { $multiply: ["$price", "$quantity"] } } }
]);
$project
: Includes the item
field and adds a new field total
calculated by multiplying price
and quantity
.
// Output
[
{ _id: 1, item: "apple", total: 60 },
{ _id: 2, item: "banana", total: 30 }
]
Groups documents by a specified field and applies aggregate functions.
db.collection.aggregate([
{ $group: { _id: "$status", total: { $sum: "$score" } } }
]);
$group
: Groups documents by status
and calculates the total score
for each group.
// Output
[
{ _id: "A", total: 175 },
{ _id: "B", total: 70 }
]
Sorts the documents by a specified field.
db.collection.aggregate([
{ $sort: { score: -1 } }
]);
$sort
: Orders the documents by score
in descending order.
// Output
[
{ _id: 2, status: "A", score: 90 },
{ _id: 1, status: "A", score: 85 },
{ _id: 3, status: "B", score: 70 }
]
Limits the number of documents passed to the next stage.
db.collection.aggregate([
{ $limit: 2 }
]);
$limit
: Passes only the first 2 documents to the next stage.
// Output
[
{ _id: 1, status: "A", score: 85 },
{ _id: 2, status: "A", score: 90 }
]
Skips a specified number of documents and passes the rest to the next stage.
db.collection.aggregate([
{ $skip: 1 }
]);
$skip
: Skips the first document and passes the remaining documents to the next stage.
// Output
[
{ _id: 2, status: "A", score: 90 },
{ _id: 3, status: "B", score: 70 }
]
Deconstructs an array field from the input documents to output a document for each element.
db.collection.aggregate([
{ $unwind: "$items" }
]);
$unwind
: Outputs a document for each element in the items
array field.
// Output
[
{ _id: 1, items: "apple" },
{ _id: 1, items: "banana" },
{ _id: 2, items: "orange" }
]
Used for performing arithmetic operations on fields.
db.collection.aggregate([
{ $project: { total: { $add: ["$price", "$tax"] } } }
]);
$add
: Adds the price
and tax
fields.
// Output
[
{ _id: 1, total: 110 },
{ _id: 2, total: 220 }
]
Used for performing operations on arrays.
db.collection.aggregate([
{ $project: { firstItem: { $arrayElemAt: ["$items", 0] } } }
]);
$arrayElemAt
: Retrieves the first element of the items
array.
// Output
[
{ _id: 1, firstItem: "apple" },
{ _id: 2, firstItem: "orange" }
]
Used for evaluating boolean conditions.
db.collection.aggregate([
{ $project: { isExpensive: { $gt: ["$price", 100] } } }
]);
$gt
: Evaluates if price
is greater than 100.
// Output
[
{ _id: 1, isExpensive: true },
{ _id: 2, isExpensive: false }
]
Used for comparing field values.
db.collection.aggregate([
{ $project: { isExpensive: { $gt: ["$price", 100] } } }
]);
$gt
: Checks if price
is greater than 100.
// Output
[
{ _id: 1, isExpensive: true },
{ _id: 2, isExpensive: false }
]
Used for manipulating dates.
db.collection.aggregate([
{ $project: { year: { $year: "$date" } } }
]);
$year
: Extracts the year from the date
field.
// Output
[
{ _id: 1, year: 2020 },
{ _id: 2, year: 2021 }
]
Used for manipulating strings.
db.collection.aggregate([
{ $project: { uppercaseName: { $toUpper: "$name" } } }
]);
$toUpper
: Converts the name
field to uppercase.
// Output
[
{ _id: 1, uppercaseName: "ALICE" },
{ _id: 2, uppercaseName: "BOB" }
]
Performs a left outer join to a collection in the same database to filter in documents from the joined collection.
db.orders.aggregate([
{
$lookup: {
from: "inventory",
localField: "item",
foreignField: "item",
as: "inventory_docs"
}
}
]);
$lookup
: Joins the orders
collection with the inventory
collection based on the item
field.
// Output
[
{
_id: 1,
item: "apple",
inventory_docs: [
{ _id: 101, item: "apple", qty: 50 }
]
},
{
_id: 2,
item: "banana",
inventory_docs: [
{ _id: 102, item: "banana", qty: 30 }
]
}
]
Processes multiple aggregation pipelines within a single stage and outputs a document that contains the results of each pipeline.
db.collection.aggregate([
{
$facet: {
"priceSummary": [
{ $group: { _id: null, avgPrice: { $avg: "$price" }, maxPrice: { $max: "$price" } } }
],
"categories": [
{ $group: { _id: "$category", count: { $sum: 1 } } }
]
}
}
]);
$facet
: Runs multiple pipelines to get different summaries in a single query.
// Output
[
{
priceSummary: [
{ _id: null, avgPrice: 150, maxPrice: 200 }
],
categories: [
{ _id: "electronics", count: 10 },
{ _id: "appliances", count: 5 }
]
}
]
Categorizes documents into groups, called buckets, based on a specified expression and bucket boundaries.
db.collection.aggregate([
{
$bucket: {
groupBy: "$price",
boundaries: [0, 50, 100, 150, 200],
default: "Other",
output: {
"count": { $sum: 1 },
"items": { $push: "$name" }
}
}
}
]);
$bucket
: Divides documents into price ranges and counts the number of documents in each range.
// Output
[
{ _id: 0, count: 2, items: ["item1", "item2"] },
{ _id: 50, count: 3, items: ["item3", "item4", "item5"] },
{ _id: 100, count: 1, items: ["item6"] },
{ _id: "Other", count: 1, items: ["item7"] }
]
Performs a recursive search on a collection, searching a collection for documents that match a specified condition.
db.employees.aggregate([
{
$graphLookup: {
from: "employees",
startWith: "$reportsTo",
connectFromField: "reportsTo",
connectToField: "name",
as: "reportingHierarchy"
}
}
]);
$graphLookup
: Recursively searches the employees
collection to find the reporting hierarchy of each employee.
// Output
[
{
_id: 1,
name: "Alice",
reportsTo: "Bob",
reportingHierarchy: [
{ _id: 2, name: "Bob", reportsTo: "Charlie" },
{ _id: 3, name: "Charlie", reportsTo: null }
]
},
{
_id: 2,
name: "Bob",
reportsTo: "Charlie",
reportingHierarchy: [
{ _id: 3, name: "Charlie", reportsTo: null }
]
}
]
$match
stages.$match
and $sort
stages early in the pipeline to reduce the number of documents processed in later stages.allowDiskUse
option if necessary.practical examples with sample data. We’ll use a collection of documents that represent sales records in a retail store.
Here’s a sample dataset of sales records:
db.sales.insertMany([
{ _id: 1, date: new Date("2023-07-25"), product: "Laptop", quantity: 2, amount: 1500, city: "New York" },
{ _id: 2, date: new Date("2023-07-25"), product: "Laptop", quantity: 1, amount: 750, city: "San Francisco" },
{ _id: 3, date: new Date("2023-07-26"), product: "Phone", quantity: 3, amount: 900, city: "New York" },
{ _id: 4, date: new Date("2023-07-26"), product: "Phone", quantity: 2, amount: 600, city: "San Francisco" },
{ _id: 5, date: new Date("2023-07-27"), product: "Tablet", quantity: 5, amount: 1250, city: "New York" },
{ _id: 6, date: new Date("2023-07-27"), product: "Tablet", quantity: 4, amount: 1000, city: "San Francisco" },
{ _id: 7, date: new Date("2023-07-28"), product: "Headphones", quantity: 10, amount: 500, city: "New York" },
{ _id: 8, date: new Date("2023-07-28"), product: "Headphones", quantity: 8, amount: 400, city: "San Francisco" }
]);
To calculate the total sales amount for each day.
db.sales.aggregate([
{ $group: { _id: "$date", totalSales: { $sum: "$amount" } } },
{ $sort: { _id: 1 } }
]);
$group
: Groups the documents by the date
field and calculates the total sales amount for each day using $sum
.$sort
: Sorts the grouped documents by date
in ascending order.
// Output
[
{ _id: ISODate("2023-07-25T00:00:00Z"), totalSales: 2250 },
{ _id: ISODate("2023-07-26T00:00:00Z"), totalSales: 1500 },
{ _id: ISODate("2023-07-27T00:00:00Z"), totalSales: 2250 },
{ _id: ISODate("2023-07-28T00:00:00Z"), totalSales: 900 }
]
To find the average sales amount for each city.
db.sales.aggregate([
{ $group: { _id: "$city", avgSalesAmount: { $avg: "$amount" } } }
]);
$group
: Groups the documents by the city
field and calculates the average sales amount for each city using $avg
.
// Output
[
{ _id: "New York", avgSalesAmount: 1037.5 },
{ _id: "San Francisco", avgSalesAmount: 687.5 }
]
To find the top 3 most sold products based on the quantity sold.
db.sales.aggregate([
{ $group: { _id: "$product", totalQuantity: { $sum: "$quantity" } } },
{ $sort: { totalQuantity: -1 } },
{ $limit: 3 }
]);
$group
: Groups the documents by the product
field and calculates the total quantity sold for each product using $sum
.$sort
: Sorts the grouped documents by totalQuantity
in descending order.$limit
: Limits the output to the top 3 products.
// Output
[
{ _id: "Headphones", totalQuantity: 18 },
{ _id: "Tablet", totalQuantity: 9 },
{ _id: "Phone", totalQuantity: 5 }
]
To find the average quantity sold per product for each city.
db.sales.aggregate([
{ $group: { _id: { product: "$product", city: "$city" }, avgQuantity: { $avg: "$quantity" } } },
{ $sort: { "_id.city": 1, "_id.product": 1 } }
]);
$group
: Groups the documents by both product
and city
and calculates the average quantity sold for each combination using $avg
.$sort
: Sorts the grouped documents first by city
and then by product
in ascending order.
// Output
[
{ _id: { product: "Headphones", city: "New York" }, avgQuantity: 10 },
{ _id: { product: "Headphones", city: "San Francisco" }, avgQuantity: 8 },
{ _id: { product: "Laptop", city: "New York" }, avgQuantity: 2 },
{ _id: { product: "Laptop", city: "San Francisco" }, avgQuantity: 1 },
{ _id: { product: "Phone", city: "New York" }, avgQuantity: 3 },
{ _id: { product: "Phone", city: "San Francisco" }, avgQuantity: 2 },
{ _id: { product: "Tablet", city: "New York" }, avgQuantity: 5 },
{ _id: { product: "Tablet", city: "San Francisco" }, avgQuantity: 4 }
]
To get a summary of total sales and average sales amount per product, all in one query.
db.sales.aggregate([
{
$facet: {
totalSales: [
{ $group: { _id: null, totalAmount: { $sum: "$amount" } } }
],
averageSalesPerProduct: [
{ $group: { _id: "$product", avgAmount: { $avg: "$amount" } } }
]
}
}
]);
$facet
: Allows multiple aggregation pipelines within a single stage.totalSales
: Aggregation pipeline to calculate the total sales amount.averageSalesPerProduct
: Aggregation pipeline to calculate the average sales amount per product.
// Output
[
{
totalSales: [
{ _id: null, totalAmount: 6900 }
],
averageSalesPerProduct: [
{ _id: "Headphones", avgAmount: 450 },
{ _id: "Laptop", avgAmount: 1125 },
{ _id: "Phone", avgAmount: 750 },
{ _id: "Tablet", avgAmount: 1125 }
]
}
]
In this chapter, we delved into the MongoDB aggregation framework, exploring its basic concepts, common pipeline stages, and advanced techniques. The aggregation framework is a powerful tool for data transformation and analysis, enabling complex operations to be performed efficiently within MongoDB. Happy coding !❤️