Data archiving is the process of moving infrequently accessed or historical data from the main database to a secondary storage system. This helps reduce the load on MongoDB, optimize query performance, manage storage costs, and meet compliance requirements.
Data archiving refers to systematically moving less frequently accessed data to a designated archive collection or database, ensuring active storage is optimized for performance-critical data. Archiving doesn’t delete data but rather retains it in a less active or read-only environment.
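Data archiving is essential for reducing load on the primary database, keeping queries fast, controlling storage costs, and meeting compliance requirements.
Common scenarios that benefit from archiving include aging order history, old application logs, and inactive user profiles.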
One basic approach is moving data to a separate collection within the same database, keeping it available but out of the primary collection.
Let’s create two collections: orders (primary) and orders_archive (for archived data).
Sample documents in orders:
db.orders.insertMany([
  { orderId: 1, product: "Laptop", createdAt: new Date("2022-01-01") },
  { orderId: 2, product: "Smartphone", createdAt: new Date("2022-02-01") },
  { orderId: 3, product: "Tablet", createdAt: new Date("2022-03-01") }
]);
Moving older documents to orders_archive:
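const archiveDate = new Date("2022-02-01");
// Materialize the matching documents once so the insert and delete use the same set
const documentsToArchive = db.orders.find({ createdAt: { $lt: archiveDate } }).toArray();
if (documentsToArchive.length > 0) {
  // Insert archived documents (insertMany rejects an empty array)
  db.orders_archive.insertMany(documentsToArchive);
  // Remove only the documents that were copied from the main collection
  db.orders.deleteMany({ _id: { $in: documentsToArchive.map((doc) => doc._id) } });
}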
The script finds documents created before archiveDate, inserts them into orders_archive, and deletes them from orders.
orders_archive now holds older orders:
[
{ "orderId": 1, "product": "Laptop", "createdAt": ISODate("2022-01-01") }
]
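For larger collections, loading every matching document into one array may use a lot of memory. The following is a minimal sketch, assuming a hypothetical helper named chunk and an arbitrary batch size of 2, of how matched documents could be split into fixed-size batches before each insert and delete. The helper is plain JavaScript and is not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): split an array into batches
// of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in documents; in mongosh each batch would be passed to
// db.orders_archive.insertMany(batch) and then deleted from db.orders by _id.
const docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }, { _id: 5 }];
const batches = chunk(docs, 2);
console.log(batches.map((batch) => batch.length)); // [ 2, 2, 1 ]
```

Processing one batch at a time also means a failure partway through leaves only the current batch to retry, rather than the whole archive run.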
Separating archived data into a different database optimizes performance for frequently accessed collections.
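// Define target database
const archiveDB = db.getSiblingDB("archive");
// Find the documents to archive (archiveDate is the cutoff defined in the previous example)
const oldOrders = db.orders.find({ createdAt: { $lt: archiveDate } }).toArray();
if (oldOrders.length > 0) {
  // Insert into the archive database (insertMany rejects an empty array)
  archiveDB.orders.insertMany(oldOrders);
  // Delete only the copied documents from the main collection
  db.orders.deleteMany({ _id: { $in: oldOrders.map((doc) => doc._id) } });
}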
getSiblingDB: Accesses the archive database on the same MongoDB deployment.
The script copies data from orders in the main database to orders in archive and deletes the archived data from orders.
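Because the copy and the delete are separate operations, it can be worth confirming that a document really exists in the archive before removing it from the source. The sketch below assumes a hypothetical helper named idsSafeToDelete and uses stand-in data instead of a live connection; it is one possible check, not a built-in MongoDB feature.

```javascript
// Hypothetical helper (not a MongoDB API): return the _ids from `sourceDocs`
// that also appear in `archivedIds`, i.e. the ones that are safe to delete.
function idsSafeToDelete(sourceDocs, archivedIds) {
  // Compare by string form so ObjectId-like values and primitives both work.
  const archived = new Set(archivedIds.map((id) => String(id)));
  return sourceDocs
    .map((doc) => doc._id)
    .filter((id) => archived.has(String(id)));
}

// Stand-in data: document 3 was not copied, so it must stay in the source.
const sourceDocs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
const archivedIds = [1, 2];
const safeIds = idsSafeToDelete(sourceDocs, archivedIds);
console.log(safeIds); // [ 1, 2 ]
```

In mongosh, archivedIds could plausibly be read back with archiveDB.orders.distinct("_id", { _id: { $in: ids } }), and the result passed to deleteMany as { _id: { $in: safeIds } }.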
The aggregation pipeline allows for data transformation and selection before archiving, which is useful for archiving processed or summary data.
Example: Archiving only essential fields from orders.
db.orders.aggregate([
  { $match: { createdAt: { $lt: archiveDate } } },
  { $project: { orderId: 1, product: 1, createdAt: 1 } },
  { $merge: { into: "orders_archive" } }
]);
// Delete archived data
db.orders.deleteMany({ createdAt: { $lt: archiveDate } });
$match: Selects documents created before archiveDate.
$project: Selects only essential fields to archive.
$merge: Inserts data into orders_archive.
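If the archive job may run more than once over the same documents, the $merge options decide what happens to documents that are already in the archive. The sketch below assumes a hypothetical helper named buildArchivePipeline that assembles the same three stages with an explicit whenMatched: "keepExisting" setting, so a re-run leaves previously archived documents untouched. The helper name and its parameters are illustrative only.

```javascript
// Hypothetical helper: build the archiving pipeline for a cutoff date,
// a list of fields to keep, and a target collection name.
function buildArchivePipeline(cutoff, fields, targetCollection) {
  const projection = {};
  for (const field of fields) {
    projection[field] = 1;
  }
  return [
    { $match: { createdAt: { $lt: cutoff } } },
    { $project: projection },
    {
      $merge: {
        into: targetCollection,
        on: "_id",
        whenMatched: "keepExisting", // leave already-archived documents as they are
        whenNotMatched: "insert"     // add documents not yet in the archive
      }
    }
  ];
}

const pipeline = buildArchivePipeline(
  new Date("2022-02-01"),
  ["orderId", "product", "createdAt"],
  "orders_archive"
);
console.log(JSON.stringify(pipeline, null, 2));
```

In mongosh the result could be passed directly to db.orders.aggregate(pipeline). Note that $merge requires MongoDB 4.2 or later.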
MongoDB Atlas Triggers enable automatic data archiving by executing functions in response to database events or on a schedule.
In MongoDB Atlas, a scheduled trigger can run a function like the following:
exports = async function() {
  const mongodb = context.services.get("mongodb-atlas");
  const orders = mongodb.db("mydatabase").collection("orders");
  const archive = mongodb.db("archive").collection("orders");
  const archiveDate = new Date();
  archiveDate.setMonth(archiveDate.getMonth() - 6); // Archive data older than 6 months
  // toArray() returns a Promise in Atlas Functions, so it must be awaited
  const oldData = await orders.find({ createdAt: { $lt: archiveDate } }).toArray();
  if (oldData.length === 0) return; // nothing to archive on this run
  await archive.insertMany(oldData);
  // Delete only the documents that were copied
  await orders.deleteMany({ _id: { $in: oldData.map((doc) => doc._id) } });
};
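One detail worth knowing about the cutoff calculation: setMonth can roll over into the following month when the target month is shorter than the current day of the month (for example, six months before August 31 would land in early March rather than at the end of February). The sketch below assumes a hypothetical helper named monthsAgo that clamps the day to the last day of the target month. It uses UTC methods so the result does not depend on the server's time zone, which differs from the local-time setMonth call in the function above.

```javascript
// Hypothetical helper: return a new Date `months` calendar months before `from`,
// clamping the day to the last day of the target month (all in UTC).
function monthsAgo(from, months) {
  const result = new Date(from.getTime());
  const day = result.getUTCDate();
  result.setUTCDate(1); // avoid day overflow while changing the month
  result.setUTCMonth(result.getUTCMonth() - months);
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(
    Date.UTC(result.getUTCFullYear(), result.getUTCMonth() + 1, 0)
  ).getUTCDate();
  result.setUTCDate(Math.min(day, lastDay));
  return result;
}

const cutoff = monthsAgo(new Date("2024-08-31T00:00:00Z"), 6);
console.log(cutoff.toISOString()); // 2024-02-29T00:00:00.000Z
```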
Using cron jobs is a common way to run scripts periodically on local or cloud-hosted MongoDB instances.
0 0 * * * /usr/bin/mongosh /path/to/archiveScript.js
Archive script (archiveScript.js):
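const db = connect("mongodb://localhost:27017/mydatabase");
const archiveDB = db.getSiblingDB("archive"); // reuse the same connection for the archive database
const archiveDate = new Date();
archiveDate.setMonth(archiveDate.getMonth() - 6); // Archive data older than 6 months
const data = db.orders.find({ createdAt: { $lt: archiveDate } }).toArray();
if (data.length > 0) {
  // insertMany rejects an empty array, so skip runs with nothing to archive
  archiveDB.orders.insertMany(data);
  // Delete only the documents that were copied
  db.orders.deleteMany({ _id: { $in: data.map((doc) => doc._id) } });
}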
Define clear policies for data retention and review them regularly to keep storage and performance optimized.
Regularly monitor your archive database for storage and performance, as archived data can still impact system resources over time.
Maintaining documentation of archiving policies helps ensure team members understand and follow retention requirements.
For an e-commerce app, order history older than one year can be archived for performance reasons.
Logs older than three months may be moved to an archive for compliance with logging policies.
Inactive user profiles (e.g., no login for 12+ months) can be archived to save space.
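The three use cases above could be captured in a small policy table that an archive script iterates over. The following is a rough sketch: the collection names, the date field names (createdAt, timestamp, lastLogin), and the helper buildArchiveFilter are all assumptions for illustration, and the retention periods are expressed in days as an approximation of "one year" and "three months".

```javascript
// Illustrative retention policies based on the use cases above.
// Collection and field names are assumptions, not taken from a real schema.
const retentionPolicies = [
  { collection: "orders", dateField: "createdAt", maxAgeDays: 365 },
  { collection: "logs", dateField: "timestamp", maxAgeDays: 90 },
  { collection: "users", dateField: "lastLogin", maxAgeDays: 365 }
];

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: build the query filter that selects documents
// older than the policy's retention period, relative to `now`.
function buildArchiveFilter(policy, now) {
  const cutoff = new Date(now.getTime() - policy.maxAgeDays * MS_PER_DAY);
  return { [policy.dateField]: { $lt: cutoff } };
}

const now = new Date("2024-01-01T00:00:00Z");
for (const policy of retentionPolicies) {
  const filter = buildArchiveFilter(policy, now);
  console.log(policy.collection, JSON.stringify(filter));
}
```

Keeping the policies in one place also gives the team a single artifact to review when retention requirements change.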
Defining and implementing effective data archiving strategies in MongoDB helps organizations optimize database performance, reduce storage costs, and comply with data retention policies. MongoDB’s flexibility with separate collections, databases, aggregation pipelines, and automated triggers makes it easy to customize archiving for different use cases. Happy Coding!❤️