Data archiving and retention policies are essential for effectively managing large data sets in MongoDB. Retaining data that is necessary for compliance, analytics, or historical tracking, while removing obsolete data, is key to keeping MongoDB environments optimized for performance and cost.
Data archiving is the process of moving rarely accessed or historical data from the main operational database to archive storage, where it remains available but no longer occupies primary storage. Archiving:
Data retention policies define rules for how long data should be stored in both primary and archival storage. These policies vary based on compliance requirements, operational needs, and business rules. Retention policies typically specify:
Implementing well-defined archiving and retention policies in MongoDB is essential for:
Before setting up policies, identify:
Here’s how different data types can guide retention policies:
Maintaining documentation of all retention policies ensures consistent data management, allowing teams to understand the lifecycle of each data set.
One common approach is to use separate collections for archiving, which keeps archived data in the same database but in a distinct location.
Create a transactions collection for active transactions and a separate transactions_archive collection for archived data.
Insert sample data into the transactions collection:
db.transactions.insertMany([
{ transactionId: 101, amount: 150, createdAt: new Date("2023-01-01") },
{ transactionId: 102, amount: 250, createdAt: new Date("2023-03-15") }
]);
Move old data to transactions_archive:
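const archiveDate = new Date();
archiveDate.setMonth(archiveDate.getMonth() - 6); // cutoff: six months ago
const documentsToArchive = db.transactions.find({ createdAt: { $lt: archiveDate } }).toArray();
if (documentsToArchive.length > 0) { // insertMany rejects an empty array
  db.transactions_archive.insertMany(documentsToArchive);
  // Delete only the documents that were actually copied, matched by _id
  db.transactions.deleteMany({ _id: { $in: documentsToArchive.map((doc) => doc._id) } });
}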
This copies documents older than six months into transactions_archive and deletes them from transactions.
Separating archived data into a different database, such as archive, isolates active data, which can improve performance for operational queries.
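const archiveDB = db.getSiblingDB("archive");
// archiveDate is the six-month cutoff computed in the previous example
const oldTransactions = db.transactions.find({ createdAt: { $lt: archiveDate } }).toArray();
if (oldTransactions.length > 0) { // insertMany rejects an empty array
  archiveDB.transactions.insertMany(oldTransactions);
  db.transactions.deleteMany({ _id: { $in: oldTransactions.map((doc) => doc._id) } });
}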
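Both scripts above load every matching document into memory with toArray(), which may not scale to large collections. A minimal sketch of the same copy-then-delete step done in batches follows, assuming mongosh-style collection objects. The helper name archiveInBatches and the default batch size of 1000 are illustrative and not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): moves matching documents from
// `source` to `target` in fixed-size batches. Both arguments are assumed to be
// mongosh collection objects, e.g. db.transactions and db.transactions_archive.
function archiveInBatches(source, target, filter, batchSize = 1000) {
  let moved = 0;
  while (true) {
    // Fetch at most batchSize matching documents.
    const batch = source.find(filter).limit(batchSize).toArray();
    if (batch.length === 0) break; // nothing left to archive

    // Copy first, then delete exactly the copied documents by _id.
    target.insertMany(batch);
    source.deleteMany({ _id: { $in: batch.map((doc) => doc._id) } });
    moved += batch.length;
  }
  return moved; // total number of documents archived
}
```

In mongosh this could be called as archiveInBatches(db.transactions, db.transactions_archive, { createdAt: { $lt: archiveDate } }). If a run stops between the insert and the delete, a rerun would hit duplicate _id errors for the batch that was already copied, so a production version would need to handle that case.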
In MongoDB Atlas, you can set up triggers to run automatically and archive data periodically based on criteria you define.
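exports = async function() {
  const cluster = context.services.get("mongodb-atlas");
  const transactions = cluster.db("mainDB").collection("transactions");
  const archive = cluster.db("archiveDB").collection("transactions");
  const archiveDate = new Date();
  archiveDate.setMonth(archiveDate.getMonth() - 6); // cutoff: six months ago
  // In Atlas Functions, toArray() returns a Promise, so it must be awaited
  const oldData = await transactions.find({ createdAt: { $lt: archiveDate } }).toArray();
  if (oldData.length === 0) return; // nothing to archive
  await archive.insertMany(oldData);
  // Delete only the documents that were actually copied, matched by _id
  await transactions.deleteMany({ _id: { $in: oldData.map((doc) => doc._id) } });
};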
When scheduled to run daily, the trigger automatically archives data older than six months, ensuring data in the transactions collection remains current.
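The cutoff in these scripts comes from setMonth, which rolls over into the following month when the target month has fewer days (for example, six months before August 31 lands in early March rather than at the end of February). Below is a small sketch of a cutoff helper that clamps to the last day of the target month and works in UTC. The name monthsAgo is hypothetical and introduced here only for illustration.

```javascript
// Hypothetical helper: returns the date `months` calendar months before
// `from`, clamped to the last day of the target month (computed in UTC).
function monthsAgo(months, from = new Date()) {
  const result = new Date(from.getTime());
  const day = result.getUTCDate();

  // Move to the 1st before changing the month so the date cannot roll over.
  result.setUTCDate(1);
  result.setUTCMonth(result.getUTCMonth() - months);

  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(
    Date.UTC(result.getUTCFullYear(), result.getUTCMonth() + 1, 0)
  ).getUTCDate();
  result.setUTCDate(Math.min(day, lastDay));
  return result;
}

console.log(monthsAgo(6, new Date("2023-08-31T00:00:00Z")).toISOString());
// 2023-02-28T00:00:00.000Z
```

The result can be used wherever archiveDate appears, for example const archiveDate = monthsAgo(6).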
The aggregation pipeline is useful for transforming data before archiving it, such as selecting specific fields to reduce storage size.
Example: Archive specific fields from a user_activity collection.
db.user_activity.aggregate([
{ $match: { createdAt: { $lt: archiveDate } } },
{ $project: { userId: 1, activityType: 1, createdAt: 1 } },
{ $merge: { into: "user_activity_archive" } }
]);
db.user_activity.deleteMany({ createdAt: { $lt: archiveDate } });
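The same pipeline shape can be generated from a field list so that several collections share one archiving routine. The sketch below assumes documents carry a createdAt field, as in the example above. The helper name buildArchivePipeline is hypothetical, and the $merge options are spelled out so that rerunning the pipeline leaves already-archived documents unchanged.

```javascript
// Hypothetical helper: builds a $match / $project / $merge pipeline that
// archives only the listed fields of documents older than `cutoff`.
function buildArchivePipeline(cutoff, fields, archiveCollection) {
  if (fields.length === 0) {
    // $project needs at least one output field.
    throw new Error("fields must list at least one field to keep");
  }
  const projection = {};
  for (const field of fields) {
    projection[field] = 1; // include this field; _id is included by default
  }
  return [
    { $match: { createdAt: { $lt: cutoff } } },
    { $project: projection },
    {
      $merge: {
        into: archiveCollection,
        on: "_id",
        whenMatched: "keepExisting", // leave already-archived documents untouched
        whenNotMatched: "insert",
      },
    },
  ];
}
```

In mongosh, the earlier example would become db.user_activity.aggregate(buildArchivePipeline(archiveDate, ["userId", "activityType", "createdAt"], "user_activity_archive")).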
Using custom conditions, such as archiving only users who have been inactive for a year, helps tailor archiving to specific requirements. The query below assumes archiveDate has been set to one year ago and selects the users to archive:
db.users.find({ lastActive: { $lt: archiveDate }, isActive: false });
TTL (Time to Live) indexes in MongoDB are ideal for automatically expiring documents after a set period, making them useful for managing retention policies for short-lived data.
db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 2592000 }); // 30 days
You can define retention policies for archived data, specifying rules for data expiration. For instance, older logs might be kept for six months in logs_archive, after which they can be deleted.
db.logs_archive.createIndex({ archivedAt: 1 }, { expireAfterSeconds: 15552000 }); // 6 months
db.logs.find({ createdAt: { $lt: archiveDate } }).forEach((log) => {
log.archivedAt = new Date(); // Set archivedAt for TTL to work
db.logs_archive.insertOne(log);
db.logs.deleteOne({ _id: log._id });
});
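The expireAfterSeconds values above are day counts converted to seconds, with six months approximated as 180 days. A tiny sketch of that arithmetic follows; daysToSeconds is a hypothetical helper, not a MongoDB function.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86400

// Hypothetical helper: converts a retention period in whole days to the
// value expected by the expireAfterSeconds index option.
function daysToSeconds(days) {
  if (!Number.isInteger(days) || days < 0) {
    throw new RangeError("days must be a non-negative integer");
  }
  return days * SECONDS_PER_DAY;
}

console.log(daysToSeconds(30));  // 2592000  (30 days)
console.log(daysToSeconds(180)); // 15552000 (about 6 months)
```

With it, the first index could be written as db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: daysToSeconds(30) }), which keeps the retention period readable in days.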
Ensure your archive storage remains optimized by setting monitoring alerts, especially if archiving large datasets.
As business needs evolve, retention policies may need adjustments to reflect the latest compliance standards or performance requirements.
While archiving reduces active storage needs, keep regular backups of archived data to protect it against loss.
Archive order data over a year old for historical reporting, while keeping recent orders available.
Move logs older than six months to an archive for regulatory purposes and delete logs older than a year.
For time-series data from sensors, retain recent data for analysis and archive older data for long-term storage.
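The order and log use cases above can be written down as data, so that scripts and documentation share one definition. The following is a sketch under assumptions: the collection names, the day counts (a year as 365 days, six months as 180 days), and the lifecycleStage helper are illustrative. The sensor case is left out because no retention period is stated for it.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Illustrative policies; collection names and day counts are assumptions.
const retentionPolicies = {
  orders: { archiveAfterDays: 365, deleteAfterDays: null }, // no deletion rule stated
  logs: { archiveAfterDays: 180, deleteAfterDays: 365 },
};

// Returns "active", "archive", or "delete" for a document created at `createdAt`.
function lifecycleStage(policy, createdAt, now = new Date()) {
  const ageDays = (now.getTime() - createdAt.getTime()) / MS_PER_DAY;
  if (policy.deleteAfterDays !== null && ageDays > policy.deleteAfterDays) {
    return "delete";
  }
  if (ageDays > policy.archiveAfterDays) {
    return "archive";
  }
  return "active";
}
```

A call such as lifecycleStage(retentionPolicies.logs, log.createdAt) could then decide whether a document stays in place, moves to the archive, or is removed.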
In MongoDB, data archiving and retention policies are essential for maintaining an optimized and compliant database environment. By defining clear policies for data lifecycle management, implementing archiving strategies, and leveraging MongoDB’s features such as TTL indexes and automated triggers, you can ensure efficient data storage and access. With these techniques, organizations can manage growing data volumes without compromising on performance or regulatory requirements. Happy Coding!❤️