GridFS is a specification for storing and retrieving large files, such as images, audio files, and videos, in MongoDB. By default, MongoDB documents have a size limit of 16 MB. When you need to store files larger than this limit, GridFS comes into play. Instead of storing the file as a single document, GridFS divides it into smaller chunks and stores each chunk as a separate document in the fs.chunks collection. The file's metadata, such as its filename and upload date, is stored in the fs.files collection.
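For illustration, here is roughly what the two documents look like for a single stored file (all values are hypothetical):

// A document in fs.files describes the file
{
  _id: ObjectId("60c72b2f4f1a4c23d8e7f7c5"),
  filename: "video.mp4",
  length: 52428800,                // total size in bytes
  chunkSize: 261120,               // default chunk size (255 KB)
  uploadDate: ISODate("2024-01-15T09:30:00Z"),
  metadata: { uploader: "alice" }
}

// Each document in fs.chunks holds one slice of the file's data
{
  _id: ObjectId("..."),
  files_id: ObjectId("60c72b2f4f1a4c23d8e7f7c5"), // references fs.files._id
  n: 0,                                           // chunk index, starting at 0
  data: BinData(0, "...")
}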
GridFS is useful when you need to store and retrieve files that exceed MongoDB’s document size limit. It also provides a mechanism to efficiently stream files, which is particularly useful for serving large files to clients, such as when building a video or audio streaming service. Additionally, GridFS integrates seamlessly with MongoDB, allowing you to leverage MongoDB’s indexing, querying, and other features to manage your files.
To work with GridFS in Node.js, you'll need a running MongoDB instance and a few npm packages. First, let's set up a new Node.js project and install the necessary packages:
mkdir gridfs-nodejs
cd gridfs-nodejs
npm init -y
We'll use the following packages:
- mongodb: The official MongoDB driver for Node.js.
- multer: A middleware for handling multipart/form-data, which is primarily used for uploading files.
- gridfs-stream: A streaming API for GridFS.
- express: A minimal and flexible Node.js web application framework.
npm install mongodb multer gridfs-stream express
To interact with GridFS in MongoDB using Node.js, we'll use the official mongodb package along with gridfs-stream, which provides a Node.js-style streaming API.
const { MongoClient } = require('mongodb');

const uri = 'mongodb://localhost:27017';
const client = new MongoClient(uri);

async function connect() {
  try {
    await client.connect();
    console.log('Connected to MongoDB');
  } catch (err) {
    console.error('Failed to connect to MongoDB', err);
  }
}

connect();
To use GridFS, we first need to set it up using the gridfs-stream module. GridFS uses two collections: fs.files and fs.chunks. The fs.files collection stores metadata about the file, and the fs.chunks collection stores the file's binary data split into chunks.
const mongoose = require('mongoose');
const gridfsStream = require('gridfs-stream');

// Connect to MongoDB using Mongoose
mongoose.connect('mongodb://localhost:27017/gridfstest', {
  useNewUrlParser: true,
  useUnifiedTopology: true,
});

const connection = mongoose.connection;
let gfs;

connection.once('open', () => {
  gfs = gridfsStream(connection.db, mongoose.mongo);
  gfs.collection('uploads'); // Set the root collection name for GridFS
  console.log('GridFS is ready to use');
});
Note that we point gridfs-stream at a root collection named uploads instead of the default fs, so files will be stored in uploads.files and uploads.chunks. When you run the code, you should see the following output:
Connected to MongoDB
GridFS is ready to use
Now that we have set up our connection to MongoDB and GridFS, let’s look at how to upload files.
We’ll use Express to create a simple server that handles file uploads.
const express = require('express');
const multer = require('multer');

const app = express();

// Middleware to handle multipart/form-data; files are kept in memory as Buffers
const storage = multer.memoryStorage();
const upload = multer({ storage });

// Route to handle file upload
app.post('/upload', upload.single('file'), (req, res) => {
  const file = req.file;

  // Create a write stream to GridFS
  const writestream = gfs.createWriteStream({
    filename: file.originalname,
  });

  // Write the file buffer to GridFS
  writestream.write(file.buffer);
  writestream.end();

  // gridfs-stream emits 'close' with the stored file document
  writestream.on('close', (storedFile) => {
    res.json({ fileId: storedFile._id, message: 'File uploaded successfully' });
  });
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});
Multer keeps the uploaded file in memory and exposes it as req.file, and the GridFS write stream emits a close event when the upload is complete. To test the upload, you can use a tool like Postman or curl to send a file to the /upload endpoint.
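For example, with the server running locally, a request like this should work (the file path is a placeholder):

curl -F "file=@/path/to/sample.jpg" http://localhost:3000/upload

Note that the form field name (file) must match the name passed to upload.single('file').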
After a successful upload, you’ll receive a JSON response similar to:
{
  "fileId": "60c72b2f4f1a4c23d8e7f7c5",
  "message": "File uploaded successfully"
}
This confirms that the file has been stored in GridFS.
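If you want to verify the stored data directly, you can inspect the collections in the mongo shell; the names reflect the uploads root configured earlier:

db['uploads.files'].find().pretty()
db['uploads.chunks'].find({}, { data: 0 }).pretty()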
Once files are stored in GridFS, you can retrieve them by their filename or ID.
app.get('/file/:filename', (req, res) => {
  const { filename } = req.params;

  // Create a read stream from GridFS by filename
  const readstream = gfs.createReadStream({ filename });

  readstream.on('error', (err) => {
    res.status(404).json({ error: 'File not found' });
  });

  readstream.pipe(res);
});
const { ObjectId } = require('mongodb');

app.get('/file/id/:id', (req, res) => {
  const { id } = req.params;

  // Create a read stream from GridFS by ID
  const readstream = gfs.createReadStream({ _id: new ObjectId(id) });

  readstream.on('error', (err) => {
    res.status(404).json({ error: 'File not found' });
  });

  readstream.pipe(res);
});
When you make a GET request to either /file/:filename or /file/id/:id, the server will send the requested file back to the client.
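For example (the filename and ID below are placeholders):

curl http://localhost:3000/file/sample.jpg -o sample.jpg
curl http://localhost:3000/file/id/60c72b2f4f1a4c23d8e7f7c5 -o sample.jpg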
You might also need to delete files from GridFS. This can be done using the file’s ID.
app.delete('/file/:id', (req, res) => {
  const { id } = req.params;

  // root must match the collection name configured earlier ('uploads')
  gfs.remove({ _id: new ObjectId(id), root: 'uploads' }, (err) => {
    if (err) return res.status(404).json({ error: 'File not found' });
    res.json({ message: 'File deleted successfully' });
  });
});
After a successful deletion, you’ll receive a JSON response similar to:
{
  "message": "File deleted successfully"
}
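You can exercise this endpoint with a DELETE request, passing the file's ID (the ID below is a placeholder):

curl -X DELETE http://localhost:3000/file/60c72b2f4f1a4c23d8e7f7c5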
For very large files, you can optimize your upload and retrieval processes by streaming files directly from the client to the server and from the server to the client. This avoids loading the entire file into memory, which is crucial when working with files that are several gigabytes in size.
Streaming Large File Uploads:
Let’s modify the file upload code to handle large file uploads using streams.
app.post('/upload', (req, res) => {
  const { filename } = req.query; // Assume the client provides the filename as a query parameter

  // Create a write stream to GridFS
  const writestream = gfs.createWriteStream({
    filename: filename,
  });

  // Pipe the request (which contains the raw file bytes) directly to GridFS
  req.pipe(writestream);

  writestream.on('close', (storedFile) => {
    res.json({ fileId: storedFile._id, message: 'File uploaded successfully' });
  });
});
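Because this route reads the raw request body, the client sends the file bytes directly instead of multipart form data. For example, with curl (the file path is a placeholder):

curl -X POST --data-binary @large-video.mp4 "http://localhost:3000/upload?filename=large-video.mp4"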
Similarly, for file retrieval, we can stream the file from GridFS to the client.
app.get('/download/:filename', (req, res) => {
  const { filename } = req.params;

  // Create a read stream from GridFS
  const readstream = gfs.createReadStream({ filename });

  readstream.on('error', (err) => {
    res.status(404).json({ error: 'File not found' });
  });

  // Stream the file to the client as an attachment
  res.setHeader('Content-Disposition', 'attachment; filename=' + filename);
  readstream.pipe(res);
});
With these changes, both uploading and downloading large files become more efficient and scalable.
Sometimes, you might want to store additional metadata with your files in GridFS. Metadata can include information like the file’s uploader, description, or any other custom fields.
app.post('/upload', upload.single('file'), (req, res) => {
  const file = req.file;
  const { uploader, description } = req.body;

  // Create a write stream to GridFS with metadata
  const writestream = gfs.createWriteStream({
    filename: file.originalname,
    metadata: {
      uploader: uploader,
      description: description,
    },
  });

  // Write the file buffer to GridFS
  writestream.write(file.buffer);
  writestream.end();

  writestream.on('close', (storedFile) => {
    res.json({ fileId: storedFile._id, message: 'File uploaded successfully' });
  });
});
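To send the metadata along with the file, include the extra fields in the same multipart request (the values below are placeholders):

curl -F "file=@report.pdf" -F "uploader=alice" -F "description=Quarterly report" http://localhost:3000/upload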
The metadata field allows you to attach custom data to the file in GridFS. You can also query files by their metadata using MongoDB's powerful querying capabilities.
app.get('/files/uploader/:uploader', (req, res) => {
  const { uploader } = req.params;

  // Find files with the specified uploader
  gfs.files.find({ 'metadata.uploader': uploader }).toArray((err, files) => {
    if (err) return res.status(500).json({ error: 'Query failed' });
    if (!files || files.length === 0) {
      return res.status(404).json({ error: 'No files found' });
    }
    res.json(files);
  });
});
This query searches the files collection (uploads.files in our setup) for documents whose metadata.uploader matches the specified value. The endpoint returns a list of files uploaded by a specific user, allowing you to organize and retrieve files based on custom criteria.
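Because the metadata lives in ordinary MongoDB documents, any query operator works here. A quick sketch (the field names assume the metadata written in the upload route above):

// Find files whose description mentions "report" (case-insensitive)
gfs.files.find({ 'metadata.description': /report/i }).toArray((err, files) => {
  if (err) return console.error(err);
  console.log(files.map((f) => f.filename));
});

// Find files uploaded after a given date using the built-in uploadDate field
gfs.files.find({ uploadDate: { $gt: new Date('2024-01-01') } }).toArray((err, files) => {
  if (err) return console.error(err);
  console.log(`${files.length} file(s) found`);
});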
Proper error handling is crucial in any application. With GridFS, you should be prepared to handle various errors, such as connection issues, file not found errors, and stream errors.
app.get('/file/:filename', (req, res) => {
  const { filename } = req.params;
  const readstream = gfs.createReadStream({ filename });

  readstream.on('error', (err) => {
    if (err.code === 'ENOENT') {
      return res.status(404).json({ error: 'File not found' });
    }
    return res.status(500).json({ error: 'An error occurred while retrieving the file' });
  });

  readstream.pipe(res);
});
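Connection-level failures are worth handling as well. A minimal sketch, using the Mongoose connection events from the setup earlier:

// Log connection errors and warn when the connection drops
connection.on('error', (err) => {
  console.error('MongoDB connection error:', err);
});

connection.on('disconnected', () => {
  console.warn('Lost connection to MongoDB');
});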
When working with file uploads, it’s important to validate and sanitize user inputs to prevent security vulnerabilities, such as file path traversal attacks or denial of service (DoS) attacks.
const path = require('path');

app.post('/upload', upload.single('file'), (req, res) => {
  const file = req.file;

  // Strip any directory components to prevent path traversal in the stored name
  const safeFilename = path.basename(file.originalname);

  const writestream = gfs.createWriteStream({
    filename: safeFilename,
  });

  writestream.write(file.buffer);
  writestream.end();

  writestream.on('close', (storedFile) => {
    res.json({ fileId: storedFile._id, message: 'File uploaded successfully' });
  });
});
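To reduce the denial-of-service surface mentioned above, you can also cap the upload size and restrict file types at the multer layer. A sketch with hypothetical limits:

// Reject uploads over 50 MB and anything that is not an image
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 50 * 1024 * 1024 }, // adjust to your needs
  fileFilter: (req, file, cb) => {
    if (!file.mimetype.startsWith('image/')) {
      return cb(new Error('Only image uploads are allowed'));
    }
    cb(null, true);
  },
});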
When dealing with high traffic or large files, you may want to consider additional performance optimizations. Ensure that the default indexes on the chunks collection (files_id and n) are in place, and consider adding further indexes based on your queries. You can also tune the chunk size when creating a write stream:
const writestream = gfs.createWriteStream({
  filename: safeFilename,
  chunkSize: 1024 * 1024, // 1 MB chunks
});
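On the indexing side, GridFS normally creates the compound index on the chunks collection for you, but you can verify or recreate it explicitly. A sketch against the uploads root used in this chapter:

// Ensure the standard GridFS index on the chunks collection exists
connection.db
  .collection('uploads.chunks')
  .createIndex({ files_id: 1, n: 1 }, { unique: true })
  .then(() => console.log('Chunk index in place'))
  .catch(console.error);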
GridFS provides a robust and scalable solution for storing and retrieving large files in MongoDB. By leveraging GridFS in Node.js, you can build powerful applications that handle file storage efficiently, even when dealing with large datasets. This chapter has covered everything from setting up GridFS, uploading and retrieving files, and handling metadata, to advanced topics like streaming large files and implementing best practices. Happy coding! ❤️