Working with Git LFS Tracked Files

Git Large File Storage (LFS) is a Git extension that helps manage large files in Git repositories. It is useful for tracking files that plain Git handles poorly, such as high-resolution images, large datasets, and media files. With Git LFS, you can version large files alongside your code without bloating the repository.

Introduction to Git LFS

Git Large File Storage (LFS) is a tool designed to handle large files that regular Git repositories struggle with. Git LFS does this by replacing large files in your repository with lightweight pointers. Instead of directly storing the files, it manages them externally, so your repository remains manageable and efficient.

Git LFS stores the contents of these large files on a separate LFS server, which is usually provided by your Git host. When you clone the repository or check out a commit, Git LFS automatically downloads the contents needed for that checkout. This approach keeps repository size under control and prevents slowdowns in Git operations.
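
To make the pointer idea concrete, the sketch below builds by hand the small text that Git LFS commits in place of a large file. The file name big-file.bin and its contents are made up for illustration, and sha256sum is assumed to be available (on macOS, shasum -a 256 is the usual substitute). In real use Git LFS writes this pointer for you.

```shell
# Create a stand-in for a large file (hypothetical name and contents).
workdir=$(mktemp -d)
printf 'pretend this is a very large binary\n' > "$workdir/big-file.bin"

# A pointer records the spec version, the SHA-256 of the content, and its size in bytes.
oid=$(sha256sum "$workdir/big-file.bin" | cut -d' ' -f1)
size=$(wc -c < "$workdir/big-file.bin" | tr -d ' ')

pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
printf '%s\n' "$pointer"
```

The pointer stays three short lines no matter how large the real file is, which is why the repository itself stays small.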

Why Use Git LFS?

Git was designed to manage source code and lightweight text files. Large files can make repositories difficult to manage because:

  • Performance Impact: When a repository contains files like videos or large datasets, Git commands such as clone, fetch, and checkout can become slow and unwieldy.
  • Storage Constraints: Git keeps every version of every file in its history, so a large file that changes often can quickly inflate the repository beyond disk or hosting limits.
  • Collaboration Overhead: Every teammate has to download the full history of each large file, which slows down cloning and everyday work.

Common use cases for Git LFS include:

  • Large image files
  • Video files
  • Datasets or machine learning models
  • Executable binaries and other non-text files

Setting Up Git LFS

To start using Git LFS, you need to install it and configure it for your repository.

Step 1: Install Git LFS

If Git LFS isn’t already installed, you can get it by following these instructions based on your operating system.

  • On macOS: brew install git-lfs
  • On Linux (Debian/Ubuntu): sudo apt-get install git-lfs
  • On Windows: Use the Git for Windows installer, which includes Git LFS.

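After installation, enable Git LFS for your user account. You only need to run this once per machine:

    git lfs install

This command registers the LFS filters in your global Git configuration and, when run inside a repository, installs the Git hooks that LFS relies on.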
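
A quick way to check the result of the installation step is sketched below. It only reports what it finds and changes nothing; the variable name lfs_status is just for illustration.

```shell
# Check whether the git-lfs binary is on PATH and print its version.
if command -v git-lfs >/dev/null 2>&1; then
  git lfs version
  lfs_status="installed"
else
  echo "git-lfs not found on PATH; install it first" >&2
  lfs_status="missing"
fi
echo "Git LFS status: $lfs_status"

# After `git lfs install`, the filter settings appear in the global Git config.
git config --global --get-regexp '^filter\.lfs\.' \
  || echo "no filter.lfs.* settings found yet"
```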

Step 2: Configure Git LFS for Your Repository

Now that Git LFS is installed, it’s time to configure it for your repository. Let’s look at how to start tracking files with Git LFS.

Tracking Files with Git LFS

Git LFS uses the command git lfs track to specify which files should be managed by Git LFS rather than Git itself.

Step-by-Step Tracking Example

1. Choose the File Type:

  • To track large image files with the .png extension, run:

				
    git lfs track "*.png"

2. Check .gitattributes:

  • When you run the git lfs track command, Git LFS adds an entry to a .gitattributes file. This file stores rules for managing files in your repository. Open the .gitattributes file to verify:

				
    *.png filter=lfs diff=lfs merge=lfs -text

3. Add and Commit Changes:

  • After configuring Git LFS to track the files, add the .gitattributes file and commit it to your repository.

				
    git add .gitattributes
    git commit -m "Add .png files to Git LFS tracking"

Result: any .png file you add from now on is handled by Git LFS. Its content is stored externally and a pointer is committed in its place. Files that were already committed before this step are not converted automatically.
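
One way to confirm that a path will be routed through Git LFS is to ask Git which filter applies to it. The sketch below uses a throwaway repository and writes the .gitattributes rule directly (the same line that git lfs track "*.png" adds), so it runs even where Git LFS is not installed. The file names logo.png and notes.txt are made up and do not need to exist.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Same rule that `git lfs track "*.png"` adds to .gitattributes.
printf '*.png filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

# Ask Git which filter each path would use.
png_attr=$(git check-attr filter -- logo.png)
txt_attr=$(git check-attr filter -- notes.txt)
echo "$png_attr"   # logo.png: filter: lfs
echo "$txt_attr"   # notes.txt: filter: unspecified
```

In a real repository, running git lfs track with no arguments also lists the patterns that are currently tracked.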

Working with LFS Tracked Files

Now that you’ve set up Git LFS, you can manage large files just like any other Git file. Here’s how Git LFS handles tracked files in various scenarios.

Committing LFS Files

1. Add the large file (e.g., large-file.png) to your repository:

				
    git add large-file.png

2. Commit the file:

				
    git commit -m "Add large-file.png to repository"

Result: the commit stores a small pointer for large-file.png instead of the actual file content. The content itself is uploaded to the LFS server when you run git push.
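
To see the pointer for yourself, you can print what the commit stores for the file. The sketch below does this in a throwaway repository with a tiny stand-in file; it assumes Git LFS is installed and simply skips the LFS steps if it is not. The commit identity passed with -c is a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

if command -v git-lfs >/dev/null 2>&1; then
  git lfs install --local >/dev/null      # enable the LFS filters for this repository only
  git lfs track "*.png" >/dev/null
  printf 'stand-in for image data\n' > large-file.png   # hypothetical file
  git add .gitattributes large-file.png
  git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Add large-file.png to repository"

  git lfs ls-files                        # lists large-file.png as an LFS object
  # Show what the commit actually stores for the file: the pointer text.
  first_line=$(git cat-file -p HEAD:large-file.png | head -n 1)
else
  first_line="skipped"                    # Git LFS is not installed here
fi
echo "$first_line"
```

The working tree still contains the real file; only the object stored in the commit is the pointer.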

Cloning and Pulling Repositories with LFS Files

When another user clones the repository, Git LFS will automatically download the tracked files as needed.

				
    git clone <repository-url>

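The same applies to git pull and git checkout: Git LFS downloads the content that the checked-out commit needs. If Git LFS is not installed on that machine, the working tree contains the small pointer files instead.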
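
If you do not need every large file right away, Git LFS can be told to skip the automatic download during clone so that you can fetch content later. The sketch below wraps that in a small helper. The function name clone_without_lfs_content, the example URL, and the assets/** pattern are hypothetical; GIT_LFS_SKIP_SMUDGE and git lfs pull --include are existing Git LFS features.

```shell
# Clone a repository but leave LFS files as pointers (no content download yet).
# Usage: clone_without_lfs_content <repository-url> <target-dir>
clone_without_lfs_content() {
  GIT_LFS_SKIP_SMUDGE=1 git clone "$1" "$2"
}

# Example (hypothetical URL and paths):
#   clone_without_lfs_content https://example.com/team/project.git project
#   cd project
#   git lfs pull --include="assets/**"   # fetch content only for matching paths
#   git lfs pull                         # or fetch everything for the current checkout
```

This can save time and bandwidth on build machines that only need the source code.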

Advanced Configuration Options

Git LFS offers additional configurations for optimizing storage and access.

Setting Up a Custom LFS Server

If your Git hosting service doesn’t support LFS or if you want to store files on a private server, you can configure a custom Git LFS server. Here’s an example configuration:

				
    git config lfs.url <custom-server-url>

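This command points your clone at the specified server for storing and fetching LFS files. Without --global, the setting is written to the repository's local .git/config, so it is not shared with collaborators.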
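
Because git config lfs.url only affects your own clone, teams often record the server in a .lfsconfig file at the repository root and commit it, so that every clone picks up the same endpoint. A minimal sketch, using a throwaway repository and a made-up server URL:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write lfs.url into .lfsconfig (a Git-config-format file that Git LFS reads).
git config -f .lfsconfig lfs.url "https://lfs.example.com/team/project"   # hypothetical URL

cat .lfsconfig
configured_url=$(git config -f .lfsconfig --get lfs.url)
echo "$configured_url"

# Commit .lfsconfig so collaborators share the setting:
#   git add .lfsconfig && git commit -m "Configure custom LFS server"
```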

Managing Storage Quotas

Git LFS offers commands to help you monitor storage usage:

				
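    git lfs ls-files --size

This command lists the files managed by LFS in the current checkout; the --size flag appends each file's size. Add --all to include LFS objects from the whole history, since older versions of a file also take up LFS storage.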

Example Workflows with Git LFS

To illustrate how Git LFS fits into your workflow, here are a few examples.

Example 1: Adding Multiple File Types to Git LFS

1. Track Multiple Extensions:

				
					git lfs track "*.jpg"
git lfs track "*.mov"

				
			

2. Add and Commit Changes:

				
					git add .gitattributes
git commit -m "Add image and video files to Git LFS tracking"

				
			

Example 2: Migrating an Existing Large File to Git LFS

If a large file already exists in your Git history and needs to be moved to Git LFS:

1. Track the file with Git LFS:

				
    git lfs track "large-dataset.csv"

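2. If the file is already committed, rewrite history so that older commits store pointers as well (advanced step). Git LFS provides git lfs migrate import for this purpose, which is generally easier and safer than git filter-branch. Rewriting history changes commit hashes, so coordinate with collaborators first.

3. Commit the .gitattributes file to finalize the tracking, if the rewrite has not already committed it.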
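
Git LFS ships its own migrate command, which can be used for the history rewrite in step 2 instead of git filter-branch. The sketch below shows how it might be applied to the large-dataset.csv example. Treat it as an outline rather than a tested recipe for your repository, and try it on a fresh clone or backup first. The helper name migrate_file_to_lfs is hypothetical.

```shell
# Rewrite history so that every commit stores an LFS pointer for the given path.
# Usage: migrate_file_to_lfs <path-pattern>
migrate_file_to_lfs() {
  # --include limits the rewrite to matching paths;
  # --everything covers all refs rather than just the current branch.
  git lfs migrate import --include="$1" --everything
}

# Example, run inside the repository (this rewrites commit hashes):
#   migrate_file_to_lfs "large-dataset.csv"
#   git lfs ls-files                 # confirm the file is now an LFS object
#   git push --force-with-lease      # publish the rewritten branch after agreeing with your team
```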

Best Practices for Using Git LFS

  1. Track Specific File Types Only: Avoid tracking unnecessary files. Specify only the file types that need to be in Git LFS.
  2. Check Storage Limits: Hosting services like GitHub impose limits on LFS storage. Ensure you are within your quota.
  3. Use .gitattributes Effectively: Manage your .gitattributes file to keep track of which files are stored in Git LFS.
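
When deciding which file types to track, it can help to see which files in the working tree are actually large. The helper below is a rough sketch: the name list_large_files, the 100 KiB threshold, and the demo files are all made up, and the k size suffix is assumed to be supported by your find (it is in the GNU and BSD versions).

```shell
# List files larger than a threshold (in KiB), skipping Git's own data.
# Usage: list_large_files <directory> <min-size-kib>
list_large_files() {
  find "$1" -type f -size +"$2"k ! -path '*/.git/*'
}

# Demo on a temporary directory with one 200 KiB file and one tiny file.
demo=$(mktemp -d)
head -c 204800 /dev/zero > "$demo/model.bin"
printf 'small\n' > "$demo/readme.txt"

large=$(list_large_files "$demo" 100)
echo "$large"   # prints only the path to model.bin
```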

Git LFS is an invaluable tool for managing large files in Git repositories, especially in projects involving media files, datasets, and binaries. By offloading large files to an external storage system, Git LFS keeps repositories lightweight, improves performance, and enhances team collaboration. Happy Coding!❤️
