How to upload and version your model files using GitHub LFS

In this tutorial I am going to show how you can upload model files larger than 100 MB to GitHub with versioning, using Git LFS. Even though the tutorial focuses on storing large model files, you can use the same technique to upload your audio, video, datasets and graphics using Git.

Prerequisites: A recent version of Git should be installed. You can download it from this link

Step 1:

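Set up Git LFS (this activity is only required once per user account). The git-lfs extension itself must already be installed on your machine for the command below to work.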

git lfs install
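
Before running the command above, it can help to confirm that the git-lfs extension is actually available. The following is a minimal sketch; the function name check_git_lfs is made up for illustration, and it only relies on the "git lfs version" subcommand.

```shell
# Hypothetical helper (not part of Git or Git LFS): report whether the
# git-lfs extension can be found on this machine.
check_git_lfs() {
  if git lfs version >/dev/null 2>&1; then
    # The extension responded, so "git lfs install" should work.
    echo "git-lfs found: $(git lfs version)"
  else
    # Either Git or the git-lfs extension is missing from PATH.
    echo "git-lfs not found: install the git-lfs package first, then run 'git lfs install'"
    return 1
  fi
}
```

Calling check_git_lfs prints one line and returns a non-zero status when the extension is missing, so it can also be used as a guard in a setup script.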

Step 2:

Tell Git LFS which files need to be tracked. In our case we need to track the PyTorch model file(s) with the extensions ".pt" and ".pth". You can either edit .gitattributes directly, as below.

# Model files
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text

So here I've added the "*.pt" and "*.pth" file extensions for the PyTorch model file types I want to be controlled by Git LFS. The options on each line tell Git that I want to filter, diff and merge using the LFS tool, and finally the -text argument tells Git that this is not a text file, which I think is a weird way of telling it that it's a binary file.

Or you can use the following git lfs command to do the same.

git lfs track "*.pt" "*.pth"

Make sure to add .gitattributes to the repository so that the tracking rules are committed along with your files.

git add .gitattributes
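
Whichever way the rules were written, one way to double-check them is to ask Git which filter it resolves for a given path. This is a small sketch built on the standard "git check-attr" command; the function name show_lfs_filter and the file names are only illustrative.

```shell
# Hypothetical helper: print the "filter" attribute Git resolves for each
# given path, based on the .gitattributes rules in the current repository.
# The paths do not need to exist yet, since only the patterns are consulted.
show_lfs_filter() {
  git check-attr filter -- "$@"
}
```

Inside the repository, running show_lfs_filter model.pt notes.txt should print "model.pt: filter: lfs" for the model file and "notes.txt: filter: unspecified" for a file that no LFS pattern matches.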

Please note that adding a file type to be tracked by Git LFS doesn't mean that pre-existing files of that type that are already committed will be tracked. Use git lfs migrate to move those files to LFS. This is strange, as we would naturally expect Git LFS to start tracking every file in the repo with the given extension. Hopefully this will be updated in the future.
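
As a rough sketch of what that migration could look like, the helper below wraps "git lfs migrate import" with the two extensions used in this tutorial. The function name migrate_models_to_lfs is hypothetical. It assumes git-lfs is installed and the working tree is clean. Which commits get rewritten depends on the ref options you pass, so it is worth reading the git lfs migrate documentation before running this on a shared repository.

```shell
# Hypothetical helper: convert already-committed *.pt / *.pth files into
# LFS pointers. Assumes git-lfs is installed and it is run inside the repo.
# This rewrites commits, so a branch that was already pushed will need a
# force push afterwards.
migrate_models_to_lfs() {
  # Extra arguments (for example --everything) are passed straight through.
  git lfs migrate import --include="*.pt,*.pth" "$@"
}
```

After calling migrate_models_to_lfs, running git lfs ls-files should list the migrated model files.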

Step 3:

Once you have added the file types to be tracked, you can continue adding, committing and pushing changes to Git as you normally would.

git add model.pt
git commit -m "Add model file"
git push origin main
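
To see what Git itself stores after such a commit, you can print the committed blob. For an LFS-tracked file this should be a small text pointer (a version line, an oid and a size) rather than the model bytes. The helper name show_stored_blob is made up for this sketch; it only uses the standard "git cat-file" command and assumes the path is given relative to the repository root.

```shell
# Hypothetical helper: print the content Git stores for a path in the
# latest commit. For a file handled by Git LFS this is the pointer text,
# not the large file itself.
show_stored_blob() {
  git cat-file -p "HEAD:$1"
}
```

For example, show_stored_blob model.pt should start with a line like "version https://git-lfs.github.com/spec/v1" if the file went through LFS. If it prints raw binary data instead, the file was committed as a regular Git object.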

You are all done. LFS lets you version large files with Git. Not just that: it makes syncing much faster while keeping the same Git workflow, since Git LFS only downloads the versions of large files that you actually check out. Cool!

But do keep in mind that if you upload so many large files to GitHub that it affects GitHub's performance, they will send you an email asking you to do some cleanup. So keep your repository as small as possible.
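
One simple habit that helps here is to look for oversized files before committing. The sketch below uses plain find. The function name find_large_files and its two arguments are assumptions for illustration, and the default threshold mirrors the 100 MB figure mentioned at the start of this tutorial.

```shell
# Hypothetical helper: list files under a directory that are larger than a
# threshold, skipping Git's own .git folder.
#   $1 = directory to scan (default: current directory)
#   $2 = threshold in KiB (default: 102400, i.e. 100 MiB)
find_large_files() {
  dir=${1:-.}
  min_kb=${2:-102400}
  find "$dir" -type f ! -path '*/.git/*' -size +"${min_kb}k"
}
```

Running find_large_files with no arguments from the repository root lists candidates that should either be tracked by LFS or left out of the repository.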