Using Git's Reflog to Recover Data

Git is a nearly ubiquitous tool for version control these days and it serves its purpose dutifully. As great as Git is, there are still sharp edges and chances for mistakes to happen. In this post, I'll be sharing information about a lesser-known feature in Git that can fix those mistakes and help with data recovery. Let's dive in!

Aaron Bos | Wednesday, November 29, 2023

Imagine a time when you've made changes in a Git repository (e.g. a hard reset, rebase, or force delete) that didn't go as planned, and data is lost. You immediately start to panic. The commits that you're looking for aren't there anymore. If you're like me, you don't have to imagine because it's actually happened 😅. Luckily, Git has some built-in functionality that provides a bit of a safety net in most cases like this.

To make this "imaginary" scenario a little more concrete, let's say that we've performed a hard reset on a branch and lost commits that we ultimately needed. Let's first take a look at the commits that we have in the main branch before the reset occurs.

$ git log --oneline
3e9e281 (HEAD -> main) Update vital
6b805a7 Add vital file
fc3d29b Add important stuff
50f5027 Initial commit

Then we make the mistake of hard resetting to commit fc3d29b.

$ git reset --hard fc3d29b

Our commit history now looks like this.

$ git log --oneline
fc3d29b (HEAD -> main) Add important stuff
50f5027 Initial commit

😱 At this point we realize that we needed to reset to commit 6b805a7 instead of fc3d29b. Unfortunately, that commit no longer shows up in the log, so we can't tell where it has gone. What next?

To get back to commit 6b805a7, we first need to run the git reflog command to find the commit that we're looking for. In short, this command shows a log of the changes that have been made to HEAD in the local repo.

$ git reflog
fc3d29b (HEAD -> main) HEAD@{0}: reset: moving to fc3d29b
3e9e281 HEAD@{1}: commit: Update vital
6b805a7 HEAD@{2}: commit: Add vital file
fc3d29b (HEAD -> main) HEAD@{3}: commit: Add important stuff
50f5027 HEAD@{4}: commit (initial): Initial commit

Based on the output of this command, we can see that the entry we're looking for is 6b805a7 HEAD@{2}: commit: Add vital file. We can now create a new branch pointing at that commit, which gets everything back into the state that we want.

$ git branch recovered-main 6b805a7
$ git checkout recovered-main
Switched to branch 'recovered-main'

Now if we look at the commit history for the recovered-main branch we can see that HEAD is pointing to the commit that we were looking to recover ✨.

$ git log --oneline
6b805a7 (HEAD -> recovered-main) Add vital file
fc3d29b (main) Add important stuff
50f5027 Initial commit
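If you'd like to try the whole scenario safely, here is a minimal sketch that replays it in a throwaway repository. The temporary directory, the "Example" identity, and the empty commits are my own placeholders, and the hashes will differ from the ones shown above.

```shell
#!/bin/sh
# Sketch: reproduce the accidental hard reset in a scratch repo, then recover.
set -e
repo=$(mktemp -d)                        # throwaway location (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main    # make sure the first branch is "main"
git config user.name "Example"           # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

# Four empty commits standing in for the history in this post
for msg in "Initial commit" "Add important stuff" "Add vital file" "Update vital"; do
  git commit -q --allow-empty -m "$msg"
done

lost=$(git rev-parse HEAD~1)             # "Add vital file", the commit we want back
git reset -q --hard HEAD~2               # the mistake: back to "Add important stuff"

# HEAD@{0} is the reset, HEAD@{1} is "Update vital", HEAD@{2} is "Add vital file"
git reflog
found=$(git rev-parse 'HEAD@{2}')

# Point a new branch at the recovered commit, as in the walkthrough
git branch recovered-main "$found"
git log --oneline recovered-main
```

Creating a branch leaves main untouched, which is the cautious option. If you are sure about the target, running git reset --hard against the recovered hash on main would move main itself instead.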

If you'd like to learn a bit more about Git's reflog command and related tidbits, grab a coffee ☕️ and read on.

References in Git

Git uses references as a way to provide easy-to-use names or “pointers” to specific commits, which are otherwise identified by their SHA-1 hashes. Any time a branch name is mentioned, it is actually a reference. If you’re curious about what these refs look like, you can see them by inspecting the contents of the .git/refs directory inside of a Git repository.
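As a rough illustration, the sketch below compares the contents of a branch's ref file with what Git resolves the branch name to. It assumes a fresh scratch repository using the default files-based ref storage, where refs are still "loose" files (in older or larger repositories they may have been packed into .git/packed-refs instead). The directory and identity are placeholders.

```shell
#!/bin/sh
# Sketch: a branch is just a small file holding a commit hash.
set -e
repo=$(mktemp -d)                        # throwaway location (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"           # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
git commit -q --allow-empty -m "Initial commit"

ref_file=$(cat .git/refs/heads/main)     # raw contents of the ref file
resolved=$(git rev-parse main)           # what Git resolves "main" to

echo "ref file:  $ref_file"
echo "rev-parse: $resolved"
git show-ref                             # every ref alongside the hash it points to
```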

Git also has the concept of “symbolic references”, which are references to other references 🤯. The most common example of a symbolic reference is HEAD, which normally refers to the branch that is currently checked out. If you’ve ever seen the detached HEAD message, that’s because HEAD is pointing directly at a commit instead of at a branch reference.
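To see the difference between an attached and a detached HEAD, here is a small sketch in a scratch repository (again, the directory and identity are placeholders of mine). It reads HEAD while a branch is checked out, then detaches it and reads it again.

```shell
#!/bin/sh
# Sketch: HEAD as a symbolic ref versus a detached HEAD.
set -e
repo=$(mktemp -d)                        # throwaway location (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"           # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
git commit -q --allow-empty -m "Initial commit"

cat .git/HEAD                            # ref: refs/heads/main
attached=$(git symbolic-ref HEAD)        # the ref that HEAD points to

git checkout -q --detach                 # HEAD now stores a commit hash directly
detached=$(cat .git/HEAD)

# git symbolic-ref -q exits non-zero when HEAD is not a symbolic ref
if git symbolic-ref -q HEAD >/dev/null; then
  state=attached
else
  state=detached
fi
echo "HEAD is $state at $detached"
```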

What is the Reflog?

Git's "reflog" is short for reference log. The purpose of reference logs is to record when the tips of branches and other references are updated in the local repository. Every time that HEAD changes (e.g. on a commit or when changing branches), the reflog is updated by Git. As I demonstrated at the beginning of this post, the reference logging behavior provides a convenient way to recover references. Reflog data can be found in the .git/logs directory.
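The sketch below pokes at the reflog from a few angles: the HEAD reflog, the per-branch reflog, and the raw file on disk. It assumes a scratch repository with the default files-based ref storage and reflogs enabled, which is the default for a normal (non-bare) repository. The directory and identity are placeholders.

```shell
#!/bin/sh
# Sketch: where reflog entries live and how to address them.
set -e
repo=$(mktemp -d)                        # throwaway location (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"           # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
git commit -q --allow-empty -m "Initial commit"
git commit -q --allow-empty -m "Add important stuff"

git reflog                               # log of changes to HEAD
git reflog show main                     # log of changes to the main branch tip
cat .git/logs/HEAD                       # raw on-disk form of the HEAD reflog

# HEAD@{1} means "where HEAD was one change ago"
previous=$(git rev-parse 'HEAD@{1}')
echo "HEAD previously pointed at $previous"
```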

What if I Deleted the Reflog?

In the unlikely event that the .git/logs directory is deleted from the repository, all hope isn't lost (although I'd recommend not deleting anything manually in the .git directory). Git has a built-in utility, git fsck, which can be used to verify the connectivity and validity of objects in the database. If the missing data that we are looking for isn't in the reflog, we can use the git fsck --full command to see if there are "dangling" objects in Git's internal database. A dangling object can be thought of as a commit or blob that still exists in the database but that nothing else points to: no branch, tag, or other object references it.
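Here is a minimal sketch of that last-resort recovery in a scratch repository. It force-deletes a branch, removes .git/logs to simulate a lost reflog, and then asks git fsck for dangling commits. The branch names, directory, and identity are placeholders, and the awk filter assumes fsck reports lines of the form "dangling commit <hash>". Note that while the reflog still exists, fsck treats reflog entries as reachable, so the commit would not be reported as dangling.

```shell
#!/bin/sh
# Sketch: recover a commit with git fsck after the reflog is gone.
set -e
repo=$(mktemp -d)                        # throwaway location (assumption)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"           # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
git commit -q --allow-empty -m "Initial commit"

# Put a commit on a side branch, then force-delete the branch
git checkout -q -b feature
git commit -q --allow-empty -m "Add vital file"
lost=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D feature

rm -rf .git/logs                         # simulate the reflog being deleted

# With no ref or reflog entry pointing at it, the commit is now dangling
dangling=$(git fsck --full 2>/dev/null | awk '/^dangling commit/ {print $3}')
echo "dangling commit: $dangling"

git branch recovered "$dangling"         # give the commit a name again
git log --oneline recovered
```

In a real repository fsck may list several dangling commits, so it's worth inspecting each candidate with git show before creating a branch from it.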

Git is a powerful tool that most developers are familiar with, but it can be easy to make mistakes. I've found that taking the time to learn a bit of the underlying functionality of Git makes navigating these tricky scenarios more manageable and less scary.

As always, thank you for taking the time to read this blog post!