What is the Git staging area

Unlike the other systems, Git has something called the “staging area” or “index”. This is an intermediate area where commits can be formatted and reviewed before completing the commit.

This allows you to stage only portions of a modified file. From https://git-scm.com/about/staging-area.

There are three different “repository states” in Git: the HEAD, the staging area, and the working tree.

The HEAD represents the commit that you have checked out, which is the latest saved snapshot of your project. The HEAD state is stored in the Git object store, and earlier positions of HEAD can be found using the command git reflog.

The working tree represents the current state of the project in the file system. Git does not record the working tree state anywhere; commands such as git status compute it on the fly by reading the file system and comparing it against the staging area.

The staging area stores the changes that you want included in the next commit. However, Git keeps no history of the staging area itself. If you modify the staging area with commands like git add, the previous staging area is overwritten, and no command brings it back directly.
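The three states can be observed side by side in a throwaway repository. The following is a minimal sketch, assuming Git is installed; the temporary directory, the file name notes.txt, and the demo identity are made up for illustration.

```shell
# Create a scratch repository (hypothetical names throughout).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'one\n' > notes.txt
git add notes.txt
git commit -q -m "initial"               # HEAD now holds "one"

printf 'one\ntwo\n' > notes.txt
git add notes.txt                        # staging area now holds "one", "two"

printf 'one\ntwo\nthree\n' > notes.txt   # working tree only

head_text=$(git show HEAD:notes.txt)     # content recorded in HEAD
index_text=$(git show :notes.txt)        # content recorded in the staging area
tree_text=$(cat notes.txt)               # content in the file system

git diff --cached --stat                 # HEAD vs. staging area
git diff --stat                          # staging area vs. working tree
```

Here git diff --cached compares HEAD with the staging area, while plain git diff compares the staging area with the working tree, so the same file shows up in both comparisons with different changes.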

Why the staging area is not ideal

Behind the scenes, the staging area is backed by a file named “index”, typically located in the “.git/” directory. This file records a snapshot of the project’s file state: for every tracked file, it lists the path, the file mode, some file system metadata, and the ID of the blob object that holds the file’s contents. The contents themselves are stored in the Git object store, not in the index file. The file state in the staging area may be the same as the HEAD state (the current commit) if you haven’t used the “git add” command yet. It could also be the same as the working tree if you have added all the changes to the project. It could also be different from both if you have added only some of the changes.
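What the index records for each file can be inspected with git ls-files --stage. This is a small sketch under the same assumptions as any scratch experiment: Git is installed, and the temporary directory and the file name a.txt are invented for the example.

```shell
# Scratch repository with a single staged file (hypothetical names).
repo=$(mktemp -d) && cd "$repo"
git init -q

printf 'hello\n' > a.txt
git add a.txt

# Each index entry is printed as: <mode> <object id> <stage> <path>
entry=$(git ls-files --stage)
printf '%s\n' "$entry"

# The object id in the entry names a blob in the object store.
blob_id=$(git rev-parse :a.txt)
blob_type=$(git cat-file -t "$blob_id")      # object type
blob_text=$(git cat-file -p "$blob_id")      # object contents
```

The entry contains an object ID rather than the file's text; git cat-file then reads the text back from the object store.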

The staging area creates the illusion that it is a reliable way to stage your changes, but it is actually very easy to lose your changes there. Suppose you are developing a non-trivial feature. After some time, you have successfully built the framework of the feature. You’re happy with your progress and stage the changes. Then, you have a brilliant idea, so you refactor many of the changes and stage them again. After refactoring, however, you realize that the idea isn’t brilliant after all and that the original approach is better. But now, the original changes are gone! Since the second “git add” overwrote the staging area, there is no straightforward way to revert the project to the original changes.

Git tracks the history of commits, but it does not track the history of the staging area. Modifications to files in the staging area can be wiped out by the next “git add.”
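The overwrite is easy to reproduce in a throwaway repository. The sketch below assumes Git is installed and uses invented names (feature.txt, the demo identity). It also shows a caveat: the blob written by the first “git add” is not erased immediately. It lingers as an unreachable object until garbage collection prunes it, so git fsck can still list it. Nothing names it, though, so digging it out is guesswork rather than a real undo.

```shell
# Scratch repository with one committed file (hypothetical names).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "base" > feature.txt
git add feature.txt
git commit -q -m "base"

echo "first approach" > feature.txt
git add feature.txt
first_id=$(git rev-parse :feature.txt)     # blob the index points to now

echo "refactored approach" > feature.txt
git add feature.txt                        # replaces the index entry
second_id=$(git rev-parse :feature.txt)
staged_text=$(git show :feature.txt)       # only the second version is staged

# No ref, reflog entry, or index entry mentions the first blob any more.
unreachable=$(git fsck --unreachable 2>/dev/null | grep blob)
recovered=$(git cat-file -p "$first_id")   # works only because we saved the id
```

In a real project the ID would not have been saved in a variable, and the fsck output would list every unreachable object without file names, which is why this does not amount to staging area history.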

Another problem caused by the staging area is the need to track new changes manually. Because of the staging area, any modification to a tracked file is in one of two states: “staged” or “unstaged,” and brand-new files start out “untracked.” Unstaged and untracked changes are the most vulnerable, because Git holds no copy of them. Staged changes, on the other hand, are at least saved by Git. A single wrong “rm” could destroy unstaged or untracked changes.
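Git's own status output shows which state each change is in. The following sketch assumes Git is installed; the three file names are chosen only to label the states.

```shell
# Scratch repository with two committed files (hypothetical names).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "v1" > staged.txt
echo "v1" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "base"

echo "v2" >> staged.txt
git add staged.txt            # modified and staged
echo "v2" >> unstaged.txt     # modified, not staged
echo "new" > untracked.txt    # never added to Git

# Two status columns: the first is the staging area, the second the working tree.
status=$(git status --porcelain)
printf '%s\n' "$status"
# M  staged.txt
#  M unstaged.txt
# ?? untracked.txt
```

Only the first line describes content that Git has stored; the other two exist solely in the file system.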

The rise of fast-paced software development and experimentation

Git is 20 years old. The industry has changed a lot in those years. Modern software is more complicated than ever, and writing code is no longer straightforward. Experimentation is often required to discover the optimal solution, not to mention that code generators based on predictive models are trial and error by nature. It is essential to record everything first and discard unwanted changes later. You never know when you’ll need the changes you made an hour ago. To avoid losing important changes while experimenting, some advocate starting from a clean repository and committing often.

Having to commit every few minutes is not a fun coding experience. People should not have to adapt to their tools.

What a source control system should do

A source control system should automatically save the current working tree state periodically and provide a way to revert to recent states.
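Plain Git can approximate this behavior by hand, which gives a feel for what an automatic version would do. The sketch below is only an illustration, not a recommended workflow: it assumes Git is installed, the ref name refs/snapshots/latest and the file name are invented, and git stash create captures tracked files only.

```shell
# Scratch repository with one committed file (hypothetical names).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "base" > feature.txt
git add feature.txt
git commit -q -m "base"

echo "experiment 1" > feature.txt

# Record the working tree as a commit object without touching
# the working tree, the staging area, or the stash list.
snap=$(git stash create "auto snapshot")

# Keep the snapshot reachable under a made-up ref name.
git update-ref refs/snapshots/latest "$snap"

echo "experiment 2" > feature.txt          # keep experimenting

# Read the earlier state back from the snapshot.
restored=$(git show refs/snapshots/latest:feature.txt)
```

A tool that ran the snapshot step automatically before every command, and kept a list of snapshots rather than a single ref, would provide the kind of safety net described above.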

In fact, source control systems like jj eliminate the need for a staging area by saving the working tree state each time a command is run.

You might ask, “What about the original purpose of the staging area? How can we commit unrelated changes in a file separately?” The answer lies in the ability to easily edit commit history. Both jj and Sapling have powerful built-in history editing commands. Sapling even provides a “select and drag and drop” style commit splitting UI. There is even a proposal to add a “git history” command that would offer better history editing than Git’s interactive rebase.

About

A young developer who loves Linux.