Here, I’m going to talk about git. Simply git. That’s all.
Git is a version control system. For text files, that means it keeps every version and can show the differences between one version and another.
Git is distributed, meaning that multiple people can work on the same file, sometimes even on the same feature, without stepping on each other’s toes.
Git is semi-immutable. A commit is a fixed, immutable point in time: a snapshot of the project that git presents as the difference between the previous version and the new one.
Now, let’s talk about what git is not.
Git is not a Changelog
Changelogs list the things that changed in your app. An approach that I love is that every PR should have at least one entry in the changelog, so we know which new feature or bugfix was implemented in that PR. It also makes it easy to use git blame on the changelog and see the merge/PR that fixed that issue.
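To make that concrete, here is a minimal sketch of the “one changelog line per PR” idea in a throwaway repository. The file name CHANGELOG.md, the branch fix-login and the commit messages are all made up for illustration, and the PR is simulated with a plain merge commit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of local defaults
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "# Changelog" > CHANGELOG.md
git add CHANGELOG.md
git commit -q -m "Start changelog"

# The "PR": a branch that carries one changelog entry alongside its fix.
git checkout -q -b fix-login
echo "- Fixed login timeout" >> CHANGELOG.md
git commit -q -am "Fix login timeout"

# Merge it the way a PR would be merged, with a merge commit.
git checkout -q main
git merge -q --no-ff -m "Merge PR: fix-login" fix-login
merge_commit=$(git rev-parse HEAD)

# Plain blame points at the commit that wrote the line.
git blame -s CHANGELOG.md
# With --first-parent, blame follows only the main line and lands on the merge.
git blame -s --first-parent CHANGELOG.md

# Same question, in a form that is easy to use from a script.
blamed=$(git blame --first-parent --porcelain -L 2,2 CHANGELOG.md | head -n 1 | cut -d ' ' -f 1)
git log -1 --format=%s "$blamed"
```

So a single line in a single file is enough to go from “what changed” to the merge that changed it.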
Using git to generate a changelog is weird: you complicate the whole git flow (standardizing commits to the point where you can be afraid of committing, or using some obscure feature like notes and other things) just for a task that, let’s be honest, is just adding a single line to a single file and committing it.
Git is not linear
Multiple people can work on the same codebase at the same time, and they will commit at different times. Recently, there’s been a trend of trying to make the commit history “linear”. I have no idea where that came from, but it has always caused me trouble.
Every commit in git is immutable, so when you do a pull with rebase, you’re not “moving these commits to the front of the branch”; you are essentially “removing these commits and replacing them with something equivalent, considering what’s on the branch’s HEAD”. In other words, you’re rewriting history.
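You can see the “replacing” part for yourself. The sketch below uses a throwaway repository with made-up file and branch names, and compares the commit hash of the same change before and after a rebase.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of local defaults
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

# One commit on a feature branch...
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"
before=$(git rev-parse HEAD)

# ...while main moves on in the meantime.
git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "Other work on main"

# Rebase the feature branch on top of the new main.
git checkout -q feature
git rebase -q main
after=$(git rev-parse HEAD)

echo "before rebase: $before"
echo "after rebase:  $after"
```

The message and the change are the same, but the hash is different: it’s a new commit with a new parent, and the branch no longer points at the one you originally tested.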
This doesn’t seem like a big deal at first, but consider this: when you make a code change, you’re reading a specific piece of code and changing it to do something else. Presumably you tested these changes, saw that they worked, and so on. When you rebase, you’re essentially making the same change on a different codebase, and now maybe your code doesn’t work anymore. This has some implications: if someone wants to know why a specific line was added, that person will most likely use git blame, and then maybe check out that commit. In this scenario, that person will land on a codebase that doesn’t work, whereas without a rebase the codebase would work; it would only be outdated. That means we can know that merging feature X with code Y broke things, which is a valuable lesson to learn.
The second thing is the timeline. Without a rebase, we can know when a piece of code was included. With a rebase, we’re rewriting history: the commit now comes after other work, even though it was written before.
Finally, git has multiple strategies to merge different branches, sometimes even allowing us to merge work from more than two branches. So, it was designed to do merge commits.
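Merging more than two branches at once is what git calls an octopus merge. Here’s a small sketch in a throwaway repository; the topic-a, topic-b and topic-c branches are hypothetical and touch different files so nothing conflicts.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of local defaults
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

# Three independent topic branches, each adding its own file.
for topic in a b c; do
  git checkout -q -b "topic-$topic" main
  echo "$topic" > "$topic.txt"
  git add "$topic.txt"
  git commit -q -m "Add $topic"
done

# main also moves on, so the merge has real work to combine.
git checkout -q main
echo more > main.txt
git add main.txt
git commit -q -m "Work on main"

# One merge commit that joins main with all three topics.
git merge -q -m "Merge topics a, b and c" topic-a topic-b topic-c
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "parents of the merge commit: $parents"
```

A single commit with four parents is about as far from “linear” as it gets, and git handles it without complaining.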
Git shows historical facts about code, and history is not pretty
When you look at a piece of code and don’t know why it’s there, one way to find out is to use git blame and look at that commit. Squashes and rebases destroy that piece of information, replacing it with “added feature X”.
Some people argue that commits like “Fixed X”, then “Really fixed X”, and finally “I give up, using a hack” are bad, and I agree to a point. But the issue is, if I call git blame and see a commit “I give up, using a hack”, I know the reason for that code to exist: someone tried different things, none worked, and now there’s a hack to see if it works. So I know that code is probably problematic and can easily be removed and replaced if I find a better way to do it.
History is not pretty, deal with it. It would be wonderful if we could only put code in production after having all the time to test every feature, check performance, refactor, and test it again in all scenarios. In the real world, that’s not the case, and if you are new to programming and are reading this post, get used to it. Write your code knowing that it’ll be horrible tomorrow, so write it in a way that makes it easy to replace or even delete. That’s simply how things are…
Also, if you rebase/squash commits to avoid these “work in progress” commits, you risk losing code (together with valuable information). An example is when two people are contributing to a feature. The first person adds a function that’s not yet complete. Then the second person finds that function useless and comments it out; in a later commit, removes it; and, to avoid these “intermediate steps”, squashes everything. When the first person “pulls” the code, that person can’t “merge”; it needs to be a “pull with rebase”, and the function is gone.
Not “committed out of existence”, like some commit that can be reverted: it’s literally as if it never existed in the first place. If there’s one thing that version control is supposed to do, it’s to avoid losing code when two or more people are working on the same feature. So why?
Git is not “git flow”
There are good names, bad names, horrible names, and then there’s “git flow” and “conventional commits”. There’s nothing “git” about git flow, except that it uses git, and there’s nothing “conventional” about conventional commits. Those are just names for sets of rules that some companies use; they’re far from common ground, especially git flow.
There are people that use “trunk-based development”: somebody pulls from the default branch (master, main, principal, whatever the name is), makes a feature, commits it directly to that branch, and that is a deploy. If that doesn’t work, the person reverts it, and that’s a deploy too.
There are people that use various versions of some workflow involving branches. I once worked at a place where we kept our master (default) branch always deployable, and on every deploy we generated a tag with the version number. We had essentially two flows: the normal workflow, where short-lived (1 or 2 days tops) branches were created and merged back into master, and “hotfixes”, where we would check out the latest tag, create a branch from there, commit things to fix a bug, generate a new tag from the hotfix branch, deploy that new tag, and then make a pull request to master to fix that issue on newer versions.
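The hotfix flow described above can be sketched roughly like this. It’s only an approximation of that workflow under my own assumptions: the tag names v1.0.0 and v1.0.1, the hotfix-1.0.1 branch and the files are invented, the default branch is called main here, and the final pull request is simulated with a local merge.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of local defaults
git config user.name "Example Dev"
git config user.email "dev@example.com"

# A deploy: the default branch is deployable and gets a version tag.
echo "version 1" > app.txt
git add app.txt
git commit -q -m "Release 1.0.0"
git tag v1.0.0

# Normal work keeps landing on the default branch.
echo "half-done feature" > feature.txt
git add feature.txt
git commit -q -m "Start next feature"

# Hotfix: branch from the latest tag, not from the tip of main.
latest_tag=$(git describe --tags --abbrev=0)
git checkout -q -b hotfix-1.0.1 "$latest_tag"
echo "version 1, bug fixed" > app.txt
git commit -q -am "Fix bug found in production"
git tag v1.0.1   # this tag is what gets deployed

# Then bring the fix back so newer versions have it too.
git checkout -q main
git merge -q --no-ff -m "Merge hotfix 1.0.1" hotfix-1.0.1
git --no-pager log --oneline --graph
```

The deployed tag v1.0.1 contains the fix but not the half-done feature, and main ends up with both.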
Finally, there are people that use “feature branches”: a branch is created from the default branch, commits are added to that feature branch, people pull from the default branch regularly so it’s always up to date with the latest code, and then everything gets integrated back into the default branch. This work can be brittle, which is why people keep pulling from the default branch: to be sure that the feature branch works with the latest code. Feature branches can be long-lived or short-lived, so there are no rules. (Usually people are against feature branches because they tend to be long-lived; there’s nothing inherently wrong with that, as long as changes are always merged.) Usually, the work gets merged back via a “Pull Request”, but let’s also be clear:
Git is not “pull requests”
Pull requests, as most people know them, are a GitHub invention that others copied. They’re useful, they help, but there’s nothing in git itself about them. Offering squash or rebase as the way to merge a pull request is also a GitHub invention (that, again, people copied). GitHub’s new “merge queue” feature is also their invention (and I actually find that it defeats the whole purpose of git, but anyway). Branch protections, merge checks, discussions and requirements: none of these exist in git. So when people say that “it’s how it’s supposed to work”, know that it is not. There are teams that don’t use pull requests at all; as an example, the Linux kernel accepts patches only via e-mail (which works because git can turn commits into patch files).
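For the curious, this is roughly what a patch-based flow looks like with plain git. It’s a local sketch, not the kernel’s actual process: the two repositories (upstream and contributor), the names and the README are all invented, and the “e-mail” step is replaced by handing over a file.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# The maintainer's repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of local defaults
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "Helo world" > README
git add README
git commit -q -m "Add README"
cd ..

# A contributor clones it, commits a fix, and exports it as a patch file.
git clone -q upstream contributor
cd contributor
git config user.name "Contributor"
git config user.email "contributor@example.com"
echo "Hello world" > README
git commit -q -am "Fix typo in README"
git format-patch -q -1 -o "$work/patches"
cd ..

# The maintainer applies the patch: no pull request, no shared server.
cd upstream
git am -q "$work"/patches/*.patch
git --no-pager log --format='%an: %s'
```

The commit shows up in the maintainer’s history with the contributor as author, and not a single hosting feature was involved.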
Conclusion
I honestly don’t know where the whole “rebase” and “linear, clean story” cult started. I know that, in my experience, it leads to losing code, lots of weird conflicts and unpredictable stuff, and it basically breaks the idea of multiple people working on the same thing. It also makes git blame and git bisect useless, two insanely useful features that I see fewer and fewer people using.
I really wish for it to stop. People that don’t like seeing the intermediate commits can always use git log --first-parent. It works, it’ll basically only show pull request merges if you use GitHub, and it keeps history. Your git blame and git bisect will thank you later.
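Here’s a small sketch of that in a throwaway repository, reusing the ugly commit messages from earlier. The file x.txt and the branch fix-x are made up, and the pull request is simulated with a merge commit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" regardless of local defaults
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "original" > x.txt
git add x.txt
git commit -q -m "Base"

# A branch with the kind of history people want to hide.
git checkout -q -b fix-x
echo "first attempt" > x.txt
git commit -q -am "Fixed X"
echo "second attempt" > x.txt
git commit -q -am "Really fixed X"
echo "workaround" > x.txt
git commit -q -am "I give up, using a hack"

# Merged with a merge commit: no squash, no rebase.
git checkout -q main
git merge -q --no-ff -m "Merge PR: fix X" fix-x

# The tidy view: only what landed on main.
git --no-pager log --oneline --first-parent

all_commits=$(git rev-list --count HEAD)
first_parent=$(git rev-list --count --first-parent HEAD)
echo "all commits: $all_commits, first-parent only: $first_parent"

# The full story is still there when you need it.
blamed=$(git blame --porcelain -L 1,1 x.txt | head -n 1 | cut -d ' ' -f 1)
blamed_subject=$(git log -1 --format=%s "$blamed")
echo "x.txt line 1 comes from: $blamed_subject"
```

The log looks clean, and blame still tells you that the line exists because somebody gave up and used a hack.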