How to get the parent branch in git
TL;DR
If you are wondering why git does not provide such a command out of the box, the TL;DR is that in git, branches and commits do not have a parent branch.
Why do we want to find a parent branch?
If your workflow with git uses feature branches, you might often create a branch from another branch, make some changes, and then want to merge it back.
This workflow creates at least two use cases where it would be useful to have a parent branch.
Better merge request UI
When developing, a branch is often created from another branch, some changes are made, and then one wants to merge the branch back into the one it came from.
GitLab (and other hosting services) offers a web page for creating merge requests.
By default, it proposes to merge the branch into another one (typically the default branch), which is probably not the one we are interested in.
If GitLab could recognize where a branch came from, it could provide a better default value.
The same holds for the command line and other tools.
Cache
If one branch is created from another one, chances are that the diff between them is small. If some caching mechanism (for example ccache) is used during the build process, then it normally makes sense to use the same cache folder for the builds of both branches.
It is also possible to use a single global cache for all branches. If there are multiple main branches that differ greatly, this is generally less efficient, as the ratio of cache hits to cache misses will be much lower.
This can be mitigated by using a larger cache, but that might reduce the average performance, as more objects need to be inspected.
In the case of concurrent builds, separate cache folders can also improve performance, as less synchronization between processes is required.
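A minimal sketch of the per-branch cache idea, assuming ccache is the caching tool: ccache reads its cache location from the CCACHE_DIR environment variable, so a build script can point it at a folder named after the main branch. The helper cache_dir_for and the directory layout are hypothetical, and the main branch name is hard-coded here instead of being detected.

```shell
#!/bin/sh
# Hypothetical helper: one ccache directory per main branch.
# ccache reads its cache location from CCACHE_DIR; the layout is only an example.
cache_dir_for() {
    # $1: name of the main branch, $2: optional root directory
    root=${2:-"$HOME/.cache/ccache-per-branch"}
    # '/' is common in branch names (release/1.0) and would create subdirectories
    printf '%s/%s\n' "$root" "$(printf '%s' "$1" | tr '/' '_')"
}

# Assumption: the main branch has already been determined, for example with
# one of the scripts further down in this article.
main_branch=main2
CCACHE_DIR=$(cache_dir_for "$main_branch")
export CCACHE_DIR
```

All development branches that map to main2 then share one cache, while builds for other main branches neither pollute it nor compete for it.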
Why is it not possible to define a parent?
Most repositories have a semi-linear history or a history that resembles a tree.
A set of main branches, and minor branches that are created from those main branches and merged back (or not):
# a representation of a git repository with a linear history
A--B--C--D--E--F--G branch1
# a representation of a git repository with a tree-like history
              K--L--M--N branch2
             /
A---B---C---D---E--F--G branch1
         \
          H--I--J branch3
# a representation of a git repository with a branch
# and other branches forked from it (and eventually merged back)
          K--L--M--N--O
         /             \
A---B---C---D---E---F---G branch1
     \
      H--I--J branch2
But a git repository can have a more complex structure.
You might have octopus merges:
# a representation of a git repository with an octopus merge
    G--H--I--J--K
   /             \
  /   L--M--N--O--P--Q
 /   /           /
A---B---C---D---E---F
A project can also have multiple root commits; a common case is when the histories of two projects were merged:
# a representation of a git repository with multiple root nodes
G--H--I--J--K
             \
              L--M--N--O--P--Q
             /
A---B---C---D---E---F
Last but not least, multiple branches can point to the same commit:
# a representation of a git repository with
# two branches at the same position
A---B---C---D---E---F branch1
         \
          G---H branch2, branch3
And all of these can be combined.
Reduce the scope to repositories that look like a tree
Even in this context, the question "Where does this branch come from?" does not fully make sense.
Consider the following graph:
---x---x--- branch1
    \   \
     \   z---x--- branch2
      \       \
       \       z--- branch4
        \
         z--- branch3
It seems obvious to say that branch4 comes from branch2.
It is less obvious if the git history is rendered slightly differently:
      z--- branch3
     /
 ---x
     \     z--- branch1
      \   /
       ---x
           \      z--- branch2
            \    /
             z---x
                  \
                   z--- branch4
Is the parent of branch4 the branch branch2, or is it the other way round? Or is branch1 the parent (or is it a child)?
Both interpretations of the graph are valid.
For this reason, git branches do not have parents; they have common ancestors (the x commits in the graphs).
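To see that git itself records no direction between branches, the following sketch (assuming a POSIX shell, git, and mktemp) creates two branches, left and right, from the same commit in a throwaway repository. git merge-base returns the same common ancestor regardless of argument order, and the commit object stores tree, parent, author, committer, and message, but no branch name. The branch names are made up for the example.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty -m x            # the common ancestor
git branch left
git branch right
git checkout -q left  && git commit -q --allow-empty -m left-1
git checkout -q right && git commit -q --allow-empty -m right-1

# the common ancestor is the same whichever branch is named first
ab=$(git merge-base left right)
ba=$(git merge-base right left)
x=$(git rev-parse left~1)
echo "merge-base left right: $ab"
echo "merge-base right left: $ba"
# the raw commit object: tree, parent, author, committer, message; no branch name
git cat-file -p right
```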
A repository with a manually defined set of parent branches
Instead of "parent", from now on I will write about "main branches".
For the use cases I have in mind, I do not want to know when a feature branch has been created from another feature branch.
I want to know in which branch they are probably going to be merged back, and to which branch they are "near".
In some workflows, this will be a relatively small set of branches to which no one commits directly; they only receive merges.
These are the requirements:
- a manually chosen set of branches is designated as main branches
- main branches are generally never deleted
- other branches are labeled as "development branches" (feature branches, hotfix branches, test branches, and so on)
- the nearest common ancestor determines the main branch corresponding to a development branch
- main branches form a tree-like structure (main branches are not merged into other branches, but development branches are forked from main branches and merged back)
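The first and third requirements can be sketched in POSIX sh as a hand-maintained list plus a membership test. The variable MAIN_BRANCHES and the function is_main_branch are hypothetical names, and the list assumes branch names without spaces.

```shell
#!/bin/sh
# Manually maintained list of main branches (example names).
MAIN_BRANCHES="master main1 main2"

# Succeeds if $1 is a main branch; every other branch is a development branch.
is_main_branch() {
    # $MAIN_BRANCHES is left unquoted on purpose, so that it splits on spaces
    for candidate in $MAIN_BRANCHES; do
        [ "$candidate" = "$1" ] && return 0
    done
    return 1
}

for branch in main2 feature1; do
    if is_main_branch "$branch"; then
        echo "$branch: main branch"
    else
        echo "$branch: development branch"
    fi
done
```

Exact string comparison matters here: a substring check would wrongly treat a branch named main as a main branch because main1 contains it.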
Why should main branches generally never be deleted?
Consider the following scenario.
Someone creates branch3. From branch3, someone creates branch1. From branch1, someone creates branch2. From branch2, someone creates branch4.
New commits are then added on all branches.
The graph looks like this:
      z--- branch3
     /
 ---x
     \     z--- branch1
      \   /
       ---x
           \      z--- branch2
            \    /
             z---x
                  \
                   z--- branch4
From the given description, the "parent" of branch4 is branch2.
But then branch2 disappears. It might simply get deleted:
      z--- branch3
     /
 ---x
     \     z--- branch1
      \   /
       ---x
           \
            \
             z---z
                  \
                   z--- branch4
Or it might have been merged into another branch (in the following graph, into branch1):
      z--- branch3
     /
 ---x
     \     z-------------x branch1
      \   /             /
       ---x            /
           \      z---z
            \    /
             z---x
                  \
                   z--- branch4
Thus, from one day to the next, the answer to "Which branch is the parent of branch4?" changed, without any change to branch4.
It was branch2; now it is branch1. And if branch1 and branch3 are deleted too, then branch4 has no parent at all.
And no commit on branch4 was touched or altered in any way: no git rebase or other fancy operations.
Granted, this might not be a dealbreaker and might cause no harm, but generally, I do not want the answer to change unless I am changing branch4 (for example through a git rebase).
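A sketch of the deletion scenario in a throwaway repository, reduced to the two relevant branches and assuming git 2.7 or later (for git for-each-ref --contains): before branch2 is deleted, both branch2 and branch4 contain the commit where branch4 was forked; afterwards only branch4 does, although the commit branch4 points to is unchanged.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty -m base
git checkout -q -b branch2 && git commit -q --allow-empty -m branch2-1
git checkout -q -b branch4 && git commit -q --allow-empty -m branch4-1
git checkout -q branch2    && git commit -q --allow-empty -m branch2-2

# local branches whose history contains commit $1, on a single line
containing() {
    git for-each-ref --contains="$1" --format='%(refname:short)' refs/heads/ | tr '\n' ' '
}
fork=$(git merge-base branch2 branch4)
tip_before=$(git rev-parse branch4)
before=$(containing "$fork")

git checkout -q branch4          # the checked-out branch cannot be deleted
git branch -q -D branch2
after=$(containing "$fork")
tip_after=$(git rev-parse branch4)

echo "before: $before"
echo "after:  $after"
```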
Why do I want to manually identify "main branches"?
Consider the following graph:
      z--- master
     /
 ---x
     \     z--- branch1
      \   /
       ---x
           \      z--- feat2
            \    /
             z---x
                  \
                   z--- feat3
When working with feature branches, we are often interested in the parent branch because we want to know where to merge it back.
In this case, we do not care whether feat2 was created from feat3, the other way around, or both from a feat1 branch that does not exist anymore.
If master and branch1 are the only branches that can be identified as main branches, and branches are only created from main branches or from branches created from main branches, then feat2 and feat3 necessarily have master or branch1 as their corresponding main branch (and we want the answer to be branch1).
Note 📝: One does not necessarily need to list the main branches manually. Filtering the branches by a naming convention is a viable approach.
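A sketch of the naming-convention approach, assuming (as an example convention, not a rule from this article) that master and every branch under release/ are main branches. git for-each-ref matches a pattern without wildcards either against the full ref name or as a prefix up to a "/", so refs/heads/release selects release/1.0 but not releases-old. The function main_branches is a hypothetical name.

```shell
#!/bin/sh
# List the main branches according to the (assumed) naming convention.
main_branches() {
    git for-each-ref --format='%(refname:short)' refs/heads/master refs/heads/release
}

# Throwaway repository with example branches.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # independent of init.defaultBranch
git commit -q --allow-empty -m base
git branch release/1.0
git branch release/2.0
git branch feature/login
git branch releases-old                   # "release" is not followed by "/"

main_branches
```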
Find common ancestors
To experiment, let us create a dummy repository:
# git switch needs git 2.23 or later, --initial-branch needs git 2.28 or later
git init --initial-branch=master;
git commit --allow-empty -m master-1;
git commit --allow-empty -m master-2;
git switch -c main1;
git commit --allow-empty -m main1-1;
git commit --allow-empty -m main1-2;
git switch master;
git commit --allow-empty -m master-3;
git switch -c main2;
git commit --allow-empty -m main2-1;
git commit --allow-empty -m main2-2;
git switch -c feature1;
git commit --allow-empty -m feature1-1;
git commit --allow-empty -m feature1-2;
git switch main2;
git commit --allow-empty -m main2-3;
git switch master;
git commit --allow-empty -m master-4;
tig master feature1 main1 main2; # or: git log --graph --oneline master feature1 main1 main2
# creates a graph that should look like
# o [master] master-4
# │ o [feature1] feature1-2
# │ o feature1-1
# │ │ o [main2] main2-3
# │ o─┘ main2-2
# │ o main2-1
# o─┘ master-3
# │ o [main1] main1-2
# │ o main1-1
# o─┘ master-2
# I master-1
The branches master, main1, and main2 are the main branches.
There might be many other branches, merges, and strange constructs, but we are not interested in them, except for feature1.
At this point, we want to find which of the main branches is "nearest" to feature1. The desired answer is main2.
The first step is to find all relevant common ancestors:
mbmain1=$(git merge-base feature1 main1);
mbmain2=$(git merge-base feature1 main2);
mbmaster=$(git merge-base feature1 master);
Given the requirements, these common ancestors lie on a single line of history and can be sorted topologically.
We are interested in the latest (newest) of them:
nearest_ci=$(git rev-list --topo-order --max-count=1 "$mbmain1" "$mbmain2" "$mbmaster")
Finally, we need to check which common ancestor it is:
if [ "$nearest_ci" = "$mbmain1" ]; then :; echo "branched from main1";
elif [ "$nearest_ci" = "$mbmain2" ]; then :; echo "branched from main2";
elif [ "$nearest_ci" = "$mbmaster" ]; then :; echo "branched from master";
fi
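To try the three steps without tig, here is a sketch that rebuilds the dummy repository in a temporary directory and runs them. It uses git checkout instead of git switch so that it also works with git versions older than 2.23, forces the initial branch name with git symbolic-ref so that init.defaultBranch does not matter, and assumes mktemp -d is available. The helper commit_empty is a hypothetical shorthand.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
commit_empty() { git commit -q --allow-empty -m "$1"; }
commit_empty master-1; commit_empty master-2
git checkout -q -b main1;    commit_empty main1-1;    commit_empty main1-2
git checkout -q master;      commit_empty master-3
git checkout -q -b main2;    commit_empty main2-1;    commit_empty main2-2
git checkout -q -b feature1; commit_empty feature1-1; commit_empty feature1-2
git checkout -q main2;       commit_empty main2-3
git checkout -q master;      commit_empty master-4

# step 1: the common ancestor of feature1 with each main branch
mbmain1=$(git merge-base feature1 main1)
mbmain2=$(git merge-base feature1 main2)
mbmaster=$(git merge-base feature1 master)
# step 2: the newest of them (they lie on a single line of history)
nearest_ci=$(git rev-list --topo-order --max-count=1 "$mbmain1" "$mbmain2" "$mbmaster")
# step 3: map the commit back to a branch name; the first match wins
result=""
if   [ "$nearest_ci" = "$mbmain1" ];  then result=main1
elif [ "$nearest_ci" = "$mbmain2" ];  then result=main2
elif [ "$nearest_ci" = "$mbmaster" ]; then result=master
fi
echo "branched from $result"
```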
A development branch could have multiple main branches that are equally near, for example if multiple main branches point to the same commit. Thus the order of the comparisons matters if we prefer one branch over another.
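A small sketch of such a tie, with two hypothetical main branches mainA and mainB that point to the same commit: both merge bases are identical, so only the order of the comparisons decides which branch gets reported.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty -m base
git branch mainA
git branch mainB                  # points to the same commit as mainA
git checkout -q -b feature mainA
git commit -q --allow-empty -m feature-1

mbA=$(git merge-base feature mainA)
mbB=$(git merge-base feature mainB)
nearest_ci=$(git rev-list --topo-order --max-count=1 "$mbA" "$mbB")
# both comparisons would match; the order of the if/elif decides the answer
if   [ "$nearest_ci" = "$mbA" ]; then result=mainA
elif [ "$nearest_ci" = "$mbB" ]; then result=mainB
fi
echo "branched from $result"
```

Swapping the two comparisons would report mainB for exactly the same repository.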
The whole process can be written as:
#!/usr/bin/env bash
dev_branch=feature1;
main_branches=(main1 main2 master); # ordering is relevant: on ties, the last branch in the list wins
declare -A map_merge_base;
for mbranch in "${main_branches[@]}"; do :;
map_merge_base[$(git merge-base "$dev_branch" "$mbranch")]="$mbranch";
done
nearest_ci=$(git rev-list --topo-order --max-count=1 "${!map_merge_base[@]}")
printf "%s is the corresponding main branch\n" "${map_merge_base[$nearest_ci]}";
Associative arrays are not really new (they were introduced in bash 4, released in 2009), but the syntax is not compatible with zsh (the default shell on macOS) and possibly other shells; moreover, the bash shipped with macOS is version 3.2, which predates them.
The same commands would also work in zsh if one replaces ${!map_merge_base[@]} with ${(k)map_merge_base[@]}, in more or less all zsh versions, as zsh has had associative arrays since 1998.
An alternative that uses an indexed (non-associative) array, and thus works in both bash and zsh, would be:
#!/usr/bin/env bash
dev_branch=feature1;
main_branches=(main1 main2 master); # ordering is relevant: on ties, the first branch in the list wins
map_merge_base=()
for mbranch in "${main_branches[@]}"; do :;
map_merge_base+=("$(git merge-base "$dev_branch" "$mbranch"):$mbranch")
done
nearest_ci=$(git rev-list --topo-order --max-count=1 "${map_merge_base[@]%%:*}")
for v in "${map_merge_base[@]}"; do :;
if [ "$nearest_ci" = "${v%%:*}" ]; then :;
printf "%s is the corresponding main branch\n" "${v#*:}";
break
fi
done;
However, since the script begins with #!/usr/bin/env bash, it will not fall back to zsh (or another shell) when bash is not available.
Another alternative is to write the script in POSIX sh, and thus without arrays, by reusing a snippet for iterating over the words of a string:
#!/bin/sh
dev_branch=feature1;
main_branches="main1 main2 master"; # ordering is relevant
map_merge_base="";
commits="";
SEP=" ";
STRING="$main_branches$SEP";
while [ "$STRING" != "${STRING#*"${SEP}"}" ] && { [ -n "${STRING%%"${SEP}"*}" ] || [ -n "${STRING#*"${SEP}"}" ] ; }; do
VALUE="${STRING%%"${SEP}"*}";
STRING="${STRING#*"${SEP}"}";
mrbase=$(git merge-base "$dev_branch" "$VALUE")
map_merge_base="$map_merge_base $mrbase:$VALUE";
commits="$commits $mrbase";
done;
# NOTE: not quoting $commits by design
nearest_ci=$(git rev-list --topo-order --max-count=1 $commits)
SEP=" ";
STRING="$map_merge_base$SEP";
while [ "$STRING" != "${STRING#*"${SEP}"}" ] && { [ -n "${STRING%%"${SEP}"*}" ] || [ -n "${STRING#*"${SEP}"}" ] ; }; do
VALUE="${STRING%%"${SEP}"*}";
STRING="${STRING#*"${SEP}"}";
if [ "$nearest_ci" = "${VALUE%%:*}" ]; then :;
printf "%s is the corresponding main branch\n" "${VALUE#*:}";
break
fi
done;
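A variation on the POSIX sh script, not taken from the scripts above: it uses the positional parameters, the one list POSIX sh does provide, as the list of main branches, and it skips main branches that share no history with the development branch (the multiple-root case from earlier), for which git merge-base prints nothing and exits with a non-zero status. The function name nearest_main_branch and the small demo repository are hypothetical; as in the if/elif version, the first matching main branch wins on ties.

```shell
#!/bin/sh
# Hypothetical helper: print the main branch nearest to a development branch.
# Usage: nearest_main_branch DEV_BRANCH MAIN_BRANCH...
# On ties, the first MAIN_BRANCH argument wins. POSIX sh has no "local",
# so the variables below are global.
nearest_main_branch() {
    dev=$1
    shift
    bases=""
    for main in "$@"; do
        # no common ancestor (unrelated history): merge-base fails, skip it
        base=$(git merge-base "$dev" "$main") || continue
        bases="$bases $base"
    done
    [ -n "$bases" ] || return 1
    # $bases is left unquoted on purpose: one argument per commit
    nearest=$(git rev-list --topo-order --max-count=1 $bases) || return 1
    for main in "$@"; do
        if [ "$(git merge-base "$dev" "$main")" = "$nearest" ]; then
            printf '%s\n' "$main"
            return 0
        fi
    done
    return 1
}

# Demo in a throwaway repository: main2 forked from master, feature1 forked
# from main2, main2copy on the same commit as main2, and an unrelated "other".
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m master-1
git checkout -q -b main2 && git commit -q --allow-empty -m main2-1
git checkout -q -b feature1 && git commit -q --allow-empty -m feature1-1
git checkout -q master && git commit -q --allow-empty -m master-2
git branch main2copy main2
git checkout -q --orphan other && git commit -q --allow-empty -m other-1
git checkout -q master

nearest_main_branch feature1 other master main2
```

Recomputing the merge bases in the second loop avoids any string parsing, at the cost of a few extra git merge-base calls.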
Otherwise, just use a more structured programming language.
The main advantage of the last script is that it should work out of the box on Windows (as Git for Windows is packaged with bash), on most GNU/Linux systems, and on macOS (which provides a POSIX /bin/sh alongside zsh, its default shell).
Do you want to share your opinion? Did you find an error, or are some parts not clear enough?
You can contact me anytime.