Git Logo by Jason Long is licensed under the CC BY 3.0 License

How to get the parent branch in git

Categories: shell version control systems
Keywords: bash git scripting sh shell version control systems

TL;DR

If you are wondering why git does not provide such a command out of the box, the TL;DR is that, in git, branches and commits do not have a parent branch.

Why do we want to find a parent branch?

If your git workflow uses feature branches, you will often create a branch from another branch, make some changes, and then want to merge it back.

This workflow creates at least two use cases where it would be useful to know the parent branch.

Better merge request UI

As described above, a feature branch is usually merged back into the branch it was created from.

GitLab (like other hosting services) offers a web page for creating merge requests.

By default, it proposes a target branch that is probably not the one we are interested in.

If GitLab could recognize where a branch came from, it could provide a better default value.

The same holds for the command line and other tools.

Cache

If one branch is created from another one, chances are that the diff is not big. If some caching mechanism is used (for example ccache) during the build process, then it normally makes sense to use the same cache folders for different builds.

It is also possible to use a global cache for all branches. If there are multiple main branches with great differences, it will generally be less efficient, as the ratio of cache hits to cache misses will be much lower.

This can be avoided by using a larger cache, but it might reduce the average performance, as more objects need to be inspected.

In the case of concurrent builds, different cache folders can also improve performance, as fewer synchronizations between processes are required.
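As a rough illustration of the per-branch cache idea, the following sketch derives a cache folder from the name of a main branch. The function name cache_dir_for_branch, the example branch name, and the folder layout are assumptions made for this example; CCACHE_DIR is the environment variable that ccache reads to locate its cache.

```shell
#!/bin/sh
# Sketch: one ccache folder per main branch.
# cache_dir_for_branch and the folder layout are hypothetical.

cache_dir_for_branch() {
  # $1: name of the main branch, $2: base folder for all caches
  # replace every "/" so that a name like "release/1.0" maps to one folder
  safe_name=$(printf '%s' "$1" | tr '/' '_')
  printf '%s/ccache-%s\n' "$2" "$safe_name"
}

main_branch="release/1.0";   # in practice, the detected main branch
CCACHE_DIR=$(cache_dir_for_branch "$main_branch" "${HOME:-/tmp}/.cache");
export CCACHE_DIR;
printf 'using cache folder %s\n' "$CCACHE_DIR";
```

Two feature branches that belong to the same main branch would then share one cache folder, while builds of unrelated main branches do not evict each other's objects.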

Why is it not possible to define a parent?

Most repositories have a semi-linear history or a history that resembles a tree.

There is a set of main branches, and minor branches that are created from those main branches and merged back (or not):

# a representation of a git repository with a linear history

A--B--C--D--E--F--G   branch1

# a representation of a git repository with a tree-like history

     K--L--M--N   branch2
    /
A---B---C---D---E--F--G   branch1
             \
              H--I--J   branch3

# a representation of a git repository with a branch
# and other branches forked from it (and eventually merged back)

      K--L--M--N--O
     /             \
A---B---C---D---E---F---G   branch1
             \
              H--I--J   branch2

But a git repository can have a more complex structure.

You might have octopus merges (merge commits with more than two parents):

# a representation of a git repository with an octopus merge

        G--H--I--J--K
       /             \
      /   L--M--N--O--P--Q
     /   /           /
A---B---C---D---E---F
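To see such a merge outside of a diagram, here is a minimal sketch that builds a throwaway repository and merges two branches at once. The helper add_file and the branch names topic1 and topic2 are made up for this example.

```shell
#!/bin/sh
# Sketch: create an octopus merge in a temporary repository
# and count the parents of the resulting commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fixed name for the first branch
git config user.name demo
git config user.email demo@example.com

add_file() {
  # hypothetical helper: commit a new file named $1
  printf '%s\n' "$1" > "$1"
  git add "$1"
  git commit -q -m "add $1"
}

add_file base
git switch -q -c topic1; add_file one
git switch -q master
git switch -q -c topic2; add_file two
git switch -q master;    add_file three

# merging two branches at once creates one commit with three parents
git merge -q -m "octopus merge" topic1 topic2

# "git rev-list --parents" prints the commit followed by its parents
set -- $(git rev-list --parents --max-count=1 HEAD)
parent_count=$(($# - 1))
printf 'parents of the merge commit: %s\n' "$parent_count"
```

With three parents, there is no single "previous" commit for the merge, let alone a single branch it came from.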

A project can also have multiple root nodes; a common case is when the histories of two projects were merged:

# a representation of a git repository with multiple root nodes

      G--H--I--J--K
                   \
        L--M--N--O--P--Q
             /
A---B---C---D---E---F
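A small sketch of how such a history can arise, and how to list the root nodes of a repository. The helper add_file and the branch name other are assumptions of this example.

```shell
#!/bin/sh
# Sketch: merge two unrelated histories in a temporary repository
# and list the commits that have no parent.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

add_file() {
  # hypothetical helper: commit a new file named $1
  printf '%s\n' "$1" > "$1"
  git add "$1"
  git commit -q -m "add $1"
}

add_file a
git switch -q --orphan other     # new branch that starts without any parent
add_file g
git switch -q master
git merge -q --allow-unrelated-histories -m "merge other" other

# commits without parents are the root nodes of the graph
roots=$(git rev-list --max-parents=0 HEAD)
root_count=$(printf '%s\n' "$roots" | wc -l | tr -d ' ')
printf 'number of root commits: %s\n' "$root_count"
```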

Last but not least, multiple branches can point to the same commit:

# a representation of a git repository with
# two branches at the same position

A---B---C---D---E---F   branch1
         \
          G---H   branch2/branch3
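The following sketch reproduces this situation in a temporary repository and asks git which branches point at the same commit. The repository layout is an assumption made for the example.

```shell
#!/bin/sh
# Sketch: two branch names for one commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m A
git switch -q -c branch2
git commit -q --allow-empty -m G
git branch branch3               # second name for the commit of branch2

# list every local branch whose tip is the commit of branch2
set -- $(git for-each-ref --format='%(refname:short)' --points-at branch2 refs/heads/)
same_commit="$*"
printf 'branches at the same commit: %s\n' "$same_commit"
```

From the point of view of the commit graph, branch2 and branch3 are indistinguishable, so neither can be "the" parent of a branch forked from that commit.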

And all those things combined.

Reduce the scope to repositories that look like a tree

Even in this context, the question "Where does this branch come from" does not fully make sense.

Consider the following graph:

---x---x---   branch1
    \   \
     \   z---x---   branch2
      \       \
       \       z---   branch4
        \
         z---   branch3

It seems obvious to say that branch4 comes from branch2.

It is less obvious if the git history is rendered slightly differently:


     z--- branch3
    /
---x
    \      z--- branch1
     \    /
      ---x
          \       z--- branch2
           \     /
            z---x
                 \
                  z---branch4

Is branch2 the parent of branch4, or is it the other way round? Or is branch1 the parent (or is branch1 the child)?

Both interpretations of the graph are valid.

For this reason, git branches do not have parents; they have "common ancestors" (the x in the graphs).
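The symmetry can be checked with git merge-base, which reports the same commit regardless of the order of its arguments. The sketch below uses a temporary repository whose layout is an assumption made for the example.

```shell
#!/bin/sh
# Sketch: the common ancestor does not tell which branch is the "parent".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m x                # the shared commit
git switch -q -c branch2
git commit -q --allow-empty -m z-branch2
git switch -q branch1
git switch -q -c branch4
git commit -q --allow-empty -m z-branch4

base_2_4=$(git merge-base branch2 branch4)
base_4_2=$(git merge-base branch4 branch2)
if [ "$base_2_4" = "$base_4_2" ]; then
  printf 'same common ancestor in both directions\n'
fi
```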

A repository with a manually defined set of parent branches

From now on, instead of "parent", I will write about "main branches".

For the use cases I have in mind, I do not want to know when a feature branch has been created from another feature branch.

I want to know into which branch they are probably going to be merged back, and to which branch they are "near".

For some workflows, this will be a relatively small set of branches to which no one commits directly, but only merges.

These are the requirements:

  • a manually chosen set of branches is designated as main branches

  • main branches are generally never deleted

  • other branches are labeled as "development branches" (feature branches, hotfix branches, test branches, and so on)

  • the main branch corresponding to a development branch is the one whose common ancestor with the development branch is nearest (most recent)

  • main branches form a tree-like structure (main branches are not merged to other branches, but development branches are forked from main branches and merged back)

Why do I want the main branches generally never deleted?

Consider the following scenario.

Someone creates branch3. From branch3, someone creates branch1. From branch1, someone creates branch2. From branch2, someone creates branch4.

And on all branches, new commits are added.

The graph looks like this:

     z--- branch3
    /
---x
    \      z--- branch1
     \    /
      ---x
          \       z--- branch2
           \     /
            z---x
                 \
                  z--- branch4

From the given description, the "parent" of branch4 is branch2.

But then branch2 disappears. It might simply get deleted:

     z--- branch3
    /
---x
    \      z--- branch1
     \    /
      ---x
          \
           \
            z---\
                 \
                  z--- branch4

or it might have been merged into another branch (in the following graph, into branch1):

     z--- branch3
    /
---x
    \      z-------------x branch1
     \    /             /
      ---x             /
          \       z---z
           \     /
            z---x
                 \
                  z--- branch4

Thus, from one day to the next, the answer to "Which branch is the parent of branch4?" changed, without changing branch4.

It was branch2, now it’s branch1. And if branch1 and branch3 are deleted too, then branch4 has no parent at all.

And no commit on branch4 was touched or altered in any way; no git rebase or other fancy operations.

Granted, it might not be a dealbreaker and might cause no harm, but generally, I do not want the answer to change unless I’m changing branch4 (for example through a git rebase).
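A minimal sketch of the deletion scenario, in a temporary repository laid out as an assumption for this example: the tip of branch4 stays the same, while the set of branches containing the commit where branch4 forked changes.

```shell
#!/bin/sh
# Sketch: deleting branch2 changes what we can say about branch4,
# although branch4 itself is untouched.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m x1
git switch -q -c branch2
git commit -q --allow-empty -m z
git commit -q --allow-empty -m x2
fork=$(git rev-parse HEAD)           # commit where branch4 forks off
git switch -q -c branch4
git commit -q --allow-empty -m z4
git switch -q branch2
git commit -q --allow-empty -m z2

branches_with_fork() {
  # print the local branches that contain the fork commit, on one line
  set -- $(git for-each-ref --contains "$fork" --format='%(refname:short)' refs/heads/)
  printf '%s\n' "$*"
}

tip_before=$(git rev-parse branch4)
before=$(branches_with_fork)
git switch -q branch4
git branch -q -D branch2
tip_after=$(git rev-parse branch4)
after=$(branches_with_fork)

printf 'before: %s\nafter: %s\n' "$before" "$after"
```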

Why do I want to manually identify "main branches"?

Consider the following graph:

     z--- master
    /
---x
    \      z--- branch1
     \    /
      ---x
          \       z--- feat2
           \     /
            z---x
                 \
                  z--- feat3

When working with "feature" branches, we often are interested in the parent branch because we want to know where we need to merge it back.

In this case, we do not care if feat2 was created from feat3, the other way around, or both from a feat1 branch that does not exist anymore.

If master and branch1 are the only branches identified as main branches, and branches are only created from main branches or from branches created from main branches, then feat2 and feat3 necessarily have master or branch1 as their corresponding main branch (we want the answer to be branch1).

Note 📝
One does not necessarily need to list the main branches manually. Filtering the branches by a naming convention is a viable approach.
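As a sketch of the naming-convention approach, the following example selects the main branches with git for-each-ref. The convention used here (master plus everything starting with main) is only an assumption for the example.

```shell
#!/bin/sh
# Sketch: select the main branches by name instead of listing them by hand.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m init
git branch main1
git branch main2
git branch feature1

# assumed convention: "master" and every branch whose name starts with "main"
set -- $(git for-each-ref --format='%(refname:short)' 'refs/heads/main*' refs/heads/master)
main_branches="$*"
printf 'main branches: %s\n' "$main_branches"
```

Note that git for-each-ref sorts by ref name by default, so this order is alphabetical and not an order of preference.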

Find common ancestors

For experimenting, let us create a dummy repository:

git init --initial-branch=master; # needs git 2.28+; ensures the first branch is called master
git commit --allow-empty -m master-1;
git commit --allow-empty -m master-2;
git switch -c main1; # git switch needs git 2.23+
git commit --allow-empty -m main1-1;
git commit --allow-empty -m main1-2;
git switch master;
git commit --allow-empty -m master-3;
git switch -c main2;
git commit --allow-empty -m main2-1;
git commit --allow-empty -m main2-2;
git switch -c feature1;
git commit --allow-empty -m feature1-1;
git commit --allow-empty -m feature1-2;
git switch main2;
git commit --allow-empty -m main2-3;
git switch master;
git commit --allow-empty -m master-4;
tig master feature1 main1 main2; # or: git log --graph --oneline master feature1 main1 main2
# creates a graph that should look like
# o [master] master-4
# │ o [feature1] feature1-2
# │ o feature1-1
# │ │ o [main2] main2-3
# │ o─┘ main2-2
# │ o main2-1
# o─┘ master-3
# │ o [main1] main1-2
# │ o main1-1
# o─┘ master-2
# I master-1

The branches master, main1, and main2 are the main branches.

There might be tons of other branches, merges, and strange constructs, but we are not interested in those, except for feature1.

At this point, we want to find the main branch that is "nearest" to feature1. The desired answer is main2.

The first thing to do is to search for all relevant common ancestors:

mbmain1=$(git merge-base feature1 main1);
mbmain2=$(git merge-base feature1 main2);
mbmaster=$(git merge-base feature1 master);
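To make the three results easier to read, the sketch below rebuilds the dummy repository in a temporary folder and prints the commit subject of each merge base instead of its hash. The helpers c and subject_of_base are made up for this example.

```shell
#!/bin/sh
# Sketch: show which commit each merge base is.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

c() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper: one empty commit

c master-1; c master-2
git switch -q -c main1;    c main1-1; c main1-2
git switch -q master;      c master-3
git switch -q -c main2;    c main2-1; c main2-2
git switch -q -c feature1; c feature1-1; c feature1-2
git switch -q main2;       c main2-3
git switch -q master;      c master-4

subject_of_base() {
  # $1: main branch; prints the subject of its merge base with feature1
  git log --max-count=1 --format=%s "$(git merge-base feature1 "$1")"
}

for b in main1 main2 master; do
  printf '%s: %s\n' "$b" "$(subject_of_base "$b")"
done
```

For this repository, the merge bases are master-2 (with main1), main2-2 (with main2), and master-3 (with master).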

Given the requirements, those common ancestors will form a linear graph and can be sorted topologically.

We are interested in the latest/newest commit. With --topo-order, no commit is listed before its descendants, so the first commit listed is the one we want:

nearest_ci=$(git rev-list --topo-order --max-count=1 "$mbmain1" "$mbmain2" "$mbmaster")
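The assumption that the common ancestors lie on a single line of history can be verified before trusting the result. The following sketch does so on a copy of the dummy repository; the helpers c and is_linear are hypothetical and not part of git.

```shell
#!/bin/sh
# Sketch: check that the merge bases form a chain, then pick the newest one.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

c() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper: one empty commit

c master-1; c master-2
git switch -q -c main1;    c main1-1; c main1-2
git switch -q master;      c master-3
git switch -q -c main2;    c main2-1; c main2-2
git switch -q -c feature1; c feature1-1; c feature1-2
git switch -q main2;       c main2-3
git switch -q master;      c master-4

is_linear() {
  # succeeds if, for every pair of commits, one is an ancestor of the other
  for a in "$@"; do
    for b in "$@"; do
      git merge-base --is-ancestor "$a" "$b" ||
        git merge-base --is-ancestor "$b" "$a" || return 1
    done
  done
}

mbmain1=$(git merge-base feature1 main1)
mbmain2=$(git merge-base feature1 main2)
mbmaster=$(git merge-base feature1 master)

if is_linear "$mbmain1" "$mbmain2" "$mbmaster"; then chain=ok; else chain=broken; fi
nearest_ci=$(git rev-list --topo-order --max-count=1 "$mbmain1" "$mbmain2" "$mbmaster")
nearest_subject=$(git log --max-count=1 --format=%s "$nearest_ci")
printf 'chain: %s, nearest common ancestor: %s\n' "$chain" "$nearest_subject"
```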

And now, we need to verify which common ancestor it is:

  if [ "$nearest_ci" = "$mbmain1" ]; then :; echo "branched from main1";
elif [ "$nearest_ci" = "$mbmain2" ]; then :; echo "branched from main2";
elif [ "$nearest_ci" = "$mbmaster" ]; then :; echo "branched from master";
fi

The branch feature1 could have multiple main branches that are equally near, for example if multiple main branches point to the same commit. Thus the order of the comparisons matters if we prefer one branch over another.
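A tie is easy to provoke. In the sketch below, the branch names mainA, mainB, and feat are assumptions for this example; both main branches point to the same commit, so both yield the same merge base.

```shell
#!/bin/sh
# Sketch: two main branches that are equally near to a feature branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m base
git branch mainA
git branch mainB                  # same commit as mainA
git switch -q -c feat
git commit -q --allow-empty -m work

base_a=$(git merge-base feat mainA)
base_b=$(git merge-base feat mainB)
if [ "$base_a" = "$base_b" ]; then tie=yes; else tie=no; fi
printf 'tie between mainA and mainB: %s\n' "$tie"
```

In such a case, only the order of the comparisons decides which of the two branches is reported.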

The whole process could be written as

#!/usr/bin/env bash

dev_branch=feature1;

main_branches=(main1 main2 master); # ordering is relevant: on a tie, the first branch wins
declare -A map_merge_base;
for mbranch in "${main_branches[@]}"; do :;
  mrbase=$(git merge-base "$dev_branch" "$mbranch");
  # keep the first main branch seen for a given merge base
  [ -n "${map_merge_base[$mrbase]:-}" ] || map_merge_base[$mrbase]="$mbranch";
done
nearest_ci=$(git rev-list --topo-order --max-count=1 "${!map_merge_base[@]}")

printf "%s is the corresponding main branch\n" "${map_merge_base[$nearest_ci]}";

Associative arrays are not really new (they were introduced in bash 4, released in 2009), but the syntax is not compatible with zsh (the default shell on macOS), and possibly other shells.

The same commands also work in zsh if one replaces ${!map_merge_base[@]} with ${(k)map_merge_base[@]}, in more or less all zsh versions, as zsh has had associative arrays since 1998.

An alternative that uses a non-associative array, and thus works in both bash and zsh, would be:

#!/usr/bin/env bash

dev_branch=feature1;

main_branches=(main1 main2 master); # ordering is relevant
map_merge_base=()
for mbranch in "${main_branches[@]}"; do :;
  map_merge_base+=("$(git merge-base "$dev_branch" "$mbranch"):$mbranch")
done
nearest_ci=$(git rev-list --topo-order --max-count=1 "${map_merge_base[@]%%:*}")

for v in "${map_merge_base[@]}"; do :;
  if [ "$nearest_ci" = "${v%%:*}" ]; then :;
    printf "%s is the corresponding main branch\n" "${v#*:}";
    break
  fi
done;

But since the script begins with #!/usr/bin/env bash, it will not fall back to zsh (or another shell) if bash is not available.

Another alternative would be to write the script in POSIX sh, thus without arrays, by reusing the snippet for iterating over strings:

#!/bin/sh

dev_branch=feature1;

main_branches="main1 main2 master"; # ordering is relevant

map_merge_base="";
commits="";
SEP=" ";
STRING="$main_branches$SEP";
while [ "$STRING" != "${STRING#*"${SEP}"}" ] && { [ -n "${STRING%%"${SEP}"*}" ] || [ -n "${STRING#*"${SEP}"}" ] ; }; do
  VALUE="${STRING%%"${SEP}"*}";
  STRING="${STRING#*"${SEP}"}";
  mrbase=$(git merge-base "$dev_branch" "$VALUE")
  map_merge_base="$map_merge_base $mrbase:$VALUE";
  commits="$commits $mrbase";
done;

# NOTE: not quoting $commits by design
nearest_ci=$(git rev-list --topo-order --max-count=1 $commits)

SEP=" ";
STRING="$map_merge_base$SEP";
while [ "$STRING" != "${STRING#*"${SEP}"}" ] && { [ -n "${STRING%%"${SEP}"*}" ] || [ -n "${STRING#*"${SEP}"}" ] ; }; do
  VALUE="${STRING%%"${SEP}"*}";
  STRING="${STRING#*"${SEP}"}";
  if [ "$nearest_ci" = "${VALUE%%:*}" ]; then :;
    printf "%s is the corresponding main branch\n" "${VALUE#*:}";
    break
  fi
done;

Otherwise, just use a more structured programming language.

The main advantage of using the last script is that it should work out-of-the-box on Windows (as git for Windows is packaged with bash), most GNU/Linux systems, and macOS (as it has zsh as the default shell).
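As a possible variation, the POSIX version can be wrapped in a function that receives the branches as positional parameters, which avoids the string splitting. This is only a sketch: the function name nearest_main_branch is made up, and instead of git rev-list it compares the merge bases pairwise with git merge-base --is-ancestor, which gives the same answer as long as the merge bases lie on one line of history.

```shell
#!/bin/sh
# Sketch: nearest_main_branch DEV MAIN... (hypothetical helper)
# Prints the main branch whose merge base with DEV is the newest.
# Note: POSIX sh has no local variables, so the names below are global.
set -e

nearest_main_branch() {
  dev=$1; shift
  best_branch=""; best_base=""
  for candidate in "$@"; do
    base=$(git merge-base "$dev" "$candidate") || continue
    # a candidate wins only if its merge base is a strict descendant of the
    # best one so far; on a tie, the branch listed first is kept
    if [ -z "$best_base" ] ||
       { [ "$base" != "$best_base" ] &&
         git merge-base --is-ancestor "$best_base" "$base"; }; then
      best_base=$base; best_branch=$candidate
    fi
  done
  [ -n "$best_branch" ] && printf '%s\n' "$best_branch"
}

# demonstration on a copy of the dummy repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
c() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper

c master-1; c master-2
git switch -q -c main1;    c main1-1; c main1-2
git switch -q master;      c master-3
git switch -q -c main2;    c main2-1; c main2-2
git switch -q -c feature1; c feature1-1; c feature1-2
git switch -q main2;       c main2-3
git switch -q master;      c master-4

printf '%s is the corresponding main branch\n' \
  "$(nearest_main_branch feature1 main1 main2 master)"
```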

