Iterating over a delimited string in shell scripts

The last time I automated some jobs in my shell, I was confronted with the problem of iterating over a string in bash. I needed to split the string at a specific delimiter and iterate over the resulting substrings.

While I maintain that once your shell scripts acquire enough logic it makes more sense to convert them to something else, this did not seem to be such a case. Program invocation and string manipulation are common tasks in shell scripts. After all, the inputs and outputs of a program, apart from its return code, are text. Through the shell, possibly with the help of external tools, we add or remove lines, delete or substitute characters, and so on.

Still, when confronted with this apparently simple task, I encountered many more issues than expected.

There are multiple suggestions online, but all of those I could find had some limitations or issues.

If whitespace is the actual delimiter

for i in $vals; do :;
  # do something with i
done

seems to do the job, but has a couple of issues.

The first one is that it splits on several delimiters: not only spaces, but tabs and newlines too. The second issue is that its behaviour depends on the value of IFS (and possibly on pathname expansion, since the unquoted variable is also subject to globbing).
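A small sketch of both issues. The helper count_words is hypothetical and only counts how many items such a loop would see; the sample strings are made up for illustration.

```shell
#!/bin/sh
# count_words is a hypothetical helper: it prints how many items an
# unquoted "for i in $vals" loop iterates over.
count_words() {
  vals=$1
  n=0
  for i in $vals; do
    n=$((n + 1))
  done
  printf '%s\n' "$n"
}

# With the default IFS, spaces, tabs and newlines all act as delimiters.
count_words "$(printf 'a b\tc\nd')"    # prints 4

# With a different IFS the same kind of loop splits differently.
# The subshell keeps the change away from the rest of the script.
( IFS=','; count_words 'a b,c' )       # prints 2
```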

First possible approach, use IFS

Depending on IFS might not seem like a big deal; on the contrary, it might seem like a feature, because it makes it possible to change the delimiter. Unfortunately it is a global setting, so changing it also changes the behaviour of other commands, which I did not want to alter.

Also, changing it and setting it back is not that easy:

IFS=',';  # the delimiter, for example a comma
for i in $vals; do :;
  # do something
done
unset IFS;

After calling unset, we might have changed the environment we were working in, because IFS might have been set to something before. I would not even try to back up the original value in order to restore it, as the code gets messy pretty quickly: it needs to track whether IFS was originally set or not and, if it was set, save the value in order to restore it. And hopefully no command in the middle changes IFS again or depends on it.

Those, and many others, are the subtle errors that come with global variables. Therefore I absolutely did not want changing IFS to be an additional requirement of my shell script.
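For illustration only, this is roughly what such a backup would look like. It is a sketch, not something I would recommend: split_count and the variables ifs_was_set and saved_ifs are hypothetical names, and every command inside the loop still sees the modified IFS.

```shell
#!/bin/sh
# split_count SEP STRING: hypothetical helper that counts the items of
# STRING split on SEP, restoring IFS to its previous state afterwards.
split_count() {
  # ${IFS+set} expands to "set" only if IFS is set, even when empty.
  if [ "${IFS+set}" = set ]; then
    ifs_was_set=1
    saved_ifs=$IFS
  else
    ifs_was_set=0
  fi

  IFS=$1
  count=0
  for i in $2; do
    # Anything executed here runs with the modified IFS as well.
    count=$((count + 1))
  done

  # Restore the previous state: either the old value, or "unset".
  if [ "$ifs_was_set" -eq 1 ]; then
    IFS=$saved_ifs
  else
    unset IFS
  fi

  printf '%s\n' "$count"
}

split_count ',' 'a,b c,d'    # prints 3
```

Even in this small form there are two branches to save and two to restore, and the helper variables are themselves global.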

Actually, it is possible to change an environment variable locally, for a single command. This would avoid the burden of keeping track of the original value of IFS in order to set it back. For example:

IFS='...' command_to_execute;

Unfortunately, iterating is not executing a single command, so this approach did not work, no matter how hard I tried.
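A sketch of the difference, using read as an example of a single command (my choice for illustration; the variable names are hypothetical). The failing for loop is handed to a separate sh so that its syntax error does not stop the script.

```shell
#!/bin/sh
# An assignment in front of a simple command applies to that command
# only. read is such a command, so the prefix works here:
IFS=',' read -r first second third <<EOF
a,b c,d
EOF
printf '%s|%s|%s\n' "$first" "$second" "$third"    # prints a|b c|d

# "for" starts a compound command, not a simple command, so the same
# prefix is rejected by the parser.
if sh -c 'IFS="," for i in a,b; do :; done' 2>/dev/null; then
  echo "prefix accepted"
else
  echo "prefix rejected"
fi
```

This reads a fixed number of fields from one line, which is not the same as iterating over an arbitrary number of elements.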

Second approach, change characters

The second approach involved changing the string I wanted to iterate over by replacing the delimiter with a space character. This has an issue similar to the one with IFS: space characters already in the string would have to be replaced too, unless the space was the actual delimiter. And it would still be susceptible to changes of IFS.

To make it as robust as possible, a script would have to check IFS, replace the IFS characters (or whitespace if IFS was not set) in the string with something else, then replace the delimiter with an IFS character, and during the loop undo all those changes in order to do something with the substring.

I did not feel confident enough to implement something like that. It also raises the question of which character to use as a replacement for those appearing in IFS without conflicting with other characters in the string.
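A naive sketch of this idea, assuming the replacement is done with tr (an external tool, picked here only for illustration) and that IFS has its default value. The helper naive_count is hypothetical.

```shell
#!/bin/sh
# naive_count STRING: replaces every ',' with a space, relies on word
# splitting, and prints the number of items seen.
naive_count() {
  n=0
  for i in $(printf '%s' "$1" | tr ',' ' '); do
    n=$((n + 1))
  done
  printf '%s\n' "$n"
}

naive_count 'a,b,c'    # prints 3, as intended
naive_count 'a b,c'    # prints 3 as well, although only 2 items were meant
```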

Third approach, parameter expansion

I was already thinking of using something else for my task when it occurred to me that I had not checked whether I could accomplish something with parameter expansion, so I gave it a try.

After some time, I came up with the following piece of code/pattern:

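ORIGINAL_STRING="...."; # string we want to iterate
SEPARATOR="....";       # delimiter, a single or multiple characters
STRING="$ORIGINAL_STRING$SEPARATOR"; # working copy, with a trailing separator
while [ "$STRING" != "${STRING#*"$SEPARATOR"}" ] && { [ -n "${STRING%%"$SEPARATOR"*}" ] || [ -n "${STRING#*"$SEPARATOR"}" ] ; }; do
  VALUE="${STRING%%"$SEPARATOR"*}";  # everything before the first separator
  STRING="${STRING#*"$SEPARATOR"}";  # drop that element and its separator
  # do something with "$VALUE"
done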

This is less readable than I hoped for, but it has all the features one could need and something more:

  • does not depend on IFS, other global environment variables or settings

  • the delimiter is parametrizable, thus leaving all the needed flexibility

  • works for any delimiter that consists of a single character or multiple characters: quotes, commas, whitespace, alphanumeric characters, …

  • works in POSIX sh, so it is expected to work in most shells: sh, dash, bash, zsh, … (but not in fish, whose syntax is not POSIX-compatible)

  • does not depend on any external tool, so it should work in all environments and configurations (GNU/Linux, Solaris-like, various bash packages for Windows like WSL, Cygwin and MinGW, and other systems like *BSD, macOS, Android, …)

  • does not invoke any external tool; thus, even if it might not be the most performant solution, it does not need to load any other binary from disk

  • does not depend on the content of the string

  • works if there is an empty string between two separators, like 'a,,b' with ',' as separator, if the separator is at the beginning or end of the string, like ',a' and 'a,', and for the empty string too.
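The cases listed above can be checked with a small wrapper around the loop. This is a sketch: show_items is a hypothetical helper, the separator is held in a variable called SEPARATOR here, and it is quoted inside the expansions so that it is matched literally.

```shell
#!/bin/sh
# show_items STRING SEPARATOR: prints every element between square
# brackets, all on one line.
show_items() {
  SEPARATOR=$2
  STRING="$1$SEPARATOR"
  out=''
  while [ "$STRING" != "${STRING#*"$SEPARATOR"}" ] &&
        { [ -n "${STRING%%"$SEPARATOR"*}" ] || [ -n "${STRING#*"$SEPARATOR"}" ]; }; do
    VALUE="${STRING%%"$SEPARATOR"*}"
    STRING="${STRING#*"$SEPARATOR"}"
    out="${out}[${VALUE}]"
  done
  printf '%s\n' "$out"
}

show_items 'a,,b' ','       # prints [a][][b]
show_items ',a' ','         # prints [][a]
show_items 'a,' ','         # prints [a]
show_items '' ','           # prints an empty line
show_items 'a b::c' '::'    # prints [a b][c]
```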

Notice that 'a,b' and 'a,b,' (with ',' as separator) are handled the same way; the last trailing separator is thus optional. If you need a single trailing separator to define a new (empty) element to iterate over, just change STRING="$ORIGINAL_STRING$SEPARATOR"; to STRING="$ORIGINAL_STRING$SEPARATOR$SEPARATOR";.
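A sketch of that variant, wrapped in a hypothetical helper called show_items_keep_trailing that prints every element between square brackets:

```shell
#!/bin/sh
# show_items_keep_trailing STRING SEPARATOR: same loop, but the
# separator is appended twice, so one trailing separator in the input
# produces a final empty element.
show_items_keep_trailing() {
  SEPARATOR=$2
  STRING="$1$SEPARATOR$SEPARATOR"
  out=''
  while [ "$STRING" != "${STRING#*"$SEPARATOR"}" ] &&
        { [ -n "${STRING%%"$SEPARATOR"*}" ] || [ -n "${STRING#*"$SEPARATOR"}" ]; }; do
    VALUE="${STRING%%"$SEPARATOR"*}"
    STRING="${STRING#*"$SEPARATOR"}"
    out="${out}[${VALUE}]"
  done
  printf '%s\n' "$out"
}

show_items_keep_trailing 'a,b' ','     # prints [a][b]
show_items_keep_trailing 'a,b,' ','    # prints [a][b][]
show_items_keep_trailing '' ','        # prints []
```

As the last call shows, with this change the empty string is treated as one empty element rather than as no elements at all.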

Of course, if you are not interested in keeping the original string, there is no reason to copy its content, but it is important to add a trailing separator for the algorithm to work correctly.