Iterating over a delimited string in shell scripts

The last time I automated some jobs with my shell, I was confronted with the problem of iterating over a string in bash. I needed to split the string on a specific delimiter and iterate over the resulting substrings.

While I maintain that once your shell scripts acquire enough logic it makes more sense to convert them to something else, this did not seem to be such a case. Program invocations and string manipulation are common tasks in shell scripts. After all, apart from the return code of a program, inputs and outputs are text. Through the shell, and possibly through external tools, we add or remove lines, delete or substitute characters, and so on.

Still, when confronted with these simple tasks, I encountered many more issues than expected.

There are multiple suggestions online, but all of those I could find had some limitations or issues.

If whitespace is the actual delimiter

for i in $vals; do :;
  # do something with i
done

seems to do the job, but has a couple of issues.

The first one is that it splits on multiple delimiters: not only spaces, but tabs and newlines too. The second issue is that its behavior depends on the value of IFS (and possibly on other shell expansions, such as pathname expansion, that are applied to the unquoted variable).
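To make the first issue concrete, here is a minimal sketch. It assumes IFS still has its default value (space, tab, newline); the content of vals is made up for the example.

```shell
# Assumption: IFS has its default value (space, tab, newline).
# The string contains one space, one tab and one newline.
vals=$(printf 'one two\tthree\nfour')

count=0
for i in $vals; do
  count=$((count + 1))
  printf 'item %d: %s\n' "$count" "$i"
done
# All three characters act as delimiters, so the loop runs four times.
```

If only the space was meant as a delimiter, the tab and the newline should have stayed inside the elements.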

The first possible approach, use IFS

Depending on IFS might not seem like a big deal; on the contrary, it might seem like a feature, because it makes it possible to change the delimiter. Unfortunately, it is a global setting, so it will also change the behavior of other commands, which I did not want to alter.

Also changing and setting it back is not that easy:

IFS=',';
vals='...';
for i in $vals; do :;
  # do something
done
unset IFS

After calling unset, we might have changed the environment we were in, as IFS might originally have been set to something. I won’t even try to back up the original value and set it back, as the code gets messy pretty quickly: it needs to track whether IFS was originally set and, if so, save the value to restore it. And hopefully no command in the middle changes IFS again or depends on it.
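To give an idea of the bookkeeping involved, a minimal sketch could look like the following. The names ifs_was_set and saved_ifs, as well as the input, are made up for this example, and the sketch does nothing about commands inside the loop that change IFS again or depend on it.

```shell
# Remember whether IFS was set at all, and its value if it was.
# ${IFS+set} expands to "set" when IFS is set, even to an empty value.
if [ -n "${IFS+set}" ]; then
  ifs_was_set=1
  saved_ifs=$IFS
else
  ifs_was_set=0
fi

IFS=','
vals='a,b,c'   # illustrative input
count=0
for i in $vals; do
  # Everything in here still runs with IFS=','
  count=$((count + 1))
done

# Restore the original state: either the old value, or "unset".
if [ "$ifs_was_set" -eq 1 ]; then
  IFS=$saved_ifs
else
  unset IFS
fi
```

That is a dozen lines of state tracking around a three-line loop, and an early exit from the loop body would still need to take care of the restore.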

These, and many others, are the subtle errors that come with global variables. I therefore did not want changing IFS to become an additional requirement of my shell script.

It is possible to change environment variables for single commands locally. This would at least avoid the burden of keeping track of the original value of IFS to set it back. For example:

IFS='...' command_to_execute

Unfortunately, iterating is not executing a single command, so this approach, no matter how hard I tried, did not work.
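For comparison, this is how the per-command form behaves with a single command such as read. It is only a sketch with made-up input, and it shows why the form does not help here: read consumes one line and fills a fixed number of variables, while a for loop is a compound command that cannot take such a prefix.

```shell
vals='a,b,c'   # illustrative input

# IFS=',' applies only to this one invocation of read.
IFS=',' read -r first rest <<EOF
$vals
EOF

# The first field goes to "first"; everything left over, separators
# included, goes to the last variable.
printf 'first=%s rest=%s\n' "$first" "$rest"
# prints: first=a rest=b,c
```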

Second approach, change characters

The second approach involved changing the string I wanted to iterate over by replacing the delimiter with a space character. This has an issue similar to the one with IFS: space characters already in the string would have to be replaced too, unless space was the actual delimiter. And it would still be susceptible to changes in IFS.
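A naive version of this idea shows the problem. The sketch below assumes tr for the replacement (the approach itself does not prescribe a tool), a default IFS, and made-up input.

```shell
vals='red apple,green pear'   # two elements, each containing a space

# Assumption: tr performs the replacement of the delimiter with a space.
replaced=$(printf '%s' "$vals" | tr ',' ' ')

count=0
for i in $replaced; do
  count=$((count + 1))
done
printf 'expected 2 elements, got %d\n' "$count"
# prints: expected 2 elements, got 4
```

The spaces that belonged to the elements are now indistinguishable from the ones that replaced the delimiter.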

To have it as robust as possible, a script should check IFS, replace the IFS character (or whitespace if IFS was not set) in the string with something else, then replace the delimiter with an IFS character, and during the loop undo all those changes in order to do something with the substring.

I did not feel confident enough to implement something like that. It also poses the problem of which character to use as a replacement for those appearing in IFS without conflicting with other characters in the string.

Third approach, parameter expansion

I was already thinking of using something else for my task when it occurred to me that I had not checked whether I could accomplish something with Parameter Expansion, so I gave it a try.

After some time, I came up with the following piece of code/pattern:

SEP=",";
ORIGINAL_STRING="...."; # string we want to iterate

STRING="$ORIGINAL_STRING$SEP";
while [ "$STRING" != "${STRING#*"${SEP}"}" ] && { [ -n "${STRING%%"${SEP}"*}" ] || [ -n "${STRING#*"${SEP}"}" ] ; }; do
  VALUE="${STRING%%"${SEP}"*}";
  STRING="${STRING#*"${SEP}"}";
  # do something with "$VALUE"
done

This is less readable than I hoped for, but has all the features one could need and something more:

  • does not depend on IFS, other global environment variables, or settings

  • the delimiter is parametrizable, thus leaving all the needed flexibility

  • works for any delimiter that consists of one or more characters: quotes, commas, whitespace, alphanumeric characters, emojis, *, …​

  • works in POSIX sh, so it is expected to work on POSIX-compatible shells: sh, dash, bash, zsh, … (fish, which does not follow POSIX syntax, is the exception)

  • does not depend on any external tool, so it should work in all environments and configurations (GNU/Linux, Solaris-like, various bash packages for Windows like WSL, Cygwin and MinGW, and other systems like *BSD, macOS, Android, …)

  • does not invoke any external tool, thus even if it might not be the most performant solution, it does not need to load any other binary from the disk

  • does not depend on the content of the string

  • works if there is an empty string between two separators, like 'a,,b' with ',' as a separator. Works if the separator is at the beginning or end of the string, like ',a' and 'a,', and for the empty string too.

Notice that 'a,b' and 'a,b,' (with ',' as a separator) are handled the same way; a single trailing separator is thus optional. If you want a single trailing separator to define a new (empty) element to iterate over, just change STRING="$ORIGINAL_STRING$SEP"; to STRING="$ORIGINAL_STRING$SEP$SEP";.
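To check these edge cases quickly, the pattern can be wrapped in a small function. This is only a sketch: split_show is a hypothetical helper, not part of the pattern itself, and it prints every element in brackets so that empty elements are visible.

```shell
# Hypothetical helper: split $1 on the separator $2 and print every
# element wrapped in brackets, all on one line.
# POSIX sh has no local variables, so sep, str and value are global.
split_show() {
  sep="$2"
  str="$1$sep"
  while [ "$str" != "${str#*"${sep}"}" ] &&
        { [ -n "${str%%"${sep}"*}" ] || [ -n "${str#*"${sep}"}" ]; }; do
    value="${str%%"${sep}"*}"
    str="${str#*"${sep}"}"
    printf '[%s]' "$value"
  done
  printf '\n'
}

split_show 'a,b' ','        # prints: [a][b]
split_show 'a,b,' ','       # prints: [a][b]
split_show 'a,,b' ','       # prints: [a][][b]
split_show ',a' ','         # prints: [][a]
split_show '' ','           # prints an empty line: no elements
split_show 'a<>b<>c' '<>'   # prints: [a][b][c]
# Passing the input with one more separator appended has the same effect
# as appending the separator twice inside the function:
split_show 'a,b,,' ','      # prints: [a][b][]
```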

Of course, if you are not interested in keeping the original string, there is no reason to copy the content, but it’s important to add a trailing separator for the algorithm to work correctly.


If you have questions or comments, found typos or errors, or find the notes unclear, just contact me.