posix shell tips and tricks
there are some little-known, quirky things you can do with pure posix shell. this will focus on obscure stuff that I’ve rarely seen documented or talked about elsewhere. if you want more shell resources, check out shell information
BASH_REMATCH
in bash, BASH_REMATCH is an array holding the groups captured by the last =~ match. this can be emulated in posix shell with a little-known command called expr. for almost all use cases, expr is superseded by test and arithmetic expansion, which are shell builtins, and you shouldn’t use it, but one operand of expr is unique to it: the : matching operator. take a look:
$ cat expr.sh
if rematch="$(expr -- "$1" : 'https\{0,1\}://\([^/]\{1,\}\)')"; then
    echo "$rematch"
else
    echo "not a valid URL!"
fi
$ sh expr.sh https://example.com/post/120937
example.com
$ sh expr.sh "a random string"
not a valid URL!
as you can see, this lets you both test whether a string matches a regex pattern and return part of the string in one call. there are some caveats to doing this, though:
- you must use basic regular expressions, which are much more cumbersome than the friendlier extended regular expressions
- the pattern is anchored to the start of the string by default, which can be remedied by putting .* at the beginning
- only the first capturing group will be returned, even if you have more than one
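to sketch the anchoring caveat, here's a small example (the string and pattern are made up for illustration):

```shell
haystack='some text with needle-123 inside'

# anchored by default: the pattern must match from the very start, so this fails
expr "$haystack" : 'needle-\([0-9]\{1,\}\)' >/dev/null || echo "no match"

# prefixing .* lets the match begin anywhere in the string
num=$(expr "$haystack" : '.*needle-\([0-9]\{1,\}\)')
echo "$num"   # prints 123
```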
if you don’t need the rematch at all, then there are other ways of matching a string against a pattern, regex or not, and they should usually be preferred over expr. expr just has one very niche use case that shines when its time is right, such as how I use it in mimix
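one such preferred alternative, sketched here, is a plain case statement, which matches globs (not regexes) using nothing but shell builtins:

```shell
url='https://example.com/post/120937'

# case does glob matching entirely in the shell - no fork, no regex
case $url in
    http://*|https://*) kind=url ;;
    *)                  kind=other ;;
esac
echo "$kind"   # prints url
```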
eval and escaping
there’s a lot of scare around eval, and for very good reason! it’s very powerful yet very dangerous when used in the wrong context
and that’s the thing I want to focus on: the wrong context. I usually see eval referred to as “parsing your code twice”, which I think is a bit of a misnomer. for me personally, the posix definition of eval makes it easier to understand: once the outer shell performs all of its expansions, eval takes the resulting arguments and runs them as shell code, effectively the same thing as sh -c.
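a small sketch of that mental model (variable names made up):

```shell
# the outer shell expands $var_name first, so eval receives
# and runs the shell code: greeting=hello
var_name=greeting
eval "$var_name=hello"
echo "$greeting"   # prints hello

# handing the same string to sh -c would run it in a child shell
# instead, so the assignment would not be visible here
```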
this leads into how I personally treat eval, and how I feel others should treat it: if you do not want to give the user of a script a shell, do not pass unfiltered data into eval.
if the users of your script are already expected to have a shell, and the script does not run with elevated privileges, then eval poses no more of a security threat than whatever the user could already do in a normal shell, though it can still easily cause headaches
eval can also be very useful when building command line arguments from user input, so long as you take great care to escape said input. in posix shell, this is as simple as escaped_input="'$(printf '%s' "$input" | sed "s/'/'\\\\''/g")'". this wraps the input in single quotes, within which the shell never expands any special characters except ', and each ' is dealt with by replacing it with '\'', as one would in a normal shell. in essence, this is doing the exact same thing as printf %q from bash! I utilize this for eval command argument building in agetar
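here’s a sketch of the idiom in action (the input string is made up, and deliberately hostile):

```shell
# hypothetical user input containing a quote, spaces, a semicolon, and a glob
input="it's a; rm -rf * oops"

# wrap in single quotes, replacing each embedded ' with '\''
escaped_input="'$(printf '%s' "$input" | sed "s/'/'\\\\''/g")'"

# the escaped string can now be spliced into shell code safely
cmd="printf '%s\n' $escaped_input"
eval "$cmd"   # prints the input verbatim; nothing is expanded or executed
```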
a miscellaneous quirk to note
command substitution strips trailing newlines, so technically doing escaped_input="'$(printf '%s' "$input" | sed "s/'/'\\\\''/g")'" isn’t enough. for 99.9999% of intents and purposes you don’t need to worry about stripping trailing newlines at all, but if you somehow need to, or want that guarantee of complete and utter safety, replace it with the following: escaped_input="'$(printf '%s' "$input" | sed -e "s/'/'\\\\''/g" -e "s/^$/''/")'". this makes sure no trailing newlines are lost and allows one to breathe easy
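for a sketch of the underlying quirk itself:

```shell
input='one line
'   # note: ends with a newline

# command substitution strips all trailing newlines from the output
stripped=$(printf '%s' "$input")
[ "$stripped" = 'one line' ] && echo "trailing newline was stripped"
```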
pipeline trick
I’m shamelessly reposting most of what is shown in this excellent github page by izabera to preserve it somewhere other than github
if you want to send the stdout of a process to the stdin of multiple processes at once, it’s simple in bash:
gives_output | tee >(needs_input) >(needs_input_2) >/dev/null
however, it’s much trickier in posix shell. the first thought that comes to many minds (including mine!) is to make fifos for communication with the processes in a background shell:
mkfifo fifo1 fifo2
needs_input < fifo1 &
needs_input_2 < fifo2 &
gives_output | tee fifo1 fifo2 >/dev/null
however, doing this means a separate process group is created for each individual program, which makes cleanup very messy if you don’t know what you’re doing and have to ^C the process for some reason (I dealt with this when making twitch-notify). to work around this issue, you can instead abuse pipes in a way they sorely weren’t intended to be used, just like so:
mkfifo fifo1 fifo2
gives_output | tee fifo1 fifo2 >/dev/null | \
needs_input < fifo1 | needs_input_2 < fifo2
this keeps all the processes in one nice little process group, and you don’t have to worry about lingering processes when you ^C while it runs. you can even handle cleaning up the fifos in the pipeline:
... | { rm fifo1; needs_input; } < fifo1 | ...
this looks dangerous, but it’s safe: the shell performs the redirection before running the commands in the braces, and opening a fifo for reading blocks until there’s a writer, so the rm only happens once tee has the fifo open for writing
very cool!
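putting it all together, here’s a runnable sketch where tr and wc stand in for needs_input and needs_input_2 (both picked arbitrarily), each writing to a file so nothing fights over stdout (mktemp -d isn’t strictly posix, but it’s everywhere):

```shell
# work in a throwaway directory so the fifos don't clutter anything
dir=$(mktemp -d) || exit 1
mkfifo "$dir/fifo1" "$dir/fifo2"

# every stage reads its real input from a fifo; the pipes between stages
# carry no data and only exist to keep everything in one process group.
# each stage removes its own fifo once the redirection has opened it.
printf '%s\n' hello world \
    | tee "$dir/fifo1" "$dir/fifo2" >/dev/null \
    | { rm "$dir/fifo1"; tr a-z A-Z >"$dir/upper"; } <"$dir/fifo1" \
    | { rm "$dir/fifo2"; wc -l >"$dir/count"; } <"$dir/fifo2"

upper=$(cat "$dir/upper")
count=$(cat "$dir/count")
printf '%s\n' "$upper"   # prints HELLO then WORLD
rm -r "$dir"
```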