Nushell vs Bash
After last week’s post on how Nushell can be used to convert unstructured data to structured data, I wondered how I would approach the task using traditional tooling. I wrote ‘Bash’ in the title, but what I precisely mean is POSIX pipes and the command line utilities typically found in a Unix-like environment (Linux, macOS, etc.).
Again, the task is to convert weirdly structured text output (from sshping) to JSON for further processing or long-term storage. For example, I might set up a cronjob to run analytics on my SSH connectivity; the results would be handy to have in JSON for later statistics.
The output format is colon-delimited with lots of whitespace:
ssh-Login-Time: 1.83 s
Minimum-Latency: 1.81 ms
...
Download-Size: 8.00 MB
Download-Rate: 7.05 MB/s
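For context, the ping.raw file used throughout is simply that output captured to a file. The host below and the exact sshping invocation (I have left out any flags) are assumptions on my part:
> sshping user@example.com > ping.raw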
My search brought me to the jo utility. Very nice and simple. Installation is easy on all supported platforms; I used Homebrew. Note that there are also Node.js, Go and Rust implementations available. I tried the Rust version and it worked fine, but it offered no benefit over the original, so I stuck with the original.
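For reference, installation is a one-liner with the usual package managers (the apt package names are my guess; check yours):
> brew install jo jq
> sudo apt-get install jo jq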
First, jo expects inputs to have ‘=’ as the delimiter, but the input uses ‘:’. The -d option did not seem to make a difference, so I resorted to sed to change the delimiter:
> cat ping.raw \
| sed -e 's/\:/=/' \
| jo
{"ssh-Login-Time":" 1.83 s",
"Minimum-Latency":" 1.81 ms",
...
"Download-Size":" 8.00 MB",
"Download-Rate":" 7.05 MB/s"}
Trimming the whitespace took the longest for me. I knew that jq is a useful command line utility for working with JSON data, so that’s where I looked. I also took a look at awk, but the solutions all seemed to require saving them as a script, and I was looking for a one-liner.
jq lets you iterate over the values in a dictionary using map_values, and text replacement can be done using sub. I combined the two to strip leading whitespace: jq 'map_values(sub("^[[:space:]]+"; ""))'
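In isolation, the two building blocks look like this (toy input, not from the post):
> echo '"   1.83 s"' | jq 'sub("^[[:space:]]+"; "")'
"1.83 s"
> echo '{"Minimum-Latency": " 1.81 ms"}' | jq 'map_values(sub("^[[:space:]]+"; ""))'
{
  "Minimum-Latency": "1.81 ms"
}
Combined with the earlier steps, the whole pipeline becomes: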
> cat ping.raw \
| sed -e 's/\:/=/' \
| jo \
| jq 'map_values(sub("^[[:space:]]+"; ""))'
{
"ssh-Login-Time": "1.83 s",
"Minimum-Latency": "1.81 ms",
"Median-Latency": "3.18 ms",
"Average-Latency": "3.32 ms",
"Average-Deviation": "987 us",
"Maximum-Latency": "9.77 ms",
"Echo-Count": "1.00 kB",
"Upload-Size": "8.00 MB",
"Upload-Rate": "16.0 MB/s",
"Download-Size": "8.00 MB",
"Download-Rate": "7.05 MB/s"
}
I then realized (after spending all that time getting jq to trim whitespace) that I could have used sed for this, too.
> cat ping.raw \
| sed -e 's/:[[:space:]]*/=/' \
| jo
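This also circles back to the cronjob idea from the intro. Here is a sketch of how the pipeline could collect data over time; the host, the log path, and the jq timestamping step are assumptions of mine, not part of the original solution:
> sshping user@example.com \
| sed -e 's/:[[:space:]]*/=/' \
| jo \
| jq -c --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" '. + {timestamp: $ts}' \
>> ~/sshping-log.jsonl
Wrapped in a small script and run from cron, this appends one timestamped JSON object per run.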
Interpretation
I would say the “Bash” version is pretty okay. It took me longer to write than it should have, because I spent a lot of time figuring out how to get jq to clean up the whitespace. I will admit that this is not really Bash; it sticks to traditional POSIX tools. I also used two tools that you would have to install: jq and jo.
For reference, here is the Nushell version:
> open ping.raw | lines | split-column ':' metric value | str value --trim | save ping.json
And here is the Bash version:
> cat ping.raw \
| sed -e 's/:[[:space:]]*/=/' \
| jo
Comparing it to the Nushell version, I like that I can make the Bash version more readable by splitting it across multiple lines.
I find the Nushell version much easier to understand at a glance. For example, since jo is new to me, I doubt I will remember what it does in a week’s time. And the regular expression passed to sed is very readable – for a regular expression.
Challenge – Pure POSIX
Can I do this while sticking only to traditional command line tools?
After an hour, I came up with something which works. But this is how I felt about AWK:
"For this reason, awk programs are often refreshingly easy to both write and read."
— Christoph (@chsiedentop) April 26, 2020
Here is my AWK script in all its glory:
> cat ping.raw | awk '
BEGIN {
    FS = ":[[:space:]]+";       # Field separator is ":" plus whitespace
    print "{";                  # Start a JSON dict with "{"
}
{
    # Print "<key>": "<value>", prefixed with sep -- except on the
    # first line, where sep is still empty.
    printf "%s\"%s\": \"%s\"\n",
        sep, $1, $2
    sep = ", "                  # Set sep after the first line.
}
END { print "}" }               # Close the JSON dict with "}"
'
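Run against the same ping.raw, it prints something like this (abridged). Note the leading commas, which keep the JSON valid without having to track a trailing separator:
{
"ssh-Login-Time": "1.83 s"
, "Minimum-Latency": "1.81 ms"
...
, "Download-Rate": "7.05 MB/s"
}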
I think that doing this task in AWK is an abomination. It is readable only if you are very familiar with AWK (and I’d say my comments help). It also took me ages to write. And it is the wrong approach – I should not be hand-wrangling strings to write to a common format. Finally, it is also quite verbose. The only positive aspect is that it will work on any machine without any dependencies.
Conclusion
My ranking of the various solutions is this:
- Nushell
- POSIX pipes + sed + jo + jq
- …
- …
- Wait for it …
- …
- AWK
Nu is easier and offers more than the sed + jo + jq solution. It is easy to save the result not as JSON but as YAML or TOML or any other format, and it is also very easy to do a bit of processing along the way. But the sed + jo + jq solution is also good enough for me. It does not require a new shell, and it will be familiar to more people.
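As an example of the kind of processing the traditional pipeline can also do (a sketch of mine, not part of the original solution), jq can split each value into a number and a unit:
> cat ping.raw \
| sed -e 's/:[[:space:]]*/=/' \
| jo \
| jq 'map_values(capture("(?<num>[0-9.]+) *(?<unit>.*)") | {num: (.num | tonumber), unit: .unit})'
This turns "1.81 ms" into {"num": 1.81, "unit": "ms"}, which is handier for later statistics.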
References:
- jo: https://jpmens.net/2016/03/05/a-shell-command-to-create-json-jo/
- jo repo: https://github.com/jpmens/jo
- Online jq test environment, which was an enormous help: https://jqplay.org/
- AWK help from Stack Overflow: https://stackoverflow.com/a/16974094/ (and the man pages, of course…)