Thursday, April 8, 2010

New Features In Bash Version 4.x - Part 2

Now that we have covered some of the minor improvements found in bash 4.x, we will begin looking at the more significant new features, focusing on changes in the way bash 4.x handles expansions.

Zero-Padded Brace Expansion

As you may recall, bash supports an interesting expansion called brace expansion.  With it, you can rapidly create sequences.  This is often useful for creating large numbers of file names or directories in a hurry.  Here is an example similar to one in my book:

bshotts@twin7:~$ mkdir -p foo/{2007..2010}-{1..12}
bshotts@twin7:~$ ls foo
2007-1   2007-4  2008-1   2008-4  2009-1   2009-4  2010-1   2010-4
2007-10  2007-5  2008-10  2008-5  2009-10  2009-5  2010-10  2010-5
2007-11  2007-6  2008-11  2008-6  2009-11  2009-6  2010-11  2010-6
2007-12  2007-7  2008-12  2008-7  2009-12  2009-7  2010-12  2010-7
2007-2   2007-8  2008-2   2008-8  2009-2   2009-8  2010-2   2010-8
2007-3   2007-9  2008-3   2008-9  2009-3   2009-9  2010-3   2010-9

This command creates a series of directories for the years 2007-2010 and the months 1-12.  You'll notice, however, that the list of directories does not sort very well.  This is because the month portion of the directory name lacks a leading zero for the months 1-9.  To create this directory series with correct names, we would have to do this:

bshotts@twin7:~$ rm -r foo
bshotts@twin7:~$ mkdir -p foo/{2007..2010}-0{1..9} foo/{2007..2010}-{10..12}
bshotts@twin7:~$ ls foo
2007-01  2007-07  2008-01  2008-07  2009-01  2009-07  2010-01  2010-07
2007-02  2007-08  2008-02  2008-08  2009-02  2009-08  2010-02  2010-08
2007-03  2007-09  2008-03  2008-09  2009-03  2009-09  2010-03  2010-09
2007-04  2007-10  2008-04  2008-10  2009-04  2009-10  2010-04  2010-10
2007-05  2007-11  2008-05  2008-11  2009-05  2009-11  2010-05  2010-11
2007-06  2007-12  2008-06  2008-12  2009-06  2009-12  2010-06  2010-12

That's what we want, but we had to basically double the size of our command to do it.

bash version 4.x now allows you to prefix zeros to the values being expanded to get zero-padding when the expansion is performed.  For example:

No leading zeros:

bshotts@twin7:~$ echo {1..20}
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

One leading zero:

bshotts@twin7:~$ echo {01..20}
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20

Two leading zeros:

bshotts@twin7:~$ echo {001..20}
001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020

...and so on.

With this new feature, our directory creating command can be reduced to this:

bshotts@twin7:~$ rm -r foo
bshotts@twin7:~$ mkdir -p foo/{2007..2010}-{01..12}
bshotts@twin7:~$ ls foo
2007-01  2007-07  2008-01  2008-07  2009-01  2009-07  2010-01  2010-07
2007-02  2007-08  2008-02  2008-08  2009-02  2009-08  2010-02  2010-08
2007-03  2007-09  2008-03  2008-09  2009-03  2009-09  2010-03  2010-09
2007-04  2007-10  2008-04  2008-10  2009-04  2009-10  2010-04  2010-10
2007-05  2007-11  2008-05  2008-11  2009-05  2009-11  2010-05  2010-11
2007-06  2007-12  2008-06  2008-12  2009-06  2009-12  2010-06  2010-12
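Zero-padded expansions work anywhere a brace expansion does, not just on the mkdir command line.  As a minimal sketch (the script name pad-demo and the names array are made up for illustration, and bash 4.0 or later is assumed), here is a loop that builds the month names for a single year:

```shell
#!/bin/bash

# pad-demo: zero-padded brace expansion in a loop (assumes bash 4.0 or later)

names=()
for month in {01..12}; do       # expands to 01 02 03 ... 12
    names+=("2010-$month")
done

echo "${names[0]} ${names[8]} ${names[11]}"
echo "${#names[@]} names generated"
```

On bash 4.x this should print "2010-01 2010-09 2010-12" followed by "12 names generated".  Keep in mind that brace expansion is performed before variable expansion, so the end points of the range have to be written as literal numbers rather than variables.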

Case Conversion

One of the big themes in bash 4.x is upper/lower-case conversion of strings.  bash adds four new parameter expansions and two new options to the declare command to support it.

So what is case conversion good for?  Aside from the obvious aesthetic value, it has an important role in programming.  Let's consider the case of a database look-up.  Imagine that a user has entered a string into a data input field that we want to look up in a database.  It's possible the user will enter the value in all upper-case letters or lower-case letters or a combination of both.  We certainly don't want to populate our database with every possible permutation of upper and lower case spellings.  What to do?

A common approach to this problem is to normalize the user's input.  That is, convert it into a standardized form before we attempt the database look-up.  We can do this by converting all of the characters in the user's input to either lower or upper-case and ensure that the database entries are normalized the same way.

The declare command in bash 4.x can be used to normalize strings to either upper or lower-case.  Using declare, we can force a variable to always contain the desired format no matter what is assigned to it:

#!/bin/bash

# ul-declare: demonstrate case conversion via declare

declare -u upper
declare -l lower

if [[ $1 ]]; then
        upper="$1"
        lower="$1"
        echo "$upper"
        echo "$lower"
fi

In the above script, we use declare to create two variables, upper and lower.  We assign the value of the first command line argument (positional parameter 1) to each of the variables and then display them on the screen:

bshotts@twin7:~$ ul-declare aBc
ABC
abc

As we can see, the command line argument ("aBc") has been normalized.
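To tie this back to the look-up problem described earlier, here is a rough sketch of normalizing input before comparing it against a set of known keys.  The script name ul-lookup, the function lookup_color, and its small table of colors are all hypothetical stand-ins for a real database; the only bash 4.x feature being shown is the -l attribute, which the local builtin accepts just as declare does:

```shell
#!/bin/bash

# ul-lookup: normalize input with the -l attribute before a look-up
# (lookup_color and its table of colors are hypothetical examples)

lookup_color () {
    local -l key        # -l lower-cases every value assigned to key
    key="$1"
    case "$key" in
        red)    echo "#ff0000" ;;
        green)  echo "#00ff00" ;;
        blue)   echo "#0000ff" ;;
        *)      echo "unknown"; return 1 ;;
    esac
}

lookup_color "Red"
lookup_color "BLUE"
```

Because the keys in the case statement are stored in lower-case and the input is forced to lower-case on assignment, "Red", "RED", and "red" all find the same entry.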

bash version 4.x also includes four new parameter expansions that perform upper/lower-case conversion:

Format            Result
${parameter,,}    Expand the value of parameter into all lower-case letters.
${parameter,}     Expand the value of parameter changing only the first character to lower-case.
${parameter^^}    Expand the value of parameter into all upper-case letters.
${parameter^}     Expand the value of parameter changing only the first character to upper-case (capitalization).

Here is a script that demonstrates these expansions:

#!/bin/bash

# ul-param - demonstrate case conversion via parameter expansion

if [[ $1 ]]; then
        echo "${1,,}"
        echo "${1,}"
        echo "${1^^}"
        echo "${1^}"
fi

Here is the script in action:

bshotts@twin7:~$ ul-param aBc
abc
aBc
ABC
ABc

Again, we process the first command line argument and output the four variations supported by the new parameter expansions.  While this script uses the first positional parameter, parameter may be any shell parameter, such as a variable or an array element.
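The expansions can also be combined.  The following sketch (the script name ul-capitalize, the capitalize function, and the distros array are invented for this example) lower-cases a word and then capitalizes it, and shows that an expansion applied to an array subscripted with @ converts each element in turn:

```shell
#!/bin/bash

# ul-capitalize: combine the case-conversion expansions
# (capitalize and distros are hypothetical names used for illustration)

capitalize () {
    local word="${1,,}"     # first force everything to lower-case
    echo "${word^}"         # then raise only the first character
}

capitalize "lINUX"

# With an array subscripted by @, each element is converted in turn
distros=(debian fedora ubuntu)
echo "${distros[@]^^}"
```

Run under bash 4.x, this should display "Linux" and then "DEBIAN FEDORA UBUNTU".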

Further Reading

The Linux Command Line
  • Chapter 8 (covers expansions)
The Bash Reference Manual
The Bash Hackers Wiki
Other installments in this series: 1 2 3 4

3 comments:

  1. Finally, brace expansion with leading zeros, wooohooo :)
    Very usable feature

  2. The increment is also handy, saves you hacking a C-ish for loop sometimes.

  3. Yep, praises on the braces expansion. I intuitively tried it years ago ; it's the obvious thing to do.

    But, I think 001..20 is a bit odd. I guess it makes sense, but I probably would have implemented on 001..020 to remove the ambiguity and "rm" after a typo. :-)
