LinuxCommand.org: Tips, News And Rants: 2010

Thursday, June 10, 2010

New Discount Offer For The Linux Command Line

I recently got a notice from Lulu stating that, for a limited time, you may receive a 10% discount when you order a printed copy of The Linux Command Line. Details from the notice are as follows:

Disclaimer: Use coupon code SUMMERREAD305 at checkout and receive 10% off The Linux Command Line. Maximum savings with this promotion is $10. You can only use the code once per account, and you can't use this coupon in combination with other coupon codes. This great offer ends on June 30, 2010 at 11:59 PM so try not to procrastinate! While very unlikely we do reserve the right to change or revoke this offer at anytime, and of course we cannot offer this coupon where it is against the law to do so.

Thursday, June 3, 2010

My Top 5 Bash Resources

Over the course of writing The Linux Command Line and this blog, I've had frequent need of good reference resources for command line programs including the shell itself, bash. Here is my list of the ones that stand out:

1. The Bash Man Page

Yeah, I know. I spent nearly half a page in my book trashing the bash man page for its impenetrable style and its lack of any trace of user-friendliness, but nothing beats typing "man bash" when you're already working in a terminal. The trick is finding what you want in its enormous length. This can sometimes be a significant problem, but once you find what you are looking for, the information is always concise and authoritative though not always easy to understand. Still, this is the resource I use most often.

2. The Bash Reference Manual

Perhaps in response to the usability issues found in the bash man page, the GNU Project produced the Bash Reference Manual. You can think of it as the bash man page translated into human readable form. While it lacks a tutorial focus and contains no usage examples, it is much easier to read and is more usefully organized than the bash man page.

3. Greg's Wiki

The bash man page and the Bash Reference Manual both extensively document the features found in bash. However, when we need a description of bash behavior, different resources are needed. The best by far is Greg's Wiki. This site covers a variety of topics, but of particular interest to us are the Bash FAQ which contains over one hundred frequently asked questions about bash, the Bash Pitfalls which describes many of the common problems script writers encounter with bash, and the Bash Guide, a useful set of tutorials for bash users. There are also several fun to read rants.

4. Bash Hackers Wiki

Like Greg's Wiki, the Bash Hackers Wiki provides many different articles relating to bash, its features, and its behavior. Included are some useful tutorials on various programming techniques and issues with scripting with bash. While the writing is, at times, a little chaotic, it does contain useful information. Heck, they even trash my Writing Shell Scripts tutorial (Hmmm...I really ought to fix some of that stuff).

5. Chet Ramey's Bash Page

Chet Ramey is the current maintainer of bash and he has his own page. On this page, you can find version information, latest news, and other things. The most useful document on the Bash Page is its version of the Bash FAQ. The NEWS file contains a concise list of features that have been added to each version of bash.

There you have it. Enough reading to keep even the most curious shell user busy for weeks. Enjoy!

Tuesday, June 1, 2010

Using Configuration Files With Shell Scripts

If you have worked with the command line for a while, you have no doubt noticed that many programs use text configuration files of one sort or another. In this lesson, we will look at how we can control shell scripts with external configuration files.

Why Use Configuration Files?

Since shell scripts are just ordinary text files, why should we bother with additional text configuration files? There are a couple of reasons that you might want to consider them:

Having configuration files removes the need to make changes to a script. There may be cases where you want to ensure that a script remains in its original form.
In particular, you may want to have a script that is shared by multiple users and each user has a specific desired configuration. Using individual configuration files prevents the need to have multiple copies of the script, thus making administration easier.

Sourcing Files

Implementing configuration files in most programming languages is a fairly complicated undertaking, as you must write code to parse the configuration file's content. In the shell, however, parsing is automatic because you can use regular shell syntax.

The shell builtin command that makes this trick work is named source. The source command reads a file and processes its content as if it were coming from the keyboard. Let's create a very simple shell script to demonstrate sourcing in action. We'll use the cat command to create the script:

me@linuxbox:~$ cat > bin/cd_script
#!/bin/bash
cd /usr/local
echo $PWD

Press Ctrl-d to signal end-of-file to the cat command. Next, we will set the file attributes to make the script executable:

me@linuxbox:~$ chmod +x bin/cd_script

Finally, we will run the script:

me@linuxbox:~$ cd_script
/usr/local
me@linuxbox:~$

The script executes and, in doing so, changes the directory to /usr/local and then outputs the name of the current working directory, which is /usr/local. Notice, however, that when the shell prompt returns, we are still in our home directory. Why is this? While it may appear at first that the script did not change directories, it did, as evidenced by the output of the PWD shell variable. So why isn't the directory still changed when the script terminates?

The answer lies in the fact that when you execute a shell script, a new copy of the shell is launched and with it comes a new copy of the environment. When the script finishes, the copy of the shell is destroyed and so is its environment. As a general rule, a child process, such as the shell running a script, is not permitted to modify the environment of the parent process.

So if we actually wanted to change the working directory in the current shell, we would need to use the source command to read the contents of our script. Note that the name of the source command may be abbreviated as a single dot followed by a space.

me@linuxbox:~$ . cd_script
/usr/local
me@linuxbox:/usr/local$

By sourcing the file, the working directory is changed in the current shell, as we can see by the trailing portion of the shell prompt. Be aware that, by default, if the file name does not contain a slash, the shell will search the directories listed in the PATH variable for the file to be read. Files that are read by source do not have to be executable, nor do they need to start with the shebang (i.e. #!) mechanism.
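The same effect can be seen with a variable instead of the working directory. Here is a minimal sketch; the file name settings.sh and the variable GREETING are made up for this illustration:

```shell
#!/bin/bash
# Demo: an assignment made by a child shell vanishes; one made by a sourced file stays.
# settings.sh and GREETING are hypothetical names used only for this example.

unset GREETING
tmpdir=$(mktemp -d)
echo 'GREETING="hello from the file"' > "$tmpdir/settings.sh"

# Run the file in a child shell: the assignment dies with the child.
bash "$tmpdir/settings.sh"
after_child=${GREETING:-unset}

# Source the file: the assignment happens in the current shell.
. "$tmpdir/settings.sh"
after_source=${GREETING:-unset}

echo "after running:  $after_child"
echo "after sourcing: $after_source"
rm -r "$tmpdir"
```

Since the path given to the dot command contains a slash here, no PATH search takes place.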

Implementing Configuration Files In Scripts

Now that we see how sourcing works, let's try our hand at writing a script that uses the source command to read a configuration file.

In part 4 of the Getting Ready For Ubuntu 10.04 series, we wrote a script to perform a backup of our system to an external USB disk drive. The script looked like this:

#!/bin/bash

# usb_backup # backup system to external disk drive

SOURCE="/etc /usr/local /home"
DESTINATION=/media/BigDisk/backup

if [[ -d $DESTINATION ]]; then
   sudo rsync -av \
       --delete \
       --exclude '/home/*/.gvfs' \
       $SOURCE $DESTINATION
fi

You will notice that the source and destination directories are hard-coded into the SOURCE and DESTINATION constants at the beginning of the script. We will remove these and modify the script to read a configuration file instead:

#!/bin/bash

# usb_backup2 # backup system to external disk drive

CONFIG_FILE=~/.usb_backup.conf

if [[ -f $CONFIG_FILE ]]; then
        . $CONFIG_FILE
fi

if [[ -d $DESTINATION ]]; then
        sudo rsync -av \
                --delete \
                --exclude '/home/*/.gvfs' \
                $SOURCE $DESTINATION
fi

Now we can create a configuration file named ~/.usb_backup.conf (the name the script assigns to CONFIG_FILE) that contains these two lines:

SOURCE="/etc /usr/local /home"
DESTINATION=/media/BigDisk/backup

When we run the script, the contents of the configuration file are read and the SOURCE and DESTINATION constants are added to the script's environment just as though the lines were in the text of the script itself. The

if [[ -f $CONFIG_FILE ]]; then
        . $CONFIG_FILE
fi

construct is a common way to set up the reading of a file. In fact, if you look at your ~/.profile or ~/.bash_profile startup files, you will probably see something like this:

if [ -f "$HOME/.bashrc" ]; then
   . "$HOME/.bashrc"
fi

which is how your environment is established when you log in at the console.

While our script in its current form requires the configuration file to define the SOURCE and DESTINATION constants, it's easy to make the use of the file optional by setting default values for the constants if the configuration file is either missing or does not contain the required definitions. We will modify our script to set default values and also support a command line option (-c) to specify an alternate configuration file name:

#!/bin/bash

# usb_backup3 # backup system to external disk drive

# Look for alternate configuration file
if [[ $1 == -c ]]; then
   CONFIG_FILE=$2
else
   CONFIG_FILE=~/.usb_backup.conf
fi

# Source configuration file
if [[ -f $CONFIG_FILE ]]; then
   . $CONFIG_FILE
fi

# Fill in any missing values with defaults
SOURCE=${SOURCE:-"/etc /usr/local /home"}
DESTINATION=${DESTINATION:-/media/BigDisk/backup}

if [[ -d $DESTINATION ]]; then
    sudo rsync -av \
       --delete \
       --exclude '/home/*/.gvfs' \
       $SOURCE $DESTINATION
fi
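To see how the default values interact with a partial configuration file, here is a small sketch. The temporary file stands in for a real configuration file, and the /mnt/other/backup path is invented for the example:

```shell
#!/bin/bash
# Demo of ${VAR:-default}: keep a value that is set and non-empty, else use the default.
# The config file content is a made-up example that defines only DESTINATION.

unset SOURCE DESTINATION
config=$(mktemp)
echo 'DESTINATION=/mnt/other/backup' > "$config"

# Source the configuration file if it exists
if [[ -f $config ]]; then
    . "$config"
fi

# Fill in any missing values with defaults
SOURCE=${SOURCE:-"/etc /usr/local /home"}
DESTINATION=${DESTINATION:-/media/BigDisk/backup}

echo "SOURCE=$SOURCE"
echo "DESTINATION=$DESTINATION"
rm "$config"
```

SOURCE falls back to its default because the file never set it, while DESTINATION keeps the value from the file.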

Code Libraries

Since the files read by the source command can contain any valid shell commands, source is often used to load collections of shell functions into scripts. This allows central libraries of common routines to be shared by multiple scripts. This can make code maintenance considerably easier.
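As a rough illustration of the idea, the sketch below writes a tiny library to a temporary file and then sources it. The function names log_msg and warn are hypothetical, and a real script would source a library stored at a fixed, trusted path:

```shell
#!/bin/bash
# Demo: load shared functions from a library file with the dot command.
# The function names (log_msg, warn) are hypothetical examples.

lib=$(mktemp)
cat > "$lib" <<'EOF'
# Print a message prefixed with the name of the calling script.
log_msg() {
    echo "${0##*/}: $*"
}
# Print a warning on standard error and return a failing status.
warn() {
    log_msg "warning: $*" >&2
    return 1
}
EOF

. "$lib"     # the functions are now defined in this shell
rm "$lib"    # the definitions survive removal of the file

msg=$(log_msg "library loaded")
echo "$msg"
```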

Security Considerations

On the other hand, since sourced files can contain any valid shell command, care must be taken to make sure that nothing malicious is placed in a file that is to be sourced. This holds especially true for any script that is to be run by the superuser. When writing such scripts, make sure that the superuser owns the file to be sourced and that the file is not world-writable. Some code like this could do the trick:

if [[ -O $CONFIG_FILE ]]; then
    if [[ $(stat --format %a $CONFIG_FILE) == 600 ]]; then
        . $CONFIG_FILE
    fi
fi
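The test above accepts only a mode of exactly 600. A looser variant, sketched here as one possibility, checks only that nobody other than the owner can write to the file. The helper name safe_to_source is made up, and stat --format is the GNU coreutils form of the command:

```shell
#!/bin/bash
# Sketch: refuse to source a file unless we own it and no one else can write to it.
# safe_to_source is a hypothetical helper; stat --format assumes GNU coreutils.

safe_to_source() {
    local file=$1 mode
    [[ -f $file && -O $file ]] || return 1
    mode=$(stat --format %a "$file") || return 1
    # Octal 022 covers the group-write and other-write permission bits.
    (( (8#$mode & 8#022) == 0 ))
}

conf=$(mktemp)    # mktemp creates the file with mode 600
if safe_to_source "$conf"; then first=ok; else first=rejected; fi

chmod 666 "$conf"
if safe_to_source "$conf"; then second=ok; else second=rejected; fi

echo "mode 600: $first, mode 666: $second"
rm "$conf"
```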

Further Reading

The bash man page:

BUILTIN COMMANDS (source command)
CONDITIONAL EXPRESSIONS (testing file attributes)

The Linux Command Line:

Chapter 35 - Strings And Numbers (parameter expansions to set default values)

Thursday, May 27, 2010

Will The iPad Kill The Netbook?

Ever since Apple announced the iPad, there have been countless stories in the press about the iPad's effect on the netbook market. I'm a big fan of netbooks and I agree that the netbook market is in trouble but it's not because of the iPad.

It's because of Windows.

Now, I don't mean this as a piece of simple-minded anti-MS snark (though I am fully capable ;-). I'm serious. Windows is the problem with netbooks. Installing Windows on a netbook changes the device from a small effective portable Internet interaction device into a tiny, underpowered, laptop computer.

In the beginning of the netbook revolution, hardware makers chose Linux. The first generation of netbooks featured small screens (7-9 inch) and solid-state disks. To make use of this platform, they pretty much had to use Linux because of its small footprint and easy customization allowing manufacturers the freedom to create user interfaces appropriate to the device. The fact that the OS license was free didn't hurt either given the price points that netbooks originally held.

So what went wrong?

First, Microsoft was able to respond to the threat to its consumer OS monopoly by releasing a version of Windows XP with ultra-cheap licensing provided that the computer was suitably underpowered. Asus, for example, sold both Linux and Windows versions of its netbooks for a time. Both models cost the same but the Linux model had a larger drive. Why? Because the Windows license placed a cap on the size of the drive that would qualify the computer for the low-cost license.

Second, the Linux distributions supplied by the netbook makers were not very good. I can personally attest to that. My editor's eeePC 901 (pictured above with my own HP Mini 1116NR) came with the Asus version of Xandros and frankly, it sucked. After struggling with it for several months, I replaced the Xandros with Ubuntu 9.04 Netbook Remix and now the machine is a delight.

Finally, in response to the inappropriate user interface Windows provides for small screen devices, netbook makers made netbooks larger with 10-12 inch screens and they gave up on solid-state drives. Almost all netbooks today come with slow 160 GB hard disks. So now you have a slow 12 inch laptop that costs about the same as a "real" laptop and isn't really that portable anymore. No wonder nearly one-third of netbook shoppers are buying iPads instead.

Interface, Interface, Interface.

But the iPad should not be directly competitive with netbooks at the conceptual level. In many ways the iPad is a remarkable device for content consumption. Unlike a Windows computer, it requires virtually no system administration. This makes the device a perfect "television of the future" where one just uses it to passively consume content. However, its lack of a real keyboard and limited connectivity options make it a poor choice as a portable Internet interaction device, a role in which the netbook hardware platform excels.

Clearly, Apple devised a near perfect user interface for a tablet, something Microsoft was never able to do. It is possible that the next generation of netbooks will do better. There have been a number of announcements of upcoming models that will be based on ARM chips using operating systems, such as Android, better suited to mobile devices. As much as I like Ubuntu's Netbook Remix, it's still a crude hack to shoehorn a desktop OS onto a small screen computer.

Thanks for listening! See you again soon.

Monday, May 24, 2010

Site News: 20,000 Downloads And New Series Navigation

A few updates on the state of the site:

The book reached the 20,000 download mark over the weekend. Thanks everybody! This represents the number of downloads performed from the Sourceforge site, but there are probably more since the book is mirrored at a variety of other sites throughout the world.
If you have been thinking about purchasing a printed copy of The Linux Command Line, now may be the time. Lulu (my on-demand publisher) has a limited time, free shipping offer! See the lulu.com home page for details. You can order your very own copy directly from here.
Last week I updated many of my previous blog posts to add handy navigation links for the multi-part series posts. It is now much easier to move from installment to installment in my popular series. Included are Building An All-Text Linux Workstation, New Features In Bash Version 4.x, and Getting Ready For Ubuntu 10.04.

Enjoy!

Tuesday, May 18, 2010

Project: Getting Ready For Ubuntu 10.04 - Part 5

For our final installment, we're going to install and perform some basic configuration on our new Ubuntu 10.04 system.

Downloading The Install Image And Burning A Disk

We covered the process of getting the CD image and creating the install media in installment 3. The process is similar. You can download the CD image here. Remember to verify the MD5SUM of the disk you burn. We don't want to have a failed installation because of a bad disk. Also, be sure to read the 10.04 release notes to avoid any last minute surprises.

Last Minute Details

There may be a few files that we will want to transfer to the new system immediately, such as the package_list.old.txt file we created in installment 4 and each user's .bashrc file. Copy these files to a flash drive (or use Ubuntu One, if you're feeling adventuresome).

Install!

We're finally ready for the big moment. Insert the install disk and reboot. The install process is similar to previous Ubuntu releases.

Apply Updates

After the installation is finished and we have rebooted into our new system, the first thing we should do is apply all the available updates. When I installed last week, there were already 65 updates. Assuming that we have a working Internet connection, we can apply the updates with the following command:

me@linuxbox ~$ sudo apt-get update; sudo apt-get upgrade

Since the updates include a kernel update, reboot the system after the updates are applied.

Install Additional Packages

The next step is to install any additional software we want on the system. To help with this task, we created a list in installment 4 that contained the names of all of the packages on the old system. We can compare this list with the new system using the following script:

#!/bin/bash

# compare_packages - compare lists of packages

OLD_PACKAGES=~/package_list.old.txt
NEW_PACKAGES=~/package_list.new.txt

if [[ -r $OLD_PACKAGES ]]; then
   dpkg --list | awk '$1 == "ii" {print $2}' > $NEW_PACKAGES
   diff -y $OLD_PACKAGES $NEW_PACKAGES | awk '$2 == "<" {print $1}'
else
   echo "compare_packages: $OLD_PACKAGES not found." >&2
   exit 1
fi

This script produces a list of packages that were present on the old system but not yet on the new system. You will probably want to capture the output of this script and store it in a file:

me@linuxbox ~ $ compare_packages > missing_packages.txt
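As an aside, the same comparison can be made with the comm program instead of diff and awk. This is only an alternative sketch, not the method used in the script above, and the package names are sample data:

```shell
#!/bin/bash
# Alternative sketch: list the lines that appear only in the old package list.
# The package names below are sample data, not real dpkg output.

old=$(mktemp)
new=$(mktemp)
printf '%s\n' bash gimp gthumb thunderbird > "$old"
printf '%s\n' bash firefox thunderbird > "$new"

# comm needs sorted input; -23 hides lines unique to the new list
# and lines common to both, leaving only lines unique to the old list.
missing=$(comm -23 <(sort "$old") <(sort "$new"))
echo "$missing"
rm "$old" "$new"
```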

You should review the output and apply some editorial judgement as it is likely the list will contain many packages that are no longer used on the new system in addition to the packages that you do want to install. As you review the list, you can use the following command to get a description of a package:

apt-cache show package_name

Once you determine the final list of packages to be installed, you can install each package using the command:

sudo apt-get install package_name

or, if you are feeling especially brave, you can create a text file containing the list of desired packages to install and do them all at once:

me@linuxbox ~ $ sudo xargs apt-get install < package_list.txt
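Before running the real thing, it may be worth previewing the command that xargs will build. A simple way to do this, sketched here with a made-up package list, is to place echo in front of the command so that it is printed rather than executed:

```shell
#!/bin/bash
# Preview sketch: with echo in front, xargs prints the command instead of running it.
# The list contents are sample package names.

list=$(mktemp)
printf '%s\n' gimp gthumb thunderbird > "$list"

preview=$(xargs echo sudo apt-get install < "$list")
echo "$preview"
rm "$list"
```

One thing to watch for: when GNU xargs reads its list from standard input, the command it launches gets its own standard input from /dev/null, so apt-get may not be able to ask for confirmation. GNU xargs has a -a option that reads the list from a named file and leaves standard input alone.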

Create User Accounts

If your old system had multiple user accounts, you will want to recreate them before restoring the user home directories. You can create accounts with this command:

sudo adduser user

This command will create the user and group accounts for the specified user and create the user's home directory.

Restore The Backup

If you created your backup using the usb_backup script from installment 4 you can use this script to restore the /usr/local and /home directories:

#!/bin/bash

# usb_restore - restore directories from backup drive with rsync

BACKUP_DIR=/media/BigDisk/backup
ADDL_DIRS=".ssh"

sudo rsync -a $BACKUP_DIR/usr/local /usr

for h in /home/* ; do
   user=${h##*/}
   for d in $BACKUP_DIR$h/*; do
       if [[ -d $d ]]; then
           if [[ $d != $BACKUP_DIR$h/Examples ]]; then
               echo "Restoring $d to $h"
               sudo rsync -a "$d" $h
           fi
       fi
   done
   for d in $ADDL_DIRS; do
       d=$BACKUP_DIR$h/$d
       if [[ -d $d ]]; then
           echo "Restoring $d to $h"
           sudo rsync -a "$d" $h
       fi
   done
   # Uncomment the following line if you need to correct file ownerships
   #sudo chown -R $user:$user $h
done

You should adjust the value of the ADDL_DIRS constant to include any hidden directories you want to restore. To avoid restoring old configuration files and directories, this script does not otherwise restore any directory whose name begins with a period.
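The reason hidden directories are skipped is that, by default, the * wildcard in bash does not match names beginning with a period. A quick sketch with invented directory names shows the effect, along with the ${d##*/} expansion the script uses to strip the leading path:

```shell
#!/bin/bash
# Demo: by default a bare * does not match names that start with a period.
# The directory names are sample data.

home=$(mktemp -d)
mkdir "$home/Documents" "$home/Pictures" "$home/.ssh" "$home/.config"

visible=""
for d in "$home"/*; do
    visible+="${d##*/} "    # ${d##*/} removes everything up to the last slash
done

echo "matched by *: $visible"
rm -r "$home"
```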

Another issue you will probably encounter is the ownership of user files. Unless the user ids of the users on the old system match the user ids of the users on the new system, rsync will restore the files with the user ids of the old system. To overcome this, uncomment the chown line near the end of the script.

If you made your backup using the usb_backup_ntfs script, use this script to restore the /usr/local and /home directories:

#!/bin/bash

# usb_restore_ntfs - restore directories from backup drive with tar

BACKUP_DIR=/media/BigDisk_NTFS/backup

cd /
sudo tar -xvf $BACKUP_DIR/usrlocal.tar

for h in /home/* ; do
   user=${h##*/}
   sudo tar   -xv \
           --seek \
           --wildcards \
           --exclude="home/$user/Examples" \
           -f $BACKUP_DIR/home.tar \
           "home/$user/[[:alnum:]]*" \
           "home/$user/.ssh"
done

To append additional directories to the list to be restored, add more lines to the tar command using the "home/$user/.ssh" line as a template. Since tar restores user files using user names rather than user ids as rsync does, the ownership of the restored files is not a problem.

Enjoy!

Once the home directories are restored, each user should reconfigure their desktop and applications to their personal taste. Other than that, the system should be pretty much ready-to-go. Both of the backup methods provide the /etc directory from the old system for reference in case it's needed.

Further Reading

Man pages for the following commands:

apt-cache
apt-get
adduser
xargs

Other installments in this series: 1 2 3 4 4a 5

Saturday, May 15, 2010

Project: Getting Ready For Ubuntu 10.04 - Part 4a

After some experiments and benchmarking, I have modified the usb_backup_ntfs script presented in the last installment to remove compression. This cuts the time needed to perform the backup using this script by roughly half. The previous script works, but this one is better:

#!/bin/bash

# usb_backup_ntfs # backup system to external disk drive

SOURCE="/etc /usr/local /home"
NTFS_DESTINATION=/media/BigDisk_NTFS/backup

if [[ -d $NTFS_DESTINATION ]]; then
   for i in $SOURCE ; do
       fn=${i//\/}
       sudo tar -cv \
           --exclude '/home/*/.gvfs' \
           -f $NTFS_DESTINATION/$fn.tar $i
   done
fi

Further Reading

Other installments in this series: 1 2 3 4 4a 5

Tuesday, May 11, 2010

Project: Getting Ready For Ubuntu 10.04 - Part 4

Despite my trepidations, I'm going to proceed with the upgrade to Ubuntu 10.04. I've already upgraded my laptop and with Sunday's release of an improved totem movie player, the one "show stopper" bug has been addressed. I can live with/work around the rest. The laptop does not contain much permanent data (I use it to write and collect images from my cameras when I travel) so wiping the hard drive and installing a new OS is not such a big deal. My desktop system is another matter. I store a lot of stuff on it and have a lot of software installed, too. I've completed my testing using one of my test computers verifying that all of the important apps on the system can be set up and used in a satisfactory manner, so in this installment we will look at preparing the desktop system for installation of the new version of Ubuntu.

Creating A Package List

In order to get a grip on the extra software I have installed on my desktop, I started out just writing a list of everything I saw in the desktop menus that did not appear on my 10.04 test systems. This is all the obvious stuff like Thunderbird, Gimp, Gthumb, etc., but what about the stuff that's not on the menu? I know I have installed many command line programs too. To get a complete list of the software installed on the system, we'll have to employ some command line magic:

me@twin7$ dpkg --list | awk '$1 == "ii" {print $2}' > ~/package_list.old.txt

This creates a list of all of the installed packages on the system and stores it in a file. We'll use this file to compare the package set with that of the new OS installation.
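To see what the awk filter does, we can feed it a few lines shaped like dpkg --list output. The lines below, including the version strings, are made up for the demonstration:

```shell
#!/bin/bash
# Demo of the awk filter on made-up lines in the style of dpkg --list.
# "ii" marks an installed package; "rc" marks one removed with its config files left behind.

sample='ii  bash         4.1-2ubuntu3  The GNU Bourne Again SHell
rc  oldpackage   1.0-1         A package that was removed
ii  gimp         2.6.8-2       The GNU Image Manipulation Program'

# Print the second field (the package name) only where the first field is "ii".
installed=$(echo "$sample" | awk '$1 == "ii" {print $2}')
echo "$installed"
```

The header lines that dpkg prints at the top of its listing are dropped by the same test, since their first field is never "ii".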

Making A Backup

The most important task we need to accomplish before we install the new OS is backing up the important data on the system for later restoration after the upgrade. For me, the files I need to preserve are located in /etc (the system's configuration files, which I don't restore but keep for reference), /usr/local (locally installed software and administration scripts), and /home (the files belonging to the users). If you are running a web server on your system, you will probably also need to back up portions of the /var directory.

There are many ways to perform backups. My systems normally back up every night to a local file server on my network, but for this exercise we'll use an external USB hard drive. We'll look at two popular methods: rsync and tar.

The choice of method depends on your needs and on how your external hard drive is formatted. The key feature afforded by both methods is that they preserve the attributes (permissions, ownerships, modification times, etc.) of the files being backed up. Another feature they both offer is the ability to exclude files from the backup because there are a few things that we don't want.

The rsync program copies files from one place to another. The source or destination may be a network drive, but for our purposes we will use a local (though external) volume. The great advantage of rsync is that once an initial copy is performed, subsequent updates can be made very rapidly as rsync only copies the changes made since the previous copy. The disadvantage of rsync is that the destination volume has to have a Unix-like file system since it relies on it to store the file attributes.

Here we have a script that will perform the backup using rsync. It assumes that we have an ext3 formatted file system on a volume named BigDisk and that the volume has a backup directory:

#!/bin/bash

# usb_backup - Backup system to external disk drive using rsync

SOURCE="/etc /usr/local /home"
EXT3_DESTINATION=/media/BigDisk/backup

if [[ -d $EXT3_DESTINATION ]]; then
   sudo rsync -av \
       --delete \
       --exclude '/home/*/.gvfs' \
       $SOURCE $EXT3_DESTINATION
fi

The script first checks that the destination directory exists and then performs rsync. The --delete option removes files on the destination that do not exist on the source. This way a perfect mirror of the source is maintained. We also exclude any .gvfs directories we encounter. They cause problems. This script can be used as a routine backup procedure. Once the initial backup is performed, later backups will be very fast since rsync identifies and copies only files that have changed between backups.

Our second approach uses the tar program. tar (short for tape archive) is a traditional Unix tool used for backups. While its original use was for writing files on magnetic tape, it can also write ordinary files. tar works by recording all of the source files into a single archive file called a tar file. Within the tar file all of the source file attributes are recorded along with the file contents. Since tar does not rely on the native file system of the backup device to store the source file attributes, it can use any Linux-supported file system to store the archive. This makes tar the logical choice if you are using an off-the-shelf USB hard drive formatted as NTFS. However, tar has a significant disadvantage compared to rsync. It is extremely cumbersome to restore single files from an archive if the archive is large.

Since tar writes its archives as though it were writing to magnetic tape, the archives are a sequential access medium. This means to find something in the archive, tar must read through the entire archive starting from the beginning to retrieve the information. This is opposed to a direct access medium such as a hard disk where the system can rapidly locate and retrieve a file directly. It's like the difference between a DVD and a VHS tape. With a DVD you can immediately jump to a scene whereas with a VHS tape you have to scan down the entire length of the tape until you get to the desired spot.
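For reference, here is roughly what restoring a single file from a tar archive looks like. This is a small sketch using throwaway files in a temporary directory; all of the names are sample data:

```shell
#!/bin/bash
# Demo: build a small archive, list its contents, and extract one member by name.
# All file names are sample data.

work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/restore"
echo "notes" > "$work/src/docs/notes.txt"
echo "todo"  > "$work/src/docs/todo.txt"

# Create the archive, then list what it contains.
tar -cf "$work/docs.tar" -C "$work/src" docs
tar -tf "$work/docs.tar"

# Naming a member restores only that file; tar still has to search the archive for it.
tar -xf "$work/docs.tar" -C "$work/restore" docs/todo.txt

restored=$(cd "$work/restore" && find . -type f)
echo "restored: $restored"
rm -r "$work"
```

With an archive this small the search is instant. With a multi-gigabyte archive of home directories, the wait becomes noticeable.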

Another disadvantage compared to rsync is that each time you perform a backup, you have to copy every file again. This is not a problem for a one time backup like the one we are performing here but would be very time consuming if used as a routine procedure.

By the way, don't attempt a tar based backup on a VFAT (MS-DOS) formatted drive. VFAT has a maximum file size limit of 4GB and unless you have a very small set of home directories, you'll exceed the limit.

Here is our tar backup script:

#!/bin/bash

# usb_backup_ntfs - Backup system to external disk drive using tar

SOURCE="/etc /usr/local /home"
NTFS_DESTINATION=/media/BigDisk_NTFS/backup

if [[ -d $NTFS_DESTINATION ]]; then
   for i in $SOURCE ; do
       fn=${i//\/}
       sudo tar -czv \
           --exclude '/home/*/.gvfs' \
           -f $NTFS_DESTINATION/$fn.tgz $i
   done
fi

This script assumes a destination volume named BigDisk_NTFS containing a directory named backup. While we have implied that the volume is formatted as NTFS, this script will work on any Linux compatible file system that allows large files. The script creates one tar file for each of the source directories. It constructs the destination file names by removing the slashes from the source directory names and appending the extension ".tgz" to the end. Our invocation of tar includes the z option which applies gzip compression to the files contained within the archive. This slows things down a little, but saves some space on the backup device.
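The file name construction can be tried on its own. In this short sketch, the ${i//\/} expansion replaces every slash in the value of i with nothing:

```shell
#!/bin/bash
# Demo: ${i//\/} deletes every slash, turning a path into a flat file name.

names=""
for i in /etc /usr/local /home; do
    fn=${i//\/}          # /usr/local becomes usrlocal
    names+="$fn.tgz "
done

echo "$names"
```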

Other Details To Check

Since one of the goals of our new installation is to utilize new versions of our favorite apps starting with their native default configurations, we won't be restoring many of the configuration files from our existing system. This means that we need to manually record a variety of configuration settings. This information is good to have written down anyway. Record (or export to a file) the following:

Email Configuration
Bookmarks
Address Books
Passwords
Names Of Firefox Extensions
Others As Needed

Ready, Set, Go!

That about does it. Once our backups are made and our settings are recorded, the next thing to do is insert the install CD and reboot. I'll see you on the other side!

Further Reading

The following chapters in The Linux Command Line

Chapter 16 - Storage Media (covers formatting external drives)
Chapter 19 - Archiving And Backup (covers rsync, tar, gzip)

Man pages:

rsync
tar

An article describing how to add NTFS support to Ubuntu 8.04

http://maketecheasier.com/how-to-reformat-an-external-hard-drive-to-ntfs-format-in-ubuntu-hardy/2008/09/29

Other installments in this series: 1 2 3 4 4a 5

Friday, April 30, 2010

The Bugs In Ubuntu 10.04

Now that Ubuntu 10.04 ("Lucid Lynx") has been released, I can spend some time talking about my experience testing it.

I was really hoping that 10.04, being a LTS (Long Term Support) release, would have focused on supreme reliability and stability. A sort of "9.10 without the bugs." Unfortunately this was not the case. 10.04 introduces a host of new features and technologies, some of which are still rather "green."

In the comments to follow, I want to vent some of my frustrations over the quality of the 10.04 release. I don't want to disparage the work of the people in the Ubuntu community nor the staff at Canonical. I'm sure they worked very hard getting this version out. Many of the problems are rooted in the upstream projects from which Ubuntu is derived.

A Pet Peeve

If you go to the bug tracking site for Ubuntu, the first thing you see is a long list of open "Critical" bugs. Looking at the list you notice that some of these bugs are very old. Discounting the symbolic bug number 1, the "We don't have as much market share as Microsoft" bug, you see that some of these open, critical bugs are years old.

Now, I used to administer a bug database (albeit a much smaller one than the one at Ubuntu), and to my eye this just looks terrible. It leaves a bad impression. If the bug is really "Critical," then it should get addressed either by marking it no longer relevant due to its age, or it should get fixed.

Overt Cosmetic Problems

While a lot of attention was given to the look of Ubuntu 10.04, serious cosmetic problems appear on many machines. Neither of my test systems could take full advantage of the visual "improvements" during boot up. There are a lot of distracting flashes and, on my laptop, it displays the image you see at the top of this post.

Not very impressive.

O.K. so maybe I have weird hardware or something, but how do you explain this: click on the help icon on the top panel and wait for Ubuntu Help Center to come up. Select "Advanced Topics" from the topics list then "Terminal Commands References [sic]" and look at the result:

Nice.

Connectivity Issues

I'm sure that the folks at Canonical are very interested in corporate adoption of their product. This means, of course, that it has got to play well with Windows file shares. Unfortunately, here too, 10.04 falls down. Throughout the beta test phase there were numerous problems with gvfs and gnome-keyring. As it stands now, many of these problems have been worked out, but as of today, you still cannot mount a password protected Windows share if you store the password. It seems to work at first, but if you reboot your machine and try it again you get something like this:

X Problems Galore

While I encountered some important problems with my desktop test system, they were nothing compared to the problems I have with my laptop.

A few words about my laptop. Yeah it's old. It's an IBM ThinkPad T41 circa 2004. I bought it from EmperorLinux pre-installed with Fedora Core 1. Since then, I have run numerous Linux distributions on it without problems. In fact, the T41 was a favorite of Linux users since it ran Linux so easily. That is, until 10.04.

Various things will just crash the X server, such as using the NoScript Firefox extension, or the remote desktop viewer. Suspend and resume don't work (the display never comes back). You can work around these problems if you add the kernel parameter nomodeset to grub, but then all of your videos in totem look like this:

Not exactly what I had in mind.

A Release Too Far?

I am still looking forward to upgrading to 10.04. I'm hopeful that, in time, the issues I have with 10.04 will be addressed. But what about in the meantime, with thousands of possibly new users trying out the live CD only to find issues like the ones I found? That's not good.

Admittedly, I got spoiled by 8.04. It has served me very well for the last two years. It got me through the production of my book without crashing once and frankly I'm willing to wait a few seconds more for my system to boot and willing to give up a lot of CPU-sucking, memory-hogging eye candy to have a system that stays up and can do the basic things right.

Further Reading

[UPDATE] I'm not the only one with concerns about 10.04:

http://cristalinux.blogspot.com/2010/04/ubuntu-1004-lucid-lynx-final-review.html

Thursday, April 29, 2010

Ubuntu 10.04 Has Been Released.

I'm downloading the CD as I write. You can get it here.

Tuesday, April 27, 2010

The Financial Physics of Free Software

In the Internet age, does software have value? Of course software is valuable in the sense that it provides service and is useful, but does software have monetary value?

If one looks at the law of supply and demand, the fact that software, like all other forms of digital content, can be endlessly reproduced and distributed at virtually no cost negates its value because software distributed this way lacks scarcity. Digital content is simply not a scarce resource. This hasn't stopped people from trying to impose artificial scarcity on digital data through the use of digital restrictions management (DRM) and draconian imaginary property laws but these approaches have had only limited success. This is not surprising as attempting to create an artificial shortage goes against the physical nature of the Internet and of computers themselves.

The Proprietary Model

If you have ever checked out my resume, you know that I spent the greater portion of my career in the proprietary software world and was, at one time, a big supporter of proprietary software. I was fortunate to have spent all of my years in the software industry working for small companies where one could wander the halls and learn every aspect of the business. In addition to being a technical manager, I had a number of marketing and sales assignments as well.

Software development in the proprietary world is speculative. Typically, a product manager or marketing director is given the assignment of coming up with the "next big thing," a product that can be sold to many customers at a profit. The reason that it has to be big is because proprietary software development is fantastically expensive. The product manager will present ideas to management and get approval for some personnel and a budget based on the product manager's forecasts for delivery dates and sales targets. After approval, the software development process begins. In some companies this process is very formal including requirements specifications, design reviews, test plans, etc. At the end of the process, the software product goes to market. This involves a number of significant expenses including marketing, advertising, trade shows, etc.

It is important to remember that proprietary software companies don't actually sell software. They sell licenses. It is through this mechanism that they attempt to create a scarcity that gives their product value.

Proprietary software only has value once it is written. You will sometimes see product announcements appear for non-existent yet-to-be-developed products. Such products are derisively known as "vaporware" in the industry because proprietary software does not have value until it is written and actually available in short supply.

The Free Software Model

To members of the proprietary software community, the notion of free software appears insane. This is because they think that free software means that they have to go through all of the steps and expense of the process above and then not collect any revenue on the back-end. There are a number of problems with this assumption.

The development process for free software is fundamentally different. First off, it is not speculative. Developers of free software typically have an interest in actually using the program they want to write. This also means that free software developers are usually subject matter experts for their chosen program.

Free software is much less expensive to produce than proprietary software. The development process is much less formal than closed proprietary processes owing to the fact that development is done in the open. This allows a more natural and organic method of solving problems and fixing bugs and, unlike proprietary development, the development tools and shared software components are free. Free software also does not incur the engineering overhead of implementing "copy protection," user registration systems, and tiered product versions that are used to establish upgrade paths for proprietary products.

Finally, free software products don't have the marketing and sales expenses of proprietary software.

Making Money

While proprietary software appears to make a lot of money now, is it sustainable? Will the Internet and its ability to perform infinite duplication and distribution drain the value from software? Only time will tell, but I'm betting that the Internet will emerge victorious. We can already see the signs of this victory with the rise of "cloud computing" which is eliminating the need for software altogether. But cloud computing raises a number of issues including privacy and security, as well as freedom.

There has been a lot of discussion of how to make money with free software. Most of the ideas put forth involve charging for services. After all, Red Hat, a very successful software company, makes its money that way, but I want to suggest another possibility.

As we saw, proprietary software only has value after it is written and is available for license sales. The free software model assumes from the start that once a program is written it no longer has value because it is not scarce. In contrast to proprietary software, free software only has value before it is written. The absence of a desired software program is the ultimate scarcity. There exists an opportunity to exploit this fact. It's not really a new idea by any means. This is how the custom software business works. Clients want something and pay big money to get something written. What I envision is a business that somehow aligns many clients with developers so that the cost of development can be spread out among many clients.

What will such a business look like? That's an exercise I will leave to my more entrepreneurial readers.

Further Reading

Thursday, April 22, 2010

New Features In Bash Version 4.x - Part 4

In this final installment of our series, we will look at perhaps the most significant area of change in bash version 4.x: arrays.

Arrays In bash

Arrays were introduced in version 2 of bash about fifteen years ago. Since then, they have been on the fringes of shell programming. I, in fact, have never seen a shell script "in the wild" that used them. None of the scripts on LinuxCommand.org, for example, use arrays.

Why is this? Arrays are widely used in other programming languages. I see two reasons for this lack of popularity. First, arrays are not a traditional shell feature. The Bourne shell, which bash was designed to emulate and replace, offers no array support at all. Second, arrays in bash are limited to a single dimension, an unusual limitation given that virtually every other programming language supports multi-dimensional arrays.

Some Background

I devote a chapter in my book to bash arrays but briefly, bash supports single dimension array variables. Arrays behave like a column of numbers in a spreadsheet. A single array variable contains multiple values called elements. Each element is accessed via an address called an index or subscript. All versions of bash starting with version 2 support integer indexes. For example, to create a five element array in bash called numbers containing the strings "zero" through "four", we would do this:

bshotts@twin7:~$ numbers=(zero one two three four)

After creating the array, we can access individual elements by specifying the array element's index:

bshotts@twin7:~$ echo ${numbers[2]}
two

The braces are required to prevent the shell from misinterpreting the brackets as wildcard characters used by pathname expansion.

Arrays are not very useful on the command line but are very useful in programming because they work well with loops. Here's an example using the array we just created:

#!/bin/bash

# array-1: print the contents of an array

numbers=(zero one two three four)

for i in {0..4}; do
   echo ${numbers[$i]}
done

When executed, the script prints each element in the numbers array:

bshotts@twin7:~$ array-1
zero
one
two
three
four
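
As a variation on array-1, here is a minimal sketch that avoids hard-coding the range {0..4}. The script name array-1b is hypothetical. It relies on two expansions: ${#numbers[@]} for the element count, and "${!numbers[@]}" for the list of indexes.

```bash
#!/bin/bash

# array-1b (hypothetical name): print an array without hard-coding its size

numbers=(zero one two three four)

# ${#numbers[@]} expands to the number of elements in the array
count=${#numbers[@]}
echo "The array holds $count elements"

# "${!numbers[@]}" expands to the list of indexes that are in use
for i in "${!numbers[@]}"; do
    echo "$i: ${numbers[$i]}"
done
```

Written this way, the loop keeps working if elements are later added to or removed from the array.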

mapfile Command

bash version 4 added the mapfile command. This command copies a file line-by-line into an array. It is basically a substitute for the following code:

while read line; do
    array[i]="$line"
    i=$((i + 1))
done < file

With mapfile, you can use the following in place of the code above:

mapfile array < file

mapfile handles the case of a missing newline at the end of the file and creates empty array elements when it encounters a blank line in the file. It also supports ranges within the file and the array.
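
To give a feel for the range support, here is a small sketch using the -s (skip lines), -n (maximum line count), and -O (starting array index) options, along with -t, which strips the trailing newline from each line. Without -t, mapfile keeps the newline in each element, unlike the read loop above. The script name and the sample file contents are made up for illustration.

```bash
#!/bin/bash

# mapfile-demo (hypothetical name): read part of a file into an array

# Build a small sample file so the example is self-contained
sample=$(mktemp)
printf '%s\n' "line 1" "line 2" "" "line 4" "line 5" > "$sample"

# -t strips the trailing newline from each line
mapfile -t all < "$sample"
echo "Read ${#all[@]} lines"

# -s 1 skips the first line, -n 2 reads at most two lines
mapfile -t -s 1 -n 2 part < "$sample"
printf 'part: "%s"\n' "${part[@]}"

# -O 3 starts storing at index 3 instead of index 0
mapfile -t -O 3 -n 1 offset < "$sample"
echo "Indexes in use: ${!offset[@]}"

rm -f "$sample"
```

Note that the blank third line of the sample file ends up as an empty element in the part array.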

Associative Arrays

By far, the most significant new feature in bash 4.x is the addition of associative arrays. Associative arrays use strings rather than integers as array indexes. This capability allows interesting new approaches to managing data. For example, we could create an array called colors and use color names as indexes:

declare -A colors
colors["red"]="#ff0000"
colors["green"]="#00ff00"
colors["blue"]="#0000ff"

Associative array elements are accessed in much the same way as integer indexed arrays:

echo ${colors["blue"]}
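
Here is a short sketch of two related techniques: filling an associative array in a single compound assignment, and checking whether a key exists before using it. The lookup function is a hypothetical helper, not part of bash.

```bash
#!/bin/bash

# colors-demo (hypothetical name): create and query an associative array

declare -A colors=(
    [red]="#ff0000"
    [green]="#00ff00"
    [blue]="#0000ff"
)

echo "Number of colors: ${#colors[@]}"

# ${colors[name]+set} expands to "set" only if the key exists
lookup() {
    local name=$1
    if [[ -n "${colors[$name]+set}" ]]; then
        echo "${colors[$name]}"
    else
        echo "unknown color: $name" >&2
        return 1
    fi
}

lookup blue
lookup purple || echo "purple is not in the array"
```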

In the script that follows, we will look at several programming techniques that can be employed in conjunction with associative arrays. This script, called array-2, when given the name of a directory, prints a listing of the files in the directory along with the names of the file's owner and group owner. At the end of the listing, the script prints a tally of the number of files belonging to each owner and group. Here we see the results (truncated for brevity) when the script is given the directory /usr/bin:

bshotts@twin7:~$ array-2 /usr/bin
/usr/bin/2to3-2.6                        root       root
/usr/bin/2to3                            root       root
/usr/bin/a2p                             root       root
/usr/bin/abrowser                        root       root
/usr/bin/aconnect                        root       root
/usr/bin/acpi_fakekey                    root       root
/usr/bin/acpi_listen                     root       root
/usr/bin/add-apt-repository              root       root
.
.
.
/usr/bin/zipgrep                         root       root
/usr/bin/zipinfo                         root       root
/usr/bin/zipnote                         root       root
/usr/bin/zip                             root       root
/usr/bin/zipsplit                        root       root
/usr/bin/zjsdecode                       root       root
/usr/bin/zsoelim                         root       root

File owners:
daemon    :     1 file(s)
root      :  1394 file(s)

File group owners:
crontab   :     1 file(s)
daemon    :     1 file(s)
lpadmin   :     1 file(s)
mail      :     4 file(s)
mlocate   :     1 file(s)
root      :  1380 file(s)
shadow    :     2 file(s)
ssh       :     1 file(s)
tty       :     2 file(s)
utmp      :     2 file(s)

Here is a listing of the script:

     1   #!/bin/bash
     2
     3   # array-2: Use arrays to tally file owners
     4
     5   declare -A files file_group file_owner groups owners
     6
     7   if [[ ! -d "$1" ]]; then
     8       echo "Usage: array-2 dir" >&2
     9       exit 1
    10   fi
    11
    12   for i in "$1"/*; do
    13       owner=$(stat -c %U "$i")
    14       group=$(stat -c %G "$i")
    15       files["$i"]="$i"
    16       file_owner["$i"]=$owner
    17       file_group["$i"]=$group
    18       ((++owners[$owner]))
    19       ((++groups[$group]))
    20   done
    21
    22   # List the collected files
    23   { for i in "${files[@]}"; do
    24       printf "%-40s %-10s %-10s\n" \
    25           "$i" ${file_owner["$i"]} ${file_group["$i"]}
    26   done } | sort
    27   echo
    28
    29   # List owners
    30   echo "File owners:"
    31   { for i in "${!owners[@]}"; do
    32       printf "%-10s: %5d file(s)\n" "$i" ${owners["$i"]}
    33   done } | sort
    34   echo
    35
    36   # List groups
    37   echo "File group owners:"
    38   { for i in "${!groups[@]}"; do
    39       printf "%-10s: %5d file(s)\n" "$i" ${groups["$i"]}
    40   done } | sort

Line 5: Unlike integer indexed arrays, which are created by merely referencing them, associative arrays must be created with the declare command using the new -A option. In this script we create five arrays as follows:

files contains the names of the files in the directory, indexed by file name
file_group contains the group owner of each file, indexed by file name
file_owner contains the owner of each file, indexed by file name
groups contains the number of files belonging to the indexed group
owners contains the number of files belonging to the indexed owner

Lines 7-10: Checks to see that a valid directory name was passed as a positional parameter. If not, a usage message is displayed and the script exits with an exit status of 1.

Lines 12-20: Loop through the files in the directory. Using the stat command, lines 13 and 14 extract the names of the file owner and group owner and assign the values to their respective arrays (lines 16, 17) using the name of the file as the array index. Likewise the file name itself is assigned to the files array (line 15).

Lines 18-19: The total number of files belonging to the file owner and group owner are incremented by one.

Lines 22-27: The list of files is output. This is done using the "${array[@]}" parameter expansion which expands into the entire list of array elements with each element treated as a separate word. This allows for the possibility that a file name may contain embedded spaces. Also note that the entire loop is enclosed in braces thus forming a group command. This permits the entire output of the loop to be piped into the sort command. This is necessary because the expansion of the array elements is not sorted.

Lines 29-40: These two loops are similar to the file list loop except that they use the "${!array[@]}" expansion which expands into the list of array indexes rather than the list of array elements.
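
The tallying technique used for the owners and groups arrays can be tried in isolation. What follows is a minimal sketch with a made-up word list in place of file owners. The script name tally-demo is hypothetical.

```bash
#!/bin/bash

# tally-demo (hypothetical name): count occurrences with an associative array

declare -A counts

for word in apple pear apple fig pear apple; do
    # An unset element evaluates to 0 in arithmetic, so the first
    # increment for a new key yields 1
    ((++counts[$word]))
done

# The order of "${!counts[@]}" is unspecified, so sort the output
{ for word in "${!counts[@]}"; do
    printf "%-10s: %5d\n" "$word" "${counts[$word]}"
done; } | sort
```

Each distinct word becomes an index in counts, and the value stored at that index is the number of times the word was seen.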

Further Reading

The Linux Command Line

Chapter 36 (Arrays)
Chapter 37 (Group commands)

Other bash 4.x references:

A Wikipedia article on associative arrays:

http://en.wikipedia.org/wiki/Associative_array

The Complete HTML Color Chart:

http://www.immigration-usa.com/html_colors.html

Other installments in this series: 1 2 3 4

Ubuntu 10.04 RC Has Been Released

For those of you following along with my Getting Ready For Ubuntu 10.04 series, the Release Candidate has just come out. LWN has the release announcement.

Wednesday, April 21, 2010

Hitler, as Downfall producer, orders a DMCA takedown

As you may have heard, the producers of the movie "Downfall" recently staged a DMCA takedown of all of the bunker scene parodies on YouTube, including the one I posted on this blog. Brad Templeton, well-known activist, has posted a response. You can view it here.

Further Reading

Thursday, April 15, 2010

stat

I was going to write the next installment in my New Features In Bash Version 4.x series today, but in thinking about the examples I want to use, I thought I should talk about the stat command first.

We're all familiar with ls. It's the first command that most people learn. Using ls you can get a lot of information about a file:

bshotts@twin7:~$ ls -l .bashrc
-rw-r--r-- 1 bshotts bshotts 3800 2010-03-25 13:18 .bashrc

Very handy. But there is one problem with ls: its output is not very script friendly. Commands like cut cannot easily separate the fields (though awk can, but we're not talking about that yet). Wouldn't it be great if there were a command that let you get file information in a more flexible way?

Fortunately there is such a command. It's called stat. The name "stat" derives from the word status. The stat command shows the status of a file or file system. In its basic form, it works like this:

bshotts@twin7:~$ stat .bashrc
File: `.bashrc'
Size: 3800         Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d   Inode: 524890      Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ bshotts)   Gid: ( 1000/ bshotts)
Access: 2010-04-15 08:46:22.292601436 -0400
Modify: 2010-03-25 13:18:09.621972000 -0400
Change: 2010-03-27 08:41:31.024116233 -0400

As we can see, when given the name of a file (more than one may be specified), stat displays everything the system knows about the file short of examining its contents. We see the file name, its size including the number of blocks it's using and the size of the blocks used on the device. The attribute information includes the owner and group IDs, and the permission attributes in both symbolic and octal format. Finally we see the access (when the file was last read), modify (when the file was last written), and change (when the file attributes were last changed) times for the file.

Using the -f option, we can examine file systems as well:

bshotts@twin7:~$ stat -f /
File: "/"
    ID: 9e38fe0b56e0096d Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 18429754   Free: 10441154   Available: 9504962
Inodes: Total: 4685824    Free: 4401092

Clearly stat delivers the goods when it comes to file information, but what about that output format? I can't think of anything worse to deal with from a script writer's point-of-view (actually I can, but let's not go there!).

Here's where the beauty of stat starts to shine through. The output is completely customizable. stat supports printf-like format specifiers. Here is an example extracting just the name, size, and octal file permissions:

bshotts@twin7:~$ stat -c "%n %s %a" .bashrc
.bashrc 3800 644

The -c option provides basic formatting capabilities, while the --printf option can do even more by interpreting backslash escape sequences:

bshotts@twin7:~$ stat --printf="%n\t%s\t%a\n" .bashrc
.bashrc   3800   644

Using this format, we can produce tab-delimited output, perfect for processing by the cut command. Each of the fields in the stat output is available for formatting. See the stat man page for the complete list.
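
As a rough sketch of how this might look inside a script, the following creates a temporary file, asks stat for a tab-delimited record, and uses cut to pick out individual fields. It assumes the GNU version of stat. The script name and variable names are made up for illustration.

```bash
#!/bin/bash

# stat-demo (hypothetical name): pull single fields out of stat output

# Create a scratch file with known size and permissions
target=$(mktemp)
printf 'hello\n' > "$target"
chmod 640 "$target"

# Tab-delimited record: name, size in bytes, octal permissions
record=$(stat --printf="%n\t%s\t%a\n" "$target")

# cut uses tab as its default delimiter
size=$(echo "$record" | cut -f 2)
perms=$(echo "$record" | cut -f 3)

echo "size=$size perms=$perms"

rm -f "$target"
```

This approach breaks down if a file name itself contains a tab, so for a single field it is simpler to ask stat for just that field, as in stat -c %s.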

Further Reading

The stat man page

The Linux Command Line:

Chapter 10 (file attributes and permissions)
Chapter 21 (cut command)
Chapter 22 (printf command)

A Wikipedia article on the stat() Unix system call from which the stat command is derived:

http://en.wikipedia.org/wiki/Stat_(Unix)

Tuesday, April 13, 2010

New Features In Bash Version 4.x - Part 3

In this installment, we are going to look at a couple of commands that have been updated in bash 4.x.

read Enhancements

The read command has gotten several small improvements. There is one however that I thought was a real standout. You can now provide a default value that will be accepted if the user presses the Enter key. The new -i option is followed by a string containing the default text. Note that for this option to work, the -e option (which enables the readline library) must also be specified. Here is an example script:

#!/bin/bash

# read4: demo new read command feature

read -e -p "What is your user name? " -i "$USER"
echo "You answered: '$REPLY'"

When the script is executed, the user is prompted to answer, but a default value is supplied which may be edited if desired:

bshotts@twin7:~$ read4
What is your user name? bshotts
You answered: 'bshotts'

case Improvements

The case compound command has been made more flexible. As you may recall, case performs a multiple choice test on a string. In versions of bash prior to 4.x, case allowed only one action to be performed on a successful match. After a successful match, the command would terminate. Here we see a script that tests a character:

#!/bin/bash

# case4-1: test a character

read -n 1 -p "Type a character > "
echo
case $REPLY in
   [[:upper:]])   echo "'$REPLY' is upper case." ;;
   [[:lower:]])   echo "'$REPLY' is lower case." ;;
   [[:alpha:]])   echo "'$REPLY' is alphabetic." ;;
   [[:digit:]])   echo "'$REPLY' is a digit." ;;
   [[:graph:]])   echo "'$REPLY' is a visible character." ;;
   [[:punct:]])   echo "'$REPLY' is a punctuation symbol." ;;
   [[:space:]])   echo "'$REPLY' is a whitespace character." ;;
   [[:xdigit:]])   echo "'$REPLY' is a hexadecimal digit." ;;
esac

Running this script produces this:

bshotts@twin7:~$ case4-1
Type a character > a
'a' is lower case.

The script works for the most part, but fails if a character matches more than one of the POSIX character classes. For example the character "a" is both lower case and alphabetic, as well as a hexadecimal digit. In bash prior to 4.x there was no way for case to match more than one test. In bash 4.x however, we can do this:

#!/bin/bash

# case4-2: test a character

read -n 1 -p "Type a character > "
echo
case $REPLY in
   [[:upper:]])   echo "'$REPLY' is upper case." ;;&
   [[:lower:]])   echo "'$REPLY' is lower case." ;;&
   [[:alpha:]])   echo "'$REPLY' is alphabetic." ;;&
   [[:digit:]])   echo "'$REPLY' is a digit." ;;&
   [[:graph:]])   echo "'$REPLY' is a visible character." ;;&
   [[:punct:]])   echo "'$REPLY' is a punctuation symbol." ;;&
   [[:space:]])   echo "'$REPLY' is a whitespace character." ;;&
   [[:xdigit:]])   echo "'$REPLY' is a hexadecimal digit." ;;&
esac

Now when we run the script, we get this:

bshotts@twin7:~$ case4-2
Type a character > a
'a' is lower case.
'a' is alphabetic.
'a' is a visible character.
'a' is a hexadecimal digit.

The addition of the ";;&" syntax allows case to continue on to the next test rather than simply terminating. There is also a ";&" syntax which permits case to continue on to the next action regardless of the outcome of the next test.
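
To see the difference between the two, here is a small sketch that mixes ";;&" and ";&" in one case command. The script name case4-3 and the describe function are hypothetical.

```bash
#!/bin/bash

# case4-3 (hypothetical name): contrast ;;& with ;&

describe() {
    case $1 in
        [[:digit:]])  echo "digit" ;;&      # keep testing the patterns below
        [[:xdigit:]]) echo "hex digit" ;&   # run the next action without testing
        [[:space:]])  echo "whitespace action (pattern not tested)" ;;
        *)            echo "something else" ;;
    esac
}

describe 7
describe z
```

For the character 7, all of the first three actions run: the first two because their patterns match, and the third only because ";&" falls through to it. The character z matches none of the first three patterns and is handled by the final catch-all.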

Further Reading

The bash man page:

The Compound Commands subsection of the SHELL GRAMMAR section.
The SHELL BUILTIN COMMANDS section.

The Bash Hackers Wiki:

The Bash Reference Manual

Other installments in this series: 1 2 3 4
