You will certainly want to map out some sort of backup strategy since accidents and malfunctions can happen at any time --- usually the most inconvenient time. Accidents can include anything from various forms of operator error like deleting or overwriting an important file to accidentally turning off the power strip under your desk with your big toe. Malfunctions include disk failure, smoke coming out of the chassis, and so on.
In any case, life is a lot less stressful if you can simply get the data back by tossing in your last backup and running your restore program.
In this talk, I'll discuss two kinds of backup hardware: tape drives and removable-disk drives.
I'll use my home system as an example: I have 4 GB of disk space, 2.5 GB of which is Linux (the rest for Win95 --- games). I do not back up the Win95 partition at all, but I do back up "everything" on all of my Linux partitions. This includes the root filesystem, /usr/local, user account area, and scratch area.
If you can back up "everything", I'd suggest that you do. The best reason is that there is no question as to what's on the backup: if the file existed at the time of the backup, it is definitely on the backup media. At a minimum, I suggest that you do backups on a partition by partition basis: something like "I back up drives C: and D:, but not E: and F:." You want to avoid backing up bits and pieces from all over the place; odds are that, unless you write it all down in great detail, you'll eventually make a mistake and lose something important. In other words, a simple backup plan is a good backup plan.
Next you'll want to decide how often to run your backups. Here you'll need to strike a balance between fear and convenience. If you back everything up every day, you'll probably never lose anything. On the other hand, you'll also spend several hours a day running the backups, which is annoying. If you back things up once a month, you'll only have to deal with it every 4 weeks, but your risk of data loss is higher. In other words, you're flying by the seat of your pants for 4 weeks before the files you're working on are safely archived.
I back up my home system every two weeks. This is primarily due to the fact that I have a slow and noisy tape drive (HP Colorado T3000). My backup takes about 2.5 hours for the 1.1 GB worth of data on my Linux partitions (or about 7.5 MB/min). The tape drive makes a high whirr, and since the machine sits in my bedroom, I can't sleep with a backup running. Every other Sunday morning, I drop in a tape and let the thing run.
Now, if I were to finish the backup at noon on Sunday, work for the next 72 hours, and experience a disk crash, I'd be out of luck. There are at least two strategies for dealing with this possibility, both of which require discipline (aka fear). First, if you're working on a small number of files between backups (Chapter7.tex, for example), you can put these files on a floppy every night. Second, if you have a modem, you can FTP your critical files to other UNIX accounts for storage -- this is how I cope with it.
Finally, you'll want to think about how many sets of backup media to use. Relying on a single tape is not a very good idea -- each new backup erases the old one. If a backup fails for some reason and your disk croaks, you're left with nothing. Also, if you delete a file between backups and don't notice that it's gone until after the second backup, you're out of luck.
I use three tapes at home. The first two are used on a round-robin basis for regular backups every other Sunday. This means that, at worst, a single tape failure loses me a month's work. It also means that I can retrieve files up to a month old. The third tape is for long-term purposes: I run a full backup on it every six months or so. If the first two "regular" tapes crap out for some reason, there's always this one. Losing 6 months' work is better than losing everything.
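To make the rotation concrete, here is a small Python sketch of the schedule described above. The tape labels "A" and "B", the starting date, and the choice of flagging the long-term tape every 13th backup (26 weeks, standing in for "every six months or so") are assumptions made for illustration only.

```python
from datetime import date, timedelta


def rotation_schedule(first_backup, count, interval_days=14, archive_every=13):
    """Yield (day, tape, archive_due) for `count` consecutive backups.

    Tapes "A" and "B" alternate. archive_due is True on every
    `archive_every`-th backup (13 two-week intervals = 26 weeks) as a
    reminder to refresh the long-term tape too. Both numbers are assumptions.
    """
    for n in range(count):
        day = first_backup + timedelta(days=n * interval_days)
        tape = "A" if n % 2 == 0 else "B"
        yield day, tape, n % archive_every == 0


if __name__ == "__main__":
    # Hypothetical starting date: Sunday, 4 January 1998.
    for day, tape, archive_due in rotation_schedule(date(1998, 1, 4), 6):
        note = " (also refresh the long-term tape)" if archive_due else ""
        print(f"{day}: regular tape {tape}{note}")
```

Because each regular tape is reused every second backup, the older of the two is at most 28 days old just before it is overwritten, which is where the "month's work" figure comes from.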
In terms of strategy, you should always assume that any component of your machine can fail at any time (including your backup system!) -- and plan accordingly. If your tape drive dies, can you get a new one (i.e., how long until it becomes obsolete)? What are the chances of being able to borrow someone else's drive to restore your system (i.e., are a lot of other people using the same hardware as you)?
If you have some data that is really important to you, the safest thing to do is to leave copies of it on as many (backed up) systems as possible, the idea being that multiple simultaneous failures are highly unlikely.
Finally, a word of warning: it is not a good idea to use your machine while a backup is running. Depending on your backup software, you may get unpredictable results if you create/modify/delete files while the backup program is running. For maximum reliability, the best bet is to consider backups "down time."
Tape drives suitable for home systems cost anywhere from $200 to about $750 and can store anywhere from 3-10 GB per tape. Transfer rates for these drives range from 5-66 MB/min. At 5 MB/min, it takes about 3.5 hours to back up 1 GB of data. At 66 MB/min, this figure becomes about 16 minutes. Clearly, the amount of data you want to back up affects the drive you'll need. If you have two 4 GB disks (a reasonable configuration, these days), it will take over 24 hours to spin them off at 5 MB/min versus a little over 2 hours at 66 MB/min.
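The arithmetic behind these figures is easy to redo for your own disks. The following Python sketch is only an illustration; the function name is made up, and it assumes 1 GB = 1024 MB and a drive that sustains its rated transfer rate for the whole backup.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def backup_hours(data_mb, rate_mb_per_min):
    """Return the hours needed to write data_mb megabytes at the given rate."""
    if rate_mb_per_min <= 0:
        raise ValueError("transfer rate must be positive")
    return data_mb / rate_mb_per_min / 60.0


if __name__ == "__main__":
    # The two disk sizes and the two transfer rates used in the text above.
    for size_gb in (1, 8):
        for rate in (5, 66):
            hours = backup_hours(size_gb * MB_PER_GB, rate)
            print(f"{size_gb} GB at {rate} MB/min: {hours:.1f} hours")
```

Real backups tend to run slower than the rated figure, so treat the result as a lower bound.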
Here is an overview of some commonly available tape media:
| Media | Description |
| --- | --- |
| 8mm Exabyte | This is basically a standard 8mm video tape. You can purchase more expensive "data quality" 8mm tapes, but the usual Sony, Fuji, etc. video tapes that you can buy at the local drugstore work fine. |
| 4mm DAT | This is the same DAT tape used in digital audio equipment. DAT tapes are more expensive than their 8mm counterparts, but they are smaller and generally have faster search access times. |
| QIC/Travan | QIC/Travan (Quarter Inch Cartridge) tapes are used in most low cost PC tape drives. Although the drives are cheap, the tapes are more expensive than 4mm or 8mm tapes. For home use, however, the extra cost of the media is more than offset by the low cost of the drive. |
| Proprietary | Most other tapes can be thought of as "proprietary"; in other words, the manufacturer of the tape drive also makes the tapes. The HP Colorado 5GB tape falls into this category. Among other things, you'll want to make sure that the company you buy one of these from isn't going to go out of business any time soon -- and is committed to manufacturing the tapes into the future. |
When buying a tape drive, you need to be aware of the interface used by the drive. For example, most 4mm and 8mm drives are SCSI. If your machine does not come equipped with a SCSI bus, you'll need to purchase a SCSI card, and (if the drive is external) a SCSI cable.
PC SCSI cards normally plug into either the ISA or PCI bus. The PCI cards are much faster and are therefore desirable if you plan to add lots of SCSI devices. The ISA cards are a little less expensive but significantly slower than their PCI counterparts. SCSI buses come in several flavors: narrow/wide + single-ended/differential + fast/ultra. If you plan to use SCSI disks at some point, you'll probably want to get an Ultra-SCSI capable controller. Since most tape drives are slow (compared to disks), you can usually get by with fast narrow single-ended SCSI if you don't care about SCSI disks.
If you have to buy a SCSI cable, be sure to buy a good one! This will cost you $60-$75, but the trouble-free operation you'll enjoy will make the expense worth it. A cheap SCSI cable or terminator can cause all sorts of bizarre and hard-to-find problems -- and you'll probably end up buying a good cable and terminator anyway.
The QIC drives with which I am familiar use one of 3 interfaces: your existing floppy controller (HP Colorado T3000), a dedicated ISA card (T3000 again), or your existing IDE controller (HP Colorado 5GB). The thing to remember here is that you can only have 2 floppy devices and 4 IDE devices on most machines -- remember that many newer CDROM drives also use the IDE bus.
Finally, a word on tape capacities: you will often see them listed as "4.0/8.0 GB" or "4 GB native, 8 GB compressed." The native figure is the amount of data that the tape will actually hold. The other figure is the expected amount of data you can put on the tape after compressing the data stream. You can buy drives with compression implemented in hardware, but they are usually considerably more expensive than those without. With a cheaper drive, the compression has to be done in software; in other words, achieving the "compressed" figure is a function of your backup software.
With data compression, there are no guarantees. The tape manufacturers assume that you'll get an average 2:1 compression ratio. If your disk contains lots of text, you'll get more compression. If you've already compressed your files with gzip, odds are that the "compressed" data stream will actually be larger than the gzip files themselves! Over the long haul, though, the 2:1 figure is pretty close to what you should get. On my home system, I get about 1.5:1 -- primarily because I have a partition that contains 300 MB of compressed tar files.
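A rough Python sketch shows how a block of incompressible data drags the overall ratio down. The numbers are assumptions chosen to resemble the home system described above: 1126 MB in total (about 1.1 GB), of which 300 MB is already compressed (ratio 1:1) and the rest is taken to compress at the manufacturers' 2:1.

```python
def overall_ratio(parts):
    """Return the combined compression ratio for a list of (size_mb, ratio) pairs.

    ratio is input size divided by output size, so 2.0 means 2:1.
    """
    total_in = sum(size for size, _ in parts)
    total_out = sum(size / ratio for size, ratio in parts)
    return total_in / total_out


if __name__ == "__main__":
    # Assumed mix: 300 MB of compressed tar files, 826 MB of everything else.
    mix = [(300, 1.0), (826, 2.0)]
    print(f"overall ratio: {overall_ratio(mix):.2f}:1")
```

Under these assumptions the result comes out close to the 1.5:1 observed in practice.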
The ZIP drive is basically a "super floppy" -- it holds 100 MB of data (or 71 3.5" floppies) on a disk about the size of two floppies stacked on top of one another. The disks cost $20 singly and a 10-pack is $100. The drive itself costs under $100 and is available for internal IDE and SCSI, and external SCSI and parallel port models.
DO NOT buy a parallel port ZIP for your PC, especially if you have a printer. It won't work well with either Linux or Windows. The problem is that the printer tends to try to interpret the ZIP's SCSI commands and the poor ZIP drive tries to interpret the printer's PostScript. It's a real mess. If you must purchase one of these beasts, at least get the "ZIP plus" because it works on both SCSI and parallel ports. If you want to run Linux, you should avoid the internal IDE model, too.
The internal and external SCSI models have worked well in almost every case. For $150, you can get an internal SCSI ZIP, ZIP Zoom ISA SCSI card, and a ZIP disk -- the external model costs a little more (cost of a wall brick) -- not bad.
ZIP disks are fairly fast, but they suffer from one serious limitation: they do not hold enough information to back up today's hard disk drives. You will still need 40 ZIP disks to back up a 4 GB IDE disk; in other words, you'll probably want to have a disk for each major project on your machine and forgo backing up the base OS. 100 MB will hold a lot of TeX/C/Matlab code and data.
If you need more than 100 MB of storage, the JAZ drive is a good way to go. JAZ disks hold 1 GB of data (10 ZIP disks' or 700+ floppies' worth) and are only slightly larger than ZIP disks. The disks cost $100 and the JAZ drive (with 1 disk included) costs about $300. It is available in internal and external SCSI models -- there is no IDE/ATAPI model at this time.
The nice thing about the JAZ drive is that it is almost as fast as a standard IDE hard disk, which means that your backups run fast. On the down side, it consumes more power than a ZIP drive and so it is not as suitable for use with laptops.
If you get an external JAZ drive, never move the drive while it contains a disk! Doing so can ruin the disk, the drive, or (probably) both.
IOmega recently released the 2 GB JAZ drive. It is still fairly expensive, but it holds twice the data and is faster than its predecessor (it uses an Ultra-SCSI interface). It can also read and write the "regular" 1 GB JAZ disks, so you don't have to throw them away.
If you get a SCSI ZIP/JAZ drive, you should be aware that IOmega recommends their ISA ZIP Zoom and PCI JAZ Jet SCSI controllers. I haven't seen the JAZ Jet myself, but the ZIP Zoom is basically an Adaptec AHA-152x series SCSI controller with a special IOmega BIOS on it. Although most any SCSI controller will work with these drives, the IOmega BIOS can come in real handy because it allows you to low level format your disks. For instance, if you leave your JAZ disk on top of the TV and it gets totally scrambled, you'll need to reformat it in order to use it again. The same is true if you password protect the disk and forget the password. These SCSI cards are no more expensive than comparable ISA/PCI cards.
Some manufacturers:

| Model | Capacity (GB) | Xfer Rate (MB/min) | Interface | Drive Cost | Media Cost | Media |
| --- | --- | --- | --- | --- | --- | --- |
| Exabyte EXB-8700LT | 5.0/10.0 | 30-60 | SCSI | $700 | $6-10 | 8mm |
| Seagate STD280N | 4.0/8.0 | 33-66 | SCSI | $750 | $20 | 4mm DAT |
| HP Colorado 5GB | 2.5/5.0 | 40-55 | IDE/ATAPI | $200 | N/A | HP 5GB |
| IOmega JAZ 2GB | 2.0 | 300-450 | SCSI | $650 | N/A | 2 GB JAZ disk |
| HP Colorado T3000 | 1.6/3.2 | 5-8 | FDC | $200 | $45 | Travan TR-3 |
| IOmega JAZ | 1.0 | 200-300 | SCSI | $280 | $100 | JAZ disk |
| IOmega ZIP | 0.1 | 48-60 | SCSI | $150 | $10-20 | ZIP disk |
Below, I'll talk about JAZ disks only. If you have a ZIP drive, just change "jaz" to "zip" everywhere and things should work.
Second, if you want all users to be able to partition and format JAZ disks, you need to open up permissions on your JAZ SCSI device (which I'll assume is sda -- it will be sda if the JAZ drive is your only SCSI device). Again, this is easy:
If you only want certain users to be able to perform these operations, you can add them to the "disk" group in /etc/group.
Third, you need to add an entry to /etc/fstab like the following:
/dev/sda4 /jaz ext2 defaults,user,exec,noauto 0 0
The first field is the device to mount: in this case, the
fourth partition of the first SCSI device (the JAZ disk). The
second field is the mount point we created earlier. The third field
is the filesystem type. The Linux filesystem is called "ext2", DOS
is "msdos", and Win95 is "vfat". I strongly recommend that you use
the ext2fs from Linux. The fourth field specifies the mount options
-- "user" lets any user mount and unmount the filesystem, "exec"
lets people execute programs stored on the disk, and "noauto" tells
the system to not check or mount it at boot time. If you leave out
the "noauto", you won't be able to boot the system unless there is
a JAZ disk in the drive. This is because Linux checks all
filesystems at boot time, and figures that something is terribly
wrong if the disk simply isn't there. The last two fields are
unimportant in this case. If you want to be able to mount DOS
formatted JAZ disks as well, you can "mkdir /jaz-dos" and add the
following to /etc/fstab:
/dev/sda4 /jaz-dos msdos defaults,user,exec,noauto 0 0
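If it helps to see the six fields pulled apart, here is a small Python sketch that splits an fstab line the same way the description above does. The class and function names are invented for this example; the field names `dump_freq` and `fsck_pass` are my labels for the last two fields.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str         # what to mount, e.g. /dev/sda4
    mount_point: str    # where to mount it, e.g. /jaz
    fs_type: str        # ext2, msdos, vfat, ...
    options: List[str]  # comma-separated mount options, split into a list
    dump_freq: int      # fifth field (unimportant here)
    fsck_pass: int      # sixth field (unimportant here)


def parse_fstab_line(line: str) -> FstabEntry:
    """Split one whitespace-separated fstab line into its six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    device, mount_point, fs_type, options, dump_freq, fsck_pass = fields
    return FstabEntry(device, mount_point, fs_type,
                      options.split(","), int(dump_freq), int(fsck_pass))


if __name__ == "__main__":
    entry = parse_fstab_line("/dev/sda4 /jaz ext2 defaults,user,exec,noauto 0 0")
    print(entry)
    if "noauto" not in entry.options:
        print("warning: the system will expect a disk in the drive at boot")
```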
If you load your SCSI drivers through kerneld, the module configuration needs lines like these:
alias scsi_hostadapter aha152x
options aha152x aha152x=0x140,11
alias block-major-8 sd_mod
The first line tells the kernel daemon, kerneld, that you
have an Adaptec 152x series SCSI controller. The second line tells
the Adaptec driver that your card is at IO base 0x140 and uses IRQ
11. You won't need this line if you have a PCI controller. The
third line tells kerneld to load the SCSI disk module when a
request for SCSI disk access comes in.
There are two ways to figure out the IO base and IRQ for your ISA controller. First, you can get out the manual, open the chassis, look at the jumpers, and figure it all out. Second, normally the SCSI BIOS displays this information for a second or two at boot time.
If you have a different SCSI controller, you'll need to change the first two lines. Again, either open the machine or watch the screen at boot time to figure out which card you have. After booting, you can get a list of the available drivers by typing:
You can determine your kernel version by typing:
If it's an ISA card you'll need to know (at a minimum) the IO base and IRQ. Have a look in /usr/src/linux/drivers/scsi/ to see how to pass this information along to the device driver (usually documented near the top of the driver's source code ;-). If it's a PCI card, you can usually omit the "options" line, but you'll still need the two "alias" lines.
Of course, you'll need to change the exact driver and options to correspond to your SCSI card.
If you get some "unresolved symbols" errors, you may need to try:
first. If this fails, have a look at /usr/src/linux/README and /usr/src/linux/Documentation/modules.txt, cross your fingers, and rebuild your kernel as suggested above.
Once you figure out which "insmod" commands you need to execute, add them to the end of /etc/rc.d/rc.local. Reboot and make sure that they're getting loaded correctly. You can get a list of loaded drivers by doing:
If you're using kerneld, the SCSI drivers should load, and you should end up at the fdisk prompt after a few seconds. If you get errors, you'll need to go back and alter your configuration until fdisk will start up. Use 'q' to quit fdisk, NOT 'w'.
After you get all of this set up comes the good news: you'll never have to mess with this stuff again!
Now that the partition table is set up, we can create a filesystem on the disk with:
While the disk is mounted, you can move files onto and off of the disk simply by copying them into and out of /jaz.
Note that in order to eject the disk (with the button on the drive), you must unmount it first. In order to unmount it, there must be no open files on the disk; in other words, all programs accessing the disk need to terminate and your shell's working directory cannot be anywhere below the disk's mount point.
I have never used a SCSI tape drive with Linux, so I can't give authoritative information on it. As far as I know, however, it's all the same at the user level. Regardless of your tape drive, you'll need to read the mt manpage. The mt program is a utility that lets you rewind, fast forward, and perform various other simple operations on the tape. To get the data onto the tape, you simply write it to the appropriate device file in /dev. SCSI tapes will be /dev/st0, /dev/nst0, /dev/st1, etc. Things are a little more complicated for QIC drives on the floppy controller. If you use the standard "ftape" driver, the devices will be /dev/rft0, /dev/nrft0, etc. If you download the zftape driver (which I personally like a lot), the devices are /dev/qft0 and /dev/nqft0, etc. If you want regular users to be able to access the tape drive, you'll need to open up permissions on the appropriate device files in /dev (see the JAZ disk section above).
What's the difference between st0 and nst0? The st0 device automatically rewinds the tape when the special "file" /dev/st0 is closed. The nst0 device does not rewind the tape. So, if you want to put multiple archives on a tape, you'll need to use the "n" devices.
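The effect is easy to see in a toy model. The Python class below is not a driver interface of any kind; it is a made-up simulation that only tracks which archives sit on the tape and where the head is, on the assumption that writing at a given position discards everything after it.

```python
class ToyTape:
    """A toy model of a tape holding a sequence of archives."""

    def __init__(self):
        self.archives = []  # archives on the tape, in order
        self.position = 0   # index of the slot the head is sitting at

    def write_archive(self, name, rewind_on_close):
        # Writing at the current position wipes out whatever followed it.
        del self.archives[self.position:]
        self.archives.append(name)
        # Rewinding devices go back to the start when closed; the "n"
        # devices leave the head just past the archive that was written.
        self.position = 0 if rewind_on_close else len(self.archives)


if __name__ == "__main__":
    rewinding, non_rewinding = ToyTape(), ToyTape()
    for fs in ("root", "usr", "home"):
        rewinding.write_archive(fs, rewind_on_close=True)
        non_rewinding.write_archive(fs, rewind_on_close=False)
    print("rewinding device:    ", rewinding.archives)
    print("non-rewinding device:", non_rewinding.archives)
```

With the rewinding device each new archive lands on top of the previous one, so only the last survives.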
The nice things about zftape are kernel-level blockwise data compression (see below) and the fact that it is more robust than the standard ftape driver. The downside is that you'll have to recompile your kernel and turn off support for the standard ftape driver in order to use zftape.
I don't know anything about using an IDE tape drive. There is kernel support for it and I assume it works just like the others.
In what follows, I'll assume that you will either dump your data stream into a tape device or into a single file on your ZIP/JAZ disk.
There are two free, commonly available programs you can use to make backups: tar and dump. While tar is simple to use and understand, it is not the best choice for making backups, primarily because tar provides no mechanism for doing interactive data restoration. Therefore, I'll skip talking about tar.
Using dump is fairly simple. After you've positioned and otherwise set up the tape with mt, you can use a command like:
This command dumps the contents of the Linux filesystem /dev/hda2 into the gzip program (for compression) and finally places the output into the file backup1.gz on your JAZ disk. See the dump manpage for details on the arguments. You can also do something like:
if you don't want to back up the whole partition. Unfortunately, you can only back up one filesystem or directory tree per dump invocation. If you have multiple filesystems and are using a tape drive, you'll need to read up on mt to see how to put multiple archives on a single tape.
You'll need to be logged in as root to run dump because it needs direct access to the disk devices. Don't even think about making these mode 666. Adding yourself to the disk group in /etc/group is OK, though.
There are two other gotchas with tapes. Most tape drives require that data into and out of the drive appear in quantized chunks called blocks. For example, the ftape driver uses 29 KB blocks and zftape uses 10 KB blocks. The drive's or driver's documentation ought to spell this out. Now, a gzipped dump to a tape looks something like:
The extra dd command carves up the compressed output into 10 KB chunks and presents them to the tape drive.
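For anyone curious what "carving up" means in practice, this Python sketch re-blocks an arbitrary input stream into fixed-size chunks. It only illustrates the idea and is not a replacement for dd; the function name and the 4096-byte read size are arbitrary choices, and the 10 KB default simply matches the zftape block size mentioned above.

```python
import io


def reblock(stream, block_size=10 * 1024):
    """Yield chunks of exactly block_size bytes read from stream.

    Input may arrive in reads of any size. The final chunk is yielded
    as-is and may be shorter than block_size.
    """
    buffer = b""
    while True:
        data = stream.read(4096)  # read size is unrelated to the block size
        if not data:
            break
        buffer += data
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    if buffer:
        yield buffer


if __name__ == "__main__":
    compressed_output = io.BytesIO(b"x" * 25000)  # stand-in for gzip output
    print([len(block) for block in reblock(compressed_output)])
```

Whether a short final block is acceptable depends on the drive and driver, so check their documentation.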
The other gotcha is that it isn't a great idea to compress tape data with gzip (though I did it for months and had no problems). Tapes are a bit more prone to errors than disks. If the data is compressed and even a single bit gets flipped on the tape, you will be unable to decompress the data and recover your files for the remainder of that tape record. This is the primary reason that I like zftape: each 10 KB input block is compressed separately (in the Linux kernel) and written to tape. If a bit gets flipped, you lose a single block instead of the rest of the tape!
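The difference between the two approaches can be imitated in a few lines of Python. This is only a sketch: it uses Python's zlib module as a stand-in (zftape does its own compression inside the kernel, and I am not claiming the formats match), and it damages the checksum byte at the end of a compressed piece so that the error is always detected.

```python
import zlib

BLOCK_SIZE = 10 * 1024  # same size as the zftape blocks described above


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress each block_size slice of data independently."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def flip_bit(blob, byte_index=-1, bit=0):
    """Return a copy of blob with one bit inverted (default: in the last byte)."""
    damaged = bytearray(blob)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def recover_blocks(compressed_blocks):
    """Decompress every block; unreadable blocks come back as None."""
    recovered = []
    for blob in compressed_blocks:
        try:
            recovered.append(zlib.decompress(blob))
        except zlib.error:
            recovered.append(None)
    return recovered


if __name__ == "__main__":
    data = b"".join(b"line %d of some source file\n" % i for i in range(3000))
    blocks = compress_blocks(data)
    blocks[3] = flip_bit(blocks[3])  # simulate one bad bit on the tape
    lost = [i for i, b in enumerate(recover_blocks(blocks)) if b is None]
    print(f"{len(blocks)} blocks written, unreadable blocks: {lost}")
```

Only the damaged block is rejected; every other block decompresses normally because none of them depends on its neighbours.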
If you have a ZIP/JAZ drive, these gotchas do not apply.
Backups are not very impressive. Restoring data from a backup is very impressive. Here's how to do it from a JAZ backup:
and a tape:
When you run the restore program, you'll be presented with a prompt. Typing "?" or "help" gives you a list of options. First, you can "ls" to see what's in the current directory (on the tape) and use "cd" to change directories. To mark a file for restoration, type "add filename". To restore a directory and all of its contents (recursively), type "add dirname". When you are done selecting files and directories and are ready to restore the files, type "extract".
NOTE: the extracted files will be placed in the current directory. The safest place to restore into is /tmp or somesuch -- this way, you don't erase anything in your account, etc. After the extract completes, you can copy the files back into your account after verifying that they are actually the files you want (I obliterated a week's worth of programming once by accidentally restoring old files on top of the ones I'd been working on!).
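One way to guard against that kind of accident is to let a script do the copying back. The Python sketch below is a hypothetical helper, not part of restore: it copies files from the scratch directory into the real one but skips any file whose existing copy is newer than the restored one. It assumes the restored files carry their original modification times.

```python
import os
import shutil


def copy_back(restored_dir, target_dir, overwrite_newer=False):
    """Copy restored files into target_dir without clobbering newer work.

    Returns (copied, skipped) as sorted lists of paths relative to
    restored_dir. A file is skipped when target_dir already holds a copy
    with a later modification time, unless overwrite_newer is True.
    """
    copied, skipped = [], []
    for root, _dirs, files in os.walk(restored_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, restored_dir)
            dst = os.path.join(target_dir, rel)
            if (os.path.exists(dst) and not overwrite_newer
                    and os.path.getmtime(dst) > os.path.getmtime(src)):
                skipped.append(rel)
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 keeps the restored timestamps
            copied.append(rel)
    return sorted(copied), sorted(skipped)
```

A call such as `copy_back("/tmp/restore", "/home/me/project")` (both paths are placeholders) returns the list of skipped files, which you can then inspect by hand.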
If you want to run backups automatically at given intervals, you can use the cron facility. The crontab manpage gives all the details, but here's an example. Log in as root and type "crontab -e". You'll be popped into vi or emacs. Then you'll want to add a line like:
Quit the editor and type "crontab -l" to verify what you entered. The line you added will run dump at 1:00 AM every Sunday.
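A crontab line starts with five time fields: minute, hour, day of month, month, and day of week (0 is Sunday). "1:00 AM every Sunday" therefore corresponds to the fields `0 1 * * 0`. The Python sketch below checks a date against such a schedule; it is a simplified illustration that understands only plain numbers and `*`, not ranges, lists, or cron's special handling when both day fields are restricted.

```python
from datetime import datetime


def cron_matches(schedule, when):
    """Return True if `when` matches a five-field cron schedule.

    Only plain numbers and "*" are understood in this sketch.
    """
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected 5 time fields")
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    actual = (when.minute, when.hour, when.day, when.month, cron_weekday)
    return all(field == "*" or int(field) == value
               for field, value in zip(fields, actual))


if __name__ == "__main__":
    sunday_1am = datetime(1998, 1, 4, 1, 0)  # 4 January 1998 was a Sunday
    print(cron_matches("0 1 * * 0", sunday_1am))
```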