Handling large disks
The example used in this text utilizes a file system on a floppy disk. What happens when you are dealing with larger hard disks? When you create an image of a disk drive with the dd command, there are a number of components to the image. These components can include a boot sector, partition table, and the various partitions (if defined).
When you attempt to mount a larger image with the loop device, you find that the mount command is unable to find the file system on the disk. This is because mount does not know how to “recognize” the partition table. The easy way around this (although it is not very efficient for large disks) would be to create separate images for each disk partition that you want to analyze. For a simple hard drive with a single large partition, you would create two images.
Assuming your suspect disk is attached as the master device on the secondary IDE channel:
dd if=/dev/hdc of=image.disk bs=4096 (gets the entire disk)
dd if=/dev/hdc1 of=image.part1 bs=4096 (gets the first partition)
The first command gets you a full image of the entire disk for backup purposes, including the boot record and partition table. The second command gets you the partition. The resulting image from the second command can be mounted via the loop device.
Note that although both of the above images contain the same file system with the same data, their sha1sums will obviously not match, because the whole-disk image also includes the boot record and partition table.
One method for handling larger disks (mounting the image with the loop device) is to tell the mount command to skip the first 63 sectors of the image. These sectors are used to contain information (like the MBR) that is not part of a normal data partition. We know that each sector is 512 bytes, and that there are 63 of them. This gives us an offset of 32256 bytes from the start of our image to the first partition we want to mount. This is then passed to the mount command as an option:
mount -t vfat -o ro,noexec,loop,offset=32256 image.disk /mnt/analysis
This effectively “jumps over” the first 63 sectors of the image and goes straight to the “boot sector” of the first partition, allowing the mount command to work properly.
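The offset arithmetic above can be sketched as a small shell helper. The function name partition_offset is hypothetical, and the defaults of 63 sectors and 512 bytes per sector are simply the values assumed in this text; on other disks the starting sector should be read from the partition table rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: byte offset of a partition inside a whole-disk image.
# $1 = starting sector of the partition (assumed 63, as in the text)
# $2 = sector size in bytes (assumed 512)
partition_offset() {
    start_sector=${1:-63}
    sector_size=${2:-512}
    echo $(( start_sector * sector_size ))
}

# The result is the value handed to mount's offset= option.
echo "offset=$(partition_offset 63 512)"
```

Run as written, this prints offset=32256, the same figure used in the mount command above.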
You could also use NASA’s enhanced loopback driver, which we will discuss a little later.
When you are dealing with larger disks (over 2GB), you must also concern yourself with the size of your image files. If your Linux distribution relies on the 2.2.x kernel then you will encounter a file size limit of 2GB (on x86 systems). The Linux 2.4.x kernel solves this problem. You can either compile the 2.4.x kernel on your current system, or use a distribution that includes the 2.4.x kernel in its default installation. Just about any distribution from any time this century (!) will have the 2.4 kernel.
Now that we know about the issues surrounding creating large images from whole disks, what do we do if we run into an error? Suppose you are creating a disk image with dd and the command exits halfway through the process with a read error. We can instruct dd to _attempt_ to read past the errors using the conv=noerror option. In basic terms, this tells the dd command to ignore the errors that it finds and attempt to read past them. When we specify the noerror option it is a good idea to include the sync option along with it. This will “pad” the dd output wherever errors are found and ensure that the output will be “synchronized” with the original disk. This may allow file system access and file recovery where errors are not fatal. The command will look something like:
dd if=/dev/hdx of=image.disk1 conv=noerror,sync
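The padding behavior of sync can be seen without a failing disk. The following is a minimal sketch, not a simulation of a real read error: it copies a 700-byte scratch file in 512-byte blocks and shows that the short final block is padded out to a full block. The helper name padded_size and the sizes used are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: show how conv=sync pads a short input block to the full block size.
# Hypothetical helper name; works on scratch files, never on a real device.
padded_size() {
    src=$(mktemp) || return 1
    dst=$(mktemp) || return 1
    # Build a 700-byte source: one full 512-byte block plus a 188-byte remainder.
    dd if=/dev/zero of="$src" bs=700 count=1 2>/dev/null
    # Copy with the same conversion options used in the text.
    dd if="$src" of="$dst" bs=512 conv=noerror,sync 2>/dev/null
    # Report the size of the copy in bytes.
    wc -c < "$dst" | tr -d ' '
    rm -f "$src" "$dst"
}

padded_size
```

The copy comes out at 1024 bytes, two full blocks, rather than 700. Because padding is applied a whole block at a time, the block size chosen with bs also determines how much data is replaced by padding around each unreadable area.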
In addition to the structure of the images and the issues of image sizes, we also have to be concerned with memory usage and our tools. You might find that grep, when used as illustrated in our floppy analysis example, might not work as expected with larger images and could exit with an error similar to:
grep: memory exhausted
The most apparent cause for this is that grep does its searches line by line. When you are “grepping” a large disk image, you might find that you have a huge number of bytes to read through before grep comes across a newline character. What if grep had to read 200MB of data before coming across a newline? It would “exhaust” itself (the input buffer fills up).
What if we could force-feed grep some newlines? In our example analysis we are “grepping” for text. We are not concerned with non-text characters at all. If we could take the input stream to grep and change the non-text characters to newlines, grep would have no problem. Note that changing the input stream to grep does _not_ change the image itself. Also, remember that we are still looking for a byte offset. Luckily, the character sizes remain the same, and so the offset does not change as we feed newlines into the stream (simply replacing one “character” with another).
Let’s say we want to take all of the control characters streaming into grep from the disk image and change them to newlines. We can use the _translate_ command, tr, to accomplish this. Check out man tr for more information about this powerful command:
tr '[:cntrl:]' '\n' < image.disk1 | grep -abif searchlist.txt > hits.txt
This command would read: “Translate all the characters contained in the set of control characters ([:cntrl:]) to newlines (\n). Take the input to tr from image.disk1 and pipe the output to grep, sending the results to hits.txt.” This effectively changes the stream before it gets to grep.
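The claim that the byte offset survives the translation can be checked on a few bytes of scratch data. This is a minimal sketch under stated assumptions: the helper name find_offset, the sample bytes, and the search term are all made up for illustration, and LC_ALL=C is an added precaution (not part of the original command) so that tr treats its input as raw bytes.

```shell
#!/bin/sh
# Sketch: confirm that translating control characters to newlines
# leaves byte offsets unchanged. Scratch data only; names are hypothetical.
find_offset() {
    sample=$(mktemp) || return 1
    # 4 text bytes, 2 control bytes (NUL, SOH), then the search term at byte 6.
    printf 'AAAA\000\001secret\000BBBB' > "$sample"
    # -a: treat input as text, -b: print byte offset, -i: ignore case
    LC_ALL=C tr '[:cntrl:]' '\n' < "$sample" | grep -abi 'SECRET'
    rm -f "$sample"
}

find_offset
```

The hit is reported as 6:secret, which is the position of the term in the untranslated file: each control byte was swapped for exactly one newline, so nothing shifted.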
This is only one of many possible problems you could come across. My point here is that when issues such as these arise, you need to be familiar enough with the tools Linux provides to be able to understand why such errors might have been produced, and how you can get around them. Remember, the shell tools and the GNU software that accompany a Linux distribution are extremely powerful, and are capable of tackling nearly any task. Where the standard shell fails, you might look at _perl_ or _python_ as options. These subjects are outside of the scope of the current presentation, but are introduced as fodder for further experimentation.