BASH script to add GUI to cp

21 07 2011

Hi folks!

Got an interesting BASH script for you this time. I regularly use a BASH script to clone an HDD using cp, and other than perhaps passing “-v” for verbose output, I couldn’t think of a way to give it a GUI. A little research and thought produced the following result:

#!/bin/bash
SRC="/mnt"
TGT="/mnt2"
echo "Copying $SRC to $TGT"

# (Re)create the named pipe that will carry progress figures to dialog
[[ -p /tmp/fifo ]] && rm /tmp/fifo
mkfifo /tmp/fifo

# Attach file descriptor 5 to the pipe, then fork dialog reading from it
exec 5<>/tmp/fifo
(cat <&5 | dialog --gauge "Copying OS and data files from source to target HDD" 7 30) &

# Total used space (1K blocks) on the source volume(s); -w stops /mnt also matching /mnt2
TOTAL=$(df | grep -w "$SRC" | awk '{print $3}' | tr '\n' '+' | sed 's/.$/\n/' | bc)

# Set the copy off in the background (the exact cp flags are a reasonable choice, not gospel)
cp -a "$SRC"/* "$TGT"/ &
CPPID=$!

while true; do
    COPIED=$(df | grep -w "$TGT" | awk '{print $3}' | tr '\n' '+' | sed 's/.$/\n/' | bc)
    echo "100*$COPIED/$TOTAL" | bc >&5
    # kill -0 sends no signal; it just tests whether the copy is still alive
    if ! kill -0 "$CPPID" 2>/dev/null; then
        break
    fi
    sleep 10
done

exec 5<&-
reset

Apologies for the truly horrible formatting. WordPress’s code tags aren’t the best! Now I’ll do my best to explain each stage…

First up, the script sets variables for the locations to copy from and to. These must be mounted volumes in this example, as we use `df` to measure them. It wouldn’t be too difficult to convert the script to use some other method of working out the sizes, such as `du`, if you aren’t copying between mounted volumes; there’s a sketch of that just below.
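For instance, a minimal `du`-based sizing might look like this (the paths here are purely hypothetical):

# Hypothetical unmounted directories; du -s prints the total used blocks for a tree
SRC="/home/user/source"
TGT="/home/user/target"
TOTAL=$(du -s "$SRC" | awk '{print $1}')
COPIED=$(du -s "$TGT" | awk '{print $1}')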

We then create a pipe called fifo in /tmp. This is how we will feed data to `dialog`.

Now we connect file descriptor 5 to the pipe, to facilitate the connection to our ncurses dialog, and `cat` that descriptor into the `dialog` program, with the whole pipeline forked into the background.
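If you haven’t met `dialog --gauge` before: it reads whole numbers from stdin, redraws the bar to match, and exits when stdin closes. Here’s a tiny self-contained demo:

# Feed a few percentages into the gauge, one per second
for PCT in 0 25 50 75 100; do
    echo $PCT
    sleep 1
done | dialog --gauge "Demo progress bar" 7 40 0
reset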

Next, we find the total used space on the source partition (summing the figures if several volumes are mounted under $SRC), and store it in the variable $TOTAL.
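That pipeline looks dense, so here is each stage applied to some made-up usage figures (column 3 of `df` output is the “Used” field):

# Two mounts under $SRC might give Used figures of 4000000 and 500000
printf '4000000\n500000\n' | tr '\n' '+' | sed 's/.$/\n/' | bc
# tr joins the lines into "4000000+500000+", sed swaps the trailing "+"
# for a newline, and bc evaluates the sum: 4500000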

Here we set the copy process off and fork it into the background, storing its PID in $CPPID, so that the script falls straight through to the next section, which regularly updates our instance of `dialog`.

Here’s the interesting bit… We run a while true loop which, every 10 seconds, checks the cumulative used space on the target partition ($COPIED) and converts it into a percentage of $TOTAL using bc. That number is then fed into the pipe, which we have connected to dialog. On each pass we also check whether the copy process is still alive, and if not, we break out of the loop.
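Two small details make this work: bc does integer division by default, which suits dialog’s whole-number input, and kill -0 (used in the script above to test the copy) sends no signal at all, merely checking that the PID still exists:

# Integer percentage, as fed to the gauge
echo "100*4500000/9000000" | bc          # prints 50, no decimal places
# Succeeds silently while the process lives, fails once it has exited
kill -0 "$CPPID" 2>/dev/null && echo "copy still running"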

To clean up, we close the file descriptor (exec 5<&-) and run `reset`, as dialog has a nasty habit of leaving the terminal in a broken state that `clear` doesn't fix.

As always, if you have any questions, tips, improvements or corrections, just direct them straight to the comments!

n00b





Wget an entire FTP folder from its index (RegExp Introduction)

4 12 2009

Hi, folks!

Just a couple of hours ago I was trying to download all the files in a folder on a Slackware FTP mirror with wget, but all I kept getting was the index.html file for the page, so I decided to write a little script to download every file linked in the index. I’m sure there are tools which can do this far more succinctly, but I thought it would be a good way to begin to explain the incredibly useful nature of regular expressions. Here’s how my script turned out…

for i in $(wget ftp://ftp.mirrorservice.org/sites/ftp.slackware.com/pub/slackware/slackware-13.0/source/a/tar/ -O - | grep "ftp://" | sed 's/^.*href=\"//g' | sed 's/\".*$//g'); do wget "$i"; done

Now let me break it down… The first command I wanted was one to download the index.html file and extract the necessary link data from its content. To download a file and stream its contents straight to another command, use the wget syntax:

wget {URL} -O - | {COMMAND}

First of all I piped the file’s contents to grep, which discards any line that does not contain the phrase “ftp://”. This ensures that we are only working with lines which contain a hyperlink to a file, ignoring all the extraneous HTML tags.

The next step was to remove the surrounding HTML from the links. In this index, each link is preceded by <a href=". To remove this part, I used sed. There are other tools which would work in a similar manner, but I find sed to be a great way of learning regular expressions, and its syntax is very easy to understand. The command to remove everything up to and including the href=" is as follows:

sed 's/^.*href=\"//g'

To anybody who doesn’t understand regular expression syntax, this looks like a jumble of characters. I’ll explain briefly how it works… Sed’s syntax for basic search and replace is as follows:

sed 's/{REGEXP OR TEXT TO SEARCH FOR}/{TEXT TO REPLACE IT WITH}/g'
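A trivial substitution shows the shape of it:

echo "hello world" | sed 's/world/folks/g'
# prints: hello folks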

The regexp to match in our example is ^.*href=\"

^ means “From the beginning of the line”.
.* is a wildcard, denoting absolutely any sequence of characters.
href=\" describes the exact text string we want to have as the final characters in the match. Strictly speaking, the \ escape before the " isn’t needed here (the " isn’t special to sed, and the single quotes protect it from the shell), but it does no harm.

In our command, the replacement part of the sed expression is empty. This means that any text which matches the regexp will simply be removed. The regexp will match from the beginning of any line which contains href=" up to and including that final ".

Now each line processed will read something along the lines of:

ftp://ftp.mirrorservice.org:21/sites/ftp.slackware.com/pub/slackware/slackware-13.0/source/a/tar/rmt.8.gz">rmt.8.gz (2429 bytes)

The second use of sed removes everything occurring after the link itself. It reads like so:

sed 's/\".*$//g'

This is used in the same way as the previous use of sed. The regexp to match is \".*$.

$ means “End of line”, so this matches everything from the first occurrence of " through to the end of each line. The output should now be nothing but a list of links. The final part is to wrap the whole thing in a loop, and hand each line to wget.
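To check the whole chain works, here is the sample line from above run through that second sed command:

echo 'ftp://ftp.mirrorservice.org:21/sites/ftp.slackware.com/pub/slackware/slackware-13.0/source/a/tar/rmt.8.gz">rmt.8.gz (2429 bytes)' | sed 's/\".*$//g'
# prints: ftp://ftp.mirrorservice.org:21/sites/ftp.slackware.com/pub/slackware/slackware-13.0/source/a/tar/rmt.8.gz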

Anyway, I hope this has been informative, and I’ll no doubt post some more soon!

n00b