How to download Flash 10.2 video streams in Linux.

10 02 2011

Hey, people!

Just thought I’d post this little nugget of information, as it’s taken me a little while to work out how to do it. Before I start, I’ll mention that downloading copyrighted material may well be illegal where you live, so make sure you only use this technique to download videos that consist entirely of your own work. This can be useful if you have, for example, lost your login details for a video upload site, and there is no facility to retrieve your videos without them.

Anyway, enough with the disclaimer, on with the hack…

As you may know, Flash Player used to store the temporary stream files in /tmp. They switched from this at the end of last year to storing them in the specific browser’s cache folder for reasons unbeknownst to the masses. This still made it pretty easy to locate any file you may have wished to download. After a recent update, I found that I was unable to locate the temporary caching folder anywhere.

My first step was to load the video in a browser, and check the output of the following command:
lsof | grep -i flash

This came out with a predictable, and very useful single line:
plugin-co 25646 n00b 17u REG 8,2 31286337 787220 /tmp/FlashXXepl6qa (deleted)

This showed me that there was a file descriptor open to a “deleted” file, /tmp/FlashXXepl6qa. I’m no programmer, so I have no idea how this works, but it seems that it’s adding chunks of data to this file descriptor (I imagine stored in RAM), while the file itself is technically nonexistent.

UPDATE (24/04/11):

Thanks to reader Raven Morris, I have some more information about what exactly is happening. The reason is that Linux does not lock open files by default the way Windows typically does. On Windows, a program that has a file open will usually lock it, so other programs cannot delete it. In Linux, an open file can still be deleted: its name disappears, but the operating system keeps the data around until no program has it open. Once the last program closes the last file descriptor to the file in question, the file becomes unrecoverable without relevant forensic tools, but until then, it can still be reached through the /proc/*/fd directory tree.
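If you want to see this behaviour for yourself without Flash in the picture, here is a minimal sketch. The function name and the descriptor number 9 are arbitrary choices of mine, not anything from Raven’s explanation. It uses /proc/self, which always refers to whichever process is reading it; readlink and cat inherit descriptor 9 from the shell, so they see the same deleted file.

```shell
# Sketch: a deleted file stays readable while a descriptor to it is open.
# The function name and descriptor number 9 are arbitrary choices.
demo_deleted_file() {
    tmpfile=$(mktemp) || return 1
    echo "still here" > "$tmpfile"
    exec 9< "$tmpfile"             # open the file for reading on descriptor 9
    rm "$tmpfile"                  # remove its name from the filesystem
    readlink /proc/self/fd/9 >&2   # link target now ends in " (deleted)"
    cat /proc/self/fd/9            # the contents are still readable via /proc
    exec 9<&-                      # close the descriptor; now the data is gone
}

demo_deleted_file
```

The readlink line goes to stderr so that the only thing on stdout is the recovered text.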

Anyway, the second field of the output tells us which process currently has the file descriptor open, and the fourth tells us which number the file descriptor has taken (the trailing letter, “u” here, just means the file is open for both reading and writing). This is all we need to access the file itself.
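As a rough illustration of how those two fields fit together, the sketch below turns a line of lsof output into the matching /proc path. The function name is made up for this example, and it assumes the default lsof column layout shown above.

```shell
# Sketch: build the /proc path from one line of lsof output.
# lsof_line_to_fd_path is a hypothetical helper name.
lsof_line_to_fd_path() {
    # $2 is the PID; $4 is the descriptor number plus a mode letter (e.g. "17u")
    awk '{ fd = $4; sub(/[^0-9]+$/, "", fd); print "/proc/" $2 "/fd/" fd }'
}

# Feed it the sample line from earlier in the post
echo 'plugin-co 25646 n00b 17u REG 8,2 31286337 787220 /tmp/FlashXXepl6qa (deleted)' \
    | lsof_line_to_fd_path
# prints /proc/25646/fd/17
```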

If you navigate to your /proc folder, you will see a bunch of folders all named numerically, including one which matches the number in the second field. Navigate to this folder, then to its subfolder “fd”. In this folder, you will see a selection of numbers, which are the file descriptors themselves. Run “ls -l” here, and you will see that each of these numbers is a symbolic link to a pipe, a socket or a file. Among them, the number from the fourth field will be symbolically linked to the /tmp/Flash* file we found before. To test that this is the right file, you can run it through mplayer or vlc (“mplayer filedescriptornumber” or “vlc filedescriptornumber”). If you’re having trouble finding the right entry, try “ls -l | grep Flash”, as pointed out by reader Nobi.
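The same hunt through the fd folder can be sketched as a small loop. This is only an illustration: find_fds is a name I have made up, and it assumes GNU readlink is available. It prints every descriptor of a process whose link target contains the given text.

```shell
# Sketch: list a process's descriptors whose link target contains a pattern.
# find_fds is a hypothetical helper; usage: find_fds PID PATTERN
find_fds() {
    pid=$1
    pattern=$2
    for fd in /proc/"$pid"/fd/*; do
        # Skip entries we cannot read (wrong user, or already closed)
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *"$pattern"*) printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

With the example from above, “find_fds 25646 Flash” should print a line for descriptor 17, assuming that process is still running and belongs to you.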

Once the video is fully streamed, you can use a simple “cp” to copy the file from the file descriptor to a real location on your hard disk. (“cp filedescriptornumber ~/Videos/filename.flv”).
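If you would rather not sit and watch the progress bar, one possible approach is to wait until the file stops growing before copying it. The sketch below is an assumption of mine, not part of the original technique: copy_when_stable is a made-up name, it relies on GNU stat, and a size that has stopped changing does not prove the stream has finished (a paused or stalled video looks the same).

```shell
# Sketch: copy a file once its size has stopped changing.
# copy_when_stable is a hypothetical helper; usage:
#   copy_when_stable SOURCE DEST [SECONDS_BETWEEN_CHECKS]
copy_when_stable() {
    src=$1
    dest=$2
    interval=${3:-5}
    prev=-1
    while :; do
        # -L follows the /proc symlink to the real (deleted) file
        size=$(stat -L -c %s "$src") || return 1
        [ "$size" = "$prev" ] && break   # same size twice in a row: assume done
        prev=$size
        sleep "$interval"
    done
    cp "$src" "$dest"
}
```

For example, “copy_when_stable /proc/25646/fd/17 ~/Videos/filename.flv 10” would check the size every ten seconds.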

Another way to locate these files is to use the following command:
stat -c %N /proc/*/fd/* 2>&1|awk -F[\`\'] '/Flash/{print$2}'

I encourage you to play with the various sections of it to see how it works. If you’re having trouble getting it to work, make sure you have the apostrophes, backticks and spaces in the correct locations.
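To make the awk half of that command easier to poke at, here is a small sketch that runs it on a canned line instead of on live stat output. The sample line is my assumption of what “stat -c %N” prints for the descriptor from earlier; the exact quote characters vary between coreutils versions, which is why the field separator accepts both backticks and apostrophes. The function name is made up.

```shell
# Sketch: the awk stage of the stat one-liner, fed with a sample line.
# extract_proc_path is a hypothetical name for that stage.
extract_proc_path() {
    # Split on backticks or apostrophes: field 2 is the /proc path,
    # field 4 would be the link target
    awk -F"[\`']" '/Flash/ { print $2 }'
}

sample="'/proc/25646/fd/17' -> '/tmp/FlashXXepl6qa (deleted)'"
printf '%s\n' "$sample" | extract_proc_path
# prints /proc/25646/fd/17
```

Changing “print $2” to “print $4” shows the original file name instead.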

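Reader Robert submitted the following Bash function to automate the whole process. Here’s the line to insert into your ~/.bashrc. If it copies nothing, check which column of “ls -al” holds the file name on your system, as the $8 below depends on the date format that ls uses: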

cpflashvideo() { cd /proc/`lsof | awk '/Flash/&&/plugin-co/' | awk '//{printf "%s", $2}'`/fd/ && cp `ls -al | grep '\(deleted\)' | awk '//{printf "%s", $8}'` $* && cd - > /dev/null; }

UPDATE (13/07/11):

A couple of my friends were pondering this problem, and came up with the following for those of you who have a few too many tabs open at one time… I’ve included a couple of versions, to show how this could be done using entirely awk, or a mixture of awk and sed.

This first one uses regexp matching in awk to ensure that the letter is stripped from the end of the fourth field:
for FILE in $(lsof -n | grep "Flash.*deleted" | awk '{printf "/proc/" $2 "/fd/"; sub(/[a-z]+/,"",$4); print $4}'); do
cp "$FILE" "$(mktemp -u --suffix .flv "$HOME/Videos/Video-XXX")"
done

The next uses sed to strip off the last character of the fourth field:
for FILE in $(lsof -n|grep .tmp.Flash | awk '{print "/proc/" $2 "/fd/" $4}' | sed 's/.$//'); do
cp -v "$FILE" "$(mktemp -u --suffix=.flv --tmpdir="$HOME/Videos/")"
done

I hope this post is as interesting for others as I found it myself, and as always, direct any questions to me in the comments!

If you are a Spanish speaker, and have had trouble understanding this, user racsoprieto has translated the basics of the article here.