Decompiling SWF Flash files and extracting closed-caption text

Flash was designed in the mid 1990s to address the increasing need for video and interactivity on the web. It had serious design flaws, one of them being the lack of accessibility options. Flash content was not designed to be accessible to screen readers, which makes it useless for the people who depend on them. Even today, most Flash content is not indexed by search engines. How often do you find Google linking directly to an SWF file? I have not seen that happen yet. Macromedia did attempt to improve accessibility by giving publishers the option to add closed captions to videos. Sorry, but that is not a remedy, as I am still unable to control how the content is displayed. The definition of web accessibility used by Adobe Flash does not align with what the W3C considers accessible. I honestly think that there is no one at Adobe who actually understands the concept.

In my personal experience, the best way to make Flash content accessible is to hack it apart with a decompiler. Luckily, we have a few to choose from. Below I describe the process of extracting closed captions and all other text content from SWF files. Using a decompiler is our only option, since Macromedia/Adobe did not bother to make the text easy to copy and manipulate.

  • Swftools (http://www.swftools.org) - I built it from source: git clone git://git.swftools.org/swftools, followed by the standard ./configure && make && make install routine. One of the tools in the package is called "swfstrings". I did not have any luck with it, as all of its output came out vertically, one letter per line. Another tool, "swfdump", turned out to be more useful. I ran swfdump -a foo.swf > foo.txt to retrieve the contents of the SWF. The resulting text file did contain all of the closed captions and text fields, along with tons of other stuff that I did not want.
  • Flasm (http://flasm.sourceforge.net) - I downloaded the version 1.62 binary and was able to decompile the contents of the SWF by executing flasm -d foo.swf > foo.txt. The resulting text file had all the closed captions, surrounded by a ton of other garbage. At least all the text was easily distinguishable. Flasm is probably the easiest tool to use for extracting closed captions to a text file.
  • Swfdec (http://swfdec.freedesktop.org) - I compiled the "swfdec-0.9.2.tar.gz" source code. The only tool that was somewhat useful was called "dump". I executed /swfdec/tools/.libs/dump foo.swf > foo.txt, but the resulting file did not have any closed captions. It only had the text content of the explicitly defined text fields, so it is not useful for extracting every single line of text contained in the SWF file.
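
Picking the captions out of a dump such as foo.txt can be partly automated. The following is only a rough sketch, not something I ran against the files above: it assumes the dump writes text as single- or double-quoted string literals, and it treats any literal with at least three words as prose. The function name extract_text and the min_words threshold are my own invention, so adjust both to whatever your dump actually looks like.

```python
import re

# Matches single- or double-quoted string literals, allowing backslash escapes.
# Assumption: the dump writes text as quoted literals, e.g. push 'some text'.
QUOTED = re.compile(r"""(['"])((?:\\.|(?!\1).)*)\1""")


def extract_text(lines, min_words=3):
    """Yield quoted strings that look like prose, skipping exact repeats."""
    seen = set()
    for line in lines:
        for match in QUOTED.finditer(line):
            text = match.group(2).strip()
            # Short literals are mostly identifiers and property names.
            if len(text.split()) >= min_words and text not in seen:
                seen.add(text)
                yield text
```

To try it on a real dump, pass an open file, for example extract_text(open("foo.txt", errors="replace")), and print each result. Escaped quotes inside a literal are kept as they appear in the dump rather than unescaped.
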
Under Linux, most of the decompilers produced reasonable results. The text content was extracted, but it was not convenient to separate it from the rest of the ActionScript output. I still needed to look through an 8 MB output file to find the actual text, although that was not too hard, as the text paragraphs were easy to spot. The difficulty of sorting through the decompiled text files motivated me to see whether the Flash decompilers under Windows would perform any better. Here is what I found:

  • Sothink SWF Decompiler - I have used this decompiler before, and it proved to be reliable with simple animations. This time, however, it complained that "The swf file foo.swf is corrupted".
  • Eltima Trillix Flash Decompiler - Same as above: "Failed to open foo.swf. File is corrupted".
  • SWF Decompiler Magic - Opens the file just fine, but only offers to convert to exe, then crashes during the conversion.
  • Sonne Flash Decompiler - Opens fine and decompiles some ActionScript. No text, movies or sounds could be extracted. No errors generated.
  • Metrix Flash Decompiler Gold - Got almost exactly the same result as with Sonne. Maybe a little more ActionScript, but no text or media files.
  • Action Script Viewer by Manitu - Generated two warnings: "More than 5mb memory will be allocated. SWF signature not found". Then failed to display anything.
To make the comparison of SWF decompilers fair, I used the same set of SWF presentation movies with every tool. These files were 30 to 80 MB in size, each containing a presentation of 30 minutes to 1 hour supplemented by closed-caption text. All files played fine under Linux and Windows using the official Adobe Flash Player.
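
Since several tools reported a corrupted file or a missing SWF signature, a quick look at the file header may help tell a genuinely damaged file from a tool that simply cannot cope with it. This is a minimal sketch that I did not use in the tests above. It relies on the published SWF header layout: a three-byte signature (FWS for uncompressed, CWS for zlib-compressed, ZWS for LZMA-compressed), a one-byte version, and a four-byte little-endian length. The helper name read_swf_header is hypothetical.

```python
import struct

# Signature bytes mapped to the body compression they indicate.
SIGNATURES = {b"FWS": "uncompressed", b"CWS": "zlib", b"ZWS": "LZMA"}


def read_swf_header(data):
    """Return (compression, version, declared_length), or None if not SWF."""
    if len(data) < 8 or data[:3] not in SIGNATURES:
        return None
    # Byte 3 is the version; bytes 4-7 hold the uncompressed file length.
    version, length = struct.unpack("<BI", data[3:8])
    return SIGNATURES[data[:3]], version, length
```

Reading the first eight bytes of foo.swf in binary mode and passing them to read_swf_header is enough. A result of None means the file does not start with a recognised signature. A valid header does not prove the rest of the file is intact, and it does not explain why a given decompiler rejected it.
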
