
Saturday, September 5, 2015

An XSVF Assembler/Disassembler in Python

Introduction


As a follow-up to my last project, the JTAG/XSVF library for Arduino, I felt I needed an XSVF assembler and disassembler so that I could hack JTAG a little. I found that XSVF is very convenient, much more so than SVF, when you are dealing with a single component in the JTAG chain. I also found that the XSVF files produced by programming tools are very inefficient and full of unnecessary stuff. It would be great if I could write my own XSVF files.

The problem is that XSVF is a binary format. At first I edited the files in binary with a hex editor called ghex, but this is far from comfortable: it is easy to get lost, and the result is almost impossible to maintain. So I decided to write an assembler.

The code is available on GitHub.

The story


In fact, the disassembler was practically ready. Since I had written the XSVF player in a modular way, the disassembler was just another instance of the same code: instead of playing XSVF, it would disassemble it. Piece of cake.
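The idea of one decoder with interchangeable back ends can be sketched in Python as follows. This is a minimal illustration, not the code from the repository: it handles only a few opcodes (values as listed in Xilinx's XSVF specification, XAPP503), and the handler method names (`sir`, `sdr`, `runtest`, ...) are hypothetical. A player would be a second handler class with the same methods that drives the JTAG pins instead of producing text.

```python
import io
import struct

# A few XSVF opcodes (values from Xilinx XAPP503); the rest are omitted here.
XCOMPLETE, XSIR, XSDR, XRUNTEST, XREPEAT, XSDRSIZE, XSTATE = (
    0x00, 0x02, 0x03, 0x04, 0x07, 0x08, 0x12)


def decode(stream, handler):
    """Decode XSVF instructions from a binary stream and pass each to handler.

    The decoder only knows the binary layout; the handler decides what to do
    with each instruction (drive the JTAG pins, print it, ...).
    """
    sdr_bits = 0  # last XSDRSIZE value: tells how many bytes an XSDR carries

    def read(n):
        data = stream.read(n)
        if len(data) != n:
            raise EOFError("truncated XSVF stream")
        return data

    while True:
        op = read(1)[0]
        if op == XCOMPLETE:
            handler.complete()
            return
        if op == XSIR:
            nbits = read(1)[0]
            handler.sir(nbits, read((nbits + 7) // 8))
        elif op == XSDRSIZE:
            (sdr_bits,) = struct.unpack(">I", read(4))  # 32-bit big-endian
            handler.sdrsize(sdr_bits)
        elif op == XSDR:
            handler.sdr(read((sdr_bits + 7) // 8))
        elif op == XRUNTEST:
            (usec,) = struct.unpack(">I", read(4))
            handler.runtest(usec)
        elif op == XREPEAT:
            handler.repeat(read(1)[0])
        elif op == XSTATE:
            handler.state(read(1)[0])
        else:
            raise ValueError(f"opcode 0x{op:02x} is not handled in this sketch")


class Disassembler:
    """A handler that turns each decoded instruction into a line of text."""

    def __init__(self):
        self.lines = []

    def complete(self):
        self.lines.append("XCOMPLETE")

    def sir(self, nbits, tdi):
        self.lines.append(f"XSIR {nbits} {tdi.hex()}")

    def sdrsize(self, nbits):
        self.lines.append(f"XSDRSIZE {nbits}")

    def sdr(self, tdi):
        self.lines.append(f"XSDR {tdi.hex()}")

    def runtest(self, usec):
        self.lines.append(f"XRUNTEST {usec}")

    def repeat(self, times):
        self.lines.append(f"XREPEAT {times}")

    def state(self, tap_state):
        self.lines.append(f"XSTATE {tap_state}")


if __name__ == "__main__":
    program = bytes([XREPEAT, 0, XSIR, 8, 0x09,
                     XSDRSIZE, 0, 0, 0, 32, XSDR, 0, 0, 0, 0, XCOMPLETE])
    disasm = Disassembler()
    decode(io.BytesIO(program), disasm)
    print("\n".join(disasm.lines))
```

Keeping the binary layout in one place means a fix in the decoder benefits the player and the disassembler at the same time.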

The XSVF player was written in C++, and even though I could reuse that code, I much preferred to rewrite it in Python. For me, Python is about two orders of magnitude more productive than C/C++, even after all the years I have spent coding in those languages.

There was also an issue with one XSVF instruction: XSDRINC. It is a problematic instruction and is now obsolete; Xilinx's iMPACT software no longer generates it. I had not implemented it in the JTAG library because supporting it would require either starting to execute the instruction before it had been fully decoded, or using an unacceptable amount of memory for a microcontroller like the Arduino. It can certainly be done, but it went slightly against the original philosophy of the code. Maybe I'll do it later, but since it was an obsolete instruction that I could not test, I decided to skip it in order to have a working XSVF player quickly.

I basically translated the C++ code into Python, and the disassembler was ready; really no big deal. That code could now even be reused to write an XSVF player in Python :). Anyway, hacking JTAG was the original motivation, so while the disassembler was a necessary debugging tool, I now needed the assembler to start writing my own code in a maintainable way. Since I was using Python, I decided to implement XSDRINC, because I no longer had the memory limitations of the Arduino.

And it turns out that using python was indeed a good choice for other reasons.

Pyparsing


The first problem you face when you write a compiler or interpreter for a language is scanning and parsing. Scanning classifies the raw input into a sequence of valid tokens. Parsing is the syntactic analysis that validates that sequence of tokens: it checks whether the sentences are properly constructed according to a given grammar.
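To make the distinction concrete, here is a small hand-written scanner and parser for a simplified, hypothetical assembly syntax: one instruction per line, a mnemonic followed by numeric operands, and `;` comments. The token names and the `ARITY` table are assumptions for illustration only; the actual assembler uses pyparsing.

```python
import re

# Token kinds for a simplified, hypothetical XSVF assembly syntax.
TOKEN_SPEC = [
    ("COMMENT", r";[^\n]*"),        # ';' up to the end of the line
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t\r]+"),
    ("MNEMONIC", r"X[A-Z0-9]+"),    # XSIR, XSDRSIZE, ...
    ("NUMBER", r"[0-9A-Fa-f]+"),    # operands, kept as text
    ("MISMATCH", r"."),             # anything else is an error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# How many operands each mnemonic takes in this toy syntax (an assumption).
ARITY = {"XCOMPLETE": 0, "XSIR": 2, "XSDRSIZE": 1, "XSDR": 1,
         "XRUNTEST": 1, "XREPEAT": 1, "XSTATE": 1}


def scan(text):
    """Scanning: classify the raw text into a list of (kind, value) tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {m.group()!r}")
        tokens.append((kind, m.group()))
    tokens.append(("NEWLINE", "\n"))  # terminate the last line
    return tokens


def parse(tokens):
    """Parsing: check that every line is MNEMONIC NUMBER* with the right arity."""
    program, line = [], []
    for kind, value in tokens:
        if kind != "NEWLINE":
            line.append((kind, value))
            continue
        if not line:
            continue  # blank or comment-only line
        (head_kind, mnemonic), operands = line[0], line[1:]
        if head_kind != "MNEMONIC" or mnemonic not in ARITY:
            raise ValueError(f"expected an instruction, got {mnemonic!r}")
        if any(k != "NUMBER" for k, _ in operands):
            raise ValueError(f"{mnemonic}: operands must be numbers")
        if len(operands) != ARITY[mnemonic]:
            raise ValueError(f"{mnemonic} takes {ARITY[mnemonic]} operand(s)")
        program.append((mnemonic, [v for _, v in operands]))
        line = []
    return program


if __name__ == "__main__":
    source = "XREPEAT 0 ; no retries\nXSIR 8 09\n\n; comment-only line\nXCOMPLETE\n"
    print(parse(scan(source)))
    # [('XREPEAT', ['0']), ('XSIR', ['8', '09']), ('XCOMPLETE', [])]
```

Even in this toy version, the grammar is scattered across regexes and `if` statements rather than written down in one place.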

It turns out that programming a scanner and a parser, even for an extremely simple language like XSVF, is a nightmare of details. Also, from a documentation point of view, the code ends up very distant from the actual grammar definition. Let's call Python to the rescue...

Pyparsing is a fantastic module that makes it very easy to write a scanner/parser in Python. As an added bonus, you define the grammar in Python syntax that looks like BNF (Backus-Naur Form), so close to it that a separate BNF document becomes unnecessary.
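What lets pyparsing grammars read like BNF is operator overloading: in pyparsing, `+` builds a sequence (`And`) and `|` an ordered choice (`MatchFirst`). The toy combinators below (a hypothetical `P` class with `token`/`keyword` helpers, not pyparsing's API) show the same idea in plain Python for a made-up subset of instructions.

```python
import re


class P:
    """A toy parser: wraps fn(text, pos) -> (values, new_pos), or None on failure."""

    def __init__(self, fn):
        self.fn = fn

    def __add__(self, other):
        """Sequence, like juxtaposition in BNF:  a + b."""
        def seq(text, pos):
            first = self.fn(text, pos)
            if first is None:
                return None
            second = other.fn(text, first[1])
            if second is None:
                return None
            return first[0] + second[0], second[1]
        return P(seq)

    def __or__(self, other):
        """Ordered choice, like '|' in BNF: try self first, then other."""
        def alt(text, pos):
            result = self.fn(text, pos)
            return result if result is not None else other.fn(text, pos)
        return P(alt)

    def parse(self, text):
        result = self.fn(text, 0)
        if result is None or text[result[1]:].strip():
            raise ValueError(f"syntax error in {text!r}")
        return result[0]


def token(pattern):
    """Match a regex after optional leading whitespace and return its text."""
    rx = re.compile(r"\s*(" + pattern + r")")

    def match(text, pos):
        m = rx.match(text, pos)
        return ([m.group(1)], m.end()) if m else None
    return P(match)


def keyword(word):
    """A whole-word keyword, so that XSDR does not match the start of XSDRSIZE."""
    return token(re.escape(word) + r"\b")


# The grammar reads almost like BNF:
#   number      ::= hexdigit+
#   instruction ::= "XCOMPLETE" | "XSIR" number number | "XSDR" number
number = token(r"[0-9A-Fa-f]+")
instruction = (keyword("XCOMPLETE")
               | keyword("XSIR") + number + number
               | keyword("XSDR") + number)

if __name__ == "__main__":
    print(instruction.parse("XSIR 8 09"))  # ['XSIR', '8', '09']
```

Because Python's `+` binds tighter than `|`, the grammar expression groups the same way the BNF comment does, with no extra parentheses.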

In about 2 hours, I was able to learn how to use pyparsing and write a working XSVF parser. The pyparsing code examples are very good, and reading the library code resolved some subtle issues.

I had to struggle a little with one of my original ideas: supporting bytes in both hexadecimal and binary. Specifying these in the grammar was not that obvious, and there was a subtle "order of matching" problem I had not seen coming. I guess I had a much cleaner separation between scanning and parsing in mind, and therefore I failed to define the language tokens well. Anyway, it paid off to adapt my mind to the pyparsing way.
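An "order of matching" problem of this kind is easy to reproduce with ordered alternation, which both Python's `re` module and pyparsing's `|` (`MatchFirst`) use: the first alternative that matches wins, even if a later one gives the intended reading. The sketch assumes a hypothetical syntax in which a binary byte is exactly 8 digits of 0/1 and everything else is hex digit pairs. That rule is an assumption, not necessarily the one the assembler uses.

```python
import re

# Hypothetical operand syntax: a binary byte is exactly 8 digits of 0/1;
# everything else is hexadecimal, two digits per byte. Tokens are
# whitespace-separated.
BIN = r"(?P<bin>[01]{8})(?![0-9A-Fa-f])"  # must not stop inside a longer hex run
HEX = r"(?P<hex>(?:[0-9A-Fa-f]{2})+)"

# Alternation is ordered: the first alternative that matches wins.
HEX_FIRST = re.compile(rf"\s*(?:{HEX}|{BIN})")
BIN_FIRST = re.compile(rf"\s*(?:{BIN}|{HEX})")


def to_bytes(text, token_re):
    """Convert a string of hex/binary byte tokens into bytes."""
    out = bytearray()
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = token_re.match(text, pos)
        if m is None:
            raise ValueError(f"bad byte data at {text[pos:]!r}")
        if m.group("bin") is not None:
            out.append(int(m.group("bin"), 2))
        else:
            out += bytes.fromhex(m.group("hex"))
        pos = m.end()
    return bytes(out)


if __name__ == "__main__":
    print(to_bytes("00001111", HEX_FIRST).hex())        # 00001111 (4 hex bytes)
    print(to_bytes("00001111", BIN_FIRST).hex())        # 0f
    print(to_bytes("0000111100", BIN_FIRST).hex())      # 0000111100
    print(to_bytes("ff 10100101 0a", BIN_FIRST).hex())  # ffa50a
```

With hex tried first, the binary alternative is never reached, because every valid binary byte is also valid hex. Putting `BIN` first fixes that. The negative lookahead then stops it from grabbing the first 8 digits of a longer hex run such as `0000111100`.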

The language


Since I had the assembler and the disassembler, I wrote the following code to test every instruction:

As you can see, I tried not to clutter the syntax while keeping it readable. Comments use the old-school assembly style: a semicolon comments out everything until the end of the line. Byte sequences can be written in binary or hexadecimal, without the overhead of a prefix like "0x". These sequences are usually very long, because they are normally used for device programming or boundary scan. Being able to mix hexadecimal, binary and comments is a good way to keep them readable.
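One way to turn such a mixed listing into bytes is to scan first and classify afterwards: strip everything after `;`, split each line on whitespace, and only then decide whether each token is binary or hex. The rule used here is an assumption for illustration: a token of exactly 8 zeros and ones is one binary byte, and anything else is hex digit pairs. `operand_bytes` is a hypothetical helper, not part of the assembler.

```python
def operand_bytes(source):
    """Collect byte data spread over several lines, ignoring ';' comments.

    Assumed rule: a token of exactly 8 zeros/ones is one binary byte;
    any other token is hexadecimal, two digits per byte.
    """
    out = bytearray()
    for line in source.splitlines():
        code = line.partition(";")[0]  # everything after ';' is a comment
        for tok in code.split():
            if len(tok) == 8 and set(tok) <= {"0", "1"}:
                out.append(int(tok, 2))
            else:
                out += bytes.fromhex(tok)  # raises ValueError on bad hex
    return bytes(out)


if __name__ == "__main__":
    data = """
        ff ff      ; padding
        10100101   ; control byte, clearer in binary
        00 01      ; address
    """
    print(operand_bytes(data).hex())  # ffffa50001
```

Under this rule, four hex bytes such as `00 00 11 11` must be written with spaces; written as `00001111`, they would be read as a single binary byte.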

The final test was to run the sequence assembler -> disassembler -> assembler and compare the two assembled files. I did this with a SHA1 hash, and after some debugging, the hashes matched for the test file.

$ cat asm_disasm_test.sh
#! /bin/bash
./XSVFAssembler.py > test.xsvf
./xsvf -c disasm -n test.xsvf > test.xsvf.s
./xsvf -c asm test.xsvf.s > test2.xsvf
sha1sum test*.xsvf

$ ./asm_disasm_test.sh
b1fcb2845c934c622d9b8ffff857d08c9542c8b4 test2.xsvf
b1fcb2845c934c622d9b8ffff857d08c9542c8b4 test.xsvf
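The same comparison can be done from Python with `hashlib`, which may be handy if the round trip is driven from Python tests instead of a shell script. The helper names below are hypothetical. For a plain equality test, `filecmp.cmp(a, b, shallow=False)` from the standard library compares the bytes directly without hashing.

```python
import hashlib
from pathlib import Path


def sha1_of(path):
    """SHA-1 hex digest of a file's contents, as printed by sha1sum."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def round_trip_matches(first, second):
    """True when two assembled files have the same SHA-1 digest."""
    return sha1_of(first) == sha1_of(second)
```

With the file names from the script above, `round_trip_matches("test.xsvf", "test2.xsvf")` would return `True` for the run shown.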

Conclusion


Now I have a fully working assembler/disassembler for XSVF. That opens some doors for JTAG hacking and boundary scan testing. Let's see where it leads. If you decide to try it, please share your comments.

References


  1. JTAG/XSVF library for Arduino
  2. Backus-Naur Form
  3. Pyparsing web page