Monday, October 24, 2016

Why is ssh slow to connect?


There are several reasons why ssh may be slow to connect. Here I cover the ones that happened to me.

On the server side

  1. Make sure you have a line "UseDNS no" uncommented.
  2. Make sure you have "good" (i.e., reachable) DNS servers in /etc/resolv.conf.
  3. Make sure you have no reverse DNS lines in /etc/hosts.deny.
The server side configuration is typically in the file /etc/ssh/sshd_config. Every time you make a change to this file, remember to restart the ssh daemon ("# systemctl restart sshd").
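To check item 1 without reading the file by eye, a small script can report the value that is actually in effect. This is only a sketch: the function name `effective_option` and the sample text are made up for illustration, "Match" blocks are ignored, and everything after a "#" is treated as a comment. It relies on the fact that keywords are case-insensitive and that the first value found for a keyword is the one used.

```python
def effective_option(config_text, keyword):
    """Return the first uncommented value given for `keyword`, or None."""
    for line in config_text.splitlines():
        # Simplification: drop everything after '#', then surrounding blanks.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Accept both "Keyword value" and "Keyword=value".
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) == 2 and parts[0].lower() == keyword.lower():
            return parts[1].strip().strip('"')
    return None


# Hypothetical file content, not taken from a real server.
sample = """\
#UseDNS yes
Port 22
UseDNS no
"""

print(effective_option(sample, "UseDNS"))
```

On a real machine you would pass the content of /etc/ssh/sshd_config instead of `sample`; a result of None means the line is missing or still commented out.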

    Item number 3 deserves some comments, since it was the hardest to get right. Denyhosts may be adding lines like "" that trigger reverse DNS lookups even if you have "UseDNS no" in the configuration file. You have to remove these lines, but if you have Denyhosts installed, you may already know how hard it is to remove entries from this file, since they keep reappearing. If you use OpenSUSE, there is a script called "/usr/sbin/dh_reenable" that will do the trick for you. If you are not using OpenSUSE, or have installed Denyhosts by hand, you have to do it manually; take a look at the Denyhosts FAQ here.

    I still don't have a good solution to problem number 3. The best thing would be for Denyhosts not to add reverse DNS entries to /etc/hosts.deny, but I did not find a way to configure it to behave like that.

    On the client side

    The global client side configuration is typically in the file /etc/ssh/ssh_config. But you can configure things on a user level by editing the file ~/.ssh/config.

    I use the local configuration file like this:

    Host *
    Compression yes

    Host analise

    Host home

    That way I can just type "ssh analise" or "ssh home". In the "home" case, it has the added advantage of shortening my dynamic DNS name.

    1. Make sure you have the line "GSSAPIAuthentication=no".

    Monday, August 1, 2016

    OpenSUSE Leap 42.1 and nvidia kernel driver

    The symptoms

    Nvidia kernel drivers were not loaded after kernel update.

    The problem

    The package nvidia-gfxG04-kmp-default-367.35_k4.1.12_1-25.1.x86_64 runs a script that regenerates the kernel drivers and creates symlinks in /lib/modules/4.1.27-27-default/weak-updates/updates. After running:

    # zypper in --force $(rpm -qa "nvidia-gfx*kmp*")

    I noticed that the links were not being generated:

    # zypper in --force $(rpm -qa "nvidia-gfx*kmp*")
    Retrieving repository 'network:utilities' metadata ..........................................................................................................[done]
    Building repository 'network:utilities' cache ...............................................................................................................[done]
    Loading repository data...
    Reading installed packages...
    Forcing installation of 'nvidia-gfxG04-kmp-default-367.35_k4.1.12_1-25.1.x86_64' from repository 'nVidia Graphics Drivers'.
    Resolving package dependencies...

    The following package is going to be reinstalled:

    1 package to reinstall.
    Overall download size: 5.8 MiB. Already cached: 0 B. No additional space will be used or freed after the operation.
    Continue? [y/n/? shows all options] (y): 
    Retrieving package nvidia-gfxG04-kmp-default-367.35_k4.1.12_1-25.1.x86_64                                                     (1/1),   5.8 MiB ( 64.6 MiB unpacked)
    Retrieving: nvidia-gfxG04-kmp-default-367.35_k4.1.12_1-25.1.x86_64.rpm ..........................................................................[done (2.2 MiB/s)]
    Checking for file conflicts: ................................................................................................................................[done]
    (1/1) Installing: nvidia-gfxG04-kmp-default-367.35_k4.1.12_1-25.1.x86_64 ....................................................................................[done]
    Additional rpm output:
    make: Entering directory '/usr/src/linux-4.1.27-27-obj/x86_64/default'
      Building modules, stage 2.
      MODPOST 0 modules
    make: Leaving directory '/usr/src/linux-4.1.27-27-obj/x86_64/default'
    /usr/src/kernel-modules/nvidia-367.35-default /
    make "CC=cc" KBUILD_OUTPUT=/usr/src/linux-obj/x86_64/default KBUILD_VERBOSE= -C /lib/modules/4.1.27-27-default/source M=/usr/src/kernel-modules/nvidia-367.35-default ARCH=x86_64 NV_KERNEL_SOURCES=/lib/modules/4.1.27-27-default/source NV_KERNEL_OUTPUT=/usr/src/linux-obj/x86_64/default NV_KERNEL_MODULES="nvidia nvidia-uvm nvidia-modeset nvidia-drm" INSTALL_MOD_DIR=kernel/drivers/video modules
    make[1]: Entering directory '/usr/src/linux-4.1.27-27'
    make[2]: Entering directory '/usr/src/linux-4.1.27-27-obj/x86_64/default'
      Building modules, stage 2.
      MODPOST 4 modules
    make[2]: Leaving directory '/usr/src/linux-4.1.27-27-obj/x86_64/default'
    make[1]: Leaving directory '/usr/src/linux-4.1.27-27'
    ld -T /lib/modules/4.1.27-27-default/source/scripts/ -r -o nv-linux.o \
      nvidia.mod.o nvidia/nv-interface.o

    Modprobe blacklist files have been created at /etc/modprobe.d to prevent Nouveau from loading. This can be reverted by deleting /etc/modprobe.d/nvidia-*.conf.

    *** Reboot your computer and verify that the NVIDIA graphics driver can be loaded. ***

    depmod: WARNING: //lib/modules/4.1.27-27-default/misc/vboxvideo.ko disagrees about version of symbol VBoxGuest_RTLogBackdoorPrintf
    depmod: WARNING: //lib/modules/4.1.27-27-default/misc/vboxvideo.ko disagrees about version of symbol VBoxGuest_RTErrConvertToErrno
    depmod: WARNING: //lib/modules/4.1.27-27-default/misc/vboxvideo.ko disagrees about version of symbol VBoxGuest_RTAssertShouldPanic
    depmod: WARNING: //lib/modules/4.1.27-27-default/misc/vboxvideo.ko disagrees about version of symbol VBoxGuest_RTAssertMsg1Weak
    depmod: WARNING: //lib/modules/4.1.27-27-default/misc/vboxvideo.ko disagrees about version of symbol VBoxGuest_RTAssertMsg2Weak
    Warning: /lib/modules/4.1.27-27-default is inconsistent
    Warning: weak-updates symlinks might not be created

    Output of nvidia-gfxG04-kmp-default-367.35_k4.1.12_1-25.1.x86_64.rpm %posttrans script:
        Creating initrd: /boot/initrd-4.1.12-1-default
        Executing: /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force --force-drivers "nvidia   -drm" /boot/initrd-4.1.12-1-default 4.1.12-1-default
        *** Including module: bash ***
        *** Including module: warpclock ***
        *** Including module: i18n ***
        *** Including module: ifcfg ***
        *** Including module: drm ***
        *** Including module: plymouth ***
        *** Including module: kernel-modules ***
        Omitting driver i2o_scsi
        *** Including module: resume ***
        *** Including module: rootfs-block ***
        *** Including module: terminfo ***
        *** Including module: udev-rules ***
        Skipping udev rule: 91-permissions.rules
        Skipping udev rule: 80-drivers-modprobe.rules
        *** Including module: haveged ***
        *** Including module: systemd ***
        *** Including module: usrmount ***
        *** Including module: base ***
        *** Including module: fs-lib ***
        *** Including module: shutdown ***
        *** Including module: suse ***
        *** Including modules done ***
        *** Installing kernel module dependencies and firmware ***
        *** Installing kernel module dependencies and firmware done ***
        *** Resolving executable dependencies ***
        *** Resolving executable dependencies done***
        *** Hardlinking files ***
        *** Hardlinking files done ***
        *** Stripping files ***
        *** Stripping files done ***
        *** Generating early-microcode cpio image ***
        *** Constructing GenuineIntel.bin ****
        *** Store current command line parameters ***
        Stored kernel commandline:
        root=UUID=xxxxxxxxxxxxxxxxxxxxxx rootflags=rw,relatime,data=ordered rootfstype=ext4
        *** Creating image file ***
        *** Creating image file done ***
        Some kernel modules could not be included
        This is not necessarily an error:
        Update bootloader...
        Creating initrd: /boot/initrd-4.1.27-24-default
        Executing: /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force --force-drivers "nvidia   -drm" /boot/initrd-4.1.27-24-default 4.1.27-24-default
        *** Including module: bash ***
        *** Including module: warpclock ***
        *** Including module: i18n ***
        *** Including module: ifcfg ***
        *** Including module: drm ***
        *** Including module: plymouth ***
        *** Including module: kernel-modules ***
        Omitting driver i2o_scsi
        *** Including module: resume ***
        *** Including module: rootfs-block ***
        *** Including module: terminfo ***
        *** Including module: udev-rules ***
        Skipping udev rule: 91-permissions.rules
        Skipping udev rule: 80-drivers-modprobe.rules
        *** Including module: haveged ***
        *** Including module: systemd ***
        *** Including module: usrmount ***
        *** Including module: base ***
        *** Including module: fs-lib ***
        *** Including module: shutdown ***
        *** Including module: suse ***
        *** Including modules done ***
        *** Installing kernel module dependencies and firmware ***
        *** Installing kernel module dependencies and firmware done ***
        *** Resolving executable dependencies ***
        *** Resolving executable dependencies done***
        *** Hardlinking files ***
        *** Hardlinking files done ***
        *** Stripping files ***
        *** Stripping files done ***
        *** Generating early-microcode cpio image ***
        *** Constructing GenuineIntel.bin ****
        *** Store current command line parameters ***
        Stored kernel commandline:
        root=UUID=xxxxxxxxxxxxxxxxxxxxxxx rootflags=rw,relatime,data=ordered rootfstype=ext4
        *** Creating image file ***
        *** Creating image file done ***
        Some kernel modules could not be included
        This is not necessarily an error:
        Update bootloader...

    The solution

    Uninstall the virtualbox guest packages: virtualbox-guest-kmp-default, virtualbox-guest-tools and virtualbox-guest-x11 and rerun # zypper in --force $(rpm -qa "nvidia-gfx*kmp*") as root. After that, the links are created as follows:

    /lib/modules/4.1.27-27-default/weak-updates/updates # l
    total 8
    drwxr-xr-x 2 root root 4096 Aug  1 15:41 ./
    drwxr-xr-x 4 root root 4096 Jul 21 19:03 ../
    lrwxrwxrwx 1 root root   51 Aug  1 15:41 nvidia-drm.ko -> /lib/modules/4.1.12-1-default/updates/nvidia-drm.ko
    lrwxrwxrwx 1 root root   55 Aug  1 15:41 nvidia-modeset.ko -> /lib/modules/4.1.12-1-default/updates/nvidia-modeset.ko
    lrwxrwxrwx 1 root root   51 Aug  1 15:41 nvidia-uvm.ko -> /lib/modules/4.1.12-1-default/updates/nvidia-uvm.ko
    lrwxrwxrwx 1 root root   47 Aug  1 15:41 nvidia.ko -> /lib/modules/4.1.12-1-default/updates/nvidia.ko

    Tuesday, March 8, 2016

    Arduino Shield for custom board CPLD programming and testing using pogo pins

    This post is just to show how I have used the Arduino JTAG programming hardware/software that I have discussed before.

    The idea was to have a setup where I could both program and test a CPLD based board.

    Let's see the photos:

    The photo above shows the support for the board to be programmed with the pogo pins at the center, the board itself and the top shield. I have used two identical shield boards and have spaced them to give the pogo pins the proper vertical direction.

    Some of the pads on the board to be tested are SMD, others are through-hole. Of course, after it was assembled, I realized I should have left the pogo pins that touch the through holes slightly higher than those that touch the SMD pads. That would have made fitting the board much easier.

    In the same holes, I have mounted the board support, which is a kind of "negative" of the board. It consists of two milled pcbs, with two concentric circles to give support to the board. If you look carefully to the left of the photo, you can see a small dent that is used to give the board the proper orientation.

    In this next photo, we can see the toggle clamp device used to hold the board in place in action.

    In the last photo, we can see the full stack: the Arduino at the bottom, the two shield boards in the middle and the support with a board in it.

    Some details for those interested:

    Hope you like it, comments are welcome!

    Saturday, March 5, 2016

    Serial Buffer Size versus Effective Bit Rate of Arduino USB


    I recently ran into a few of the "gotchas" related to serial programming and memory on the Arduino, and learned a few lessons. I was debugging someone else's non-working code. Non-working for, apparently, no good reason. To make a long story short, the problem was that the program used a large amount of SRAM (static RAM), in the form of strings. The Arduino Uno has 32 KiB of flash, but only 2 KiB of SRAM. That is why strings on the Arduino should be kept in flash memory to save the precious SRAM. To do so, you have to use the "F()" macro, so that the compiler does that for you.

    Figuring out the problem was not easy, since using "Serial.print()"  without "F()" to debug would just make things worse in an unpredictable way. But at a certain point, I got it, and since then I tried my best to spare SRAM. That is when I started facing the problem of the size of the serial buffer.

    Serial communication on the Arduino has one big problem: there is no hardware flow control. That means that if you want reliable communication, you must implement your own flow control mechanism. Anything you come up with in software implies a greater overhead than what you would get with a hardware mechanism. But of course, using a larger reception buffer would minimize the problem. The larger the buffer, the smaller the number of times the flow control mechanism must work.

    The Arduino software has a default size of 64 bytes for the serial buffer. I wondered whether that was enough, so I wrote some code to test it.

    Some theory

    Let's try to come up with a model. Linear models look interesting, for a start. Let's try the following: the time that a transfer takes \((\Delta t)\) is proportional to the number of bytes we want to transfer. If you consider a serial transmission with UART, eight data bits, one start bit and one stop bit, the time to transfer one byte is either ten times the inverse of the bit rate or some byte processing overhead \((O_{byte})\), whichever is greater. But since these bytes are transferred in blocks, we can imagine that the total transfer time also has an overhead component proportional to the number of blocks \((O_{block})\). In equations:

    \begin{equation} \Delta t = NumBytes \cdot \max \left( \frac{10}{BitRate}, O_{byte}\right) + NumBlocks \cdot O_{block} \end{equation}

    \begin{equation} \Delta t = NumBytes \cdot \max \left( \frac{10}{BitRate}, O_{byte}\right) + \frac{NumBytes}{BlockSize} \cdot O_{block} \end{equation}

    \begin{equation} \label{eqDeltaTFinal} \Delta t = NumBytes \cdot \left[ \max \left( \frac{10}{BitRate}, O_{byte}\right) + \frac{O_{block}}{BlockSize} \right] \end{equation}

    \begin{equation} \frac{\Delta t}{NumBytes} = \max \left( \frac{10}{BitRate}, O_{byte}\right) + \frac{O_{block}}{BlockSize} \end{equation}

    \begin{equation} \frac{10}{EffectiveBitRate} = \max \left( \frac{10}{BitRate}, O_{byte}\right) + \frac{O_{block}}{BlockSize} \end{equation}

    Equation \ref{eqDeltaTFinal} shows two things:

    1. We can mitigate the block overhead using a larger block size.
    2. We should try to keep the byte overhead less than 10 times the inverse of the bit rate.
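The model above is easy to play with numerically. The sketch below is just a direct transcription of the equations into Python; the function names and the overhead values used in the loop are made up for illustration and are not measurements.

```python
def transfer_time(num_bytes, bit_rate, block_size, o_byte=0.0, o_block=0.0):
    """Total transfer time in seconds according to the linear model."""
    per_byte = max(10.0 / bit_rate, o_byte)
    return num_bytes * (per_byte + o_block / block_size)


def effective_bit_rate(bit_rate, block_size, o_byte=0.0, o_block=0.0):
    """Effective bit rate in bits/s according to the linear model."""
    per_byte = max(10.0 / bit_rate, o_byte) + o_block / block_size
    return 10.0 / per_byte


# Hypothetical block overhead of 1 ms, chosen only to show the trend.
for block_size in (1, 8, 64):
    rate = effective_bit_rate(9600, block_size, o_block=1e-3)
    print(block_size, round(rate, 1))
```

With a fixed block overhead, the effective rate approaches the nominal rate as the block size grows, which is the first conclusion in the list above.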

    In this work, I will estimate the byte overhead and the block overhead from the measure of the effective bit rate for various block sizes.

    Arduino Bit Rates

    For the data to have some meaning, we will have to use an exact calculation of the Arduino bit rates. The formula is (for AVR's U2X bit = 1):

    \begin{equation}\label{eqnBitRate}BitRate = \frac{ClockFrequency}{8 \cdot (UBRR + 1)} \end{equation}


    \begin{equation}\label{eqnBytePeriod}\frac{1}{ByteRate} = \frac{10}{BitRate} = \frac{80 \cdot (UBRR + 1) \cdot 1000}{ClockFrequency} \,\, ms/byte \end{equation}

    For the Arduino, \(ClockFrequency = 16 MHz \), such that \(9600\,bits/s\) is actually \(9615.4\,bits/s\) \((UBRR = 207,\,1.0400\,ms/byte)\), and \(115200\,bits/s\) is actually \(117647\,bits/s\,(UBRR = 16,\,85.000\,\mu{s}/byte)\).

    I found this nice AVR bit rate calculator. If you are curious, you can play with it.
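As a quick check of these numbers, the sketch below uses the AVR relation for U2X = 1, BitRate = ClockFrequency / (8 (UBRR + 1)). The function names are made up, and picking UBRR by rounding to the nearest integer is an assumption of this sketch; for the two rates used here it gives 207 and 16.

```python
def ubrr_for(target_bit_rate, clock_hz=16_000_000):
    """Nearest UBRR value for the target rate, assuming U2X = 1."""
    return round(clock_hz / (8 * target_bit_rate)) - 1


def actual_bit_rate(ubrr, clock_hz=16_000_000):
    """Bit rate really produced by a given UBRR value, assuming U2X = 1."""
    return clock_hz / (8 * (ubrr + 1))


for nominal in (9600, 115200):
    ubrr = ubrr_for(nominal)
    rate = actual_bit_rate(ubrr)
    # Ten bit times per byte: start bit, eight data bits, stop bit.
    print(nominal, ubrr, round(rate, 1), "bits/s", round(10e6 / rate, 3), "us/byte")
```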

    The Data

    Each graph consists of a log-log plot of two sets of data. The red curves are for the bit rate of 9600, and the blue curves are for the bit rate of 115200. Both curves refer to the transfer time of 32768 bytes. For each graph I have created an artificial byte overhead using a delay after receiving the byte. The log-log plot is necessary to linearize the "\(\frac{1}{x}\)" relation of total time versus block size.

    The tail of the curves can be estimated from equation \ref{eqDeltaTFinal} taking the limit when the block size is large:

    \begin{equation} \Delta t_{tail} = NumBytes \cdot \max \left( \frac{10}{BitRate}, O_{byte}\right) \end{equation}

    which is \(\left(NumBytes \cdot \frac{10}{BitRate}\right)\) or \(\left(NumBytes \cdot O_{byte}\right)\). If we assume that \( O_{byte}\leqslant \frac{10}{BitRate}\), then for a sequence of 32768 bytes we have the theoretical values of 34.078 s and 2.785 s for 9615.4 bps and 117647 bps respectively. These agree with the measured values of 34.1 s and 2.79 s.

    The graphs show that for the Arduino working in 9600 bps (9615.4 bps actually), a buffer size of 17 bytes is enough to reach the minimum theoretical transfer time. For 115200 bps (117647 bps actually) a buffer size of 27 bytes will do.

    The tail of each curve gives us the estimate of \(\max \left( \frac{10}{BitRate}, O_{byte}\right)\), while the linear part gives us the estimate of \(\frac{O_{block}}{BlockSize}\).
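One way to get both estimates at once is a straight-line fit: according to the model, the time per byte is a constant term plus \(O_{block}\) times \(1/BlockSize\). The sketch below does an ordinary least-squares fit in plain Python. The function name is made up, and the data is synthetic, generated from the model itself with arbitrary overhead values; it is not the measured data from the graphs.

```python
def fit_overheads(block_sizes, times, num_bytes):
    """Least-squares fit of times/num_bytes = per_byte + o_block / block_size."""
    xs = [1.0 / b for b in block_sizes]
    ys = [t / num_bytes for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    o_block = sxy / sxx                   # slope of the line
    per_byte = mean_y - o_block * mean_x  # intercept: max(10/BitRate, O_byte)
    return per_byte, o_block


# Synthetic example (NOT measurements): 1.04 ms per byte, 2 ms per block.
NUM_BYTES = 32768
sizes = [1, 2, 4, 8, 16, 32, 64]
times = [NUM_BYTES * (1.04e-3 + 2.0e-3 / b) for b in sizes]

per_byte, o_block = fit_overheads(sizes, times, NUM_BYTES)
print(per_byte, o_block)
```

With real measurements the fit is only as good as the model, so points that clearly depart from the straight line should be left out or examined separately.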

    Reality is always full of surprises. The tail behaves as we would expect from our crude model, but for the lower values of the buffer size, we can see some unexpected things.


    Depending on the byte processing overhead your algorithm has, we saw that a 63 byte buffer can have the same performance in 9600 bits/s or 115200 bits/s.

    The bumpy block overhead is something that I might analyse more carefully some day in the future.

    Saturday, September 5, 2015

    A XSVF Assembler/Disassembler in python


    As a sequel to my last project, the JTAG/XSVF library for Arduino, I felt I needed an XSVF assembler and disassembler, so that I could hack JTAG a little bit. I found that XSVF is very convenient, much more so than SVF when you are dealing with a single component in the JTAG chain. I also found out that the XSVF files produced by programmers are very inefficient and full of unnecessary stuff. It would be great if I could write my own XSVF files.

    The problem is that XSVF is a binary format. At first I started editing in binary with a program called ghex, but this is far from comfortable. It is easy to get lost and almost impossible to maintain. So I decided to write an assembler.

    The code is here on github.

    The story

    In fact, the disassembler was practically ready. Having written the XSVF player in a modular way, the disassembler was just another instance of the same code. Instead of playing XSVF, I would disassemble XSVF. Piece of cake.

    The XSVF player was in C++ and even though I could reuse the code, I'd much rather rewrite it in python. Python is about two orders of magnitude more productive than C/C++, even with all the years I have coding in those languages.

    There was also an issue with one XSVF instruction, namely XSDRINC. This is a problematic instruction and has been obsoleted. Xilinx's iMPACT software does not generate it anymore. This instruction has not been implemented in the JTAG library, the reason being that to support it I would need to start executing the instruction before finishing decoding it, or use an amount of memory that is unacceptable for a microcontroller (Arduino). It can certainly be done, but it was slightly against the original philosophy of the code. Maybe I'll do it later, but since it was an obsolete instruction that I would not be able to test, I decided to skip it in order to quickly have a working XSVF player.

    I have basically translated the C++ code into python and the disassembler was ready, really no big deal. That code could now even be reused to write an XSVF player in python :). Anyway, hacking JTAG was the original motivation, so while the disassembler was a necessary tool during debugging, now I needed the assembler to start writing my own code in a maintainable way. Since I was now using python, I decided to implement XSDRINC, because I did not have the same memory limitations I had on the Arduino.

    And it turns out that using python was indeed a good choice for other reasons.


    The first problem you face when you write a compiler or interpreter for a language is scanning and parsing. Scanning is breaking the original input into a sequence of valid tokens. Parsing is the syntactical analysis that validates that sequence of tokens. You need to check if the sentences are properly constructed according to a certain grammar.

    It turns out that programming a scanner and a parser even for an extremely simple language like XSVF is a nightmare of details. Also, on the documenting side, the code is very distant from the actual grammar definition. Let's call python to the rescue...

    Pyparsing is a fantastic module that makes it very easy to program a scanner/parser in python. With the added bonus that you can use the python syntax to define a grammar that looks like BNF (Backus-Naur Form), so close that it makes it unnecessary to document it in a separate BNF doc.

    In about 2 hours, I was able to learn how to use pyparsing and write my working XSVF parser. The code examples are very good, and reading the library code resolved some subtle issues.

    I had to struggle a little with some of my original ideas of supporting bytes in hexadecimal and binary. Specifying these in the language grammar was not that obvious; there was a subtle "order of matching" problem I had not seen coming. I guess I had in mind a much stricter separation between scanning and parsing, so I failed to define the language tokens properly. Anyway, it paid off to adapt my mind to the pyparsing way.

    The language

    Since I had the assembler and the disassembler, I wrote the following code to test every instruction:

    As you can see, I tried not to clutter the syntax while keeping it readable. There is a nice old school assembly style comment character (the semicolon) that comments everything until the end of the line. Byte sequences can be written in binary or hexadecimal, without the overhead of using a prefix like "0x". These sequences are usually very large because they are normally used in programming or boundary scan. Being able to mix hexadecimal, binary and comments is a good way to keep it readable.
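To give an idea of what scanning such a syntax involves, here is a minimal sketch using only the standard `re` module. It is not the pyparsing grammar used in the project. The token widths (eight binary digits or two hexadecimal digits per byte) are assumptions made for this example, and the sample line with `XSIR` is hypothetical. The binary alternative is listed before the hexadecimal one, since a string made only of zeros and ones is also valid hexadecimal.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<comment>;[^\n]*)              # semicolon comment, up to end of line
  | (?P<binary>\b[01]{8}\b)           # assumed: eight binary digits form one byte
  | (?P<hex>\b[0-9A-Fa-f]{2}\b)       # assumed: two hex digits form one byte
  | (?P<name>[A-Za-z_][A-Za-z_0-9]*)  # instruction mnemonic or other word
  | (?P<space>\s+)
""", re.VERBOSE)


def scan(text):
    """Return a list of (kind, value) tokens; comments and blanks are dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        kind, value = match.lastgroup, match.group()
        if kind == "binary":
            tokens.append(("byte", int(value, 2)))
        elif kind == "hex":
            tokens.append(("byte", int(value, 16)))
        elif kind == "name":
            tokens.append(("name", value))
        pos = match.end()
    return tokens


# Hypothetical input line, only to exercise the scanner.
print(scan("XSIR 08 11111111 ; load the instruction\nFF"))
```

Note that a two-letter word such as "AB" is scanned as a byte here, which is exactly the kind of ambiguity that has to be settled when the tokens are defined.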

    The final test was to run the sequence assembler -> disassembler -> assembler and compare the two assembled files. I did it with a SHA1 hash, and after some debugging, they compared ok for the test file.

    $ cat
    #! /bin/bash
    ./ > test.xsvf
    ./xsvf -c disasm -n test.xsvf > test.xsvf.s
    ./xsvf -c asm test.xsvf.s > test2.xsvf
    sha1sum test*.xsvf

    $ ./
    b1fcb2845c934c622d9b8ffff857d08c9542c8b4 test2.xsvf
    b1fcb2845c934c622d9b8ffff857d08c9542c8b4 test.xsvf


    Now I have a fully working assembler/disassembler for XSVF. That opens some doors for JTAG hacking and boundary scan testing. Let's see where it leads. If you decide to try it, please share your comments.


    1. JTAG/XSVF library for Arduino
    2. Backus-Naur Form
    3. Pyparsing web page

    Tuesday, August 18, 2015

    A JTAG/XSVF Library for Arduino


    I have recently felt the need to incorporate a JTAG port in a project, in order to program a piece of hardware that contained a CPLD. The idea was to both program it and perform some integrity tests on the board. I imagined something using pogo pins, to make it easier and quicker to test everything. I would also write the necessary test routines and generate some kind of report.

    With this objective in mind, I have decided to design an Arduino shield to do the job. The testing routines were not really a big deal. And I was sure I would find some JTAG library for Arduino ready to be used. That was not the case.

    There were some projects using Arduino to control a JTAG TAP (Test Access Port), but they were all incomplete. And I had no idea what JTAG really was. So I had to study a little bit to make things work for me.

    In the end, the challenge proved enlightening. There were some caveats, both in hardware and in software. I'll try to address them in this article.

    The library is hosted here on github. It is also in the new Arduino IDE library manager, so it should be easy to find.

    After all, what is JTAG?

    In the internet days, everything starts with a good look at Wikipedia. This is the link to JTAG on Wikipedia. To make a long story short, here is Wikipedia's definition of JTAG:
    The Joint Test Action Group (JTAG) is an electronics industry association formed in 1985 for developing a method of verifying designs and testing printed circuit boards after manufacture. In 1990 the Institute of Electrical and Electronics Engineers codified the results of the effort in IEEE Standard 1149.1-1990, entitled Standard Test Access Port and Boundary-Scan Architecture.
    What it means is that the name JTAG originally meant an electronics industry association. But after IEEE published the Standard 1149.1, which referred to a "Standard Test Access Port", this test access port (TAP) has become more or less a synonym for JTAG.

    The physical layer consists of 5 signals:

    • \(\hbox{TCK}\) - Test Clock input
    • \(\hbox{TMS}\) - Test Mode Select
    • \(\hbox{TDI}\) - Test Data Input
    • \(\hbox{TDO}\) - Test Data Output
    • \(\overline{\hbox{TRST}}\) - Test Reset Input
    The \(\overline{\hbox{TRST}}\) signal is optional.

    JTAG-enabled devices can all be connected together. The signals TCK and TMS (and \(\overline{\mbox{TRST}}\), if present) should be connected to all devices. As a consequence, the state machines in all devices will always be in the same state. The TDI of the JTAG interface is connected to the TDI of the first device. The TDO of a device should be connected to the TDI of the next device. Finally, the TDO of the last device is connected to the JTAG interface. This way, data can be shifted in or out of a big shift register, which gets bigger as you add more devices.

    The IEEE standard defines not only what the electric cable signals are, but also the logic state machine that must understand these signals. The objectives were to keep the number of signals on the cable as low as possible, while keeping the architecture versatile enough to perform any task.

    The standard goes on to define the basics of the Boundary-Scan Architecture, which seems to be what they had primarily in mind at that time. It is a way of testing your hardware "on the fly", i.e., while the circuit is operating. Boundary-Scan compatible devices make it possible to control the logical value of output and input pins of your integrated circuits to see how the whole hardware will respond to those stimuli.
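The "big shift register" idea can be sketched in a few lines of Python. This is a simplified model, not part of the library: the function name is made up, each device is reduced to the length of its currently selected register, and clock edge timing is ignored.

```python
from collections import deque


def shift_chain(register_lengths, tdi_bits, initial=0):
    """Clock tdi_bits through devices chained TDO -> TDI; return the TDO bits."""
    # Chained registers behave as one register whose length is the sum.
    chain = deque([initial] * sum(register_lengths))
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(chain.pop())  # the bit closest to TDO leaves the chain
        chain.appendleft(bit)         # the new bit enters at TDI
    return tdo_bits


# Two devices with one-bit registers: the input shows up two clocks later.
print(shift_chain([1, 1], [1, 0, 1, 1, 0]))  # prints [0, 0, 1, 0, 1]
```

Adding a device to the list only lengthens the delay between what goes in at TDI and what comes out at TDO.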

    What is JTAG used for?

    It didn't take much to start "abusing" JTAG to perform other things besides boundary-scan. For example:

    • Customized hardware testing for quality control
    • CPLD and FPGA programming
    • Microcontroller debugging
    In order to understand how these tasks can be achieved, we must understand how the JTAG TAP works.

    The TAP

    The TAP is a synchronous finite state machine. The following diagram shows the state diagram of a JTAG TAP. Transitions are controlled by the state of TMS on the rising edge of TCK.

    The state machine is actually simple. In what follows, xR means DR or IR.
    • Actions on the test logic can happen either on the rising or on the falling edge of TCK.
    • At any time, there are two registers that you have to be concerned with: the Instruction Register (IR) and the Data Register (DR). The actual content of DR depends on the value loaded on IR. IR must have at least two bits.
    • There is a reset state that is easily reachable. Starting from any state in this diagram, if you hold TMS high for 5 consecutive clock rising edges, the TAP is guaranteed to enter TEST-LOGIC-RESET. When this state is entered, the IR is loaded with either the IDCODE instruction or the BYPASS instruction, so that when the TAP moves into RUN-TEST/IDLE no action will occur. If the signal TRST is present, it asynchronously forces the TAP into TEST-LOGIC-RESET.
    • RUN-TEST/IDLE is the state where you will usually wait or pass between operations. It is one of the stable states, meaning if you hold TMS at a given value, you stay in that state. For example, the instruction RUNBIST causes a self-test of the system in this state.
    • The two vertical state columns perform similar functions. The first is meant to read and write to the DR while the second does the same to the IR. The DR is actually a group of registers. The one you actually access is dependent on the contents of the IR, so you can think of IR as a selector.
    • SELECT-xR-SCAN: Temporary state to enter xR operations or just to pass on to the next state.
    • CAPTURE-xR: The current selected DR is loaded on the shift registers. In the case of the IR, this state loads the fixed binary pattern "01" on the bits closer to TDO. The other bits may load other design-specific data.
    • SHIFT-xR: In this state, the data loaded in the shift-register is both serially shifted in xR through TDI and shifted out through TDO.
    • EXIT1-xR: This is a temporary state that provides a way to bypass PAUSE-xR and EXIT2-xR and go straight to UPDATE-xR.
    • PAUSE-xR: Nothing happens, the controller is paused while in this state.
    • EXIT2-xR: A temporary state that provides a way to return to SHIFT-xR.
    • UPDATE-xR: Data is latched into xR on the falling edge of TCK.
    In other words, we write to IR to tell the device what we want, and write to or read from DR to set a property or get a response.

    BYPASS is an instruction that turns DR into a one-bit register that always captures a zero.
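The state diagram described above can be written as a small transition table, which also makes it easy to check the "five clocks with TMS high" rule. This is a sketch for illustration, not code from the library; the names `TAP` and `walk` are made up.

```python
# For each state: (next state when TMS = 0, next state when TMS = 1).
TAP = {
    "TEST-LOGIC-RESET": ("RUN-TEST/IDLE", "TEST-LOGIC-RESET"),
    "RUN-TEST/IDLE": ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
}

# The DR and IR columns have the same shape; only the exit of SELECT differs.
for r, select_exit in (("DR", "SELECT-IR-SCAN"), ("IR", "TEST-LOGIC-RESET")):
    TAP.update({
        f"SELECT-{r}-SCAN": (f"CAPTURE-{r}", select_exit),
        f"CAPTURE-{r}": (f"SHIFT-{r}", f"EXIT1-{r}"),
        f"SHIFT-{r}": (f"SHIFT-{r}", f"EXIT1-{r}"),
        f"EXIT1-{r}": (f"PAUSE-{r}", f"UPDATE-{r}"),
        f"PAUSE-{r}": (f"PAUSE-{r}", f"EXIT2-{r}"),
        f"EXIT2-{r}": (f"SHIFT-{r}", f"UPDATE-{r}"),
        f"UPDATE-{r}": ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
    })


def walk(state, tms_bits):
    """Follow one TMS value per rising edge of TCK and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


# From any of the 16 states, five clocks with TMS = 1 end in TEST-LOGIC-RESET.
print(all(walk(s, [1] * 5) == "TEST-LOGIC-RESET" for s in TAP))  # prints True
```

For example, `walk("TEST-LOGIC-RESET", [0, 1, 0, 0])` ends in SHIFT-DR, and `walk("RUN-TEST/IDLE", [1, 1, 0, 0])` ends in SHIFT-IR.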

    There is no such thing as shifting-in without shifting-out or vice-versa. We always do both.

    Playing around with JTAG

    The iMPACT software from Xilinx has a nice interface that allows you to play with the TAP:

    Select "Boundary-Scan", then "Initialize Chain" and then on the Debug menu choose "Enable/Disable Debug Chain". It is rudimentary and not very practical, but has its uses. Collecting by hand the bits shifted in and out of IR and DR can be tedious. That is why a second interface is provided that is slightly more useful.

    For example, on the previous screen, click on "Test Logic Reset", and this state becomes green. If the IEEE standard is followed by this device, then the "IDCODE" instruction (if present) must have been loaded into IR. If we shift out DR, then we must be able to access this IDCODE. To do that, fill the box next to SCAN DR with 32 zeroes, click on "Execute" and let's see what comes out:
    TDO Capture Data: 00000110111001011110000010010011
    If we convert this binary string into hexadecimal, we get something more meaningful. This is 06E5E093, which is the IDCODE for XC2C64A.
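    The conversion is easy to double check. A short Python snippet (just a convenience, not part of the original tooling) turns the captured bit string into hexadecimal:

```python
# The bit string exactly as shown in "TDO Capture Data" (most significant bit first).
captured = "00000110111001011110000010010011"

idcode = int(captured, 2)          # parse as a base-2 number
hex_text = format(idcode, "08X")   # 8 hex digits, zero padded, upper case
print(hex_text)
```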

    At this point, I should mention a very useful file to have at hand. It is the BSDL file for the device. BSDL stands for "Boundary-Scan Description Language"; it is a subset of VHDL that defines the boundary-scan parameters of a device. BSDL is defined in the same IEEE Std 1149.1. There we can look at the instructions that our device will accept. The relevant part for us now is:

    attribute INSTRUCTION_LENGTH of xc2c64a : entity is 8;
    attribute INSTRUCTION_OPCODE of xc2c64a : entity is
          "INTEST (00000010)," &
          "BYPASS (11111111)," &
          "SAMPLE (00000011)," &
          "EXTEST (00000000)," &
          "IDCODE (00000001)," &
          "USERCODE (11111101)," &
          "HIGHZ (11111100)," &
          "ISC_ENABLE_CLAMP (11101001)," &
          "ISC_ENABLEOTF (11100100)," &
          "ISC_ENABLE (11101000)," &
          "ISC_SRAM_READ (11100111)," &
          "ISC_SRAM_WRITE (11100110)," &
          "ISC_ERASE (11101101)," &
          "ISC_PROGRAM (11101010)," &
          "ISC_READ (11101110)," &
          "ISC_INIT (11110000)," &
          "ISC_DISABLE (11000000)," &
          "TEST_ENABLE (00010001)," &
          "BULKPROG (00010010)," &
          "ERASE_ALL (00010100)," &
          "MVERIFY (00010011)," &
          "TEST_DISABLE (00010101)," &
          -- "STCTEST (00010110)," &
          "ISC_NOOP (11100000)";
    attribute INSTRUCTION_CAPTURE of xc2c64a : entity is "XXXXXX01" ; 
    attribute IDCODE_REGISTER of xc2c64a : entity is "XXXX0110111001011XXX000010010011";
    That tells us that the IR has 8 bits and which codes it understands. Let's try IDCODE again, now the hard way. Put "00000001" in the "Scan IR" box, then click "Execute". Then, if it is not already there, put "00000000000000000000000000000000" in the "Scan DR" box and click the second "Execute" button. You should get
    TDO Capture Data: 00000101
    TDO Capture Data: 00000110111001011110000010010011
    The first line with 8 bits seems to be the mandatory "01" that IR will always load, plus six bits that have a meaning set by the manufacturer. This is what is implied by the line 'attribute INSTRUCTION_CAPTURE of xc2c64a : entity is "XXXXXX01" ;'  Notice that since we must enter "CAPTURE-IR" before "UPDATE-IR", there is no way for the device to guess what instruction is going to be loaded. So this value is either a constant or dependent upon an internal state of the device. We will see later that it has something to do with checking for read/write protect.

    The second line with 32 bits is exactly what we got before when we performed the "Shift-DR" straight after "TEST-LOGIC-RESET".
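    Both captured values can be checked against the BSDL patterns mechanically. In the sketch below (Python, with a made-up helper name matches_pattern), an "X" in the pattern is treated as a don't-care position and every other position must match exactly:

```python
def matches_pattern(captured, pattern):
    """Compare a captured bit string with a BSDL-style pattern.

    'X' (or 'x') in the pattern matches either bit value.
    """
    if len(captured) != len(pattern):
        return False
    return all(p in "Xx" or p == c for c, p in zip(captured, pattern))


# Patterns copied from the BSDL excerpt, captures copied from the Impact output.
ir_ok = matches_pattern("00000101", "XXXXXX01")
dr_ok = matches_pattern("00000110111001011110000010010011",
                        "XXXX0110111001011XXX000010010011")
print(ir_ok, dr_ok)
```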

    It is possible to use Impact to check that we have really got the right ID code: leave "Debug Mode", click on the device, and then under "Available Operations" click "Get Device ID":
    INFO:iMPACT - Current time: 8/14/15 5:01 PM
    Maximum TCK operating frequency for this device chain: 33000000.
    Validating chain...
    Boundary-scan chain validated successfully.
    '1': IDCODE is '00000110111001011110000010010011'
    '1': IDCODE is '06e5e093' (in hex).
    '1': : Manufacturer's ID = Xilinx xc2c64a, Version : 0
    It is possible to crack the ID code a little further. Bits 31 to 28 ("0000") are the version number, bits 27 to 12 (6E5E in hex) are the part number, and bits 11 to 1 are the manufacturer's ID, here 049 in hex, which means Xilinx. Together with bit 0, those 11 bits form the final "093" (in hex). Bit 0 is in fact required to be '1' if an IDCODE instruction is present. Remember that upon TEST-LOGIC-RESET, IR gets loaded with IDCODE or BYPASS. But BYPASS is required to load a '0' at the start of the scan cycle, so this bit identifies which instruction was loaded upon reset.
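    The field boundaries are easier to see in code. The following Python sketch (the function name decode_idcode and the dictionary keys are made up) splits a 32-bit IDCODE using the layout from IEEE Std 1149.1: version in bits 31 to 28, part number in bits 27 to 12, manufacturer in bits 11 to 1, and a constant 1 in bit 0:

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into its standard fields."""
    return {
        "version": (idcode >> 28) & 0xF,          # bits 31..28
        "part_number": (idcode >> 12) & 0xFFFF,   # bits 27..12
        "manufacturer": (idcode >> 1) & 0x7FF,    # bits 11..1
        "marker_bit": idcode & 0x1,               # bit 0, always 1 for IDCODE
    }


fields = decode_idcode(0x06E5E093)
for name, value in fields.items():
    print(f"{name}: 0x{value:X}")
```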

    Playing around with SVF

    Now that we know enough about JTAG, let's see what else we can do. Suppose you want to describe a set of operations to be performed on a JTAG TAP controller, for example, suppose you want to program a certain device. How do you describe what has to be done? The answer is a programming language. SVF stands for "Serial Vector Format", and is a file format that specifies how and which boundary-scan vectors should be transferred to a device, and also which results should be expected, such as an ID code or a checksum.

    The problem with SVF is that it is too verbose. Good for humans to read, but excessive for computers to deal with. So Xilinx has come up with XSVF, which is a binary form of SVF.

    Fortunately for us, Impact is able to generate both SVF and XSVF. In our previous example, go to menu "Output->SVF File->Create SVF File..." and choose a name for the file to save data in. Double click on "Get Device ID" then go to "Output->SVF File->Stop Writing to SVF File".

    You should get a file like this:

    There is a quick reference for SVF and XSVF in the appendices of the document XAPP503 - SVF and XSVF File Formats for Xilinx Devices.

    The most relevant information is in the instructions SIR and SDR. These are a version of the Impact GUI that we have previously used. But besides telling the SVF machine what it should shift in, they also tell it what it should get shifted out. They also specify masks, so that only the relevant bits are checked. For example:
    SIR 8 TDI (01) SMASK (ff) ;
    SDR 32 TDI (00000000) SMASK (ffffffff) TDO (f6e5f093) MASK (0fff8fff) ;
    This means "shift the following 8 bits into IR: 00000001, all of which are relevant, then shift 32 zeros into DR, all of which are relevant, then compare the value received with f6e5f093, after masking both with 0fff8fff". Sounds familiar? Pretty much what we did by hand before, except the masking part.
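    The masked comparison is a one-liner. Here is a minimal Python sketch (the function name svf_tdo_matches is made up), using the values from the SDR line above and the ID code captured earlier:

```python
def svf_tdo_matches(captured, expected, mask):
    """Compare captured TDO data with the expected value, ignoring masked-out bits."""
    return (captured & mask) == (expected & mask)


CAPTURED = 0x06E5E093   # what the device shifted out earlier
EXPECTED = 0xF6E5F093   # TDO (...) from the SDR line
MASK = 0x0FFF8FFF       # MASK (...) from the SDR line

print(svf_tdo_matches(CAPTURED, EXPECTED, MASK))
```

    The mask clears the four version bits and the three low bits of the part number, which are the same positions marked "X" in the IDCODE_REGISTER attribute of the BSDL file.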

    The comments in the file are interesting. And we can see a lot of redundant operations. The ID code is checked three times, and the read/write protect is checked twice. In the end, the device is put on BYPASS.

    Playing around with XSVF

    If we use Impact to generate an XSVF version of the ID code check program, we get this:

    A completely binary file, unsuitable for humans to read but good enough for computers. There you have the exact same program as before, but coded in binary XSVF instructions, rather than SVF. I wrote a disassembler in Python for debugging, and the output is the following:

    We can see the same redundancies that were present in the SVF file.

    The previously mentioned XAPP503 has the documentation for the XSVF instructions, which boil down to shifting stuff into IR or DR, reading stuff back, and pulsing TCK while in some state. Nothing new, after all we have been through.

    Arduino, at last

    Now that we know what JTAG is, we can program a microcontroller to do the job for us. Arduino is a nice choice for a number of reasons I don't need to get into. I have searched for a library that would just do what I wanted, but none seemed to work with my hardware. There were several problems, which I'll try to address here.

    The first issue I found was memory usage. It is absolutely essential that you keep your strings within flash. Even more when you are in debug mode, where you want lots of output to understand what is going on.

    The second problem was the Arduino serial interface, which has no flow control. That means that we must provide one. What I did was to always use a fixed block transfer size. I have used the Arduino software serial buffer, avoiding unnecessary copying to spare memory. There are 64 bytes in this buffer, but only 63 are usable because of the circular buffer implementation. I have managed to increase its size to 256 bytes; the process is documented in the code. I have spent quite some time making the transfer as fast as possible.
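    To give an idea of what fixed-size block transfer looks like from the host side, here is a minimal Python sketch. It is not the protocol used by the actual project: the block size, the zero padding of the last block, and the "wait for the device to ask for a block" handshake are all assumptions made for illustration, and the two callbacks stand in for whatever serial library is used.

```python
def iter_blocks(data, block_size, pad=b"\x00"):
    """Yield data in blocks of exactly block_size bytes (last one padded)."""
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        yield block.ljust(block_size, pad)


def send_all(data, block_size, wait_for_request, write_block):
    """Send every block, but only after the device has asked for it.

    wait_for_request and write_block are hypothetical callbacks wrapping
    the serial port. Returns the number of blocks sent.
    """
    sent = 0
    for block in iter_blocks(data, block_size):
        wait_for_request()   # the device signals that its buffer is free
        write_block(block)   # the host never sends more than one block ahead
        sent += 1
    return sent


# Stand-in "serial port" that just records what happened.
log = []
count = send_all(b"abcde", 2,
                 wait_for_request=lambda: log.append("request"),
                 write_block=lambda b: log.append(b))
print(count, log)
```

    Because the host never sends more than one block without being asked, the receive buffer on the Arduino cannot overflow as long as the block fits in it.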

    The cable was another issue. Don't make it too long and do some termination/impedance matching. Reflections in TCK will kill any attempt to program a JTAG device. I have used three voltage dividers to convert from the 5 V logic to the 3.3 V logic of the devices I was using, and I have chosen the resistor values to kinda match the expected impedance without killing the output drivers of the Arduino's ATmega328P.
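    The arithmetic behind a resistive divider is simple enough to sketch in Python. The resistor values below are arbitrary examples, not the ones used in this project; they only show how the output voltage and the equivalent source resistance seen by the cable (the two resistors in parallel, assuming an ideal driver) follow from the two resistors:

```python
def divider_output(v_in, r_top, r_bottom):
    """Unloaded output voltage of a two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)


def divider_source_resistance(r_top, r_bottom):
    """Thevenin resistance at the divider output (r_top in parallel with r_bottom)."""
    return r_top * r_bottom / (r_top + r_bottom)


# Hypothetical values, chosen only because 1:2 gives roughly 3.3 V from 5 V.
R_TOP = 1000.0     # ohms, between the Arduino pin and the JTAG line
R_BOTTOM = 2000.0  # ohms, between the JTAG line and ground

v_out = divider_output(5.0, R_TOP, R_BOTTOM)
r_src = divider_source_resistance(R_TOP, R_BOTTOM)
print(f"{v_out:.2f} V, {r_src:.0f} ohm")
```

    Scaling both resistors down by the same factor keeps the voltage and lowers the source resistance, at the cost of drawing more current from the output driver, which is the trade-off mentioned above.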

    The best solution for the signal conversion would have been to use buffers. Usually the JTAG cable will have a VCC signal that can be used to sense the device's operating voltage. That does not remove the need for impedance matching, which in this case could be done with some series resistors.

    The VCC signal of the JTAG cable is used in this project to detect that the cable is actually connected to some hardware.

    Here is a picture of the two programmers I have used, the Xilinx one and the Arduino. Each of them is connected to a XC2C64A breakout board from Dangerous Prototypes.

    Here is a copy/paste of the terminal screen while programming an example:

    $ ./xsvf ../xsvf/XC2C64A/VHDL-CPLDIntro3LEDinverse.xsvf
    File: /home/user/sketchbook/arduino/libraries/JTAG/extras/xsvf/XC2C64A/VHDL-CPLDIntro3LEDinverse.xsvf
    Ready to send 22846 bytes.
    IMPORTANT: Free memory: 771 bytes.
    Sent:    22846 bytes,        0 remaining ()
    IMPORTANT: ********
    IMPORTANT: Success!
    IMPORTANT: ********
    IMPORTANT: Processed 1417 instructions.
    IMPORTANT: Checksum:  0x36/22846.
    IMPORTANT: Sum: 0x0033D4CA/22846.
    Received device quit: Exiting!
      Expected checksum:  0x36/22846.
      Expected sum: 0x0033D4CA/22846.
    Elapsed time: 4.31 seconds.
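    The log does not say how "Sum" and "Checksum" are computed. One reading that is consistent with the numbers above, and it is only a guess, is that "Sum" is the plain sum of all bytes sent and "Checksum" is the two's complement of the low byte of that sum. A Python sketch of that guess (function names made up):

```python
def byte_sum(data):
    """Plain sum of all byte values, kept to 32 bits."""
    return sum(data) & 0xFFFFFFFF


def twos_complement_checksum(total):
    """Two's complement of the low byte of a running sum."""
    return (-total) & 0xFF


# Check the guess against the figures printed in the log.
LOG_SUM = 0x0033D4CA
LOG_CHECKSUM = 0x36
print(hex(twos_complement_checksum(LOG_SUM)))

# The same functions applied to a tiny payload.
payload = bytes([0x10, 0x20, 0x30])
print(hex(byte_sum(payload)), hex(twos_complement_checksum(byte_sum(payload))))
```

    With this scheme, adding the checksum to the sum gives a low byte of zero, which is a common way for a receiver to validate a transfer.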

    Now what?

    Besides programming devices, it is now possible to use JTAG to actually communicate with devices. One possibility is to create a JTAG TAP in VHDL and use it to control your device.

    Another possibility is to use the Arduino JTAG to hack into hardware. It is not unusual to find "lost" and undocumented JTAG interfaces on several devices. If you search around the internet a little, you will get some ideas of what you can do.

    Hope you like this, happy jtagging!

    Sunday, October 19, 2014

    How to post code on Blogger using Gist

    The problem

    How to post nice pieces of code to a blog? The desired properties are:
    1. Beauty.
    2. Language awareness.
    3. Easy to copy to the clipboard.
    4. Easy to integrate in Blogger.
    5. Easy to maintain.

    One possible solution

    Use Gist. You must have an account on GitHub. Gists are versioned text files, so you can alter them and fix any problems on the Gist page without having to deal with your blog page.

    How to integrate it into Blogger

    After creating your Gist, copy the "Embed URL" to the clipboard:
    The link is something like this:
    <script src=""></script>
    Reserve a space to integrate the Gist in your blog post. Then choose "HTML" in the Compose/HTML radio button. Insert that code in the proper place in your page, and you're done:

    Take a look at the snapshot:

    Saturday, October 18, 2014

    Uploading photos to Facebook using google-chrome on linux

    The problem

    Uploading photos to Facebook seems to be unresponsive. I get to this screen and when I press the "Choose File" button, nothing happens:

    The solution

    Go to a file manager (e.g., Konqueror), drag the file with the mouse and drop it over one of the buttons. The file will then be accepted:

    If you want to unselect a photo, click the corresponding "Choose File" button. Apparently, its intended function is not to choose a file, but to reset the file to "No file chosen":

    After that, click "Upload Photos" and wait, because it takes some time.

    Thursday, October 16, 2014

    Cuda 6.5 on OpenSuSE 12.3

    NVidia Drivers

    Make sure you have the official NVidia drivers installed in your system:
    • Run Yast.
    • Click on "Software Repositories"
    • Click on "Add"
    • Choose "Specify URL", then "Next"
    • Repository Name: "nvidia", URL:
    • Confirm
    • Go back to Yast
    • Click on "Software Management"
    • Search for "nvidia"
    • Add the following packages: 
    1. x11-video-nvidiaG03-340.46-30.1.x86_64
    2. nvidia-gfxG03-kmp-desktop-340.46_k3.7.10_1.1-30.1.x86_64
    3. nvidia-computeG03-340.46-30.1.x86_64
    4. nvidia-settings-325.15-1.3.x86_64
    5. nvidia-glG03-340.46-30.1.x86_64
    6. nvidia-texture-tools-2.0.6-36.2.x86_64
    7. nvidia-uvm-gfxG03-kmp-desktop-340.46_k3.7.10_1.1-30.1.x86_64
    Notice that the above assumes your board is supported by the G03 kernel driver and that you are using the "kernel-desktop". Make sure you choose the proper driver for your board and the kernel driver corresponding to your kernel.

    CUDA Installation

    Install the CUDA repository. Although the repository is for OpenSuSE 13.1, it will work perfectly with 12.3.
    • Click on "Add"
    • Choose "Specify URL", then "Next"
    • Repository Name: "cuda", URL:
    • Confirm
    • Go back to Yast
    • Click on "Software Management"
    • Search for "cuda"
    • Add the following packages (some of them will be automatically added): 
    1. cuda-documentation-6-5-6.5-14.x86_64
    2. cuda-cudart-6-5-6.5-14.x86_64
    3. cuda-cufft-dev-6-5-6.5-14.x86_64
    4. cuda-repo-opensuse131-6.5-14.x86_64
    5. cuda-visual-tools-6-5-6.5-14.x86_64
    6. cuda-cufft-6-5-6.5-14.x86_64
    7. cuda-npp-dev-6-5-6.5-14.x86_64
    8. cuda-curand-dev-6-5-6.5-14.x86_64
    9. cuda-license-6-5-6.5-14.x86_64
    10. cuda-runtime-6-5-6.5-14.x86_64
    11. cuda-misc-headers-6-5-6.5-14.x86_64
    12. cuda-samples-6-5-6.5-14.x86_64
    13. cuda-curand-6-5-6.5-14.x86_64
    14. cuda-toolkit-6-5-6.5-14.x86_64
    15. cuda-cublas-6-5-6.5-14.x86_64
    16. cuda-cusparse-dev-6-5-6.5-14.x86_64
    17. cuda-drivers-340.29-0.x86_64
    18. cuda-cudart-dev-6-5-6.5-14.x86_64
    19. cuda-npp-6-5-6.5-14.x86_64
    20. cuda-command-line-tools-6-5-6.5-14.x86_64
    21. cuda-cusparse-6-5-6.5-14.x86_64
    22. cuda-6.5-14.x86_64
    23. cuda-core-6-5-6.5-14.x86_64
    24. cuda-cublas-dev-6-5-6.5-14.x86_64
    25. cuda-driver-dev-6-5-6.5-14.x86_64
    26. cuda-6-5-6.5-14.x86_64
    • Click "Accept".


    Useful Links

    Monday, October 6, 2014

    Moving Averages Using Google Apps Scripts


    A moving average (MA) process, also known as weighted moving average (WMA), is a type of signal filtering that consists of performing a weighted average over a finite sequence of past samples of the original signal. Implementing such a scheme in a worksheet is not always straightforward due to the handling of missing values. This article proposes an implementation of such filters on Google Sheets using Google Apps Script, a JavaScript-like language. Exponential Moving Average (EMA) filters are an important special case of WMA, so they have been implemented on top of WMA.


    Moving Average, MA, Weighted Moving Average, WMA, Exponential Moving Average, EMA, Noise Removal, Filtering, Google Sheets, Spreadsheets, Google Apps Script, JavaScript.


    In previous articles [1][2], implementations of WMA and EMA have been proposed using the normal spreadsheet function infrastructure. These implementations try to be compatible with existing standard spreadsheet functions so that they can be easily ported to other spreadsheets, e.g., OpenOffice.

    The drawback of this solution is that the formulas are large and, as a consequence, hard to read and maintain, making it easy for subtle errors to slip in. Also, the formula must be called once for each row to be calculated, and this process has a big overhead. The ideal solution would make a single call to a function that returns an array of processed data. The following implementation addresses both issues.
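    To illustrate the idea of a single call that returns the whole filtered column, here is a sketch in Python rather than in the script language of the spreadsheet. It is not the implementation from this article: the function names and the rule for missing values (skip them and renormalise by the weights that were actually used) are assumptions made for illustration.

```python
def wma(samples, weights):
    """Weighted moving average over a whole column in one call.

    weights[0] applies to the current sample, weights[1] to the previous
    one, and so on. Missing samples are given as None and are skipped;
    the result is divided by the sum of the weights actually used.
    """
    out = []
    for i in range(len(samples)):
        num = 0.0
        den = 0.0
        for k, w in enumerate(weights):
            j = i - k
            if j < 0:
                break                 # ran past the start of the data
            if samples[j] is None:
                continue              # missing value: ignore it and its weight
            num += w * samples[j]
            den += w
        out.append(num / den if den else None)
    return out


def ema(samples, alpha, n):
    """Exponential moving average built on top of wma, truncated to n taps."""
    weights = [alpha * (1.0 - alpha) ** k for k in range(n)]
    return wma(samples, weights)


print(wma([1.0, 2.0, None, 4.0], [1, 1]))
print(ema([2.0, 2.0, 2.0], 0.5, 3))
```

    The outer loop runs once per row inside a single function call, which is what removes the per-cell call overhead described above.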



    The resulting spreadsheet shows a comparison of the previously posted methods with this script-based one.


    Two JavaScript functions have been developed to implement the missing-data weighted moving averages of previous articles [1][2]. The results have been shown to be identical. The JavaScript-based method has the advantage of being much cleaner to maintain, typically requiring a single cell on the spreadsheet.