SPEAKING OF PORTING SOFTWARE

THIS SOFTWARE-PORTING HANDS-ON EXPERIMENT UNCOVERS A POTENTIAL AUDIO DECODER FOR EMBEDDED SYSTEM APPLICATIONS, ADDING AUDIO OR SPEECH TO THE APPLICATIONS’ USER INTERFACES.

BY ROBERT CRAVOTTA • TECHNICAL EDITOR -- EDN Europe, 01 Sep 2009

The goal of this hands-on project was to port a common set of software across a variety of processor architectures: two ARM Cortex-M3 ports using Atmel and Texas Instruments processors, a port to a Microchip PIC32 device, and a port to an Atmel AVR32 processor. Each port encountered some problems, but all of them reached the goal of a working port of the target software. During the project, I realized that the software for the porting effort, Vorbis Tremor, might also be a good candidate for embedded-system developers to consider when exploring how to add audio to their designs.

AT A GLANCE
  • Contemporary compilers are competent at creating executables of open-source software for embedded-processor targets.
  • Compilers cannot automate the porting of real-world interfaces or the management of dynamic-memory structures. This remains a vendor’s task or a hands-on job.
  • Embedded development tools must support a complex ecosystem of host and target architectures; this arrangement provides many opportunities for unexpected behaviors to manifest themselves in the tools.
  • The Tremor audio decoder is a candidate worth considering for embedded-system designers who are exploring whether to add rich audio data to their interfaces.

I adopted two lessons I had learned from a previous project about accelerating software with hardware that spans multiple vendors (Reference 1). The first is to use a common set of software in each port so that everyone can benefit from identifying the differences and similarities in each effort. The second lesson I had learned was to ensure that each vendor provided engineering support so that its development kit could be part of the project. This requirement serves multiple purposes, most notably not overwhelming me with more than 70 development kits.

For these projects, it is imperative to choose a scale of work that is neither too trivial nor too ambitious. An original candidate for the software to port was benchmark code from the EEMBC (Embedded Microprocessor Benchmark Consortium) because many companies use the benchmark software on their processors, meaning that more companies would be able to participate in the project. However, the focus of the project was on the porting effort rather than on the optimization of the code and processor architecture, and these benchmarks focus on the processing performance of the core. For these reasons, I eventually abandoned the idea of using benchmark code in favor of using an audio codec because it combines the requirements for real-time performance with a real-world interface for storage, retrieval, and playback of the audio. I also needed no expensive equipment to tell whether the code was running in real time because I had access to sensitive and free signal processors, my ears, to tell whether the audio was playing too fast or too slow or whether the processor was missing data.

While exploring the idea of using an audio codec, including MP3, for the porting target, I discovered the open-source Ogg Vorbis audio-compression format and its Tremor library, a fixed-point implementation of the Vorbis decoder. Using a fixed-point decoder would allow more processor architectures to participate in this project.

Borrowing from another lesson I had learned from my earlier life as an engineer, I framed the project description to avoid as much bias as possible in how each team approached the porting effort (Reference 2). This type of uncertainty when specifying a project often yields unexpected benefits, and this project was no exception. The project required the porting effort only to provide a mechanism to store and access audio files and be able to play them on some output. Each vendor could choose any processor and any development tools it wished to complete the porting effort. Each team made different choices in solving the porting challenge, and one of those design decisions highlighted why the Tremor player might be a good candidate for embedded-system developers to consider.

FOUR PORTS

For this project, each engineer was free to select development boards and tools. The porting effort took an average of a day and a half, from choosing the target, through understanding the open-source software, to making the necessary changes to the software to complete the port. After that, I duplicated the porting effort over the phone with each team. This approach saved me a lot of time, and it gave me access to the thought process of each person. It also meant that each project was unique rather than a refinement of my own effort with each development kit. In each porting effort, some things worked smoothly, but there was always something that did not proceed the way we would have liked. I will share the problems but without specifying with which team they occurred.

Atmel participated in two ports, an AVR32 and an ARM Cortex-M3 (Figure 1). A different team member performed each port, and each took a different approach. The AVR32 port used the ATEVK1105 evaluation kit (Figure 2). Atmel released this new board this year at the ESC (Embedded Systems Conference) in San Jose, California. We used the AVR32 Studio development-tool set. The audio output went through an adapted DAC for wave playback via software from a previous project that used this peripheral. We performed the port in two stages. The first stage linked the .ogg audio file into the executable file. The second stage accessed the .ogg audio file using code from a FAT (file-allocation-table)-library example through a data-flash device. This two-stage approach helps isolate delay sources.

The audio codec uses dynamic allocation, which can be a significant source of delay if you are not careful about external-memory accesses and garbage-collection events in the heap. In the case of the AVR device, the multiply function proved to be an area for optimization, in part because it handles big- and little-endian representation and it does not take advantage of the extra hardware resources available on the AVR processor to improve multiplication performance.
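To make the multiplication point concrete, here is a minimal sketch of the kind of fixed-point multiply that a decoder such as Tremor leans on. The function names mult32_hi and mult31 are hypothetical and are not Tremor's own macros. The sketch extracts the result with a shift rather than by picking apart the halves of a 64-bit value, so it has no byte-order dependence; whether a given compiler maps it onto a single hardware multiply instruction is something to check on each target.

```c
#include <stdint.h>

/* Hypothetical helper names for illustration; not Tremor's own macros. */

/* Keep the upper 32 bits of the 64-bit product of two 32-bit values. */
int32_t mult32_hi(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    /* Assumes an arithmetic right shift for negative values. Mainstream
       compilers behave this way, but the C standard leaves it
       implementation-defined. */
    return (int32_t)(product >> 32);
}

/* Q31 multiply: both operands represent values in the range [-1, 1). */
int32_t mult31(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    return (int32_t)(product >> 31);
}
```

In Q31 terms, 0x40000000 represents 0.5, so mult31(0x40000000, 0x40000000) yields 0x20000000, which represents 0.25. A vendor-tuned port might replace the body of such a function with an intrinsic or inline assembly for the target's multiplier.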

In addition to isolating sources of delay, this two-stage approach made me realize that I didn’t necessarily need to store the audio stream on an external storage device because an embedded system often does not allow the user to access the data, and rarely will it even change the audio stream during the application’s life.

The Atmel ARM Cortex-M3 port used the new SAM3U-EK evaluation kit. This board is so new that a complete set of driver code and samples was not yet available for all of the peripherals; a DAC driver, for example, was missing. The Cortex-M3 is a new generation of the ARM architecture and does not directly benefit from legacy code from earlier architectures, such as the ARM7. However, the support library for this architecture will grow, especially as more M3 devices from a growing list of vendors become available. In a sense, this project was an early adopter of this board and processor (Reference 3). The project used the Yagarto (yet-another-GNU-ARM-tool-chain) tool chain. The engineer considered other tool chains, such as IAR, but, due to time constraints with a learning curve involving the allocation library, used Yagarto. The engineer stored and retrieved the audio file from the SD (secure-digital) card using sample open-source code to manage the file system. Onboard NAND flash could also have stored the audio files.

The Atmel M3 port uses a ring-buffer implementation to feed the DAC/DMA engine transfer; this approach differs from the ping-pong buffers the other ports used. The ring buffer lets you tune not only the buffer size but also the number of buffers, and you can optimize performance by tracking how many of the buffers were full over time. For example, with a 22-kbps sample, a 2-kbyte buffer resulted in 60 to 70% full buffers, whereas a 4-kbyte buffer resulted in less than half the buffers being full. The porting effort progressed starting from 8-kbps samples. This approach exposed efficiency issues in the allocation of memory. The CPU’s usage with the 8-kbps sample was 10%, but usage shot up to 60% at 22 kbps. The higher bit rates caused more access to external memory, which introduced significant delays. Possible optimizations include changing the dynamic-memory allocation code as well as some manual managing of the heap to straddle external and internal memory.
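The following is a minimal sketch of a multi-slot ring buffer that records how full it is each time the output side asks for data, which is the kind of measurement the tuning above relies on. It is not the Atmel team's code. The names (audio_ring, ring_commit, and so on) and the sizes RING_SLOTS and SLOT_BYTES are assumptions for illustration, and the DMA hookup is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4     /* assumed number of buffers; tune per target */
#define SLOT_BYTES 2048  /* assumed buffer size; tune per target */

typedef struct {
    uint8_t data[RING_SLOTS][SLOT_BYTES];
    unsigned head;             /* next slot the decoder fills */
    unsigned tail;             /* next slot the output side drains */
    unsigned full;             /* slots currently holding decoded audio */
    unsigned long samples;     /* number of occupancy samples taken */
    unsigned long full_total;  /* sum of 'full' over all samples */
} audio_ring;

void ring_init(audio_ring *r)
{
    r->head = r->tail = r->full = 0;
    r->samples = r->full_total = 0;
}

/* Decoder side: slot to fill next, or NULL when every slot is full. */
uint8_t *ring_acquire_empty(audio_ring *r)
{
    return (r->full == RING_SLOTS) ? NULL : r->data[r->head];
}

/* Decoder side: mark the slot returned by ring_acquire_empty as full. */
void ring_commit(audio_ring *r)
{
    assert(r->full < RING_SLOTS);
    r->head = (r->head + 1) % RING_SLOTS;
    r->full++;
}

/* Output side: records occupancy, then returns the next full slot,
   or NULL on underrun. */
const uint8_t *ring_acquire_full(audio_ring *r)
{
    r->samples++;
    r->full_total += r->full;
    return (r->full == 0) ? NULL : r->data[r->tail];
}

/* Output side: hand the drained slot back to the decoder. */
void ring_release(audio_ring *r)
{
    assert(r->full > 0);
    r->tail = (r->tail + 1) % RING_SLOTS;
    r->full--;
}

/* Average occupancy, as a percentage of all slots, over the samples. */
unsigned ring_percent_full(const audio_ring *r)
{
    if (r->samples == 0)
        return 0;
    return (unsigned)((100UL * r->full_total) / (r->samples * RING_SLOTS));
}
```

On a real target, the output-side calls would typically run from a DMA-complete interrupt, so the shared counter would need protection, for example by briefly masking that interrupt around ring_commit. A persistently low percentage suggests the decoder is barely keeping up; a persistently high one suggests that memory is being spent on buffers that are not needed.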

The Microchip PIC32 port employed the Explorer 16 development board (Figure 3) with a customized board and the MPlab Real ICE (in-circuit emulator). We used the MPlab Academic version for the software-development tool chain. The PIC32 device is pin-compatible with earlier 16-bit devices and uses the same peripheral blocks as the PIC24. The porting used legacy code by recompiling the PIC24 code. We stored and retrieved the audio files with an SD card and played through the PWM (pulse-width modulator) using a ping-pong-buffer implementation.

The Texas Instruments ARM Cortex-M3 port used the DK-LM3S9B96 development kit with the Keil µVision3 software-development tool chain (Figure 4). This effort required rewriting the allocation routines to avoid dynamic allocation during playback; this task included explicitly defining a stack and a heap space. We stored and retrieved the audio files with the SD card using sample code. The audio output used I2S (inter-IC-sound) demonstration code, which included volume control and a touchscreen scroll interface. Other optimization options addressed the multiplication macros.
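One common way to avoid dynamic allocation during playback is to hand out memory from a fixed, statically defined pool at setup time and never free individual blocks. The sketch below illustrates that general technique; it is not the routine the team wrote. The names arena_alloc, arena_reset, and arena_remaining and the pool size ARENA_BYTES are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_BYTES (32u * 1024u)  /* assumed pool size; tune per target */

/* Declared as 64-bit words so that the pool starts 8-byte aligned. */
static uint64_t arena[ARENA_BYTES / sizeof(uint64_t)];
static size_t arena_used;          /* bytes handed out so far */

/* Returns an 8-byte-aligned block, or NULL when the pool is exhausted. */
void *arena_alloc(size_t bytes)
{
    size_t rounded = (bytes + 7u) & ~(size_t)7u;

    /* The first test catches overflow in the rounding step. */
    if (rounded < bytes || rounded > ARENA_BYTES - arena_used)
        return NULL;

    void *block = (uint8_t *)arena + arena_used;
    arena_used += rounded;
    return block;
}

/* Releases everything at once, for example between audio clips. */
void arena_reset(void)
{
    arena_used = 0;
}

size_t arena_remaining(void)
{
    return ARENA_BYTES - arena_used;
}
```

Because allocation is just a bounds check and an addition, its cost is constant and there is no fragmentation, which makes timing during playback easier to reason about. The trade-off is that the pool must be sized for the worst-case stream the product will ever play.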

PROBLEMS

Each porting effort ran into problems. Some of these were early-adopter problems, such as those that occur when you are using newly released resources. For example, one of the boards featured an earlier version of the firmware that had a problem that the manufacturer fixed in a later version of the firmware. From this situation, I learned that there should be a straightforward way to update the firmware or version information on the box to avoid sending out a board with a problem that the manufacturer has already fixed.

To make this project more interesting, I used a 64-bit Vista desktop. None of the original port efforts used this kind of machine, so, although this approach did not stop the projects, it did cause some stalls. In one case, we learned that we had to explicitly install the software-development tools as an administrator by right-clicking the setup.exe file and choosing to run it as administrator. In two projects, we had trouble getting my desktop to properly recognize the board through the USB (Universal Serial Bus). In one case, it required finding the 64-bit version of the .inf file in a different directory from that of the 32-bit version. In another case, it required adding the missing 64-bit information to the 32-bit version. Apparently, 64-bit Vista has not been a big issue so far, but I expect that more developers will in the near future be using 64-bit Vista hosts. These types of problems help to illustrate the challenges facing development-tool support teams as they work to support not only several host operating systems, but also different versions of these operating systems.

In one porting effort, the development-tool installation DVD was a blank disk. Fortunately, all of the files were available online, and downloading them quickly eliminated the problem; without it, the experience with these kits would have been excellent. In another case, the manufacturer had to separately ship a power cord because not all kits included one. The reason for this omission was to help keep the cost of the kits down and to avoid filling your drawers with too many redundant power cords.

A minor problem occurred when I tried to plug in a serial port between my desktop and the development board. Imagine my surprise when I realized that my computer had a dozen USB ports but no serial port. Two other computers that I recently purchased also had no serial port. You might need a serial port, but do not assume that manufacturers still include them.

Despite all of these problems, the bring-up on the boards was usually smooth and straightforward. We would set up and power up the board and then verify that the preloaded software was operating properly. After that, we would select some code, compile it in the tool set, load it onto the board, and then verify that it was operating properly. Starting from a known condition and adding one more step into the tool-chain flow in this way helped us identify where a problem might originate and how to address it. Likewise, with the porting effort, adding peripheral ports one at a time or increasing the bit rate in steps helped isolate where logic and performance problems were originating.

NO PERFECT SOLUTION

In each port, we had to rewrite the input and output of the audio data. The Vorbis implementation uses stdio for handling the input and output. However, the authors of the software recognize that this mechanism is not appropriate for embedded-system applications, so the code includes a callback structure, ov_callbacks, that allows a developer to provide custom functions for these important I/O functions, including decoding a Vorbis stream from a memory buffer.
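As a rough illustration of decoding from a memory buffer, the sketch below implements read, seek, close, and tell functions over an array held in memory, such as an .ogg file linked into the executable. The mem_source type and the mem_* function names are hypothetical. The signatures are meant to mirror the four function pointers in the callback structure as I understand the interface, with int64_t standing in for the library's own 64-bit offset type; check them against the header that ships with the version of Tremor you use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Hypothetical data source: a clip held in memory plus a read cursor. */
typedef struct {
    const uint8_t *base;  /* first byte of the clip */
    size_t size;          /* clip length in bytes */
    size_t pos;           /* current read position */
} mem_source;

/* fread-style read: returns the number of whole items copied. */
size_t mem_read(void *ptr, size_t size, size_t nmemb, void *datasource)
{
    mem_source *src = datasource;
    size_t items;

    assert(src != NULL);
    if (size == 0 || nmemb == 0)
        return 0;

    items = (src->size - src->pos) / size;  /* whole items remaining */
    if (items > nmemb)
        items = nmemb;
    memcpy(ptr, src->base + src->pos, items * size);
    src->pos += items * size;
    return items;
}

/* fseek-style seek: returns 0 on success, -1 on a bad request. */
int mem_seek(void *datasource, int64_t offset, int whence)
{
    mem_source *src = datasource;
    int64_t target;

    switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = (int64_t)src->pos + offset; break;
    case SEEK_END: target = (int64_t)src->size + offset; break;
    default: return -1;
    }
    if (target < 0 || target > (int64_t)src->size)
        return -1;
    src->pos = (size_t)target;
    return 0;
}

/* ftell-style position report. */
long mem_tell(void *datasource)
{
    const mem_source *src = datasource;
    return (long)src->pos;
}

/* Nothing to release for a clip that lives in flash or static memory. */
int mem_close(void *datasource)
{
    (void)datasource;
    return 0;
}
```

In a port, these four functions would fill the read_func, seek_func, close_func, and tell_func members of an ov_callbacks structure, which is then passed, along with a pointer to the mem_source, to ov_open_callbacks in place of a file handle.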

A big reason for doing this experiment was to demonstrate that there are no perfect solutions for porting code—especially embedded code. None of these ports used an operating system on the target system. As a result, accessing the peripherals required an explicit effort by the developer. This effort might include pulling code from a library or from sample code or, in an early-adopter phase, writing the code yourself.

Additionally, the software may exhibit issues depending on how the memory architecture has changed. Unfortunately, compilers are weak in this area and do not provide as much automation or assistance as developers could use. However, Tremor comes in three main versions. A general-purpose implementation targets processors with access to large off-chip memory, a low-memory version trades memory space for more instructions during execution, and a third version contains low-memory code for processors without byte addressing.

The Ogg Vorbis specification is in the public domain and is free for commercial or noncommercial use, making the format an interesting candidate for embedded-system applications. Developers can independently write Ogg Vorbis software that is compatible with the specification for no charge and without restrictions of any kind. As embedded-system applications expand and the richness of the user interface grows beyond blinking lights and simple buzzers, such as those on coffee pots and washing machines, developers may want to consider the Tremor implementation. Embedded systems need not support the file sharing that a rich user-multimedia environment might require, and they can take advantage of the static nature of the audio messages that they might include to provide a new and cost-effective differentiating feature. As for choosing a processor and the development-support tools, this exercise demonstrated that vendors that are serious about supporting this capability may want to perform an optimization of the Tremor code and offer it as a reference-design implementation.

For more information
ARM: www.arm.com
Atmel: www.atmel.com
EEMBC: www.eembc.org
IAR: www.iar.com
Microchip: www.microchip.com
Texas Instruments: www.ti.com
Xiph.Org Foundation: www.xiph.org
   
REFERENCES
  1. Cravotta, Robert, “Accelerate your performance,” EDN, Nov 11, 2004, pg 50, www.edn.com/article/CA476908.
  2. Cravotta, Robert, “Valuing uncertainty,” EDN, Jan 5, 2006, pg 38, www.edn.com/article/CA6294179.
  3. Cravotta, Robert, “Welcome to the jungle,” EDN, Oct 30, 2003, pg 39, www.edn.com/article/CA330073.

MORE AT EDN.COM
For related blog posts about embedded processing, go to www.edn.com/blog/1890000189.html.


 
