Extending LALP (Language for Aggressive Loop Pipelining)


Karan Bhandari, Chen-Shih Lin
Abstract— Field-Programmable Gate Arrays (FPGAs) are widely used in System-on-Chip (SoC) design and embedded systems for both verification and production. FPGAs can achieve performance close to that of Application-Specific Integrated Circuits (ASICs), but they are more flexible because they are programmable and reconfigurable. However, mastering hardware description languages (HDLs) such as VHDL and Verilog is not easy, and software engineers often find the hardware domain elusive. LALP, a novel language for aggressive loop pipelining, lets programmers target FPGAs using a high-level language similar to C. This allows even software engineers to write hardware code and convert popular algorithms into synthesizable hardware descriptions with minimal effort. Hardware offers parallelism and concurrency, and LALP exploits these features while giving the programmer control over execution stages. LALP is a convenient language for programming FPGAs. However, the current LALP does not support function calls, and it generates only VHDL, not Verilog. Since VHDL and Verilog are both widely used HDLs, the focus of this project is to build a compiler that generates Verilog. We have added procedure support to LALP and implemented a Verilog generation engine. Furthermore, our Verilog engine supports floating point addition and subtraction, which the current implementation of LALP does not provide.
Index Terms—Floating Point, Hardware Description Languages, Compilers, Aggressive Loop Pipelining, Procedural Coding

1 Introduction
RECENTLY, embedded products such as cell phones, MP3 players, and GPS navigation units have become part of everyday life. To meet the growing demand for embedded systems, FPGAs can be used to design and verify these special-purpose systems. Embedded systems require high performance but must be cost-effective, and time-to-market is critical. ASICs provide high performance for the particular tasks they are designed for. However, ASICs cannot be reprogrammed quickly, and they can take a year or more to design. Unlike ASICs, FPGAs are reprogrammable and reconfigurable, yet they still offer comparable performance and can reduce costs. Furthermore, FPGAs can be used to verify ASIC designs before tape-out.
However, it is not easy for developers who are well versed in high-level programming languages to become experts in hardware description languages such as VHDL and Verilog. Additionally, many widely used algorithms, such as ADPCM and Bubble Sort, have already been implemented in high-level programming languages such as C and C++. Converting these algorithms into the corresponding hardware structures would therefore be very convenient.
For hardware structures, parallelization and concurrency are essential for improving performance. Parallelism among operations is a basic concern when algorithms are translated into hardware structures. Loops are usually on the critical path and consume most of the operation resources, and loop pipelining is one technique for alleviating this problem. In [1], the authors proposed LALP, which uses aggressive loop pipelining to achieve maximum throughput. The language describes the sequential code and loop computations of algorithms with C-like syntax and grammar, and it comes with a compilation framework that generates optimized hardware structures. LALP offers a higher level of abstraction than current HDLs, so software programmers can rewrite algorithms in LALP and rely on its compilation framework to convert them to an HDL such as VHDL while meeting performance requirements. LALP can be thought of as an intermediate language in a compilation flow between C and hardware.
The authors of [1] have already converted many modules, such as ADPCM, FDCT, and Bubble Sort, into the corresponding VHDL code. However, LALP lacks support for function calls and floating point operations. Function calls reduce duplicate code and make programs more readable. Therefore, in this project, we implement function calls as a new feature and add floating point operations to the library. Verilog is also an important hardware description language, but the current LALP provides only a VHDL generation engine. To make LALP more general purpose, we developed a new compiler that converts LALP to Verilog.
The rest of the paper describes in more detail the approaches we used to address these challenges. Section 2 explains the motivation for this project. Section 3 describes how we implemented procedure calls. Section 4 describes the Verilog engine and the procedure for floating point operations. In Section 5, we simulate several simple Verilog modules and implement them on a Digilent Nexys2 (Xilinx Spartan-3E) FPGA board. Section 6 concludes the paper.

2 Background
In our project we addressed three features that the authors of [1] wished to implement: function calling, Verilog generation, and support for floating point operations. The terms function calling and procedure invocation are used interchangeably in the following sections. This section explains the need for each of these three features.
Function calling is a critical tool in higher-level programming languages. It makes a program more structured and modular, which improves readability and enables effective reuse. Functions foster teamwork and the effective distribution of work among development teams in different geographic locations. An abstract view of the program lets the reader see the logical flow of operations and can shorten the learning curve. Functions also enable faster debugging by splitting code into manageable, testable sections. In general, a function call involves a transfer of control and some overhead due to the control or context switch. However, the procedure invocation that we have implemented produces code that meets the same timing goals as the output of the original LALP.
LALP can generate VHDL but not Verilog. Verilog is a popular HDL that is widely used in consumer electronics, whereas VHDL is commonly used in the defense and aerospace sectors. By generating only VHDL, LALP catered mainly to those specific sectors. Adding Verilog support makes the LALP engine more general purpose.
The original LALP lacked floating point operations. Its creators wanted to support floating point operations for scientific computations and for real-world calculations that require a certain degree of precision. Floating point support allows a wider range of values to be represented and operated upon, but it requires a substantial number of hardware components and pre-computations. In Section 4, we describe the four-step procedure that we implemented to support floating point operations.
3 Implementation of Function Calling
We have implemented procedure calling in the LALP framework. From now on, the terms ‘procedure’ and ‘function’ are used interchangeably. Procedure calling helps programmers eliminate redundant code, which improves efficiency and productivity through reuse.
To implement this feature, we added a pre-compilation step that runs before the LALP engine begins its conversion. This step significantly increases conversion time, but that is not a concern because the final converted output runs at the speed promised by the LALP engine.
Function bodies must be defined before the main module. By default, all variables in the LALP framework have global scope, so every variable can be accessed from the main module or from the body of any function. Local scope could be problematic because variables are accessed in parallel, which could lead to coherency issues. Since all variables are global, there is no need to pass parameters to a function. A function declaration begins with the keyword “function”, followed by the function name and an empty pair of parentheses; the body is enclosed in curly braces. The code snippet below shows the use of functions in a simple bubble sort program.

function assignA()
{
  maior = a;
  maior = b when b > a;
  menor = b;
}
function assignB()
{
  menor = a when a < b;
  troca = maior;
}
const DATA_WIDTH = 32;
const ITERATIONS = 32;
typedef fixed(DATA_WIDTH, 1) int;
typedef fixed(1, 0) bit;
bubble_sort_alp(in bit init, out fixed(DATA_WIDTH, 1) output, out bit done) {
  {
    int v[32] = {29, 81, 38, 76, 90, 10, 65, 82, 89, 23, 93, 28, 58, 15, 73, 91, 30, 83, 39, 77, 91, 11, 66, 84, 92, 24, 94, 31, 58, 15, 73, 91};
    int a, b, maior, menor, troca;
    fixed(6, 0) i, im1, j, v_addr;
  }
  i.clk_en = init;
  counter (i=0; i<31; i++@125);
  im1 = i + 1;
  j.load = i.step;
  j.clk_en = init;
  counter (j=im1; j<32; j++@4);
  a = v.data_out when j.step;
  b = v.data_out when j.step@1;
  assignA();
  assignB();
  troca = menor when j.step@3;
  v.address = v_addr;
  v.data_in = troca when (j.step@2) | (j.step@3);
  v_addr = i;
  v_addr = j when j.step | (j.step@2);
  output = v.data_out;
  done = i.done@2;
}
Table 1. LALP source code of a simple bubble sort using function calls.
This code is adapted from [1], which discusses its syntax and purpose; we modified it to demonstrate function calling. Two functions are defined: “assignA” computes maior (Portuguese for “larger”) and a default value for menor (“smaller”), and “assignB” completes menor and assigns troca (“swap”). Using functions makes our code cleaner and more modular. There is room for further improvement, but we have not demonstrated it for the sake of brevity and simplicity.
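The paper does not spell out how the pre-compilation step resolves calls. Because functions take no parameters and all variables are global, one plausible approach is plain textual inlining: each call is replaced by the function body before the LALP engine runs, so no call overhead reaches the hardware. The Python sketch below assumes exactly that. The names `inline_functions`, `FUNC_DEF`, and `CALL` are hypothetical, and the sketch assumes function bodies contain neither nested braces nor calls to other functions.

```python
import re

# Hypothetical pre-compilation pass: inline parameterless LALP functions.
# Assumes function bodies contain no nested braces and no further calls.
FUNC_DEF = re.compile(r"function\s+(\w+)\s*\(\s*\)\s*\{(.*?)\}", re.DOTALL)
CALL = re.compile(r"^(\s*)(\w+)\s*\(\s*\)\s*;\s*$")


def inline_functions(source: str) -> str:
    """Remove function definitions and paste each body at its call sites."""
    bodies = {}

    def collect(match) -> str:
        name, body = match.group(1), match.group(2)
        # Keep the non-empty statement lines of the body, without indentation.
        bodies[name] = [ln.strip() for ln in body.splitlines() if ln.strip()]
        return ""  # drop the definition itself from the output

    remaining = FUNC_DEF.sub(collect, source)
    out = []
    for line in remaining.splitlines():
        call = CALL.match(line)
        if call and call.group(2) in bodies:
            # Replace "name();" with the body, reusing the call's indentation.
            out.extend(call.group(1) + stmt for stmt in bodies[call.group(2)])
        else:
            out.append(line)  # any other line passes through unchanged
    return "\n".join(out).strip("\n")


if __name__ == "__main__":
    src = """function assignB()
{
menor = a when a < b;
troca = maior;
}
main(in bit init) {
assignB();
done = 1;
}"""
    print(inline_functions(src))
```

Applied to Table 1, such a pass would replace the `assignA();` and `assignB();` lines with the five assignments from the two function bodies and leave the rest of the module unchanged. Lines such as `counter (i=0; i<31; i++@125);` are not treated as calls, because the pattern only matches empty parentheses.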
4 Verilog Engine
LALP has a powerful VHDL generation engine. It allows the programmer to specify clock signals and express parallelism in VHDL through a high-level language. However, it is domain specific and lacks floating point capability. Our goal was not only to demonstrate floating point operations but also to extend LALP to generate Verilog, with limited functionality.
The Verilog engine is used through the GUI: the user loads an ALP file and generates a Verilog file, which can then be loaded into Xilinx ISE, synthesized, translated, and programmed onto an FPGA device. The output is largely device independent because, given its limited features, it uses only registers and avoids onboard RAM and Flash memory.
Our Verilog engine supports the following operations on two operands, each of which can be a variable or a literal: integer addition, subtraction, and multiplication, and floating point addition and subtraction.
Our motivation for creating the Verilog engine was to demonstrate that floating point operations can be implemented and mapped from a high-level language to a digital system design in Verilog, a capability that the LALP engine lacked. We tested our outputs both in the ISE simulator and on a Digilent Nexys2 FPGA kit.
Our output does not reproduce the parallel, optimized nature of the LALP output as designed by its creators. The engine is at an early stage and lacks loops and advanced language constructs. For simple operations, however, it relieves the programmer of the burden of translating C-like code into a digital system design.

4.1 Syntax

The syntax of LALP for Verilog is derived from C so that software engineers feel at home. The C syntax has been adapted so that LALP code can eventually be converted to synthesizable Verilog code.
ALP code that will be converted to Verilog is organized into the following units:
1. Constant declarations
2. Module name and input/output parameters
3. Register definitions
4. Computation
Consider the following code block:
const DATA_WIDTH = 4;
const G=9.881;
Verilogmodule simple(in int toggle,in int lion,out int king) {
  {
    int shift_value=2;
    float myfltnumber;
    int june;
    float bilirubinValue=-0.4E5;
  }
  king = toggle+lion;
  done=1;
}
Table 2. Simple LALP code example for the Verilog engine.
The first section of the code defines constants. The value of ‘DATA_WIDTH’ determines the width of the vectors in the generated Verilog module. The user can increase or decrease ‘DATA_WIDTH’ according to the general purpose I/O supported by the target board or test vectors; for example, the number of LEDs, switches, or buttons can limit its value. Other constants can be used in the code in place of their numeric values, which makes later refactoring easier. For example, the acceleration due to gravity ‘G’ is about 9.832 m/s² at the North Pole but about 9.780 m/s² near the equator. Because G is defined once as a constant, the programmer does not need to change its value throughout the code and can instead set it once for the region where the output will be used.
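As an illustration of how constants can stand in for numeric values, the following Python sketch substitutes each `const` declaration into the rest of the source as a whole-word textual replacement. The function name and the substitution strategy are assumptions made for illustration. The engine may instead keep constants symbolic, for example as Verilog parameters.

```python
import re

# Hypothetical constant-substitution pass: every "const NAME = value;" line is
# removed and NAME is replaced by its value wherever it appears as a whole word.
CONST_DECL = re.compile(
    r"^[ \t]*const[ \t]+(\w+)[ \t]*=[ \t]*([^;\n]+);[ \t]*$", re.MULTILINE
)


def substitute_constants(source: str) -> str:
    constants = {name: value.strip() for name, value in CONST_DECL.findall(source)}
    body = CONST_DECL.sub("", source)  # drop the declarations themselves
    for name, value in constants.items():
        # \b ensures that G does not match inside an identifier such as GX.
        body = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, body)
    return body.strip("\n")


if __name__ == "__main__":
    print(substitute_constants("const DATA_WIDTH = 4;\nking = toggle + DATA_WIDTH;"))
```

With this approach, changing `const G=9.881;` to a different regional value updates every use of G in a single edit, which is the refactoring benefit described above.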
The main Verilog module begins with the keyword ‘Verilogmodule’, followed by the module name and a list of input and output parameters enclosed in parentheses. Each parameter is a three-part tuple consisting of the data direction (in or out), the data type, and the identifier.
After the module header, an opening brace starts the main block. The main block has two sections: the register definition section and the computation section. The register definition section, which is itself delimited by curly braces, declares variables of type float and int. Each int variable is converted to a vector of ‘DATA_WIDTH’ bits, and each float variable is handled according to the strategy described in the next section.
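The mapping from the module header and integer registers to Verilog declarations might look like the following Python sketch. The emitted layout is our assumption about what the engine produces, not its documented output: a Verilog-1995 style port list, followed by `input`/`output` declarations and `reg` vectors of `DATA_WIDTH` bits. The function names are also hypothetical. Float registers are deliberately rejected here because they are split into separate sign, fraction, and exponent fields, as described in Section 4.3.

```python
import re

# Hypothetical mapping of a "Verilogmodule" header and int registers to
# Verilog-1995-style declarations; float registers are handled elsewhere.
HEADER = re.compile(r"Verilogmodule\s+(\w+)\s*\(([^)]*)\)")
INT_REG = re.compile(r"\s*int\s+(\w+)\s*(?:=\s*(-?\d+))?\s*;\s*")
DIRECTIONS = {"in": "input", "out": "output"}


def translate_header(line: str, data_width: int) -> list:
    match = HEADER.search(line)
    if match is None:
        raise SyntaxError("expected 'Verilogmodule name(...)'")
    name, params = match.group(1), match.group(2)
    ports, decls = [], []
    for param in params.split(","):
        fields = param.split()  # (direction, data type, identifier)
        if len(fields) != 3 or fields[0] not in DIRECTIONS or fields[1] != "int":
            raise SyntaxError(f"unsupported parameter: {param.strip()!r}")
        direction, _dtype, ident = fields
        ports.append(ident)
        decls.append(f"{DIRECTIONS[direction]} [{data_width - 1}:0] {ident};")
    return [f"module {name}({', '.join(ports)});"] + decls


def translate_int_register(decl: str, data_width: int) -> str:
    match = INT_REG.fullmatch(decl)
    if match is None:
        raise SyntaxError(f"not an int register declaration: {decl.strip()!r}")
    ident, init = match.group(1), match.group(2)
    # Every int becomes a DATA_WIDTH-bit vector, optionally initialized.
    reg = f"reg [{data_width - 1}:0] {ident}"
    return reg + (f" = {init};" if init is not None else ";")
```

For the header in Table 2 with DATA_WIDTH = 4, `translate_header` returns `module simple(toggle, lion, king);` followed by `input [3:0] toggle;`, `input [3:0] lion;`, and `output [3:0] king;`. The register `int shift_value=2;` becomes `reg [3:0] shift_value = 2;`.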
Currently, the Verilog engine supports addition, subtraction, and multiplication of two integer operands, and addition and subtraction of two floating point operands.
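The following sketch shows how a two-operand computation such as `king = toggle+lion;` from Table 2 could be checked against the supported operators and emitted as a Verilog continuous assignment. Only the integer case is emitted. Floating point operands would instead go through the four-step arithmetic described in Section 4.3. The names and the `assign` output form are assumptions for illustration.

```python
import re

# Hypothetical translation of a two-operand computation into a Verilog
# continuous assignment; operands may be identifiers or integer literals.
SUPPORTED_OPS = {"int": {"+", "-", "*"}, "float": {"+", "-"}}
STATEMENT = re.compile(r"\s*(\w+)\s*=\s*(\w+)\s*([-+*/])\s*(\w+)\s*;\s*")


def is_supported(op: str, operand_type: str) -> bool:
    return op in SUPPORTED_OPS.get(operand_type, set())


def translate_int_statement(stmt: str) -> str:
    match = STATEMENT.fullmatch(stmt)
    if match is None:
        raise SyntaxError(f"expected 'dest = a op b;', got {stmt.strip()!r}")
    dest, lhs, op, rhs = match.groups()
    if not is_supported(op, "int"):
        raise SyntaxError(f"operator {op!r} is not supported for int operands")
    return f"assign {dest} = {lhs} {op} {rhs};"
```

For example, `translate_int_statement("king = toggle+lion;")` returns `assign king = toggle + lion;`, while a division such as `x = a / b;` raises a `SyntaxError` that a GUI could report to the user.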

4.3 Floating point Arithmetic Engine Prelude
VHDL has built-in floating point types, but synthesizing them can involve a rigorous procedure because of their complexity [2]. In this setup, we deal with simple floating point numbers that can be represented on boards with limited I/O options, such as the Nexys2. To perform floating point arithmetic, we follow the four-step procedure described in [2].
The parser accepts floating point literals written in the format

$$\text{(sign)}\,0.\text{(Fraction)}\,\mathrm{E}\,\text{(Exponent)} \tag{1}$$

Internally, each literal is converted to the format

$$(-1)^{\text{sign}} \times 0.\text{Fraction} \times 10^{\text{Exponent}} \tag{2}$$

where sign is 0 for a positive number and 1 for a negative one. For example, +0.4E3 and -0.7E2 are literals in format (1); they denote +0.4 × 10^3 = 400 and -0.7 × 10^2 = -70. A floating point declaration is therefore split into its sign, fraction, and exponent parts within the generated Verilog file (see Section 5.3).
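The Python sketch below parses a literal in format (1) into the three fields of format (2) and evaluates it with exact rational arithmetic. The regular expression, which requires the leading `0.` and an uppercase `E`, and the function names are assumptions, not the engine's actual parser.

```python
import re
from fractions import Fraction

# Hypothetical parser for literals in format (1): (sign)0.(Fraction)E(Exponent).
LITERAL = re.compile(r"([+-]?)0\.(\d+)E([+-]?\d+)")


def parse_float_literal(text: str) -> tuple:
    """Return (sign, fraction_digits, exponent) as in format (2)."""
    match = LITERAL.fullmatch(text.strip())
    if match is None:
        raise SyntaxError(f"expected (sign)0.(Fraction)E(Exponent), got {text!r}")
    sign = 1 if match.group(1) == "-" else 0  # 0 = positive, 1 = negative
    # Fraction digits stay a string so leading zeros (as in 0.05E2) are kept.
    return sign, match.group(2), int(match.group(3))


def literal_value(fields: tuple) -> Fraction:
    """Evaluate (-1)^sign * 0.Fraction * 10^Exponent exactly."""
    sign, digits, exponent = fields
    magnitude = Fraction(int(digits), 10 ** len(digits)) * Fraction(10) ** exponent
    return -magnitude if sign else magnitude
```

For the declaration `float bilirubinValue=-0.4E5;` in Table 2, `parse_float_literal("-0.4E5")` returns `(1, "4", 5)`. Under this sketch, a literal such as `9.881` that does not start with `0.` is rejected.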
Because we work with limited I/O, limited memory, and other constraints, digits can be lost during alignment or normalization.
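To make the four steps and the resulting digit loss concrete, here is a Python sketch over decimal (sign, fraction, exponent) fields. We take the four steps to be sorting, alignment, addition or subtraction, and normalization, as in the simplified floating point adder from Pong P. Chu's FPGA prototyping textbooks. Treat the correspondence to [2] as our assumption. The two-digit fraction width (`DIGITS = 2`), the truncation policy, and all names are also hypothetical. They are chosen to mimic a narrow datapath, not to reproduce the generated Verilog.

```python
from fractions import Fraction

# Hypothetical decimal model of four-step floating point addition.
# A number is (sign, frac, exp) with value (-1)**sign * frac / BASE * 10**exp;
# inputs are assumed normalized: frac has exactly DIGITS digits, leading != 0.
DIGITS = 2          # assumed fraction width, mimicking a narrow datapath
BASE = 10 ** DIGITS


def fp_add(a: tuple, b: tuple) -> tuple:
    if a[1] == 0:
        return b
    if b[1] == 0:
        return a
    # 1. Sorting: put the operand with the larger magnitude first.
    if (a[2], a[1]) < (b[2], b[1]):
        a, b = b, a
    sign_a, frac_a, exp_a = a
    sign_b, frac_b, exp_b = b
    # 2. Alignment: shift the smaller fraction right; shifted-out digits are lost.
    frac_b //= 10 ** (exp_a - exp_b)
    # 3. Addition or subtraction of the aligned fractions.
    frac = frac_a + frac_b if sign_a == sign_b else frac_a - frac_b
    exp = exp_a
    # 4. Normalization: absorb a carry-out, then remove leading zeros.
    if frac >= BASE:
        frac //= 10  # the last digit is lost here
        exp += 1
    if frac == 0:
        return (0, 0, 0)
    while frac < BASE // 10:
        frac *= 10
        exp -= 1
    return (sign_a, frac, exp)


def fp_sub(a: tuple, b: tuple) -> tuple:
    # Subtraction is addition with the sign bit of b flipped.
    return fp_add(a, (1 - b[0], b[1], b[2]))


def fp_value(x: tuple) -> Fraction:
    sign, frac, exp = x
    magnitude = Fraction(frac, BASE) * Fraction(10) ** exp
    return -magnitude if sign else magnitude
```

For example, adding 0.45E2 and 0.13E1 with `fp_add((0, 45, 2), (0, 13, 1))` yields `(0, 46, 2)`, which is 46 rather than 46.3, because the digit 3 is truncated during alignment.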
All errors arising from incorrect syntax or semantics are displayed in the second text box of the GUI. Whenever an exception occurs during parsing, whether it is caused by the user's input or by the Verilog engine, it is reported there.

5 Experimental Results
This section presents simulation results and describes how we targeted the Digilent Nexys2 (Xilinx Spartan-3E) FPGA board. Using the Verilog compiler described in the previous section, we generated designs for several simple integer calculations and floating point operations.

FOR MORE INFORMATION PLEASE VISIT http://bit.ly/OIlU4l