From: Matthijs Kooijman
Date: Thu, 24 Jul 2008 16:46:27 +0000 (+0200)
Subject: Improve Montium description.
X-Git-Tag: Report-final~32
X-Git-Url: https://git.stderr.nl/gitweb?a=commitdiff_plain;h=82c46df3768c0cb22a00d866014dde30f172d653;p=matthijs%2Fprojects%2Finternship.git

Improve Montium description.
---

diff --git a/Report/Main/Context/Montium.tex b/Report/Main/Context/Montium.tex
index 7fc8145..3556d5f 100644
--- a/Report/Main/Context/Montium.tex
+++ b/Report/Main/Context/Montium.tex
@@ -1,8 +1,8 @@
 \section{Montium Tile Processor}
 The Montium Tile Processor (Montium) is the main product of Recore Systems. It
 is a reconfigurable processor that is aimed for inclusion in a tiled,
-heterogenous multi-core system on chip (SoC), connected to other tiles and the
-outside world throug a network on chip (NoC).
+heterogenous multi- or manycore system on chip (SoC), connected to other tiles
+and the outside world through a network on chip (NoC).
 
 The Montium has a number of fundamental differences with "regular" processors
 and DSP engines, that make it both interesting and challenging to program for
@@ -18,16 +18,17 @@ The Montium is built from a few parts. The central part is the interconnect,
 which ties memories, Arithmetic and Logic Units (ALU) and the Communication and
 Configuration Unit (CCU) together. The memories store data locally, the ALU's
 process data and the CCU moves data and configuration on and off the
-Montium. Furthermore, the sequencer is the closest thing to a normal processor
-in the Montium: It accepts and executes instructions one by one, is capable of
-performing (conditional) jumps and some other limited control flow.
+Montium. Furthermore, there is a sequencer, which is the closest thing to a
+normal processor in the Montium: It accepts and executes instructions one by
+one, is capable of performing (conditional) jumps and some other limited control
+flow.
 
 \subsubsection{Sequencer}
 The Sequencer executes its instructions one by one and controls all other
 elements through the configuration registers (CR). To keep the size of sequencer
 instructions limited, while not limiting the flexibility of the other elements,
-a level of configuration registers is introduced. These registers are wide and
-contain multiple sets of input signals to the various multiplexers, function
+two levels of configuration registers is introduced. These registers are wide
+and contain multiple sets of input signals to the various multiplexers, function
 units, etc.
 
 The sequencer instructions in turn contain indices into these configuration
@@ -36,14 +37,23 @@ the entire Montium for the cycle during which the instruction is executed.
 This also means that the Montium is reconfigured on every cycle, for maximum
 flexibility and performance.
 
+Using a two-level configuration register scheme ensures that when a (part of) a
+particular configuration is reused in more then one sequencer instruction, it
+does not have to be duplicated entirely. Only the index pointing to the right
+configuration register (which is a lot smaller) is duplicated in multiple
+sequencer instructions. This does of course limit the amount of different
+configurations that a single program can use and thus limit the size of a
+Montium program.
+
 \subsubsection{Memories}
 The Montium contains ten memories (two for each ALU). Each of these memories has
 its own Address Generation Unit (AGU), which can generate different memory
-patterns. This means that the instructions or CR's never contain direct memory
+patterns. This means that the instructions or CRs never contain direct memory
 addresses, only modifications to the current address. Each memory simply reads
 from its current address and offers the value read to the interconnect (which
 can then further distribute it to wherever it is needed). Writing works in the
-same way (though a memory can only read or write in the same cycle).
+same way (though a memory can only read or written in the same cycle TODO: Is
+this true?).
 
 \subsubsection{ALU's}
 The main processing elements of the Montium are its 5 ALU's. Each of them has
@@ -51,22 +61,24 @@ four (16 bit) inputs, each with a number of input registers. Each ALU contains
 a number of function units, a multiplier, a few adders and some miscelaneous
 logic. Each of the elements in the ALU can be controlled seperately and data can
 be routed in different ways through configuration of multiplexers inside the
-ALU. The ALU has two output ports, without registers.
+ALU. The ALU has two output ports, without registers. Additionally, there is a
+connection from each ALU to its neighbour.
 
 The ALU also has no internal registers, so data travels through the entire ALU
 in a single cycle, to arrive at the outputs before the end of the cycle. This
 means that the ALU can perform a lot of computation in a single clock cycle. For
 example, using four of the five ALU's, an FFT butterfly operation (two complex
-multiplications and four complex additions) can be exected in a single clock
-cycle.
+multiplications and four complex additions TODO: Right?) can be exected in a
+single clock cycle. The downside of this approach is that the data will have a
+long path to travel over, which limits the clock speed of the design.
 
 \subsubsection{CCU}
 The CCU controls communication with the external world, usually a
 network-on-chip. During normal operations, the CCU can take values from the
-interconnect and stream them out onto one of the lanes of the NoC, or vice
-versa. Additionally, the CCU can be used from external to the Montium to start
-and stop execution and move configuration registers, sequencer instructions and
-memory contents into and out of the Montium.
+interconnect and stream them out onto the NoC, or vice versa. Additionally, the
+CCU can be used from external to the Montium to start and stop execution and
+move configuration registers, sequencer instructions and memory contents into
+and out of the Montium.
 
 \subsubsection{Interconnect}
 The central part of the Montium is the interconnect, which is a mostly connected
@@ -80,18 +92,18 @@ outputs of that ALU, without requiring a global bus.
 \subsection{Design changes}
 Currently, the Montium design is experiencing a major overhaul. During work with
 the original design, a number of flaws or suboptimal constructs have been found.
-In particular, the ALUs are capable of performing a large number of
-operations in a single cycle, but since they operate sequentially, this severly
-limits clock speeds. In the new design, the number of ALUs is reduced, but each
-ALU is subdivided in multiple parallel-operating function units.
+In particular, the ALUs are capable of performing a large number of operations
+in a single cycle, but since they operate sequentially, this severly limits
+clock speeds. In the new design, the number of ALUs is reduced, but each ALU is
+subdivided in multiple parallel-operating function units.
 
 This approach requires computations to be properly pipelined to be efficiently
-use all those function units in parallel, but since data only travels through a
-single function unit in each cycle, this allows for much higher clock speeds
-than the old design.
+use all those function units in parallel, but since data only travels through
+only a single function unit in each cycle, this allows for much higher clock
+speeds than the old design.
 
 During my internship I have mainly been working with the old Montium design, and
 unless otherwise stated, that is what is meant when referring to the "Montium".
 Some of the work has been done with the new design in mind, but only during the
 final weeks of my internship I have been involved with the new design enough to
-see most of the picture.
+see most of the picture. See section \ref{Pipelining} for more details.
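
Illustrative note on the patched text (not part of the commit): the Sequencer paragraph added above says that sequencer instructions only carry small indices into wide configuration registers, so a configuration reused by several instructions is stored once. A minimal Python sketch of that idea follows. The split into separate ALU and memory register banks, the field names (`alu_cfg`, `mem_cfg`, `op`, `agu_step`, ...) and the register contents are all hypothetical and chosen only for illustration; they are not taken from the actual Montium design.

```python
from dataclasses import dataclass

# Hypothetical wide configuration registers. Each entry stands in for a full
# set of control signals; the keys and values are made up for this sketch.
ALU_REGS = [
    {"op": "add", "in_a": "mem0", "in_b": "mem1"},
    {"op": "mul", "in_a": "mem0", "in_b": "neighbour"},
]
MEM_REGS = [
    {"agu_step": 1, "write": False},  # advance the address, read
    {"agu_step": 0, "write": True},   # keep the address, write
]


@dataclass(frozen=True)
class Instruction:
    """Hypothetical sequencer instruction: only small indices, no wide data."""
    alu_cfg: int  # index into ALU_REGS
    mem_cfg: int  # index into MEM_REGS


def expand(instr, alu_regs, mem_regs):
    """Look up the wide configurations that one instruction selects."""
    return alu_regs[instr.alu_cfg], mem_regs[instr.mem_cfg]


# Four instructions reuse two ALU and two memory configurations. Only the
# indices are repeated; the wide registers are stored once.
PROGRAM = [
    Instruction(alu_cfg=0, mem_cfg=0),
    Instruction(alu_cfg=1, mem_cfg=0),
    Instruction(alu_cfg=0, mem_cfg=1),
    Instruction(alu_cfg=1, mem_cfg=0),
]


def run(program):
    """Return the per-cycle configuration selected by each instruction."""
    return [expand(instr, ALU_REGS, MEM_REGS) for instr in program]
```

In this sketch the number of distinct configurations a program can use is bounded by the length of the register lists, which mirrors the limitation the added paragraph mentions.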