-particular. Since the inner loop is executed the most, it is the most
-efficient to optimize the inner loop. Also, the inner loop is also the
-piece of code that can most optimally use the parellel processing power
-of the Montium, because it can be software pipelined.
-
-This means that the compiler will emit code that performs operations
-that belong into different iterations of the original loop in the same
-cycle. Since data dependencies within a loop body usually severely limit
-the amount of operations that can be done in parallel, pipelining allows
-the second (and more) iteration to start well before the first iteration
-is done. This is done by dividing the loop body in a number of stages,
-that would normally be executed sequentially. These stages are then
-executed in parallel, but for different iterations (ie, run stage 2 of
-iteration i, while running stage 1 of iteration i+1).
+particular. Since the inner loop is executed most often, it is most efficient to
+optimize the inner loop. The inner loop is also the piece of code that can make
+the best use of the parallel processing power of the Montium, because it can be
+software pipelined.
+
+Software pipelining means that the compiler emits code that performs
+operations belonging to different iterations of the original loop in the same
+cycle. Since data dependencies within a loop body usually severely limit the
+number of operations that can be done in parallel, pipelining allows the second
+(and further) iterations to start well before the first iteration is done. This
+is done by dividing the loop body into a number of stages that would normally
+be executed sequentially. These stages are then executed in parallel, but for
+different iterations (i.e., stage 2 of iteration i runs while stage 1 of
+iteration i+1 runs).
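The idea of stages running for different iterations can be illustrated with a small sketch. It is a hypothetical example, not Montium compiler output. The loop body (`3 * x + 1`) and its split into three stages (load, compute, store) are assumptions chosen for illustration. Python is used only for readability. The pipelined version has a prologue that fills the pipeline, a kernel in which each pass touches three different iterations, and an epilogue that drains it.

```python
def sequential(inp):
    """Reference loop: the three stages of one iteration run back to back."""
    out = [0] * len(inp)
    for i in range(len(inp)):
        x = inp[i]          # stage 1: load
        y = 3 * x + 1       # stage 2: compute (hypothetical loop body)
        out[i] = y          # stage 3: store
    return out


def pipelined(inp):
    """Software-pipelined version of the same loop (assumes len(inp) >= 2)."""
    n = len(inp)
    out = [0] * n
    # Prologue: fill the pipeline.
    x = inp[0]              # S1(0)
    y = 3 * x + 1           # S2(0)
    x = inp[1]              # S1(1)
    # Kernel: each pass runs stage 3 of iteration i, stage 2 of iteration i+1
    # and stage 1 of iteration i+2. On a target where all operations in one
    # cycle read their operands before any result is written, these three
    # could be issued together; here they are ordered so each statement reads
    # its input before the next statement overwrites it.
    for i in range(n - 2):
        out[i] = y          # S3(i)
        y = 3 * x + 1       # S2(i+1)
        x = inp[i + 2]      # S1(i+2)
    # Epilogue: drain the pipeline.
    out[n - 2] = y          # S3(n-2)
    out[n - 1] = 3 * x + 1  # S2(n-1), then S3(n-1)
    return out


data = [5, 1, 4, 2, 8, 7]
print(sequential(data) == pipelined(data))  # True
```

Both functions compute the same result. The difference is that in the kernel of `pipelined`, no statement depends on another statement from the same pass. This independence is what lets the stages overlap in time.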