
05/16/09 18:04
#165421 - Pipeline for concurrency
Richard said:
Because interrupt response times are important, and pipelining increases interrupt response time, MCU makers have avoided pipelining as much as they could. This has not been true of CPU makers.

You continue to state this as a fact, all the while failing to notice the real world. Interrupt latency is not the number of clock cycles until the interrupt is noticed. The interrupt latency that matters is the number of ns from the interrupt source until the critical part of the interrupt has been serviced.

Part of the problem is that you think a pipeline is something that makes one instruction take more time. What it does is make one instruction consume more "machine cycles".

But each machine cycle is simpler, which allows the same process geometry to run each machine cycle at a faster pace. So the total time of the instruction will not grow in proportion to the number of steps in the pipeline.
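To put rough numbers on that, here is a minimal sketch. The model is an assumption of mine: the instruction's logic is split into perfectly balanced stages, and each stage adds a fixed latch overhead. The 40 ns and 1 ns figures are invented for illustration and do not describe any real chip.

```python
def pipeline_timing(logic_ns, stages, latch_ns):
    """Return (cycle_ns, instruction_latency_ns, instructions_per_us).

    Assumed model: the logic delay is divided evenly over the stages and
    every stage pays the same latch overhead.
    """
    cycle_ns = logic_ns / stages + latch_ns
    latency_ns = cycle_ns * stages       # time for ONE instruction, start to end
    per_us = 1000.0 / cycle_ns           # one instruction completes per cycle
    return cycle_ns, latency_ns, per_us


# Hypothetical figures: 40 ns of logic, 1 ns latch overhead per stage.
no_pipeline = pipeline_timing(40.0, 1, 1.0)
four_stage = pipeline_timing(40.0, 4, 1.0)

print("no pipeline:", no_pipeline)
print("4 stages:   ", four_stage)
```

With these assumed figures the single instruction goes from 41 ns to 44 ns, not to four times as long, while the rate of completed instructions more than triples.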

The second thing is that the pipeline is concurrent. Most of the steps of the pipeline do something on every clock cycle. At the simplest (looking back 25-30 years), you just have staggered, overlapping execution:
xxxx
 xxxx
  xxxx

This allows the processor to manage more instructions per second than it would if each instruction started only after the previous instruction had ended.
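The staggered picture above can be counted out in a few lines. This is only a sketch of the ideal case: it assumes one clock per step and no stalls, and the function names are mine.

```python
def sequential_cycles(instructions, steps):
    """Each instruction starts only after the previous one has ended."""
    return instructions * steps


def staggered_cycles(instructions, steps):
    """A new instruction starts every cycle (ideal case, no stalls)."""
    if instructions == 0:
        return 0
    return steps + instructions - 1


def diagram(instructions, steps):
    """Draw the staggered picture, one row per instruction."""
    return [" " * i + "x" * steps for i in range(instructions)]


print("\n".join(diagram(3, 4)))
print(sequential_cycles(3, 4), "cycles sequential")   # 12
print(staggered_cycles(3, 4), "cycles staggered")     # 6
```

For a long stream of instructions the staggered count approaches one cycle per instruction, regardless of how many steps each instruction has.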

But the next important thing is that some operations can't be broken down into a single-cycle operation. You may not be able to do a single-cycle multiply. But you may be able to run two staggered multiplies, either by having two fully independent multiply modules that you alternate between, or by having a pipelined multiplication unit. The only time this fails is when the second multiply requires the output of the first multiply as an input parameter. Good coders learn to interleave critical instructions, and good compilers are truly excellent at this.
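The effect of interleaving can be sketched with a toy model. Everything here is assumed for illustration: a pipelined multiplier with a 3-cycle latency, strictly in-order issue of one operation per cycle, and made-up register names.

```python
def last_result_cycle(ops, latency):
    """Cycle at which the last result is ready.

    ops is a list of (dest, src_a, src_b) register names. Assumed model:
    in-order issue, at most one op per cycle, and an op cannot issue
    until both of its sources are available.
    """
    ready = {}       # register -> cycle when its value becomes available
    next_issue = 0   # earliest cycle the next op may issue
    finish = 0
    for dest, a, b in ops:
        start = max(next_issue, ready.get(a, 0), ready.get(b, 0))
        ready[dest] = start + latency
        finish = max(finish, ready[dest])
        next_issue = start + 1
    return finish


# Two dependent chains: r2 needs r1, and r4 needs r3.
as_written = [("r1", "a", "b"), ("r2", "r1", "c"),
              ("r3", "d", "e"), ("r4", "r3", "f")]
interleaved = [("r1", "a", "b"), ("r3", "d", "e"),
               ("r2", "r1", "c"), ("r4", "r3", "f")]

print(last_result_cycle(as_written, 3))    # 10
print(last_result_cycle(interleaved, 3))   # 7
```

Same four multiplies, same results, but the interleaved order keeps the multiplier busy while the first product is still in flight.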

A pipeline is the basic platform for creating concurrency between different instructions.

In the end, you should not spend so much time focusing on the sequential steps of a single instruction, because it is obvious that you are getting lost there. The pipeline isn't intended to slow down a single instruction by adding arbitrary sequences. It is there to get concurrent execution of many instructions, by noticing that a single instruction needs different features of the processor at different times, which leaves the other modules of the processor idle and available for a different instruction.

What happens if I do get a dictionary, or in this case Wikipedia, and take a closer look at pipelines?

http://en.wikipedia.org/wiki/Pipeline_(computing)
"Instruction pipelines, such as the classic RISC pipeline, which are used in processors to allow overlapping execution of multiple instructions with the same circuitry."

"Graphics pipelines, found in most graphics cards, which consist of multiple arithmetic units, or complete CPUs, that implement the various stages of common rendering operations (perspective projection, window clipping, color and light calculation, rendering, etc.)."

"If engine installation takes 20 minutes, hood installation takes 5 minutes, and wheel installation takes 10 minutes, then finishing all three cars when only one car can be operated at once would take 105 minutes. On the other hand, using the assembly line, the total time to complete all three is 75 minutes. At this point, additional cars will come off the assembly line at 20 minute increments."

"As the assembly line example shows, pipelining doesn't decrease the time for a single datum to be processed; it only increases the throughput of the system when processing a stream of data."
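The assembly-line figures in the quote are easy to check. A small sketch, assuming each station handles one car at a time and a car moves on as soon as the next station is free (function names are mine):

```python
def one_car_at_a_time(stage_minutes, cars):
    """Only one car is worked on at once."""
    return sum(stage_minutes) * cars


def assembly_line(stage_minutes, cars):
    """Each station works on a different car concurrently."""
    done = [0] * len(stage_minutes)   # when each station finished its last car
    for _ in range(cars):
        previous_stage_done = 0
        for s, minutes in enumerate(stage_minutes):
            start = max(previous_stage_done, done[s])
            done[s] = start + minutes
            previous_stage_done = done[s]
    return done[-1]


stages = [20, 5, 10]   # engine, hood, wheels (minutes, from the quote)
print(one_car_at_a_time(stages, 3))   # 105
print(assembly_line(stages, 3))       # 75
print(assembly_line(stages, 4) - assembly_line(stages, 3))   # 20
```

The slowest station (20 minutes) sets the pace once the line is full, which is the same reason the slowest pipeline step sets the clock period.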

Looking for parallel computing, I will instead get:
http://en.wikipedia.org/wiki/Parallel_computing
"A computer program is, in essence, a stream of instructions executed by a processor. These instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism. Advances in instruction-level parallelism dominated computer architecture from the mid-1980s until the mid-1990s.[19]

Modern processors have multi-stage instruction pipelines. Each stage in the pipeline corresponds to a different action the processor performs on that instruction in that stage; a processor with an N-stage pipeline can have up to N different instructions at different stages of completion. The canonical example of a pipelined processor is a RISC processor, with five stages: instruction fetch, decode, execute, memory access, and write back. The Pentium 4 processor had a 35-stage pipeline.[20]"

Richard said:
External memory can be quite fast, too, but commercially available FLASH tends to "top out" at about 55 ns access time.

A 100MHz 1-clocker would need an instruction fetch every 10ns. With stop-and-go execution, it would stall on every branch that didn't just skip a few bytes forward/backwards. With a pipeline, one of the pipeline steps could analyze the future instructions and pre-start flash reads from the branch destination. The only time this would fail is for computed jumps, where the previous instruction computes one of the offsets of the branch.
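The arithmetic behind that, as a small sketch. The 100 MHz and 55 ns figures are the ones from this discussion; the model (a fetch occupies a whole number of clock cycles, nothing is cached or fetched wide) is a simplifying assumption.

```python
import math


def cycle_ns(clock_mhz):
    """Clock period in ns."""
    return 1000.0 / clock_mhz


def fetch_cycles(flash_access_ns, clock_mhz):
    """Whole clock cycles one flash read occupies."""
    return math.ceil(flash_access_ns / cycle_ns(clock_mhz))


def stall_cycles(flash_access_ns, clock_mhz):
    """Cycles lost if the read is only started when the instruction is needed."""
    return fetch_cycles(flash_access_ns, clock_mhz) - 1


print(cycle_ns(100))            # 10.0
print(fetch_cycles(55, 100))    # 6
print(stall_cycles(55, 100))    # 5
```

So in this simple model the flash read for a branch target has to be started about six cycles before the instruction is needed, which is exactly the kind of look-ahead a pipeline step can do.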

Richard said:
Because interrupt response times are important, and pipelining increases interrupt response time, MCU makers have avoided pipelining as much as they could.

See the previous discussions about the difference between counting clock cycles until the first ISR instruction is executed, and the total time until the critical part of the ISR has been serviced. Do not ever assume the same clock speed with and without pipelines.
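A sketch of why cycles and ns are different measures. All four cores' worth of numbers below are invented purely for illustration; they are not measurements of any real processor.

```python
def ns_until_serviced(entry_cycles, critical_isr_cycles, clock_mhz):
    """Time from interrupt to the end of the critical part of the ISR."""
    return (entry_cycles + critical_isr_cycles) * 1000.0 / clock_mhz


# Hypothetical core A: no pipeline, few cycles to ISR entry, slow clock.
core_a = ns_until_serviced(entry_cycles=4, critical_isr_cycles=20, clock_mhz=25)

# Hypothetical core B: pipelined, more cycles to ISR entry, faster clock.
core_b = ns_until_serviced(entry_cycles=10, critical_isr_cycles=20, clock_mhz=100)

print(core_a)   # 960.0
print(core_b)   # 300.0
```

With these assumed figures, core B takes more than twice as many cycles to reach the ISR and still finishes the critical part in under a third of the time.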

On the other hand, Intel finally had to drop their P4 NetBurst architecture because they had focused too much on clock frequency (as a selling argument), while adding ridiculously long pipelines. The pipeline was many, many times too long to perform speculative computations on both branch alternatives, and each failed branch prediction resulted in a major stall. The P4 did very well in many streamed multimedia benchmarks, but the gamers were well aware of the better performance in games with the AMD chips. Thinking that the pipeline length is only important for interrupt responses in microcontrollers is to miss out on one of the important design factors. Very long pipelines are only ok when running large, predictable loops. Real microcontroller code with random branches depending on external stimuli requires limited pipeline lengths. But you can't just go all the way and remove the pipeline without failing in the other direction and getting a dirt-slow processor. It was ok that the original 8051 was dirt-slow. It was designed based on what was reasonable at the time of release.
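The trade-off can be sketched with the usual textbook approximation: average cycles per instruction is the base figure plus the fraction of instructions that are mispredicted branches times the flush penalty. The branch fractions, misprediction rates, penalties and clock speeds below are all assumptions chosen for illustration, not data for the P4 or any AMD chip.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_cycles):
    """Average cycles per instruction including misprediction flushes."""
    return base_cpi + branch_fraction * mispredict_rate * flush_cycles


def ns_per_instruction(cpi, clock_ghz):
    return cpi / clock_ghz


# Hypothetical branchy code: 20% branches, 25% of them mispredicted.
short_pipe = effective_cpi(1.0, 0.20, 0.25, flush_cycles=5)
long_pipe = effective_cpi(1.0, 0.20, 0.25, flush_cycles=30)

# Assume the long pipeline buys a 3 GHz clock and the short one only 2 GHz.
print(ns_per_instruction(short_pipe, 2.0))
print(ns_per_instruction(long_pipe, 3.0))

# Same cores on a predictable loop: 1% of branches mispredicted.
print(ns_per_instruction(effective_cpi(1.0, 0.20, 0.01, 5), 2.0))
print(ns_per_instruction(effective_cpi(1.0, 0.20, 0.01, 30), 3.0))
```

Under these assumptions the long pipeline wins on the predictable loop and loses on the branchy code, despite the higher clock.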

Richard said:
Please be sure to make a distinction, in your own mind, if nowhere else, between a parallel execution unit and a sequential one.

The distinction: The 8051 12-clocker was a sequential one. That is what happens when you do not use a pipeline: no parallel execution, since you do not fetch new instructions to process until you are done with the previous one. Right now, you are focusing on a single tree instead of noticing the woods. A pipeline is a sequential machine for a single instruction, but it is a parallel machine gobbling down multiple instructions concurrently. It may be just staggered, but it may just as well be fully super-scalar, issuing more than one instruction every clock cycle.

The only way you can start multiple instructions concurrently without a pipeline is if you use a Very Long Instruction Word (VLIW) processor. But that is not a general processor: the program must be written and compiled for VLIW execution. A pipeline, on the other hand, allows a normal sequential program to be transformed for concurrent processing.

Richard said:
If you want to invent a new core, well, you're welcome to do that. I'd recommend adding a few BYTE operations to the big, wide-word core you're likely to produce.

I, on the other hand, will probably stick with the 8-bitters.

You still think this is a question of big or small. It isn't. It is a question of slow or fast. It is a question of how to fight latencies. How to overcome gate propagation delays. How to increase the total speed by keeping a larger percentage of the processor in use at any given time.

Whenever a new 8-bit family is released, that processor will be pipelined. Not because "pipeline" is a marketing word the manufacturer needs, but because the alternative is a processor that will have no chance. The AVR chips are pipelined. The PIC chips are pipelined. I'm pretty sure the faster 8051 chips are pipelined. But since the pipeline did bring a higher clock speed, it did not automagically bring any lousy interrupt response times.

Many times, with a design that has a reasonably short pipeline and no pipeline stalls, the developer will not even notice that the processor is pipelined. They will just see a well-working processor.
