???
05/17/09 10:35
#165438 - Many steps at the same time
Responding to: ???'s previous message
Richard said:
However, with a pipeline, there's a large penalty afterward, in that, once the pipeline is flushed, the pipeline is loaded with the interrupt code, then entered, and completed, the pipeline has to be restored, else the prior process can't continue.

How do you view the interrupt response time of the original 8051? It has a very long state machine, but no pipeline.

Why don't you ever consider the change in clock speed introduced together with the pipeline? Is it hard to see that a higher clock frequency reduces the cost of the pipeline in absolute terms?

Time to first instruction is only one factor of interrupt response times.
Is it hard to consider the other factor involved - how long until the hardware has been serviced? Ah yes - you still only live in the world where you compare with or without a pipeline at the same clock frequency, and without the non-pipelined processor needing extra clocks. You still think that as soon as both processors have reached that first instruction, they will continue to tick instructions at identical pace...

A more traditional view on interrupt response times is the time until the interrupt source has been serviced - not until you are ready to start servicing. How soon did you pick up the received byte from the UART, before it gets overwritten? How soon did you fill the outgoing pipe with the next data to send? How soon did you turn off the heater after an over-temp signal?

A processor with a short pipeline, where fetch/decode overlaps the last instruction, could run at twice the clock speed. It would need one extra clock cycle (of half the length) to reach the first instruction of the ISR. It may then continue through the ISR at twice the speed.

Richard said:
[...] the pipeline has to be restored [...]

I don't think I have seen any processor bothering to restore any pipeline. I would be interested if you have any examples. The closest I can remember seeing is save/restore of pipelines while single-stepping code. We are not talking about any 35-step pipelines in a tiny 8051 chip. We may possibly talk about 2-5 steps, and a return from an ISR is no different from the return from a function call or a branch.

When you step up the clock frequency, it is more important that you have a cache mechanism that can pre-store the start of the ISR in RAM (as Erik mentioned), since your 100MHz 8051 will not have 10ns random access time on the flash.

With a 5-stage pipe, and a preloaded start of the ISR, a pipelined 8051 could start the work on the ISR after 10-20ns, get the first instruction done after 60-70ns and then complete one instruction every 10ns after that. With additional code bandwidth, most two- or three-byte instructions could also be managed at that same 10ns/instruction speed. Possible exceptions would still be multiply/divide instructions or some specific instruction pairs where the pipeline can't swallow the data dependency between the instructions.

A 2-stage pipe would get the first instruction done after 30-40ns.

Skipping the pipe without changing process technology would most probably require the 100MHz dropping down to 50MHz, so you would get maybe 20-40ns until pick-up of the first instruction, 40-60ns until the first instruction is through and then one new instruction being finished every 20ns.
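
To make the comparison concrete, here is a minimal Python sketch that tabulates when the n-th ISR instruction would complete for the three cases above. The clock periods and "first instruction done" times are the lower bounds of the ranges quoted in this post; the linear formula (one instruction per clock after the first, no stalls) and all names are assumptions made for illustration only.

```python
# Lower-bound figures from the post: (clock period in ns,
# time in ns until the first ISR instruction has completed).
DESIGNS = {
    "5-stage pipe @ 100MHz": (10, 60),
    "2-stage pipe @ 100MHz": (10, 30),
    "no pipe @ 50MHz": (20, 40),
}


def nth_instruction_done_ns(period_ns, first_done_ns, n):
    """Time at which the n-th ISR instruction (n >= 1) has completed.

    Assumes one instruction completes per clock after the first,
    i.e. no stalls, multi-cycle instructions or flash wait states.
    """
    return first_done_ns + (n - 1) * period_ns


def completion_table(max_n):
    """Completion times for instructions 1..max_n for every design."""
    return {
        name: [nth_instruction_done_ns(period, first, n)
               for n in range(1, max_n + 1)]
        for name, (period, first) in DESIGNS.items()
    }


for name, times in completion_table(6).items():
    print(f"{name:24s}", times)
```

With these particular figures, the non-pipelined part completes the first two ISR instructions sooner than the 5-stage part, the two tie on the third (80ns), and the 5-stage part is ahead from the fourth instruction on. The 2-stage part is ahead from the first instruction.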

Would you care to explain your view of the interrupt response time improvements you would get without a pipeline, when taking possible clock frequencies into account?

Richard said:
No ... actually, it makes EVERYTHING in the machine take more machine cycles.

The original 8051 has 6 synchronous states for each byte of data. When you have a pipeline, you don't run one instruction at a time, so the processor produces many machine cycles in parallel.

The next thing is that a complex operation requires a slow machine cycle. A simple operation allows a fast machine cycle. Trying to squeeze everything into a single machine cycle (possible with asynchronous or ripple logic) means you must run at a low enough clock frequency. If all operations are simple, then you can scale up the clock frequency. If you then let other instructions sneak in at the currently unused parts of the processor, you suddenly get a very big boost in total performance.

Go home, and read a good book about Henry Ford.

Richard said:
That's why they pipeline ... but it implicitly means more steps per instruction, and it's also why pipelining is seldom of use with "one-clockers" unless they also use a cache, which also introduces the same hassles as pipelining, among others.

A cache and a pipeline are two separate things. A one-clocker with-or-without a pipeline would stall without a cache - unless you have the ultimate cache, where you copy everything to RAM, or you reduce the clock speed until the flash latencies can keep up.

Without a pipeline, you have to ripple through the processor. But you don't have an infinite ripple speed. You either ripple the data through without any synchronization, or you ripple with asynchronous handshake signals. But the processor needs to fetch, decode, retrieve data, perform the operation and store data. In the original 8051, each machine cycle had six internal, synchronous states.

A processor that can only ripple one instruction through the innards at a time will not be as fast as a processor that can have two, three or four instructions in a staggered sequence. The staggered sequence does not magically slow down the processor - it still has to fetch, decode, load, perform, store. There are even asynchronous pipelines, for situations where you just don't want to multiply the clock for the internal steps. But the maximum time for an instruction is still a function of the gate delay times in the chip.

It is quite likely that a closer investigation would show some (most?) 1-clockers to have a two-, three- or four-stage pipeline, where you do fetch/decode concurrently with the last clock cycle of the previous instruction, followed by load/operate/store on the two clock transitions of the following clock cycle.

Richard said:
Clock cycle: the time between corresponding edges of the external clock.

Machine cycle: the time between rising corresponding edges of the system clock.

Instruction cycle: The amount of time taken to execute, completely, the shortest of a processor's instructions.

Lovely. One of your definitions makes use of the term "system clock", without defining it.

And your description does not seem to be fully compatible with the 8051 chips that need multiple external clock cycles for one machine cycle.

From http://www.nxp.com/acrobat_d...ARCH_1.pdf
"A machine cycle consists of a sequence of 6 states, numbered S1
through S6. Each state time lasts for two oscillator periods. Thus a
machine cycle takes 12 oscillator periods or 1µs if the oscillator
frequency is 12MHz.
Each state is divided into a Phase 1 half and a Phase 2 half."
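
The arithmetic behind that quote can be checked with a few lines of Python. The constants come straight from the quoted text (6 states, 2 oscillator periods per state); the function name is just a label chosen for this sketch.

```python
STATES_PER_MACHINE_CYCLE = 6   # states per machine cycle, per the quote
OSC_PERIODS_PER_STATE = 2      # each state lasts two oscillator periods


def machine_cycle_us(osc_mhz):
    """Length of one classic 8051 machine cycle in microseconds."""
    osc_periods = STATES_PER_MACHINE_CYCLE * OSC_PERIODS_PER_STATE  # 12
    # An oscillator at osc_mhz MHz delivers osc_mhz periods per microsecond.
    return osc_periods / osc_mhz


print(machine_cycle_us(12))  # 12 periods at 12MHz -> 1.0 microsecond
```

So a 12-clocker at 12MHz manages at most one million machine cycles per second, which is the baseline the one-clocker claims are measured against.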

Richard said:
Ideally, in FPGA, one instruction cycle = one machine cycle = one propagation cycle, memory => ALU => memory, in the context of the architecture I mentioned before, wherein all ROM, RAM, SFR-space, and external memory are part of global memory. Clearly, that path is the rate-determining step. It requires no pipelining, and would benefit little from it.

An ALU that could do 100 million RAM -> ALU -> RAM operations per second could possibly do 200-300 million operations per second using the same process if you have a pipeline.
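
A rough Python sketch of where a figure like that can come from: without a pipeline the clock period has to cover the whole RAM -> ALU -> RAM path, while with a pipeline it only has to cover the slowest stage. The 3/4/3ns split of a 10ns path is a made-up example, and the model ignores pipeline register overhead, hazards and stalls.

```python
def throughput_mops(stage_delays_ns, pipelined):
    """Peak operations per second, in millions, for a given datapath.

    Unpipelined: one operation per full pass through all stages.
    Pipelined: one operation per clock, with the clock period set by
    the slowest stage (idealised: no register overhead, no stalls).
    """
    if pipelined:
        cycle_ns = max(stage_delays_ns)
    else:
        cycle_ns = sum(stage_delays_ns)
    return 1000.0 / cycle_ns


# Hypothetical split of a 10ns path: read operands, ALU, write back.
stages_ns = [3.0, 4.0, 3.0]

print(throughput_mops(stages_ns, pipelined=False))  # 100.0
print(throughput_mops(stages_ns, pipelined=True))   # 250.0
```

The more evenly the path can be split, the closer the gain gets to the number of stages. An uneven split like this one lands in the 2-3x range.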

Haven't you noticed the very low clock frequencies 8051 chips are running at?
