You argue quite much for not caring



???
05/16/09 12:06

#165410 - You argue quite much for not caring
Responding to: ???'s previous message

Richard said:

It doesn't matter, even if you have 1K ALU's, if you don't have a 2K-bit (128 bytes in either direction, since that's what branching allows) concurrent "view" into code memory. With 48 bits you can do nearly everything other than branch prediction

Without a pipeline, I could have a 100kbit code read capability, but to what use? If each instruction may start no earlier than when the previous instruction has finished, there is no overlap, and there is no use in knowing what the following instructions are.

With a pipeline, you can add improvements such as overlapping execution, speculative execution, duplication of slow execution units, etc.

I don't know why you think you would need to be able to read all the bytes between the current position and the longest conditional jump target. Assume a pipeline with enough bandwidth to read instructions n times faster than you process them, and a pipeline+cache that stores enough following instructions to cover the number of cycles needed for one code read (reading from flash is often not a 1-cycle operation when you step up the clock frequency). Then the instruction cache can already hold the following 2-5 instructions, while analysis of those upcoming instructions can identify branches that will require speculative instruction fetching, even before the processor knows whether the branch will be taken.

In the end, the big requirement is just that the memory bandwidth (very wide, or dual/quad-pumped) is high enough, in relation to the number of cycles needed per read, that the processor can retrieve instruction bytes 2, 3 or 4 times faster than it consumes them. It is a known fact that flash is slow, i.e. you can't easily reduce the cycle time. But it is just as well known that it is quite cheap to produce wider memory interfaces when accessing internal memory.
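As a rough illustration of that bandwidth requirement, here is a minimal Python sketch. The function name and all the numbers (40 ns flash access, 100MHz 1-clocker, 2 bytes per instruction on average) are assumptions picked for the example, not figures for any real part.

```python
def fetch_to_consume_ratio(bus_width_bytes, flash_access_ns, core_clock_mhz,
                           bytes_per_instruction, clocks_per_instruction=1):
    """Return how many times faster code bytes can be fetched than consumed.

    All parameters are hypothetical inputs; a ratio below 1.0 means the
    core would have to wait for the flash.
    """
    # Bytes delivered per nanosecond by one flash read of the given width.
    fetch_rate = bus_width_bytes / flash_access_ns
    # Time one instruction occupies the core, in nanoseconds.
    instruction_time_ns = clocks_per_instruction * 1000.0 / core_clock_mhz
    # Bytes the core consumes per nanosecond.
    consume_rate = bytes_per_instruction / instruction_time_ns
    return fetch_rate / consume_rate


if __name__ == "__main__":
    # Assumed example: 40 ns flash, 100MHz 1-clocker, 2 bytes/instruction.
    for width in (1, 4, 16, 32):
        ratio = fetch_to_consume_ratio(width, 40.0, 100.0, 2.0)
        print(f"{width:2d}-byte bus: fetch/consume ratio = {ratio:.3f}")
```

With these assumed numbers, a byte-wide flash delivers only a fraction of what the core consumes, while a 16-byte or 32-byte internal bus gives the 2x to 4x headroom mentioned above, without making the flash itself any faster.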

In the end, you don't have to worry about the jump distance for your conditional branches. Remember that the concept works on processors with longer jumps (conditional and unconditional) than the 8051 has.

What you have to worry about is the pipeline length and the ratio between code memory bandwidth and code consumption rate, in relation to code containing several branches directly after each other. A longer pipeline will quickly increase the complexity of having multiple concurrent branches to predict before you reach the point where the processor knows which branch alternative was taken. A zero-length pipeline would result in stalls while waiting for the flash to supply data from the branch destination, or require the processor to run at so low a speed that the flash can keep up, or require that all code is pre-copied to RAM, as is done in some processors.
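The stall on a taken branch can be shown with a toy model. This is only a sketch under simplifying assumptions: the prefetch buffer is unbounded, every instruction executes in one cycle once its bytes are available, the flash delivers fetch_width bytes every fetch_cycles cycles, and a taken branch throws away everything already prefetched. The function and parameter names are made up for the example, and no speculative fetch of the branch target is modelled.

```python
def run_cycles(instr_sizes, taken_branches, fetch_width, fetch_cycles):
    """Count the cycles needed to execute a list of instruction sizes (bytes).

    taken_branches is a set of indices into instr_sizes; executing one of
    those instructions flushes the prefetch buffer and restarts the fetch.
    """
    buffered = 0      # bytes currently available in the prefetch buffer
    fetch_timer = 0   # cycles spent so far on the flash read in flight
    cycles = 0
    i = 0
    while i < len(instr_sizes):
        cycles += 1
        fetch_timer += 1
        if fetch_timer == fetch_cycles:       # a flash read completes
            buffered += fetch_width
            fetch_timer = 0
        if buffered >= instr_sizes[i]:        # enough bytes: execute
            buffered -= instr_sizes[i]
            if i in taken_branches:           # prefetched bytes are useless
                buffered = 0
                fetch_timer = 0
            i += 1
    return cycles


if __name__ == "__main__":
    program = [2] * 8                         # eight 2-byte instructions
    print(run_cycles(program, set(), 16, 4))  # straight-line code
    print(run_cycles(program, {3}, 16, 4))    # instruction 3 is a taken branch
```

In this model the only cost of the taken branch is the wait for a fresh flash read from the destination. Speculatively fetching the target before the branch resolves is what would hide that wait, and it is exactly the part that gets more complicated as the pipeline grows longer.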

Richard said:

I don't know ... and, frankly I don't care. Do you either know or care? If one MCU executes code substantially faster than another, that's sufficient.

Maybe you don't care. But maybe you should not discuss pipelined methods of speeding up an 8051 as an alternative to a pipeline.

Richard said:

As a follow-up to your last post: Can you do out of order execution of multiple instructions without a pipeline to reorder the instructions?
Well, you could execute them concurrently with other instructions. For example ... if one instruction increments DPTR while the next left-shifts A, those potentially can happen at the same time, since there's no resource collision.

Are you surprised that processors which do out-of-order or concurrent execution use a pipeline, and inside this pipeline make the decision about how instructions can be swapped or combined? Doing it without a pipeline would require a quite advanced decision to be made in zero cycles - or require that your processor runs slowly enough that you can ripple through all your decisions. But instruction combining is something you do to get speed. And a pipeline is the #1 thing you look at to get speed.
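To make the "no resource collision" decision concrete, here is a minimal Python sketch of the check that something has to perform before two instructions can run at the same time. The read/write sets are simplified assumptions (PSW effects are mostly ignored and DPTR is treated as one register), and the Instr type and can_overlap name are hypothetical, not taken from any real core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset    # registers the instruction reads
    writes: frozenset   # registers the instruction writes


def can_overlap(first, second):
    """True if neither instruction touches state that the other one writes."""
    raw = first.writes & second.reads     # read-after-write hazard
    war = first.reads & second.writes     # write-after-read hazard
    waw = first.writes & second.writes    # write-after-write hazard
    return not (raw or war or waw)


# Simplified, assumed register usage for three 8051 instructions.
INC_DPTR = Instr("INC DPTR", frozenset({"DPTR"}), frozenset({"DPTR"}))
RL_A = Instr("RL A", frozenset({"A"}), frozenset({"A"}))
ADD_A_R0 = Instr("ADD A,R0", frozenset({"A", "R0"}), frozenset({"A", "PSW"}))

if __name__ == "__main__":
    print(can_overlap(INC_DPTR, RL_A))   # disjoint registers
    print(can_overlap(RL_A, ADD_A_R0))   # both use A
```

Even this trivial comparison needs both instructions decoded before it can run, which is the point: the decision has to happen somewhere, and a pipeline stage is the natural place for it.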

After all - you have several times talked about the ALU handling data twice for each instruction: once for the address and once for the data. But that means that the ALU has to be twice as fast. That can be rewritten: the processor must run at half the speed, to give the ALU enough time. With a pipeline and the individual jobs separated, you would stream the data through the processor, and the ALU would not have to reserve any time for addresses. This means that a specific process technology, with a specific gate propagation time, would allow a faster processor. In a streamed design, each block is normally simpler, so the hardware handling the address calculations doesn't need to support any multiply. For better processors, it will on the other hand support n-step shifts, allowing you to do array accesses of 2^n-sized entries with a[idx], a[2*idx], a[4*idx]. Small, specialized blocks can do a better job.
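The shift-based address calculation amounts to very little logic. A minimal sketch, with a hypothetical function name and made-up addresses:

```python
def effective_address(base, idx, shift):
    """Address of entry idx in a table of 2**shift-byte entries.

    Only an adder and a shifter are needed: base + (idx << shift).
    No multiplier is involved.
    """
    return base + (idx << shift)


if __name__ == "__main__":
    base = 0x1000                       # assumed table start address
    for shift in (0, 1, 2):             # 1-, 2- and 4-byte entries
        print(hex(effective_address(base, 3, shift)))
```

A dedicated address unit that can do this in parallel with the ALU is the kind of small, specialized block meant above.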

In the end, I do not believe that the DP8051 is odd for using pipelining. It is far more likely that most of them are using pipelining. The alternative is of course to take a 4-clock core, add an internal clock multiplier that runs the core at 100MHz from a 25MHz external clock, and call it a 25MHz 1-clocker. The little problem with this choice is the timing of external memory - people may be a bit surprised when the 100MHz internal speed shows through - unless the manufacturer says that external memory accesses will take extra time, and adds clock stretching.

Richard said:

Not quite. You simply set a bit in the corresponding position in the buffer and the instruction will be "gone" in the sense that it will be replaced by the subsequent portion of the stream. This isn't a pipeline because it isn't registered. A pipeline is, typically, a registered latency engine.

You are either talking about a pipeline but refusing to call it a pipeline, or you are talking about a standard cache. A cache has bits marking valid or free slots.

Richard said:

For example, if you have a logic block with several layers of logic more in some paths through the block than in others, you might want to register the stages to synchronize them. Since the depth of the logic is a rate-determining step, you might want to insert a pipeline register in the path, thereby turning the operation of that path into a two-clock path, while others remain one clock long, thereby making the block a two-clock block only when the long path is in use, producing overall performance improvement. I would hope to avoid that, however.

A pipeline is something you can use for many things. You seem to have focused on a single specific use for the pipeline. With the high latencies of flash memory, the ability to predict required code bytes should not be ignored. The ability to decode an instruction while saving the result of the previous instruction should not be ignored either. But there are many more things to a pipeline, so don't limit yourself to the pipeline descriptions you might have found in early microprocessor documentation.

A 12MHz 12-clocker that needs 12 extra clocks will introduce 1us extra interrupt latency.
A 100MHz 1-clocker that runs the instructions with a one-clock stagger would always add 10ns extra interrupt latency from that extra pipeline step.
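The two figures above are plain clock-period arithmetic. A small sketch (the function name is made up) that reproduces them:

```python
def extra_latency_ns(clock_mhz, extra_clocks):
    """Extra interrupt latency in nanoseconds for a number of added clocks."""
    clock_period_ns = 1000.0 / clock_mhz
    return extra_clocks * clock_period_ns


if __name__ == "__main__":
    # 12MHz 12-clocker that needs 12 extra clocks.
    print(extra_latency_ns(12.0, 12), "ns")
    # 100MHz 1-clocker with one extra clock from the pipeline stagger.
    print(extra_latency_ns(100.0, 1), "ns")
```

The extra pipeline step on the fast part costs a hundredth of what one extra machine cycle costs on the classic part.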

A pipeline is something you use to get brutal speeds out of a specific process, by adding opportunities for concurrency in all stages of the processor. You can produce a pipeline that has predictable execution times. The ability to increase the clock frequency can allow the interrupt latency to shrink despite extra cycles in the pipeline. Also, the interrupt latency is actually two factors. One is the time to the first ISR instruction. The other is the time until the interrupt is fully serviced - your ISR may have to look at flags, pick up data etc. In the end, you shouldn't bank too much on the "I would hope to avoid that". Especially since it isn't very unlikely that you are currently using a pipelined processor, and loving it.

Richard said:

Do you think I care whether they use pipelining?

It's absolutely impossible to know what you care about, since it seems to be absolutely impossible to guess what your responses will be to any comment.

Richard said:

ARM, MIPS, etc, architectures handle their ideal word size just fine. If I'm processing bytes, though, I don't want to give away byte performance just so I can claim I'm using a 32-bit processor.

Not sure what subject you are debating now. I thought we were debating latencies. An 8-bit processor does less work with each instruction than a 32-bitter, so latencies are even more important to keep an eye on.

Richard said:

So far, I haven't been impressed with what ARM and other 32-bit MCU's can do with bytes. When ARM's come out with upwards of 64TB of FLASH on-chip, along with, say 512 GB of on-chip RAM, and a performance of 50 exaflops per femtosecond, and a vast array of peripherals, all for $1 or less, I guess I'll have no choice but to give them a try, but, in the meantime, as I'm able to find adequate inexpensive 8-bitters, not all in the 805x-camp, I'll stick with them, thank you very much.

Wonderful view. You are probably the only one in this world with that view. I think you should be careful about publishing it - at what point do you think your customers will go to another supplier?

My customers don't spend too much time expecting me to be faithful to a specific processor architecture, or that the selected processor must allow the full addressable memory range to be used. They only focus on a well-working product at a reasonable price. That is the reason why every new project includes a quite broad analysis of the availability of processors. That is also the reason why our products sometimes do jump between architectures. An 8051 may become a PIC. A PIC may become an AVR. An AVR may become an ARM. A PPC may become...

