??? 05/15/09 15:56
#165384 - It is a matter of how you choose to view things
Responding to: ???'s previous message
Per Westermark said:
Richard Erlacher said:
It's still predictable, since computer programs can and will "know" exactly what the state of the pipeline is, in the event that the core is pipelined.

Never contested. I just noted that a pipeline has a startup time.

Richard said:
I disagree with the notion that the 805x ALU is small, as compared with the remainder of the logic.

Please stop extrapolating. What I wrote was "The ALU etc of the 8051 are so tiny that it is very easy for the pipeline to compute both sides of a conditional branch, and throw away the wrong alternative." I don't know where you thought you saw "compared with the remainder of the logic". The ALU of an 8051 is small if you compare it with the ALU of more recent processors. If bigger processors can manage to compute both sides of a conditional branch, then it must obviously be possible for an 8051 chip too. The reason? To run the pipeline at a fixed speed without worrying about a pipe stall if the chip mispredicts the branch decision. It is also quite common to separate address calculations from the general ALU, so having two ALUs does not mean having two identical copies of what the original 8051 had. The user is interested in the behaviour, not in which building block a specific transistor is located.

I didn't intend to imply that a 1500-gate ALU, such as I suggested the 805x core might use, should be compared with the 100K-gate ALUs common in modern processors. It's not the ALU's job to control the sequence of operations in the core. It certainly doesn't control the pipeline or branch prediction. My view is that data or address information flows through the ALU twice during each machine cycle on its way from and to source and destination. The information does so whether the ALU alters it or not. It can be incremented, decremented, ANDed, ORed, shifted, rotated, added, etc. If one uses a Wallace-tree or other fast multiplier, it can multiply in a single cycle, too. The classic ALU, which apparently used Booth's algorithm to perform the multiplication, would have to take longer.

Richard said:
Yes, you're right ... the logic depth, which can be fairly well equalized by using short, wide paths rather than long, narrow ones, will provide the rate-determining step. However, if a 3-byte instruction, e.g. MOV DPTR,#HHHH, takes just as long as a single-byte instruction, MOV A,R0, or a two-byte instruction, MOV A,VNAME, things will go quite a bit faster even though the individual cycles are longer.

Correct - it is always good to have a memory interface that can load the full instruction in one read. But your example requires only 24 bits, while your previous post talked about 48 bits. Without a pipeline, what would you do with the information about the following instruction?

The 48-bit "view" of program memory allows the execution of two 3-byte instructions at a time, e.g. MOV DPTR,#HHHH and LJMP AAAA (which is really a "load PC"). It also allows selective out-of-order or concurrent execution of three 2-byte instructions, or of up to six single-byte instructions, selectively, of course. Instructions would be removed from the instruction stream as they're executed, and replaced by subsequent code-space content. Not all of code space is instructions, so this wouldn't always help. Much of the time, however, it would yield additional performance without increasing the speed requirement, hence reducing the speed-power product.

Richard said:
I've not built an 805x core ... yet ... though I've done considerable preliminary work on it. I've built other cores, and have found that one can build nearly any MCU core with a simple two-phase clock, e.g. the sort which was used on the 6801 or 6502, etc. [...]

We are not talking about the use of a two-phase clock here. We are talking about managing the instruction with just two clock transitions.

A single input clock can easily be manipulated into a two-phase clock, producing a non-overlapping clock pair with a ~40% duty cycle on each phase of the input clock. That provides a convenient scheme for clocking the separate address and data arithmetic operations.

I was talking about one-clockers managing with just two phase changes, without pipelining and without internal clock doublers.
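As an aside, the non-overlapping two-phase scheme described above can be sketched in a few lines of Python. This is only an illustrative model (the 10-tick period and the exact phase windows are my own choices, not from either poster); it shows the key property being argued about: each phase is high for roughly 40% of the input-clock period, with dead time around every transition so the two phases are never active at once.

```python
# Toy model of deriving two non-overlapping clock phases from one
# input clock. Each phase is high ~40% of the period; the gaps
# around the transitions are the dead bands. Tick counts are
# illustrative only.

def two_phase(period=10, ticks=100):
    """Yield (phi1, phi2) samples, one per tick of the input clock."""
    for t in range(ticks):
        p = t % period
        phi1 = 1 if 0 <= p < 4 else 0   # high during the first 40%
        phi2 = 1 if 5 <= p < 9 else 0   # high during 40% of the second half
        yield phi1, phi2

samples = list(two_phase())
# The phases never overlap, so latches clocked by phi1 and phi2
# can never both be transparent at the same instant.
assert all(not (p1 and p2) for p1, p2 in samples)
```

In a core built this way, the address-side work would happen while phi1 is high and the data-side work while phi2 is high; the dead bands between the phases are what keep a latch-based design race-free.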
I know that the 6800 does not do 1 MIPS at 1 MHz. I haven't looked at the 6502, even though I know that it uses both phases of the clock and has asynchronous logic. Is it your claim that the 6502 does 1 MIPS for a 1 MHz input clock?

Well, the 6502 had a one-clock-deep pipeline, for which it paid the price in interrupt response and branching. I don't know so much about the Motorola CPU. That was not my point, though. I simply meant that the two CPUs both used a two-phase clock generated from a single input clock of the same frequency, with two non-overlapping phases during which latched operations occurred. I say latched rather than registered because they used latches on account of their relative size (2 gates rather than ~6 for a DFF).

But were these two your proof of two-phase one-clockers?
I implied no proof of any sort. The guy who pays for the CPU is concerned with the cost ... silicon by the pound, if you please ... and while the end-user doesn't care whether it's 5,000 gates or 5 million (unless he has to pay for them), it is nonetheless a significant factor. Single-cycle branch prediction requires a much deeper view into program memory than concurrent or out-of-order execution of instructions. That's why I've chosen to ignore it for now.

I didn't intend to frame this part of the discussion as an argument. I have my own views about how an MCU core, even an 805x-type MCU core, should be architected, and from what I've seen, most people don't follow that model. I prefer a short, wide path from source to destination, with a short, wide ALU section that does most of the heavy lifting while data simply flows through it. The ALU is large for a small core, as is the steering logic, but paths through the steering logic can be set in advance, so its depth isn't as critical. Since the single clock has two phases, those phases can be separated, with an edge-detector between them to activate any gates or gated/clocked latches/registers. Experience has taught me that this approach can be fruitful with single-clock-cycle cores. Pipelining doesn't help those types of cores ... much ... though one stage can be useful. With deep logic, and with multi-clock-per-cycle architectures, pipelining can improve performance significantly: while it increases latency, it typically reduces the propagation delay per stage, which, at least in theory, improves throughput.

RE