
05/15/09 15:56


 
#165384 - It is a matter of how you choose to view things
Responding to: ???'s previous message
Per Westermark said:
Richard Erlacher said:
It's still predictable, since computer programs can and will "know" exactly what the state of the pipeline is, in the event that the core is pipelined.

Never contested. Just noted that a pipeline has a startup time.

Richard said:
I disagree with the notion that the 805x ALU is small, as compared with the remainder of the logic.

Please stop extrapolating. I did write "The ALU etc of the 8051 are so tiny that it is very easy for the pipeline to compute both sides of a conditional branch, and throw away the wrong alternative."

Don't know where you thought you saw "compared with the remainder of the logic". The ALU of an 8051 is small if you compare it with the ALU of more recent processors. If bigger processors can manage to compute both sides of a conditional branch, then it must obviously be possible for an 8051 chip too. The reason? To run a pipeline at fixed speed without worrying about a pipeline stall if the chip mispredicts the branch decision.

It is also quite common to separate address calculations from the general ALU, so having two ALUs does not mean having two identical copies of what the original 8051 had. The user is interested in the behaviour, not in which building block a specific transistor is located.
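
As an illustration of "compute both sides of a conditional branch, and throw away the wrong alternative", here is a minimal Python sketch. It is a toy model only, not a description of any real 8051 core, and the names `eager_branch`, `taken_path` and `fallthrough_path` are made up for the example.

```python
def eager_branch(condition, taken_path, fallthrough_path, state):
    """Toy model: evaluate both branch alternatives, commit only one.

    Both paths are computed from the same input state before the
    condition is consulted, so the amount of work does not depend on
    the branch outcome. All names here are hypothetical.
    """
    taken_result = taken_path(state)              # result if the branch is taken
    fallthrough_result = fallthrough_path(state)  # result if it is not taken
    # The condition only selects which result is kept.
    return taken_result if condition(state) else fallthrough_result


def acc_is_zero(acc):
    """JZ-like condition on an accumulator value."""
    return acc == 0


# Accumulator is 0, so the "taken" alternative is the one committed.
result = eager_branch(acc_is_zero, lambda acc: acc + 10, lambda acc: acc - 1, 0)
```

The cost is that both alternatives are always computed, which is the price paid for never stalling on a wrong guess.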

I didn't intend to imply that a 1500-gate ALU, such as I suggested the 805x core might use, should be compared with the 100K-gate ALU common in modern processors. It's not the ALU's job to control the sequence of operations in the core. It certainly doesn't control the pipeline or branch prediction. My view is that data or address information flows through the ALU twice during each machine cycle on its way from and to source and destination. That information does that whether it is altered by the ALU or not. It can be incremented, decremented, ANDed, ORed, shifted, rotated, added, etc. If one uses a Wallace-tree or other multiplier, it can do that in a cycle, too. The classic ALU, which apparently used Booth's algorithm to perform the multiplication, would have to take longer.
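
To make the "would have to take longer" point concrete, here is a rough Python sketch of an iterative multiplier that handles one multiplier bit per step. It uses plain unsigned shift-and-add as a stand-in; it is not Booth's algorithm and makes no claim about how the original 8051 multiplier was built. The function name and the step counter are assumptions for illustration.

```python
def shift_add_multiply(a, b, width=8):
    """Unsigned multiply that examines one multiplier bit per step.

    Returns (product, steps). A combinational multiplier (for example
    a Wallace tree) would produce the same product in one pass; this
    iterative form needs `width` steps.
    """
    product = 0
    steps = 0
    for bit in range(width):
        if (b >> bit) & 1:
            product += a << bit  # add the shifted multiplicand for this bit
        steps += 1
    return product, steps


product, steps = shift_add_multiply(200, 150)
```

For 8-bit operands the loop always runs eight times, whatever the operand values.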



Richard said:
Yes, you're right ... the logic depth, which can be fairly well equalized, using short, wide paths rather than narrow long ones, will provide the rate-determining step. However, if a 3-byte instruction, e.g. MOV A,#HHHH takes just as long as a single-byte instruction, MOV B,A, or a two-byte instruction, MOV A,VNAME, things will go quite a bit faster even though the individual cycles are longer.

Correct - it is always good to have a memory interface that can load the full instruction in one read. But your example requires 24 bits. Your previous post talked about 48 bits. Without a pipeline - what would you do with the information about the following instruction?

The 48-bit "view" of program memory allows the execution of two 3-byte instructions, e.g. MOV DPTR, #HHHH and LJMP AAAA (which is really a "load PC"), at a time. It also allows selective out-of-order OR concurrent execution of three 2-byte instructions, or up to six single-byte instructions, selectively, of course. Instructions would be removed from the instruction stream as they're executed and replaced by subsequent code-space content. Not all of code space is instructions, so this wouldn't always help. Much of the time, however, it would yield additional performance without increasing the requirement for speed, hence reducing the speed-power product.
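
A small Python sketch of the 48-bit window idea: given six bytes of code space, count how many complete instructions are visible at once. The opcode table is deliberately partial (only a handful of standard 8051 opcodes), and `split_window` is a hypothetical helper; it shows only what fits in the window, not how the instructions would be issued concurrently or out of order.

```python
# Partial table of 8051 opcode lengths in bytes (illustration only).
OPCODE_LENGTH = {
    0x00: 1,  # NOP
    0x04: 1,  # INC A
    0x74: 2,  # MOV A,#data
    0xE5: 2,  # MOV A,direct
    0x02: 3,  # LJMP addr16
    0x90: 3,  # MOV DPTR,#data16
}

WINDOW_BYTES = 6  # 48-bit view of program memory


def split_window(code, start=0):
    """Return the complete instructions that fit in one 6-byte window.

    Raises KeyError for opcodes missing from the partial table above.
    """
    limit = min(start + WINDOW_BYTES, len(code))
    instructions = []
    pos = start
    while pos < limit:
        length = OPCODE_LENGTH[code[pos]]
        if pos + length > limit:
            break  # instruction straddles the window boundary
        instructions.append(bytes(code[pos:pos + length]))
        pos += length
    return instructions


# MOV DPTR,#1234h followed by LJMP 5678h: two 3-byte instructions.
two_long = split_window(bytes([0x90, 0x12, 0x34, 0x02, 0x56, 0x78]))
# Six NOPs: six single-byte instructions.
six_short = split_window(bytes([0x00] * 6))
```

A real front end would also have to tell instructions apart from data tables in code space, which this sketch ignores.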

Richard said:
I've not built an 805x core ... yet ... though I've done considerable preliminary work on it. I've built other cores, and have found that one can build nearly any MCU core with a simple two-phase clock, e.g. the sort which was used on 6801 or 6502, etc, [...]

We are not talking about the use of a two-phase clock here. We are talking about managing the instruction with just two clock transitions.

A single input clock can easily be manipulated into a two-phase clock thereby producing a non-overlapping clock pair with a ~40% duty cycle on each phase of the input clock. That provides a convenient system for clocking the separate address and data arithmetic operations.
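
The two-phase scheme can be sketched as a sampled waveform in Python. This is a behavioural model under assumed numbers: each input clock period is divided into 10 time steps, and each phase is high for 4 of them (roughly the 40% mentioned above), with a one-step gap after each phase. The function name and parameters are hypothetical.

```python
def two_phase_clock(steps_per_period=10, high_steps=4, periods=2):
    """Return (phi1, phi2) as lists of 0/1 samples.

    phi1 is high at the start of each input clock period, phi2 at the
    start of the second half. Keeping high_steps at or below half the
    period guarantees that the phases never overlap.
    """
    half = steps_per_period // 2
    if high_steps > half:
        raise ValueError("phases would overlap")
    phi1, phi2 = [], []
    for t in range(steps_per_period * periods):
        slot = t % steps_per_period
        phi1.append(1 if slot < high_steps else 0)
        phi2.append(1 if half <= slot < half + high_steps else 0)
    return phi1, phi2


phi1, phi2 = two_phase_clock()
```

With the defaults, the first period of `phi1` is `1111000000` and the first period of `phi2` is `0000011110`.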

I was talking about one-clockers managing with just two phase changes without pipelining and without internal clock doublers.

I know that the 6800 does not do 1MIPS at 1MHz. I haven't looked at the 6502, even if I know that it is using both phases of the clock and has asynchronous logic. Is it your claim the 6502 does 1 MIPS for 1MHz input clock?

Well, the 6502 had a one-clock-deep pipeline, for which it paid the price in interrupt response and branching. I don't know so much about the Motorola CPU. That was not my point, though. I simply meant that the two CPUs both used a two-phase clock generated from a single input clock of the same frequency, with two non-overlapping phases during which latched operations occurred. I say latched rather than registered because they used latches for their smaller size (2 gates rather than ~6 for a DFF).

But were these two your proof of two-phase one-clockers?


I implied no proof of any sort.

The guy who pays for the CPU is concerned with the cost ... silicon by the pound, if you please, and while the end-user doesn't care whether it's 5000 gates or 5 million (unless he has to pay for them) it is nonetheless a significant factor.

Single-cycle branch prediction requires a much deeper view into program memory than concurrent or out-of-order execution of instructions. That's why I've chosen to ignore it for now.

I didn't intend to frame this part of the discussion as an argument. I have my own views about how an MCU core, even an 805x-type MCU core, should be architected. From what I've seen, most people don't follow that model. I prefer a short, wide path from source to destination, with a short, wide ALU section that does most of the heavy lifting while data simply flows through it. The ALU is large for a small core. The steering logic is large, for a small core, but paths through it can be set in advance, so its depth isn't as critical. Since the single clock has two phases, those phases can be separated, with an edge-detector between them to activate any gates or gated/clocked latches/registers.

Experience has taught me that this approach can be fruitful with single-clock-cycle cores. Pipelining doesn't help those types of cores ... much ... though one stage can be useful. With deep logic, and with multi-clock-per-cycle architectures, pipelining can improve performance significantly, since, while it increases latency, it typically reduces propagation delays per stage, which, at least in theory, improves performance.
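
The latency-versus-stage-delay trade-off can be put into numbers with a simple idealized model. The figures below (40 ns of logic, 2 ns of register overhead per stage) are invented for illustration, and the model assumes the logic divides evenly between stages with no stalls.

```python
def pipeline_timing(logic_delay_ns, stages, register_overhead_ns):
    """Idealized pipeline model; returns (cycle_ns, latency_ns).

    cycle_ns   : delay of one stage, i.e. the minimum clock period
    latency_ns : time for one instruction to pass through all stages
    """
    cycle_ns = logic_delay_ns / stages + register_overhead_ns
    latency_ns = cycle_ns * stages
    return cycle_ns, latency_ns


unpipelined = pipeline_timing(40, 1, 2)  # one long stage
four_stage = pipeline_timing(40, 4, 2)   # four shorter stages
```

In this model the unpipelined core has a 42 ns cycle and 42 ns latency, while the four-stage version has a 12 ns cycle and 48 ns latency: a shorter cycle, but a longer path for any single instruction.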

RE

