??? 05/15/09 22:17
#165398 - Q still open: any 8051 with only two clock transitions?
Responding to: ???'s previous message
Richard said:
"I didn't intend to imply that a 1500-gate ALU, such as I suggested the 805x core might use, should be compared with the 100K-gate ALU common in modern processors. [...]"

The ALU is tiny. It would be possible to make an 8051 variant with two, three, or maybe five ALUs without burning a significantly larger number of transistors. So what problem did you have with my comment: "The ALU etc. of the 8051 are so tiny that it is very easy for the pipeline to compute both sides of a conditional branch, and throw away the wrong alternative."?

Richard said:
It also allows selective out-of-order OR concurrent execution of 3 2-byte instructions, or up to 6 single-byte instructions, selectively, of course. Loading (at least) 48 bits at a time could allow the execution of two instructions at a time.

But what was the point you wanted to make? Have you forgotten what we (or at least I) were discussing?

Per said:
The question to ask is if any 8051 1-clocker can exist without either pipelining or an internal clock doubler.

I'm not too happy with the "can exist" in the middle of that sentence - there is, for example, a clockless ARM core, so obviously it must be possible to make a 1-clocker without a pipeline or an internal clock doubler. But are there any?

As a follow-up to your last post: can you do out-of-order execution of multiple instructions without a pipeline in which to reorder the instructions?

Richard said:
Instructions would be removed from the instruction stream as they're executed, and replaced by subsequent code-space content.

That does sound like a pipeline to me. So what are you trying to say - that you can have one-clocker chips without a pipeline just as long as they have a pipeline? If you are trying to correct me, then correct me based on something I have actually written. If you are trying to agree with me, please write the text so that it sounds like you are agreeing, not arguing against me. Right now, I don't know what your aim is.

Richard said:
A single input clock can easily be manipulated into a two-phase clock, thereby producing a non-overlapping clock pair with a ~40% duty cycle on each phase of the input clock. That provides a convenient system for clocking the separate address and data arithmetic operations.

Of course it can. Why do you think I wrote: "Using a two-phase clock, you would still have quite interesting times to get data from the code space, decode, retrieve input data, compute and store back the result within one low-to-high and one high-to-low clock transition."?

Richard said:
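The non-overlapping two-phase clock is easy to illustrate with a logic-level simulation. The sketch below is mine, not from any real 8051 design: the `two_phase` helper and the gate-delay count are invented, and the feedback rule is just the usual cross-coupled-gate idea that a phase may rise only after the opposite phase has been low for a few gate delays.

```python
def two_phase(clk_samples, delay=2):
    """Derive two non-overlapping phases from a single input clock.

    phi1 tracks the clock's high half, phi2 its low half, but each
    phase may rise only after the opposite phase has been low for
    `delay` samples -- a toy model of the cross-coupled gate feedback
    that guarantees a dead time between the phases.
    """
    phi1, phi2 = [0], [0]
    for clk in clk_samples:
        # Look at the opposite phase's recent history (the feedback path).
        prev1 = phi1[-delay:] if len(phi1) >= delay else [1]
        prev2 = phi2[-delay:] if len(phi2) >= delay else [1]
        phi1.append(1 if clk and not any(prev2) else 0)
        phi2.append(1 if not clk and not any(prev1) else 0)
    return phi1[1:], phi2[1:]

# A 40-sample run of a clock with a 10-sample period:
clk = ([1] * 5 + [0] * 5) * 4
p1, p2 = two_phase(clk)
assert all(not (a and b) for a, b in zip(p1, p2))  # phases never overlap
```

Note that the delay is what creates the dead time at each clock edge: after the clock falls at sample 5, phi2 stays low for `delay` more samples before rising, which is exactly the non-overlap guarantee.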
I implied no proof of any sort.

But you wanted to imply something. The existence of dual-phase clocks? You don't need to spend much time on that, since it should be considered common knowledge. The question here is: are the existing 8051 one-clockers able to do what they do with just a two-phase clock and no pipeline? This is a follow-up to Kai's comment: "Well, the DP8051 has such a hidden feature, he is using piplining:".

Richard said:
Single-cycle branch prediction requires a much deeper view into program memory than concurrent or out-of-order execution of instructions. That's why I've chosen to ignore it for now.

If you have double execution units, then you don't need to do any branch prediction. You process all instructions with one clock of "lag", and you compute both sides of the branch. As soon as you know the outcome of the branch, you throw away one alternative and let the other alternative "take", in the same clock cycle as you are busy with the next instruction of the branch half you did take.

That is very hard to do with external flash memory, given the tiny bandwidth of the external memory interface. It is a lot easier with internal code space, where you may just as well have a 128-bit interface from the flash, and maybe 4 or 8 cache lines. Snooping the pipeline would allow potential branches to be loaded. There are a large number of ways an 8051 can be sped up a lot, if someone is interested and the code is only internal, or the processor has a very wide external interface. The interest in doing it will depend on the market shares taken by $1 ARM chips.

Out-of-order execution is good for a PC processor, where you have very expensive computational blocks that you want to run at the best possible utilization. An FPU block is huge. A 64x64->128-bit integer multiplier isn't so small either. But would out-of-order execution be something you want in an 8051-class microcontroller? If you can see at compile time that two instructions should be swapped, then the programmer who needs to count clock cycles doesn't need the swap feature - he can swap the instructions in the source. If it can't be seen at compile time, then the programmer will not be able to predict the actual clock count.
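The "compute both sides, throw away the wrong one" scheme can be sketched abstractly. This is a toy model of the idea only - the register-file representation, the `run_branch_both_sides` helper, and the instruction tuples are all invented for illustration, not taken from any shipping 8051 core:

```python
import copy
from operator import add, sub

def run_branch_both_sides(regs, cond_reg, taken_arm, fallthrough_arm):
    """Speculatively execute both arms of a conditional branch on
    private copies of the register file, then commit whichever copy
    matches the resolved condition and discard the other.

    No prediction state and no mispredict flush -- just a second
    (tiny, per the argument above) execution unit.
    """
    unit_a = copy.deepcopy(regs)  # runs the taken arm
    unit_b = copy.deepcopy(regs)  # runs the fallthrough arm
    for op, dst, src in taken_arm:
        # src may be a register name or an immediate value.
        unit_a[dst] = op(unit_a.get(dst, 0), unit_a.get(src, src))
    for op, dst, src in fallthrough_arm:
        unit_b[dst] = op(unit_b.get(dst, 0), unit_b.get(src, src))
    # Condition resolves: keep one result, throw the other away.
    return unit_a if regs[cond_reg] != 0 else unit_b

regs = {"ACC": 5, "R0": 1}
result = run_branch_both_sides(
    regs, "R0",
    taken_arm=[(add, "ACC", 10)],        # e.g. the JNZ target: ACC += 10
    fallthrough_arm=[(sub, "ACC", 1)],   # fallthrough path: ACC -= 1
)
# R0 != 0, so the taken arm's result survives: result["ACC"] == 15
```

The point of the model is that both arms run to completion regardless of the condition; resolution is just a selection at the end, which is cheap precisely because the 8051's execution resources are so small.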
You may also find that the pipeline complexity needed to analyze all combinations of instructions, to figure out if they can be reordered or run concurrently, may be quite significant compared to a "stupid" processor that just runs down both sides of a conditional branch until the condition is known. The brute-force alternative would actually be simpler.

Another thing: a superscalar 8051 is most definitely possible to build. Compared to a superscalar current-generation x86 processor, it would even be trivial (and quite small). But a bit-set and a bit-clear (of the same bit) after each other would not combine well. And adding a single byte somewhere in the instruction stream would ripple directly through the program until you reach a point where instructions may not be run concurrently. I would think a pipelined 8051 capable of 500 MHz would be more useful than a superscalar 8051 where you would need the assembler to output a special listing showing how instructions have been combined, just so you can figure out if that single extra instruction will break already-verified timing blocks.

But I did not start this to discuss potential improvements in future 8051 chips - there are too many possible improvements, and it will be the economy and the competitors that decide what will happen. I was just wondering if any of the one-clockers are non-pipelined and use just two clock transitions. A microcontroller is not suitable for the huge pipelines in today's PC processors. But two clock transitions is very little. Just going to three or four makes a huge difference. But going to three or four clock transitions in a one-clocker would require a pipeline where you may run overlapping instructions, or a clock doubler. Has anyone managed to stay with just two clock transitions?
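The pairing fragility described above - a SETB/CLR of the same bit refusing to combine, and one inserted instruction re-rippling every later pairing - can be shown with a toy issue rule. Everything here (the instruction encoding, `can_pair`, the greedy scheduler) is invented for illustration, not a description of any real superscalar design:

```python
def can_pair(i1, i2):
    """Two instructions may issue in the same cycle only if the second
    does not read or write anything the first writes."""
    return not (i1["writes"] & (i2["reads"] | i2["writes"]))

def schedule(program):
    """Greedy pairing: walk the stream, pairing adjacent instructions
    when the rule allows. Returns the issue groups, which makes it
    easy to see how inserting one instruction shifts every later pairing."""
    groups, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            groups.append((i, i + 1))
            i += 2
        else:
            groups.append((i,))
            i += 1
    return groups

setb = {"op": "SETB P1.0", "writes": {"P1.0"}, "reads": set()}
clr  = {"op": "CLR P1.0",  "writes": {"P1.0"}, "reads": set()}
inc  = {"op": "INC R7",    "writes": {"R7"},   "reads": {"R7"}}

# SETB P1.0 / CLR P1.0 touch the same bit, so they issue serially;
# SETB P1.0 / INC R7 are independent and pair up.
assert schedule([setb, clr]) == [(0,), (1,)]
assert schedule([setb, inc]) == [(0, 1)]
```

Prepending one independent instruction illustrates the ripple: `schedule([inc, setb, clr])` pairs `inc` with `setb` and leaves `clr` alone, so every group boundary after the insertion point moves - which is exactly why hand-verified cycle counts break.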