

???
06/22/10 17:31
 
#176837 - Most concepts already exist in the wild
Responding to: ???'s previous message
You are now describing the pipelined and superscalar implementation of modern high-end processors.

The processors in PCs, game consoles or more expensive mobile phones can execute multiple instructions per clock cycle even if they have only a single core. The Pentium was the first superscalar x86 processor. It could issue up to two instructions in the same clock cycle through its two integer pipelines (U and V).

By increasing the length of the pipeline and adding scheduling logic, the processor can detect dependencies between instructions and then reorder the independent ones so that multiple instructions can be started at the same time.
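
To see why the dependencies matter, here is a minimal sketch in C (the function names are made up for this post). In the first loop every add needs the result of the previous add, so there is nothing for a superscalar core to overlap. In the second loop the two accumulators form independent chains that the hardware is free to start in the same clock cycle. Whether you actually see a speedup depends on the processor and on what the compiler does with the loops.

```c
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add needs the result of the previous add. */
uint32_t sum_one_chain(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two independent chains: s0 and s1 never depend on each other inside
   the loop, so a superscalar core may issue both adds together. */
uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)              /* odd element left over */
        s0 += v[i];
    return s0 + s1;         /* chains are only joined at the very end */
}
```

Both functions return the same sum. The only difference is how much freedom the instruction stream gives the core.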

The next step up from this is the hyperthreading that was introduced in some of the faster P4 processors. A single processor core has two sets of registers (including program counters), allowing the processor to instantly allocate the computation units (ALU, multiplier, ...) to the second program when instructions from the first program stall.

The increase in pipeline length needed for instruction reordering and efficient superscalar performance increases the cost of a stall - for example when the memory can't keep up and supply the data in time because the data wasn't already in the cache. Hyperthreading reduces the loss from such stalls, since both programs have to stall at the same time before the core needs to idle.
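
As a back-of-the-envelope model of that last point (my own simplification, not a measurement): assume each hardware thread is stalled with the same probability in any given cycle, and that the stalls are independent of each other. The core then only idles when all threads are stalled at once. The function name and parameters below are hypothetical.

```c
/* Fraction of cycles the core sits idle, ASSUMING every hardware thread
   is stalled with probability p_stall, independently of the others.
   Real stalls are correlated (shared caches, shared memory bus), so
   treat the result as an optimistic estimate. */
double idle_fraction(double p_stall, int hw_threads)
{
    double idle = 1.0;
    for (int i = 0; i < hw_threads; i++)
        idle *= p_stall;    /* all threads must be stalled together */
    return idle;
}
```

With a stall probability of 0.25, one thread leaves the core idle a quarter of the time, while two threads under the same assumptions leave it idle only one cycle in sixteen.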

A new i7 processor can have four cores - each with hyperthreading and each with superscalar performance where multiple instructions are issued each clock cycle.

And besides these things, most newer processors also have SIMD instructions for multimedia or signal processing.

But (yes, there is always a but) the increases in pipeline lengths to get the superscalar performance, and the use of data and code caches, mean that the timing becomes less and less predictable. A traditional microcontroller is fully deterministic. If interrupts are not disabled, then you can compute the number of clock cycles from an event until the ISR has been activated.

With a high-end processor, you may have three levels of cache that need to be filled to get the initial instruction. If accessing virtual memory, you may get exceptions where you basically reach another level of ISR to map in/out the required memory. This can happen for the instruction bytes and/or for the data bytes. Depending on what other instructions the processor has been busy with previously, it will take an unknown number of cycles from when an instruction enters the pipeline until the required execution unit (such as an ALU) is ready to process the instruction. A processor with a 17-step pipeline will obviously take far longer from fetching an instruction until executing it than a processor with a 2-step pipeline.
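
To put some numbers on the cache part, here is a rough sketch in C. The cycle counts are hypothetical placeholders, not figures for any real chip, and the model simply assumes that each level is tried in turn and that its cost is added to the total. The point is the spread between the best and the worst case, not the absolute values.

```c
#include <stddef.h>

/* HYPOTHETICAL access costs in clock cycles; real figures depend on the chip. */
struct mem_level {
    const char *name;
    unsigned    cycles;
};

static const struct mem_level levels[] = {
    { "L1",   4   },
    { "L2",   12  },
    { "L3",   40  },
    { "DRAM", 200 },
};

#define NUM_LEVELS (sizeof levels / sizeof levels[0])

/* Cycles for a fetch that is first satisfied at index hit_level.
   Assumes every level before it is tried, and missed, first. */
unsigned fetch_cycles(size_t hit_level)
{
    unsigned total = 0;
    for (size_t i = 0; i <= hit_level && i < NUM_LEVELS; i++)
        total += levels[i].cycles;
    return total;
}

/* Spread between the best case (L1 hit) and the worst case
   (miss in every cache, data comes from DRAM). */
unsigned fetch_jitter_cycles(void)
{
    return fetch_cycles(NUM_LEVELS - 1) - fetch_cycles(0);
}
```

With these made-up numbers, one and the same instruction costs anywhere between 4 and 256 cycles to fetch, and that is before any page fault or pipeline effects are counted.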

This is a big reason why most microcontrollers are still running at very sedate speeds even if they are using 0.13 µm process technology. The lower-end ARM chips are very nice. The high-end ARM chips have walked a long way towards current PC-class technologies, which makes them excellent for running Linux systems, but not as good at hard real-time work with µs or sub-µs requirements.

Another concept that does exist is VLIW - very long instruction word. You make the processor load multiple concurrent instructions with each read, and process these instructions in lock-step. Running the instructions in lock-step is what differs from a traditional superscalar processor, where the instructions are completely independent and the processor itself takes care of any reordering to get max speed out of the core. With VLIW, the compiler must figure out which instructions may be processed at the same time and merge these sub-instructions into a complete instruction word.
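
A toy version of that compiler job could look like the sketch below. Everything in it is an assumption made for illustration: the three-slot instruction word, the `struct op` layout and the greedy in-order strategy. A real VLIW compiler also has to consider which execution unit each sub-instruction needs, and it will reorder instructions instead of just cutting the stream into words.

```c
#include <stddef.h>

#define BUNDLE_WIDTH 3      /* hypothetical: three sub-instructions per word */

struct op {
    int dst;                /* register written */
    int src1;               /* registers read */
    int src2;
};

/* True if b must not share an instruction word with the earlier op a.
   This is conservative: it also rejects b overwriting one of a's inputs. */
static int conflicts(const struct op *a, const struct op *b)
{
    return b->src1 == a->dst || b->src2 == a->dst    /* b reads what a writes */
        || b->dst == a->dst                          /* both write the same register */
        || b->dst == a->src1 || b->dst == a->src2;   /* b overwrites a's input */
}

/* Greedy in-order packing: keep adding ops to the current word until it
   is full or the next op conflicts with an op already in the word.
   bundle_of[i] receives the word index of ops[i].
   Returns the number of instruction words used. */
size_t pack_bundles(const struct op *ops, size_t n, size_t *bundle_of)
{
    size_t bundles = 0;
    size_t start = 0;       /* index of the first op in the current word */

    for (size_t i = 0; i < n; i++) {
        int need_new = (i == 0) || (i - start >= BUNDLE_WIDTH);
        for (size_t j = start; !need_new && j < i; j++)
            need_new = conflicts(&ops[j], &ops[i]);
        if (need_new) {
            start = i;
            bundles++;
        }
        bundle_of[i] = bundles - 1;
    }
    return bundles;
}
```

For the sequence r1 = r2 + r3, r4 = r5 + r6, r7 = r1 + r4, r8 = r2 + r5, the first two operations share a word, the third has to start a new word because it reads r1 and r4, and the fourth fits in next to the third.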

Yet another thing that does have similarities with your ideas is the virtual design of new processors. They may have multiple cores that share a pool of execution units, and the individual cores then grab any free address decoder, multiplier or whatever they may need. By pooling the resources, you can squeeze just a bit more performance out of the chip by reducing the stall time - this is a follow-up to the hyperthreading. Having virtual processors means that you could basically have a server that allocates 30% of the CPU resources to a specific application - not by just adjusting the number of time slices but by adjusting the number of core elements. You get a very fine-grained tool for handling CPU-time quota, making sure that your multimedia stream is guaranteed to have 1.2 billion multiplies and 3 billion adds/second.

There are also companies that develop processing solutions based on graphics cards, where the modern, programmable pipelines of the graphics cards are used to dynamically form computation networks. There exist supercomputers based on graphics cards, but I don't think any manufacturer has any processor product that would be suitable for control applications. Most pipelined solutions are about pure power, and not about quick responses.
