??? 03/07/09 21:07
#163218 - I made NO optimisation
Responding to: ???'s previous message
Per Westermark said:
Comparing the end pointer involved two LOADs. What optimization setting do you use? In my test, the RealView compiler did place the end pointer in a register so it could directly compare the source pointer with the end pointer before the conditional jump.

I made NO optimisation settings. As I said, I am not an experienced user. After your advice, I chose -O3 (+ for speed) and achieved 36.98us / 39.06us for the end_pointer / count method. The generated code looked longer, but the cycle count is shorter.

Modern compilers are definitely very clever. I am not surprised that you have to give the odd hint, e.g. the compiler could analyse that you are doing a known number of iterations, and then it could choose either an integer loop count or a pointer compare. You always have to drop hints to an 8 bit compiler. One day the 32 bit compiler will know better than you do.

I am learning things about an ARM7 (or the LPC2103 anyway). It does not have an external memory bus, so it could not do Richard's trick. However, it may internally read ahead on the instruction stream, but it does not execute instructions from a cache. Or can it?

I have no personal problem with an improved execution speed due to pipelining. Neither do I complain if a Compiler has removed some redundant instructions. Richard and many others are horrified by Compiler optimisation. But as far as I can see, the ARM7 core has a consistent execution cycle count, so a Simulator or pencil and paper can always calculate exact times from the ASM instructions.

There was a recent AVR thread on an ultoa() function. A clever trick with a 32 x 16 multiply and subsequent shifts reduced the worst case from 2700 cycles to 87. However good an instruction set you have, a good algorithm can make a dramatic difference.

Modern Compilers translate accurately. It is reassuring that they are pretty efficient too.

David.
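
For anyone following along, here is a minimal sketch of the two loop styles compared above. It is not the code that was actually timed: the function names, the byte-copy loop body and the parameters are all hypothetical, chosen only to show the difference between the count method and the end_pointer method.

```c
#include <stddef.h>
#include <stdint.h>

/* Count method (hypothetical example): the loop is controlled by an
 * integer counter that is decremented on every pass. */
void copy_by_count(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n--) {
        *dst++ = *src++;
    }
}

/* End-pointer method (hypothetical example): the end address is worked
 * out once before the loop, and the loop stops when the source pointer
 * reaches it. If the compiler keeps `end` in a register, the test is a
 * single compare before the conditional branch. */
void copy_by_end_pointer(uint8_t *dst, const uint8_t *src, size_t n)
{
    const uint8_t *end = src + n;

    while (src != end) {
        *dst++ = *src++;
    }
}
```

Both functions copy the same n bytes, so any timing difference comes only from how the compiler handles the loop control. Whether `end` stays in a register or gets reloaded on each pass depends on the compiler and the optimisation level, so it is worth checking the generated ASM listing rather than assuming.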