??? 03/06/09 17:47 Read: times |
#163175 - 67% of your loop was your loop Responding to: ???'s previous message |
When using the simulator, it may be better to look at the cycle count instead of the time figure.
You did 257 iterations in 67us, which is 260ns/iteration. Your clock speed was 58MHz (17ns/tick), so 260ns represents 15 clock cycles/iteration. The load instruction takes 3 clock cycles and the store instruction takes 2 clock cycles. An unrolled loop would therefore end up at 5 clock cycles/byte, resulting in 22us at 58MHz (86ns/byte), or 17.8us at 72MHz, the max speed of the LPC23xx (69ns/byte).

Some ARM chips may be slower if the memory subsystem requires extra wait cycles. That is the reason why really fast processors add one or more levels of data and code caches: standard big-size memory can't keep up at higher clock frequencies.

In this case, your loop seems to have spent 10 clock cycles per iteration just to figure out what to do. I'm not sure why the compiler didn't produce a better loop. Did you use optimization? I don't have access to the ARM compiler right now, so I can't play with it. |
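To make the arithmetic easy to repeat with other figures, here is a rough sketch in C. Your original loop isn't shown in this post, so the byte copy below is only my assumption of what it does. The function names (`cycles_per_iteration`, `copy_time_us`, `copy_unrolled`) are made up for illustration, and nothing guarantees that a given compiler turns the unrolled version into exactly 5 cycles/byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Clock cycles spent per loop iteration, from a measured total time.
 * total_ns: measured time in nanoseconds, clock_mhz: core clock in MHz. */
double cycles_per_iteration(double total_ns, unsigned iterations, double clock_mhz)
{
    double ns_per_iteration = total_ns / iterations;
    double ns_per_tick = 1000.0 / clock_mhz;   /* 58MHz gives about 17ns */
    return ns_per_iteration / ns_per_tick;
}

/* Estimated time in microseconds to move `bytes` bytes when each byte
 * costs `cycles_per_byte` clock cycles (cycles / MHz = microseconds). */
double copy_time_us(unsigned bytes, unsigned cycles_per_byte, double clock_mhz)
{
    return (double)bytes * cycles_per_byte / clock_mhz;
}

/* Assumed shape of the loop: a plain byte copy, unrolled four times so
 * the loop overhead is paid once per four load/store pairs. */
void copy_unrolled(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++) {        /* leftover bytes when n is not a multiple of 4 */
        dst[i] = src[i];
    }
}
```

Calling `cycles_per_iteration(67000.0, 257, 58.0)` gives roughly 15, and `copy_time_us(257, 5, 58.0)` gives roughly 22, matching the figures above. Compare the result against the simulator's cycle counter rather than trusting the estimate.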