??? 11/26/07 16:03 Read: times
#147413 - nevertheless ... Responding to: ???'s previous message
Andy Peters said:
Richard Erlacher said:
... as one might encounter when combining several blocks of on-board RAM to form a deeper, wider one? What's the price one has to pay, in quantitative terms? With 4-input LUTs, it's going to be quite a bit of delay when one needs a 36-bit-wide RAM buffer, 32K words deep, isn't it? Enough to warrant considering another solution, maybe?

Richard,

Before condemning muxes built from simple 4-LUTs as being too slow, it might be worth your while to code your buffer or mux or whatever, then synthesize and implement it, choosing a Spartan 3E as the target. Betcha it'll be "fast enough."

It won't be fast enough unless it's at least as fast, with respect to claimed RAM speed, as the tristate option was in earlier generations. If using a MUX is slower than the tristate bus would have been, then it's a non-starter. If the deep MUX makes the access to internally shared resources slower than external resources would be, then it's really not very helpful, is it?

In other words, the "another solution" worth considering is a newer device. And those newer devices might be cheaper, too.
Besides, 32k words x 36 bits requires 1,179,648 bits. Spartan 2 and 2E have nowhere near that much block RAM, and you're into the very large (and $$$$) XC3S4000 or XC3S5000 or XC3SD1800A devices or bigger Virtex 2/4/5 devices if you want it internal. No Spartan 3, 3E or 3A device has that much block RAM. You're probably better off with an external SRAM.

Fortunately, I don't have to build a FIFO that big in FPGA, but it does make a good example, doesn't it?

-a

You're right, certainly, about the amount of RAM currently available as block RAM, even in BIG devices. However, even if it's only 8 bits wide, if it's a deep mux, it's still slowed by the delay of a LUT for each one that's needed to address the RAM. If the RAM is composed of 4Kb blocks, then there have to be 8 of them to generate a 32k-word depth. Moreover, that group of 8 has to be repeated 36 times to make a 36-bit width. You've not addressed the question, which was how much delay is added by each LUT in the chain. The delay would be the same with a narrow or a wide word, but the amount of logic grows linearly with the width.

What is hidden in the fine print is the delay that using the highly touted block RAM imposes. Even more obscure, at least in the Spartan-II family with which I'm somewhat familiar, and which DOES have tristate resources, is the penalty for concatenating "distributed" RAM (unused LUTs) via muxes rather than tristate buses.

I'm not saying that it should be different, as I can't imagine how that could happen, but the fact is that FPGA use is cheapest when the gate count is the largest. Of course, too, that applies only to the gate count as interpreted for us by the marketing guys, who believe, somehow, that a 3-input gate is the same count as four two-input gates, and that a D flipflop is 14 gates, while I always thought of it as 5 or 6. Since the fastest FPGA families are also the most costly, I'd think it important to preserve the high speed for which I'm paying.
If I can't do that, thanks to the lack of a fast way to utilize all those otherwise unused gates (LUTs) as RAM when I need it, I lose interest in paying the high price. The availability of a hard PowerPC core doesn't impress me when I don't want to use it. Likewise, the fact that a single LUT is capable of high speed in the large, costly FPGA doesn't help me if I just need 200K "gates" (as defined by the marketing guys), which really only amounts to 40k "real" gates. The ultra-fast multipliers don't help much, either, if I'm not using 'em.

<... stepping up onto the highest soapbox ...>

It's like the cellphone industry ... since they can't build a really good cellphone, or a solid network, they build a not-so-good one instead, and clutter it up with lots of unhelpful not-so-good features ... features they can advertise as "kewl", like cameras, texting, music, and real-time video, features that squander the user-paid-for bandwidth without improving the functionality of the primary instrument.

<... OK ... enough of the soapbox rant ...>

All this makes me wonder whom they think they're helping by eliminating the tristate resources. It's a done deal, of course, but they boast about larger amounts of RAM yet don't provide a way to use it at the rates they gleefully advertise. If it's faster to use external RAM than the internal RAM, for which they've sacrificed considerable logic, whom does it help?

I'd really like to see a mux arrangement that offers as much benefit as the tristate bus that once was available, at least in "Brand-X" parts. I'd really like a clear way to utilize all those gates the marketing guys shout about in some beneficial way, since they charge by the k-gate.

As far as I'm concerned, FPGA technology is what it is, but, to me, it's a big disappointment, as I know what it could have been.

RE
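
For anyone who wants to redo the arithmetic in this post, here is a rough Python sketch. It is only a back-of-the-envelope model, not a timing analysis: it assumes each 4Kb block is configured as 4096 x 1, that the read mux is a plain tree of 2:1 muxes with one LUT per mux, and that no dedicated mux resources in the slices are used (real devices may do better). The function names are made up for this example, and the per-LUT and routing delays are left as parameters because the thread never states them; they would have to come from the data sheet or a timing report.

```python
import math

def blocks_needed(depth_words, width_bits, block_bits=4096):
    """Block count when every block is configured as block_bits x 1 (assumption)."""
    deep = math.ceil(depth_words / block_bits)  # blocks stacked to reach the depth
    wide = width_bits                           # one stack per data bit
    return deep, wide, deep * wide

def mux_levels(inputs, fan_in=2):
    """Levels in a tree of fan_in:1 muxes that selects one of `inputs` sources."""
    levels, reach = 0, 1
    while reach < inputs:
        reach *= fan_in
        levels += 1
    return levels

def added_delay_ns(levels, t_lut_ns, t_route_ns):
    """Extra read-path delay: one LUT plus one routing hop per mux level."""
    return levels * (t_lut_ns + t_route_ns)

depth, width = 32 * 1024, 36
total_bits = depth * width
deep, wide, total = blocks_needed(depth, width)
levels = mux_levels(deep)
luts = width * (deep - 1)  # an n-input tree of 2:1 muxes needs n - 1 muxes per bit

print("bits required:", total_bits)
print("blocks:", deep, "deep x", wide, "wide =", total)
print("mux levels per data bit:", levels)
print("mux LUTs, all bits:", luts)
```

Under those assumptions the 32k x 36 buffer comes to 8 x 36 = 288 blocks, 3 mux levels in the read path regardless of width, and 252 LUTs of muxing. Calling `added_delay_ns(levels, t_lut_ns, t_route_ns)` with figures from a real timing report gives the per-read penalty the question above is asking about.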