"The idea of an FPGA makes it more configurable than a CPU, but that configuration can then be taken to a fab and made into an optimized chip for better performance and density. "
And speed. FPGAs have been limited in that area. The tools have also been rather proprietary to a given vendor.
You should change those "9m" and "4m" etc to capital M's. I kept reading to find out what these 9-meter and 4-meter cells are, but apparently (still not sure) the m somehow means millions.
Logic cells are a purely virtual marketing number, the meaningful number is the LUT6 and flipflop count. They invented LCs to show that not all LUTs are the same, but the Logic cell/LUT ratio is just an arbitrary multiplier (that they've apparently bumped to 2.1 for some inscrutable reason. It used to be 1.4, then ~1.6).
Thanks Ian! Impressive indeed, but I fear it's quite out of my price range. Did they say how much it is? Also, while you mention the ability of a large FGPA to simulate a CPU like the A55, a major use of these is to allow people to prototype or even implement a custom ASIC without having to fab a custom ASIC. Can come in handy if the task is still in flux. For example, one could implement an AV1 encoder on an FPGA like this in hardware, which is likely much more efficient and faster than any software-based solution. Also, might allow a deep learning system to optimize itself also in hardware, which could be an approach to beat even that ginormous Cerebras chip .
Just look at nVidia's g-sync. To my knowledge, their still using an FGPA, because making a custom ASIC would not be cost-efficient. So often a large FGPA, while much slower than an ASIC, can completely remove the need to fab an ASIC depending on the volume you have to manufacture.
From my perspective, I wish we will get more and more of these at consumer price, to emulate old hardware.
For the price it depends on the quantity and the provider. One of the most expensive provider route is convenience retailers such as DigiKey who usually sell devices in small quantities for testing purposes mostly. In DigiKey the entry price for a VU440 in $36000. A VU13P costs at least $45000 and can go up to $80000 depending on the package used (IO capabilites etc) and speed/temperature grades.
About GSync I believe that at least the first versions were using an Altera FPGA, an Arria V GX iirc. It's not that expensive, definitely not something in the same world as the above mentionned FPGAs, but still not cheap enough (around $250 on Digikey, so maybe somewhere around $100 for mass prod.). I believe they could have used a Xilinx Artix-7 to do the same job, as I believe the limiting factor to use a cheaper device was the IO performance, and in terms of high speed transceiver vs cost, the Artix-7 does the best for video application imo (at least half the price of an Arria V GX). I guess the Artix 7 wasn't available yet when they did the design of the gsync module.
@Ian (Article): "Along with the 9m logic gates, there is also over 2000 IO segments for 4.5 Terabits of transceiver bandwidth (80 lanes of 28G) and 1.5 Terabits of DDR4 memory bandwidth, which the company states will help its customers create designs featuring multiple VU19P chips in one system with all-to-all connectivity topology."
Do we not need a time scale to get a bandwidth? My understanding was that 4.5 Terabits (562.5 Gigabytes) and 1.5 Terabits (187.5 Gigabytes) are storage capacities. You would need something like 4.5 Terabits per second and 1.5 Terabits per second to get a bandwidth. Of course I could be completely off base. In which case, feel free to call me out on it and publicly revoke my data science cred.
I'm pretty sure all these figures are "per second", when you're talking about a lot of bandwidth figures it's really easy to get lazy and drop the "per second" across the board.
Hardly you can do a such large piece of silicon on 7nm, not a way. Definition of leading node please. Advantages of a fine node: Density? true but if you can't do full reticle you are doomed. Speed? 28nm 20nm,14nm, 12nm are better. Power? modern 22nm and 20nm SOI or not SOI are better and with less power density in demanding applications. Yields? unfortunately no, they go down with geometry. Process variation? again no, crazy high right now on 10nm and 7nm. Reliablility? No. In my Knowledge only a Company can give some insurance of long term reliability on 7nm class processes.
Threska - Tuesday, August 27, 2019 - link
"The idea of an FPGA makes it more configurable than a CPU, but that configuration can then be taken to a fab and made into an optimized chip for better performance and density. "And speed. FPGAs have been limited in that area. The tools have also been rather proprietary to a given vendor.
MrSpadge - Wednesday, August 28, 2019 - link
I would say "speed" is included in "performance".ajp_anton - Tuesday, August 27, 2019 - link
You should change those "9m" and "4m" etc to capital M's. I kept reading to find out what these 9-meter and 4-meter cells are, but apparently (still not sure) the m somehow means millions.The Chill Blueberry - Tuesday, August 27, 2019 - link
It's millions of logic cells... whatever those are.

mlkj - Tuesday, August 27, 2019 - link
Logic cells are a purely virtual marketing number; the meaningful numbers are the LUT6 and flip-flop counts. They invented LCs to show that not all LUTs are the same, but the logic cell/LUT ratio is just an arbitrary multiplier (that they've apparently bumped to 2.1 for some inscrutable reason. It used to be 1.4, then ~1.6).
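[A quick sketch of the arithmetic described here. The 2.1 multiplier and the ~9M headline figure come from the comment and the article; the implied LUT6 count is just derived from those two numbers, not a vendor-published spec.]

```python
# "Logic cells" = physical LUT6 count times an arbitrary marketing
# multiplier (1.4, then ~1.6, now apparently 2.1 per the comment above).
def logic_cells(lut6_count: int, multiplier: float = 2.1) -> int:
    return round(lut6_count * multiplier)

def lut6_from_logic_cells(cells: int, multiplier: float = 2.1) -> int:
    # Invert the marketing number back to an approximate LUT6 count.
    return round(cells / multiplier)

# A headline "9 million logic cells" implies roughly 4.29M LUT6s at 2.1x.
print(lut6_from_logic_cells(9_000_000))  # 4285714
```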
eastcoast_pete - Tuesday, August 27, 2019 - link
Thanks Ian! Impressive indeed, but I fear it's quite out of my price range. Did they say how much it is? Also, while you mention the ability of a large FPGA to simulate a CPU like the A55, a major use of these is to allow people to prototype or even implement a custom design without having to fab a custom ASIC. Can come in handy if the task is still in flux. For example, one could implement an AV1 encoder on an FPGA like this in hardware, which is likely much more efficient and faster than any software-based solution. Also, it might allow a deep learning system to optimize itself in hardware, which could be an approach to beat even that ginormous Cerebras chip.

Samus - Tuesday, August 27, 2019 - link
Yeah... This is probably like a $10,000 chip.

Andy Chow - Tuesday, September 3, 2019 - link
Just look at nVidia's G-Sync. To my knowledge, they're still using an FPGA, because making a custom ASIC would not be cost-efficient. So often a large FPGA, while much slower than an ASIC, can completely remove the need to fab an ASIC, depending on the volume you have to manufacture.

From my perspective, I wish we would get more and more of these at consumer prices, to emulate old hardware.
TokyoQuaSar - Wednesday, September 25, 2019 - link
For the price, it depends on the quantity and the provider. One of the most expensive routes is convenience retailers such as DigiKey, who usually sell devices in small quantities, mostly for testing purposes. On DigiKey the entry price for a VU440 is $36,000. A VU13P costs at least $45,000 and can go up to $80,000 depending on the package used (IO capabilities etc.) and speed/temperature grades.

TokyoQuaSar - Wednesday, September 25, 2019 - link
About G-Sync, I believe that at least the first versions were using an Altera FPGA, an Arria V GX IIRC. It's not that expensive, definitely not in the same world as the above-mentioned FPGAs, but still not cheap enough (around $250 on DigiKey, so maybe somewhere around $100 for mass production). I believe they could have used a Xilinx Artix-7 to do the same job, as I believe the limiting factor for using a cheaper device was the IO performance, and in terms of high-speed transceiver vs cost, the Artix-7 does the best for video applications IMO (at least half the price of an Arria V GX). I guess the Artix-7 wasn't available yet when they did the design of the G-Sync module.

SSNSeawolf - Tuesday, August 27, 2019 - link
Ian's appetite for silicon raises more questions than it answers.MrSpadge - Wednesday, August 28, 2019 - link
Well, he's still hungry for news :)

asmian - Tuesday, August 27, 2019 - link
The curse of the absent editor strikes again... seamless: without seams
BurntMyBacon - Wednesday, August 28, 2019 - link
@Ian (Article): "Along with the 9m logic gates, there is also over 2000 IO segments for 4.5 Terabits of transceiver bandwidth (80 lanes of 28G) and 1.5 Terabits of DDR4 memory bandwidth, which the company states will help its customers create designs featuring multiple VU19P chips in one system with all-to-all connectivity topology."Do we not need a time scale to get a bandwidth? My understanding was that 4.5 Terabits (562.5 Gigabytes) and 1.5 Terabits (187.5 Gigabytes) are storage capacities. You would need something like 4.5 Terabits per second and 1.5 Terabits per second to get a bandwidth. Of course I could be completely off base. In which case, feel free to call me out on it and publicly revoke my data science cred.
mm0zct - Wednesday, August 28, 2019 - link
I'm pretty sure all these figures are "per second"; when you're talking about a lot of bandwidth figures, it's really easy to get lazy and drop the "per second" across the board.

ksec - Wednesday, August 28, 2019 - link
Why isn't it built on a leading node like 7nm?

Gondalf - Saturday, August 31, 2019 - link
You can hardly do such a large piece of silicon on 7nm, no way. Definition of "leading node", please. Advantages of a fine node: Density? True, but if you can't do a full reticle you are doomed. Speed? 28nm, 20nm, 14nm, 12nm are better. Power? Modern 22nm and 20nm, SOI or not, are better, and with less power density in demanding applications. Yields? Unfortunately no, they go down with geometry. Process variation? Again no, it's crazy high right now on 10nm and 7nm. Reliability? No. To my knowledge, only one company can give some assurance of long-term reliability on 7nm-class processes.