
A lot of beginner FPGA projects are just crappy microcontroller / crappy microprocessor projects.

I'm thinking back to my college years, where I spent about 70% of the LUTs of our little FPGA board making a Wallace tree multiplier. Yes, it was good for learning Verilog, and good for learning how half-adders and full adders combine into bigger circuits, but it's not exactly a good use of FPGA capabilities.

Given how many chips are available today on the market, what are hobby-level FPGA designs that truly take advantage of custom logic in a way that a microcontroller and/or microprocessor (or other common parts) cannot replicate?

---------

Looking at history: I think the traditional use of FPGAs and/or ASICs was matrix-multiplication routines, specifically Reed-Solomon error-correction codes. The most common implementation was probably CD-ROM error correction, IIRC.

But I'd argue that such routines are doable with ARM Neon these days, especially with PMULL (Neon carryless multiplication, specifically designed to accelerate Galois-field multiplication). And a lot of other matrix multiplications are likely an ARM Neon problem, solvable with a tiny Cortex-A5 or Cortex-A7. (These CPUs are available at $8 to $20 price points, far cheaper than an FPGA, and they run Linux, so they're also easier to program for than learning Verilog.) Microchip's SAMA5D2, for example, is like $10, and a total solution is under 500mW of power consumption (DDR2 included).
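
For reference, the Galois-field multiply at the heart of Reed-Solomon is tiny in hardware terms. Here's a minimal Verilog sketch, assuming the common Reed-Solomon field polynomial x^8+x^4+x^3+x^2+1 (0x11D); swap the reduction constant to match whatever field your code actually uses:

    // Minimal sketch: combinational GF(2^8) multiply, assuming the
    // common Reed-Solomon field polynomial 0x11D. Change 8'h1D to
    // match your code's field polynomial.
    module gf256_mul (
        input  wire [7:0] a,
        input  wire [7:0] b,
        output reg  [7:0] p
    );
        integer i;
        reg [7:0] tmp;
        always @* begin
            p   = 8'h00;
            tmp = a;
            for (i = 0; i < 8; i = i + 1) begin
                if (b[i]) p = p ^ tmp;  // add (XOR) shifted copies of a
                // multiply tmp by x, reducing modulo the field polynomial
                tmp = tmp[7] ? ((tmp << 1) ^ 8'h1D) : (tmp << 1);
            end
        end
    endmodule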

I think communications is the right overall idea. A lot of problems come down to large matrix multiplication or other large-scale compute problems. But a lot of radio circuits (ex: Bluetooth, LoRa, Zigbee, etc.) already have ASICs. Perhaps communication protocols themselves need experimentation, and FPGAs are best at that?

I do think that a low-cost, low-latency, low-power communication protocol should be invented for wired communications, infrared, etc. And that might make more sense to implement on an FPGA rather than on a microprocessor with SIMD / ARM Neon.



The number of compute-focused applications that are better on FPGA is going to be tiny. Doubly so if low-end, triply so if not real-time.

FPGAs shine in hard real-time applications and as "EE duct tape," but almost never as raw compute, even if your utilization is rather high. If you need to slurp in data from a bunch of ADCs at many GB/s and do signal processing without missing a sample, FPGAs shine. Radar, sonar, signal analyzers, beamforming, that sort of thing. If you need to connect PC buses (PCIe, Ethernet) together in a novel fashion, say because you are prototyping a new PC chip or router or building AWS, then FPGAs shine. The moment volume gets high, the scales tip back toward ASICs, but many important applications are intrinsically low volume. Often in prototyping, but sometimes in deployment too. How many F-22s exist? Only about 200. Custom chips wouldn't come close to filling a FOUP, so you can bet your bottom dollar that they (and the labs that engineered them) are full of FPGAs.

The world is full of "Look ma, I did an FPGA" projects that in the real world would have absolutely no business running on an FPGA. That's fine, we all need to train on something, but the natural inclination to overstate the scope of these pet projects can be confusing unless you know that real FPGA applications are confined to narrow (but extremely important, exciting, and valuable) verticals.


According to this, the F-22 used an Intel i960MX plus a custom DSP from Raytheon derived from the radar processor on the F-15, though the engineers expected to replace the DSP with a PowerPC chip.

https://www.militaryaerospace.com/computers/article/16710716...


Well... I'm thinking from the perspective of a hobby engineer. Not so much F-35 scale.

iCE40 is a $6 surface mount chip, which means I'm comparing it against all other $1 to $20 chips within my capability to put into OSHPark's 6-layer PCB-layout service.

My toolbox includes 8-bit uCs like AVR (ATmega, AVR DD, AVR EA), 16-bit like MSP430, and 32-bit like Cortex-M0+, M4, M7. It includes Linux-scale microprocessors like the Microchip SAMA5D2 or SAM9x60-D1G, or boards like the BeagleBone or Rasp. Pi. (And yes, I've double-checked: these 0.80mm-pitch BGAs seem like they fit and route on OSHPark's 6-layer 5mil trace/space impedance-controlled specifications.)

So where does an FPGA fit inside of here?

--------

Strangely enough, "glue logic" is 8-bit territory these days. AVR DD has CCL: 4x 3-input LUTs + 2x JK flip-flops + an event system that keeps executing even while the 8-bit CPU is asleep.

See here: https://ww1.microchip.com/downloads/en/AppNotes/TB3218-Getti...

So the smallest "glue logic" use cases of FPGAs are... well... outcompeted. The $1 uCs are beating FPGAs at this particular task now. I truly can configure 12 input pins + 4 output pins of the 8-bit uC to act as simple glue logic, fully async from the uC's clock (IE: zero code / MHz used, still functional during sleep, etc.). Bonus points: the event-routing system means that events route to the ADC/timers/etc. even while the uC is sleeping, for maximum power efficiency. If some latency can be tolerated, you can even hook these CCL / routing blocks up to interrupts and run a bit of code.

AVR DD's CCL isn't good enough for any serious design like a 32-bit LFSR. But you know, a CRC32 (an LFSR implementation) probably would be best done on such an iCE40 FPGA rather than with the 8-bitter's piss-poor compute capabilities. But 3x AND gates + 1x XOR gate scattered across the board? That's an 8-bitter job today.
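
For a sense of scale, here's roughly what that CRC32 LFSR looks like as a bit-serial Verilog sketch, using the standard IEEE 802.3 polynomial (0x04C11DB7), with the usual bit-reflection and final-XOR conventions omitted. A real design would likely process a byte or more per clock:

    // Hedged sketch: bit-serial CRC32 as a Galois LFSR, one input
    // bit per cycle. Bit ordering and final inversion are left out.
    module crc32_serial (
        input  wire        clk,
        input  wire        rst,   // synchronous reset to all-ones
        input  wire        en,    // shift in one data bit this cycle
        input  wire        din,
        output wire [31:0] crc
    );
        reg [31:0] lfsr;
        wire fb = din ^ lfsr[31];  // feedback bit
        always @(posedge clk) begin
            if (rst)
                lfsr <= 32'hFFFF_FFFF;
            else if (en)
                lfsr <= {lfsr[30:0], 1'b0} ^ (fb ? 32'h04C1_1DB7 : 32'h0);
        end
        assign crc = lfsr;
    endmodule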

---------

I think the answer for "What is the best total solution under $50" will likely be microprocessors and full-scale chips. (Or even a full-sized SBC like the Rasp. Pi or BeagleBone.)

But if we change the question to "What is the best total solution under 50mA", suddenly the FPGA is far more competitive. FPGAs aren't that expensive, now that I'm looking up these tiny iCE40 chips. But 1k LUTs is still pretty small.

Speaking of which: ouch. A lot of iCE40 parts are 0.40mm- and 0.50mm-pitch BGAs, so no OSHPark 6-layer for those. QFN and TQFP packages are available though. So be careful about chip selection and think about the PCB you're planning to pair with these chips.


> iCE40 is a $6 surface mount chip, which means I'm comparing it against all other $1 to $20 chips within my capability to put into OSHPark's 6-layer PCB-layout service.

If you are a hobby EE (and work as a software engineer for your day job), $6 is negligible. Some of the higher-end RF chips cost three figures per chip. Cost of BOM only truly matters at scale.


I dunno. I think my mental model for my hobby stuff is that I'm aiming for a small-run (1000 units or fewer) Etsy-store kind of deal.

IE: I'm going to sell something for $150 to $500 in relatively small numbers, something that meaningfully helps people with some specialized niche task that big companies are blind to... with a BoM aimed at maybe $30 and an overall production time of an hour or less per unit (assembly + testing + boxing), since I'd likely be the only person boxing these devices up and shipping them out.

I mean, ideally maybe like 10-minutes assembly time or shorter really. Depends on how much time you're valuing your labor.

I bought an HDMI lag tester that proved whether monitors for the fighting-game community had 18ms or 30ms of lag, since the fighting-game community is very, very, very particular about tournament setups. There's no way a device like this would sell at large scale, but that's the kind of "Etsy project" that I literally bought back when I cared a lot about getting my home setup close to tournament specs.

Or perhaps $300 joysticks custom-built to look/feel like arcade sticks, at least before Madcatz / the big guys started making them.

In case you're curious: this was a $120 doohickey that output an HDMI signal flashing white rectangles at the top-left, top-right, center, bottom-left, and bottom-right of the screen... plus a photodiode that accurately measured the latency between the HDMI signal going out and the pixels lighting up, to the millisecond. The last measured time was displayed on the HDMI output.

This is a project most of us hobby EEs could accomplish and likely sell on Etsy. But we've gotta keep the BoM below $30 in practice. It's a meaningful project and something good tournament organizers knew to buy and test with.

---------

I've heard the estimate that for hobby / Etsy-store-level manufacturing, you're looking at 5x BoM for a fair price. Ex: a $20 BoM sells at $100, a $100 BoM sells at $500. If you can't accomplish this, then your business idea sucks; go think of a more profitable one. If a niche product can exist at that pricing, then you've got a potential Etsy-business idea.

I think there's a good market for $100 to $500 specialist niche electronic tools like this, taking advantage of the small sizes of communities, small scale of builds, small markets, etc. (If it were a large market, Hasbro or Nintendo or some "big guy" would jump in and likely take your market. If it's like 1000 total lifetime sales, that's enough to make the hobby worth it but small enough that no big company would tackle that niche.)

If you're talking about $500 in parts, then we're talking about a $2500 sales price (using the 5x-BoM fair-price scaling as a mental model), which is likely outside the hobby/Etsy niche-tool market.

There are a lot of hobbies out there where $100 to $500 tools (ex: $100 HDMI lag tester, $300 joystick, replicated Pop'n Music controller) are fair. Going above a $500 or $1000 bill of materials (aka sales prices in the $2500+ range) gets you back into professional-tool territory, and suddenly you can't compete.


Ah, you plan to sell it; I thought you meant just building it for personal use. Yeah, for production, BOM optimization is an entirely different story.


Or at least, I'm pretending that I'll sell it. Lol.

No promises. But if something looks good enough maybe I'll ramp it up to a real production run.


There are still places I see FPGAs used by hobbyists: for example, hams working with software-defined radio, game-console emulators with a focus on correct timing, and other retro computing where FPGAs can replace/upgrade components that are hard to find.


> So where does an FPGA fit inside of here?

It doesn't. You're not missing anything.


To answer my own question, I've decided to look up the specs of Lattice Semiconductor's iCE40 LM1K FPGA. This is very small, just 1k LUTs. But a lot of these "matrix multiplications" and Galois-field routines simplify down into absurdly small linear-feedback shift registers in practice (!!). At least for encoding (decoding is far more difficult).

With that in mind, these low-power iCE40 devices claim to be in the ~10mA class, which puts them in small-microcontroller territory. (Ex: the RP2040 is 20mA, so we're already undercutting the RP2040, let alone a proper Cortex-A-level chip.)

So... yeah. Okay, I see the use. But that's still a _lot_ of extra work compared to grabbing an off-the-shelf Cortex-A5, lol. Still, given the right power constraints, I can imagine that the $6 to $20 iCE40 FPGA would be more useful than adding a full-size Cortex-A5 (or better) with SIMD / other such advanced computational instruction sets.

Ex: I think I'd be able to program an LFSR for 8-bit Reed-Solomon encoding (Galois add/multiply) that'd pair up with a standard microcontroller (think any ARM Cortex-M4 here), all for a total-solution power consumption under 20mA going full tilt.
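
As a sketch of what I mean: a systematic Reed-Solomon encoder with 4 parity symbols is just an LFSR over GF(2^8) with constant Galois multipliers on the feedback taps (reusing the gf256_mul sketch from earlier in the thread). The G0..G3 coefficients below are placeholders; the real values come from expanding your code's generator polynomial:

    // Hedged sketch: systematic RS encoder LFSR, 4 parity symbols
    // (an RS(255,251)-style code). G0..G3 are hypothetical values.
    module rs_encoder_lfsr (
        input  wire       clk,
        input  wire       rst,
        input  wire       en,      // asserted once per message symbol
        input  wire [7:0] msg_in,
        output wire [7:0] parity_out
    );
        // Placeholder generator coefficients (hypothetical):
        localparam [7:0] G0 = 8'h40, G1 = 8'h78, G2 = 8'h36, G3 = 8'h0F;

        reg  [7:0] r [0:3];
        wire [7:0] fb = msg_in ^ r[3];       // LFSR feedback symbol
        wire [7:0] m0, m1, m2, m3;

        gf256_mul u0 (.a(fb), .b(G0), .p(m0));
        gf256_mul u1 (.a(fb), .b(G1), .p(m1));
        gf256_mul u2 (.a(fb), .b(G2), .p(m2));
        gf256_mul u3 (.a(fb), .b(G3), .p(m3));

        always @(posedge clk) begin
            if (rst) begin
                r[0] <= 0; r[1] <= 0; r[2] <= 0; r[3] <= 0;
            end else if (en) begin
                r[0] <= m0;
                r[1] <= r[0] ^ m1;
                r[2] <= r[1] ^ m2;
                r[3] <= r[2] ^ m3;
            end
        end
        // After the last message symbol, a real design gates the
        // feedback off and shifts r[3]..r[0] out as the parity.
        assign parity_out = r[3];
    endmodule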

Since DDR2 RAM starts at like 100mA power consumption, there's a lot of FPGA+Microcontroller that you can fit before even the smallest microprocessors (aka: Cortex-A5) make sense.

----------

So I'm thinking that a small microcontroller that needs write-only communication over a noisy channel could, in practice, require a Reed-Solomon encoder (or turbo codes or whatever modern scheme exists; I'm not up to date with the latest techniques). A Reed-Solomon encoder is 100% better on an FPGA, since it's just a linear-feedback shift register.

Or heck, the matrix multiplication to decode a Reed-Solomon error-correction scheme is surprisingly compute-heavy, and might also be better on an FPGA than on the 10mA-class uC.


In my day job I work on a product that has FPGAs, and we don't do a single matrix multiplication.

We use them primarily for performant interfacing with obscure bus protocols, where high performance variously means high throughput (tens of Gbps) with zero acceptable loss, or low latency (interpret the bus protocol and produce the correct response in <10ns), but amusingly, for our particular application, not usually both at the same time.

Our volume is too low and the set of bus protocols we need to interact with changes too rapidly for ASICs to be economical. And it's not possible to meet our performance targets with off the shelf SoCs alone or discrete logic gates.

Although I agree with your point that it's hard to beat CPUs (and GPUs) when your needs are primarily computation.


One common student project we had used the FPGA to generate a VGA video signal: for example, using the onboard ADC to sample a signal and visualise the waveforms. A more advanced idea was to also implement a line-drawing algorithm on the FPGA to generate wireframe graphics. While this can also be done on a microcontroller, and some even include video outputs and GPUs, I think it is a nice way to see at a low level how to generate the signals with the correct timing. I used this, for example, to add a video output to a Gameboy.
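
For anyone curious, the core of such a VGA project is just two counters and a few comparisons. A minimal sketch of the standard 640x480@60Hz timing (800 clocks per line, 525 lines per frame), assuming a ~25MHz pixel clock (close enough to the nominal 25.175MHz for most hobby monitors):

    // Minimal sketch: 640x480@60Hz VGA sync generator.
    module vga_sync (
        input  wire       clk25,   // ~25 MHz pixel clock (assumed)
        output wire       hsync,   // active-low sync pulses
        output wire       vsync,
        output wire       visible, // high inside the 640x480 region
        output wire [9:0] x,       // current pixel coordinates
        output wire [9:0] y
    );
        reg [9:0] hcnt = 0, vcnt = 0;
        always @(posedge clk25) begin
            if (hcnt == 10'd799) begin
                hcnt <= 0;
                vcnt <= (vcnt == 10'd524) ? 10'd0 : vcnt + 1'b1;
            end else
                hcnt <= hcnt + 1'b1;
        end
        // 640 visible + 16 front porch, then a 96-clock sync pulse
        assign hsync   = ~(hcnt >= 656 && hcnt < 752);
        // 480 visible + 10 front porch, then a 2-line sync pulse
        assign vsync   = ~(vcnt >= 490 && vcnt < 492);
        assign visible = (hcnt < 640) && (vcnt < 480);
        assign x = hcnt;
        assign y = vcnt;
    endmodule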

Another, somewhat more exotic and involved, application is a time-to-digital converter (TDC), which can take advantage of the low-level routing inside the FPGA to sample a digital signal with significantly higher precision than the clock (resolutions of tens of picoseconds down to below 10ps, depending on the FPGA).
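
A heavily hedged illustration of the delay-line idea behind such a TDC: the input edge ripples down a buffer chain, and a register snapshots how far it got on each clock edge. A real TDC maps the chain onto the FPGA's carry primitives with hand-placed routing; written generically like this, synthesis would just optimize the buffers away, so treat it as a concept sketch only:

    // Concept-only sketch of a delay-line TDC (not synthesis-ready).
    module tdc_delay_line #(parameter N = 64) (
        input  wire         clk,
        input  wire         sig,   // signal edge to timestamp
        output reg  [N-1:0] taps   // thermometer code; encode downstream
    );
        wire [N-1:0] chain;
        assign chain[0] = sig;
        genvar i;
        generate
            for (i = 1; i < N; i = i + 1) begin : dl
                // #1 is a simulation-only unit delay; a real TDC uses
                // the physical delay of placed carry primitives instead.
                buf #1 b (chain[i], chain[i-1]);
            end
        endgenerate
        always @(posedge clk)
            taps <= chain;  // snapshot: how far the edge propagated
    endmodule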

For work, we mostly use FPGAs for data acquisition systems, low level data processing, high speed data links and so on.


Alas, modern embedded screens (ex: NewhavenDisplays) are either SPI (for small screens) or "8080-protocol" (an 8080-bus-like protocol) on the faster / larger screens, and both are somewhat easily implemented by bit-banging. So VGA is somewhat out of date for a hobbyist; the market has moved on from VGA in practice.

> Another, somewhat more exotic and involved, application is a time-to-digital converter (TDC), which can take advantage of the low-level routing inside the FPGA to sample a digital signal with significantly higher precision than the clock (resolutions of tens of picoseconds down to below 10ps, depending on the FPGA).

That certainly sounds doable and not too difficult to think about, actually. But as you mentioned, it's exotic. I don't think many people need picosecond-resolution timing, lol.

Still, the timing idea is overall correct as an FPGA superpower. While picosecond resolution is stupidly exotic, I think even single-digit-nanosecond timing is actually well within a hobbyist's day-to-day. (Ex: a 20MHz clock is just 50 nanoseconds, and packing 4 bits of info into 16 time slots per clock tick means needing to accurately measure signals at the 3.125ns level...) This is neither exotic nor complicated anymore, and is "just" a simple 80Mbit encoding scheme that probably has real applicability as a custom low-power protocol.

And it's so simple that it'd only take a few dozen LUTs of an FPGA to accurately encode/decode (see the sketch after the list below).

Ex: 0000 is encoded with a 0ns phase delay off the master clock.

0001 is encoded as 3.125ns phase delay off the clock.

0010 is encoded as 6.25ns phase delay off the clock.

... (etc. etc.)

1111 is encoded as 46.875ns phase delay off the master clock.
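
A minimal sketch of the encoder side of that scheme, assuming you can run (or serdes your way to) a 16x fast clock at 320MHz. The module below is hypothetical and simply pulses the line in the slot matching each 4-bit symbol:

    // Hedged sketch: pulse-position encoder, 4 bits per 50ns period.
    module phase_encoder (
        input  wire       clk320,  // 16x the 20MHz master clock (3.125ns ticks)
        input  wire [3:0] symbol,  // next 4-bit symbol to transmit
        output reg        tx
    );
        reg [3:0] slot = 0;
        reg [3:0] cur  = 0;
        always @(posedge clk320) begin
            slot <= slot + 1'b1;     // wraps every 16 ticks = one 50ns period
            if (slot == 4'd15)
                cur <= symbol;       // latch next symbol at the period boundary
            tx <= (slot == cur);     // one 3.125ns pulse at the encoded phase
        end
    endmodule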


Yes, VGA is really not very useful nowadays, but I think it is still a useful (student) project for FPGA beginners that is relatively easy to implement, more exciting than blinking an LED and can be built on for other things.

The downside of SPI (and to some degree 8080) screens is the low refresh rate / missing vsync. There are also screens with an RGB interface, which is then again similar to VGA but digital. But yes, this does not really require an FPGA and an ARM controller with RGB interface is probably much more useful for most applications. (Or even MIPI-DSI, but I have not used it myself so far.)

Still, I have a TFP410 lying around that I wanted to strap to my FPGA at some point to get something better than VGA.

> Still, the timing idea is overall correct as an FPGA-superpower.

And while this is especially true on FPGAs with dedicated hardware like a serdes or gearbox, one can still squeeze out a bit more on most FPGAs with DDR IO or several phase-shifted clocks.


Jump Trading has an FPGA team. There are always job openings for it. Not really a hobby project, but it gives you an idea of real-world applications.

>We’re looking for brilliant engineering talent to join our FPGA team that is building next-generation, ultra-low-latency systems to power trading with machine learning and other algorithms on a global scale.

>You’ll work alongside a small team of experienced engineers who came to Jump from leading companies in FPGAs, semiconductors, networking cards, and more… as well as PhDs from top FPGA research labs around the world.

https://www.jumptrading.com/careers/5305081/?gh_jid=5305081


"Given how many chips are available today on the market, what are hobby-level FPGA designs that truly take advantage of custom logic in a way that a microcontroller and/or microprocessor (or other common parts) cannot replicate?"

Any Boolean-logic-heavy workload such as password cracking or SHA256 mining (Bitcoin) is perfectly suited to FPGA platforms and will outperform any microprocessor or GPU in terms of performance per watt. For example, in the early days of Bitcoin, FPGAs such as the Xilinx XC6SLX150 ruled mining, and many such implementations were developed by hobbyists.


I honestly don't think it's possible to implement SHA256 in the 1k LUTs of the FPGA dev boards discussed in this post. (Let alone an implementation that's going to beat out traditional CPUs or GPUs.)

Like, seriously: 1k 4-input LUTs means that these iCE40 FPGAs have 4096 total inputs across all of their logic. SHA256 has, ya know, 256 bits of state, so that budget buys you at most ~16 "steps" of 256-bit-wide logic, and SHA256 takes far more than 16 steps to implement even with perfect routing. (But if anyone proves me wrong, consider me happy.)

You're thinking orders of magnitude too big here. The FPGAs described in this post are much, much, much smaller.


Oh, right, not 1k LUTs. But toward the $120 range, such as the Digilent Arty S7 listed in the post with its 23k LUTs, it's likely possible to implement SHA256 cracking or mining and beat a CPU or GPU in performance/watt. Probably not performance/dollar, though.
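
For scale, here's a hedged sketch of a single SHA256 round in Verilog. A mining core unrolls or pipelines 64 of these plus the message schedule, which is why it lands in the tens-of-thousands-of-LUTs range rather than 1k:

    // Sketch: one SHA256 round as combinational logic (FIPS 180-4).
    module sha256_round (
        input  wire [255:0] state_in,  // {a,b,c,d,e,f,g,h}
        input  wire [31:0]  k,         // round constant
        input  wire [31:0]  w,         // message-schedule word
        output wire [255:0] state_out
    );
        wire [31:0] a = state_in[255:224], b = state_in[223:192],
                    c = state_in[191:160], d = state_in[159:128],
                    e = state_in[127:96],  f = state_in[95:64],
                    g = state_in[63:32],   h = state_in[31:0];

        // Sigma1(e) = ROTR6 ^ ROTR11 ^ ROTR25
        wire [31:0] s1  = {e[5:0],e[31:6]} ^ {e[10:0],e[31:11]} ^ {e[24:0],e[31:25]};
        wire [31:0] ch  = (e & f) ^ (~e & g);
        wire [31:0] t1  = h + s1 + ch + k + w;
        // Sigma0(a) = ROTR2 ^ ROTR13 ^ ROTR22
        wire [31:0] s0  = {a[1:0],a[31:2]} ^ {a[12:0],a[31:13]} ^ {a[21:0],a[31:22]};
        wire [31:0] maj = (a & b) ^ (a & c) ^ (b & c);
        wire [31:0] t2  = s0 + maj;

        assign state_out = {t1 + t2, a, b, c, d + t1, e, f, g};
    endmodule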




