One thing I’m pretty happy with in FastLED is being able to drive WS2812 LEDs while interleaving dithering and scaling (and by extension color correction and such) with writing the data out on AVR. Unfortunately, while I had the timing locked down solidly for writing a pixel out, there were always about 25-32 clocks of overhead between pixels (1.5-2us at 16MHz). Squeezing out that last overhead, though, had me running headfirst into some stupid limitations in gcc’s inline asm, not least of which is that there are only 30 ‘operand slots’ (note: AVR has 32 registers), made worse by the fact that read/write operands take two slots.
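For anyone who wants to check the clocks-to-time arithmetic, here is a minimal C++ sketch. The helper name `clocksToNanos` is made up for illustration and is not part of FastLED; it only converts a cycle count to nanoseconds at a given CPU clock.

```cpp
#include <cstdint>

// Hypothetical helper (not a FastLED function): convert a number of CPU
// clock cycles to nanoseconds at the given clock rate. Integer division
// truncates, so 1562.5ns is reported as 1562.
constexpr std::uint64_t clocksToNanos(std::uint64_t clocks, std::uint64_t cpuHz) {
    return clocks * 1000000000ULL / cpuHz;
}

// The 25-32 clocks of inter-pixel overhead at 16MHz:
static_assert(clocksToNanos(25, 16000000ULL) == 1562, "25 clocks is about 1.56us");
static_assert(clocksToNanos(32, 16000000ULL) == 2000, "32 clocks is 2us");
```

So the "1.5-2us" figure is just 25 to 32 cycles at 62.5ns per cycle.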

Mediocre.

Today, after much swearing at gcc and finally breaking the back of slow compile times, I started doing some asm cleanup, and saw the path to both reducing that overhead and hiding some of it in extra space in the interleaved code. I started out with pulling some of the dithering adjustments into some gaps in the pixel writing, and then implemented the entire loop infrastructure in asm as well.

At 16MHz on the AVR, after the last drop of the data line on the last bit of a pixel, I have 5 clock cycles before I start adding overhead. I was able to get all the inter-pixel overhead either interleaved with writing data out or fitted into those last 5 clocks.

Which is a long way of saying - FastLED 3.1 now has ZERO gap between pixels when writing out rgb data (sadly, there’s still ~70us of overhead getting to that point, the price of flexibility - though it’s been a while since I’ve done a round of optimization there, I should soon). 1 pixel is 30us, 2 pixels is 60us, 10 pixels is 300us, etc…
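As a rough sketch of what those numbers add up to per frame, the following C++ snippet models the write time as the setup overhead plus 30us per pixel. All names here are hypothetical (none come from FastLED), the 70us setup figure is the approximate value quoted above, and the per-pixel gap parameter is a simplification that charges the old overhead once per pixel. The strip's latch/reset time after the frame is not modeled.

```cpp
#include <cstdint>

constexpr std::uint32_t kPixelMicros = 30;  // time to clock out one RGB pixel
constexpr std::uint32_t kSetupMicros = 70;  // approximate ("~70us") setup overhead

// Hypothetical model: total time in microseconds to write numPixels,
// with an optional extra gap charged per pixel (0 for the gapless code).
constexpr std::uint32_t frameMicros(std::uint32_t numPixels,
                                    std::uint32_t gapMicrosPerPixel = 0) {
    return kSetupMicros + numPixels * (kPixelMicros + gapMicrosPerPixel);
}

static_assert(frameMicros(10) == 370, "10 pixels: 300us of data plus setup");
static_assert(frameMicros(10, 2) == 390, "same strip with a 2us gap per pixel");
```

The saving scales linearly with strip length: under this model, every pixel on the strip gets back whatever the old gap was.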

What will you do with your extra 32 cycles per pixel?

(The attached image shows three pixels’ worth of data on the scope. Go ahead, try to find the pixel boundaries without counting peaks :)

Sadly, 8mhz avrs will have 3 clock cycles of overhead per pixel.

But at 16Mhz: exactly zero cycles wasted. It can go no faster.
This, ladies and gentlemen, is what we call the speed of light.

Blah, there’s a bug with 8mhz output, will fix later.

I continue to be impressed by your dedication to squeeze every last drop out of these things. Great work @Daniel_Garcia!

The fact that I don’t have a clue what you’re talking about emphasizes how glad I am people with your skills are working on stuff like this… for people like me. Kudos!

@allanGEE : with WS2811/WS2812B “Neopixels”, FastLED has to send the RGB values for each pixel from the Arduino to the LED strip at a very very precise rate, otherwise the colors come out all mangled or not at all, and we’ve had that very precise data rate for each pixel working perfectly for quite some time now. However, in the past, there has been a ‘pause’ of a few millionths of a second between the transmission of each pixel’s data. It’s been a tiny, tiny effect, but it’s been in there.

What Dan is (correctly!) bragging about here is that he’s now optimized the FastLED ‘data transmission’ code so perfectly that there’s literally, numerically, exactly ZERO pause between pixels. FastLED is now sending the pixel data to the LED strip as fast as physically possible, while still having the LEDs operate correctly. We’ve now hit the ‘speed of light’ for Neopixels. If you look at the scope trace picture above that Dan posted, you’ll see just uniform pulses – no ‘slight gaps’ between pixel groups at all.

The fact that FastLED is doing realtime dithering, and realtime brightness adjustment, and realtime color correction, and is transmitting pixel data ‘at the speed of light’, even on ATtiny’s, is an accomplishment worth crowing about.
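To put a number on the "very precise rate" described above, here is a small C++ sketch of the clock budget per bit. It assumes 24 bits per pixel (8 each for R, G and B) and uses the 30us-per-pixel figure from the original post; the names are hypothetical and not taken from FastLED.

```cpp
#include <cstdint>

constexpr std::uint32_t kBitsPerPixel = 24;     // assumed: 8 bits each for R, G, B
constexpr std::uint32_t kPixelNanos   = 30000;  // 30us per pixel, from the post
constexpr std::uint32_t kBitNanos     = kPixelNanos / kBitsPerPixel;  // 1250ns

// Hypothetical helper: how many CPU clock cycles fit in one bit period.
constexpr std::uint32_t clocksPerBit(std::uint32_t cpuHz) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(cpuHz) * kBitNanos / 1000000000ULL);
}

static_assert(clocksPerBit(16000000) == 20, "20 clocks per bit at 16MHz");
static_assert(clocksPerBit(8000000) == 10, "only 10 clocks per bit at 8MHz");
```

All of the dithering, scaling and loop bookkeeping has to hide inside that budget of roughly 20 cycles per bit at 16MHz, and the budget is half that at 8MHz.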

I am going to update my ∞ heart project with this new update. I think I am still running it on FastSPI_LED2. With 778 LEDs those extra cycles are going to make a huge difference. I recently upgraded it to a Teensy and unfortunately it screwed up the timing of all the patterns (much faster now)!

Awesome work @Daniel_Garcia. I tip my hat to you!

This is bad ass.