Hello Everyone! Code optimization question -

I’ve been revisiting some code for the Teensy 3.1 that uses FastLED to display preformatted RGB data received over serial.

I have it set up so that there are 8 strips of equal length, each reading from a different part of the LED byte array.

I’m trying to optimize it to squeeze out better fps, but I’m not sure if that’s possible, or if there’s anything left to optimize.

Would love some feedback/advice !
Here’s my code:

#include "FastLED.h"
#define NUM_STRIPS 8
#define NUM_LEDS_PER_STRIP 500
#define NUM_LEDS (NUM_LEDS_PER_STRIP * NUM_STRIPS)
CRGB leds[NUM_STRIPS * NUM_LEDS_PER_STRIP];

#define DATA_PIN_0 2
#define CLOCK_PIN_0 3

#define DATA_PIN_1 4
#define CLOCK_PIN_1 5

#define DATA_PIN_2 6
#define CLOCK_PIN_2 7

#define DATA_PIN_3 8
#define CLOCK_PIN_3 9

#define DATA_PIN_4 10
#define CLOCK_PIN_4 11

#define DATA_PIN_5 14
#define CLOCK_PIN_5 15

#define DATA_PIN_6 16
#define CLOCK_PIN_6 17

#define DATA_PIN_7 18
#define CLOCK_PIN_7 19

const int numOfBytes = NUM_LEDS_PER_STRIP * NUM_STRIPS * 3;

void setup() {
  FastLED.addLeds<APA102, DATA_PIN_0, CLOCK_PIN_0, GBR>(leds, 0, NUM_LEDS_PER_STRIP);
  FastLED.addLeds<APA102, DATA_PIN_1, CLOCK_PIN_1, GBR>(leds, NUM_LEDS_PER_STRIP, NUM_LEDS_PER_STRIP);
  FastLED.addLeds<APA102, DATA_PIN_2, CLOCK_PIN_2, GBR>(leds, 2 * NUM_LEDS_PER_STRIP, NUM_LEDS_PER_STRIP);
  FastLED.addLeds<APA102, DATA_PIN_3, CLOCK_PIN_3, GBR>(leds, 3 * NUM_LEDS_PER_STRIP, NUM_LEDS_PER_STRIP);
  FastLED.addLeds<APA102, DATA_PIN_4, CLOCK_PIN_4, GBR>(leds, 4 * NUM_LEDS_PER_STRIP, NUM_LEDS_PER_STRIP);
  FastLED.addLeds<APA102, DATA_PIN_5, CLOCK_PIN_5, GBR>(leds, 5 * NUM_LEDS_PER_STRIP, NUM_LEDS_PER_STRIP);
  FastLED.addLeds<APA102, DATA_PIN_6, CLOCK_PIN_6, GBR>(leds, 6 * NUM_LEDS_PER_STRIP, NUM_LEDS_PER_STRIP);
  FastLED.addLeds<APA102, DATA_PIN_7, CLOCK_PIN_7, GBR>(leds, 7 * NUM_LEDS_PER_STRIP, NUM_LEDS_PER_STRIP);

  delay(500);
  Serial.begin(115200);
  Serial.setTimeout(500);

  LEDS.setBrightness(255);
}

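void loop() {
  // When serial data is waiting, read one full frame straight into the LED buffer.
  // readBytes() returns early if the 500 ms timeout set in setup() expires.
  if (Serial.available() > 0) {
    Serial.readBytes((char*)leds, numOfBytes);
  }
  LEDS.show();
}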

I have read that the 3.1 branch supports parallel strip updates for the WS28xx family (which is awesome), but I wanted to use APA102s for their faster speeds.

  • Is the parallel feature possible (or will it be) for strips with a clock line?
  • Are there any other ways to optimize my code?

Thanks!

What update rate are you getting now with this code? (You can check with int fps = FastLED.getFPS(); and then print it.)

Wow, nice to know that method exists.

Getting 19-21 fps pretty consistently.

Also: what’s feeding the serial data, and at what baud rate?
You have 4,000 pixels, each takes three bytes. That’s 12,000 bytes of data per frame, which is a minimum of 96,000 bits of data. Even at 115,200 baud, that’s only about one frame per second of data. Right? So even if the receiving code is instantaneous, I think this setup is limited by the serial data transfer itself.

This is why the (awesome) FadeCandy board uses a high speed direct USB-layer protocol for data transfer, rather than serial emulation over USB. The FadeCandy protocol is much, much faster. Well into the megabit range.

The Teensy and FastLED can definitely drive those LEDs faster; this setup sounds like it’s limited by the data connection itself.

I’m also having a tough time getting data on how much of that time goes to reading the serial buffer and how much goes to updating the LEDs.

I notice that when I bump up the fps on the sending side on my PC, past a certain point the Teensy / FastLED seems to lag behind.

Then if I pause the video source on the PC, the Teensy spends some time “catching up” until it reaches the point where I paused.

Can you share your code (sender and receiver)?
I want to do the same, also with APA102.
Perhaps we can do a fork and add the hardware serial from FadeCandy to FastLED?

That would be awesome!

What you see above is receiver/teensy 3.1 code.

As for sending, it’s coming out of a program called TouchDesigner, which sends one long byte array of values ranging from 0-255.

The send code is python and specific to TD so not sure how helpful it will be to anyone else but this is all it is:

op('sc_A').sendBytes(*n0[0].vals)

n0 being a node that contains all the preformatted values

So you mean you also control this with TouchOSC? Or just with TouchDesigner on your PC?

@Mark_Kriegsman

Thanks for the input. I was afraid of that.
I didn’t realize the fadecandy had a different usb protocol.

To the best of my knowledge, the Teensy emulates serial but ignores the baud rate, right?

Also, I’ve been referencing this page’s data on what I “should be able to achieve”:

https://www.pjrc.com/teensy/benchmark_usb_serial_receive.html

I think what that page is saying is that 30,000 bytes can be received at 30 fps without doing anything else with the data.

Perhaps the issue here is that the data connection plus updating 8 strips of that length in a non-parallel fashion is dragging the frame rate down this far…

APA102 is more than ten times faster than WS2811/WS2812. Parallel probably isn’t the bottleneck here.

To put it another way, driving eight APA102 strips not-in-parallel is still faster than driving eight WS2811 strips in parallel. WS2811s are just that slow, and even in eight-way parallel they’re still slow.

It’s unfortunate that all these great features / protocols / elements are split between devices:

-APA102s are super fast on their own.

-FastLED has parallel updating, but for the WS28xx family only (which is awesome nonetheless!)

-FadeCandy has a USB protocol with muuuuch higher speeds, but it is limited to controlling 512 pixels and can’t do APA102s.

@Mark Kriegsman, do you know how many APA102s I can run on one data/clock line?
I think, depending on the rate, 1024 is not a problem. Perhaps 2048, or even more?

@Mark_Kriegsman Got ya, so it’s definitely the communication that I need to find a workaround for or a way of improving.

Teensy does have an Ethernet shield/library. I haven’t tried it, but perhaps it would be worth trying out.

@Jurgen_Skrotzky

TouchOSC just controls touch designer, doesn’t generate or send the data.

In TouchDesigner you can make a quick node network that turns an image or animation into 0-255 RGB values, then send them over serial or another protocol to the Teensy.

The only reason that so many different tools (OctoWS, FadeCandy, FastLED) do parallel output for WS2811 is because they’re so slow. Even in parallel, they’re still slow. Don’t get hung up on thinking about parallel as a speed fix. It’s a way of desperately trying to get even halfway decent speed out of this one kind of LED strip, which is outrageously slow.

As for the maximum number of pixels on one pin/strip, it’s all about frame rate.

WS2811-family has a maximum speed of about 30,000 pixels per second. So if you have 1,000 pixels in a single string, the absolute maximum refresh rate is 30/sec. If you only have 100 pixels in a string, you could refresh it up to 300x/second. (The WS2811 family also has a maximum of 400 updates/second, period, regardless of string length.)

The APA102 (and LPD8806) can easily handle 300,000 pixels per second per strip – or more. So on a 1,000-pixel strip, that’s 300 updates per second. On a 4,000-pixel setup, that’s still 75 updates per second. Dan and I have run some of these fast strips at over 1,000 FPS using ARM-based boards.

Now getting the data ready hundreds of times per second is another story. You can pipe it in from a PC or a network, but now you have more problems (as we’ve been discussing). Or you can generate the animation directly on the microcontroller, which is what a lot of FastLED’ers do. For piping in (video) data from a PC, there’s FadeCandy, and Pixel Pusher, and AllPixel, and others, as good alternatives as well.

@Mark_Kriegsman thx for the info. So I’ll go on using APA102; they are really awesome.
Doing the animation on the Teensy with FastLED is perfect! I like the simple API, the dithering, the color correction, and the global brightness!

But do you have any idea how to interact with it from, for example, a Raspberry Pi? I want to decide on or change the animation logic outside the Teensy, without reflashing the Teensy each time I want to change something in the effects.

Or is it possible to stream “effect logic” to FastLED, to be used inside the loop?
That would be great. I could modify what is shown on the Teensy (some delay is OK for me) without flashing it again.
So on the Raspberry Pi I create or modify the animation logic or effect >> send this effect logic to the Teensy >> FastLED interprets and uses the new code sequence :wink:

@Mark_Kriegsman @Daniel_Garcia is it possible to make a Raspberry Pi fork of FastLED to control APA102/WS2801 with Raspberry Pi’s SPI?
Or is the SPI too slow?

@Mark_Kriegsman
Thanks again for the info!
that’s some really helpful baseline info.

So with 8 strings/pins of 500 pixels apiece, 4,000 in total, I can still expect roughly 75 updates per second, assuming data transfer / generation is out of the picture?
Are there costs or benefits to more strips with fewer LEDs, or fewer strips with more LEDs? Or does it not matter?

I have the data generation taken care of with good fps; it seems it’s only the transmission of that data that’s giving me low fps.

Does anyone know if the FadeCandy can be used to transmit more than 512 pixels’ worth?