I have a question about the efficiency of parallel output using WS2811_PORTA, B, C, etc…

Clearly, using these special parallel port definitions will be faster than writing to 8 separate pins, but what about for, say, 2 pins?

I can just call FastLED.addLeds twice using the same pixel array and different pins and get parallel output at some performance cost (and minimal memory cost). Is WS2811_PORT* faster in every case?

Thanks in advance.

If you are calling addLeds twice you won’t get parallel output. You’ll get output on one pin followed by output on the other pin. FastLED only parallelizes the output when you use the PORT* arrangement or the OctoWS2811 driver.
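To put rough numbers on that difference, here is a back-of-the-envelope sketch. It assumes WS2812-class LEDs at the usual 800 kHz data rate (1.25 µs per bit, 24 bits per pixel) and ignores the reset/latch time and any CPU overhead. The function names are made up for illustration and are not part of FastLED.

```cpp
// Assumption: 800 kHz WS2812-style signalling, 24 bits per pixel.
// Reset/latch time and per-frame CPU overhead are deliberately ignored.
constexpr double kMicrosPerBit = 1.25;
constexpr int kBitsPerPixel = 24;

// Time to clock out a single strip of `pixels` LEDs, in microseconds.
constexpr double stripMicros(int pixels) {
    return pixels * kBitsPerPixel * kMicrosPerBit;
}

// Separate addLeds calls: the strips are written one after another,
// so the frame time grows with the number of pins.
constexpr double sequentialMicros(int pixelsPerStrip, int pins) {
    return stripMicros(pixelsPerStrip) * pins;
}

// PORT*-style output: all lanes are clocked out together, so the frame
// takes about as long as one strip regardless of the pin count.
constexpr double parallelMicros(int pixelsPerStrip, int /*pins*/) {
    return stripMicros(pixelsPerStrip);
}
```

Under these assumptions, two 300-pixel strips take about 18 ms per frame with two separate addLeds calls and about 9 ms with parallel output, so even at 2 pins the parallel path should come out ahead.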

OK, got it: calling addLeds twice isn’t what you would call parallel output. But it does output the same data to both pins as long as you pass the same LED array, so the net result is similar.

The question remains – is parallel output with PORT* faster than writing pins separately when the number of pins is small (like 2)?

I’m actually about to do a qualitative side-by-side test so I’ll post my result soon.

Okay… I think I was missing something important about the meaning of “parallel output” as the term is used here. Because of the word “parallel”, I was assuming that all strips would show the exact same color data. Looking at the example, it’s apparent that the total number of pixels equals the number of pins multiplied by the number of LEDs per strip. In this case, “parallel” refers to how the data is written: all pins are written at the same time, so a frame takes about as long as writing a single strip. (I don’t understand how this can be truly parallel on a microcontroller that doesn’t support threading, but I’ll take your word for it that it’s effectively parallel.)
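As I read the example, the array is laid out one strip after another, so each pin (lane) gets its own consecutive block of pixels. A minimal sketch of that indexing, assuming this lane-major layout; `laneIndex` is a hypothetical helper name, not a FastLED function:

```cpp
#include <cstddef>

// Assumed layout: lane 0 occupies indices [0, ledsPerStrip),
// lane 1 occupies [ledsPerStrip, 2 * ledsPerStrip), and so on.
constexpr std::size_t laneIndex(std::size_t lane,
                                std::size_t pixel,
                                std::size_t ledsPerStrip) {
    return lane * ledsPerStrip + pixel;
}

// Total array length needed for a given number of lanes.
constexpr std::size_t totalLeds(std::size_t lanes, std::size_t ledsPerStrip) {
    return lanes * ledsPerStrip;
}
```

So with 300 LEDs per strip, pixel 0 of the second strip would live at index 300, and two strips need an array of 600 pixels.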

This is amazing and way cooler than my interpretation of the concept, but I must say that it’s in no way intuitive from the description of Parallel-Output in the wiki.

For me, parallel output has meant connecting the data input of multiple strips (or module strings) to a single PWM output pin on the Arduino – the strips are all wired in parallel, so “parallel” seemed to be an appropriate word here. I’ve had this work fine with up to 6 strips, but depending on the LEDs, problems can arise in the form of flickering and bad colors, especially with greater numbers of strips.

To fix the flickering issue I usually add additional output pins. In one case I connected 3 strips to each of 6 pins for 18 strips total.

The important thing to emphasize here is that my goal was to make every strip do the exact same thing. Topology was making it impractical to connect all the pixels in series, plus that would be wasteful from a memory usage standpoint. So, I call addLeds for each pin but use the same array so that memory is conserved. The downside is that each additional pin used decreases the framerate.

So, back to Parallel Output (your parallel data writing, not parallel wiring) – I’m wondering how this could be used to speed up my framerates. It seems like I can just follow the example and copy data N times in the array, but this would unnecessarily waste memory. There would also be the copying process in each cycle. Is there syntax to specify “use the same array for all pins”?

Finally (sorry for the long post), in case others find themselves similarly confused by the word “parallel”, maybe a brief description of what is meant by parallel output and a differentiation from parallel wiring would be helpful at https://github.com/FastLED/FastLED/wiki/Parallel-Output

Thank you so much, and as always keep up the great work!

I tried a successful experiment. Here were the conditions:

Arduino Due controller
6 WS2812B strips, 5 meters, 300 pixels each
3 strips connected to pin 25 and
3 strips connected to pin 26

addLeds called as:
LEDS.addLeds<WS2811_PORTD, 2>(leds, 300);
The leds array is defined as 600 pixels in length.
I added a function that copies the first half of the array to the second half (called each cycle).
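For anyone wanting to try the same thing, the copy step can be sketched roughly as below. This is not the exact function from the test; `Pixel` is a stand-in for FastLED's CRGB so the snippet compiles on its own, and `mirrorFirstLane` is a made-up name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for FastLED's CRGB (3 bytes per pixel). In a real sketch,
// include <FastLED.h> and use CRGB instead.
struct Pixel {
    uint8_t r, g, b;
};

// Copy lane 0 into every other lane so all strips show the same data.
// `leds` must hold at least lanes * ledsPerStrip pixels.
inline void mirrorFirstLane(Pixel* leds,
                            std::size_t ledsPerStrip,
                            std::size_t lanes) {
    for (std::size_t lane = 1; lane < lanes; ++lane) {
        std::memcpy(leds + lane * ledsPerStrip,  // start of this lane
                    leds,                        // lane 0 is the source
                    ledsPerStrip * sizeof(Pixel));
    }
}
```

In a sketch this would be called once per frame, after drawing into the first 300 pixels and before FastLED.show(). For 2 lanes of 300 pixels the extra cost is 900 bytes of RAM plus one 900-byte copy per frame.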

The result:

All 6 strips show the same data. Visually, I can see a significant speed increase vs. non-parallel output, probably close to twice the framerate.

So YAY! Except the array is twice as big as it was before, and I have the extra function to duplicate the data.

Would be great to avoid this but for now I’m happy.

@Todd_Moyer for your use case you might want to just add some buffering circuitry to get around your parallel wiring issue. Read about fan-out in digital logic: https://en.m.wikipedia.org/wiki/Fan-out