Trying to track down a pernicious bug that appears to make no sense.

Given two CRGBArrays that are (currently) 120 LEDs long each, named “leds” and “actual”, the code applies changes to leds, remaps leds[] to actual[], and then displays actual[].

In an effort to reduce memory consumption, I attempted to use actual[] as a scratch space instead, remapping one 12-LED segment of leds[] to actual[] with the shuffling, then moving actual[] back to leds[] in order over the original segment of leds[] that was shuffled into actual[].

When the copy back from actual[] to leds[] occurs, BOTH arrays end up with corrupted data. This is despite the fact that actual[] is not written to in that extra step; its data is still corrupted when actual[], rather than leds[], is sent out to the physical LEDs. If the copy back is not present, actual[] has the expected data in the first 12 addresses, and of course leds[] has the pre-remapped data. If actual[0] through actual[11] get assigned back to the leds[] section that they originally came from (in a different order) using a basic leds[i] = actual[locali] (where i is a running count of the leds[] position and locali resets to run 0-11 each segment), both actual[] and the segment it was copied to show unexpected behavior.

When “corrupted”, the behavior gets very odd:
A pattern that should run from 0-11 and then repeat, using a beat() at 10 BPM (so relatively slow), with a fade(200) running against the whole set 100 times a second, outside and separate from the remapping, ends up with the initially-set LED apparently accurate, then skips one LED, applies a partial fade to the third, skips, applies a fade to the fifth, skips, applies a fade to the seventh, and then everything else is out. The 100 fade() calls per second don’t appear to be taking effect at all.

Spitting out diagnostic information over serial, all of the values going into the array positions are correct.

The only thing that anybody has been able to figure is that multiple CRGBArray arrays may not be fully copy/assignment safe back and forth. Is this the case? Failing at that futile grasp for answers, does anybody have any other insights?

Hi +Kit Parenteau

Can you put your code in a gist or pastebin and link here please? I got lost after reading actual twice.

@Stuart_Taylor The code (and all its debugging comments and Serial prints) is at the DebuggingBranch branch here: https://github.com/Charixfox/HD120-Controller/tree/DebuggingBranch Not short, mind you.

@Kit_Parenteau after a quick look at your remap function:
You start with i = 0 and then do i = i - LedsPerFan;
This number is likely to be negative, no?
In your debug output you display LedsPerFan and not i - LedsPerFan.
actual and leds are declared next to each other; maybe that’s why you corrupt both arrays.

@Yves_BAZIN The subtraction occurs once, after the first for loop has incremented the value of i by the amount of LedsPerFan. So it iterates through the fan doing the shuffle from leds to actual for LedsPerFan worth of i, then resets i to the same value it had at the start of the outside loop. This would allow it to, for example, given LedsPerFan = 4, iterate 0-3, end with i = 4 and exit the first inner loop, then subtract four from the counter and go back to zero to handle the same range on the counter, moving things back from actual to leds.
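
A minimal sketch of the loop shape described above, written against a stand-in Pixel struct rather than FastLED's CRGB so that it compiles on a desktop. The names remapFans and kMaxLedsPerFan, and the rotate-by-shift mapping, are assumptions for illustration and not the code in the repository. It keeps a separate base offset per fan, so the counter never has to be wound back between the two inner loops:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a 3-byte RGB pixel (assumption: this is not FastLED's CRGB).
struct Pixel {
    unsigned char r, g, b;
};

const std::size_t kMaxLedsPerFan = 16;  // hypothetical upper bound

// For each fan: shuffle its segment of leds[] into a small scratch buffer,
// then copy the scratch buffer back over the same segment in order.
// The shuffle here is a rotation by `shift`; the real mapping may differ.
void remapFans(Pixel* leds, std::size_t numLeds,
               std::size_t ledsPerFan, std::size_t shift) {
    assert(ledsPerFan > 0 && ledsPerFan <= kMaxLedsPerFan);
    Pixel scratch[kMaxLedsPerFan];

    for (std::size_t base = 0; base + ledsPerFan <= numLeds; base += ledsPerFan) {
        // Pass 1: leds -> scratch, reordered.
        for (std::size_t local = 0; local < ledsPerFan; ++local) {
            scratch[local] = leds[base + (local + shift) % ledsPerFan];
        }
        // Pass 2: scratch -> leds, in order, over the same segment.
        for (std::size_t local = 0; local < ledsPerFan; ++local) {
            leds[base + local] = scratch[local];
        }
    }
}
```

With plain arrays like this, the second pass cannot disturb scratch, which is why the behavior in the real sketch points at something specific to how the FastLED containers are being assigned.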

Declaring them adjacent shouldn’t be the issue unless their internal handling in the library has an issue or the IDE compiler has an oddity.

I’ll try padding them with something else between to see, but that doesn’t inspire confidence.

Hi +Kit Parenteral what’s your background? Do you code for a living?
Also, where did this code come from, 100% you?

@Stuart_Taylor Parentearal? That got well-mangled. XD

“Knows enough to be dangerous”, “Knows some things but not others”, “Doesn’t dig into libraries”, and “Has to look up stuff”. So no, I do not code for a living. If I did, I’d be making a pretty poor living. Or worse software.

The majority of the code is mine directly. The remap premise was originally provided by a helpful user here, changed to the pre-2.1.3_FT function seen on the git repository, and then changed further with some code inclusions from Bud that added the fantype header file and the preprocessor directives.

I know enough to know that compilers and code that looks like it should work a certain way may not because computers don’t necessarily do exactly what you expect them to do. But that doesn’t mean I know what the heck they are going to end up doing when they do something unexpected, or why, or how to correct it.

I know that pointers are a pain but they’re the core of machine code. I know that types and typecasting can bite you in the tail. I know that referencing, dereferencing, and other such fun things can blow up in your face, and sometimes the problems get hidden behind something as simple as “theLeds” vs “TheLeds” when both are defined.

I know that without the ability to freeze the process and dump RAM, my job debugging becomes much more complex, as I can only view the information that is passed out of the AVR, such as the debug prints and the LED control. The LED display information gives me a general idea of what is in the arrays, but not why, or what specifically it is, or why it’s behaving oddly.

So, in general, when I hit:
scratch[0-11] gets set to array[2,3,4,5,6,7,8,9,10,11,0,1] -> This works properly.
array[0-11] gets set to scratch[0-11] -> Now suddenly array[0-11] has weird, unexpected data in it, and scratch[0-11] has unexpected data in it too, despite the fact that nothing in the process looked like it assigned anything TO scratch[]; it only tried to assign scratch[] values to array[] by iterating through:
array[0] = scratch[0];
array[1] = scratch[1];

array[11] = scratch[11];

… I end up flummoxed on why scratch[] has odd data despite not being on the left side of the assignment.

So I implement debug serial printing to see if variables used in the loop are getting wonky and see absolutely nothing that looks off in that. I go and poke some variable types to see if int8 vs int is causing an issue and have no luck. I check with my associates who do code for a living. And when they are flummoxed too, I ask for help here because of the FastLED focus and potential insight into what would cause a loop over
array[] = scratch[];
stuff to end up modifying scratch data.
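
One way to test that suspicion in isolation is to snapshot the source before the copy back and compare afterwards. The sketch below is a desktop C++ stand-in (Pixel and copyBackLeavesSourceIntact are hypothetical names, not FastLED types or functions). With ordinary memory, an element-by-element copy can only change its source if the two ranges overlap, so a failed check would suggest the two "arrays" are not as separate as they look:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for a 3-byte RGB pixel (assumption: this is not FastLED's CRGB).
struct Pixel {
    unsigned char r, g, b;
};

// Copies `count` pixels from scratch to dest one element at a time and
// reports whether scratch is byte-for-byte unchanged afterwards.
bool copyBackLeavesSourceIntact(Pixel* dest, const Pixel* scratch,
                                std::size_t count) {
    assert(count <= 12);  // sized for one 12-LED segment
    Pixel before[12];
    std::memcpy(before, scratch, count * sizeof(Pixel));

    for (std::size_t i = 0; i < count; ++i) {
        dest[i] = scratch[i];
    }
    return std::memcmp(before, scratch, count * sizeof(Pixel)) == 0;
}
```

Called with two separate buffers this returns true; called with ranges that overlap in memory it returns false.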

@Kit_Parenteau yes that did get mangled 0.o

I asked about your background because it gives me a better understanding of how you have approached this problem. For example, if you were a UX developer you may approach the problem a little differently than if you were a C# developer.

I had a quick scan of the code.

So this is to control some LEDs that are inside PC case fans. And you want to use serial control. I get that. I’m just not sure about the stage/live aspects. What problem are you solving here?

Why do you use pragma FASTLED_INTERNAL? And why the NAMESPACE?

@Stuart_Taylor I’ve had to dabble and learn things for far too many languages. Never enough to make a living off it, but enough to get everything done in general, and try to learn what blew up so I can fix it. Unfortunately, by “raw logic” everything should work, so I’ve entered the realm of oddities which is far beyond my expertise.

Touching on specific items…
Serial control is being phased out in the near future in favor of HID-based control. Serial control just makes it easier to send data by hand.
The stage/live aspects allow the settings to be set to a stage location and thereafter moved to live. On stage, they are present but not active. In live, they are active. This allows two things:
1: Commands over serial can be broken into chunks onto stage and not have partial settings applied.
2: The specialty sync function allowed timed transfers from stage to live, to permit music synchronization hand-coded from the computer for a demonstration video.

The FASTLED_INTERNAL is to prevent the extra information from the library on compile, making it easier to quickly spot build errors in the results based on orange text.

The namespace is because it’s in the FastLED library examples.

As for the problem, it’s relatively focused:
If I take the code and re-enable lines 901-911 instead of 851-900, everything works just fine. Literally JUST the code on 885-887 (And closing brackets thereafter, skipping the serial.print code) breaks it weirdly. Despite only having
leds[i] = actual[thisLed];
going on, the contents of actual[] get modified, apparently prior to or as assigning the values to leds[], since leds[] gets the broken data as well.

Tons of testing later and I’ve determined that the problem occurs if any CRGBArray to CRGBArray assignment occurs outside the first inner loop. I took out the second loop entirely and used CRGBArray built-in iteration and it still ended up with corrupted data.

It doesn’t require a second CRGBArray to be involved either. I added scratch space to the end of the leds[] array and it still corrupted on the second move.

Memory consumption is fine with about 700 bytes free.

Moving data in a chunk before the inner loop and then using the inner loop to rearrange it also fails.

Hard-coding the constant numbers in the function, to take the entirety of FanTypes.h out of the function, still breaks.

I’m going to try to finagle a way to shift things without the loop and condition check at all.

Well, huh. Part of the issue found.

Given a CRGBArray<> leds; (Which is a CRGBSet with a pixel array), doing:

leds(12,23) = leds(0,11);

doesn’t copy the content of 0-11 into 12-23. It POINTS the contents of 12-23 at 0-11. This means that changes to 0-11 are immediately reflected in 12-23. So my code was basically creating a loopy creepy pointer tangle. I was able to reproduce the issue easily with a handful of lines of code in a fresh sketch.
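
To make the distinction concrete, here is a desktop C++ sketch using a hypothetical PixelView stand-in (a pointer plus a length). It is not FastLED's CPixelView and makes no claim about how the library implements operator=; it only shows how rebinding a view differs from copying elements, and why rebinding would produce exactly the "changes show up in both places" symptom:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for a 3-byte RGB pixel (assumption: this is not FastLED's CRGB).
struct Pixel {
    unsigned char r, g, b;
};

// Hypothetical stand-in for a view over part of a pixel array.
struct PixelView {
    Pixel* data;
    std::size_t len;
};

// Rebinding: the destination view now refers to the source's memory.
// No pixel is copied; both views alias the same pixels afterwards.
void rebind(PixelView& dst, const PixelView& src) {
    dst.data = src.data;
    dst.len = src.len;
}

// Copying: pixel values are transferred and each view keeps its own storage.
// memmove is used so that overlapping ranges are also handled correctly.
void copyPixels(PixelView dst, PixelView src) {
    assert(dst.len == src.len);
    std::memmove(dst.data, src.data, src.len * sizeof(Pixel));
}
```

In an actual sketch, an explicit memmove(&leds[12], &leds[0], 12 * sizeof(CRGB)) is one way to force a value copy. Treat that as something to test rather than a confirmed fix.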

The behavior is weird and I’m going to continue poking this to try to make some sense of it.