Hi, I posted videos and pictures of a clock I built a while
back for my son. Details here… https://plus.google.com/106626345342202981932/posts/SM62KcSTgon
Now he is asking me to add some new functionality and while
refreshing myself in the awful but still quite functional
spaghetti code I wrote, I am thrown off by the actual vs
expected behavior of the following rainbow function…
void rainbow (){
  for(int i = 419; i >= 0; i--) {
    int val = i;
    for (int led = 0; led < NUM_LEDS; led++) {
      val = val++ % 420;
      byte hue = map (val, 0, 419, 0, 255);
      leds[led] = CHSV(hue, 255, LED_MAX_BRIGHTNESS);
    }
    FastLED.show();
  }
}
For my clock, NUM_LEDS = 555; So I would expect
FastLED.show(); to take 555 X 30µsec = 16.65 msec.
And since I loop around 420 times (The number of LEDs in my
clock section), the whole rainbow function should complete
in more or less 420 X 16.65 msec ~~ 7 seconds. Of course
here I am ignoring the time to actually execute all the
other instructions, shouldn’t they be insignificant in
comparison !?
Well, actually, it takes the function about 31 seconds to
complete!! That is over 4 times more than I expected.
Where did I mess up in my maths?? Any ideas??
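The estimate in the post above can be written out as a short calculation. This is only a sketch of the arithmetic: the 30 µs per LED figure is the one quoted in the post, and the helper names are made up for illustration.

```cpp
#include <cstdint>

// Assumption taken from the post: each WS2811 LED takes about 30 µs to write.
constexpr uint32_t US_PER_LED = 30;

// Time for one FastLED.show() call, in microseconds (hypothetical helper).
constexpr uint32_t frameTimeUs(uint32_t numLeds) {
    return numLeds * US_PER_LED;
}

// Time spent only on writing LED data for a number of frames, in microseconds.
constexpr uint32_t totalWriteTimeUs(uint32_t numLeds, uint32_t frames) {
    return frameTimeUs(numLeds) * frames;
}
```

With 555 LEDs and 420 frames this gives 16,650 µs per frame and 6,993,000 µs in total, which is the "about 7 seconds" expected above. Anything beyond that has to come from the code that prepares each frame.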
Thanks for the shout-out in the video-- and I love the pendulum!
If you’re going to want a clock that really keeps “real time,” you’re ultimately going to need a “real time” clock chip. The internal clocks are way, way too inaccurate to use as a wall clock.
Also, I believe that “map” does long division internally-- and a single long division operation ( X / Y ) can take over 600 clock cycles (37us @ 16MHz). Doing 400 of them will take nearly 1/60th of a second. You should check my math; I still need more coffee.
Ultimately, I think you’ll want to add a realtime clock chip.
To put it a different way, you are spending roughly 132us per led (31000000/(555*420)) for one complete cycle. 30us of that is writing the led data, leaving ~100us per led. You have two divisions in there (the % 420 and the map), both of which are going to be pretty expensive. (Also, are you counting the 31s by stopwatch or by counting millis?)
Hi @Mark_Kriegsman , thanks for the hint about really long execution times for some instructions. I never expected it could be so bad. I can only trust you about the maths as I have no idea how many cycles instructions take !
Note that my clock already has a DS3231 RTC, the thing has been running great and keeping accurate time for almost 2 years now.
Just trying to add new animations and needed to determine the actual max FPS I can get out of that setup of 555 WS2811 LEDs and an Arduino MEGA. I do not want to upgrade the HW although I know I could do a lot better today !!
Please do slowly enjoy your coffee first but I would appreciate if you could point out some reference about instructions execution times !
Hi @Daniel_Garcia , I measured the 31 secs with a stopwatch.
Just did a quick simplification of that rainbow function. Reduced it to…
void rainbow (){
  for (int led = 0; led < NUM_LEDS; led++){
    leds[led] = CHSV(led, 255, LED_MAX_BRIGHTNESS);
    FastLED.show();
  }
}
That now takes about 10 seconds to complete confirming the heavy impact of the % and/or map instructions.
Still a bit surprised that I do not get even closer to the expected 7 seconds. I guess I need to assimilate info about instructions execution time to explain this.
Thanks !!
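One possible reading of the 10 seconds: the simplified function calls FastLED.show() once per LED, so it pushes 555 frames rather than 420. Assuming the same 30 µs per LED figure from the first post, a rough sketch of that arithmetic (the names here are illustrative, not from the original sketch):

```cpp
#include <cstdint>

constexpr uint32_t LED_COUNT = 555;   // NUM_LEDS in the post
constexpr uint32_t US_PER_LED = 30;   // assumed write time per LED

// show() sits inside the per-LED loop, so there are LED_COUNT frames,
// and each frame writes all LED_COUNT LEDs.
constexpr uint32_t simplifiedRainbowWriteUs() {
    return LED_COUNT * (LED_COUNT * US_PER_LED);
}
```

Under that assumption the data output alone comes to 9,240,750 µs, or about 9.2 seconds, which would account for most of the measured 10 seconds.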
The absolute max frame rate you will ever see with 555 WS2811 LEDs is 60 (~16ms per frame, without parallel output). Realistically, you are capped at 30 (assuming you don’t want to spend 100% of your time writing LED data). Right now you’re getting about 14fps (which means you’re spending roughly 75% of your time prepping frames vs writing frames).
As far as references go, Atmel publishes a data sheet of opcodes and their timings for individual instructions. However, things like division aren’t instructions: the AVR chipset has no divide instruction. Also, all of AVR’s math instructions are only 8-bit. So map has 5 parameters promoted to a long (20 bytes of data moved to call the function, which is 40 clock cycles). It then has three 32-bit subtractions (4 cycles each, not including the 8-16 cycles for moving 4-byte values in and out of memory/registers) and one 32-bit addition (again, 4 cycles plus up to another 8-16 cycles for memory). We’re already at 100 cycles and haven’t even touched the 32-bit multiply that will happen, which will generate a 64-bit temporary, which will then be divided by a 32-bit number, which is going to be extra expensive. (Finally, I’m assuming there are interrupts firing for the RTC and Bluetooth modules?)
Mark and I have looked at the disassembly of the multiply and divide functions and hand counted cycles, and have also done some performance counting runs with them. (I should do that for map at some point to show how expensive it can be.)
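For reference, Arduino's map() is a one-line formula evaluated on long values. The sketch below restates that formula with explicit 32-bit types so the subtractions, the multiply and the divide counted above are easy to see. mapLong is a name chosen here for illustration, not a library function.

```cpp
#include <cstdint>

// Same formula as Arduino's map(), with explicit 32-bit integer types.
// Three subtractions, one multiply, one divide and one addition, all 32-bit.
int32_t mapLong(int32_t x, int32_t inMin, int32_t inMax,
                int32_t outMin, int32_t outMax) {
    return (x - inMin) * (outMax - outMin) / (inMax - inMin) + outMin;
}
```

For the rainbow function, mapLong(val, 0, 419, 0, 255) reduces to val * 255 / 419, so every LED in every frame pays for a 32-bit multiply and a 32-bit divide done in software.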
Hi @Mark_Kriegsman , kinda what I did there myself…
The results just shocked the crap out of me and I just could not explain it.
Thanks a lot guys for your support… I now have extra respect for your ability to get more and more Fast into FastLED !!
Also your val = val++ % 420 line is wasted. Val++ means “take the value of Val before incrementing it” - so you are effectively doing “val = val % 420;” for every led for every frame. If you are concerned about performance you should avoid doing % for any numbers that are not powers of 2. A better way to do that is “val++; if(val >= 420) val= 0;” that will be a 2 cycle addition (16 bit math on an 8 bit system), 2 cycles of comparison math, 1 cycle of status code checking and then either a 2 cycle jump or a 2 cycle 16-bit assignment of 0 to val. 7 cycles vs the potentially few dozen to few hundred of divide.
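The increment-and-compare suggested above can be sketched as a small helper. The names WRAP and nextWrapped are assumptions for illustration; 420 is the number of LEDs in the clock section from the original post.

```cpp
#include <cstdint>

constexpr uint16_t WRAP = 420;  // LEDs in the clock section

// Advance the counter with a compare-and-reset instead of a modulo,
// so no software division is needed.
inline uint16_t nextWrapped(uint16_t val) {
    val++;
    if (val >= WRAP) {
        val = 0;
    }
    return val;
}
```

For any val in the range 0 to 419 this returns the same result as (val + 1) % 420, so it can replace the modulo line as long as val never starts outside that range.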
Wow @Daniel_Garcia thanks for that hint and that previous post with very detailed timing info.
Thanks again, we, HW guys need to be hand held at times…
I do stuff like that all the time. It’s part of why I’ve gotten in the habit of measuring code multiple times and re-re-counting the cycles/expected cycles. When rapidly making a lot of changes to try to figure out where a bottleneck is, it can be very easy to miss the impact/scope of a change. (I’ve just gotten good at doing that triple checking before posting/uploading code, so people don’t often see that part.)
Hi @Daniel_Garcia and @Mark_Kriegsman,
Following your very detailed comments, I started playing with my rainbow function to improve its efficiency.
1st) I found that, as you explained, I could easily remove the…
val = val++ % 420;
by moving the val++ into the mapping function like this…
byte hue = map (val++, 0, 419, 0, 255);
I discovered that the mapping function already deals nicely with out of range input data.
But the gain was not substantial as my function took 27 seconds instead of the 31 seconds I recorded before.
I concluded the mapping function was the worst time consumer in my function.
2nd) Tried to find a way to replace the mapping function without using any long divisions or multiplications. Here’s what… after a fair amount of crash and burns… I very proudly came up with…
val++;
byte hue = (val>>1) + (val>>4) + (val>>5) + (val>>6) + (val>>8);
That function now takes about 11 seconds to execute ! Basically I reduced my data prepping time from ~ 24 seconds to only ~ 4 seconds !!
Gimme a dozen or so more years and I’ll come help you guys to trim more fat off FastLED !!!
Seriously, thank you very much for the kind of support you guys are so graciously offering. My support issue had obviously nothing to do with FastLED and everything to do with my limited programming skills and poor grasp of the time required to execute some of the Arduino code.
Your library rocks, you guys rock. Thanks to you I think I’m evolving from a mediocre programmer to a somewhat acceptable one…
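The shift-and-add line in the post above works because 1/2 + 1/16 + 1/32 + 1/64 + 1/256 is about 0.613, close to 255/419 (about 0.609). A minimal sketch that compares it against the exact integer formula over the 0 to 419 range follows; the function names are hypothetical and val is assumed to stay within that range.

```cpp
#include <cstdint>
#include <cstdlib>

// Shift-and-add approximation of val * 255 / 419, as in the post.
inline uint8_t hueShiftSum(uint16_t val) {
    return static_cast<uint8_t>((val >> 1) + (val >> 4) + (val >> 5) +
                                (val >> 6) + (val >> 8));
}

// Largest absolute difference from the exact formula for val in 0..419.
inline int maxShiftSumError() {
    int worst = 0;
    for (uint16_t val = 0; val <= 419; val++) {
        int exact = static_cast<int>(static_cast<uint32_t>(val) * 255u / 419u);
        int err = std::abs(static_cast<int>(hueShiftSum(val)) - exact);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Each shift truncates, so the result can be off by a few hue steps from what map() returns, which is invisible in a rainbow but worth knowing if the value is reused elsewhere.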
Hi @Mark_Kriegsman , I am not worthy… I am not worthy…
Quick tests showed both your suggestions cut ~ 1 second off my times… they appear to be identical but I will follow the master and stick them in a long loop so that I can pick which is better…
place your bets… back in a few minutes…
Ok looped everything 30 times so that execution would take around 5 minutes… (The limit of my patience)
With my method, I clocked it at 5m46s
hue = (val * 156) >> 8; clocked at 4m55s
hue = scale16by8( val, 156); clocked at 4m58s
Thanks, It was a fun exercise. Now I need to figure out exactly why… back in a few years…
Thanks Mark !!!
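The multiply-and-shift variant timed above relies on 156/256 (about 0.609) being close to 255/419. A small sketch of it with unsigned 16-bit math is shown below; hueMulShift is a name made up for illustration, and the sketch assumes val stays at or below 420 so the product still fits in 16 bits on an AVR.

```cpp
#include <cstdint>

// Approximates val * 255 / 419 as (val * 156) >> 8.
// For val <= 420 the product is at most 65520, which fits in 16 bits.
inline uint8_t hueMulShift(uint16_t val) {
    uint16_t product = static_cast<uint16_t>(val * 156u);
    return static_cast<uint8_t>(product >> 8);
}
```

This is one 16-bit multiply and one shift instead of five shifts and four additions. Whether that fully explains the measured difference is not something this sketch can show; the timings above are the evidence.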
To make an accurate clock, check out this instructable:
Depending on your timings of setting LEDs, you can adjust the PWM signal of the interrupt generator to a fitting value. Setting up some counters could free you of % and divisions.
@Georg_Moritz , Sorry, that suggests using the Arduino clock and I know that drifts a lot more in frequency than any RTC (Real Time Clock) module.
The module I have already in there, the DS3231 is very precise and also has temperature compensated frequency control.
Additionally, it provides a battery backup to maintain real time with power failures… no Arduino does that !!