I just discovered FastLED’s mod8 (and also addmod8). Besides FastLED’s mod8 being designed to operate only on unsigned 8-bit numbers, are there any other differences between mod8 and the C++ modulus operator %? I’m guessing mod8 is faster than %?

mod8 is almost always much faster than %, and takes up less code space, too. How much faster?

Well, a standard 16MHz AVR Arduino can do about 70,000 "%"s per second.

The same 16MHz AVR Arduino can easily do over a million "mod8"s per second. If you’re using it for the common case of having an incrementing counter wrap around to stay in a certain range, like
counter++;
counter = mod8( counter, 10); // stay 0…9
then mod8 can do this about 5,000,000 times per second.

The reason that “%” is so relatively slow is because it performs 16-bit division, for which there is no hardware instruction on AVR. Instead, it calls a library routine, which typically takes 200-250 clock cycles. In the incrementing counter example above (which is the case for which mod8 was originally written), mod8 takes an average of just 3-4 clock cycles.
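For intuition, here’s a minimal sketch of that “wrap by subtraction” idea (an illustration of why the wrapping-counter case is so cheap, not FastLED’s actual mod8 source):

uint8_t mod8_sketch(uint8_t a, uint8_t m)
{
    // For a counter that only ever steps one past the range,
    // this loop runs at most once: one compare, one subtract.
    while (a >= m) {
        a -= m;
    }
    return a;
}

// counter = mod8_sketch(counter + 1, 10); // stays 0…9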

If you only need 8-bit values, and you need to process a lot of them (e.g. once per pixel per frame), mod8 is probably your friend.

Super. Thank you @Mark_Kriegsman. Will definitely use mod8 when appropriate from now on.

You might be surprised (or not, at this point!) how much of FastLED’s fast math support is there simply to avoid the cost of doing “long division” either explicitly or implicitly (e.g., in “mod”, or in Arduino’s “random” function, etc.).

I don’t mind if the code has to do some of these (slower) operations once or twice per frame, but any processing that’s going to be done on every pixel, every frame … that needs to be fast.

I"ll be updating things as well. If there’s one thing I like it’s Fast 8/16 bit math.

@Mark_Kriegsman Reading this, I was wondering why you use the normal modulo in the “hsv2rgb_raw_C” function. Or is the difference not so big in the C version (since there is a special AVR hsv2rgb function)?

Partly because there’s a special AVR version which takes care of the 8-bit case, and partly because some code is older than others :slight_smile:

I’ll take a look and see if I can do a quick cleanup there or not. Thanks!

Nope, wrong reason. I just looked at the code and hsv2rgb_C isn’t just doing an ordinary “%”, it’s doing a special case of “%”. It’s doing modulo a constant which just happens to be a power of two. And when the compiler sees “x % 64”, it ‘knows’ that there’s a super efficient way to represent that: a bitwise AND. So the machine code that the compiler emits in this case is not a call to the library division/modulo routine. Instead it just emits a single instruction: AND-immediate with the constant “63”.
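To make that concrete, here’s a small example (the function names are just made up for illustration; with an unsigned operand, both bodies compile to the same single AND instruction, with no library call):

uint8_t wrap64_mod(uint8_t x) { return x % 64;   } // compiler emits: AND with 63
uint8_t wrap64_and(uint8_t x) { return x & 0x3F; } // same machine code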

So, I’m relying on something I know about the compiler’s real-world behavior: it has a special case for generating code to perform modulo-a-power-of-two.

It’s similar to how if you have an unsigned int, and “divide by two”, the compiler just turns that into a one-bit shift right, instead of calling the library long division routine.

Clear. However, why do you specifically do “&0x01” in the XY serpentine function, given that %2 should be recognized as a bitwise AND as well (so it would do the same)?

I see that you do use bitmath (>>16) now and then. But division by a power of two would then be the same? I ask because I was replacing a /32 with >>5, thinking this would matter for speed.

If the compiler is doing its job well, the generated code should be identical in each of those cases. And then it just comes down to a matter of which way the programmer feels is most clear or expressive of what they’re trying to do. In other words, at some point it’s just personal preference in the source code, since the generated machine code will be the same either way.
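For example, assuming unsigned values (signed values need extra handling for negatives, so the shortcut isn’t exactly the same) and a reasonably optimizing compiler, each of these pairs should produce identical code; the names are just illustrative:

uint8_t  odd_mod(uint8_t i)     { return i % 2;    } // both become a bitwise AND
uint8_t  odd_and(uint8_t i)     { return i & 0x01; }

uint16_t bucket_div(uint16_t v) { return v / 32;   } // both become a right shift
uint16_t bucket_shr(uint16_t v) { return v >> 5;   }

So pick whichever spelling reads best to you.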