So I’ve been playing with the PRU to get LinuxCNC running on a 'Bone with the PRU doing step/dir, and along the way I measured GPIO timing when accessed from the PRU. Direct PRU outputs via R30 are super-fast, but writing the standard GPIO registers is pretty quick too. The good news is there’s a decently sized posted-write buffer that keeps the PRU from stalling until you get to 20 or so back-to-back writes. Details below.
PRU GPIO Write Timing Details
The actual write instruction to a GPIO pin using SBBO takes two PRU cycles (10 ns). However, the GPIO logic can only update every 40 ns (8 PRU cycles). This means back-to-back writes to GPIO pins will eventually stall the PRU. Put another way, when burst-writing to the GPIO you can execute 6 single-cycle PRU instructions for ‘free’ between writes.
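A quick sanity check of that budget. This sketch assumes the 200 MHz PRU clock implied by "two cycles = 10 ns", and it assumes the filler instructions are single-cycle. The function and constant names are only illustrative.

```python
# Budget arithmetic for sustained burst writes to GPIO from the PRU.
# Assumption: 200 MHz PRU clock (5 ns/cycle), implied by "2 cycles = 10 ns".
CYCLE_NS = 5

SBBO_CYCLES = 2        # cost of the SBBO write to a GPIO register (measured)
GPIO_UPDATE_NS = 40    # GPIO logic can only update this often (measured)


def update_cycles(update_ns: int = GPIO_UPDATE_NS) -> int:
    """GPIO update interval expressed in PRU cycles."""
    return update_ns // CYCLE_NS


def free_instructions(sbbo_cycles: int = SBBO_CYCLES) -> int:
    """Instructions that fit between SBBOs without adding a stall once
    writes are limited by the GPIO (assumes each filler is 1 cycle)."""
    return update_cycles() - sbbo_cycles


def max_toggle_hz(update_ns: int = GPIO_UPDATE_NS) -> float:
    """Fastest square wave if every GPIO update flips the pin:
    one full period takes two updates."""
    return 1e9 / (2 * update_ns)


print(update_cycles(), free_instructions(), max_toggle_hz())  # 8 6 12500000.0
```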
When the PRU writes to GPIO1 and L4_PERPort1 is idle, the latency from the PRU write to the I/O pin actually changing state measures either 95 ns or 105 ns (apparently depending on clock synchronization). These figures are normalized so that the PRU direct output pins have zero latency.
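Under the same 200 MHz assumption, those latencies come to 19 and 21 PRU cycles.

One hypothetical use is lining up a direct R30 output with a GPIO output. The R30 write would have to trail the SBBO by roughly that many cycles. The 2-cycle spread would remain as jitter that cannot be tuned out. Exactly where the delay is counted from (the start or the end of the SBBO) is an assumption here.

```python
# Express the measured GPIO1 write latency in PRU cycles.
# Assumption: 200 MHz PRU clock (5 ns/cycle).
PRU_CYCLE_NS = 5

# Measured latency relative to a direct R30 output (from the text).
GPIO_LATENCY_NS = (95, 105)


def latency_cycles(latency_ns: int) -> int:
    """Convert a latency in ns to whole PRU cycles."""
    if latency_ns % PRU_CYCLE_NS:
        raise ValueError("latency is not a whole number of PRU cycles")
    return latency_ns // PRU_CYCLE_NS


def r30_delay_window():
    """Hypothetical helper: how many cycles an R30 write would need to
    trail the GPIO write for both pins to change at about the same time.
    Returns (earliest, latest, unavoidable jitter) in PRU cycles."""
    lo, hi = (latency_cycles(ns) for ns in GPIO_LATENCY_NS)
    return lo, hi, hi - lo


print(r30_delay_window())  # (19, 21, 2)
```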
PRU GPIO Posted Writes
When L4_PERPort1 is idle, it is possible to burst-write multiple values to the GPIO pins without stalling the PRU, because the writes are posted. I used an unrolled loop in which each iteration is an SBBO to GPIO followed by a single SET/CLR to R30. The first 20 write cycles (both instructions) took 15 ns, or 3 PRU cycles, each. At that point the PRU began to stall, and the write cycle settled into the 40 ns minimum GPIO update interval.
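One way to reason about this behaviour is a toy model of a FIFO posted-write buffer. In this model the PRU issues a write every 15 ns, and the GPIO drains one write every 40 ns. The PRU stalls only when the buffer is full.

The buffer depth is a free, hypothetical parameter, not something that was measured. The model also ignores the 95 to 105 ns pin latency and anything else the L4 interconnect does.

```python
from typing import List, Optional


def burst_issue_times(n_writes: int, depth: int,
                      issue_ns: int = 15, drain_ns: int = 40) -> List[int]:
    """Toy model of a posted-write buffer in front of the GPIO.

    issue_ns: time per unrolled iteration (SBBO + SET/CLR, measured)
    drain_ns: GPIO update interval (measured)
    depth:    buffer depth in writes (hypothetical)
    Returns the time (ns) at which each SBBO issues.
    """
    issue: List[int] = []
    done: List[int] = []  # time each write has reached the GPIO
    for i in range(n_writes):
        t = issue[-1] + issue_ns if issue else 0
        if i >= depth:
            # Buffer full: stall until write i - depth has drained.
            t = max(t, done[i - depth])
        issue.append(t)
        # The GPIO handles one write per drain_ns, in order.
        start = max(t, done[-1]) if done else t
        done.append(start + drain_ns)
    return issue


def first_stall(issue: List[int], issue_ns: int = 15) -> Optional[int]:
    """Index of the first write that could not issue back-to-back."""
    for i in range(1, len(issue)):
        if issue[i] - issue[i - 1] > issue_ns:
            return i
    return None


times = burst_issue_times(30, depth=13)
print(first_stall(times))                             # 20
print(times[20] - times[19], times[29] - times[28])   # 35 40
```

With `depth=13` the model reproduces the observed pattern: 20 back-to-back writes at 15 ns, then 40 ns spacing. Depths 12 and 14 give a first stall at writes 18 and 21.

Treat this only as a consistency check on a simplified model. It is not an estimate of the real buffer size.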
PRU GPIO Read Timing Details
Reading from a GPIO pin when L4_PERPort1 is idle requires 165 ns (33 PRU cycles). I measured this by toggling direct PRU outputs on either side of an LBBO instruction. Since there is no speculative execution on the PRU, it cannot execute any instructions during this time; the PRU just stalls.
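Reads are expensive compared with writes, which shows up when you put the numbers in cycles. This sketch assumes a 200 MHz PRU clock. The per-iteration overhead in the polling-rate bound is a hypothetical parameter, since the actual loop will vary.

```python
# Cost of a blocking LBBO read from a GPIO register, in PRU terms.
# Assumption: 200 MHz PRU clock (5 ns/cycle).
CYCLE_NS = 5

GPIO_READ_NS = 165   # measured LBBO time with L4_PERPort1 idle
SBBO_CYCLES = 2      # measured GPIO write cost, for comparison


def read_stall_cycles() -> int:
    """PRU cycles consumed by one GPIO read."""
    return GPIO_READ_NS // CYCLE_NS


def max_poll_rate_hz(overhead_instr: int) -> float:
    """Upper bound on the sample rate of a loop that polls a GPIO input
    with LBBO. overhead_instr is a hypothetical count of extra
    single-cycle instructions per iteration (bit test, branch, etc.)."""
    period_ns = GPIO_READ_NS + overhead_instr * CYCLE_NS
    return 1e9 / period_ns


print(read_stall_cycles())                 # 33
print(read_stall_cycles() / SBBO_CYCLES)   # 16.5
print(round(max_poll_rate_hz(3)))          # 5555556
```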
Latency from the physical I/O pin to the PRU read seeing valid data has not yet been measured.