
So I’ve been playing with the PRU to get LinuxCNC running on a 'Bone, with the PRU doing step/dir, and I measured GPIO performance when writing from the PRU. Direct PRU outputs via R30 are super-fast, but writing the standard GPIO registers is pretty quick too. The good news is there’s a decently sized posted write buffer that keeps the PRU from stalling until you get to 20 or so back-to-back writes. Details below.

PRU GPIO Write Timing Details
The actual write instruction to a GPIO pin using SBBO takes two PRU cycles (10 ns). However, the GPIO logic can only update every 40 ns (8 PRU cycles). This means back-to-back writes to GPIO pins will eventually stall the PRU. Put another way, when burst writing to the GPIO you can execute 6 single-cycle PRU instructions for ‘free’ between writes.
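The "6 free instructions" figure is just cycle arithmetic. Here's a quick sketch of it in Python, assuming the 5 ns PRU cycle (200 MHz clock) implied by "2 cycles = 10 ns" above. The names are illustrative only:

```python
# Assumption: 5 ns per PRU cycle (200 MHz), as implied by 2 cycles = 10 ns above.
PRU_CYCLE_NS = 5

def ns_to_cycles(ns):
    """Convert a duration in nanoseconds to whole PRU cycles."""
    return ns // PRU_CYCLE_NS

SBBO_CYCLES = ns_to_cycles(10)         # SBBO to a GPIO register: 10 ns -> 2 cycles
GPIO_UPDATE_CYCLES = ns_to_cycles(40)  # GPIO accepts one update per 40 ns -> 8 cycles

# Single-cycle instructions (e.g. SET/CLR on r30) that fit between
# back-to-back SBBOs without outrunning the GPIO update rate.
free_instructions = GPIO_UPDATE_CYCLES - SBBO_CYCLES

print(f"GPIO update period: {GPIO_UPDATE_CYCLES} cycles")
print(f"'Free' single-cycle instructions per GPIO write: {free_instructions}")
```

If each write is followed by at least that many single-cycle instructions, every write cycle is at least 8 cycles long, so in this simple picture the GPIO keeps up and the PRU never has to wait on it.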

Latency from the PRU write to the actual I/O pin changing state (normalized so that the PRU direct output pins have zero latency), with the PRU writing to GPIO1 and L4_PERPort1 idle, measures 95 ns or 105 ns (apparently depending on clock synchronization).

PRU GPIO Posted Writes
When L4_PERPort1 is idle, it is possible to burst-write multiple values to the GPIO pins without stalling the PRU, because the writes are posted. With an unrolled loop (SBBO to GPIO followed by a single SET/CLR to R30), the first 20 write cycles (both instructions, i.e. 3 PRU cycles) took 15 ns each. At that point the PRU began to stall and the write cycle settled into the 40 ns minimum update period.
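To see why a burst runs at 15 ns per write and then settles to 40 ns, here's a toy model in Python. It is a sketch under stated assumptions, not a description of the actual AM335x interconnect. It treats the posted write buffer as a FIFO of `depth` entries (a hypothetical parameter: nothing above gives the real depth), drained by the GPIO at one write per 8 cycles, with the SBBO stalling only when the FIFO is full. `simulate_burst` and `stall_free_writes` are made-up helpers:

```python
def simulate_burst(n_writes, depth, issue_cycles=3, drain_cycles=8):
    """Return the PRU cycle at which each write is accepted by the buffer.

    issue_cycles: cycles per unrolled iteration when not stalled
                  (SBBO = 2 cycles + one SET/CLR on r30 = 1 cycle).
    drain_cycles: GPIO update period (40 ns = 8 cycles).
    depth:        hypothetical posted-write buffer depth.
    """
    enter = []    # cycle each write is accepted by the buffer
    finish = []   # cycle each write has been applied to the GPIO pins
    attempt = 0   # cycle at which the PRU reaches the next SBBO
    for k in range(n_writes):
        accept = attempt
        if k >= depth:
            # Buffer full: wait until the write `depth` places earlier drains.
            accept = max(accept, finish[k - depth])
        prev_finish = finish[-1] if finish else 0
        finish.append(max(accept, prev_finish) + drain_cycles)
        enter.append(accept)
        attempt = accept + issue_cycles
    return enter

def stall_free_writes(enter, issue_cycles=3):
    """Count the writes issued before the PRU first stalls."""
    for k in range(1, len(enter)):
        if enter[k] - enter[k - 1] != issue_cycles:
            return k
    return len(enter)

enter = simulate_burst(40, depth=13)
print("stall-free writes:", stall_free_writes(enter))
print("steady-state spacing (cycles):", enter[-1] - enter[-2])
```

With a hypothetical depth of 13, the model happens to give 20 stall-free writes followed by an 8-cycle (40 ns) spacing, which matches the shape of the measurement. That is a fit of a toy model, not a documented buffer size. In the model, each extra buffer entry buys roughly 1.6 more stall-free writes, because the backlog grows by one entry every 24 cycles while writes issue every 3 cycles and drain every 8.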

PRU GPIO Read Timing Details
Reading from a GPIO pin when L4_PERPort1 is idle requires 165 ns (33 PRU cycles), as measured using direct PRU I/O updates bracketing an LBBO instruction. Since there is no speculative execution on the PRU, it cannot execute any instructions during this time; the PRU just stalls.
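Because a read stalls the PRU for the whole 165 ns, it puts a hard ceiling on any PRU loop that polls GPIO inputs. The Python sketch below works that ceiling out, again assuming a 5 ns PRU cycle. `max_poll_rate_hz` and its "other cycles" argument are hypothetical. You would count the other cycles from your own loop body:

```python
# Assumption: 5 ns per PRU cycle (200 MHz), as implied by the write timings.
PRU_CYCLE_NS = 5
GPIO_READ_NS = 165  # LBBO from GPIO with L4_PERPort1 idle; the PRU stalls throughout

read_stall_cycles = GPIO_READ_NS // PRU_CYCLE_NS

def max_poll_rate_hz(reads_per_pass, other_cycles_per_pass):
    """Upper bound on how often a PRU loop can run if it reads GPIO every pass.

    other_cycles_per_pass is hypothetical: count it from your own loop body.
    Uses the idle-bus read time; reads under L4_PERPort1 traffic were not measured.
    """
    pass_ns = reads_per_pass * GPIO_READ_NS + other_cycles_per_pass * PRU_CYCLE_NS
    return 1e9 / pass_ns

print("cycles stalled per GPIO read:", read_stall_cycles)
print(f"one read + 7 other cycles: {max_poll_rate_hz(1, 7) / 1e6:.1f} MHz max")
```

For comparison, the same 165 ns covers 16.5 posted SBBO writes at 10 ns each, so reads, not writes, are what eat the cycle budget.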

Latency from the physical I/O pin to the PRU read seeing valid data has not yet been measured.

Thanks for sharing your measurements! Do you have the code you used available somewhere?

The test code is pretty simplistic (this assumes you have set up pin muxing and GPIO as required at the command line):

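// AM335x GPIO1 base address and register offsets (per the TRM)
#define GPIO1              0x4804c000
#define GPIO_CLEARDATAOUT  0x190
#define GPIO_SETDATAOUT    0x194

// Set up pointers to the GPIO registers
MOV     r5, GPIO1 + GPIO_CLEARDATAOUT
MOV     r6, GPIO1 + GPIO_SETDATAOUT
// Bit mask of the GPIO being toggled: bit 6 (GPIO1_6)
MOV     r7, 0x00000040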

TEMPHACK:
// Set bit zero to show we’re busy
SET r30,0

// Set GPIO bit
SBBO    r7, r6, 0, 4
SET     r30,1

// Clear GPIO bit
SBBO    r7, r5, 0, 4
CLR     r30,1

// Repeat set/clear sequence until you begin to
// stall the PRU, 20+ SBBO instructions

// Perform enough NOPs or other PRU instructions
// to allow the posted writes to complete.  I
// continued to toggle output bit 1
SET     r30,1
CLR     r30,1
SET     r30,1
CLR     r30,1
// Lots more set/clear here, to total at least 40 ns
// for each SBBO instruction above, plus 100 ns

// ...and do it all again
JMP     TEMPHACK
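Writing out the unrolled burst and the "lots more set/clear" padding by hand gets tedious. A small generator can emit those lines for pasting between `TEMPHACK:` and the final `JMP`. This is a hypothetical helper, not part of the setup above. It reuses the test code's register roles (r5/r6 addresses, r7 mask, r30 bit 1 as the scope marker) and applies the comment's rule of at least 40 ns per SBBO plus 100 ns. It assumes 5 ns per SET/CLR:

```python
import math

PRU_CYCLE_NS = 5  # 200 MHz PRU clock: one SET/CLR takes 5 ns

def padding_instructions(n_sbbo, per_write_ns=40, margin_ns=100):
    """Single-cycle instructions needed to cover 40 ns per SBBO plus 100 ns."""
    return math.ceil((n_sbbo * per_write_ns + margin_ns) / PRU_CYCLE_NS)

def unrolled_body(n_pairs):
    """Return PASM lines: n_pairs set/clear SBBO pairs, then drain padding.

    Register usage matches the test code: r5 = CLEARDATAOUT address,
    r6 = SETDATAOUT address, r7 = bit mask, r30 bit 1 = scope marker.
    """
    lines = []
    for _ in range(n_pairs):
        lines += [
            "SBBO    r7, r6, 0, 4",  # set GPIO bit (posted write)
            "SET     r30,1",
            "SBBO    r7, r5, 0, 4",  # clear GPIO bit (posted write)
            "CLR     r30,1",
        ]
    # Keep toggling r30 bit 1 while the posted writes drain. With the default
    # rule the count is 8 * n_sbbo + 20 (always even), so it ends on a CLR.
    for i in range(padding_instructions(2 * n_pairs)):
        lines.append("SET     r30,1" if i % 2 == 0 else "CLR     r30,1")
    return lines

if __name__ == "__main__":
    body = unrolled_body(10)  # 20 SBBO instructions
    print(f"{padding_instructions(20)} padding instructions, {len(body)} lines total")
    print("\n".join(body))
```

For a 20-SBBO burst that works out to 180 padding instructions (900 ns). The rule is conservative, since the PRU has already spent 15 ns per write during the posted part of the burst.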