First, here is the link to Adafruit’s introduction to the NeoPixel protocol:
https://learn.adafruit.com/adafruit-neopixel-uberguide/advanced-coding
Based on the timing data on this website (the part above the link to the data sheet), sending a “0” requires 0.4us high and 0.85us low; sending a “1” requires 0.8us high and 0.45us low. Since these are really small time intervals, we run the clock at its full speed of 16MHz to get as many CPU cycles as possible per interval (one cycle is 62.5ns). The resulting cycle counts also appear in the comments of the .asm file: T0H 6 cycles, T0L 14 cycles, T1H 13 cycles and T1L 7 cycles. This means everything we do between two pin changes has to fit within these cycle counts while still sending the correct signal to the pin. As for the registers, R12 points to the data buffer passed in from C (the code reads it with @R12+), R15 holds the byte currently being shifted out, R13 holds the length of the data in bytes, and R14 holds the mask of the pin used for the output signal.
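The conversion from signal times to cycle counts can be sketched in C as follows. This is only an illustration for a desktop compiler, not part of the driver; the names CPU_HZ, ns_to_cycles and check_against_text are made up for this example, and the 16MHz clock is the one assumed in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed clock from the text: 16 MHz, i.e. 62.5 ns per cycle. */
#define CPU_HZ 16000000ULL

/* Convert a duration in nanoseconds to the nearest whole CPU cycle count. */
uint32_t ns_to_cycles(uint32_t ns)
{
    uint64_t scaled = (uint64_t)ns * CPU_HZ;              /* cycles * 1e9 */
    return (uint32_t)((scaled + 500000000ULL) / 1000000000ULL);
}

/* Checks the helper against the cycle counts quoted in the .asm comments. */
void check_against_text(void)
{
    assert(ns_to_cycles(400) == 6);      /* T0H: 6.4 cycles, rounded  */
    assert(ns_to_cycles(850) == 14);     /* T0L: 13.6 cycles, rounded */
    assert(ns_to_cycles(800) == 13);     /* T1H: 12.8 cycles, rounded */
    assert(ns_to_cycles(450) == 7);      /* T1L: 7.2 cycles, rounded  */
}
```

None of the four intervals is a whole number of cycles at 16MHz, which is why the counts in the listing are rounded values.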
To keep the timing under control, we also need to know how many CPU cycles each assembly instruction takes. By stepping through the provided write_ws2811.asm (low speed) and reading its comments, I got the following results: bis.b (bit set) and bic.b (bit clear) take 4 cycles each, so every write to the output pin costs 4 cycles by itself; jmp, jc, jne, jeq and mov.b take 2 cycles each (whether or not the jump is taken); rla.b and dec take 1 cycle each. We need to be especially careful with T0H and T1L: once the 4-cycle pin write is accounted for, only 2 and 3 cycles respectively remain in those intervals.
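To make the budget concrete, here is a small C sketch of the arithmetic. It uses only the cycle counts quoted above; the enum names and slack_after_pin_write are hypothetical names chosen for this example, and the 9-cycle figure for the delay call is taken from the comment on call #dly9 in the listing.

```c
#include <assert.h>

/* Instruction costs in CPU cycles, as counted in the text. */
enum {
    CYC_BIS_B = 4,      /* set the output pin      */
    CYC_BIC_B = 4,      /* clear the output pin    */
    CYC_JUMP  = 2,      /* jmp, jc, jne, jeq       */
    CYC_MOV_B = 2,
    CYC_RLA_B = 1,
    CYC_DEC   = 1,
    CYC_CALL_DLY9 = 9   /* "9 cycles nop" comment  */
};

/* Target interval lengths in cycles, from the .asm comments. */
enum { T0H = 6, T0L = 14, T1H = 13, T1L = 7 };

/* Cycles left in an interval after paying for the pin write itself. */
int slack_after_pin_write(int interval_cycles, int pin_write_cycles)
{
    assert(interval_cycles >= pin_write_cycles);
    return interval_cycles - pin_write_cycles;
}
```

With these numbers the slack in T0H is exactly one 2-cycle jmp, and the slack in T1H is exactly the 9-cycle delay call, which matches the filler instructions the high speed listing places between bis.b and bic.b.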
Walking down the assembly code, you can see that we first push R11 so it can be used as the bit counter within each byte; later we decrement this counter to decide whether to do the next bit or to fetch the next byte. We do only 7 bits in bit_loop because the last bit needs special handling: the code has to do something different for a 1 and a 0 in order to fetch the next byte within the proper timing. R13 is the byte count and is decremented once per byte. Now follow the lines, assuming a 1 or a 0 at the start, and you will find that T0H, T1H, T0L and T1L line up with the cycle counts for their signal times. Note that the cycle counts are not exact integers; I rounded them to the nearest integer. The NeoPixel allows about 25% “wiggle room” in timing, so this does not affect the display.
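The loop structure described above (7 bits inside bit_loop, then one specially handled final bit per byte) can be modelled in C to check the order in which bits leave the chip. This is a host-side sketch of the control flow only, with no timing; rla_b and model_bit_order are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates rla.b: shift left by one and return the bit that lands in carry. */
int rla_b(uint8_t *reg)
{
    int carry = (*reg & 0x80u) != 0;
    *reg = (uint8_t)(*reg << 1);
    return carry;
}

/* Writes one 0/1 value per transmitted bit into out (8 * length entries)
   and returns the number of bits produced. */
size_t model_bit_order(const uint8_t *data, size_t length, uint8_t *out)
{
    size_t n = 0;
    assert(data != NULL && out != NULL);
    for (size_t i = 0; i < length; i++) {       /* R13 counts the bytes   */
        uint8_t r15 = data[i];                  /* mov.b @R12+, R15       */
        for (int r11 = 7; r11 != 0; r11--)      /* bit_loop: first 7 bits */
            out[n++] = (uint8_t)rla_b(&r15);
        out[n++] = (uint8_t)rla_b(&r15);        /* final bit, outside the loop */
    }
    return n;
}
```

Because rla.b shifts the top bit into carry, each byte goes out most significant bit first, and every byte still produces 8 bits even though the loop counter only counts to 7.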
Low speed (400 kHz) code, provided by Kevin Timmerman:
Copyright (C) 2012 Kevin Timmerman
.cdecls C, LIST, "msp430g2553.h"
.def write_ws2811
; -- Low Speed Mode
;           High / Low us    High / Low cycles @ 16 MHz
; Zero:     0.50 / 2.00       8 / 32
; One:      1.20 / 1.30      19 / 21
; Reset:       0 / 50+        0 / 800+
;
.text ;
;
dly13: jmp $ + 2 ;
jmp $ + 2 ;
dly9: nop ;
dly8: ret ;
;
write_ws2811: ; void write_ws2811(uint8_t *data, uint16_t length, uint8_t pinmask);
;
byte_loop: ;
mov.b @R12+, R15 ; Get next byte from buffer
swpb R15 ; Swap bytes so data is in msb
bis #0x0080, R15 ; Mark end of bits
rla R15 ; Get first bit
nop ;
bit_loop: ; - Bit loop - 40 cycles per bit
bis.b R14, &P1OUT ; Output high
jc one_bit ; Jump if sending a one bit…
jmp $ + 2 ; 2 cycle nop
bic.b R14, &P1OUT ; Output low - 8 cycles elapsed
call #dly9 ; 9 cycle nop
jmp next_bit ; Next bit…
one_bit: ;
call #dly13 ; 13 cycle nop
bic.b R14, &P1OUT ; Output low - 19 cycles elapsed
next_bit: ;
bic.b R14, &P1OUT ; 4 cycle nop – output is already low
rla R15 ; Get next bit
jeq next_byte ; If zero, then do next byte…
call #dly8 ; 8 cycle nop
jmp bit_loop ; Next bit…
next_byte: ;
dec R13 ; Decrement byte count
jne byte_loop ; Next byte if not zero…
mov #800 / 3, R15 ; 800 cycle delay for reset
dec R15 ;
jne $ - 2 ;
ret ; Return
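The low speed routine has no separate bit counter. Instead it plants a marker bit behind the data byte (bis #0x0080 after swpb) and stops when the register shifts out to zero. A C model of that trick, as I read the listing, is sketched below; rla_w and model_low_speed_byte are hypothetical names, and the model covers only the bit sequencing, not the timing.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates the 16-bit rla: shift left by one, return the bit shifted out. */
int rla_w(uint16_t *reg)
{
    int carry = (*reg & 0x8000u) != 0;
    *reg = (uint16_t)(*reg << 1);
    return carry;
}

/* Models one pass of byte_loop. Stores the transmitted bits in out and
   returns how many bits were sent before the register became zero. */
int model_low_speed_byte(uint8_t byte, uint8_t out[8])
{
    uint16_t r15 = byte;                          /* mov.b @R12+, R15     */
    r15 = (uint16_t)((r15 << 8) | (r15 >> 8));    /* swpb R15             */
    r15 |= 0x0080u;                               /* bis #0x0080, R15     */
    int carry = rla_w(&r15);                      /* rla R15: first bit   */
    int n = 0;
    for (;;) {
        assert(n < 8);                            /* marker guarantees 8  */
        out[n++] = (uint8_t)carry;                /* bit_loop: send bit   */
        carry = rla_w(&r15);                      /* next_bit: rla R15    */
        if (r15 == 0)                             /* jeq next_byte        */
            break;
    }
    return n;
}
```

The marker bit is what makes an all-zero data byte still produce 8 bits: the register only becomes zero once the marker itself has been shifted out.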
High speed (800 kHz) code:
Copyright (C) 2012 Kevin Timmerman / Modified by: 2014 Yuan Gao
.cdecls C, LIST, "msp430g2553.h"
.def write_ws2811
; -- High Speed Mode
;           High / Low us    High / Low cycles @ 16 MHz
; Zero:     0.40 / 0.85       6 / 14
; One:      0.80 / 0.45      13 / 7
; Reset:       0 / 50+        0 / 800+
;
.text ;
dly9: nop ; bis: 4 bic: 4 jmp: 2 dec: 1 jne: 2 rla: 1 jc: 2 mov: 2
ret ; Return from delay: call (5) + nop (1) + ret (3) = 9 cycles
;
write_ws2811: ; void write_ws2811(uint8_t *data, uint16_t length, uint8_t pinmask);
; R12 R13 R14 (R15 is used as scratch for the current byte)
push R11 ; Save R11
byte_loop: ;
mov #7, R11 ; Do 7 bits in a loop. The last bit is special so we can append
mov.b @R12+, R15 ; Get next byte from buffer
bit_loop: ; - Bit loop - 20 cycles per bit
rla.b R15 ; Get next bit
jc one ; Jump if one
bis.b R14, &P1OUT ; Output high
jmp $ + 2 ; 2 cycles nop
bic.b R14, &P1OUT ; Output low - 6 cycles elapsed
jmp $ + 2 ; 2 cycles nop
jmp next_bit ; Next bit [8 cycles low for 0]
one: ;
bis.b R14, &P1OUT ; Output high
call #dly9 ; 9 cycles nop;
bic.b R14, &P1OUT ; Output low - 13 cycles elapsed
next_bit: ; [8 cycles low for 0]
dec R11 ; Decrement bit count
jne bit_loop ; Do next bit if not zero
rla.b R15 ; Get final bit of byte
jc last_one ; Jump if one [14 low for 0 10 low for 1, jump]
last_zero:
bis.b R14, &P1OUT ; Output high
jmp $ + 2 ; 2 cycle nop
bic.b R14, &P1OUT ; Output low - 6 cycles elapsed
dec R13 ; Decrement byte count
jne byte_loop ; Next byte if count not zero
jmp reset ; All bytes done, reset
last_one: ;
bis.b R14, &P1OUT ; Output high 11
jmp $ + 2 ; 2 cycle nop
mov #7, R11 ; Reset bit counter
mov.b @R12+, R15 ; Get next byte from buffer
bic.b R14, &P1OUT ; Output low - 10 cycles elapsed
dec R13 ; Decrement byte count
jne bit_loop ; Do next byte if count is not zero
reset: ;
mov #800 / 3, R15 ; 800 cycle delay for reset
dec R15 ;
jne $ - 2 ;
pop R11 ; Restore R11
ret ; Return
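Both listings end with the same reset delay: a counter loaded with 800 / 3 and a two-instruction loop of dec (1 cycle) and jne (2 cycles). The arithmetic behind it can be checked with a short C sketch; the constant and function names are made up for this example, and the cycle costs are the ones counted earlier in this post.

```c
#include <assert.h>
#include <stdint.h>

enum {
    RESET_TARGET_CYCLES  = 800,  /* 50 us at 16 MHz                   */
    CYCLES_PER_ITERATION = 3,    /* dec R15 (1) + jne $ - 2 (2)       */
    CYCLES_PER_US        = 16    /* 16 MHz clock assumed in the text  */
};

/* Initial counter value: the assembler evaluates 800 / 3 as an integer. */
uint32_t reset_loop_iterations(void)
{
    return RESET_TARGET_CYCLES / CYCLES_PER_ITERATION;
}

/* Cycles spent inside the dec/jne loop alone. */
uint32_t reset_loop_cycles(void)
{
    return reset_loop_iterations() * CYCLES_PER_ITERATION;
}

/* The same duration expressed in microseconds. */
double reset_loop_us(void)
{
    return (double)reset_loop_cycles() / CYCLES_PER_US;
}

/* Checks the figures quoted in the note below. */
void check_reset_delay(void)
{
    assert(reset_loop_iterations() == 266);
    assert(reset_loop_cycles() == 798);
}
```

Because of the integer division, the loop by itself accounts for 798 cycles, a hair under 50us. The instructions around it (loading the counter, restoring R11, returning to C) add a few more cycles while the line is still low.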