This is exactly what my differential analysis was pointing at — I called it "byte-level, not XOR" but couldn't characterize it further without the table structure. 256-byte permutation + add/shift makes perfect sense given what I was seeing.
A couple of follow-up questions if you're willing:
Is the permutation table static per TCU family (same across all firmware versions of DL381, for example), or does it vary per SW version? If it's static I should be able to recover it from a bench dump via known-plaintext.
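For context, this is roughly the known-plaintext recovery I have in mind. It is only a sketch, and it assumes the add/shift stage is position-independent, so the whole transform collapses to one fixed byte-to-byte mapping. The function name and the plain/cipher inputs are my own placeholders, not anything taken from the firmware. If the mapping turns out to depend on the offset, the consistency check fails and that tells me the assumption is wrong.

```python
def recover_table(plain: bytes, cipher: bytes) -> dict[int, int]:
    """Recover a byte substitution table from aligned plaintext/ciphertext.

    Assumption (hypothetical): every plaintext byte value always maps to the
    same ciphertext byte value, regardless of its position in the block.
    """
    if len(plain) != len(cipher):
        raise ValueError("plaintext and ciphertext must be the same length")

    table: dict[int, int] = {}
    for offset, (p, c) in enumerate(zip(plain, cipher)):
        # setdefault stores c on first sight and returns the stored value after
        if table.setdefault(p, c) != c:
            raise ValueError(
                f"inconsistent mapping at offset {offset}: "
                "the transform is not a fixed byte substitution"
            )
    return table
```

A table with fewer than 256 entries just means the known plaintext did not cover every byte value; the missing entries would need more dump data.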
For method 0x11 — you mentioned "same algorithm + additional compression". Is the compression layer something standard (LZ77, LZSS) or another proprietary scheme like the LZZ I found in the ZF method 0x22 blocks?
On the DL800 fake 0xAA — I've been looking at 4M0927158 (Q7/Q8 ALX520) and 8W0927158 (B9 S4/S5 AL552) which both report 0xAA. Block alignment analysis suggests those are real AES (all blocks 16-byte aligned), so I think the fake 0xAA warning doesn't apply there. Can you confirm which DL800 part numbers show the fake mark so I can make sure I'm not mixing things up?
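The alignment check I ran is essentially the following. This is a minimal sketch: the list of block lengths is a placeholder for whatever the container parser produces, and the function name is mine. Note that 16-byte alignment is only consistent with AES, not proof of it, which is why I'm asking for the part numbers.

```python
def misaligned_blocks(block_lengths, block_size=16):
    """Return (index, length) for every block whose length is not a
    multiple of block_size. An empty result means all blocks are aligned."""
    return [
        (index, length)
        for index, length in enumerate(block_lengths)
        if length % block_size != 0
    ]


# Hypothetical lengths, for illustration only (not taken from a real file)
example_lengths = [4096, 8192, 1000]
print(misaligned_blocks(example_lengths))
```

Any non-empty result would rule out a plain block cipher over whole blocks, unless the container pads or stores a length prefix that the parser has not accounted for.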
Updated the documentation at github.com/dspl1236/vag-tcu-tools with your findings — credited to community RE on NefMoto. Let me know if you'd prefer different attribution.