Very nice piece.
Comes with a big "But"...
OK, what the hell? If you're going to use over 9000 mostly-empty channels, don't use XM - this would work a lot better with IT (heck, even S3M if you could use that many PCM channels).
Just resaving it as .it drops it to 4535 bytes.
EDIT: Without any hand-done pattern optimisations, munch.py brings this figure down to 2697.
If you track like this all the time, you really should consider learning how to use SchismTracker or something like that, if only for the size-limited compos. If you'd rather stick with XM, you might want to prod Strobe on how to keep your .xm entries compact (hint: in XM, EVERY CELL TAKES AT LEAST A BYTE; in IT, only the non-empty cells take space).
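To put rough numbers on that hint, here's a quick sketch. It assumes an empty XM cell packs to exactly one byte, and that IT writes nothing for an empty cell but ends every row with a one-byte marker. The function names and the 64-row, 32-channel example are made up for illustration.

```python
def xm_padding_bytes(rows, channels, used_cells):
    """Bytes XM spends on empty cells: one byte per empty cell."""
    return rows * channels - used_cells

def it_padding_bytes(rows):
    """Bytes IT spends besides real cell data: one end-of-row byte per row,
    no matter how many channels sit empty."""
    return rows

# Hypothetical pattern: 64 rows, 32 channels, only 100 cells actually used.
print(xm_padding_bytes(64, 32, 100))  # 1948
print(it_padding_bytes(64))           # 64
```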
Here's a trick that works in both .it AND .xm: you could actually pack this down into a single pattern (though for .it you might need two, as IT itself can't handle more than 200 rows per pattern). Every pattern has a header that takes 8 bytes in .it and 9(!) bytes in .xm. Just note that in .it, each pattern also has a 4-byte pointer to it, making it 12 bytes per pattern, though you'll want to make sure there are no gaps.
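The header arithmetic can be sketched like this. The overhead figures are the ones quoted above; the 256-row XM limit comes from the XM format rather than from this post, and the four-pattern song is a made-up example, not this entry's actual layout. It also assumes the patterns are simply played back to back, once each.

```python
import math

XM_OVERHEAD = 9       # XM pattern header, in bytes
IT_OVERHEAD = 8 + 4   # IT pattern header plus its 4-byte pointer

XM_MAX_ROWS = 256     # XM per-pattern row limit (assumption from the XM format)
IT_MAX_ROWS = 200     # IT per-pattern row limit

def merge_savings(pattern_rows, overhead, max_rows):
    """Return (patterns needed after merging, header bytes saved)."""
    total_rows = sum(pattern_rows)
    merged = math.ceil(total_rows / max_rows)
    return merged, (len(pattern_rows) - merged) * overhead

song = [64, 64, 64, 64]  # hypothetical: four 64-row patterns

print(merge_savings(song, XM_OVERHEAD, XM_MAX_ROWS))  # (1, 27)
print(merge_savings(song, IT_OVERHEAD, IT_MAX_ROWS))  # (2, 24)
```

It's only a couple of dozen bytes, but in a size-limited compo that can matter.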
This works even better in .it, since each cell can refer to values in previous cells in that channel, and if a cell has the same "mask" as the previous one in that channel then there's another byte or two saved.
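Here's a rough size model of that IT behaviour, to show where the savings come from. It's a simplified sketch of the packed cell format as I understand it (a channel byte, an optional mask byte, then only the fields that changed), not what any particular tracker's saver actually does. `it_cell_size` and the cell dict layout are made up for illustration.

```python
def it_cell_size(cell, state):
    """Estimate the packed size of one IT cell in bytes.

    cell:  dict with any of 'note', 'ins', 'vol', 'cmd' (hypothetical layout)
    state: per-channel memory of the last values and last mask, updated in place
    """
    if not cell:
        return 0                      # empty cells are not stored at all
    size = 1                          # channel byte
    mask = 0
    fields = ((1, 'note', 1), (2, 'ins', 1), (4, 'vol', 1), (8, 'cmd', 2))
    for bit, key, width in fields:
        if key not in cell:
            continue
        if state.get(key) == cell[key]:
            mask |= bit << 4          # "use last value" flag, no data bytes
        else:
            mask |= bit               # new value follows
            size += width
            state[key] = cell[key]
    if mask != state.get('mask'):
        size += 1                     # mask byte only written when it changes
        state['mask'] = mask
    return size

state = {}
cell = {'note': 60, 'ins': 1}
print([it_cell_size(cell, state) for _ in range(3)])  # [4, 2, 1]
```

So the same note and instrument repeated down one channel costs 4 bytes the first time, 2 the second, and 1 from then on.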