
# Raspberry Casket
A fast and small open source Pretracker replayer

## Raspberry Casket Player V2.x (24-Aug-2023)

Provided by Chris 'platon42' Hodges <chrisly@platon42.de>

Rewritten by *platon42/Desire* based on a resourced, binary identical
version of the original Pretracker V1.0 replayer binary provided
by *hitchhikr* (thanks!), originally written in C by *Pink/Abyss*.

This version is the result of hard work: reverse engineering all the
offsets, removing all the C compiler crud, removing dead and
surplus code (maybe artefacts from earlier ideas that did nothing),
and optimizing the code where possible. This resulted in a smaller
replayer, faster sample calculation and a significantly faster
tick routine.
Bugs from the original replayer were fixed.

I also added a few optional features that come in handy, such as
song-end detection and precalc progress support.

Note that this player can also be used to play back
Pretracker 1.5 tunes, provided you don't use sfx or sub-songs.

Also: Open source. It's 2023, keeping the code closed is just not
part of the demoscene spirit (anymore?), at least for a replayer.

This player has been optimized and worked on continuously since its
first release in late 2022.

Productions that I know have been using Raspberry Casket so far:

- [Smooth Flowing/Dekadence](https://www.pouet.net/prod.php?which=94347)
- [Cracking Posadas/Software Failure](https://www.pouet.net/prod.php?which=94570)

### Verification

The first versions of the replayer were verified against about
60 Pretracker tunes to produce an identical internal state for each
tick and identical samples (if certain optimization switches are disabled).

During development, the promise of identical state and identical samples
had to be dropped because of bugs in the original player and because of
optimizations. This is especially the case for the track delay feature of
Pretracker, which could in some cases cause odd behaviour and unwanted
muting that has been fixed in Raspberry Casket. The verification is
therefore now reduced to about 20 songs that are still identical.

I do, however, now also have an emulated Paula output verification that
compares the generated sound between the original code and Raspberry Casket.
Divergences are manually checked from time to time.

If you find any problems, please let me know at chrisly@platon42.de.
Thank you.

### Usage

The new replayer can be used as a drop-in binary replacement if you wish.
In this case you will get faster sample generation (about 12%
faster on 68000) and about 45% less CPU time spent during playback. However, you
won't get features such as song-end detection and precalc progress this way.
This mode uses the old CPU DMA wait that takes away 8 raster lines.

If you want to get rid of the unnecessary waiting, you can switch
to a copper driven audio control. If you want to use the top portion
of the copperlist for this, you probably need to double buffer it.
Otherwise, you could also position the copperlist at the end of
the display and use single buffering if you call the tick routine
during the vertical blank.

Please use the documented sizes for the `MySong` and `MyPlayer` data
structures, which are the symbols `sv_SIZEOF` and `pv_SIZEOF`
respectively (about 2KB and 12KB with volume table).

The source needs two common include files to compile (`custom.i` and
`dmabits.i`). You should leave assembler optimizations enabled.

1. (If you're using copper list mode, call `pre_PrepareCopperlist`.)
2. Call `pre_SongInit` with
   - a pointer to `MySong` (`sv_SIZEOF`) in `a1` and
   - the music data in `a2`.
   It will return the amount of sample memory needed in `d0`.
3. Then call `pre_PlayerInit` with
   - a pointer to `MyPlayer` (`pv_SIZEOF`) in `a0`
   - a pointer to chip memory sample buffer in `a1`
   - the pointer to `MySong` in `a2`
   - a pointer to a longword for progress information or null in `a3`

   This will create the samples, too.
4. After that, regularly call `pre_PlayerTick` with `MyPlayer` in `a0`
   and, if you're using copper list mode, the copperlist in `a1`.

### Size

The original C compiled code was... just bad. The new binary is
less than 1/3 of the original one.

The code has also been optimized so that it compresses better.
The original code compressed with *Blueberry's* Shrinkler goes from
18052 bytes down to 9023 bytes.

Raspberry Casket, depending on the features compiled in, is about
5716 bytes and shrinkles down to ~4071 bytes (in isolation).

So this means that the optimization is not just "on the outside".
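
To put the figures above side by side, here is a small worked calculation. It is only arithmetic on the byte counts quoted in this section; the constant and function names are made up for illustration, and the compressed size of Raspberry Casket is the approximate value given above.

```python
# Byte counts quoted in this section.
ORIGINAL_RAW = 18052        # original replayer, uncompressed
ORIGINAL_SHRINKLED = 9023   # original replayer after Shrinkler
CASKET_RAW = 5716           # Raspberry Casket, uncompressed
CASKET_SHRINKLED = 4071     # Raspberry Casket after Shrinkler (approximate)


def ratio(new: int, old: int) -> float:
    """Return the size of `new` as a fraction of `old`."""
    return new / old


raw_ratio = ratio(CASKET_RAW, ORIGINAL_RAW)
packed_ratio = ratio(CASKET_SHRINKLED, ORIGINAL_SHRINKLED)

print(f"uncompressed: {raw_ratio:.1%} of the original")
print(f"shrinkled:    {packed_ratio:.1%} of the original")
```

With these numbers the uncompressed binary comes out at a bit under a third of the original, and the compressed one at a bit under half.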

About 2.4 KB of the code (and data) is spent on sample generation,
the rest on playback.

### Timing

#### Sample precalculation

Sample generation is faster than the original 1.0 player and also
faster than the 1.5 player, which got a slightly better performance
than the 1.0 one (compiler change?).

According to my measurements on my set of Pretracker tunes,
Raspberry Casket needs between 10% and 20% fewer instructions.
Of these instructions, about 5% are `muls` operations and the new
player is only able to shave off between 3% and 8% of those,
so this is probably the limiting factor.

#### Playback

Raspberry Casket is about twice as fast as the old replayer for playback.

Unfortunately, the replayer is still pretty slow and has high
jitter compared to other standard music replayers.

This means it may take up to 32 raster lines (13-18 on average),
which is significantly more than a standard Protracker replayer.
For comparison, the original replayer could take about 60 raster
lines in the worst case and about 34 on average!
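
To get a feeling for what these raster line counts cost, the sketch below converts them into a share of one video frame. It assumes a PAL display with about 312 lines per frame and one replayer tick per frame; neither assumption is stated in this README, and the names are hypothetical.

```python
# Assumption: PAL display with roughly 312 raster lines per frame,
# and one call of the tick routine per frame.
PAL_LINES_PER_FRAME = 312


def frame_share(raster_lines: float,
                lines_per_frame: int = PAL_LINES_PER_FRAME) -> float:
    """Return the fraction of one frame taken by `raster_lines`."""
    return raster_lines / lines_per_frame


# Raster line counts quoted in this section.
timings = {
    "Raspberry Casket, worst case": 32,
    "Raspberry Casket, average (low)": 13,
    "Raspberry Casket, average (high)": 18,
    "original replayer, worst case": 60,
    "original replayer, average": 34,
}

for label, lines in timings.items():
    print(f"{label}: {lines} lines = {frame_share(lines):.1%} of a frame")
```

Under these assumptions the worst case of Raspberry Casket is roughly a tenth of a frame, about the same as the average of the original replayer.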

Watch out for *Presto*, the [LightSpeedPlayer](https://github.com/arnaud-carre/LSPlayer) variant that should
solve this problem.

### Secrets

- Pink never actually documented how the 0xy command works (2nd instrument, not an ARP!).
  It will play the instrument y for x+1 ticks before going to the actual instrument you wanted to trigger in the first place.
  This works well, e.g. for bassdrums and other short percussion samples. Note that because y is a 4 bit nibble, you can only
  specify the instruments $1-$f this way and not $10-$1f.
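
As an illustration of the nibble split described above, here is a minimal sketch of how the parameter byte of a 0xy command could be decoded. This is not the replayer's code; the function and type names are hypothetical.

```python
from typing import NamedTuple


class SecondInstrument(NamedTuple):
    instrument: int  # y: the instrument played first ($1-$f)
    ticks: int       # x + 1: how many ticks it plays


def decode_0xy(param: int) -> SecondInstrument:
    """Split the 8-bit parameter of a 0xy command into its two nibbles."""
    if not 0 <= param <= 0xFF:
        raise ValueError("parameter must fit in one byte")
    x = param >> 4     # high nibble: tick count minus one
    y = param & 0x0F   # low nibble: instrument number
    # y == 0 is outside the $1-$f range; how the player treats that
    # case is not covered by this sketch.
    return SecondInstrument(instrument=y, ticks=x + 1)


print(decode_0xy(0x23))
```

For example, `023` would play instrument 3 for 3 ticks before the instrument that was actually triggered takes over.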

### Known issues

- Songs saved with versions of Pretracker earlier than 1.0 (internal version lower than $1b) store the ADSR values differently in the file.
  Neither the original player nor Raspberry Casket has any provision for fixing these values. Loading the file in the tracker and saving
  it again will cure this. This is more of a hypothetical problem as you are unlikely to use a Pretracker V0.9 beta version, but it took me quite a
  while to figure out why Pink's "On and On" has a broken first wave sample in all the replayers but not in the tracker itself. This affects these tunes:
  Attack and Release, On and On, Rewind, Cold and Tired, PreFix, all by Pink, and Cracksteady by Tecon.
- Behaviour for undefined volume slides with both up- and down nibble specified is different (e.g. A9A, hi Rapture!). Don't do that.
- Don't use loops with odd lengths and offsets (even if Pretracker allows this when dragging the loop points).
- Don't stop the music with F00 and use a note delay (EDx) in the same line.
- Don't try to play music with no waves, instruments or patterns.
- The original player had the internal state machine running for wave 1 even if no note had been triggered yet on the channel.
  This could cause the first instrument using a 4xx command (trigger wave without sync) in the instrument pattern to start at
  a more or less random first loop offset instead of from the beginning. This is fixed in Raspberry Casket.
- Shinobi seems to have used an early beta version of Pretracker where it was possible to specify a Subloop Wait of 0. That's illegal and unsupported.
- Pattern breaks with target row >= $7f will be ignored.
- Pattern break (Dxx) + Song pos (Bxx) on the same line does not work in original Pretracker & Player: New Dxx position is ignored.
  There is code to enable it in the player, so you could in theory make backwards running tracks like in Protracker.
  But this doesn't make sense as long as the tracker itself does not support it.
- Setting the same track delay multiple times will no longer mute the delayed channel and the new volume will take effect immediately.
- Clearing the track delay (multiple times) will no longer mute the delayed channel nor cause a delay of one tick to the note played in the no-longer delayed channel.

## Changelog

### V2.x (unreleased)
- Split wave generation out of main file, reorganised content into header files.
- Optimized some more code paths for Raspberry Casket replayer.
- In the wave generator optimized away a table (32 words), replacement code is even smaller!
- Replaced the period table by byte-deltas, saved 36 bytes and compression is even better!
- Optimized some code paths for octave selection.
- Removed two tables of 25 bytes each, saving another 42 bytes.
- Completely reworked track delay handling, fixed oddities and improved output quality.
- This removes a big source of cpu jitter when track delay is enabled (no longer clearing the track delay buffer).
- This also fixes usages of illegal period 0 in the lead-in that could cause the replay to miss the first trigger.
- Moved pattern table init from PlayerInit to SongInit, optimized SongInit a bit.
- Wave order table filling moved and optimized in SongInit.
- Bugfix: Song-end detection for back-jumps was broken since at least V1.1.
- Optimized some more wave selection code.
- Nosync/sync wave selection optimized.
- Optimized wave generation a lot (esp. noise generator).
- Added Presto player draft.
- Drop-in replacement code size: 5716 bytes.

### V1.x (unreleased)
- Fixed a bug regarding the copper output mode with looping waves having a loop-offset.
- Fixed wrong register use on triggering waves regarding the loop offset.
- Minor code size optimizations.

### V1.1 (28-Dec-22)
- Optimized base displacement by reordering variables.
- Further optimized ADSR code.
- Optimized wave loop code.
- Baked the strange vibrato speed multiplication into the precalculated vibrato value (where possible).
- Various small optimizations.
- Store instrument number * 4 on loading to avoid using two adds every frame.
- Optimized speed/shuffle code. Idea of using xor turned out to make things too complicated for pattern breaks/jumps.
- Rearranged code for more short branches.
- Optimized track delay code further.
- Optimized pattern / song advance code.
- Maximum jitter now about one rasterline less, average about 0.5 rasterlines less (measurements, your mileage may vary).
- Drop-in replacement code size: 6228 bytes.

### V1.0 (26-Dec-22)

- Initial release.
- Drop-in replacement code size: 6446 bytes.