Bugfix for songend detection and more optimizations.

This commit is contained in:
Chris Hodges 2023-08-19 21:40:08 +02:00
parent 36d435f502
commit 815d5f6e29
4 changed files with 89 additions and 71 deletions

View File

@ -15,6 +15,7 @@ surplus code (maybe artefacts from earlier ideas that did nothing),
optimizing the code where possible. This resulted in both reduced optimizing the code where possible. This resulted in both reduced
size of the replayer, faster sample calculation and speeding the size of the replayer, faster sample calculation and speeding the
tick routine up significantly. tick routine up significantly.
Bugs from the original replayer were fixed.
I also added a few optional features that come in handy, such as I also added a few optional features that come in handy, such as
song-end detection and precalc progress support. song-end detection and precalc progress support.
@ -25,9 +26,8 @@ Pretracker 1.5 tunes, given you don't use sfx or sub-songs.
Also: Open source. It's 2023, keeping the code closed is just not Also: Open source. It's 2023, keeping the code closed is just not
part of the demoscene spirit (anymore?), at least for a replayer. part of the demoscene spirit (anymore?), at least for a replayer.
Also note that this is not the final state of the source code. This player is still being optimized and worked on since its
I could go over many places still and try to rework them. first release in late 2022.
But I wanted the code to be out in public.
Productions that I know have been using Raspberry Casket so far: Productions that I know have been using Raspberry Casket so far:
@ -44,7 +44,12 @@ During the process this identical state and identical samples promise
had to be dropped due to bugs in the original player and optimizations. had to be dropped due to bugs in the original player and optimizations.
This is especially the case for the track delay feature of Pretracker This is especially the case for the track delay feature of Pretracker
that could in some cases cause odd behaviour and unwanted muting that that could in some cases cause odd behaviour and unwanted muting that
has been fixed in Raspberry Casket. has been fixed in Raspberry Casket. So the verification is now heavily
reduced to about 20 songs that still are identical.
I do, however, now also have an emulated Paula output verification that
compares the generated sound between the original code and Raspberry Casket.
Divergences are manually checked from time to time.
If you find some problems, If you find some problems,
please let me know under chrisly@platon42.de. Thank you. please let me know under chrisly@platon42.de. Thank you.
@ -96,7 +101,7 @@ The original code compressed with *Blueberry's* Shrinkler goes from
18052 bytes down to 9023 bytes. 18052 bytes down to 9023 bytes.
Raspberry Casket, depending on the features compiled in, is about Raspberry Casket, depending on the features compiled in, is about
5840 bytes and shrinkles down to ~4144 bytes (in isolation). 5802 bytes and shrinkles down to ~4125 bytes (in isolation).
So this means that the optimization is not just "on the outside". So this means that the optimization is not just "on the outside".
@ -107,7 +112,7 @@ the remaining code for playback.
#### Sample precalculation #### Sample precalculation
Sample generation is a faster than the original 1.0 player and also Sample generation is faster than the original 1.0 player and also
faster than the 1.5 player, which got a slightly better performance faster than the 1.5 player, which got a slightly better performance
than the 1.0 one (compiler change?). than the 1.0 one (compiler change?).
@ -161,7 +166,9 @@ solve this problem.
- Moved pattern table init from PlayerInit to SongInit, optimized SongInit a bit. - Moved pattern table init from PlayerInit to SongInit, optimized SongInit a bit.
- Wave order table filling moved and optimized in SongInit. - Wave order table filling moved and optimized in SongInit.
- Added Presto player draft. - Added Presto player draft.
- Drop-in replacement code size: 5840 bytes. - Bugfix: Songend detection for back-jumps was broken since at least V1.1.
- Optimized some more wave selection code.
- Drop-in replacement code size: 5802 bytes.
### V1.x (unreleased) ### V1.x (unreleased)
- Fixed a bug regarding the copper output mode with looping waves having a loop-offset. - Fixed a bug regarding the copper output mode with looping waves having a loop-offset.

Binary file not shown.

View File

@ -1,5 +1,5 @@
;-------------------------------------------------------------------- ;--------------------------------------------------------------------
; Raspberry Casket Player V2.x (16-Aug-2023) ; Raspberry Casket Player V2.x (19-Aug-2023)
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ;
; Provided by Chris 'platon42' Hodges <chrisly@platon42.de> ; Provided by Chris 'platon42' Hodges <chrisly@platon42.de>
@ -16,24 +16,33 @@
; optimizing the code where possible. This resulted in both reduced ; optimizing the code where possible. This resulted in both reduced
; size of the replayer, faster sample calculation and speeding the ; size of the replayer, faster sample calculation and speeding the
; tick routine up significantly. ; tick routine up significantly.
; Bugs from the original replayer were fixed.
; ;
; I also added a few optional features that come in handy, such as ; I also added a few optional features that come in handy, such as
; song-end detection and precalc progress support. ; song-end detection and precalc progress support.
; ;
; It took me more than a month and it was not fun.
;
; Also: Open source. It's 2023, keeping the code closed is just not ; Also: Open source. It's 2023, keeping the code closed is just not
; part of the demoscene spirit (anymore?), at least for a replayer. ; part of the demoscene spirit (anymore?), at least for a replayer.
; ;
; Also note that this is not the final state of the source code. ; This player is still being optimized and worked on since its
; I could go over many places still and try to rework them. ; first release in late 2022.
; But I wanted the code to be out in public.
; ;
; Verification ; Verification
; ~~~~~~~~~~~~ ; ~~~~~~~~~~~~
; The replayer has been verified on about 60 Pretracker tunes to ; The first versions of the replayer had been verified against about
; create an identical internal state for each tick and identical ; 60 Pretracker tunes to create an identical internal state for each tick
; samples (if certain optimizations switches are disabled). ; and identical samples (if certain optimizations switches are disabled).
;
; During the process this identical state and identical samples promise
; had to be dropped due to bugs in the original player and optimizations.
; This is especially the case for the track delay feature of Pretracker
; that could in some cases cause odd behaviour and unwanted muting that
; has been fixed in Raspberry Casket. So the verification is now heavily
; reduced to about 20 songs that still are identical.
;
; I do, however, now also have an emulated Paula output verification
; that compares the generated sound between the original code and
; Raspberry Casket. Divergences are manually checked from time to time.
; ;
; I might have introduced bugs though. If you find some problems, ; I might have introduced bugs though. If you find some problems,
; please let me know under chrisly@platon42.de. Thank you. ; please let me know under chrisly@platon42.de. Thank you.
@ -87,20 +96,35 @@
; 18052 bytes down to 9023 bytes. ; 18052 bytes down to 9023 bytes.
; ;
; Raspberry Casket, depending on the features compiled in, is about ; Raspberry Casket, depending on the features compiled in, is about
; 5840 bytes and shrinkles down to ~4144 bytes (in isolation). ; 5802 bytes and shrinkles down to ~4125 bytes (in isolation).
; ;
; So this means that the optimization is not just "on the outside". ; So this means that the optimization is not just "on the outside".
; ;
; About 2.4 KB of the code (and data) are spent for the sample generation,
; the remaining code for playback.
;
; Timing ; Timing
; ~~~~~~ ; ~~~~~~
; Sample generation is a bit faster (I guess around 10-15%), but most ;
; of the time is spent on muls operations, so this is the limiting ; 1. Sample precalculation
; factor. ;
; Sample generation is faster than the original 1.0 player and also
; faster than the 1.5 player, which got a slightly better performance
; than the 1.0 one (compiler change?).
;
; According to my measurements on my set of Pretracker tunes,
; Raspberry Casket needs between 10% to 20% less instructions.
; Of these instructions, about 5% are `muls` operations and the new
; player is only able to shave off between 3% and 8% percent of those,
; so this is probably the limiting factor.
;
; 2. Playback
; ;
; Raspberry Casket is about twice as fast as the old replayer for playback. ; Raspberry Casket is about twice as fast as the old replayer for playback.
; ;
; Unfortunately, the replayer is still pretty slow and has high ; Unfortunately, the replayer is still pretty slow and has high
; jitter compared to other standard music replayers. ; jitter compared to other standard music replayers.
;
; This means it may take up to 32 raster lines (13-18 on average) ; This means it may take up to 32 raster lines (13-18 on average)
; which is significant more than a standard Protracker replayer ; which is significant more than a standard Protracker replayer
; (the original one could take about 60 raster lines worst case and ; (the original one could take about 60 raster lines worst case and
@ -745,10 +769,9 @@ pre_PlayerTick:
; Nothing sets pcd_pat_adsr_rel_delay_b from inside the player. ; Nothing sets pcd_pat_adsr_rel_delay_b from inside the player.
; It is used as a counter to release a note (ADSR) after a given time. ; It is used as a counter to release a note (ADSR) after a given time.
; It's not the same as the instrument ADSR release (see pcd_note_off_delay_b) ; It's not the same as the instrument ADSR release (see pcd_note_off_delay_b)
move.b pcd_pat_adsr_rel_delay_b(a5),d1 tst.b pcd_pat_adsr_rel_delay_b(a5)
ble.s .handle_2nd_instrument ble.s .handle_2nd_instrument
subq.b #1,d1 subq.b #1,pcd_pat_adsr_rel_delay_b(a5)
move.b d1,pcd_pat_adsr_rel_delay_b(a5)
bne.s .handle_2nd_instrument bne.s .handle_2nd_instrument
move.w pcd_adsr_volume_w(a5),d3 move.w pcd_adsr_volume_w(a5),d3
@ -769,10 +792,9 @@ pre_PlayerTick:
move.b pcd_pat_2nd_inst_num4_b(a5),d1 move.b pcd_pat_2nd_inst_num4_b(a5),d1
beq.s .handle_current_instrument beq.s .handle_current_instrument
move.b pcd_pat_2nd_inst_delay_b(a5),d3 tst.b pcd_pat_2nd_inst_delay_b(a5)
beq.s .trigger_2nd_instrument beq.s .trigger_2nd_instrument
subq.b #1,d3 subq.b #1,pcd_pat_2nd_inst_delay_b(a5)
move.b d3,pcd_pat_2nd_inst_delay_b(a5)
bra.s .handle_current_instrument bra.s .handle_current_instrument
.trigger_2nd_instrument .trigger_2nd_instrument
@ -1027,7 +1049,7 @@ pre_PlayerTick:
beq.s .release_note beq.s .release_note
tst.b d7 tst.b d7
beq.s .start_patt_effect_handling beq .start_patt_effect_handling
add.w d7,d0 ; pattern pitch shift add.w d7,d0 ; pattern pitch shift
lsl.w #4,d0 lsl.w #4,d0
@ -1274,7 +1296,7 @@ pre_PlayerTick:
.pat_play_nop .pat_play_nop
.pat_play_cont .pat_play_cont
cmp.b #NUM_CHANNELS-1,pcd_channel_num_b(a5) cmp.b #NUM_CHANNELS-1,pcd_channel_num_b(a5)
beq .pat_channels_loop_end beq.s .pat_channels_loop_end
lea pcd_SIZEOF(a5),a5 lea pcd_SIZEOF(a5),a5
@ -1290,7 +1312,7 @@ pre_PlayerTick:
; Pattern advancing and pattern break and jump handling. Song looping and song-end detection. ; Pattern advancing and pattern break and jump handling. Song looping and song-end detection.
.pattern_advancing .pattern_advancing
subq.b #1,pv_pat_line_ticks_b(a4) subq.b #1,pv_pat_line_ticks_b(a4)
bne .no_pattern_advance bne.s .no_pattern_advance
; clear note delay info ; clear note delay info
moveq.l #0,d0 moveq.l #0,d0
@ -1334,8 +1356,8 @@ pre_PlayerTick:
bmi.s .no_new_position ; has a new pattern position bmi.s .no_new_position ; has a new pattern position
st pv_next_pat_pos_b(a4) st pv_next_pat_pos_b(a4)
IFNE PRETRACKER_SONG_END_DETECTION IFNE PRETRACKER_SONG_END_DETECTION
cmp.b d4,d3 cmp.b d3,d4
bgt.s .no_backjump bhi.s .no_backjump
st pv_songend_detected_b(a4) ; detect jumping back st pv_songend_detected_b(a4) ; detect jumping back
.no_backjump .no_backjump
ENDC ENDC
@ -1349,7 +1371,7 @@ pre_PlayerTick:
.no_new_position .no_new_position
cmp.w sv_pat_pos_len_w(a6),d3 cmp.w sv_pat_pos_len_w(a6),d3
blt.s .no_restart_song blo.s .no_restart_song
move.w sv_pat_restart_pos_w(a6),d3 move.w sv_pat_restart_pos_w(a6),d3
IFNE PRETRACKER_SONG_END_DETECTION IFNE PRETRACKER_SONG_END_DETECTION
st pv_songend_detected_b(a4) st pv_songend_detected_b(a4)
@ -1472,7 +1494,7 @@ pre_PlayerTick:
tst.b d3 tst.b d3
bne.s .skippitchloading bne.s .skippitchloading
andi.w #$3F,d1 and.w #$3f,d1
beq.s .skippitchloading ; no new note beq.s .skippitchloading ; no new note
subq.w #1,d1 subq.w #1,d1
lsl.w #4,d1 lsl.w #4,d1
@ -1516,21 +1538,20 @@ pre_PlayerTick:
; d6 = scratch ; d6 = scratch
; ---------------------------------------- ; ----------------------------------------
.inst_select_wave .inst_select_wave
pea .inst_cmd_cont_next(pc)
subq.w #1,d5 subq.w #1,d5
cmp.w #MAX_WAVES,d5 cmp.w #MAX_WAVES,d5
bhs .inst_cmd_cont_next bhs.s .inst_set_wave_rts
add.w d5,d5 add.w d5,d5
add.w d5,d5 add.w d5,d5
cmp.w pcd_inst_wave_num4_w(a5),d5 cmp.w pcd_inst_wave_num4_w(a5),d5
beq .inst_cmd_cont_next beq.s .inst_set_wave_rts
.inst_select_wave_subroutine
move.w d5,pcd_inst_wave_num4_w(a5) move.w d5,pcd_inst_wave_num4_w(a5)
move.l sv_waveinfo_table(a6,d5.w),a3 move.l sv_waveinfo_table(a6,d5.w),a3
move.l a3,pcd_waveinfo_ptr(a5) move.l a3,pcd_waveinfo_ptr(a5)
move.l pv_wave_sample_table(a4,d5.w),a1 move.l pv_wave_sample_table(a4,d5.w),a1
pea .inst_cmd_cont_next(pc)
.inst_select_wave_subroutine
move.w wi_chipram_w(a3),d5 move.w wi_chipram_w(a3),d5
move.b pcd_channel_mask_b(a5),d3 move.b pcd_channel_mask_b(a5),d3
@ -1542,16 +1563,19 @@ pre_PlayerTick:
tst.w d6 tst.w d6
bne.s .inst_select_wave_nosync bne.s .inst_select_wave_nosync
move.w wi_loop_offset_w(a3),d6 ; is unlikely 32768 move.w wi_loop_offset_w(a3),d6 ; is unlikely >= 32768 -- if it is, it will be past end of sample
tst.w wi_subloop_len_w(a3) tst.w wi_subloop_len_w(a3)
bne.s .inst_set_wave_has_subloop bne.s .inst_set_wave_has_subloop
.inst_set_wave_has_no_subloop .inst_set_wave_has_no_subloop
adda.w d6,a1 ; add loop offset adda.w d6,a1 ; add loop offset (which is actually not a loop offset for one-shot samples)
sub.w d6,d5 sub.w d6,d5
;cmp.w #1,d5 ; not necessary as increases in steps of 2 ;cmp.w #1,d5 ; not necessary as increases in steps of 2
bhi.s .inst_set_wave_has_min_length bhi.s .inst_set_wave_has_min_length
moveq.l #2,d5 moveq.l #2,d5
IFNE PRETRACKER_BUGFIX_CODE
move.l pv_sample_buffer_ptr(a4),a1 ; fix start address to empty sample
ENDC
.inst_set_wave_has_min_length .inst_set_wave_has_min_length
move.w d5,pcd_out_len_w(a5) move.w d5,pcd_out_len_w(a5)
@ -1566,7 +1590,7 @@ pre_PlayerTick:
move.b wi_subloop_wait_b(a3),d5 move.b wi_subloop_wait_b(a3),d5
addq.w #1,d5 addq.w #1,d5
move.w d5,pcd_inst_subloop_wait_w(a5) move.w d5,pcd_inst_subloop_wait_w(a5)
.inst_set_wave_rts
rts rts
; ---------------------------------------- ; ----------------------------------------
@ -1763,20 +1787,14 @@ pre_PlayerTick:
bpl.s .inst_wave_selected bpl.s .inst_wave_selected
.inst_no_wave_selected ; FIXME this code is dubious at best -- it selects wave 0 if no wave was selected before .inst_no_wave_selected ; FIXME this code is dubious at best -- it selects wave 0 if no wave was selected before
clr.w pcd_inst_wave_num4_w(a5) moveq.l #0,d5
move.l sv_waveinfo_ptr(a6),a3
move.l a3,pcd_waveinfo_ptr(a5)
move.l pv_wave_sample_table(a4),a1
moveq.l #0,d6 moveq.l #0,d6
bsr .inst_select_wave_subroutine bsr .inst_select_wave_subroutine
.inst_wave_selected .inst_wave_selected
cmp.b #$FF,d2 cmp.b #$ff,d2
beq.s .inst_pat_loop_exit3 beq.s .inst_pat_loop_exit3
subq.b #1,d2 subq.b #1,pcd_inst_line_ticks_b(a5)
move.b d2,pcd_inst_line_ticks_b(a5)
.inst_pat_loop_exit3 .inst_pat_loop_exit3
.inst_no_inst_active .inst_no_inst_active
@ -1879,10 +1897,9 @@ pre_PlayerTick:
move.w d2,pcd_adsr_volume_w(a5) move.w d2,pcd_adsr_volume_w(a5)
; handle note cut-off command (EAx command) ; handle note cut-off command (EAx command)
move.b pcd_note_off_delay_b(a5),d4 tst.b pcd_note_off_delay_b(a5)
beq.s .dont_release_note beq.s .dont_release_note
subq.b #1,d4 subq.b #1,pcd_note_off_delay_b(a5)
move.b d4,pcd_note_off_delay_b(a5)
bne.s .dont_release_note bne.s .dont_release_note
; cut off note ; cut off note
clr.w pcd_adsr_volume_w(a5) clr.w pcd_adsr_volume_w(a5)
@ -1943,12 +1960,12 @@ pre_PlayerTick:
bra.s .wave_submove_cont bra.s .wave_submove_cont
.wave_with_subloop_but_no_wave_offset .wave_with_subloop_but_no_wave_offset
move.w pcd_inst_loop_offset_w(a5),d1
subq.w #1,pcd_inst_subloop_wait_w(a5) subq.w #1,pcd_inst_subloop_wait_w(a5)
bgt.s .wave_subloop_wait bgt.s .wave_subloop_wait
; subloop moves! ; subloop moves!
move.b pcd_inst_ping_pong_dir_b(a5),d4 move.b pcd_inst_ping_pong_dir_b(a5),d4
move.w pcd_inst_loop_offset_w(a5),d1
.wave_submove_cont .wave_submove_cont
; reset subloop wait ; reset subloop wait
moveq.l #0,d5 moveq.l #0,d5
@ -1996,18 +2013,13 @@ pre_PlayerTick:
subq.w #1,pcd_inst_subloop_wait_w(a5) ; why, oh why? subq.w #1,pcd_inst_subloop_wait_w(a5) ; why, oh why?
.wave_new_loop_pos_fits .wave_new_loop_pos_fits
move.w d1,d4 move.w d1,pcd_inst_loop_offset_w(a5)
move.w d4,pcd_inst_loop_offset_w(a5)
bra.s .done_lof_calc
.wave_subloop_wait .wave_subloop_wait
move.w pcd_inst_loop_offset_w(a5),d4 move.w d1,pcd_out_lof_w(a5)
.done_lof_calc
move.w d4,pcd_out_lof_w(a5)
move.w pcd_inst_wave_num4_w(a5),d1 ; FIXME can we move this to wave loading? moveq.l #0,d1
move.l pv_wave_sample_table(a4,d1.w),pcd_out_ptr_l(a5) bra.s .wave_load_sample_offset
bra.s .loop_handling_done
.wave_has_no_subloop .wave_has_no_subloop
moveq.l #0,d1 moveq.l #0,d1
@ -2036,7 +2048,7 @@ pre_PlayerTick:
ENDC ENDC
.waveoffset_is_not_past_end .waveoffset_is_not_past_end
move.w d2,pcd_out_len_w(a5) move.w d2,pcd_out_len_w(a5)
.wave_load_sample_offset
move.w pcd_inst_wave_num4_w(a5),d2 move.w pcd_inst_wave_num4_w(a5),d2
add.l pv_wave_sample_table(a4,d2.w),d1 add.l pv_wave_sample_table(a4,d2.w),d1
move.l d1,pcd_out_ptr_l(a5) move.l d1,pcd_out_ptr_l(a5)
@ -2059,10 +2071,9 @@ pre_PlayerTick:
; ---------------------------------------- ; ----------------------------------------
; vibrato processing ; vibrato processing
move.b pcd_vibrato_delay_w+1(a5),d1 tst.b pcd_vibrato_delay_w+1(a5)
beq.s .vibrato_already_active beq.s .vibrato_already_active
subq.b #1,d1 subq.b #1,pcd_vibrato_delay_w+1(a5)
move.b d1,pcd_vibrato_delay_w+1(a5)
bne.s .vibrato_still_delayed bne.s .vibrato_still_delayed
.vibrato_already_active .vibrato_already_active
@ -2121,8 +2132,8 @@ pre_PlayerTick:
beq.s .has_no_loop_offset beq.s .has_no_loop_offset
subq.w #1,d7 subq.w #1,d7
lsr.w d1,d7 lsr.w d1,d7
move.w d7,pcd_out_lof_w(a5) ; halve/quarter/eighth loop offset
lsr.w d1,d3 lsr.w d1,d3
move.w d7,pcd_out_lof_w(a5) ; halve/quarter/eighth loop offset
move.w d3,pcd_out_len_w(a5) ; halve/quarter/eighth loop length move.w d3,pcd_out_len_w(a5) ; halve/quarter/eighth loop length
bra.s .cont_after_loop_fix bra.s .cont_after_loop_fix
@ -2306,7 +2317,7 @@ pre_PlayerTick:
or.b d5,d2 or.b d5,d2
move.b d2,(a3)+ ; ocd_trigger (this track) move.b d2,(a3)+ ; ocd_trigger (this track)
or.b d2,pv_trigger_mask_w+1(a4) or.b d2,pv_trigger_mask_w+1(a4)
bra.s .check_next_channel bra .check_next_channel
; ---------------------------------------- ; ----------------------------------------
.updatechannels .updatechannels

View File

@ -1108,7 +1108,7 @@ pre_WaveGen:
btst #5,wi_mod_density_b(a3) ; post bit btst #5,wi_mod_density_b(a3) ; post bit
beq.s .nopostmodulator beq.s .nopostmodulator
bsr pre_Modulator bsr.s pre_Modulator
.nopostmodulator .nopostmodulator
; ---------------------------------------- ; ----------------------------------------