My thinking is that audio would record for the entire duration of the time-lapse session.
Starting: I think the most intuitive approach would be to begin recording audio when the first photo is taken. If "Start immediately After Tap" is chosen, recording would start right after the tap, coinciding with the first photo. If a delay is chosen, it would start after the delay, again coinciding with the first photo.
In post-production, the user can always just skip the first photo if they want audio leading up to the second picture.
Stopping: Stop the recording one photo interval after the final photo. For example, if photos are taken 18 seconds apart, record an extra 18 seconds after the last one.
For example, with the 'Snap Photo Every 10 Seconds', 'Start 15 seconds after tap', and 'Stop after 25 Photos' options chosen, you would end up with a single MP3 or WAV file lasting 250 seconds. It would begin 15 seconds after the tap (with the first photo) and end 10 seconds after the last photo (#25) is taken. If the user presses 'Stop' before then, recording stops at that moment.
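The start/stop rule above boils down to a bit of arithmetic. Here is a rough Python sketch of it, just to show the numbers work out. The function name `audio_window` and its parameters are made up for illustration and aren't anything from the app itself; times are in seconds after the tap, and a delay of 0 stands in for "Start immediately After Tap".

```python
from __future__ import annotations

from typing import Optional


def audio_window(interval_s: float, delay_s: float, photo_count: int,
                 stop_pressed_at_s: Optional[float] = None) -> tuple[float, float]:
    """Hypothetical helper: return (start, end) of the audio recording,
    in seconds after the tap.

    Recording starts with the first photo (i.e. after the chosen delay)
    and ends one photo interval after the last photo, unless the user
    presses Stop earlier.
    """
    if photo_count < 1:
        raise ValueError("photo_count must be at least 1")
    start = delay_s
    # Photos are taken at delay, delay + interval, ..., so the last one
    # comes (photo_count - 1) intervals after the first.
    last_photo = delay_s + (photo_count - 1) * interval_s
    end = last_photo + interval_s  # keep recording one extra interval
    if stop_pressed_at_s is not None:
        # An early Stop cuts the recording short; a Stop before the first
        # photo means nothing was recorded (end == start).
        end = max(start, min(end, stop_pressed_at_s))
    return start, end


start, end = audio_window(interval_s=10, delay_s=15, photo_count=25)
print(start, end, end - start)
```

With the example settings (10 s interval, 15 s delay, 25 photos) this gives a window from 15 s to 265 s after the tap, which is 250 seconds of audio, the same as 25 photos × 10 seconds.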
I realize the timing might not be perfect, but for slideshow scenarios, perfect synchronization is not critical.
Having said all this, I can see your idea of recording sound samples before and/or after each photo also having its uses.