Why do I see jitter in automatic mode, but not in manual mode, when the input and output are at the same sample rate?
A 48 kHz clock is used as both the input clock (with 2-3 ns of jitter) and the output clock; the two clocks have the same frequency but different phase. Monitoring the output shows jitter of about 28 us when automatic ratio tracking is used.
If automatic tracking is turned off and manual mode is used, the jitter disappears.
This behavior can be expected in some situations, but it should not be a problem in any practical sense.
Why is this not a problem? Keep in mind that the ear is sensitive primarily to frequency, and to phase only as it pertains to phase differences between the two ears. Because AES audio is processed as a stereo pair, there is no phase difference between the left and right channels. The more important figure of merit is THD+N (total harmonic distortion plus noise), which essentially reflects errors in the frequency content of the signal. The Asynchronous Sample Rate Converter (ASRC) core is optimized to give the best possible THD performance.
Variation of the audio phase relative to the video phase only becomes relevant when the offset is on the order of tens of milliseconds, which is hundreds to thousands of times larger than the 28 us variation in this example.
Why is there any variation? Automatic ratio tracking is a closed-loop control system. It strives to lock quickly, track closely, and above all remain stable once locked so that it does not introduce distortion. When locked, its goal is to keep the FIFO level at, or very close to, 16 by adjusting the ratio in steps of 0.25 ppm, which are small enough that the adjustments add no measurable noise. To keep the ratio stable, the loop has a dead zone of approximately one FIFO location: when the FIFO is within one sample of the target fill level, no adjustment occurs.

What was measured and called jitter above is simply the effect of this dead zone. The level is reported in units of 1/16 of a sample location, and it varies by 19 units, or about one sample (19/16 ≈ 1.19). In this example, the audio phase relative to video varies by approximately 28 us, or about 1 1/3 sample periods at 48 kHz (one sample period is about 21 us). The audio-to-video phase variation of a little over one sample therefore corresponds to the variation of the level, which in turn corresponds to the dead zone of the control loop. All of this is expected and required for stable automatic ratio tracking.
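The unit conversions in the previous paragraph can be checked with a few lines of Python. The 48 kHz rate, the 1/16-sample level units, the 19-unit swing and the 28 us measurement come from the text; the 20 ms lip-sync threshold is only an assumed low end of "tens of milliseconds", used to put a number on the headroom.

```python
SAMPLE_RATE_HZ = 48_000
sample_period_us = 1e6 / SAMPLE_RATE_HZ             # one 48 kHz sample period, about 20.83 us

LEVEL_UNITS_PER_SAMPLE = 16                          # FIFO level is reported in 1/16-sample units
observed_level_swing = 19                            # level variation quoted in the text
level_swing_samples = observed_level_swing / LEVEL_UNITS_PER_SAMPLE  # 1.1875 samples

measured_phase_swing_us = 28.0                       # audio-to-video variation quoted in the text
phase_swing_samples = measured_phase_swing_us / sample_period_us     # about 1.34 samples

# Assumption: 20 ms taken as the low end of "tens of milliseconds".
ASSUMED_LIP_SYNC_THRESHOLD_US = 20_000.0
margin = ASSUMED_LIP_SYNC_THRESHOLD_US / measured_phase_swing_us     # about 714x headroom

print(f"sample period:     {sample_period_us:.2f} us")
print(f"level swing:       {level_swing_samples:.2f} samples")
print(f"phase swing:       {phase_swing_samples:.2f} samples")
print(f"lip-sync headroom: {margin:.0f}x")
```

Both the level swing and the phase swing come out at a little over one sample, which is the point of the comparison.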
Why is there no variation in manual mode? In manual mode, no control loop adjusts the ratio, so there is no variation in the ratio and therefore no variation in the relative phase of audio and video.
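To see why a dead zone produces a bounded wander in automatic mode but none in manual mode, here is a toy model of a FIFO servo. It is a hypothetical sketch, not the ASRC core's actual control law: `update_ratio`, `simulate`, the simple step-up/step-down rule, the 100 ppm starting ratio error, and the reading of the target "16" as 16 samples are all assumptions. Only the 0.25 ppm step, the target of 16 and the roughly one-sample dead zone come from the text.

```python
TARGET_LEVEL = 16.0   # target FIFO fill level, assumed here to be in samples
DEAD_ZONE = 1.0       # no adjustment while within one sample of the target
RATIO_STEP = 0.25e-6  # ratio adjustment step of 0.25 ppm


def update_ratio(level: float, ratio: float) -> float:
    """One decision of a hypothetical dead-zone servo on the conversion ratio."""
    error = level - TARGET_LEVEL
    if abs(error) <= DEAD_ZONE:
        return ratio  # inside the dead zone: leave the ratio alone
    # FIFO too full: consume input slightly faster; too empty: slightly slower.
    return ratio + RATIO_STEP if error > 0 else ratio - RATIO_STEP


def simulate(n_out_samples: int, ratio: float, automatic: bool) -> list[float]:
    """Track the FIFO level when input and output clocks have the same frequency.

    Per output sample, one input sample is written and `ratio` input samples
    are consumed, so the level changes by (1 - ratio).
    """
    level = TARGET_LEVEL
    history = []
    for _ in range(n_out_samples):
        level += 1.0 - ratio
        if automatic:
            ratio = update_ratio(level, ratio)
        history.append(level)
    return history


# Automatic mode, starting from an assumed 100 ppm ratio error.
auto = simulate(200_000, ratio=1.0 + 100e-6, automatic=True)
# Manual mode, with the ratio set exactly for equal clocks.
manual = simulate(200_000, ratio=1.0, automatic=False)

print(f"automatic: level swing {max(auto) - min(auto):.2f} samples")
print(f"manual:    level swing {max(manual) - min(manual):.2f} samples")
```

In this model the automatic-mode level drifts slowly back and forth across the dead zone, with only a small overshoot beyond it, while the manual-mode level never moves because the ratio is fixed. The size of the wander is set by the dead-zone width, not by the input clock jitter.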
Note also that the audio output is not simply a delayed version of the input, even though the same sampling clock is used. The output is a completely different set of samples, synthesized to match the frequency content of the input; in this case the input and output just happen to be at exactly the same frequency. For this reason, if there are multiple channels, it is recommended that all of them be passed through the ASRC so that the channels stay properly aligned.
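As a rough illustration of "synthesized" samples, the sketch below uses linear interpolation as a stand-in for the ASRC's interpolation filter; a real ASRC would use a much higher-order filter, and the function `resample_same_rate` and the 0.5-sample phase offset are hypothetical. Even at the same rate, none of the output values is a copy of an input value, yet together they still describe the same 1 kHz tone.

```python
import math


def resample_same_rate(x: list[float], phase: float) -> list[float]:
    """Synthesize samples at times n + phase (0 <= phase < 1) by linear interpolation.

    Toy stand-in for an ASRC interpolation filter: input and output rates are
    equal, only the sampling instants differ.
    """
    return [(1.0 - phase) * x[n] + phase * x[n + 1] for n in range(len(x) - 1)]


FS = 48_000.0   # sample rate (Hz), same for input and output
TONE = 1_000.0  # test tone (Hz)
PHASE = 0.5     # hypothetical output-clock phase offset, in input samples

x = [math.sin(2 * math.pi * TONE * n / FS) for n in range(48)]
y = resample_same_rate(x, PHASE)

# None of the output values is a copy of an input value...
copies = sum(1 for v in y if v in x)
# ...but they closely follow the same tone, evaluated at the shifted instants.
max_err = max(abs(v - math.sin(2 * math.pi * TONE * (n + PHASE) / FS))
              for n, v in enumerate(y))

print(f"copied samples: {copies}, max deviation from ideal tone: {max_err:.4f}")
```

Because each channel's output is newly computed rather than copied, channels stay aligned only if they are all resampled with the same ratio and phase, which is why passing every channel through the ASRC is recommended.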