Description: An audio/visual deepfake detection method based on audio-visual speech recognition (AVSR) that leverages speech correlation across the two modalities. The model uses dual-branch encoders, one for audio and one for video, so that each modality can also be checked independently.
Scope: Audio, or video featuring a single face. A centered, frontal face is preferred, without sunglasses or other occlusions.
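
The dual-branch design described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `DualBranchDetector`, the `predict` method, the 0.5 threshold, and the placeholder `mean_score` branch are hypothetical and do not reflect the actual model or its API. The sketch only shows how two separate branches allow a score to be produced for whichever modality is supplied.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type: a branch maps a feature sequence to a fake-probability in [0, 1].
Branch = Callable[[Sequence[float]], float]


def mean_score(features: Sequence[float]) -> float:
    """Placeholder branch (assumption): clamp the mean of the features to [0, 1]."""
    return min(1.0, max(0.0, sum(features) / len(features)))


@dataclass
class DualBranchDetector:
    """Illustrative stand-in for a model with separate audio and video encoders."""

    audio_branch: Branch
    video_branch: Branch
    threshold: float = 0.5  # assumed decision threshold, not taken from the source

    def predict(
        self,
        audio: Optional[Sequence[float]] = None,
        video: Optional[Sequence[float]] = None,
    ) -> dict:
        """Score each supplied modality on its own; at least one is required."""
        if audio is None and video is None:
            raise ValueError("provide audio, video, or both")
        result = {}
        if audio is not None:
            score = self.audio_branch(audio)
            result["audio"] = {"score": score, "fake": score >= self.threshold}
        if video is not None:
            score = self.video_branch(video)
            result["video"] = {"score": score, "fake": score >= self.threshold}
        return result


detector = DualBranchDetector(audio_branch=mean_score, video_branch=mean_score)
audio_only = detector.predict(audio=[0.9, 0.7])
both = detector.predict(audio=[0.2, 0.2], video=[0.9, 0.9])
```

In this sketch an audio-only input yields only an `audio` entry, while supplying both modalities yields one independent verdict per branch. A real model would replace `mean_score` with learned encoders.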