VIOLET : End-to-End Video-Language Transformers with Masked Visual-tokenModeling

· Dev.to