Publications

G. Yariv, I. Gat, S. Benaim, L. Wolf, I. Schwartz, Y. Adi (2023). Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. AAAI'24.

Y. Tewel, Y. Shalev, R. Nadler, I. Schwartz, L. Wolf (2023). Zero-shot video captioning with evolving pseudo-tokens. BMVC'23.

I. Schwartz, V. Snæbjarnarson, S. Benaim, H. Chefer, R. Cotterell, L. Wolf, S. Belongie (2023). Discriminative Class Tokens for Text-to-Image Diffusion Models. ICCV'23.

G. Yariv, I. Gat, L. Wolf, Y. Adi, I. Schwartz (2023). AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation. INTERSPEECH'23.

O. Hupert, I. Schwartz, L. Wolf (2022). Describing Sets of Images with Textual-PCA. Findings of EMNLP'22.

H. Chefer, I. Schwartz, L. Wolf (2022). Optimizing Relevance Maps of Vision Transformers Improves Robustness. NeurIPS'22.

Y. Tewel, Y. Shalev, I. Schwartz, L. Wolf (2022). ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic. CVPR'22.

A. Ali, I. Schwartz, T. Hazan, L. Wolf (2022). Video and Text Matching with Conditioned Embeddings. WACV'22.

T. Braude, I. Schwartz, A. G. Schwing, A. Shamir (2022). Ordered attention for coherent visual storytelling. ACM-MM'22.

I. Gat, G. Lorberbom, I. Schwartz, T. Hazan (2022). Latent space explanation by intervention. AAAI'22.

I. Gat, I. Schwartz, A. G. Schwing (2021). Perceptual Score: Measuring Perceptiveness of Multi-Modal Classifiers. NeurIPS'21.

I. Schwartz (2021). Ensemble of MRR and NDCG models for Visual Dialog. NAACL'21.

I. Gat, I. Schwartz, A. G. Schwing, T. Hazan (2020). Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies. NeurIPS'20.

I. Schwartz, A. G. Schwing, T. Hazan (2019). Factor Graph Attention. CVPR'19.

I. Schwartz, A. G. Schwing, T. Hazan (2019). A Simple Baseline for Audio-Visual Scene-Aware Dialog. CVPR'19.

I. Schwartz, A. G. Schwing, T. Hazan (2017). High-Order Attention Models for Visual Question Answering. NIPS'17.