Omri Uzan

I am a student researcher working on ML and NLP, advised by Yuval Pinter.
I work as a software engineer at Meta.

Email / Scholar / X / GitHub / LinkedIn

Research

I am broadly interested in understanding language models, their potential capabilities, inherent limitations, and future social implications. My current research focuses on:
1. Exploring the limitations imposed on LLMs by their foundational word representation schemes.
2. Advancing the study of modular NLP systems, with a focus on retrieval-augmented generation, to develop systems that are more efficient, reliable, and adaptable in real-world applications.

Papers

Tokenization Is More Than Compression
Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner
EMNLP, 2024   (Oral Presentation)
ACL Anthology /  arXiv
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter
ACL, 2024   (Oral Presentation)
🏆Outstanding Paper Award🏆
🏆Senior Area Chair Paper Award🏆
ACL Anthology /  arXiv

Preprints

Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella
arXiv

News

01.25 - Giving a talk on 'Greed is All You Need: An Evaluation of Tokenizer Inference Methods' at NLP-IL Journal Club.

12.24 - Honored to be featured on my university's website and LinkedIn for winning paper awards at ACL 2024.

09.24 - Tokenization Is More Than Compression was accepted to EMNLP 2024 main with an oral presentation.

08.24 - Greed is All You Need: An Evaluation of Tokenizer Inference Methods received both an Outstanding Paper Award and a Senior Area Chair Paper Award at ACL 2024! 🏆

05.24 - Greed is All You Need: An Evaluation of Tokenizer Inference Methods was accepted to ACL 2024 main!

This website is based on Jon Barron's website (source code here).