Research
I am broadly interested in understanding language models: their potential capabilities,
inherent limitations, and future social implications.
My current research focuses on:
1. Exploring the limitations imposed on LLMs by their foundational word representation schemes.
2. Advancing the study of modular NLP systems, with a focus on retrieval-augmented generation, to develop systems that are more efficient, reliable, and adaptable in real-world applications.
Tokenization Is More Than Compression
Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner
EMNLP, 2024   (Oral Presentation)
ACL Anthology /
arXiv
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter
ACL, 2024   (Oral Presentation)
🏆Outstanding Paper Award🏆
🏆Senior Area Chair Paper Award🏆
ACL Anthology /
arXiv
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella
arXiv
News
01.25 - Giving a talk on 'Greed is All You Need: An Evaluation of Tokenizer Inference Methods' at NLP-IL Journal Club.
12.24 - Honored to be featured on my university's website and LinkedIn for winning paper awards at ACL 2024.
09.24 - Tokenization Is More Than Compression was accepted to EMNLP 2024 main with an oral presentation.
08.24 - Greed is All You Need: An Evaluation of Tokenizer Inference Methods received both an Outstanding Paper Award and a Senior Area Chair Award at ACL 2024! 🏆
05.24 - Greed is All You Need: An Evaluation of Tokenizer Inference Methods was accepted to ACL 2024 main!
This website is based on Jon Barron's website (source code here).