An interesting post by Tomáš Mikolov

"Yesterday we received a Test of Time Award at NeurIPS for the word2vec paper from ten years ago. I'm really happy about it! I think it's the first "best paper" type of award I ever received. In fact, the original word2vec paper was rejected at the first ICLR conference in 2013 (despite the acceptance rate of around 70%), so it made me think how difficult it is for reviewers to predict future impact of research papers.

I've heard a lot of comments about word2vec over the years - both positive and negative - and never really commented on it online. I felt the research community was constantly flooded by propaganda-style PR from certain researchers who hack citation counts and others' attention this way, and I did not want to be part of that. But after ten years, I think it could be entertaining to share some stories associated with this paper.

One frequent comment I've heard was that the code was difficult to understand, to the point that some people thought I had made it unreadable intentionally. But no, I'm not that evil :D The code ended up being over-optimized because I waited many months for approval to publish it, and in the meantime kept trying to make it both faster and shorter. Looking back, if Greg and Jeff had not been on the Brain team, I doubt I would ever have gotten that approval - I think word2vec was likely the first widely known AI project that Google open-sourced.

There was also significant controversy around the GloVe project from the Stanford NLP group, published more than a year after word2vec. While it copied many tricks from our project, GloVe always felt like a step back to me: it was slower, required more memory, and the resulting vectors had lower quality than the original word2vec. However, it was published with word vectors pre-trained on much more data and thus gained a lot of popularity - although the comparison was really apples-to-oranges. We fixed this later in the fastText project, where we showed that word2vec is much better than GloVe when trained on the same data.

I also received a lot of comments on the word analogies - from "I knew that too but forgot to publish it!" (Geoff Hinton, I believe you :) it happens to everyone, and anyway I think everybody knows where Distributed Representations originated) to "it's a total hack and I'm sure it doesn't work!" (random guys who didn't bother to read the papers and try it out themselves - including Ian Goodfellow raging about it on Twitter).
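For anyone who has not tried the analogies themselves, here is a minimal sketch of the vector arithmetic in question, using gensim's KeyedVectors interface. The pre-trained model name "word2vec-google-news-300" is an assumption about what gensim's downloader ships, not something taken from the original post.

```python
# Minimal sketch: the "king - man + woman ~= queen" analogy with gensim.
# Assumes gensim is installed and the "word2vec-google-news-300" dataset
# is available via gensim's downloader (~1.6 GB on first use).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# The analogy is solved by adding/subtracting the word vectors and then
# returning the nearest neighbour by cosine similarity.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected: [('queen', <similarity score>)]
```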

Despite word2vec being my most cited paper, I never thought of it as my most impactful project. In fact, the word2vec code originally started as a subset of my previous project, RNNLM, which I think was forgotten too quickly. In my eyes, it was at least as revolutionary as AlexNet. To name just a few ideas first demonstrated within RNNLM back in 2010 (when it was still the dark ages for deep learning): scalable training of recurrent neural networks (I invented gradient clipping for this), the first text generation from a neural language model (I had been showing examples of this since 2007), dynamic evaluation, character- and sub-word-level neural language modeling, neural language model adaptation (nowadays called fine-tuning), and the first publicly available LM benchmark (the modified Penn Treebank dataset - there really was nothing like this on the web when I started my PhD). I also published the first study showing that, when everything is done correctly, neural nets beat n-gram language models by an increasing margin as training data grows (today this sounds obvious, but back then it was widely considered impossible - even most people at Google thought that the more data you have, the more futile it is to work on anything besides n-grams and smoothing techniques).
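As a side note on the gradient clipping mentioned above, the sketch below shows the form most commonly used today: rescaling the global gradient norm in PyTorch. Mikolov's original RNNLM toolkit clipped gradients differently, so treat this as an illustration of the idea, not his exact recipe; the model, loss, and hyperparameters are placeholders.

```python
# Sketch of gradient clipping by global norm during RNN training (PyTorch).
import torch
import torch.nn as nn

model = nn.RNN(input_size=128, hidden_size=256, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 20, 128)      # dummy batch: (batch, seq_len, features)
output, _ = model(x)
loss = output.pow(2).mean()      # placeholder loss for the sketch

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their combined L2 norm does not exceed 1.0,
# preventing the exploding-gradient updates that destabilise RNN training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```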

Source: gonzo-обзоры ML статей, group-telegram.com/gonzo_ML/2178