An interesting post by Tomáš Mikolov

"Yesterday we received a Test of Time Award at NeurIPS for the word2vec paper from ten years ago. I'm really happy about it! I think it's the first "best paper" type of award I ever received. In fact, the original word2vec paper was rejected at the first ICLR conference in 2013 (despite the acceptance rate of around 70%), so it made me think how difficult it is for reviewers to predict future impact of research papers.

I've heard a lot of comments about word2vec over the years - both positive and negative - and never really commented on it online. I felt the research community was constantly flooded by propaganda-style PR from certain researchers who hack citation counts and others' attention this way, and I did not want to be part of that. But after ten years, I think it could be entertaining to share some stories associated with this paper.

One frequent comment I've heard was that the code was difficult to understand, to the point that some people thought I had made it unreadable intentionally. But no, I'm not that evil :D The code ended up being over-optimized because I waited many months for approval to publish it, and in the meantime kept trying to make it both faster and shorter. Looking back, if Greg and Jeff had not been on the Brain team, I doubt I would ever have gotten that approval - I think word2vec was likely the first widely known AI project that Google open-sourced.

There was also significant controversy around the GloVe project from the Stanford NLP group, published more than a year after word2vec. While it copied many tricks from our project, GloVe always felt like a step back to me: it was slower, required more memory, and the resulting vectors had lower quality than the original word2vec. However, it was published with word vectors pre-trained on much more data and thus gained a lot of popularity - although the comparison was really apples-to-oranges. We fixed this later in the fastText project, where we showed that word2vec is much better than GloVe when trained on the same data.

I also received a lot of comments on the word analogies - from "I knew that too but forgot to publish it!" (Geoff Hinton, I believe you :) it happens to everyone, and anyway I think everybody knows where Distributed Representations originated) to "it's a total hack and I'm sure it doesn't work!" (random guys who didn't bother to read the papers and try it out themselves - including Ian Goodfellow raging about it on Twitter).
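For anyone who has not tried the analogies themselves, here is a minimal sketch of the vector arithmetic in question, using gensim's KeyedVectors interface. The pre-trained model name "word2vec-google-news-300" is an assumption about what gensim's downloader ships, not something taken from the original post.

```python
# Minimal sketch: the "king - man + woman ~= queen" analogy with gensim.
# Assumes gensim is installed and the "word2vec-google-news-300" dataset
# is available via gensim's downloader (~1.6 GB on first use).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# The analogy is solved by adding/subtracting the word vectors and then
# returning the nearest neighbour by cosine similarity.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected: [('queen', <similarity score>)]
```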

Despite word2vec being my most cited paper, I never thought of it as my most impactful project. In fact, the word2vec code originally started as a subset of my previous project, RNNLM, which I think was forgotten too quickly. In my eyes, it was at least as revolutionary as AlexNet. To name just a few ideas first demonstrated within RNNLM back in 2010 (when it was still the dark ages for deep learning): scalable training of recurrent neural networks (I invented gradient clipping for this), the first text generation from a neural language model (I had been showing examples of this since 2007), dynamic evaluation, character- and sub-word-level neural language modeling, neural language model adaptation (nowadays called fine-tuning), and the first publicly available LM benchmark (the modified Penn Treebank dataset - there really was nothing like this on the web when I started my PhD). I also published the first study showing that, when everything is done correctly, neural nets beat n-gram language models by an increasing margin as training data grows (today this sounds obvious, but back then it was widely considered impossible - even most people at Google thought that the more data you have, the more futile it is to work on anything besides n-grams and smoothing techniques).
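As a side note on the gradient clipping mentioned above, the sketch below shows the form most commonly used today: rescaling the global gradient norm in PyTorch. Mikolov's original RNNLM toolkit clipped gradients differently, so treat this as an illustration of the idea, not his exact recipe; the model, loss, and hyperparameters are placeholders.

```python
# Sketch of gradient clipping by global norm during RNN training (PyTorch).
import torch
import torch.nn as nn

model = nn.RNN(input_size=128, hidden_size=256, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 20, 128)      # dummy batch: (batch, seq_len, features)
output, _ = model(x)
loss = output.pow(2).mean()      # placeholder loss for the sketch

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their combined L2 norm does not exceed 1.0,
# preventing the exploding-gradient updates that destabilise RNN training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```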

Source: gonzo-обзоры ML статей, group-telegram.com/gonzo_ML/2178