notes in learning

Refine: inbox, tooob
Thursday, January 24, 03:33PM  by:shuri

source Twitch
Twitch is the world's leading video platform and community for gamers.
Saturday, January 12, 10:36PM  by:shuri

source IBM teaches AI to debate humans by crowdsourcing arguments
IBM's AI wants to take on all comers in debates on every topic. But first, it's going to crowdsource its arguments from humans online and at CES 2019.
Friday, January 11, 05:24PM  by:shuri

source BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  1. ELMo (Peters et al., 2018) - Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL.
  2. Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018) - Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding with unsupervised learning. Technical report, OpenAI.
  3. Transformer (Vaswani et al., 2017) - Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010.
  4. SQuAD question answering (Rajpurkar et al., 2016) - Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
  5. Masked language model - inspired by the Cloze task (Taylor, 1953).
  6. introduce a “next sentence prediction” task that jointly pre-trains text-pair representations.
  7. natural language inference (Bowman et al., 2015; Williams et al., 2018)
  8. There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning.
  9. The feature-based approach, such as ELMo (Peters et al., 2018), uses task-specific architectures that include the pre-trained representations as additional features.
  10. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters and is trained on downstream tasks by fine-tuning all pre-trained parameters.
  11. The major limitation is that standard language models are unidirectional, which limits the choice of architectures that can be used during pre-training.
  13. ELMo advances the state-of-the-art for several major NLP benchmarks (Peters et al., 2018) including question answering (Rajpurkar et al., 2016) on SQuAD, sentiment analysis (Socher et al., 2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003).
  14. A recent trend in transfer learning from language models (LMs) is to pre-train some model architecture on a LM objective before fine-tuning that same model for a supervised downstream task (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018).
  15. The advantage of these approaches is that few parameters need to be learned from scratch. At least partly due to this advantage, OpenAI GPT (Radford et al., 2018) achieved previous state-of-the-art results on many sentence-level tasks from the GLUE benchmark (Wang et al., 2018).
  16. transfer from supervised tasks with large datasets, such as natural language inference (Conneau et al., 2017) and machine translation (McCann et al., 2017).
  17. We use WordPiece embeddings (Wu et al., 2016) with a 30,000 token vocabulary
  18. denoising auto-encoders (Vincent et al., 2008)
  19. Adam with learning rate of 1e-4, β1 = 0.9, β2 = 0.999, L2 weight decay of 0.01, learning rate warmup over the first 10,000 steps, and linear decay of the learning rate.
  20. We use a dropout probability of 0.1 on all layers.
  21. We use a gelu activation (Hendrycks and Gimpel, 2016) rather than the standard relu,
  22. training loss is the sum of the mean masked LM likelihood and mean next sentence prediction likelihood.
  23. We also observed that large data sets (e.g., 100k+ labeled training examples) were far less sensitive to hyperparameter choice than small data sets. Fine-tuning is typically very fast, so it is reasonable to simply run an exhaustive search over the above parameters and choose the model that performs best on the development set.
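Note 21 mentions the gelu activation (Hendrycks and Gimpel, 2016) in place of the standard relu. A minimal stdlib-only sketch of both, using the exact erf form of GELU; some implementations use a tanh approximation instead:

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    # Standard ReLU: zero for all negative inputs.
    return max(0.0, x)
```

Unlike ReLU, GELU is smooth and takes small negative values for negative inputs (e.g. gelu(-0.5) ≈ -0.154), rather than clamping them to zero.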
Friday, January 11, 03:22PM  by:shuri
Monday, December 31 2018, 11:31AM  by:shuri
fake videos,
nonconsensual pornography,
revenge porn,
scarlett johansson,
artificial intelligence,
fake porn,

source Fake-porn videos are being weaponized to harass and humiliate women: ‘Everybody is a potential target’
"Deepfake" creators target both celebrities and everyday women with photos taken from the Web. Even Scarlett Johansson says she's powerless to fight them.
Monday, November 05 2018, 07:07PM  by:shuri
software development,
big data,
data quality,
predictive analytics,
machine learning,

source Advanced ETL Functionality and Machine Learning Pre-Processing [Video] - DZone AI
This video is an overview of the pre-processing techniques needed before training a predictive model and of the native KNIME nodes suitable to implement them.
Monday, November 05 2018, 05:48PM  by:shuri

source Michelangelo PyML: Introducing Uber's Platform for Rapid Python ML Model Development

Uber developed Michelangelo PyML to run identical copies of machine learning models locally in both real time experiments and large-scale offline prediction jobs.


  • "Unsurprisingly, data scientists overwhelmingly prefer to work in Python"
  • "most data scientists prefer to gather data upfront and iterate on their prototypes locally, using tools like pandas, scikit-learn, PyTorch, and TensorFlow."
  • "it can be challenging to ensure that both the online and offline versions of the model are equivalent."
  • "Feature transformations are limited to the vocabulary and expressiveness of Michelangelo’s DSL" we did the same thing in scotch for better or worse.
Wednesday, September 26 2018, 11:57AM  by:shuri

Lime: Explaining the predictions of any machine learning classifier - marcotcr/lime
Wednesday, September 26 2018, 11:54AM  by:shuri

source What is Shapley value regression and how does one implement it?
I have seen references to Shapley value regression elsewhere on this site, e.g.: Alternative to Shapley value regression Shapley Value Regression for prediction Shapley value regression / driver
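The Shapley value underlying that question can be illustrated with a brute-force computation: average each player's marginal contribution over all orderings. A stdlib-only sketch; the three-player characteristic function v below is a made-up example (value 1.0 once at least two players join):

```python
from itertools import permutations

def shapley_values(players, v):
    # Average each player's marginal contribution v(S + {i}) - v(S)
    # over all orderings of the players.
    totals = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += v(with_p) - v(coalition)
            coalition = with_p
    return {p: t / len(perms) for p, t in totals.items()}

# Hypothetical "majority" game: worth 1.0 only with two or more players.
v = lambda s: 1.0 if len(s) >= 2 else 0.0
vals = shapley_values({"a", "b", "c"}, v)
```

By symmetry each player receives 1/3, and the values sum to the value of the full coalition. Shapley value regression applies the same averaging idea to the R² contribution of each predictor; this brute force is exponential in the number of players, which is why practical implementations approximate it.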
Thursday, September 13 2018, 06:07PM  by:shuri

source Running PySpark on Jupyter Notebook with Docker – Suci Lin – Medium
It is much easier to run PySpark with Docker now, especially using an image from the Jupyter repository. When you just want to try or learn Python, it is very convenient to use Jupyter…
Wednesday, September 12 2018, 04:45PM  by:shuri

source Feature Importance and Feature Selection With XGBoost in Python
A benefit of using ensembles of decision tree methods like gradient boosting is that they can automatically provide estimates of feature importance from a trained predictive model. In this post you will discover how you can estimate the importance of features for a predictive modeling problem using the XGBoost library in Python. After reading this …
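As a toy illustration of the gain-based importance the post describes (not the XGBoost API itself), suppose we have recorded the (feature, gain) of every split in a fitted tree ensemble; importance is just the normalized per-feature gain total. The split list below is made up:

```python
from collections import defaultdict

def gain_importance(splits):
    # splits: list of (feature_name, gain) pairs, one per tree split.
    # Importance = each feature's share of the total gain.
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    return {f: g / grand_total for f, g in totals.items()}

# Hypothetical splits from a small ensemble.
splits = [("age", 4.0), ("income", 3.0), ("age", 2.0), ("zipcode", 1.0)]
imp = gain_importance(splits)  # age: 0.6, income: 0.3, zipcode: 0.1
```

XGBoost exposes the same idea (plus split-count and cover variants) directly from a trained model, which is what the linked post walks through.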
Wednesday, September 12 2018, 01:59PM  by:shuri

source Train/Test/Validation Set Splitting in Sklearn
How could I split randomly a data matrix and the corresponding label vector into a X_train, X_test, X_val, y_train, y_test, y_val with Sklearn? As far as I know, sklearn.cross_validation.train_test...
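sklearn's train_test_split only returns two pieces, so the usual answer is to call it twice. The same idea in plain Python: shuffle indices once, then slice into three disjoint parts (the 70/15/15 proportions are arbitrary):

```python
import random

def train_val_test_split(n, val_frac=0.15, test_frac=0.15, seed=0):
    # Shuffle the indices 0..n-1 once, then slice into three disjoint parts.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(100)  # 70 / 15 / 15 indices
```

With sklearn the equivalent is train_test_split on the full data to carve off the test set, then train_test_split again on the remainder to carve off validation.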
Sunday, September 09 2018, 01:44AM  by:shuri
intel ai,
intel software,
intel developer zone,
software developer,
software tools,
developer tools,

source AIDC 2018 | CLOSING KEYNOTE | Andrew Ng, CEO
SUBSCRIBE TO THE INTEL SOFTWARE YOUTUBE CHANNEL: http://bit.ly/2iZTCsz About Intel Software: The Intel® Developer Zone encourages and supports software devel...
Saturday, September 08 2018, 10:13PM  by:shuri
deep learning,

source Nuts and Bolts of Applying Deep Learning (Andrew Ng)
The talks at the Deep Learning School on September 24/25, 2016 were amazing. I clipped out individual talks from the full live streams and provided links to ...
Friday, September 07 2018, 12:16PM  by:shuri

source Differences between L1 and L2 as Loss Function and Regularization
[2014/11/30: Updated the L1-norm vs L2-norm loss function via a programmatically validated diagram. Thanks to readers for pointing out the confusing diagram. Ne...
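The core difference between L1 and L2 as loss functions shows up immediately on a small residual vector with one outlier; a quick stdlib-only check (the numbers are made up):

```python
def l1_loss(residuals):
    # Mean absolute error: each residual contributes linearly.
    return sum(abs(r) for r in residuals) / len(residuals)

def l2_loss(residuals):
    # Mean squared error: contributions grow quadratically,
    # so a single outlier dominates the loss.
    return sum(r * r for r in residuals) / len(residuals)

residuals = [0.5, -0.5, 0.5, 10.0]  # one outlier
```

Here the outlier contributes 10 of the 11.5 total under L1 but 100 of the 100.75 total under L2; doubling it doubles its L1 contribution yet quadruples its L2 contribution, which is why L1 is described as more robust to outliers.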
Friday, September 07 2018, 11:22AM  by:shuri

source Memorizing is not learning! — 6 tricks to prevent overfitting in machine learning.
Overfitting may be the most frustrating issue of Machine Learning. In this article, we’re going to see what it is, how to spot it, and most importantly how to prevent it from happening. The word…