notes in intelligence

Tuesday, January 05, 07:21PM  by:shuri

source CLIP: Connecting Text and Images
We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision.
Tuesday, January 05, 07:19PM  by:shuri

source JupyterLab 3.0 is released! The 3.0 release of JupyterLab brings… | by Jeremy Tuloup | Jan, 2021 | Jupyter Blog
The 3.0 release of JupyterLab brings many new features to users and substantial improvements to the extension system. (Note that many third-party extensions are still in the process of updating to be…
Thursday, December 17 2015, 01:11PM  by:shuri

source Evaluating Prerequisite Qualities for Learning end-to-end Dialog Systems

"this paper proposes a collection of four tasks designed to evaluate different prerequisite qualities of end-to-end dialog systems"

"• QA Dataset: Tests the ability to answer factoid questions that can be answered without relation to previous dialog. The context consists of the question only.
• Recommendation Dataset: Tests the ability to provide personalized responses to the user via recommendations (in this case, of movies) rather than universal facts as above.
• QA+Recommendation Dataset: Tests the ability of maintaining short dialogs involving both factoid and personalized content where conversational state has to be maintained.
• Reddit Dataset: Tests the ability to identify most likely replies in discussions on Reddit.
• Joint Dataset: All our tasks are dialogs. They can be combined into a single dataset, testing the ability of an end-to-end model to perform well at all skills at once."

"We employ the MemN2N architecture of Sukhbaatar et al. (2015) in our experiments, with some additional modifications to construct both long-term and short-term context memories"
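
A minimal sketch of one MemN2N memory "hop" in the style of Sukhbaatar et al. (2015), written in plain Python for illustration. It assumes the memories and the query are already embedded as vectors, and it leaves out the word embeddings, the answer layer, and this paper's long-term/short-term memory split. The names `memory_hop`, `input_mem`, and `output_mem` are hypothetical, not taken from either paper's code.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: shift by the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_hop(u: Vector, input_mem: List[Vector], output_mem: List[Vector]) -> Vector:
    """One MemN2N-style hop over already-embedded memories."""
    # p_i = softmax(u . m_i): attention weights over the memory slots
    p = softmax([dot(u, m) for m in input_mem])
    # o = sum_i p_i * c_i: weighted read from the output embeddings
    o = [sum(p_i * c[k] for p_i, c in zip(p, output_mem)) for k in range(len(u))]
    # u' = u + o: state passed to the next hop or to the answer layer
    return [u_k + o_k for u_k, o_k in zip(u, o)]


# Toy example: the query matches the first memory slot best.
u_next = memory_hop([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]])
# attention is about [0.731, 0.269], so u_next is about [1.538, 1.462]
```

Stacking several hops (feeding `u_next` back in as `u`) is what lets the model chain evidence across memories.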

"Retrieving long-term memories. For each word in the last N messages we perform a hash lookup to return all long-term memories (sentences) from a database that also contain that word. Words above a certain frequency cutoff can be ignored to avoid sentences that only share syntax or unimportant words. We employ the movie knowledge base of Sec. 2.1 for our long-term memories,"
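
A minimal sketch of that retrieval step as an inverted index, assuming whitespace tokenization and that "frequency" means the number of knowledge-base sentences a word appears in. The cutoff parameter `max_doc_freq`, the helper names, and the toy KB below are hypothetical; the quote only says "a certain frequency cutoff", and the KB lines are not the paper's movie knowledge base.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def build_index(kb: List[str], max_doc_freq: int) -> Dict[str, Set[int]]:
    """Map each word to the ids of KB sentences containing it, skipping words
    that occur in more than max_doc_freq sentences (too common to help)."""
    tokenized = [set(sentence.lower().split()) for sentence in kb]
    doc_freq: Counter = Counter()
    for words in tokenized:
        doc_freq.update(words)  # each sentence counts a word at most once
    index: Dict[str, Set[int]] = defaultdict(set)
    for sent_id, words in enumerate(tokenized):
        for w in words:
            if doc_freq[w] <= max_doc_freq:
                index[w].add(sent_id)
    return index


def retrieve(index: Dict[str, Set[int]], kb: List[str], last_messages: List[str]) -> List[str]:
    """Return every KB sentence sharing at least one indexed word with the
    last N messages (the hash lookup described above)."""
    hits: Set[int] = set()
    for message in last_messages:
        for w in message.lower().split():
            hits |= index.get(w, set())  # .get does not insert missing keys
    return [kb[i] for i in sorted(hits)]


# Hypothetical toy KB (not the paper's movie knowledge base).
kb = [
    "alien directed_by ridley scott",
    "the matrix directed_by the wachowskis",
    "the matrix release_year 1999",
    "the thing directed_by john carpenter",
]
index = build_index(kb, max_doc_freq=2)  # drops "the" and "directed_by" (3 sentences each)
print(retrieve(index, kb, ["who directed the matrix"]))
# ['the matrix directed_by the wachowskis', 'the matrix release_year 1999']
```

With a looser cutoff, the word "the" would also pull in the unrelated "the thing" sentence, which is the kind of syntax-only match the cutoff is meant to avoid.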

"The whole model is trained using stochastic gradient descent by minimizing a standard cross-entropy loss between â and the true label a."
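
Here â is the model's softmax distribution over candidate answers, so the cross-entropy loss against the true label a reduces to -log â[a]. The sketch below assumes a single linear answer layer `W` applied to a final state vector `u` (a simplification of the MemN2N output layer) and shows one plain SGD step using the standard softmax gradient â - onehot(a). The names `sgd_step`, `W`, `u`, and the learning rate are hypothetical.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax over candidate-answer logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(a_hat: List[float], a: int) -> float:
    """-log of the probability the model assigns to the true answer a."""
    return -math.log(a_hat[a])


def sgd_step(W: List[List[float]], u: List[float], a: int, lr: float) -> float:
    """One SGD step on a linear answer layer, a_hat = softmax(W u).
    dL/dlogit_j = a_hat[j] - [j == a], so dL/dW[j][k] = that value * u[k].
    Updates W in place and returns the loss measured before the update."""
    logits = [sum(w_k * u_k for w_k, u_k in zip(row, u)) for row in W]
    a_hat = softmax(logits)
    loss = cross_entropy(a_hat, a)
    for j, row in enumerate(W):
        g = a_hat[j] - (1.0 if j == a else 0.0)
        for k in range(len(row)):
            row[k] -= lr * g * u[k]
    return loss


W = [[0.0, 0.0] for _ in range(3)]  # 3 candidate answers, 2-dim state
u = [1.0, 0.5]                       # hypothetical final memory state
losses = [sgd_step(W, u, a=2, lr=0.5) for _ in range(5)]
# losses[0] is log(3), about 1.0986, since all candidates start equally likely
```

In the real model the same gradient is backpropagated further into the memory embeddings; only the output layer is updated here to keep the sketch short.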

"For matching two documents supervised semantic indexing (SSI) was shown to be superior to unsupervised latent semantic indexing (LSI) (Bai et al., 2009)."
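
For context, SSI as described by Bai et al. (2009) scores a query/document pair as f(q, d) = q^T W d over bag-of-words vectors and learns W from labeled pairs with a margin ranking loss, whereas LSI fixes the projection with an unsupervised SVD of the term-document matrix. The sketch below uses a low-rank form W = U^T V, so the score becomes a dot product of two summed word embeddings. The embedding tables `U` and `V` are hypothetical hand-set values, training is omitted, and the helper names are illustrative only.

```python
from typing import Dict, List

Vector = List[float]


def embed(text: str, table: Dict[str, Vector], dim: int) -> Vector:
    """Bag-of-words embedding: sum of word vectors, unknown words ignored.
    Equivalent to multiplying a word-count vector by a dim x vocab matrix."""
    out = [0.0] * dim
    for w in text.lower().split():
        vec = table.get(w)
        if vec is not None:
            out = [o + v for o, v in zip(out, vec)]
    return out


def score(query: str, doc: str, U: Dict[str, Vector], V: Dict[str, Vector], dim: int) -> float:
    """f(q, d) = (U q)^T (V d), i.e. q^T W d with the low-rank W = U^T V."""
    q = embed(query, U, dim)
    d = embed(doc, V, dim)
    return sum(a * b for a, b in zip(q, d))


def margin_ranking_loss(query: str, pos: str, neg: str,
                        U: Dict[str, Vector], V: Dict[str, Vector],
                        dim: int, margin: float = 1.0) -> float:
    """Hinge loss max(0, margin - f(q, d+) + f(q, d-)) used to train SSI."""
    return max(0.0, margin - score(query, pos, U, V, dim) + score(query, neg, U, V, dim))


# Hypothetical 2-dim embeddings for a handful of words.
U = {"who": [1.0, 0.0], "directed": [0.0, 1.0]}
V = {"scott": [0.0, 2.0], "1979": [0.5, 0.0]}
print(score("who directed alien", "ridley scott", U, V, 2))  # 2.0
```

Because the ranking loss only asks the correct document to beat an incorrect one by a margin, it directly optimizes the ordering that retrieval needs, which is the usual argument for why supervised SSI beats unsupervised LSI.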

"we believe this is a surprisingly strong baseline that is often neglected in evaluations"


"Recurrent Neural Networks (RNNs) have proven successful at several tasks involving natural language, language modeling (Mikolov et al., 2011)"

"LSTMs are not known however for tasks such as QA or item recommendation, and so we expect them to find our datasets challenging."


"We chose the method of Bordes et al. (2014) as our baseline. This system learns embeddings that match questions to database entries, and then ranks the set of entries, and has been shown to achieve good performance on the WEBQUESTIONS benchmark (Berant et al., 2013)."

"Answering Factual Questions Memory Networks and the baseline QA system are the two methods that have an explicit long-term memory via access to the knowledge base (KB). On the task of answering factual questions where the answers are contained in the KB, they outperform the other methods convincingly, with LSTMs being particularly poor"


"Making Recommendations In this task a long-term memory does not bring any improvement, with LSTMs, Supervised Embeddings and Memory Networks all performing similarly, and all outperforming the SVD baseline."

"LSTMs perform poorly: the posts in Reddit are quite long and the memory of the LSTM is relatively short, as pointed out by Sordoni et al. (2015)."

"Testing more powerful recurrent networks such as Seq2Seq or LSTMs with attention on these benchmarks remains as future work."

Sunday, March 23 2014, 03:17PM  by:shuri

source Winning the Personalized Web Search Challenge: team Dataiku Data Science Studio
What was your background prior to entering this challenge? We're a team of four. Christophe Bourguignat is a telecommunication engineer during the day, but he becomes a serial Kaggler at night, Ken...
Sunday, March 23 2014, 11:21AM  by:shuri

source Myspace co-founder Chris DeWolfe on Social Gaming Success
The social gaming battlefield is littered with the corpses of former success stories that quickly flopped when the public got bored or when their developers failed to spot a threat on the horizon.
Monday, February 10 2014, 12:39AM  by:shuri

source Overkill Analytics: Wordpress Winner Describes His Method
Crossposted from Overkill Analytics, the newly launched extra-curricular data science blog by Gigaom-Wordpress Challenge winner Carter S. You can also read more about his 'overkill' philosophy on ...
Tuesday, February 04 2014, 10:32PM  by:shuri

source Q&A With Job Salary Prediction First Prize Winner Vlad Mnih
What was your background prior to entering this challenge? I just completed a PhD in Machine Learning at the University of Toronto, where Geoffrey Hinton was my advisor. Most of my work is on apply...
Tuesday, February 04 2014, 08:38PM  by:shuri
apache hadoop,
apache lucene,
business data mining,
cluster analysis,
collaborative filtering,
data extraction,
data filtering,
data framework,
data integration,
data matching,
data mining,
data mining algorithms,
data mining analysis,
data mining data,
data mining introduction,
data mining software,
data mining techniques,
data representation,
data set,
feature extraction,
fuzzy k means,
genetic algorithm,
hierarchical clustering,
high dimensional,
introduction to data mining,
knowledge discovery,
learning approach,
learning approaches,
learning methods,
learning techniques,
machine learning,
machine translation,
mahout apache,
mahout taste,
map reduce hadoop,
mining data,
mining methods,
naive bayes,
natural language processing,
text mining,
time series data,
web data mining,
Tuesday, February 04 2014, 08:36PM  by:shuri
data science,
big analytics,
data mining,
crowdsourced analytics,

source Software | Kaggle
Kaggle is a platform for data prediction competitions. Companies, organizations and researchers post their data and have it scrutinized by the world's best statisticians.