<h1>Julia Project - Song Lyric Text Classification by Artist</h1>
<p><em>Getting to Know Julia, by Nigel Adams, 2019-09-12</em></p>
<p>I had an idea for a work-related project I’d like to do some day, given the opportunity. The objective would be to build a machine learning model that can classify notes or documents for compliance purposes. To get such a project off the ground, sufficient labelled training data would be needed. We won’t have the luxury of a labelled set like the famous IMDb dataset, which contains 50,000 labelled movie reviews; the data we could obtain would likely be fewer than 1,000 rows (at least to start with).</p>
<p>So I went searching for labelled text datasets and somehow ended up with this <a href="https://www.kaggle.com/mousehead/songlyrics">Song Lyric dataset from Kaggle</a>. I thought it would be a fun challenge to pick 5 popular artists with large back catalogues and try to build a model that could predict which artist sang a song, using test data unseen by the training step. The filtered dataset used for training is fewer than 800 rows, making it roughly comparable to the work-related project I had in mind.</p>
<p>The task of predicting the artist isn’t quite as straightforward as you might first think; each artist will likely have songs in different genres (e.g. upbeat, downbeat and ballads). The songs may have been written by different band members and will also vary in length. Simply guessing the artist gives a 1 in 5 chance of being correct (a probability of 0.2), so that is our baseline.</p>
<p>This project brings together all my recent Julia blog post learnings on NLP, Flux, Neural Networks and Convolutional Neural Networks (i.e. CNNs or ConvNets). An added challenge was the lack of similar examples on the web for Word Embeddings with Flux or Word Embeddings with Flux CNNs. I’m quite proud I got these working without having to copy anyone else’s work. The code may not be super-pretty, but it works!</p>
<p>Let’s get started loading the libraries we need.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">CSV</span><span class="x">,</span> <span class="n">DataFrames</span><span class="x">,</span> <span class="n">Random</span><span class="x">,</span> <span class="n">TextAnalysis</span><span class="x">,</span> <span class="n">Languages</span><span class="x">,</span> <span class="n">Statistics</span><span class="x">,</span> <span class="n">PyPlot</span><span class="x">,</span> <span class="n">Flux</span><span class="x">,</span> <span class="n">BSON</span>
<span class="c">#Display Flux Version</span>
<span class="k">import</span> <span class="n">Pkg</span> <span class="x">;</span> <span class="n">Pkg</span><span class="o">.</span><span class="n">installed</span><span class="x">()[</span><span class="s">"Flux"</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loaded
v"0.7.2"
</code></pre></div></div>
<h2 id="loading-and-initial-data-preparation">Loading and Initial Data Preparation</h2>
<p>Load the data from the CSV file we downloaded from Kaggle and show a count of all songs.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_all</span><span class="o">=</span><span class="n">CSV</span><span class="o">.</span><span class="n">read</span><span class="x">(</span><span class="s">"/mnt/juliabox/NLP/songdata.csv"</span><span class="x">)</span>
<span class="n">categorical!</span><span class="x">(</span><span class="n">df_all</span><span class="x">,</span> <span class="o">:</span><span class="n">artist</span><span class="x">)</span>
<span class="n">show</span><span class="x">(</span><span class="n">by</span><span class="x">(</span><span class="n">df_all</span><span class="x">,</span> <span class="o">:</span><span class="n">artist</span><span class="x">,</span> <span class="n">nrow</span><span class="x">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>643×2 DataFrame
│ Row │ artist │ x1 │
├─────┼───────────────┼───────┤
│ 1 │ 'n Sync │ 93 │
│ 2 │ ABBA │ 113 │
│ 3 │ Ace Of Base │ 74 │
│ 4 │ Adam Sandler │ 70 │
│ 5 │ Adele │ 54 │
│ 6 │ Aerosmith │ 171 │
│ 7 │ Air Supply │ 174 │
⋮
│ 636 │ Zeromancer │ 30 │
│ 637 │ Ziggy Marley │ 64 │
│ 638 │ Zoe │ 1 │
│ 639 │ Zoegirl │ 38 │
│ 640 │ Zornik │ 12 │
│ 641 │ Zox │ 21 │
│ 642 │ Zucchero │ 30 │
│ 643 │ Zwan │ 14 │
</code></pre></div></div>
<p>This is a great dataset, but we need to make a new dataframe containing just the song lyrics of the selected artists, labelled by artist. The data is randomly shuffled using a known ‘seed’ so we can replicate the same order each time the notebook is run. The first row of data is output.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">artists</span><span class="o">=</span><span class="x">[</span><span class="s">"Queen"</span><span class="x">,</span> <span class="s">"The Beatles"</span><span class="x">,</span> <span class="s">"Michael Jackson"</span><span class="x">,</span> <span class="s">"Eminem"</span><span class="x">,</span> <span class="s">"INXS"</span><span class="x">]</span>
<span class="n">df</span><span class="o">=</span><span class="n">df_all</span><span class="x">[[</span><span class="n">x</span> <span class="k">in</span> <span class="n">artists</span> <span class="k">for</span> <span class="n">x</span> <span class="k">in</span> <span class="n">df_all</span><span class="x">[</span><span class="o">:</span><span class="n">artist</span><span class="x">]],</span><span class="o">:</span><span class="x">]</span>
<span class="n">df_all</span><span class="o">=</span><span class="nb">nothing</span>
<span class="n">Random</span><span class="o">.</span><span class="n">seed!</span><span class="x">(</span><span class="mi">1000</span><span class="x">);</span>
<span class="n">df</span><span class="o">=</span><span class="n">df</span><span class="x">[</span><span class="n">shuffle</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="n">size</span><span class="x">(</span><span class="n">df</span><span class="x">,</span> <span class="mi">1</span><span class="x">)),</span><span class="o">:</span><span class="x">]</span>
<span class="n">df</span><span class="x">[</span><span class="mi">1</span><span class="x">,</span><span class="o">:</span><span class="x">]</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj004/dataframe_songs.png" alt="song lyrics dataframe" /></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">size</span><span class="x">(</span><span class="n">df</span><span class="x">,</span><span class="mi">1</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>727
</code></pre></div></div>
<p>The dataset is only 727 rows; such a shortage of examples means this is likely to be a hard task!</p>
<h2 id="preprocessing---clean-up">Preprocessing - clean-up</h2>
<p>The next block of code uses the TextAnalysis library to create a corpus of our song lyrics and cleans it for the next step.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">docs</span><span class="o">=</span><span class="kt">Any</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">size</span><span class="x">(</span><span class="n">df</span><span class="x">,</span><span class="mi">1</span><span class="x">)</span>
<span class="n">txt</span><span class="o">=</span><span class="n">replace</span><span class="x">(</span><span class="n">df</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">text</span><span class="x">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span> <span class="o">=></span> <span class="s">" "</span><span class="x">)</span> <span class="c"># flatten line breaks</span>
<span class="n">txt</span><span class="o">=</span><span class="n">replace</span><span class="x">(</span><span class="n">txt</span><span class="x">,</span> <span class="s">"'"</span> <span class="o">=></span> <span class="s">""</span><span class="x">)</span> <span class="c"># drop apostrophes, so "it's" becomes "its"</span>
<span class="n">dm</span><span class="o">=</span><span class="n">TextAnalysis</span><span class="o">.</span><span class="n">DocumentMetadata</span><span class="x">(</span><span class="n">Languages</span><span class="o">.</span><span class="n">English</span><span class="x">(),</span> <span class="n">df</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">song</span><span class="x">,</span><span class="s">""</span><span class="x">,</span><span class="s">""</span><span class="x">)</span>
<span class="n">doc</span><span class="o">=</span><span class="n">StringDocument</span><span class="x">(</span><span class="n">txt</span><span class="x">,</span> <span class="n">dm</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">docs</span><span class="x">,</span> <span class="n">doc</span><span class="x">)</span>
<span class="k">end</span>
<span class="n">crps</span><span class="o">=</span><span class="n">Corpus</span><span class="x">(</span><span class="n">docs</span><span class="x">)</span>
<span class="n">orig_corpus</span><span class="o">=</span><span class="n">deepcopy</span><span class="x">(</span><span class="n">crps</span><span class="x">);</span>
<span class="n">prepare!</span><span class="x">(</span><span class="n">crps</span><span class="x">,</span> <span class="n">strip_non_letters</span> <span class="o">|</span> <span class="n">strip_punctuation</span> <span class="o">|</span> <span class="n">strip_case</span> <span class="o">|</span> <span class="n">strip_stopwords</span> <span class="o">|</span> <span class="n">strip_whitespace</span><span class="x">)</span>
</code></pre></div></div>
<p>Let’s take a look at the first song to see what just took place. In the original corpus the first song by Queen looked like this.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">orig_corpus</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>StringDocument{String}("Oh my love weve had \nOur share of tears \nOh my friends weve had \nOur hopes and fears \nOh my friend its been \nA long hard year \nBut now its Christmas \nYes its Christmas \nThank God its Christmas \n \nThe moon and stars \nSeem awful cold and bright \nLets hope the snow will \nMake this Christmas right \n \nMy friend the world will share \nThis special night \nBecause its Christmas \nYes its Christmas \nThank God its Christmas \nFor one night \nThank God its Christmas \nYeah thank God its Christmas \nThank God its Christmas \nCan it be Christmas \nLet it be Christmas every day \n \nOh my love we live \nIn troubled days \nOh my friend we have \nThe strangest ways \nOh my friends on this \nOne day of days \nThank God its Christmas \nYes its Christmas \nThank God its Christmas \nFor one day \n \nThank God its Christmas \nYes its Christmas \nThank God its Christmas \nWooh yeah \nThank God its Christmas \nYeah yeah yeah yes its Christmas \nThank God its Christmas \nFor one day yeah - Christmas \n \nA very merry Christmas to you all \n\n", TextAnalysis.DocumentMetadata(Languages.English(), "Thank God It's Christmas", "", ""))
</code></pre></div></div>
<p>After the pre-processing step it looked like this.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crps</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>StringDocument{String}("oh love weve share tears oh friends weve hopes fears oh friend hard christmas christmas thank god christmas moon stars awful cold bright hope snow christmas friend world share special night christmas christmas thank god christmas night thank god christmas yeah thank god christmas thank god christmas christmas christmas day oh love live troubled days oh friend strangest oh friends day days thank god christmas christmas thank god christmas day thank god christmas christmas thank god christmas wooh yeah thank god christmas yeah yeah yeah christmas thank god christmas day yeah christmas merry christmas ", TextAnalysis.DocumentMetadata(Languages.English(), "Thank God It's Christmas", "", ""))
</code></pre></div></div>
<h2 id="preprocesing---prep-for-training">Preprocessing - prep for training</h2>
<p>The update-lexicon commands will quickly count our words and consequently let us look up words to see in which songs they occur.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">update_lexicon!</span><span class="x">(</span><span class="n">crps</span><span class="x">)</span>
<span class="n">update_inverse_index!</span><span class="x">(</span><span class="n">crps</span><span class="x">)</span>
</code></pre></div></div>
<p>The word “christmas” is located in the song corpus at these indexes:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crps</span><span class="x">[</span><span class="s">"christmas"</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>8-element Array{Int64,1}:
1
162
239
328
332
490
606
638
</code></pre></div></div>
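<p>As a complement to the inverse index, the lexicon stores total counts. A quick check (this assumes <code class="language-plaintext highlighter-rouge">lexicon</code> is exported by the TextAnalysis version used here):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Total occurrences of "christmas" across the whole corpus,
# as opposed to the per-song locations from the inverse index
lexicon(crps)["christmas"]
</code></pre></div></div>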
<p>The following code builds our word dictionary (<code class="language-plaintext highlighter-rouge">word_dict</code>).</p>
<p>Each word in our song corpus can now be represented by a unique integer.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m_dtm</span><span class="o">=</span><span class="n">DocumentTermMatrix</span><span class="x">(</span><span class="n">crps</span><span class="x">)</span>
<span class="n">word_dict</span><span class="o">=</span><span class="n">m_dtm</span><span class="o">.</span><span class="n">column_indices</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Dict{String,Int64} with 8449 entries:
"ont" => 5080
"youd" => 8421
"bsta" => 897
"enjoy" => 2388
"chocolate" => 1226
"fight" => 2675
"null" => 5007
"princess" => 5603
"snuggle" => 6777
"carousels" => 1068
"needin" => 4914
"helping" => 3378
"manufacture" => 4437
"sheezy" => 6462
"sleepless" => 6682
"favor" => 2612
"henry" => 3391
"eddie" => 2303
"aaaah" => 5
"borders" => 779
"tenor" => 7459
"star" => 7001
"prick" => 5594
"worship" => 8340
"itll" => 3775
⋮ => ⋮
</code></pre></div></div>
<p>This function returns the <code class="language-plaintext highlighter-rouge">word_dict</code> index value of the word passed in <code class="language-plaintext highlighter-rouge">s</code>. It returns 0 if the word is not found.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tk_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)</span> <span class="o">=</span> <span class="n">haskey</span><span class="x">(</span><span class="n">word_dict</span><span class="x">,</span> <span class="n">s</span><span class="x">)</span> <span class="o">?</span> <span class="n">word_dict</span><span class="x">[</span><span class="n">s</span><span class="x">]</span> <span class="o">:</span> <span class="mi">0</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tk_idx (generic function with 1 method)
</code></pre></div></div>
<p>Let’s try it out.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tk_idx</span><span class="x">(</span><span class="s">"christmas"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1249
</code></pre></div></div>
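<p>And for a word that never appears in the lyrics the fallback kicks in (the token below is just an assumed out-of-vocabulary example):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tk_idx("zzzznotaword")   # not a key in word_dict, so this returns 0
</code></pre></div></div>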
<p>For the training step all the songs need to be the same length of words and the words need converting to numbers. The following function performs this task by padding shorter songs with zeros and truncating longer songs to the size specified.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> pad_corpus</span><span class="x">(</span><span class="n">c</span><span class="x">,</span> <span class="n">size</span><span class="x">)</span>
<span class="n">M</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">doc</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">c</span><span class="x">)</span>
<span class="n">tks</span> <span class="o">=</span> <span class="n">tokens</span><span class="x">(</span><span class="n">c</span><span class="x">[</span><span class="n">doc</span><span class="x">])</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">)</span><span class="o">>=</span><span class="n">size</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="x">[</span><span class="n">tk_idx</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">tks</span><span class="x">[</span><span class="mi">1</span><span class="o">:</span><span class="n">size</span><span class="x">]]</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">)</span><span class="o"><</span><span class="n">size</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="n">zeros</span><span class="x">(</span><span class="kt">Int64</span><span class="x">,</span><span class="n">size</span><span class="o">-</span><span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">))</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="n">vcat</span><span class="x">(</span><span class="n">tk_indexes</span><span class="x">,</span> <span class="x">[</span><span class="n">tk_idx</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">tks</span><span class="x">])</span>
<span class="k">end</span>
<span class="n">doc</span><span class="o">==</span><span class="mi">1</span> <span class="o">?</span> <span class="n">M</span><span class="o">=</span><span class="n">tk_indexes</span><span class="err">'</span> <span class="o">:</span> <span class="n">M</span><span class="o">=</span><span class="n">vcat</span><span class="x">(</span><span class="n">M</span><span class="x">,</span> <span class="n">tk_indexes</span><span class="err">'</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">M</span>
<span class="k">end</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pad_corpus (generic function with 1 method)
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_terms_in_songs</span><span class="o">=</span><span class="x">[</span><span class="n">length</span><span class="x">(</span><span class="n">tokens</span><span class="x">(</span><span class="n">crps</span><span class="x">[</span><span class="n">i</span><span class="x">]))</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">crps</span><span class="x">)]</span>
<span class="n">println</span><span class="x">(</span><span class="s">"min </span><span class="si">$</span><span class="s">(minimum(num_terms_in_songs)) max </span><span class="si">$</span><span class="s">(maximum(num_terms_in_songs)) mean </span><span class="si">$</span><span class="s">(mean(num_terms_in_songs))"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>min 19 max 400 mean 99.43053645116919
</code></pre></div></div>
<p>We can see that the mean is around 100 words, however, I found (when hyperparameter tuning) that a higher number improved accuracy. We will set <code class="language-plaintext highlighter-rouge">doc_pad_size</code> to 200.</p>
<p><code class="language-plaintext highlighter-rouge">X</code> becomes our training data which is now in a format suitable for input into a neural network model.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">doc_pad_size</span><span class="o">=</span><span class="mi">200</span>
<span class="n">padded_docs</span> <span class="o">=</span> <span class="n">pad_corpus</span><span class="x">(</span><span class="n">crps</span><span class="x">,</span> <span class="n">doc_pad_size</span><span class="x">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">padded_docs</span><span class="err">'</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>200×727 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
0 0 0 0 0 0 … 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 … 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 … 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
⋮ ⋮ ⋱ ⋮
8398 7460 1684 5002 7490 4321 3472 4321 7863 3667 3456 3269
8398 3456 5061 1144 409 2632 1180 409 6799 6839 3244 4677
8398 7580 8423 833 3623 4408 … 7733 3021 3168 3623 3456 4326
1249 6817 3028 915 1093 4321 2524 4321 1803 3071 3244 1564
7482 6372 4220 6472 7562 2632 4448 409 5576 3667 3456 4186
3064 2122 3623 4968 7368 4408 6630 3021 6921 3667 3456 7618
1249 1684 8309 4968 4189 4321 3676 5083 7589 3623 3456 1684
1801 8398 3575 8092 3614 2632 … 5448 4321 3177 3071 3456 3699
8398 8398 8423 4859 2823 4408 3377 409 7631 3667 3456 2182
1249 8398 7589 7057 582 4321 7652 409 3991 1092 3244 3211
4579 8398 7589 3956 3338 8398 4562 5083 7636 7490 3456 3754
1249 8398 7589 1243 2823 2632 7458 3021 622 411 3244 4368
</code></pre></div></div>
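<p>As a sanity check that the indexing and padding worked, we can invert <code class="language-plaintext highlighter-rouge">word_dict</code> and decode a column of <code class="language-plaintext highlighter-rouge">X</code> back into words. This is a quick sketch; <code class="language-plaintext highlighter-rouge">rev_dict</code> and <code class="language-plaintext highlighter-rouge">decode</code> are helper names introduced here:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Reverse lookup: integer index back to its word (0 is our padding value)
rev_dict = Dict(v => k for (k, v) in word_dict)
decode(col) = [rev_dict[i] for i in col if i != 0]
decode(X[:, 1])   # should return the cleaned tokens of the first song
</code></pre></div></div>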
<p>Our data labels <code class="language-plaintext highlighter-rouge">y</code> (i.e. artists) also need processing into a one-hot matrix for classification. First let’s define a dictionary of artists called <code class="language-plaintext highlighter-rouge">artist_dict</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">artist_dict</span> <span class="o">=</span> <span class="kt">Dict</span><span class="x">()</span>
<span class="k">for</span> <span class="x">(</span><span class="n">n</span><span class="x">,</span> <span class="n">a</span><span class="x">)</span> <span class="k">in</span> <span class="n">enumerate</span><span class="x">(</span><span class="n">unique</span><span class="x">(</span><span class="n">df</span><span class="o">.</span><span class="n">artist</span><span class="x">))</span>
<span class="n">artist_dict</span><span class="x">[</span><span class="s">"</span><span class="si">$</span><span class="s">a"</span><span class="x">]</span> <span class="o">=</span> <span class="n">n</span>
<span class="k">end</span>
<span class="n">artist_dict</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Dict{Any,Any} with 5 entries:
"Queen" => 1
"Eminem" => 5
"The Beatles" => 3
"Michael Jackson" => 4
"INXS" => 2
</code></pre></div></div>
<p>We’ll now use Flux’s <code class="language-plaintext highlighter-rouge">onehotbatch</code> to make the required transformation for this classification problem.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">artist_indexes</span><span class="o">=</span><span class="x">[</span><span class="n">artist_dict</span><span class="x">[</span><span class="n">df</span><span class="x">[</span><span class="o">:</span><span class="n">artist</span><span class="x">][</span><span class="n">i</span><span class="x">]]</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">size</span><span class="x">(</span><span class="n">df</span><span class="x">,</span><span class="mi">1</span><span class="x">)]</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">Flux</span><span class="o">.</span><span class="n">onehotbatch</span><span class="x">(</span><span class="n">artist_indexes</span><span class="x">,</span> <span class="mi">1</span><span class="o">:</span><span class="mi">5</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>5×727 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
true false false true false … false false false false false
false true false false false false false false false true
false false true false false false true false false false
false false false false true true false true true false
false false false false false false false false false false
</code></pre></div></div>
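<p>Going the other way, Flux’s <code class="language-plaintext highlighter-rouge">onecold</code> reverses the one-hot encoding, which will be handy later for reading off predictions (assuming <code class="language-plaintext highlighter-rouge">onecold</code> is available in this Flux version):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Recover the integer artist label of the first song from its one-hot column
Flux.onecold(y[:, 1], 1:5)
</code></pre></div></div>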
<p>Let’s now split our <code class="language-plaintext highlighter-rouge">X</code> data into training and test data sets.</p>
<ul>
<li>
<p><strong>Training data</strong> will be used to ‘train’ the model.</p>
</li>
<li>
<p><strong>Test data</strong> will be new ‘unseen’ data used to make new predictions. As we have knowledge of the artists we will be able to score the accuracy of the model.</p>
</li>
</ul>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X_train</span> <span class="o">=</span> <span class="n">X</span><span class="x">[</span><span class="o">:</span><span class="x">,</span> <span class="mi">1</span><span class="o">:</span><span class="mi">649</span><span class="x">]</span>
<span class="n">y_train</span> <span class="o">=</span> <span class="n">y</span><span class="x">[</span><span class="o">:</span><span class="x">,</span><span class="mi">1</span><span class="o">:</span><span class="mi">649</span><span class="x">]</span>
<span class="n">X_test</span> <span class="o">=</span> <span class="n">X</span><span class="x">[</span><span class="o">:</span><span class="x">,</span> <span class="mi">650</span><span class="o">:</span><span class="k">end</span><span class="x">]</span>
<span class="n">y_test</span> <span class="o">=</span> <span class="n">y</span><span class="x">[</span><span class="o">:</span><span class="x">,</span> <span class="mi">650</span><span class="o">:</span><span class="k">end</span><span class="x">]</span>
<span class="n">println</span><span class="x">(</span><span class="s">"X_train </span><span class="si">$</span><span class="s">(size(X_train)) y_train </span><span class="si">$</span><span class="s">(size(y_train)) X_test </span><span class="si">$</span><span class="s">(size(X_test)) y_test </span><span class="si">$</span><span class="s">(size(y_test))"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X_train (200, 649) y_train (5, 649) X_test (200, 78) y_test (5, 78)
</code></pre></div></div>
<p>The final preprocessing step neatly combines our training data and labels into a <code class="language-plaintext highlighter-rouge">train_set</code> tuple for Flux.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_set</span> <span class="o">=</span> <span class="x">[(</span><span class="n">X_train</span><span class="x">,</span> <span class="n">y_train</span><span class="x">)]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1-element Array{Tuple{Array{Int64,2},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}},1}:
([0 0 … 0 0; 0 0 … 0 0; … ; 4579 8398 … 7635 3929; 1249 8398 … 7454 3038], [true false … false false; false true … true false; … ; false false … false false; false false … false true])
</code></pre></div></div>
<h2 id="embedding-prep">Embedding Prep</h2>
<p>Our <code class="language-plaintext highlighter-rouge">X</code> data is now numbers, and these numbers point to words in the <code class="language-plaintext highlighter-rouge">word_dict</code>. In its current state the numbers don’t have much value for training. The next step is to load the GloVe word embeddings and prepare them as the first layer in our Flux neural network. Word embeddings give our words ‘meaning’ and have been covered in detail in my previous blog posts; please refer to these if you need more background on word vectors and embedding them in neural networks.</p>
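<p>Conceptually, the embedding layer we are about to build is just a column lookup into a pre-trained matrix. Here is a toy sketch with random values (not the real GloVe vectors):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>E = rand(Float32, 3, 5)    # toy embedding matrix: 3 dimensions x 5 vocabulary words
song = [2, 4, 1]           # a "song" of three word indexes
E[:, song]                 # 3x3 matrix: one embedding column per word
</code></pre></div></div>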
<p>Let’s load in the embeddings.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> load_embeddings</span><span class="x">(</span><span class="n">embedding_file</span><span class="x">)</span>
<span class="kd">local</span> <span class="n">LL</span><span class="x">,</span> <span class="n">indexed_words</span><span class="x">,</span> <span class="n">index</span>
<span class="n">indexed_words</span> <span class="o">=</span> <span class="kt">Vector</span><span class="x">{</span><span class="kt">String</span><span class="x">}()</span>
<span class="n">LL</span> <span class="o">=</span> <span class="kt">Vector</span><span class="x">{</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Float32</span><span class="x">}}()</span>
<span class="n">open</span><span class="x">(</span><span class="n">embedding_file</span><span class="x">)</span> <span class="k">do</span> <span class="n">f</span>
<span class="n">index</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">line</span> <span class="k">in</span> <span class="n">eachline</span><span class="x">(</span><span class="n">f</span><span class="x">)</span>
<span class="n">xs</span> <span class="o">=</span> <span class="n">split</span><span class="x">(</span><span class="n">line</span><span class="x">)</span>
<span class="n">word</span> <span class="o">=</span> <span class="n">xs</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
<span class="n">push!</span><span class="x">(</span><span class="n">indexed_words</span><span class="x">,</span> <span class="n">word</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">LL</span><span class="x">,</span> <span class="n">parse</span><span class="o">.</span><span class="x">(</span><span class="kt">Float32</span><span class="x">,</span> <span class="n">xs</span><span class="x">[</span><span class="mi">2</span><span class="o">:</span><span class="k">end</span><span class="x">]))</span>
<span class="n">index</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">reduce</span><span class="x">(</span><span class="n">hcat</span><span class="x">,</span> <span class="n">LL</span><span class="x">),</span> <span class="n">indexed_words</span>
<span class="k">end</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>load_embeddings (generic function with 1 method)
</code></pre></div></div>
<p>Note we have gone for the 300-dimension file this time for better results.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">embeddings</span><span class="x">,</span> <span class="n">vocab</span> <span class="o">=</span> <span class="n">load_embeddings</span><span class="x">(</span><span class="s">"glove.6B.300d.txt"</span><span class="x">)</span>
<span class="n">embed_size</span><span class="x">,</span> <span class="n">max_features</span> <span class="o">=</span> <span class="n">size</span><span class="x">(</span><span class="n">embeddings</span><span class="x">)</span>
<span class="n">println</span><span class="x">(</span><span class="s">"Loaded embeddings, each word is represented by a vector with </span><span class="si">$</span><span class="s">embed_size features. The vocab size is </span><span class="si">$</span><span class="s">max_features"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Loaded embeddings, each word is represented by a vector with 300 features. The vocab size is 400000
</code></pre></div></div>
<p>Now we define our usual functions for returning word vectors.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Function to return the index of the word 's' in the embedding (returns 0 if the word is not found)</span>
<span class="k">function</span><span class="nf"> vec_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)</span>
<span class="n">i</span><span class="o">=</span><span class="n">findfirst</span><span class="x">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="o">==</span><span class="n">s</span><span class="x">,</span> <span class="n">vocab</span><span class="x">)</span>
<span class="n">i</span><span class="o">==</span><span class="nb">nothing</span> <span class="o">?</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span> <span class="o">:</span> <span class="n">i</span>
<span class="k">end</span>
<span class="c">#Function to return the word vector for string 's'</span>
<span class="n">wvec</span><span class="x">(</span><span class="n">s</span><span class="x">)</span> <span class="o">=</span> <span class="n">embeddings</span><span class="x">[</span><span class="o">:</span><span class="x">,</span> <span class="n">vec_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)]</span>
<span class="c">#return the wordvec for "christmas" as a test</span>
<span class="n">wvec</span><span class="x">(</span><span class="s">"christmas"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>300-element Array{Float32,1}:
-0.12172
-0.50138
-0.094431
0.1533
-0.53234
0.77088
-0.18902
0.45391
-0.55459
-0.60449
-0.070504
0.020576
0.49627
⋮
-0.013262
-0.28618
-0.0091329
0.057448
-0.073389
0.45916
-0.30745
-0.40096
-0.039834
0.11326
0.092584
-0.37479
</code></pre></div></div>
<p>As you may have noticed a few steps above, the vocab in the GloVe file we loaded earlier contains 400,000 words. We don’t need all of them, and keeping them all would make training very slow or cause memory issues. We also need to handle ‘missing words’. In the next step we make an embedding matrix of word vectors based on our own word dictionary.</p>
<p>Let’s see how big the embedding matrix should be at the minimum.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">length</span><span class="x">(</span><span class="n">word_dict</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>8449
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">max_features</span> <span class="o">=</span> <span class="mi">300</span>
<span class="n">vocab_size</span> <span class="o">=</span> <span class="mi">8450</span>
<span class="n">println</span><span class="x">(</span><span class="s">"max_features=</span><span class="si">$</span><span class="s">max_features x vocab_size=</span><span class="si">$</span><span class="s">vocab_size"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>max_features=300 x vocab_size=8450
</code></pre></div></div>
<p>We’ll make the vocab_size at least one bigger than the dictionary, reserving a slot for the zero (unknown) word.</p>
<p>It’s likely that there will be a few words from the lyrics that aren’t in GloVe. We need to make sure that any missing words don’t spoil the training by being zero, ‘too big’ or ‘too small’. We therefore pre-fill the matrix with comparable random numbers as a first step using <code class="language-plaintext highlighter-rouge">glorot_normal</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">embedding_matrix</span><span class="o">=</span><span class="n">Flux</span><span class="o">.</span><span class="n">glorot_normal</span><span class="x">(</span><span class="n">max_features</span><span class="x">,</span> <span class="n">vocab_size</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>300×8450 Array{Float32,2}:
-0.0295832 -0.00962955 -0.00975701 … -0.0178692 -0.00624065
-0.00807149 -0.0167376 -0.0101676 0.021685 -0.0144029
0.0135334 0.00604884 -0.00648684 0.00542843 0.00395646
-0.00981705 -0.0340076 -0.014655 -0.00343636 0.0193315
-0.0175665 -0.00567335 0.0157851 -0.00380507 -0.00563199
0.00679509 -0.0167397 -0.0349645 … 0.0162774 0.00153693
0.00236159 0.0258442 0.0297015 -0.0117106 0.00243774
0.0119477 0.0113597 -0.0330014 -0.022494 -0.000611503
-0.0117824 0.00965574 0.0291393 -0.00894787 -0.00370767
-0.0251287 0.0157542 -0.00152643 0.00256018 -0.0117952
-0.0102175 0.00565934 -0.00816817 … -0.0257206 0.0139027
-0.00337642 -0.00810942 -0.026816 0.00700659 0.0145595
-0.0189478 0.0183039 -0.0253489 0.00468408 0.00352472
⋮ ⋱
0.0341929 0.0230084 -0.00523734 -0.00861224 0.00337825
-0.0102566 0.0121515 0.00860467 -0.00747732 -0.00846712
-0.00629439 -0.0118928 0.00331296 … -0.022168 -0.0182947
0.0127277 -0.0146548 -0.0358121 -0.00254599 -0.00691585
-0.00704753 -0.0109151 -0.0131335 0.00149089 -0.00471239
-0.00688779 -0.0127001 0.00146849 0.00887815 -0.0080609
-0.00544714 -0.0144375 0.0112734 0.0162863 0.0125952
0.0122895 0.018809 0.0105552 … 0.0117019 -0.0186995
0.0277002 -0.0295917 0.00182625 0.0267027 0.010207
-0.0244318 0.0156611 0.0113718 -0.00889063 0.0157727
0.00685371 0.0027254 -0.000454166 0.0062418 -0.0218112
-0.00230126 0.00790164 0.0146713 0.0186511 0.00484746
</code></pre></div></div>
<p>The for loop below inserts the known word vectors from GloVe by overwriting the pre-filled random numbers. It is important to note that they are inserted at the index determined from <code class="language-plaintext highlighter-rouge">word_dict</code> plus 1; the extra 1 offsets every word so that index 1 stays reserved for the zero (unknown) word.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">term</span> <span class="k">in</span> <span class="n">m_dtm</span><span class="o">.</span><span class="n">terms</span>
<span class="k">if</span> <span class="n">vec_idx</span><span class="x">(</span><span class="n">term</span><span class="x">)</span><span class="o">!=</span><span class="mi">0</span>
<span class="n">embedding_matrix</span><span class="x">[</span><span class="o">:</span><span class="x">,</span><span class="n">word_dict</span><span class="x">[</span><span class="n">term</span><span class="x">]</span><span class="o">+</span><span class="mi">1</span><span class="x">]</span><span class="o">=</span><span class="n">wvec</span><span class="x">(</span><span class="n">term</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">embedding_matrix</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>300×8450 Array{Float32,2}:
-0.0295832 -0.47974 0.090805 -0.00158495 … -0.0178692 0.0060653
-0.00807149 0.093277 0.25026 0.000283305 0.021685 -0.56901
0.0135334 -0.44665 -0.14494 0.00417346 0.00542843 -0.4516
-0.00981705 0.33504 0.81738 0.0119664 -0.00343636 0.13047
-0.0175665 -0.83164 -0.76269 0.0211371 -0.00380507 0.063553
0.00679509 0.36115 0.58164 0.0119629 … 0.0162774 -0.44511
0.00236159 0.07612 -0.081049 0.00540261 -0.0117106 0.17436
0.0119477 0.6984 0.28666 0.00103992 -0.022494 -0.19654
-0.0117824 -0.21912 -0.24209 -0.010946 -0.00894787 0.54479
-0.0251287 -0.1397 -0.083947 -0.00893104 0.00256018 0.037594
-0.0102175 0.28931 -0.15224 -0.0118294 … -0.0257206 0.26817
-0.00337642 0.28525 0.22769 0.0355204 0.00700659 -0.11157
-0.0189478 -0.61277 -0.27592 0.00716774 0.00468408 -1.16
⋮ ⋱
0.0341929 0.40865 0.30203 0.00646084 -0.00861224 -0.0442
-0.0102566 -0.66024 -0.47214 0.00124003 -0.00747732 0.42311
-0.00629439 -0.3993 -0.38838 -0.0138936 … -0.022168 0.14924
0.0127277 0.1155 -0.35227 0.00467165 -0.00254599 0.53348
-0.00704753 -0.4311 -0.65561 -0.0033085 0.00149089 0.21203
-0.00688779 -0.70635 -0.4813 0.00513726 0.00887815 -0.7755
-0.00544714 -0.16662 0.16227 -0.0096694 0.0162863 0.21987
0.0122895 0.054079 -0.095315 0.000943435 … 0.0117019 -0.6204
0.0277002 0.73493 1.1127 0.00626607 0.0267027 0.39769
-0.0244318 -0.40104 -0.12874 -0.0130443 -0.00889063 0.062195
0.00685371 0.0041243 0.023493 -0.0203715 0.0062418 0.34639
-0.00230126 0.047944 -0.36228 0.0113611 0.0186511 0.60853
</code></pre></div></div>
<h2 id="first-model">First Model</h2>
<p>For our first model architecture we use the pre-trained embeddings and a normal dense layer.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">Chain</span><span class="x">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">embedding_matrix</span> <span class="o">*</span> <span class="n">Flux</span><span class="o">.</span><span class="n">onehotbatch</span><span class="x">(</span><span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">doc_pad_size</span><span class="o">*</span><span class="n">size</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="mi">2</span><span class="x">)),</span> <span class="mi">0</span><span class="o">:</span><span class="n">vocab_size</span><span class="o">-</span><span class="mi">1</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">max_features</span><span class="x">,</span> <span class="n">doc_pad_size</span><span class="x">,</span> <span class="n">trunc</span><span class="x">(</span><span class="kt">Int64</span><span class="x">(</span><span class="n">size</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="mi">2</span><span class="x">)</span><span class="o">/</span><span class="n">doc_pad_size</span><span class="x">))),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">mean</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">dims</span><span class="o">=</span><span class="mi">2</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">max_features</span><span class="x">,</span> <span class="o">:</span><span class="x">),</span>
<span class="n">Dense</span><span class="x">(</span><span class="n">max_features</span><span class="x">,</span> <span class="mi">5</span><span class="x">),</span>
<span class="n">softmax</span>
<span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Chain(getfield(Main, Symbol("##13#17"))(), getfield(Main, Symbol("##14#18"))(), getfield(Main, Symbol("##15#19"))(), getfield(Main, Symbol("##16#20"))(), Dense(300, 5), NNlib.softmax)
</code></pre></div></div>
<p><strong>Layer 1:</strong> The embedding layer. The onehotbatch multiplication ensures that the correct word vectors are used for every song in <code class="language-plaintext highlighter-rouge">x</code>. The output shape is 300x129800; i.e. all the documents are rolled out into one long vector.</p>
<p><strong>Layer 2:</strong> Reshapes the output from layer 1 into the dimensions 300x200x649.</p>
<p><strong>Layer 3:</strong> Finds the mean vector for the song. The output shape is 300x1x649.</p>
<p><strong>Layer 4:</strong> Reshapes the output from layer 3 into a shape suitable for training 300x649.</p>
<p><strong>Layer 5:</strong> The dense training layer. The output is 5x649.</p>
<p><strong>Layer 6:</strong> Softmax to give us nice probabilities.</p>
<p>More information on this model architecture can be found in a previous post <a href="https://spcman.github.io/getting-to-know-julia/nlp/flux-embeddings-tutorial-2/">Julia Word Embedding Layer in Flux - Pre-trained GloVe</a></p>
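To see how layers 2–4 transform the shapes, here is a minimal sketch on dummy data (the 300-feature and 200-word sizes match the model; the 2-song batch is an assumption for brevity):

```julia
using Statistics  # for mean

# Dummy stand-in for the layer-1 output: 300 features, 2 songs of 200 padded words
embed_size, doc_pad_size, n_songs = 300, 200, 2
x = rand(Float32, embed_size, doc_pad_size * n_songs)  # 300×400, rolled out

x = reshape(x, embed_size, doc_pad_size, n_songs)      # layer 2: 300×200×2
x = mean(x, dims=2)                                    # layer 3: 300×1×2
x = reshape(x, embed_size, :)                          # layer 4: 300×2
size(x)                                                # one mean word vector per song
```

The same arithmetic scales to the full data: with 649 songs the final shape is 300x649, ready for the dense layer.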
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss_h</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy_train</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy_test</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">mean</span><span class="x">(</span><span class="n">Flux</span><span class="o">.</span><span class="n">onecold</span><span class="x">(</span><span class="n">x</span><span class="x">)</span> <span class="o">.==</span> <span class="n">Flux</span><span class="o">.</span><span class="n">onecold</span><span class="x">(</span><span class="n">y</span><span class="x">))</span>
<span class="n">loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">sum</span><span class="x">(</span><span class="n">Flux</span><span class="o">.</span><span class="n">crossentropy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">),</span> <span class="n">y</span><span class="x">))</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">Flux</span><span class="o">.</span><span class="n">Momentum</span><span class="x">(</span><span class="mf">0.2</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Momentum(0.2, 0.9, IdDict{Any,Any}())
</code></pre></div></div>
<p>Now that our loss and accuracy functions are set up, let’s begin training the first model.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">epoch</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="mi">400</span>
<span class="n">Flux</span><span class="o">.</span><span class="n">train!</span><span class="x">(</span><span class="n">loss</span><span class="x">,</span> <span class="n">Flux</span><span class="o">.</span><span class="n">params</span><span class="x">(</span><span class="n">m</span><span class="x">),</span> <span class="n">train_set</span><span class="x">,</span> <span class="n">optimizer</span><span class="x">)</span>
<span class="n">l</span> <span class="o">=</span> <span class="n">loss</span><span class="x">(</span><span class="n">X_train</span><span class="x">,</span> <span class="n">y_train</span><span class="x">)</span><span class="o">.</span><span class="n">data</span>
<span class="n">push!</span><span class="x">(</span><span class="n">loss_h</span><span class="x">,</span> <span class="n">l</span><span class="x">)</span>
<span class="n">accuracy_trn</span><span class="o">=</span><span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">X_train</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">,</span> <span class="n">y_train</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">accuracy_trn</span><span class="x">)</span>
<span class="n">accuracy_tst</span><span class="o">=</span><span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">X_test</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">,</span> <span class="n">y_test</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">accuracy_test</span><span class="x">,</span> <span class="n">accuracy_tst</span><span class="x">)</span>
<span class="n">println</span><span class="x">(</span><span class="s">"</span><span class="si">$</span><span class="s">epoch -> loss= </span><span class="si">$</span><span class="s">l accuracy train=</span><span class="si">$</span><span class="s">accuracy_trn accuracy test=</span><span class="si">$</span><span class="s">accuracy_tst"</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1 -> loss= 1.5928491 accuracy train=0.2218798151001541 accuracy test=0.1794871794871795
2 -> loss= 1.5850185 accuracy train=0.22033898305084745 accuracy test=0.19230769230769232
3 -> loss= 1.5755149 accuracy train=0.22496147919876733 accuracy test=0.15384615384615385
4 -> loss= 1.5658044 accuracy train=0.24345146379044685 accuracy test=0.15384615384615385
5 -> loss= 1.5568578 accuracy train=0.23728813559322035 accuracy test=0.2051282051282051
⋮
396 -> loss= 0.9313582 accuracy train=0.6764252696456087 accuracy test=0.6282051282051282
397 -> loss= 0.9309272 accuracy train=0.6764252696456087 accuracy test=0.6282051282051282
398 -> loss= 0.9304973 accuracy train=0.6764252696456087 accuracy test=0.6282051282051282
399 -> loss= 0.93006843 accuracy train=0.6764252696456087 accuracy test=0.6282051282051282
400 -> loss= 0.9296404 accuracy train=0.6764252696456087 accuracy test=0.6282051282051282
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">figure</span><span class="x">(</span><span class="n">figsize</span><span class="o">=</span><span class="x">(</span><span class="mi">12</span><span class="x">,</span><span class="mi">5</span><span class="x">))</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">121</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Loss"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">loss_h</span><span class="x">)</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">122</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Accuracy"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"train"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">accuracy_test</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"test"</span><span class="x">)</span>
<span class="n">legend</span><span class="x">()</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj004/output_59_0.png" alt="loss accuracy" /></p>
<p>We observe that accuracy on the test data peaks at just over 60%. Not bad, but let’s try a new model.</p>
<h2 id="second-model---1d-cnn">Second Model - 1d CNN</h2>
<p>The second model uses a 1-Dimensional Convolutional Neural Network.</p>
<p>Here are two great videos to help explain the general approach and why the architecture works.</p>
<!-- Courtesy of embedresponsively.com //-->
<div class="responsive-video-container">
<iframe src="https://www.youtube-nocookie.com/embed/wNBaNhvL4pg" frameborder="0" allowfullscreen=""></iframe>
</div>
<p>Also check this out too.</p>
<!-- Courtesy of embedresponsively.com //-->
<div class="responsive-video-container">
<iframe src="https://www.youtube-nocookie.com/embed/8YsZXTpFRO0" frameborder="0" allowfullscreen=""></iframe>
</div>
<p>The training data for Flux CNNs must be in WHCN order; i.e. Width, Height, Channels and Number of items in the mini-batch.</p>
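As a quick sketch with dummy data (the batch of 32 is an assumption here), the reshape into WHCN order looks like this:

```julia
# WHCN sketch: Width = 300 embedding features, Height = 200 padded words,
# Channels = 1, N = 32 songs in the mini-batch (dummy data).
x = rand(Float32, 300, 200 * 32)   # rolled-out embedding-layer output
x = reshape(x, 300, 200, 1, 32)
size(x)                            # (300, 200, 1, 32)
```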
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">size</span><span class="x">(</span><span class="n">X_train</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(200, 649)
</code></pre></div></div>
<p>Presently the size of <code class="language-plaintext highlighter-rouge">X_train</code> is 200x649. We now pick a batch size and split the data into mini-batches.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">Base</span><span class="o">.</span><span class="n">Iterators</span><span class="o">:</span> <span class="n">repeated</span><span class="x">,</span> <span class="n">partition</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">batch_size</span> <span class="o">=</span> <span class="mi">32</span>
<span class="n">mb_idxs</span> <span class="o">=</span> <span class="n">partition</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="n">size</span><span class="x">(</span><span class="n">X_train</span><span class="x">,</span><span class="mi">2</span><span class="x">),</span> <span class="n">batch_size</span><span class="x">)</span>
<span class="n">train_set</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="n">mb_idxs</span>
<span class="n">push!</span><span class="x">(</span><span class="n">train_set</span><span class="x">,</span> <span class="x">(</span><span class="n">X_train</span><span class="x">[</span><span class="o">:</span><span class="x">,</span><span class="n">i</span><span class="x">],</span> <span class="n">y_train</span><span class="x">[</span><span class="o">:</span><span class="x">,</span><span class="n">i</span><span class="x">]))</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The training set <code class="language-plaintext highlighter-rouge">train_set</code> now consists of 21 mini-batches. Each mini-batch is an (x, y) tuple holding 32 training examples, except the last, which holds 9.</p>
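The batch arithmetic can be checked directly (a small sketch using the sizes above):

```julia
using Base.Iterators: partition

idxs = collect(partition(1:649, 32))  # 649 training songs, batch size 32
length(idxs)        # number of mini-batches: 649 = 20*32 + 9, so 21
length(last(idxs))  # the final, smaller batch holds the remaining 9
```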
<p>Now we build the 1d convolution model in Flux.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">Chain</span><span class="x">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">embedding_matrix</span> <span class="o">*</span> <span class="n">Flux</span><span class="o">.</span><span class="n">onehotbatch</span><span class="x">(</span><span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">doc_pad_size</span><span class="o">*</span><span class="n">size</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="mi">2</span><span class="x">)),</span> <span class="mi">0</span><span class="o">:</span><span class="n">vocab_size</span><span class="o">-</span><span class="mi">1</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">max_features</span><span class="x">,</span> <span class="n">doc_pad_size</span><span class="x">,</span> <span class="mi">1</span><span class="x">,</span> <span class="n">trunc</span><span class="x">(</span><span class="kt">Int64</span><span class="x">(</span><span class="n">size</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="mi">2</span><span class="x">)</span><span class="o">/</span><span class="n">doc_pad_size</span><span class="x">))),</span>
<span class="n">Conv</span><span class="x">((</span><span class="mi">300</span><span class="x">,</span><span class="mi">1</span><span class="x">),</span> <span class="mi">1</span><span class="o">=></span><span class="mi">400</span><span class="x">,</span> <span class="n">relu</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">maxpool</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="x">(</span><span class="mi">1</span><span class="x">,</span><span class="mi">300</span><span class="x">)),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="o">:</span><span class="x">,</span> <span class="n">size</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="mi">4</span><span class="x">)),</span>
<span class="n">Dense</span><span class="x">(</span><span class="mi">400</span><span class="x">,</span> <span class="mi">600</span><span class="x">,</span> <span class="n">relu</span><span class="x">),</span>
<span class="n">Dense</span><span class="x">(</span><span class="mi">600</span><span class="x">,</span> <span class="mi">5</span><span class="x">),</span>
<span class="n">softmax</span>
<span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Chain(getfield(Main, Symbol("##13#17"))(), getfield(Main, Symbol("##14#18"))(), Conv((300, 1), 1=>400, NNlib.relu), getfield(Main, Symbol("##15#19"))(), getfield(Main, Symbol("##16#20"))(), Dense(400, 600, NNlib.relu), Dense(600, 5), NNlib.softmax)
</code></pre></div></div>
<p><strong>Layers 1 and 2</strong> handle the word embeddings as per model 1. The output shape from layer 2 (for the first batch) is 300x200x1x32.</p>
<p><strong>Layer 3</strong> Applies the 1d convolution filters. We use 400 channels to find new feature relationships. Activation is relu. The output size is 1x200x400x32.</p>
<p><strong>Layer 4</strong> Applies max pooling using a window size of 1x300, spanning the full song-length dimension. The output size is 1x1x400x32.</p>
<p><strong>Layer 5</strong> Flattens the shape to 400x32. This is now suitable for training in the next layer.</p>
<p><strong>Layers 6 & 7</strong> Dense layers; layer 6 uses relu activation. The output after layer 7 will be 5x32.</p>
<p><strong>Layer 8</strong> Softmax to output probabilities per artist between 0 and 1</p>
<p>Whilst tuning the model I found it really useful to test the model layers on the first batch: <code class="language-plaintext highlighter-rouge">m[1](train_set[1][1])</code> runs layer 1 only, <code class="language-plaintext highlighter-rouge">m[1:2](train_set[1][1])</code> runs layers 1 to 2, and so on. To check that the entire model runs end-to-end, use the syntax below.</p>
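The same slicing trick can be shown on a toy chain (hypothetical layers, not the lyrics model), since Flux allows indexing a `Chain` to run only a prefix of its layers:

```julia
using Flux

# A small throwaway chain just to demonstrate layer slicing
m_toy = Chain(Dense(4, 8, relu), Dense(8, 2), softmax)
x = rand(Float32, 4, 3)   # 3 dummy samples with 4 features each

size(m_toy[1](x))         # layer 1 only -> (8, 3)
size(m_toy[1:2](x))       # layers 1–2   -> (2, 3)
```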
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span><span class="x">(</span><span class="n">train_set</span><span class="x">[</span><span class="mi">1</span><span class="x">][</span><span class="mi">1</span><span class="x">])</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Tracked 5×32 Array{Float32,2}:
0.64655 0.0773002 0.00207123 0.643971 … 0.000502719 0.0176868
0.121 0.891177 0.0061145 0.0336226 0.000556032 0.196564
0.0158486 0.0108271 0.96432 0.00717857 0.000501922 0.764975
0.216433 0.0196466 0.0274192 0.279381 0.998012 0.0190336
0.000169021 0.00104961 7.55584e-5 0.0358473 0.000427233 0.00174087
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss_h</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy_train</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy_test</span><span class="o">=</span><span class="x">[]</span>
<span class="n">best_acc</span><span class="o">=</span><span class="mf">0.0</span>
<span class="n">last_improvement</span><span class="o">=</span><span class="mi">0</span>
<span class="n">stat</span><span class="o">=</span><span class="s">""</span>
<span class="n">accuracy</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">mean</span><span class="x">(</span><span class="n">Flux</span><span class="o">.</span><span class="n">onecold</span><span class="x">(</span><span class="n">x</span><span class="x">)</span> <span class="o">.==</span> <span class="n">Flux</span><span class="o">.</span><span class="n">onecold</span><span class="x">(</span><span class="n">y</span><span class="x">))</span>
<span class="n">loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">sum</span><span class="x">(</span><span class="n">Flux</span><span class="o">.</span><span class="n">crossentropy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">),</span> <span class="n">y</span><span class="x">))</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">Flux</span><span class="o">.</span><span class="n">Momentum</span><span class="x">(</span><span class="mf">0.004</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Momentum(0.004, 0.9, IdDict{Any,Any}())
</code></pre></div></div>
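<p>As a toy illustration of the helpers defined above (my own example, not part of the notebook): <code class="language-plaintext highlighter-rouge">Flux.onecold</code> maps each column of a score matrix to the index of its largest entry, so <code class="language-plaintext highlighter-rouge">accuracy</code> is simply the fraction of columns where prediction and one-hot label agree.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Flux, Statistics
scores = [0.7 0.2; 0.2 0.1; 0.1 0.7]   # model scores: 3 classes, 2 samples (one per column)
labels = [1 0; 0 0; 0 1]               # one-hot labels for the same 2 samples
Flux.onecold(scores)                    # column-wise argmax: [1, 3]
mean(Flux.onecold(scores) .== Flux.onecold(labels))   # both correct: 1.0
</code></pre></div></div>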
<p>Let's begin training the second model. Note that this training loop has been modified to drop the learning rate automatically if the test accuracy does not improve.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">epoch</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="mi">40</span>
<span class="n">Flux</span><span class="o">.</span><span class="n">train!</span><span class="x">(</span><span class="n">loss</span><span class="x">,</span> <span class="n">Flux</span><span class="o">.</span><span class="n">params</span><span class="x">(</span><span class="n">m</span><span class="x">),</span> <span class="n">train_set</span><span class="x">,</span> <span class="n">optimizer</span><span class="x">)</span>
<span class="n">l</span> <span class="o">=</span> <span class="n">loss</span><span class="x">(</span><span class="n">X_train</span><span class="x">,</span> <span class="n">y_train</span><span class="x">)</span><span class="o">.</span><span class="n">data</span>
<span class="n">push!</span><span class="x">(</span><span class="n">loss_h</span><span class="x">,</span> <span class="n">l</span><span class="x">)</span>
<span class="n">accuracy_trn</span> <span class="o">=</span> <span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">X_train</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">,</span> <span class="n">y_train</span><span class="x">)</span>
<span class="n">accuracy_tst</span> <span class="o">=</span> <span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">X_test</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">,</span> <span class="n">y_test</span><span class="x">)</span>
<span class="k">if</span> <span class="n">accuracy_tst</span> <span class="o">>=</span> <span class="n">best_acc</span>
<span class="n">stat</span><span class="o">=</span><span class="s">" - improvement, saving model"</span>
<span class="n">BSON</span><span class="o">.</span><span class="nd">@save</span> <span class="s">"artist_conv.bson"</span> <span class="n">m</span> <span class="n">epoch</span> <span class="n">accuracy_tst</span>
<span class="n">best_acc</span> <span class="o">=</span> <span class="n">accuracy_tst</span>
<span class="n">last_improvement</span><span class="o">=</span><span class="n">epoch</span>
<span class="k">else</span>
<span class="n">stat</span><span class="o">=</span><span class="s">" - decline"</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">epoch</span> <span class="o">-</span> <span class="n">last_improvement</span> <span class="o">>=</span> <span class="mi">5</span>
<span class="n">optimizer</span><span class="o">.</span><span class="n">eta</span> <span class="o">/=</span> <span class="mf">10.0</span>
<span class="n">stat</span><span class="o">=</span><span class="s">" - no improvements for a while, dropping learning rate by factor of 10"</span>
<span class="n">last_improvement</span> <span class="o">=</span> <span class="n">epoch</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">epoch</span> <span class="o">-</span> <span class="n">last_improvement</span> <span class="o">>=</span> <span class="mi">15</span>
<span class="n">stat</span><span class="o">=</span><span class="s">" - No improvement for 15 epochs STOPPING"</span>
<span class="n">break</span>
<span class="k">end</span>
<span class="n">push!</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">accuracy_trn</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">accuracy_test</span><span class="x">,</span> <span class="n">accuracy_tst</span><span class="x">)</span>
<span class="n">println</span><span class="x">(</span><span class="s">"</span><span class="si">$</span><span class="s">epoch -> loss= </span><span class="si">$</span><span class="s">l accuracy train=</span><span class="si">$</span><span class="s">accuracy_trn accuracy test=</span><span class="si">$</span><span class="s">accuracy_tst </span><span class="si">$</span><span class="s">stat"</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1 -> loss= 1.5475174 accuracy train=0.4391371340523883 accuracy test=0.3333333333333333 - improvement, saving model
2 -> loss= 1.3981959 accuracy train=0.5177195685670262 accuracy test=0.4230769230769231 - improvement, saving model
3 -> loss= 1.2068622 accuracy train=0.576271186440678 accuracy test=0.46153846153846156 - improvement, saving model
4 -> loss= 1.014195 accuracy train=0.674884437596302 accuracy test=0.5256410256410257 - improvement, saving model
5 -> loss= 0.8740304 accuracy train=0.6964560862865947 accuracy test=0.5769230769230769 - improvement, saving model
6 -> loss= 0.7762191 accuracy train=0.7134052388289677 accuracy test=0.5641025641025641 - decline
7 -> loss= 0.70594215 accuracy train=0.7195685670261941 accuracy test=0.6153846153846154 - improvement, saving model
8 -> loss= 0.63908446 accuracy train=0.7411402157164869 accuracy test=0.6153846153846154 - improvement, saving model
9 -> loss= 0.5899226 accuracy train=0.7704160246533128 accuracy test=0.6025641025641025 - decline
10 -> loss= 0.5465148 accuracy train=0.7935285053929122 accuracy test=0.6025641025641025 - decline
11 -> loss= 0.58453256 accuracy train=0.7796610169491526 accuracy test=0.5769230769230769 - decline
12 -> loss= 1.08682 accuracy train=0.6332819722650231 accuracy test=0.5641025641025641 - decline
13 -> loss= 0.84157795 accuracy train=0.687211093990755 accuracy test=0.5897435897435898 - no improvements for a while, dropping learning rate by factor of 10
14 -> loss= 1.0953864 accuracy train=0.6409861325115562 accuracy test=0.5256410256410257 - decline
15 -> loss= 0.38960773 accuracy train=0.9029275808936826 accuracy test=0.6153846153846154 - improvement, saving model
16 -> loss= 0.3409845 accuracy train=0.9229583975346687 accuracy test=0.6282051282051282 - improvement, saving model
17 -> loss= 0.3086353 accuracy train=0.9291217257318952 accuracy test=0.6666666666666666 - improvement, saving model
18 -> loss= 0.2871488 accuracy train=0.938366718027735 accuracy test=0.6410256410256411 - decline
19 -> loss= 0.27186963 accuracy train=0.9414483821263482 accuracy test=0.6410256410256411 - decline
20 -> loss= 0.25938293 accuracy train=0.9460708782742681 accuracy test=0.6410256410256411 - decline
21 -> loss= 0.24849279 accuracy train=0.9460708782742681 accuracy test=0.6410256410256411 - decline
22 -> loss= 0.23899835 accuracy train=0.9491525423728814 accuracy test=0.6538461538461539 - no improvements for a while, dropping learning rate by factor of 10
23 -> loss= 0.23771816 accuracy train=0.9506933744221879 accuracy test=0.6666666666666666 - improvement, saving model
24 -> loss= 0.23479633 accuracy train=0.9522342064714946 accuracy test=0.6538461538461539 - decline
25 -> loss= 0.2335223 accuracy train=0.9522342064714946 accuracy test=0.6538461538461539 - decline
26 -> loss= 0.2326564 accuracy train=0.9506933744221879 accuracy test=0.6538461538461539 - decline
27 -> loss= 0.23182735 accuracy train=0.9491525423728814 accuracy test=0.6538461538461539 - decline
28 -> loss= 0.23101975 accuracy train=0.9491525423728814 accuracy test=0.6538461538461539 - no improvements for a while, dropping learning rate by factor of 10
29 -> loss= 0.23071226 accuracy train=0.9522342064714946 accuracy test=0.6538461538461539 - decline
30 -> loss= 0.23060969 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
31 -> loss= 0.23052704 accuracy train=0.9522342064714946 accuracy test=0.6538461538461539 - decline
32 -> loss= 0.23044708 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
33 -> loss= 0.2303679 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - no improvements for a while, dropping learning rate by factor of 10
34 -> loss= 0.2303387 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
35 -> loss= 0.23032849 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
36 -> loss= 0.23032042 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
37 -> loss= 0.23031256 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
38 -> loss= 0.23030475 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - no improvements for a while, dropping learning rate by factor of 10
39 -> loss= 0.23030235 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
40 -> loss= 0.23030187 accuracy train=0.9537750385208013 accuracy test=0.6538461538461539 - decline
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">figure</span><span class="x">(</span><span class="n">figsize</span><span class="o">=</span><span class="x">(</span><span class="mi">12</span><span class="x">,</span><span class="mi">5</span><span class="x">))</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">121</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Loss"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">loss_h</span><span class="x">)</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">122</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Accuracy"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"train"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">accuracy_test</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"test"</span><span class="x">)</span>
<span class="n">legend</span><span class="x">()</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj004/output_76_0.png" alt="loss accuracy song lyrics" /></p>
<p>An improvement of about 4% compared to model 1.</p>
<h2 id="load-the-best-model">Load the best model</h2>
<p>You may have noticed that the training loop saved the model whenever the test accuracy improved. The next line of code loads that best model, which means the training loop does not need to be re-run every time the notebook runs; training can take a few minutes on a CPU.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">BSON</span><span class="o">.</span><span class="nd">@load</span> <span class="s">"artist_conv.bson"</span> <span class="n">m</span>
</code></pre></div></div>
<h2 id="conclusion">Conclusion</h2>
<p>The model reached nearly 70% test accuracy. With a little more perseverance I think it could have got there. The steps I had in mind to improve accuracy were:</p>
<ul>
<li>
<p>Study and make updates to the out of vocabulary words.</p>
</li>
<li>
<p>Data augmentation and balance of training examples</p>
</li>
</ul>
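<p>For the second bullet, one simple approach is random oversampling of under-represented artists before batching. A sketch (my own, not code from this project; it assumes a vector of integer artist labels and returns a balanced, shuffled list of row indices):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Random, StatsBase

# Oversample each class's row indices up to the size of the largest class.
function oversample(labels)
    counts = countmap(labels)               # class => number of examples
    target = maximum(values(counts))
    idx = Int[]
    for (cls, c) in counts
        rows = findall(==(cls), labels)
        append!(idx, rows)                  # keep every original row
        append!(idx, rand(rows, target - c))  # plus random duplicates to reach `target`
    end
    shuffle(idx)
end
</code></pre></div></div>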
<p>I might come back to this another day.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Function to return the artist name based on the index 'a' </span>
<span class="k">function</span><span class="nf"> artist_name</span><span class="x">(</span><span class="n">a</span><span class="x">)</span>
<span class="n">i</span><span class="o">=</span><span class="n">findfirst</span><span class="x">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="o">==</span><span class="n">a</span><span class="x">,</span> <span class="n">artist_dict</span><span class="x">)</span>
<span class="k">end</span>
<span class="n">artist_name</span><span class="x">(</span><span class="mi">1</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"Queen"
</code></pre></div></div>
<h3 id="put-yourself-to-the-test">Put yourself to the test</h3>
<p>Set <code class="language-plaintext highlighter-rouge">i</code> to a value between 1 and 78 and put yourself to the test with the next three cells.</p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj004/contenders.png" alt="ai song artist contenders" /></p>
<h4 id="who-wrote-this-song">Who wrote this song?</h4>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">i</span><span class="o">=</span><span class="mi">5</span>
<span class="n">replace</span><span class="x">(</span><span class="n">df</span><span class="x">[</span><span class="mi">649</span><span class="o">+</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="n">text</span><span class="x">],</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span> <span class="o">=></span> <span class="s">" "</span><span class="x">)</span> <span class="c"># 649 is the test/train split</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Puff Intro] Yeah The old school To the new school Bad Boy, remix, let's go [Black Rob] Like that Black gon'
slide with Mike Jack Puff done remixed one hell of a track Put me on it I wanna know How many want it? Damn, it feels
good to see people love on it For those who love slow down 'Member Motown had a brotha' happy as shit I mean the whole
sound Bangin' and catch six-four since we was shorties Fee owes now rebooked from California Carry 40's but I 'member
them times in '79 When I first started to rhyme Sometimes I gots to look back at what it was The good old days The
triple o'shays when there was love I want you back But I can't grab that far It's how it is When you're living like a
star, bad boy Come on, let's go [Mj] When I had you to myself I didn't want you around Those pretty faces Always
made you Stand out in a crowd But someone picked you from the bunch When love was all it took Now it's much too late for
me To take a second look Oh baby, give me one more chance (To show you that I love you) Won't you please let me (Back
in your heart) Oh, darlin' I was blind to let you go (Let you go baby) But now since I see you in his arms (I want you back)
Oh, I do now (I want you back) Oh, oh, baby (I want you back) Yeah, yeah, yeah, yeah (I want you back) Nah,
nah, nah, nah Trying to live without your love Is one long sleepless night Let me show you girl That I know wrong
from right Every street you walk on I lay tear stains on the ground Following the girl I didn't even want you
around Let me tell ya now Oh baby all I need is one more chance (To show you that I love you) Won't you please let
me (Back in your heart) Oh darlin' I was blind to let you go (Let you go baby) But now since I see you in his arms (I
want you back) [Black Rob] It's just like Jermain Jackson Tito, Mike and Marlon Only think on my mind now is
stardom Blowin' the F-up My game's stepped up 'Member when Mike and them First came to record Singin' hits like
Skywriter My Girl, People Make The World Go 'Round Mama's Pearl, Can't Loose it Joyful jukebox music Never Can Say
Goodbye That's why we use it It's money honey So I gots to be there And I'm be yo Sugar Daddy Say it's real
Versachi chair, pd, life of the party Bad Boy, make joys for everbody Jackson 5 Chorus in background while: [Puff Daddy]
From the old to the new Come on Motown Rock on Yeah, yeah, yeah, yeah [Jackson 5 Chorus until fade]
</code></pre></div></div>
<h4 id="pause-and-think--here-is-the-answer">Pause and think! Here is the answer.</h4>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="x">[</span><span class="mi">649</span><span class="o">+</span><span class="n">i</span><span class="x">,[</span><span class="o">:</span><span class="n">artist</span><span class="x">,</span> <span class="o">:</span><span class="n">song</span><span class="x">]]</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj004/dataframe2.png" alt="dataframe" /></p>
<h4 id="this-is-the-prediction-that-model-gave">This is the prediction that model gave.</h4>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">artist_name</span><span class="x">(</span><span class="n">test_predictions</span><span class="x">[</span><span class="n">i</span><span class="x">])</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"Eminem"
</code></pre></div></div>
<p>OK, so our model got it wrong, but maybe you did too?</p>
<p>Although this one is labelled ‘Michael Jackson’, it appears in the dataset as a rap remix of his song with lyrics from P. Diddy and Black Rob, so I still think Eminem was a reasonable prediction.</p>
<h3 id="confusion-matrix">Confusion Matrix</h3>
<p>The confusion matrix shows where the model predictions were correct (the diagonal) and where they failed (the other cells). Note that <code class="language-plaintext highlighter-rouge">test_predictions</code> and <code class="language-plaintext highlighter-rouge">test_actual</code> are computed in the ‘All Predictions’ cell further down.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">MLBase</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cm</span><span class="o">=</span><span class="n">confusmat</span><span class="x">(</span><span class="mi">5</span><span class="x">,</span><span class="n">test_predictions</span><span class="x">,</span> <span class="n">test_actual</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>5×5 Array{Int64,2}:
8 3 2 2 2
1 9 1 2 1
1 2 10 1 0
2 3 2 14 0
0 0 0 1 11
</code></pre></div></div>
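<p>From these raw counts, overall accuracy and per-class precision and recall fall out directly. A sketch (my own, not from the original notebook; it relies on the orientation produced above, where rows are predictions and columns are the actual artists):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Overall accuracy is the diagonal divided by the total number of test songs.
accuracy_cm = sum(cm[i,i] for i in 1:5) / sum(cm)       # 52/78 ≈ 0.667, the test accuracy
precision   = [cm[i,i] / sum(cm[i,:]) for i in 1:5]     # of songs predicted as class i, fraction correct
recall      = [cm[i,i] / sum(cm[:,i]) for i in 1:5]     # of songs actually class i, fraction found
</code></pre></div></div>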
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">labels</span><span class="o">=</span><span class="x">[</span><span class="n">artist_name</span><span class="x">(</span><span class="n">x</span><span class="x">)</span> <span class="k">for</span> <span class="n">x</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">artists</span><span class="x">)]</span>
<span class="n">cmap</span><span class="o">=</span><span class="n">get_cmap</span><span class="x">(</span><span class="s">"Blues"</span><span class="x">)</span>
<span class="n">cax</span><span class="o">=</span><span class="n">matshow</span><span class="x">(</span><span class="n">cm</span><span class="x">)</span>
<span class="n">imshow</span><span class="x">(</span><span class="n">cm</span><span class="x">,</span> <span class="n">interpolation</span><span class="o">=</span><span class="s">"nearest"</span><span class="x">,</span> <span class="n">cmap</span><span class="o">=</span><span class="n">cmap</span><span class="x">)</span>
<span class="n">colorbar</span><span class="x">()</span>
<span class="n">xticks</span><span class="x">(</span><span class="n">collect</span><span class="x">(</span><span class="mi">0</span><span class="o">:</span><span class="mi">4</span><span class="x">),</span> <span class="n">labels</span><span class="x">,</span> <span class="n">rotation</span><span class="o">=</span><span class="mi">45</span><span class="x">)</span>
<span class="n">yticks</span><span class="x">(</span><span class="n">collect</span><span class="x">(</span><span class="mi">0</span><span class="o">:</span><span class="mi">4</span><span class="x">),</span> <span class="n">labels</span><span class="x">)</span>
<span class="n">xlabel</span><span class="x">(</span><span class="s">"Actual"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Prediction"</span><span class="x">)</span>
<span class="n">show</span><span class="x">()</span>
</code></pre></div></div>
<p>A deeper blue means a higher count.</p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj004/output_92_0.png" alt="song artist confusion chart" /></p>
<h3 id="all-predictions">All Predictions</h3>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">test_predictions</span><span class="o">=</span><span class="n">Flux</span><span class="o">.</span><span class="n">onecold</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">X_test</span><span class="x">))</span>
<span class="n">test_actual</span><span class="o">=</span><span class="n">Flux</span><span class="o">.</span><span class="n">onecold</span><span class="x">(</span><span class="n">y_test</span><span class="x">)</span>
<span class="n">showall</span><span class="x">(</span><span class="n">DataFrame</span><span class="x">(</span><span class="n">Actual</span> <span class="o">=</span> <span class="n">artist_name</span><span class="o">.</span><span class="x">(</span><span class="n">test_actual</span><span class="x">);</span> <span class="n">Prediction</span> <span class="o">=</span> <span class="n">artist_name</span><span class="o">.</span><span class="x">(</span><span class="n">test_predictions</span><span class="x">)))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>78×2 DataFrame
│ Row │ Actual │ Prediction │
│ │ [90mString[39m │ [90mString[39m │
├─────┼─────────────────┼─────────────────┤
│ 1 │ Queen │ Queen │
│ 2 │ Eminem │ INXS │
│ 3 │ Michael Jackson │ Michael Jackson │
│ 4 │ The Beatles │ The Beatles │
│ 5 │ Michael Jackson │ Eminem │
│ 6 │ INXS │ INXS │
│ 7 │ Eminem │ Eminem │
│ 8 │ Eminem │ Eminem │
│ 9 │ Eminem │ Queen │
│ 10 │ The Beatles │ The Beatles │
│ 11 │ The Beatles │ The Beatles │
│ 12 │ The Beatles │ Michael Jackson │
│ 13 │ INXS │ Queen │
│ 14 │ The Beatles │ The Beatles │
│ 15 │ Michael Jackson │ Michael Jackson │
│ 16 │ Queen │ Queen │
│ 17 │ Michael Jackson │ Michael Jackson │
│ 18 │ INXS │ INXS │
│ 19 │ Michael Jackson │ Michael Jackson │
│ 20 │ Michael Jackson │ Michael Jackson │
│ 21 │ Michael Jackson │ Michael Jackson │
│ 22 │ Eminem │ Eminem │
│ 23 │ INXS │ INXS │
│ 24 │ INXS │ INXS │
│ 25 │ Queen │ Queen │
│ 26 │ Michael Jackson │ Michael Jackson │
│ 27 │ The Beatles │ Queen │
│ 28 │ Eminem │ Eminem │
│ 29 │ INXS │ Michael Jackson │
│ 30 │ The Beatles │ Michael Jackson │
│ 31 │ Michael Jackson │ The Beatles │
│ 32 │ Queen │ Queen │
│ 33 │ Michael Jackson │ Michael Jackson │
│ 34 │ Michael Jackson │ INXS │
│ 35 │ INXS │ Queen │
│ 36 │ Michael Jackson │ Michael Jackson │
│ 37 │ Queen │ Queen │
│ 38 │ INXS │ Michael Jackson │
│ 39 │ INXS │ INXS │
│ 40 │ Eminem │ Queen │
│ 41 │ The Beatles │ The Beatles │
│ 42 │ INXS │ INXS │
│ 43 │ The Beatles │ The Beatles │
│ 44 │ Michael Jackson │ Michael Jackson │
│ 45 │ Michael Jackson │ Michael Jackson │
│ 46 │ INXS │ Michael Jackson │
│ 47 │ The Beatles │ The Beatles │
│ 48 │ INXS │ The Beatles │
│ 49 │ Eminem │ Eminem │
│ 50 │ Eminem │ Eminem │
│ 51 │ Michael Jackson │ Michael Jackson │
│ 52 │ INXS │ The Beatles │
│ 53 │ The Beatles │ The Beatles │
│ 54 │ Eminem │ Eminem │
│ 55 │ Queen │ Michael Jackson │
│ 56 │ Michael Jackson │ INXS │
│ 57 │ Queen │ Queen │
│ 58 │ Eminem │ Eminem │
│ 59 │ Eminem │ Eminem │
│ 60 │ Queen │ Michael Jackson │
│ 61 │ INXS │ INXS │
│ 62 │ INXS │ Queen │
│ 63 │ INXS │ INXS │
│ 64 │ Queen │ Queen │
│ 65 │ Michael Jackson │ Michael Jackson │
│ 66 │ Queen │ INXS │
│ 67 │ Eminem │ Eminem │
│ 68 │ Eminem │ Eminem │
│ 69 │ Queen │ The Beatles │
│ 70 │ Queen │ Queen │
│ 71 │ The Beatles │ The Beatles │
│ 72 │ The Beatles │ Queen │
│ 73 │ The Beatles │ The Beatles │
│ 74 │ Michael Jackson │ Michael Jackson │
│ 75 │ The Beatles │ INXS │
│ 76 │ Michael Jackson │ Queen │
│ 77 │ Michael Jackson │ Queen │
│ 78 │ INXS │ INXS │
</code></pre></div></div>
<p>Let me know if anything could be improved.</p>
Nigel Adams
Can we predict the artist of the song given the lyrics?
Julia Project - Monte Carlo Simulation for Investment Portfolio Earnings
2019-09-12T00:00:00+00:00
https://spcman.github.io/getting-to-know-julia/monte%20carlo/monte-carlo-investment-earnings
<h2 id="introduction">Introduction</h2>
<p>In this notebook we use Julia to look at typical investment risk profiles and employ the Monte Carlo method with Geometric Brownian motion (GBM) to simulate the growth of an investment portfolio.</p>
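<p>As background, a single GBM path can be stepped forward with the standard discretisation. This is a generic sketch of my own (illustrative drift, volatility and step values, not the notebook's implementation, which appears later):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Random

# One simulated GBM path: S_{t+1} = S_t * exp((μ - σ²/2)Δt + σ√Δt * Z), Z ~ N(0,1)
function gbm_path(S0, μ, σ, Δt, nsteps)
    S = [S0]
    for _ in 1:nsteps
        push!(S, S[end] * exp((μ - σ^2/2)*Δt + σ*sqrt(Δt)*randn()))
    end
    S
end

gbm_path(100_000.0, 0.05, 0.08, 1.0, 30)   # 30 annual steps from a \$100k start
</code></pre></div></div>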
<p>The simulations do not take into account any ongoing payments or tax. Nor do they encompass other factors such as inflation.</p>
<p>Note, do not rely on any part of this article for your own personal circumstances. This is not financial advice!</p>
<p>With the disclaimer out of the way let’s begin as usual by loading the Julia libraries we’ll need.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">DataFrames</span><span class="x">,</span> <span class="n">CSV</span><span class="x">,</span> <span class="n">Distributions</span><span class="x">,</span> <span class="n">PyPlot</span><span class="x">,</span> <span class="n">Dates</span><span class="x">,</span> <span class="n">Statistics</span><span class="x">,</span> <span class="n">StatsFuns</span>
</code></pre></div></div>
<p>Now let’s load in our risk profile data. Users of the financial software XPLAN will recognise headings used in this dataframe. We only actually need the data from the ‘Total’ column being the overall expected growth (Growth + Income) and ‘StdDev’ which is the risk profile’s standard deviation.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="o">=</span><span class="n">CSV</span><span class="o">.</span><span class="n">read</span><span class="x">(</span><span class="s">"/mnt/juliabox/Monte Carlo/assumptions.csv"</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/RiskProfileDataframe.PNG" alt="risk profile dataframe" /></p>
<h2 id="what-is-a-risk-profile">What is a Risk Profile</h2>
<p>A risk profile is an evaluation of an individual’s willingness and ability to take risks. Financial Advisers often fit clients into one of several risk profiles after asking them discovery questions. The risk profile names and values above have been made up but they are indicative of real values. The first risk profile ‘Defensive’ is made up of 15% growth assets and 85% defensive assets; this risk profile would suit a cautious investor who wants to make steady progress without taking too much risk. At the other end of the table a ‘Very Aggressive’ risk profile is made up of 100% growth assets and would suit an individual who is more willing to take a risk to gain higher returns.</p>
<p>The function below plots a normal distribution curve of a given risk profile.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> plot_rp</span><span class="x">(</span><span class="n">rp</span><span class="x">)</span>
<span class="n">μ</span> <span class="o">=</span> <span class="n">df</span><span class="x">[</span><span class="n">rp</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">Total</span>
<span class="n">σ</span> <span class="o">=</span> <span class="n">df</span><span class="x">[</span><span class="n">rp</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">StdDev</span>
<span class="n">dist</span> <span class="o">=</span> <span class="n">Normal</span><span class="x">(</span><span class="n">μ</span><span class="x">,</span> <span class="n">σ</span><span class="x">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">μ</span> <span class="o">-</span> <span class="mi">3</span><span class="n">σ</span> <span class="o">:</span> <span class="mf">0.01</span> <span class="o">:</span> <span class="n">μ</span> <span class="o">+</span> <span class="mi">3</span><span class="n">σ</span>
<span class="n">plot</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">pdf</span><span class="x">(</span><span class="n">dist</span><span class="x">,</span><span class="n">x</span><span class="x">),</span> <span class="n">label</span><span class="o">=</span><span class="n">df</span><span class="x">[</span><span class="n">rp</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">ProfileName</span><span class="x">)</span>
<span class="n">legend</span><span class="x">(</span><span class="n">loc</span><span class="o">=</span><span class="s">"upper right"</span><span class="x">,</span> <span class="n">fontsize</span> <span class="o">=</span> <span class="s">"small"</span><span class="x">)</span>
<span class="n">axis</span><span class="x">([</span><span class="o">-</span><span class="mi">30</span><span class="x">,</span><span class="mi">50</span><span class="x">,</span><span class="mi">0</span><span class="x">,</span><span class="mf">0.12</span><span class="x">])</span>
<span class="n">title</span><span class="x">(</span><span class="s">"Normal Distribution"</span><span class="x">)</span>
<span class="n">axvline</span><span class="x">(</span><span class="n">x</span><span class="o">=</span><span class="mi">0</span><span class="x">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"k"</span><span class="x">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">"--"</span><span class="x">)</span>
<span class="n">xlabel</span><span class="x">(</span><span class="s">"Annual Growth (%)"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"PDF"</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Let’s plot all the curves and then interpret the output.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">rp</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">df</span><span class="x">)</span>
<span class="n">plot_rp</span><span class="x">(</span><span class="n">rp</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/output_7_0.png" alt="risk profile normal distribution" /></p>
<h2 id="interpretation">Interpretation</h2>
<ul>
<li>
<p>The vertical dotted line shows the boundary of positive growth (i.e. making money) vs negative growth (i.e. losing money).</p>
</li>
<li>
<p>The first and least risky investment profile is ‘Defensive’. You can observe that the probability of achieving the mean total growth of 4.2% is the highest, and most of the bell curve’s area lies in the positive-growth region.</p>
</li>
<li>
<p>The last and most risky investment profile is ‘Very Aggressive’. You can observe that the probability of achieving the mean total growth of 7.27% is the lowest. The elongated bell curve shape means there is scope to earn much higher returns, at the expense of possible negative returns.</p>
</li>
</ul>
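<p>To make the first point concrete, the probability of a losing year can be read straight off the normal CDF: it is the bell-curve area to the left of the dotted line. Here is a minimal sketch using the Distributions package, with the ‘Defensive’ mean of 4.2% and an assumed standard deviation of 3% standing in for the values held in the df table:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Distributions

# 'Defensive' mean growth (%); the standard deviation here is an assumed placeholder
μ, σ = 4.2, 3.0
dist = Normal(μ, σ)

# Area of the bell curve to the left of the dotted line at zero growth
p_loss = cdf(dist, 0.0)
</code></pre></div></div>
<p>For these assumed numbers the loss probability works out to roughly 8%; riskier profiles trade a lower chance of hitting their (higher) mean for a fatter left tail.</p>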
<p>For more information on probability distributions, <a href="https://en.wikipedia.org/wiki/Probability_distribution">click here</a>.</p>
<h2 id="deterministic-prediction-function">Deterministic Prediction Function</h2>
<p>The following function makes a deterministic prediction of the future portfolio value using the periodic compounding formula A = P(1 + r/n)^(n·t), where:</p>
<ul>
<li>
<p>P is the original principal sum</p>
</li>
<li>
<p>r is the nominal annual interest rate</p>
</li>
<li>
<p>n is the compounding frequency</p>
</li>
<li>
<p>t is the overall length of time the interest is applied (expressed using the same time units as r, usually years)</p>
</li>
</ul>
<p>This prediction assumes no additional contributions and perfect market conditions. For more information see this article on <a href="https://en.wikipedia.org/wiki/Compound_interest">Periodic Compounding</a>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">deterministic_predict</span><span class="x">(</span><span class="n">P</span><span class="x">,</span> <span class="n">r</span> <span class="x">,</span> <span class="n">n</span><span class="x">,</span> <span class="n">t</span><span class="x">)</span> <span class="o">=</span> <span class="n">P</span><span class="o">*</span><span class="x">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">r</span><span class="o">/</span><span class="n">n</span><span class="x">)</span><span class="o">^</span><span class="x">(</span><span class="n">n</span><span class="o">*</span><span class="n">t</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Example 1 from wikipedia as a first sanity check</span>
<span class="c">#Suppose a principal amount of $1,500 is deposited in a bank paying an annual interest rate of 4.3%, compounded quarterly.</span>
<span class="c">#Then the balance after 6 years is found by using the formula above, with P = 1500, r = 0.043 (4.3%), n = 4, and t = 6:</span>
<span class="n">deterministic_predict</span><span class="x">(</span><span class="mf">1500.0</span> <span class="x">,</span> <span class="mf">0.043</span><span class="x">,</span> <span class="mf">4.0</span><span class="x">,</span> <span class="mf">6.0</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1938.8368221341054
</code></pre></div></div>
<p>Now let’s apply this function to a retirement saving scenario. Our client is age 40 and wants to retire in 20 years’ time. They currently have $100,000 in their retirement portfolio. What will their balance be like at age 60?</p>
<p>First let’s set a couple of variables and functions that will come in useful.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">original_principle_sum</span><span class="o">=</span><span class="mi">100000</span> <span class="c"># Initial portfolio value</span>
<span class="n">interest_rate</span><span class="x">(</span><span class="n">rp</span><span class="x">)</span> <span class="o">=</span> <span class="n">df</span><span class="x">[</span><span class="n">rp</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">Total</span><span class="o">/</span><span class="mi">100</span> <span class="c"># Simple function to get a risk profile's growth interest rate</span>
</code></pre></div></div>
<p>Here’s another test output of the function for risk profile 1 (Defensive).
For this test let’s assume the interest compounds monthly (12 times a year for each of the 20 years).</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">deterministic_predict</span><span class="x">(</span><span class="n">original_principle_sum</span><span class="x">,</span> <span class="n">interest_rate</span><span class="x">(</span><span class="mi">1</span><span class="x">),</span> <span class="mi">12</span><span class="x">,</span> <span class="mi">20</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>231297.23315323537
</code></pre></div></div>
<p>By using the Moneysmart <a href="https://www.moneysmart.gov.au/tools-and-resources/calculators-and-apps/compound-interest-calculator">Compound Interest Calculator</a> as a second sanity check we can see our deterministic function is working.</p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/moneysmart.PNG" alt="money smart" /></p>
<h2 id="stochastic-prediction-function">Stochastic Prediction Function</h2>
<p>The reality with real share portfolios is that unit prices fluctuate up and down on a daily basis. Price fluctuations are generally more volatile for stocks that have the potential to earn more income for the investor. The function below uses <a href="https://en.wikipedia.org/wiki/Geometric_Brownian_motion">Geometric Brownian motion</a> (GBM) to simulate randomised returns based on the given risk profiles. Additional parameters are built into the function to repeat the GBM simulations over-and-over to generate what is known as a Monte Carlo experiment.</p>
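<p>Stripped of the plotting and DataFrame bookkeeping, each GBM path is driven by a single daily update: multiply yesterday’s price by exp(drift + σ·√Δt·Z), where Z is a standard normal draw. The sketch below is a minimal stand-alone version of that update, assuming an annual growth of 6% and volatility of 10% rather than the risk-profile values used later:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using StatsFuns   # provides norminvccdf

function gbm_path(μ, σ; frequency=252)
    Δt = 1/frequency
    drift = μ*Δt - σ^2*Δt/2     # daily drift of the log-price
    price = 1.0
    for _ in 1:frequency        # one simulated year of daily steps
        price *= exp(drift + σ*sqrt(Δt)*norminvccdf(rand()))
    end
    return price
end

gbm_path(0.06, 0.10)            # one random year-end price for a unit investment
</code></pre></div></div>
<p>Note that this sketch uses the continuous-compounding approximation μΔt for the daily log return; the full function further below derives it from the deterministic prediction instead.</p>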
<p>The animated gifs below show 20 simulations per risk profile. We can see that as we take more risk the simulations become more volatile.</p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/rp1.gif" alt="risk profile animated gif" /></p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/rp2.gif" alt="risk profile animated gif" /></p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/rp3.gif" alt="risk profile animated gif" /></p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/rp4.gif" alt="risk profile animated gif" /></p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/rp5.gif" alt="risk profile animated gif" /></p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/rp6.gif" alt="risk profile animated gif" /></p>
<p>These gifs were generated with the functions below. Let’s take a closer look at the code. We start by setting up the known variables and add a few useful functions at the same time.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">age</span><span class="o">=</span><span class="mi">40</span> <span class="c"># Age at start of projections</span>
<span class="n">frequency</span> <span class="o">=</span> <span class="mi">252</span> <span class="c"># Assume 252 trading days per year</span>
<span class="n">days</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="n">frequency</span> <span class="c"># Convenient way to express days</span>
<span class="n">yrs_to_days</span><span class="x">(</span><span class="n">x</span><span class="x">)</span><span class="o">=</span><span class="n">x</span><span class="o">*</span><span class="n">frequency</span> <span class="c"># Simple function to convert years to days</span>
<span class="n">sigma</span><span class="x">(</span><span class="n">rp</span><span class="x">)</span> <span class="o">=</span> <span class="n">df</span><span class="x">[</span><span class="n">rp</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">StdDev</span><span class="o">/</span><span class="mi">100</span> <span class="c"># Simple function to get a risk profile's standard deviation</span>
</code></pre></div></div>
<p>Now we build the Monte Carlo function. Calling the function produces a matplotlib (PyPlot) chart based on the input parameters. I’ve included some comments in the code, but if you need more in-depth insight I recommend <a href="https://www.youtube.com/watch?v=3gcLRU24-w0&t=208s">this great video</a>, which gave me the math needed.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> montecarlo</span><span class="x">(</span><span class="n">rp</span><span class="x">,</span> <span class="n">N</span><span class="x">,</span> <span class="n">iterations</span><span class="x">,</span> <span class="n">show_Q</span><span class="x">)</span>
<span class="c"># rp = the index of the risk profile to use</span>
<span class="c"># N = Number of years forward to project</span>
<span class="c"># iterations - no of times to iterate and produce a simulation</span>
<span class="c"># show_Q - if True, so quantile lines</span>
<span class="n">growth</span> <span class="o">=</span> <span class="n">interest_rate</span><span class="x">(</span><span class="n">rp</span><span class="x">)</span>
<span class="c"># Periodic Daily Return (PDR)</span>
<span class="n">pdr</span> <span class="o">=</span> <span class="n">log</span><span class="x">(</span><span class="n">deterministic_predict</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span> <span class="n">growth</span><span class="x">,</span> <span class="n">frequency</span><span class="x">,</span> <span class="mi">2</span><span class="o">*</span><span class="n">days</span><span class="x">)</span> <span class="o">/</span> <span class="n">deterministic_predict</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span> <span class="n">growth</span><span class="x">,</span> <span class="n">frequency</span><span class="x">,</span> <span class="mi">1</span><span class="o">*</span><span class="n">days</span><span class="x">))</span>
<span class="n">pdr_std</span> <span class="o">=</span> <span class="n">sigma</span><span class="x">(</span><span class="n">rp</span><span class="x">)</span> <span class="o">*</span> <span class="n">sqrt</span><span class="x">(</span><span class="n">days</span><span class="x">)</span>
<span class="n">pdr_var</span> <span class="o">=</span> <span class="n">pdr_std</span><span class="o">^</span><span class="mi">2</span>
<span class="n">drift</span> <span class="o">=</span> <span class="n">pdr</span> <span class="o">-</span> <span class="x">(</span><span class="n">pdr_var</span><span class="o">/</span><span class="mi">2</span><span class="x">)</span>
<span class="n">predictions_all</span><span class="o">=</span><span class="x">[]</span>
<span class="n">axis</span><span class="x">([</span><span class="mi">40</span><span class="x">,</span><span class="mi">60</span><span class="x">,</span><span class="mi">0</span><span class="x">,</span><span class="mi">800000</span><span class="x">])</span>
<span class="n">title</span><span class="x">(</span><span class="n">df</span><span class="x">[</span><span class="n">rp</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="o">.</span><span class="n">ProfileName</span><span class="x">)</span>
<span class="n">xlabel</span><span class="x">(</span><span class="s">"Age"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Portfolio Value"</span><span class="x">)</span>
<span class="k">for</span> <span class="n">s</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">iterations</span>
<span class="n">predictions</span><span class="o">=</span><span class="x">[]</span>
<span class="kd">global</span> <span class="n">df_pred</span><span class="o">=</span><span class="n">DataFrame</span><span class="x">(</span><span class="n">Age</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[],</span> <span class="n">MC_Price</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[],</span><span class="n">MC_Balance</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[],</span> <span class="n">Deterministic_Balance</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[])</span>
<span class="n">last_price</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">0</span><span class="o">:</span><span class="n">yrs_to_days</span><span class="x">(</span><span class="n">N</span><span class="x">)</span>
<span class="n">i</span><span class="o">==</span><span class="mi">0</span> <span class="o">?</span> <span class="n">mc_price</span><span class="o">=</span><span class="mi">1</span> <span class="o">:</span> <span class="n">mc_price</span><span class="o">=</span><span class="n">last_price</span><span class="o">*</span><span class="n">exp</span><span class="x">(</span><span class="n">drift</span><span class="o">+</span><span class="n">pdr_std</span><span class="o">*</span><span class="n">norminvccdf</span><span class="x">(</span><span class="n">rand</span><span class="x">()))</span>
<span class="n">push!</span><span class="x">(</span><span class="n">df_pred</span><span class="x">,</span> <span class="x">[</span><span class="n">age</span><span class="o">+</span><span class="n">i</span><span class="o">*</span><span class="n">days</span><span class="x">,</span>
<span class="n">mc_price</span><span class="x">,</span>
<span class="n">original_principle_sum</span><span class="o">*</span><span class="n">mc_price</span><span class="x">,</span>
<span class="n">deterministic_predict</span><span class="x">(</span><span class="n">original_principle_sum</span><span class="x">,</span> <span class="n">growth</span><span class="x">,</span> <span class="n">frequency</span><span class="x">,</span> <span class="n">i</span><span class="o">*</span><span class="n">days</span><span class="x">)])</span>
<span class="n">push!</span><span class="x">(</span><span class="n">predictions</span><span class="x">,</span> <span class="n">original_principle_sum</span><span class="o">*</span><span class="n">mc_price</span><span class="x">)</span>
<span class="n">last_price</span> <span class="o">=</span> <span class="n">mc_price</span>
<span class="k">end</span>
<span class="n">s</span> <span class="o">==</span> <span class="mi">1</span> <span class="o">?</span> <span class="n">predictions_all</span> <span class="o">=</span> <span class="n">predictions</span> <span class="o">:</span> <span class="n">predictions_all</span> <span class="o">=</span> <span class="n">hcat</span><span class="x">(</span><span class="n">predictions_all</span><span class="x">,</span> <span class="n">predictions</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">df_pred</span><span class="x">[</span><span class="o">:</span><span class="n">Age</span><span class="x">],</span> <span class="n">df_pred</span><span class="x">[</span><span class="o">:</span><span class="n">MC_Balance</span><span class="x">],</span> <span class="n">color</span><span class="o">=</span><span class="s">"#B8BFC5"</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Monte Carlo Iteration"</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">show_Q</span>
<span class="c">#Show quantile predictions</span>
<span class="n">df_Q</span> <span class="o">=</span> <span class="n">DataFrame</span><span class="x">(</span><span class="n">Age</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[],</span> <span class="n">Q1</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[],</span> <span class="n">Q5</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[],</span> <span class="n">Q9</span> <span class="o">=</span> <span class="kt">Float64</span><span class="x">[])</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">yrs_to_days</span><span class="x">(</span><span class="n">N</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">df_Q</span><span class="x">,</span> <span class="x">[</span><span class="n">age</span><span class="o">+</span><span class="n">i</span><span class="o">*</span><span class="n">days</span><span class="x">,</span>
<span class="n">quantile</span><span class="x">(</span><span class="n">predictions_all</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">],</span><span class="mf">0.1</span><span class="x">),</span>
<span class="n">quantile</span><span class="x">(</span><span class="n">predictions_all</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">],</span><span class="mf">0.5</span><span class="x">),</span>
<span class="n">quantile</span><span class="x">(</span><span class="n">predictions_all</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">],</span><span class="mf">0.9</span><span class="x">)])</span>
<span class="k">end</span>
<span class="n">plot</span><span class="x">(</span><span class="n">df_Q</span><span class="x">[</span><span class="o">:</span><span class="n">Age</span><span class="x">],</span> <span class="n">df_Q</span><span class="x">[</span><span class="o">:</span><span class="n">Q1</span><span class="x">],</span> <span class="n">color</span><span class="o">=</span><span class="s">"r"</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"10th Percentile"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">df_Q</span><span class="x">[</span><span class="o">:</span><span class="n">Age</span><span class="x">],</span> <span class="n">df_Q</span><span class="x">[</span><span class="o">:</span><span class="n">Q5</span><span class="x">],</span> <span class="n">color</span><span class="o">=</span><span class="s">"b"</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"50th Percentile"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">df_Q</span><span class="x">[</span><span class="o">:</span><span class="n">Age</span><span class="x">],</span> <span class="n">df_Q</span><span class="x">[</span><span class="o">:</span><span class="n">Q9</span><span class="x">],</span> <span class="n">color</span><span class="o">=</span><span class="s">"g"</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"90th Percentile"</span><span class="x">)</span>
<span class="k">else</span>
<span class="n">plot</span><span class="x">(</span><span class="n">df_pred</span><span class="x">[</span><span class="o">:</span><span class="n">Age</span><span class="x">],</span> <span class="n">df_pred</span><span class="x">[</span><span class="o">:</span><span class="n">Deterministic_Balance</span><span class="x">],</span> <span class="n">color</span><span class="o">=</span><span class="s">"b"</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Deterministic Prediction"</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The following code was used to produce a sequence of PNG image files that I later used to create the animated gifs above. I used a free app for the Mac called <a href="https://apps.apple.com/au/app/picgif-lite/id844918735?mt=12">PicGIF lite</a> to generate the final animated gifs.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">PyCall</span>
<span class="nd">@pyimport</span> <span class="n">matplotlib</span><span class="o">.</span><span class="n">animation</span> <span class="n">as</span> <span class="n">anim</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span> <span class="o">=</span> <span class="n">figure</span><span class="x">(</span><span class="n">figsize</span><span class="o">=</span><span class="x">(</span><span class="mi">5</span><span class="x">,</span><span class="mi">4</span><span class="x">))</span>
<span class="k">for</span> <span class="n">rp</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">df</span><span class="x">)</span>
<span class="n">withfig</span><span class="x">(</span><span class="n">fig</span><span class="x">)</span> <span class="k">do</span>
<span class="k">for</span> <span class="n">k</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="mi">20</span>
<span class="n">clf</span><span class="x">()</span>
<span class="n">montecarlo</span><span class="x">(</span><span class="n">rp</span><span class="x">,</span> <span class="mi">20</span><span class="x">,</span> <span class="mi">1</span><span class="x">,</span> <span class="nb">false</span><span class="x">)</span>
<span class="n">savefig</span><span class="x">(</span><span class="s">"rp_"</span> <span class="o">*</span> <span class="n">string</span><span class="x">(</span><span class="n">rp</span><span class="x">)</span> <span class="o">*</span> <span class="s">"_"</span> <span class="o">*</span> <span class="n">string</span><span class="x">(</span><span class="n">k</span><span class="x">),</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s">"tight"</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>By running many simulations (see grey lines below) we can take the mean and quantiles of each day’s simulated balances, and after a while smooth, deterministic-looking prediction curves start to emerge. The area between the green and blue lines can be interpreted as ‘good’ market conditions; the area between the blue and red lines as ‘bad’ market conditions.</p>
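<p>The percentile bands themselves are just row-wise quantiles over the matrix of simulated balances. Here is a minimal sketch of that reduction, assuming predictions_all holds one row per day and one column per simulation, as built inside the montecarlo function:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Statistics   # provides quantile

# predictions_all: days × simulations matrix
q10 = [quantile(predictions_all[i, :], 0.1) for i in 1:size(predictions_all, 1)]
q50 = [quantile(predictions_all[i, :], 0.5) for i in 1:size(predictions_all, 1)]
q90 = [quantile(predictions_all[i, :], 0.9) for i in 1:size(predictions_all, 1)]
</code></pre></div></div>
<p>The more iterations we run, the smoother these three curves become.</p>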
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Terminal command line to zip up the PNG files.</span>
<span class="c"># zip rp.zip rp*</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">montecarlo</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span> <span class="mi">20</span><span class="x">,</span> <span class="mi">100</span><span class="x">,</span> <span class="nb">true</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/output_26_0.png" alt="risk profile monte carlo" /></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">montecarlo</span><span class="x">(</span><span class="mi">2</span><span class="x">,</span> <span class="mi">20</span><span class="x">,</span> <span class="mi">100</span><span class="x">,</span> <span class="nb">true</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/output_27_0.png" alt="risk profile monte carlo" /></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">montecarlo</span><span class="x">(</span><span class="mi">3</span><span class="x">,</span> <span class="mi">20</span><span class="x">,</span> <span class="mi">100</span><span class="x">,</span> <span class="nb">true</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/output_28_0.png" alt="risk profile monte carlo" /></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">montecarlo</span><span class="x">(</span><span class="mi">4</span><span class="x">,</span> <span class="mi">20</span><span class="x">,</span> <span class="mi">100</span><span class="x">,</span> <span class="nb">true</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/output_29_0.png" alt="risk profile monte carlo" /></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">montecarlo</span><span class="x">(</span><span class="mi">5</span><span class="x">,</span> <span class="mi">20</span><span class="x">,</span> <span class="mi">100</span><span class="x">,</span> <span class="nb">true</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/output_30_0.png" alt="risk profile monte carlo" /></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">montecarlo</span><span class="x">(</span><span class="mi">6</span><span class="x">,</span> <span class="mi">20</span><span class="x">,</span> <span class="mi">100</span><span class="x">,</span> <span class="nb">true</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj005/output_31_0.png" alt="risk profile monte carlo" /></p>Nigel AdamsA look at Monte Carlo with Geometric Brownian motion (GBM)Julia Flux Convolutional Neural Network Explained2019-09-01T00:00:00+00:002019-09-01T00:00:00+00:00https://spcman.github.io/getting-to-know-julia/deep-learning/vision/flux-cnn-zoo<p>In this blog post we’ll breakdown the convolutional neural network (CNN) demo given in the <a href="https://github.com/FluxML/model-zoo/blob/master/vision/mnist/conv.jl">Flux Model Zoo</a>. We’ll pay most attention to the CNN model build-up and will skip over some of the data preparation and training code.</p>
<p>The objective is to train a CNN to recognize hand-written digits using the famous MNIST dataset.</p>
<p>If you are new to CNNs I recommend watching all the videos below to obtain the concepts needed to understand this post. Note that some of the videos dive into Keras code, but it’s actually very comparable to Flux.</p>
<p><a href="https://www.youtube.com/watch?v=YRhxdVk_sIs">Convolutional Neural Networks (CNNs) explained</a></p>
<p><a href="https://www.youtube.com/watch?v=qSTv_m-KFk0&t=611s">Zero Padding in Convolutional Neural Networks explained</a></p>
<p><a href="https://www.youtube.com/watch?v=ZjM_XQa5s6s&t=407s">Max Pooling in Convolutional Neural Networks explained</a></p>
<p><a href="https://www.youtube.com/watch?v=U4WB9p6ODjM">Batch Size in a Neural Network explained</a></p>
<p>OK, we’ve got the concepts let’s dive into the Flux example. The first block of code prepares the data for training.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Classifies MNIST digits with a convolutional network.</span>
<span class="c"># Writes out saved model to the file "mnist_conv.bson".</span>
<span class="c"># Demonstrates basic model construction, training, saving,</span>
<span class="c"># conditional early-exit, and learning rate scheduling.</span>
<span class="c">#</span>
<span class="c"># This model, while simple, should hit around 99% test</span>
<span class="c"># accuracy after training for approximately 20 epochs.</span>
<span class="k">using</span> <span class="n">Flux</span><span class="x">,</span> <span class="n">Flux</span><span class="o">.</span><span class="n">Data</span><span class="o">.</span><span class="n">MNIST</span><span class="x">,</span> <span class="n">Statistics</span>
<span class="k">using</span> <span class="n">Flux</span><span class="o">:</span> <span class="n">onehotbatch</span><span class="x">,</span> <span class="n">onecold</span><span class="x">,</span> <span class="n">crossentropy</span><span class="x">,</span> <span class="n">throttle</span>
<span class="k">using</span> <span class="n">Base</span><span class="o">.</span><span class="n">Iterators</span><span class="o">:</span> <span class="n">repeated</span><span class="x">,</span> <span class="n">partition</span>
<span class="k">using</span> <span class="n">Printf</span><span class="x">,</span> <span class="n">BSON</span>
<span class="c"># Load labels and images from Flux.Data.MNIST</span>
<span class="nd">@info</span><span class="x">(</span><span class="s">"Loading data set"</span><span class="x">)</span>
<span class="n">train_labels</span> <span class="o">=</span> <span class="n">MNIST</span><span class="o">.</span><span class="n">labels</span><span class="x">()</span>
<span class="n">train_imgs</span> <span class="o">=</span> <span class="n">MNIST</span><span class="o">.</span><span class="n">images</span><span class="x">()</span>
<span class="c"># Bundle images together with labels and group into minibatchess</span>
<span class="k">function</span><span class="nf"> make_minibatch</span><span class="x">(</span><span class="n">X</span><span class="x">,</span> <span class="n">Y</span><span class="x">,</span> <span class="n">idxs</span><span class="x">)</span>
<span class="n">X_batch</span> <span class="o">=</span> <span class="kt">Array</span><span class="x">{</span><span class="kt">Float32</span><span class="x">}(</span><span class="nb">undef</span><span class="x">,</span> <span class="n">size</span><span class="x">(</span><span class="n">X</span><span class="x">[</span><span class="mi">1</span><span class="x">])</span><span class="o">...</span><span class="x">,</span> <span class="mi">1</span><span class="x">,</span> <span class="n">length</span><span class="x">(</span><span class="n">idxs</span><span class="x">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">idxs</span><span class="x">)</span>
<span class="n">X_batch</span><span class="x">[</span><span class="o">:</span><span class="x">,</span> <span class="o">:</span><span class="x">,</span> <span class="o">:</span><span class="x">,</span> <span class="n">i</span><span class="x">]</span> <span class="o">=</span> <span class="kt">Float32</span><span class="o">.</span><span class="x">(</span><span class="n">X</span><span class="x">[</span><span class="n">idxs</span><span class="x">[</span><span class="n">i</span><span class="x">]])</span>
<span class="k">end</span>
<span class="n">Y_batch</span> <span class="o">=</span> <span class="n">onehotbatch</span><span class="x">(</span><span class="n">Y</span><span class="x">[</span><span class="n">idxs</span><span class="x">],</span> <span class="mi">0</span><span class="o">:</span><span class="mi">9</span><span class="x">)</span>
<span class="k">return</span> <span class="x">(</span><span class="n">X_batch</span><span class="x">,</span> <span class="n">Y_batch</span><span class="x">)</span>
<span class="k">end</span>
<span class="n">batch_size</span> <span class="o">=</span> <span class="mi">128</span>
<span class="n">mb_idxs</span> <span class="o">=</span> <span class="n">partition</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">train_imgs</span><span class="x">),</span> <span class="n">batch_size</span><span class="x">)</span>
<span class="n">train_set</span> <span class="o">=</span> <span class="x">[</span><span class="n">make_minibatch</span><span class="x">(</span><span class="n">train_imgs</span><span class="x">,</span> <span class="n">train_labels</span><span class="x">,</span> <span class="n">i</span><span class="x">)</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="n">mb_idxs</span><span class="x">]</span>
<span class="c"># Prepare test set as one giant minibatch:</span>
<span class="n">test_imgs</span> <span class="o">=</span> <span class="n">MNIST</span><span class="o">.</span><span class="n">images</span><span class="x">(</span><span class="o">:</span><span class="n">test</span><span class="x">)</span>
<span class="n">test_labels</span> <span class="o">=</span> <span class="n">MNIST</span><span class="o">.</span><span class="n">labels</span><span class="x">(</span><span class="o">:</span><span class="n">test</span><span class="x">)</span>
<span class="n">test_set</span> <span class="o">=</span> <span class="n">make_minibatch</span><span class="x">(</span><span class="n">test_imgs</span><span class="x">,</span> <span class="n">test_labels</span><span class="x">,</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">test_imgs</span><span class="x">))</span>
</code></pre></div></div>
<p>Let’s pause here to look at how the training and test data has been arranged. As usual in Flux the training data is arranged as a tuple of x training data and y labels. Let’s verify.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">typeof</span><span class="x">(</span><span class="n">train_set</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array{Tuple{Array{Float32,4},Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}},1}
</code></pre></div></div>
<p>We see that the x part of the tuple is a 4 dimensional Float32 array and the y part is a Flux.OneHotMatrix (a batch of Flux.OneHotVectors).</p>
<p>Let’s take a look at the size of first training batch.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">size</span><span class="x">(</span><span class="n">train_set</span><span class="x">[</span><span class="mi">1</span><span class="x">][</span><span class="mi">1</span><span class="x">])</span> <span class="c"># training data Float32</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(28, 28, 1, 128)
</code></pre></div></div>
<p>It is important to note these dimensions are arranged in WHCN order, standing for Width, Height, Channels and Number, where Number is the number of observations in the minibatch.</p>
<p>So as expected for MNIST, each image is W=28 pixels x H=28 pixels.</p>
<p>C = 1 as there is only one channel for the grey scale intensity.</p>
<p>N = 128 as that is the batch size.</p>
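<p>To get a feel for the WHCN layout without touching Flux, here’s a quick sketch using a plain Julia array as a stand-in for one minibatch (the names are mine, not from the model zoo code):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A toy WHCN batch: 28x28 "images", 1 grey-scale channel, minibatch of 128
batch = rand(Float32, 28, 28, 1, 128)
w, h, c, n = size(batch)
println((w, h, c, n))      # (28, 28, 1, 128)
# Slice out the 5th image of the batch as an ordinary 28x28 matrix
img5 = batch[:, :, 1, 5]
println(size(img5))        # (28, 28)
</code></pre></div></div>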
<p>Now let’s have a look at the size of the first batch of y labels.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">size</span><span class="x">(</span><span class="n">train_set</span><span class="x">[</span><span class="mi">1</span><span class="x">][</span><span class="mi">2</span><span class="x">])</span> <span class="c"># OneHotVector labels</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(10, 128)
</code></pre></div></div>
<p>Each OneHotVector in the batch encodes the labelled digit; i.e. which of the digits 0 through 9 it is. You can see the first OneHotVector in the first batch with the following code.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">train_set</span><span class="x">[</span><span class="mi">1</span><span class="x">][</span><span class="mi">2</span><span class="x">][</span><span class="o">:</span><span class="x">,</span><span class="mi">1</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>10-element Flux.OneHotVector:
false
false
false
false
false
true
false
false
false
false
</code></pre></div></div>
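<p>One-hot encoding itself is nothing magic. Here’s a rough Base-Julia stand-in for what <code class="language-plaintext highlighter-rouge">onehotbatch</code> and <code class="language-plaintext highlighter-rouge">onecold</code> do (the Flux versions are more general and GPU-friendly, so treat this as a sketch):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Encode a label as a Bool vector with a single true; decode with argmax
onehot(label, labels) = labels .== label
onecold(v, labels) = labels[argmax(v)]

labels = 0:9
v = onehot(5, labels)          # true only at position 6, matching the output above
println(onecold(v, labels))    # 5
</code></pre></div></div>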
<h2 id="flux-cnn-model-explained">Flux CNN Model Explained</h2>
<p>Here’s the next block of code from the model zoo that we’re mostly interested in: -</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Define our model. We will use a simple convolutional architecture with</span>
<span class="c"># three iterations of Conv -> ReLU -> MaxPool, followed by a final Dense</span>
<span class="c"># layer that feeds into a softmax probability output.</span>
<span class="nd">@info</span><span class="x">(</span><span class="s">"Constructing model..."</span><span class="x">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Chain</span><span class="x">(</span>
<span class="c"># First convolution, operating upon a 28x28 image</span>
<span class="n">Conv</span><span class="x">((</span><span class="mi">3</span><span class="x">,</span> <span class="mi">3</span><span class="x">),</span> <span class="mi">1</span><span class="o">=></span><span class="mi">16</span><span class="x">,</span> <span class="n">pad</span><span class="o">=</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span><span class="mi">1</span><span class="x">),</span> <span class="n">relu</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">maxpool</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="x">(</span><span class="mi">2</span><span class="x">,</span><span class="mi">2</span><span class="x">)),</span>
<span class="c"># Second convolution, operating upon a 14x14 image</span>
<span class="n">Conv</span><span class="x">((</span><span class="mi">3</span><span class="x">,</span> <span class="mi">3</span><span class="x">),</span> <span class="mi">16</span><span class="o">=></span><span class="mi">32</span><span class="x">,</span> <span class="n">pad</span><span class="o">=</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span><span class="mi">1</span><span class="x">),</span> <span class="n">relu</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">maxpool</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="x">(</span><span class="mi">2</span><span class="x">,</span><span class="mi">2</span><span class="x">)),</span>
<span class="c"># Third convolution, operating upon a 7x7 image</span>
<span class="n">Conv</span><span class="x">((</span><span class="mi">3</span><span class="x">,</span> <span class="mi">3</span><span class="x">),</span> <span class="mi">32</span><span class="o">=></span><span class="mi">32</span><span class="x">,</span> <span class="n">pad</span><span class="o">=</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span><span class="mi">1</span><span class="x">),</span> <span class="n">relu</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">maxpool</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="x">(</span><span class="mi">2</span><span class="x">,</span><span class="mi">2</span><span class="x">)),</span>
<span class="c"># Reshape 3d tensor into a 2d one, at this point it should be (3, 3, 32, N)</span>
<span class="c"># which is where we get the 288 in the `Dense` layer below:</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="o">:</span><span class="x">,</span> <span class="n">size</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="mi">4</span><span class="x">)),</span>
<span class="n">Dense</span><span class="x">(</span><span class="mi">288</span><span class="x">,</span> <span class="mi">10</span><span class="x">),</span>
<span class="c"># Finally, softmax to get nice probabilities</span>
<span class="n">softmax</span><span class="x">,</span>
<span class="x">)</span>
</code></pre></div></div>
<h3 id="layer-1">Layer 1</h3>
<p><code class="language-plaintext highlighter-rouge">Conv((3, 3), 1=>16, pad=(1,1), relu),</code></p>
<p>The first layer can be broken down as follows: -</p>
<p><code class="language-plaintext highlighter-rouge">(3,3)</code> is the convolution filter size (3x3) that will slide over the image detecting new features.</p>
<p><code class="language-plaintext highlighter-rouge">1=>16</code> maps the number of input channels to the number of output channels. The input has 1 channel, recalling that one batch is of size 28x28x1x128. The output is 16, meaning the layer produces 16 feature-map channels for every training digit in the batch.</p>
<p><code class="language-plaintext highlighter-rouge">pad=(1,1)</code> pads a single border of zeros around each image so that the dimensions of the convolution output can remain at 28x28.</p>
<p><code class="language-plaintext highlighter-rouge">relu</code> is our activation function.</p>
<p>The output from this layer only can be viewed with <code class="language-plaintext highlighter-rouge">model[1](train_set[1][1])</code> and has the dimensions 28×28×16×128.</p>
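<p>If you want to check the spatial arithmetic yourself, the usual convolution output formula is out = (in - k + 2·pad) ÷ stride + 1. A throwaway helper (my own, not part of the model zoo code) confirms why the padding keeps the image at 28x28:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Convolution output size: out = (in - k + 2p) ÷ s + 1
conv_out(in_sz, k; pad=0, stride=1) = (in_sz - k + 2pad) ÷ stride + 1
println(conv_out(28, 3, pad=1))   # 28 -- the (1,1) padding preserves the size
println(conv_out(28, 3))          # 26 -- without padding we would lose a border
</code></pre></div></div>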
<h3 id="layer-2">Layer 2</h3>
<p><code class="language-plaintext highlighter-rouge">x -> maxpool(x, (2,2)),</code></p>
<p>Convolutional layers are generally followed by a maxpool layer. In our case the parameter <code class="language-plaintext highlighter-rouge">(2,2)</code> is the window size that slides over x, halving each spatial dimension whilst retaining the strongest activations for learning.</p>
<p>The output up to this layer can be viewed with <code class="language-plaintext highlighter-rouge">model[1:2](train_set[1][1])</code> and has the output dimensions 14×14×16×128.</p>
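<p>To see exactly what the pooling window does, here is a hand-rolled 2x2 maxpool for a single matrix (purely illustrative; Flux’s maxpool operates on the whole 4-d tensor):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Take the maximum of each non-overlapping 2x2 block
function maxpool2x2(x)
    h, w = size(x) .÷ 2
    [maximum(x[2i-1:2i, 2j-1:2j]) for i in 1:h, j in 1:w]
end

A = [1 2 5 6;
     3 4 7 8;
     9 1 2 3;
     1 1 4 4]
println(maxpool2x2(A))   # [4 8; 9 4]
</code></pre></div></div>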
<h3 id="layer-3">Layer 3</h3>
<p><code class="language-plaintext highlighter-rouge">Conv((3, 3), 16=>32, pad=(1,1), relu),</code></p>
<p>This is the second convolution operating on the output from layer 2.</p>
<p><code class="language-plaintext highlighter-rouge">Conv((3, 3),</code> is the same filter size as before.</p>
<p><code class="language-plaintext highlighter-rouge">16=>32</code> This time the input is 16 (from layer 2). The output size of the layer will be 32.</p>
<p>The padding, filter size and activation remains the same as before.</p>
<p>The output up to this layer can be viewed with <code class="language-plaintext highlighter-rouge">model[1:3](train_set[1][1])</code> and has the output dimensions 14×14×32×128.</p>
<h3 id="layer-4">Layer 4</h3>
<p><code class="language-plaintext highlighter-rouge">x -> maxpool(x, (2,2)),</code></p>
<p>Maxpool reduces the dimensionality in half again whilst retaining the most important feature information for learning.</p>
<p>The output up to this layer can be viewed with <code class="language-plaintext highlighter-rouge">model[1:4](train_set[1][1])</code> and has the output dimensions 7×7×32×128.</p>
<h3 id="layers-5--6">Layers 5 & 6</h3>
<p><code class="language-plaintext highlighter-rouge">Conv((3, 3), 32=>32, pad=(1,1), relu),</code></p>
<p><code class="language-plaintext highlighter-rouge">x -> maxpool(x, (2,2)),</code></p>
<p>Perform a final convolution and maxpool. The output from layer 6 is 3×3×32×128.</p>
<h3 id="layer-7">Layer 7</h3>
<p><code class="language-plaintext highlighter-rouge">x -> reshape(x, :, size(x, 4)),</code></p>
<p>The reshape layer effectively flattens the data from 4 dimensions down to 2, making it suitable for the dense layer and training.</p>
<p>The output up to this layer can be viewed with <code class="language-plaintext highlighter-rouge">model[1:7](train_set[1][1])</code> and has the output dimensions 288×128. If you’re wondering where 288 comes from, it is the product of the layer 6 output dimensions; i.e. 3x3x32 = 288.</p>
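<p>You can verify the flattening on a plain array of the same shape, no trained model needed:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Flatten a (3, 3, 32, 128) tensor to (288, 128), one column per image
x = rand(Float32, 3, 3, 32, 128)
flat = reshape(x, :, size(x, 4))
println(size(flat))   # (288, 128), since 3*3*32 = 288
</code></pre></div></div>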
<h3 id="layer-8">Layer 8</h3>
<p><code class="language-plaintext highlighter-rouge">Dense(288, 10),</code></p>
<p>Our final training layer takes the 288 inputs for each image and outputs a size of 10x128 (10 for the 10 digits 0-9).</p>
<h3 id="layer-9">Layer 9</h3>
<p><code class="language-plaintext highlighter-rouge">softmax,</code></p>
<p>Converts the 10 raw outputs into probabilities between 0 and 1; the highest probability is the digit the model predicts.</p>
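<p>For intuition, softmax on a single column is one line of Base Julia; treat this as a sketch, since Flux’s implementation is numerically stabilised:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Exponentiate, then normalise so the column sums to 1
softmax_col(v) = exp.(v) ./ sum(exp.(v))

p = softmax_col([1.0, 2.0, 3.0])
println(argmax(p))   # 3 -- the largest logit gets the largest probability
# sum(p) is 1 up to floating-point rounding
</code></pre></div></div>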
<p>The remainder of the code is pasted below for completeness.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Load model and datasets onto GPU, if enabled</span>
<span class="n">train_set</span> <span class="o">=</span> <span class="n">gpu</span><span class="o">.</span><span class="x">(</span><span class="n">train_set</span><span class="x">)</span>
<span class="n">test_set</span> <span class="o">=</span> <span class="n">gpu</span><span class="o">.</span><span class="x">(</span><span class="n">test_set</span><span class="x">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">gpu</span><span class="x">(</span><span class="n">model</span><span class="x">)</span>
<span class="c"># Make sure our model is nicely precompiled before starting our training loop</span>
<span class="n">model</span><span class="x">(</span><span class="n">train_set</span><span class="x">[</span><span class="mi">1</span><span class="x">][</span><span class="mi">1</span><span class="x">])</span>
<span class="c"># `loss()` calculates the crossentropy loss between our prediction `y_hat`</span>
<span class="c"># (calculated from `model(x)`) and the ground truth `y`. We augment the data</span>
<span class="c"># a bit, adding gaussian random noise to our image to make it more robust.</span>
<span class="k">function</span><span class="nf"> loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="c"># We augment `x` a little bit here, adding in random noise</span>
<span class="n">x_aug</span> <span class="o">=</span> <span class="n">x</span> <span class="o">.+</span> <span class="mf">0.1f0</span><span class="o">*</span><span class="n">gpu</span><span class="x">(</span><span class="n">randn</span><span class="x">(</span><span class="n">eltype</span><span class="x">(</span><span class="n">x</span><span class="x">),</span> <span class="n">size</span><span class="x">(</span><span class="n">x</span><span class="x">)))</span>
<span class="n">y_hat</span> <span class="o">=</span> <span class="n">model</span><span class="x">(</span><span class="n">x_aug</span><span class="x">)</span>
<span class="k">return</span> <span class="n">crossentropy</span><span class="x">(</span><span class="n">y_hat</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="k">end</span>
<span class="n">accuracy</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">mean</span><span class="x">(</span><span class="n">onecold</span><span class="x">(</span><span class="n">model</span><span class="x">(</span><span class="n">x</span><span class="x">))</span> <span class="o">.==</span> <span class="n">onecold</span><span class="x">(</span><span class="n">y</span><span class="x">))</span>
<span class="c"># Train our model with the given training set using the ADAM optimizer and</span>
<span class="c"># printing out performance against the test set as we go.</span>
<span class="n">opt</span> <span class="o">=</span> <span class="n">ADAM</span><span class="x">(</span><span class="mf">0.001</span><span class="x">)</span>
<span class="nd">@info</span><span class="x">(</span><span class="s">"Beginning training loop..."</span><span class="x">)</span>
<span class="n">best_acc</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="n">last_improvement</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">epoch_idx</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="mi">100</span>
<span class="kd">global</span> <span class="n">best_acc</span><span class="x">,</span> <span class="n">last_improvement</span>
<span class="c"># Train for a single epoch</span>
<span class="n">Flux</span><span class="o">.</span><span class="n">train!</span><span class="x">(</span><span class="n">loss</span><span class="x">,</span> <span class="n">params</span><span class="x">(</span><span class="n">model</span><span class="x">),</span> <span class="n">train_set</span><span class="x">,</span> <span class="n">opt</span><span class="x">)</span>
<span class="c"># Calculate accuracy:</span>
<span class="n">acc</span> <span class="o">=</span> <span class="n">accuracy</span><span class="x">(</span><span class="n">test_set</span><span class="o">...</span><span class="x">)</span>
<span class="nd">@info</span><span class="x">(</span><span class="nd">@sprintf</span><span class="x">(</span><span class="s">"[%d]: Test accuracy: %.4f"</span><span class="x">,</span> <span class="n">epoch_idx</span><span class="x">,</span> <span class="n">acc</span><span class="x">))</span>
<span class="c"># If our accuracy is good enough, quit out.</span>
<span class="k">if</span> <span class="n">acc</span> <span class="o">>=</span> <span class="mf">0.999</span>
<span class="nd">@info</span><span class="x">(</span><span class="s">" -> Early-exiting: We reached our target accuracy of 99.9%"</span><span class="x">)</span>
<span class="n">break</span>
<span class="k">end</span>
<span class="c"># If this is the best accuracy we've seen so far, save the model out</span>
<span class="k">if</span> <span class="n">acc</span> <span class="o">>=</span> <span class="n">best_acc</span>
<span class="nd">@info</span><span class="x">(</span><span class="s">" -> New best accuracy! Saving model out to mnist_conv.bson"</span><span class="x">)</span>
<span class="n">BSON</span><span class="o">.</span><span class="nd">@save</span> <span class="s">"mnist_conv.bson"</span> <span class="n">model</span> <span class="n">epoch_idx</span> <span class="n">acc</span>
<span class="n">best_acc</span> <span class="o">=</span> <span class="n">acc</span>
<span class="n">last_improvement</span> <span class="o">=</span> <span class="n">epoch_idx</span>
<span class="k">end</span>
<span class="c"># If we haven't seen improvement in 5 epochs, drop our learning rate:</span>
<span class="k">if</span> <span class="n">epoch_idx</span> <span class="o">-</span> <span class="n">last_improvement</span> <span class="o">>=</span> <span class="mi">5</span> <span class="o">&&</span> <span class="n">opt</span><span class="o">.</span><span class="n">eta</span> <span class="o">></span> <span class="mf">1e-6</span>
<span class="n">opt</span><span class="o">.</span><span class="n">eta</span> <span class="o">/=</span> <span class="mf">10.0</span>
<span class="nd">@warn</span><span class="x">(</span><span class="s">" -> Haven't improved in a while, dropping learning rate to </span><span class="si">$</span><span class="s">(opt.eta)!"</span><span class="x">)</span>
<span class="c"># After dropping learning rate, give it a few epochs to improve</span>
<span class="n">last_improvement</span> <span class="o">=</span> <span class="n">epoch_idx</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">epoch_idx</span> <span class="o">-</span> <span class="n">last_improvement</span> <span class="o">>=</span> <span class="mi">10</span>
<span class="nd">@warn</span><span class="x">(</span><span class="s">" -> We're calling this converged."</span><span class="x">)</span>
<span class="n">break</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
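<p>Stripped of the Flux calls, the learning-rate schedule inside that loop can be sketched on its own; the function name and accuracy numbers here are made up for illustration:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Divide eta by 10 once accuracy has not improved for 5 epochs
function schedule(accs; eta=0.001)
    best, last_improvement = 0.0, 0
    for (epoch, acc) in enumerate(accs)
        if acc > best
            best, last_improvement = acc, epoch
        elseif epoch - last_improvement >= 5
            eta /= 10
            last_improvement = epoch
        end
    end
    return eta
end

# Accuracy stalls after epoch 2, so eta drops once, from 0.001 to about 1.0e-4
println(schedule([0.90, 0.95, 0.95, 0.94, 0.93, 0.94, 0.93, 0.94]))
</code></pre></div></div>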
<p><a href="https://www.linkedin.com/pulse/creating-deep-neural-network-model-learn-handwritten-digits-mike-gold/">Need more help?, try this article by Mike Gold</a></p>Nigel AdamsTaming the CNN vision example in the Flux Model ZooJulia Word Embedding Layer in Flux - Self Trained2019-08-25T00:00:00+00:002019-08-25T00:00:00+00:00https://spcman.github.io/getting-to-know-julia/deep-learning/nlp/flux-embeddings-tutorial-1<p>In this example we take a look at how to use an embedding layer in Julia with Flux. If you need help on what embeddings are check out <a href="https://spcman.github.io/getting-to-know-julia/nlp/word-embeddings/">this page</a> and then return here to see how we can use them as the first layer in a neural network.</p>
<p>The objective for this exercise is to machine learn the sentiment of 10 string arrays. The idea came from this <a href="https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/">tutorial written by Jason Brownlee</a> who used Keras on a similar dataset.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">Languages</span><span class="x">,</span> <span class="n">TextAnalysis</span><span class="x">,</span> <span class="n">Flux</span><span class="x">,</span> <span class="n">PyPlot</span><span class="x">,</span> <span class="n">Statistics</span>
<span class="c">#Display Flux Version</span>
<span class="k">import</span> <span class="n">Pkg</span> <span class="x">;</span> <span class="n">Pkg</span><span class="o">.</span><span class="n">installed</span><span class="x">()[</span><span class="s">"Flux"</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>v"0.7.2"
</code></pre></div></div>
<h2 id="data-preparation">Data Preparation</h2>
<p>The first block of code defines our training ‘documents’ and labels (y).</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Arr</span> <span class="o">=</span> <span class="x">[</span><span class="s">"well done"</span><span class="x">,</span>
<span class="s">"good work"</span><span class="x">,</span>
<span class="s">"great effort"</span><span class="x">,</span>
<span class="s">"nice work"</span><span class="x">,</span>
<span class="s">"excellent"</span><span class="x">,</span>
<span class="s">"weak"</span><span class="x">,</span>
<span class="s">"poor effort"</span><span class="x">,</span>
<span class="s">"not good"</span><span class="x">,</span>
<span class="s">"poor work"</span><span class="x">,</span>
<span class="s">"could have done better"</span><span class="x">]</span>
<span class="c"># positve or negative sentiment to each 'document' string</span>
<span class="n">y</span> <span class="o">=</span> <span class="x">[</span><span class="nb">true</span> <span class="nb">true</span> <span class="nb">true</span> <span class="nb">true</span> <span class="nb">true</span> <span class="nb">false</span> <span class="nb">false</span> <span class="nb">false</span> <span class="nb">false</span> <span class="nb">false</span><span class="x">]</span>
</code></pre></div></div>
<p>Next we set up a dictionary of words used. Each word points to an integer index. To do this the TextAnalysis package was used. If you’re interested in how this works, watch <a href="https://www.youtube.com/watch?v=f7RNuOLDyM8&t=4838s">this video</a>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">docs</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">Arr</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">docs</span><span class="x">,</span> <span class="n">StringDocument</span><span class="x">(</span><span class="n">Arr</span><span class="x">[</span><span class="n">i</span><span class="x">]))</span>
<span class="k">end</span>
<span class="n">crps</span><span class="o">=</span><span class="n">Corpus</span><span class="x">(</span><span class="n">docs</span><span class="x">)</span>
<span class="n">update_lexicon!</span><span class="x">(</span><span class="n">crps</span><span class="x">)</span>
<span class="n">doc_term_matrix</span><span class="o">=</span><span class="n">DocumentTermMatrix</span><span class="x">(</span><span class="n">crps</span><span class="x">)</span>
<span class="n">word_dict</span><span class="o">=</span><span class="n">doc_term_matrix</span><span class="o">.</span><span class="n">column_indices</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Dict{String,Int64} with 14 entries:
"done" => 3
"not" => 10
"excellent" => 5
"have" => 8
"well" => 13
"work" => 14
"nice" => 9
"effort" => 4
"great" => 7
"poor" => 11
"could" => 2
"better" => 1
"good" => 6
"weak" => 12
</code></pre></div></div>
<p>The following function returns the index of the word in the word dictionary, or 0 if the word is not found.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tk_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)</span> <span class="o">=</span> <span class="n">haskey</span><span class="x">(</span><span class="n">word_dict</span><span class="x">,</span> <span class="n">s</span><span class="x">)</span> <span class="o">?</span> <span class="n">word_dict</span><span class="x">[</span><span class="n">s</span><span class="x">]</span> <span class="o">:</span> <span class="mi">0</span>
</code></pre></div></div>
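<p>The same lookup pattern works with any dictionary; a toy example follows (the dictionary and function names are mine, and Base’s <code class="language-plaintext highlighter-rouge">get</code> does the identical job in one call):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy stand-in for the real word_dict built from the corpus
toy_dict = Dict("good" => 6, "work" => 14)
lookup(s) = haskey(toy_dict, s) ? toy_dict[s] : 0
println(lookup("work"))    # 14
println(lookup("julia"))   # 0 -- unknown words map to the padding index
# Equivalent: get(toy_dict, s, 0)
</code></pre></div></div>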
<p>The following function pads each document in the corpus to the same length.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> pad_corpus</span><span class="x">(</span><span class="n">c</span><span class="x">,</span> <span class="n">pad_size</span><span class="x">)</span>
<span class="n">M</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">doc</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">c</span><span class="x">)</span>
<span class="n">tks</span> <span class="o">=</span> <span class="n">tokens</span><span class="x">(</span><span class="n">c</span><span class="x">[</span><span class="n">doc</span><span class="x">])</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">)</span><span class="o">>=</span><span class="n">pad_size</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="x">[</span><span class="n">tk_idx</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">tks</span><span class="x">[</span><span class="mi">1</span><span class="o">:</span><span class="n">pad_size</span><span class="x">]]</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">)</span><span class="o"><</span><span class="n">pad_size</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="n">zeros</span><span class="x">(</span><span class="kt">Int64</span><span class="x">,</span><span class="n">pad_size</span><span class="o">-</span><span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">))</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="n">vcat</span><span class="x">(</span><span class="n">tk_indexes</span><span class="x">,</span> <span class="x">[</span><span class="n">tk_idx</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">tks</span><span class="x">])</span>
<span class="k">end</span>
<span class="n">doc</span><span class="o">==</span><span class="mi">1</span> <span class="o">?</span> <span class="n">M</span><span class="o">=</span><span class="n">tk_indexes</span><span class="err">'</span> <span class="o">:</span> <span class="n">M</span><span class="o">=</span><span class="n">vcat</span><span class="x">(</span><span class="n">M</span><span class="x">,</span> <span class="n">tk_indexes</span><span class="err">'</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">M</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The final step in our data preparation creates a dense matrix in which each number greater than zero is the index of a word. As the maximum document length is 4 (i.e. “could have done better”) we will use a pad size of 4. The matrix is transposed ready for training, so that each column represents one document.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pad_size</span><span class="o">=</span><span class="mi">4</span>
<span class="n">padded_docs</span> <span class="o">=</span> <span class="n">pad_corpus</span><span class="x">(</span><span class="n">crps</span><span class="x">,</span> <span class="n">pad_size</span><span class="x">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">padded_docs</span><span class="err">'</span>
<span class="n">data</span> <span class="o">=</span> <span class="x">[(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)]</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 0 0 0 0 0 0 0 0 0 2
0 0 0 0 0 0 0 0 0 8
13 6 7 9 0 0 11 10 11 3
3 14 4 14 5 12 4 6 14 1
</code></pre></div></div>
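<p>The padding logic boils down to left-padding each index vector with zeros (or truncating it) to a fixed length. A compact stand-in, with <code class="language-plaintext highlighter-rouge">pad_left</code> being my own name rather than anything from this post’s code, reproduces the first column shown above:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Left-pad with zeros to length n, truncating if the document is too long
pad_left(idxs, n) = vcat(zeros(Int, max(n - length(idxs), 0)), idxs[1:min(end, n)])

println(pad_left([13, 3], 4))        # [0, 0, 13, 3]  -- "well done"
println(pad_left([2, 8, 3, 1], 4))   # [2, 8, 3, 1]   -- already full length
</code></pre></div></div>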
<h2 id="flux-embedding-preparation">Flux Embedding Preparation</h2>
<p>Next let’s get ready for the embedding layer. In this example we’ll learn 8 features per word, but for a larger corpus you’ll probably need a higher dimension, perhaps even 300. The vocab size is set to 20, which is higher than the maximum index (14) in our dictionary.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">N</span> <span class="o">=</span> <span class="n">size</span><span class="x">(</span><span class="n">padded_docs</span><span class="x">,</span><span class="mi">1</span><span class="x">)</span> <span class="c">#Number of documents (10)</span>
<span class="n">max_features</span> <span class="o">=</span> <span class="mi">8</span>
<span class="n">vocab_size</span> <span class="o">=</span> <span class="mi">20</span>
</code></pre></div></div>
<p>The next block of code defines a custom struct called EmbeddingLayer with an inner constructor. The layer is initialized with a special random initializer called glorot_normal. Also note the struct must be registered with @Flux.treelike, otherwise Flux will not track and update (learn) the embedding weights during training.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span><span class="nc"> EmbeddingLayer</span>
<span class="n">W</span>
<span class="n">EmbeddingLayer</span><span class="x">(</span><span class="n">mf</span><span class="x">,</span> <span class="n">vs</span><span class="x">)</span> <span class="o">=</span> <span class="n">new</span><span class="x">(</span><span class="n">param</span><span class="x">(</span><span class="n">Flux</span><span class="o">.</span><span class="n">glorot_normal</span><span class="x">(</span><span class="n">mf</span><span class="x">,</span> <span class="n">vs</span><span class="x">)))</span>
<span class="k">end</span>
<span class="nd">@Flux.treelike</span> <span class="n">EmbeddingLayer</span>
<span class="x">(</span><span class="n">m</span><span class="o">::</span><span class="n">EmbeddingLayer</span><span class="x">)(</span><span class="n">x</span><span class="x">)</span> <span class="o">=</span> <span class="n">m</span><span class="o">.</span><span class="n">W</span> <span class="o">*</span> <span class="n">Flux</span><span class="o">.</span><span class="n">onehotbatch</span><span class="x">(</span><span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">pad_size</span><span class="o">*</span><span class="n">N</span><span class="x">),</span> <span class="mi">0</span><span class="o">:</span><span class="n">vocab_size</span><span class="o">-</span><span class="mi">1</span><span class="x">)</span>
</code></pre></div></div>
<h2 id="buliding-the-model-and-training">Building the Model and Training</h2>
<p>The model needs some explanation.</p>
<p><strong>Layer 1.</strong> As x is fed into the model, the first layer’s embedding function matches the words in each document to corresponding word vectors. This is done by rolling all the word vectors one after the other and using onehotbatch to filter out the unwanted words. The output is a 8x40 array.</p>
<p><strong>Layer 2</strong>. Unrolls the vectors into the shape 8x4x10; i.e. 8 features and 10 documents of padded size 4.</p>
<p><strong>Layer 3.</strong> Now that our data is in the shape provided by layer 2 we can sum the word vectors to get an overall ‘meaning’ vector for each document. The output now has the shape 8x1x10.</p>
<p><strong>Layer 4:</strong> Drops an axis so that the shape of x is a size suitable for training. After this step the shape is 8x10.</p>
<p><strong>Layer 5:</strong> is a normal Dense layer with the sigmoid activation function to give us nice probabilities.</p>
<p>If you’d like to see each layer in action I recommend using <code class="language-plaintext highlighter-rouge">m[1](x)</code> to see sample output from the first layer. <code class="language-plaintext highlighter-rouge">m[1:2](x)</code> to see output from the second layer and so on.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">Chain</span><span class="x">(</span><span class="n">EmbeddingLayer</span><span class="x">(</span><span class="n">max_features</span><span class="x">,</span> <span class="n">vocab_size</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">max_features</span><span class="x">,</span> <span class="n">pad_size</span><span class="x">,</span> <span class="n">N</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">sum</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">dims</span><span class="o">=</span><span class="mi">2</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">max_features</span><span class="x">,</span> <span class="n">N</span><span class="x">),</span>
<span class="n">Dense</span><span class="x">(</span><span class="n">max_features</span><span class="x">,</span> <span class="mi">1</span><span class="x">,</span> <span class="n">σ</span><span class="x">)</span>
<span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Chain(EmbeddingLayer(Float32[0.278128 0.111989 … -0.244614 -0.377189; 0.0647178 0.0683725 … -0.112626 -0.434706; … ; 0.397401 0.407925 … 0.438091 0.0588613; -0.361919 -0.114776 … -0.356307 -0.10119] (tracked)), getfield(Main, Symbol("##3#6"))(), getfield(Main, Symbol("##4#7"))(), getfield(Main, Symbol("##5#8"))(), Dense(8, 1, NNlib.σ))
</code></pre></div></div>
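<p>For example, you can sanity-check the shapes described above layer by layer (a small sketch I’ve added here, not part of the original notebook; it assumes the model <code class="language-plaintext highlighter-rouge">m</code> and input <code class="language-plaintext highlighter-rouge">x</code> defined above):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Inspect the output shape after each layer of the Chain
size(m[1](x))    # embedding lookup: (8, 40)
size(m[1:2](x))  # word vectors unrolled per document: (8, 4, 10)
size(m[1:3](x))  # summed 'meaning' vectors: (8, 1, 10)
size(m[1:4](x))  # singleton axis dropped: (8, 10)
size(m(x))       # final probabilities from the Dense layer: (1, 10)
</code></pre></div></div>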
<p>Now let’s initialize some arrays and create a function to calculate accuracy.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss_h</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy_train</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">mean</span><span class="x">(</span><span class="n">x</span> <span class="o">.==</span> <span class="n">y</span><span class="x">)</span>
</code></pre></div></div>
<p>As this is a binary (1 or 0) classification problem we need to use binarycrossentropy to calculate the loss. The optimizer is gradient descent.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">sum</span><span class="x">(</span><span class="n">Flux</span><span class="o">.</span><span class="n">binarycrossentropy</span><span class="o">.</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">),</span> <span class="n">y</span><span class="x">))</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">Flux</span><span class="o">.</span><span class="n">Descent</span><span class="x">(</span><span class="mf">0.01</span><span class="x">)</span>
</code></pre></div></div>
<p>Now train the model for 400 epochs, recording the loss and training accuracy after each epoch.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">epoch</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="mi">400</span>
<span class="n">Flux</span><span class="o">.</span><span class="n">train!</span><span class="x">(</span><span class="n">loss</span><span class="x">,</span> <span class="n">Flux</span><span class="o">.</span><span class="n">params</span><span class="x">(</span><span class="n">m</span><span class="x">),</span> <span class="n">data</span><span class="x">,</span> <span class="n">optimizer</span><span class="x">)</span>
<span class="c">#println(loss(x, y).data, " ", accuracy(m(x).>0.5,y))</span>
<span class="n">push!</span><span class="x">(</span><span class="n">loss_h</span><span class="x">,</span> <span class="n">loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">)</span><span class="o">.></span><span class="mf">0.5</span><span class="x">,</span><span class="n">y</span><span class="x">))</span>
<span class="k">end</span>
<span class="n">println</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">)</span><span class="o">.></span><span class="mf">0.5</span><span class="x">)</span>
<span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">)</span><span class="o">.></span><span class="mf">0.5</span><span class="x">,</span><span class="n">y</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Bool[true false true true true false false false false false]
0.9
</code></pre></div></div>
<p>Outputs over 0.5 are considered positive (true) and our final accuracy is 90%.</p>
<p>The second example is incorrectly scored as false. I think this is because the words “good” and “work” also appear in the negative examples. Next we’ll see what happens using the pre-trained word embeddings.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">figure</span><span class="x">(</span><span class="n">figsize</span><span class="o">=</span><span class="x">(</span><span class="mi">12</span><span class="x">,</span><span class="mi">5</span><span class="x">))</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">121</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Loss"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">loss_h</span><span class="x">)</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">122</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Accuracy"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"train"</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj003/output_26_0.png" alt="loss accuracy" /></p>
<p>Note: some parts of this could be done more elegantly; let me know.</p>Nigel AdamsA simple example of a word embedding layer with Flux (not pre-trained)Julia Word Embedding Layer in Flux - Pre-trained GloVe2019-08-25T00:00:00+00:002019-08-25T00:00:00+00:00https://spcman.github.io/getting-to-know-julia/deep-learning/nlp/flux-embeddings-tutorial-2<p>This example follows on from <a href="https://spcman.github.io/getting-to-know-julia/nlp/flux-embeddings-tutorial-1/">tutorial #1</a> in which we trained our own embedding layer. This time we use pre-trained word vectors (GloVe) instead of learning them. We’ll skip over some of the explanations as this is covered in tutorial #1.</p>
<p>As before, the objective for this exercise is to machine learn the sentiment of 10 string arrays. The idea came from this <a href="https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/">tutorial written by Jason Brownlee</a> who used Keras on a similar dataset.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">Languages</span><span class="x">,</span> <span class="n">TextAnalysis</span><span class="x">,</span> <span class="n">Flux</span><span class="x">,</span> <span class="n">PyPlot</span><span class="x">,</span> <span class="n">Statistics</span>
<span class="c">#Display Flux Version</span>
<span class="k">import</span> <span class="n">Pkg</span> <span class="x">;</span> <span class="n">Pkg</span><span class="o">.</span><span class="n">installed</span><span class="x">()[</span><span class="s">"Flux"</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>v"0.7.2"
</code></pre></div></div>
<h2 id="data-preparation">Data Preparation</h2>
<p>This code block is the same as <a href="https://spcman.github.io/getting-to-know-julia/nlp/flux-embeddings-tutorial-1/">tutorial #1</a>. See this for more explanation.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Arr</span> <span class="o">=</span> <span class="x">[</span><span class="s">"well done"</span><span class="x">,</span>
<span class="s">"good work"</span><span class="x">,</span>
<span class="s">"great effort"</span><span class="x">,</span>
<span class="s">"nice work"</span><span class="x">,</span>
<span class="s">"excellent"</span><span class="x">,</span>
<span class="s">"weak"</span><span class="x">,</span>
<span class="s">"poor effort"</span><span class="x">,</span>
<span class="s">"not good"</span><span class="x">,</span>
<span class="s">"poor work"</span><span class="x">,</span>
<span class="s">"could have done better"</span><span class="x">]</span>
<span class="c"># positve or negative sentiment to each 'document' string</span>
<span class="n">y</span> <span class="o">=</span> <span class="x">[</span><span class="nb">true</span> <span class="nb">true</span> <span class="nb">true</span> <span class="nb">true</span> <span class="nb">true</span> <span class="nb">false</span> <span class="nb">false</span> <span class="nb">false</span> <span class="nb">false</span> <span class="nb">false</span><span class="x">]</span>
<span class="n">docs</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">Arr</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">docs</span><span class="x">,</span> <span class="n">StringDocument</span><span class="x">(</span><span class="n">Arr</span><span class="x">[</span><span class="n">i</span><span class="x">]))</span>
<span class="k">end</span>
<span class="n">crps</span><span class="o">=</span><span class="n">Corpus</span><span class="x">(</span><span class="n">docs</span><span class="x">)</span>
<span class="n">update_lexicon!</span><span class="x">(</span><span class="n">crps</span><span class="x">)</span>
<span class="n">doc_term_matrix</span><span class="o">=</span><span class="n">DocumentTermMatrix</span><span class="x">(</span><span class="n">crps</span><span class="x">)</span>
<span class="n">word_dict</span><span class="o">=</span><span class="n">doc_term_matrix</span><span class="o">.</span><span class="n">column_indices</span>
<span class="n">tk_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)</span> <span class="o">=</span> <span class="n">haskey</span><span class="x">(</span><span class="n">word_dict</span><span class="x">,</span> <span class="n">s</span><span class="x">)</span> <span class="o">?</span> <span class="n">i</span><span class="o">=</span><span class="n">word_dict</span><span class="x">[</span><span class="n">s</span><span class="x">]</span> <span class="o">:</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span>
<span class="k">function</span><span class="nf"> pad_corpus</span><span class="x">(</span><span class="n">c</span><span class="x">,</span> <span class="n">pad_size</span><span class="x">)</span>
<span class="n">M</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">doc</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">c</span><span class="x">)</span>
<span class="n">tks</span> <span class="o">=</span> <span class="n">tokens</span><span class="x">(</span><span class="n">c</span><span class="x">[</span><span class="n">doc</span><span class="x">])</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">)</span><span class="o">>=</span><span class="n">pad_size</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="x">[</span><span class="n">tk_idx</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">tks</span><span class="x">[</span><span class="mi">1</span><span class="o">:</span><span class="n">pad_size</span><span class="x">]]</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">)</span><span class="o"><</span><span class="n">pad_size</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="n">zeros</span><span class="x">(</span><span class="kt">Int64</span><span class="x">,</span><span class="n">pad_size</span><span class="o">-</span><span class="n">length</span><span class="x">(</span><span class="n">tks</span><span class="x">))</span>
<span class="n">tk_indexes</span><span class="o">=</span><span class="n">vcat</span><span class="x">(</span><span class="n">tk_indexes</span><span class="x">,</span> <span class="x">[</span><span class="n">tk_idx</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">tks</span><span class="x">])</span>
<span class="k">end</span>
<span class="n">doc</span><span class="o">==</span><span class="mi">1</span> <span class="o">?</span> <span class="n">M</span><span class="o">=</span><span class="n">tk_indexes</span><span class="err">'</span> <span class="o">:</span> <span class="n">M</span><span class="o">=</span><span class="n">vcat</span><span class="x">(</span><span class="n">M</span><span class="x">,</span> <span class="n">tk_indexes</span><span class="err">'</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">M</span>
<span class="k">end</span>
<span class="n">pad_size</span><span class="o">=</span><span class="mi">4</span>
<span class="n">padded_docs</span> <span class="o">=</span> <span class="n">pad_corpus</span><span class="x">(</span><span class="n">crps</span><span class="x">,</span> <span class="n">pad_size</span><span class="x">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">padded_docs</span><span class="err">'</span>
<span class="n">data</span> <span class="o">=</span> <span class="x">[(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)]</span>
</code></pre></div></div>
<h2 id="flux-embedding-preparation">Flux Embedding Preparation</h2>
<h3 id="load-the-pre-trained-embeddings">Load the pre-trained embeddings</h3>
<p>This function loads the pre-trained GloVe embeddings. Try Embeddings.jl for a better way to do this if you can get it to work.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> load_embeddings</span><span class="x">(</span><span class="n">embedding_file</span><span class="x">)</span>
<span class="kd">local</span> <span class="n">LL</span><span class="x">,</span> <span class="n">indexed_words</span><span class="x">,</span> <span class="n">index</span>
<span class="n">indexed_words</span> <span class="o">=</span> <span class="kt">Vector</span><span class="x">{</span><span class="kt">String</span><span class="x">}()</span>
<span class="n">LL</span> <span class="o">=</span> <span class="kt">Vector</span><span class="x">{</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Float32</span><span class="x">}}()</span>
<span class="n">open</span><span class="x">(</span><span class="n">embedding_file</span><span class="x">)</span> <span class="k">do</span> <span class="n">f</span>
<span class="n">index</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">line</span> <span class="k">in</span> <span class="n">eachline</span><span class="x">(</span><span class="n">f</span><span class="x">)</span>
<span class="n">xs</span> <span class="o">=</span> <span class="n">split</span><span class="x">(</span><span class="n">line</span><span class="x">)</span>
<span class="n">word</span> <span class="o">=</span> <span class="n">xs</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
<span class="n">push!</span><span class="x">(</span><span class="n">indexed_words</span><span class="x">,</span> <span class="n">word</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">LL</span><span class="x">,</span> <span class="n">parse</span><span class="o">.</span><span class="x">(</span><span class="kt">Float32</span><span class="x">,</span> <span class="n">xs</span><span class="x">[</span><span class="mi">2</span><span class="o">:</span><span class="k">end</span><span class="x">]))</span>
<span class="n">index</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">reduce</span><span class="x">(</span><span class="n">hcat</span><span class="x">,</span> <span class="n">LL</span><span class="x">),</span> <span class="n">indexed_words</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We’ll use one of the smaller embedding files (glove.6B.50d.txt) as this problem is trivial. This file can be downloaded <a href="https://nlp.stanford.edu/projects/glove/">from here</a> and must reside in the current working folder.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">embeddings</span><span class="x">,</span> <span class="n">vocab</span> <span class="o">=</span> <span class="n">load_embeddings</span><span class="x">(</span><span class="s">"glove.6B.50d.txt"</span><span class="x">)</span>
<span class="n">embed_size</span><span class="x">,</span> <span class="n">max_features</span> <span class="o">=</span> <span class="n">size</span><span class="x">(</span><span class="n">embeddings</span><span class="x">)</span>
<span class="n">println</span><span class="x">(</span><span class="s">"Loaded embeddings, each word is represented by a vector with </span><span class="si">$</span><span class="s">embed_size features. The vocab size is </span><span class="si">$</span><span class="s">max_features"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Loaded embeddings, each word is represented by a vector with 50 features. The vocab size is 400000
</code></pre></div></div>
<p>This function provides the index of a word in the GloVe embedding.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Function to return the index of the word in the embedding (returns 0 if the word is not found)</span>
<span class="k">function</span><span class="nf"> vec_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)</span>
<span class="n">i</span><span class="o">=</span><span class="n">findfirst</span><span class="x">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="o">==</span><span class="n">s</span><span class="x">,</span> <span class="n">vocab</span><span class="x">)</span>
<span class="n">i</span><span class="o">==</span><span class="nb">nothing</span> <span class="o">?</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span> <span class="o">:</span> <span class="n">i</span>
<span class="k">end</span>
</code></pre></div></div>
<p>This function provides the GloVe word vector of the given word.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wvec</span><span class="x">(</span><span class="n">s</span><span class="x">)</span> <span class="o">=</span> <span class="n">embeddings</span><span class="x">[</span><span class="o">:</span><span class="x">,</span> <span class="n">vec_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)]</span>
<span class="n">wvec</span><span class="x">(</span><span class="s">"done"</span><span class="x">)</span>
</code></pre></div></div>
<p>Here you can see the GloVe vector representation of one of our words “done”.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>50-element Array{Float32,1}:
0.33076
-0.4387
-0.32163
-0.4931
0.10254
-0.0027421
-0.5172
0.024336
-0.12816
0.14349
-0.16691
0.56121
-0.56241
⋮
0.060552
-0.16143
-0.26668
-0.1766
0.01582
0.25528
-0.096739
-0.097282
-0.084483
0.33312
-0.22252
0.74457
</code></pre></div></div>
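<p>As a quick aside (not part of the original notebook), one useful property of pre-trained vectors is that semantically similar words point in similar directions. A hedged sketch, assuming the <code class="language-plaintext highlighter-rouge">wvec</code> helper defined above and words present in the GloVe vocabulary:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using LinearAlgebra

# Cosine similarity between two word vectors
cosine(a, b) = dot(a, b) / (norm(a) * norm(b))

# Words with related sentiment should generally score higher
# than unrelated ones
cosine(wvec("good"), wvec("nice"))  # expect a relatively high value
cosine(wvec("good"), wvec("weak"))  # expect a lower value
</code></pre></div></div>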
<h3 id="embedding-preparation">Embedding Preparation</h3>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">N</span> <span class="o">=</span> <span class="n">size</span><span class="x">(</span><span class="n">padded_docs</span><span class="x">,</span><span class="mi">1</span><span class="x">)</span> <span class="c">#Number of documents (10)</span>
<span class="n">max_features</span> <span class="o">=</span> <span class="mi">50</span>
<span class="n">vocab_size</span> <span class="o">=</span> <span class="mi">20</span>
</code></pre></div></div>
<p>The next block of code initializes a random embedding matrix as per the size of our vocab.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">embedding_matrix</span><span class="o">=</span><span class="n">Flux</span><span class="o">.</span><span class="n">glorot_normal</span><span class="x">(</span><span class="n">max_features</span><span class="x">,</span> <span class="n">vocab_size</span><span class="x">)</span>
</code></pre></div></div>
<p>Now we overwrite the random embedding matrix with our word vectors from GloVe. Each word vector is inserted as a column at the index from word_dict plus 1; the offset of 1 leaves index 0 free to represent the zero-word used for padding.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">term</span> <span class="k">in</span> <span class="n">doc_term_matrix</span><span class="o">.</span><span class="n">terms</span>
<span class="k">if</span> <span class="n">vec_idx</span><span class="x">(</span><span class="n">term</span><span class="x">)</span><span class="o">!=</span><span class="mi">0</span>
<span class="n">embedding_matrix</span><span class="x">[</span><span class="o">:</span><span class="x">,</span><span class="n">word_dict</span><span class="x">[</span><span class="n">term</span><span class="x">]</span><span class="o">+</span><span class="mi">1</span><span class="x">]</span><span class="o">=</span><span class="n">wvec</span><span class="x">(</span><span class="n">term</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
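<p>To confirm the overwrite worked, you can compare a column of the matrix against the GloVe lookup directly (a small check added here, not in the original):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># The column at word_dict["done"] + 1 should now hold
# the GloVe vector for "done"
embedding_matrix[:, word_dict["done"] + 1] == wvec("done")  # true
</code></pre></div></div>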
<h2 id="buliding-the-model-and-training">Building the Model and Training</h2>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">Chain</span><span class="x">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">embedding_matrix</span> <span class="o">*</span> <span class="n">Flux</span><span class="o">.</span><span class="n">onehotbatch</span><span class="x">(</span><span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">pad_size</span><span class="o">*</span><span class="n">N</span><span class="x">),</span> <span class="mi">0</span><span class="o">:</span><span class="n">vocab_size</span><span class="o">-</span><span class="mi">1</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">max_features</span><span class="x">,</span> <span class="n">pad_size</span><span class="x">,</span> <span class="n">N</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">sum</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">dims</span><span class="o">=</span><span class="mi">2</span><span class="x">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">reshape</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">max_features</span><span class="x">,</span> <span class="n">N</span><span class="x">),</span>
<span class="n">Dense</span><span class="x">(</span><span class="n">max_features</span><span class="x">,</span> <span class="mi">1</span><span class="x">,</span> <span class="n">σ</span><span class="x">)</span>
<span class="x">)</span>
</code></pre></div></div>
<p>The model (m) needs some explanation.</p>
<p><strong>Layer 1.</strong> The first layer’s embedding function matches the words in each document to corresponding word vectors. This is done by rolling all the word vectors one after the other and using onehotbatch to filter out the unwanted words. The output is a 50x40 array.</p>
<p><strong>Layer 2</strong>. Unrolls the vectors into the shape 50x4x10; i.e. 50 features and 10 documents of padded size 4.</p>
<p><strong>Layer 3.</strong> Now that our data is in the shape provided by layer 2 we can sum the word vectors to get an overall ‘meaning’ vector for each document. The output now has the shape 50x1x10.</p>
<p><strong>Layer 4:</strong> Drops the axis (1) so that the shape of x is a size suitable for training. After this step the shape is 50x10.</p>
<p><strong>Layer 5:</strong> is a normal Dense layer with the sigmoid activation function to give us nice probabilities.</p>
<p>If you’d like to see each layer in action I recommend using <code class="language-plaintext highlighter-rouge">m[1](x)</code> to see sample output from the first layer. <code class="language-plaintext highlighter-rouge">m[1:2](x)</code> to see output from the second layer and so on.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss_h</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy_train</span><span class="o">=</span><span class="x">[]</span>
<span class="n">accuracy</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">mean</span><span class="x">(</span><span class="n">x</span> <span class="o">.==</span> <span class="n">y</span><span class="x">)</span>
</code></pre></div></div>
<p>As this is a binary (1 or 0) classification problem, we use binarycrossentropy to calculate the loss. The optimizer is plain gradient descent with a learning rate of 0.001.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">sum</span><span class="x">(</span><span class="n">Flux</span><span class="o">.</span><span class="n">binarycrossentropy</span><span class="o">.</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">),</span> <span class="n">y</span><span class="x">))</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">Flux</span><span class="o">.</span><span class="n">Descent</span><span class="x">(</span><span class="mf">0.001</span><span class="x">)</span>
</code></pre></div></div>
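<p>For intuition, binary cross-entropy for a single prediction can be written out by hand. This is a sketch of the per-element quantity that <code class="language-plaintext highlighter-rouge">Flux.binarycrossentropy</code> computes; the loss above sums it over every output:</p>

```julia
# Binary cross-entropy for one prediction ŷ against label y (0 or 1).
# A confident correct prediction gives a small loss; a confident wrong
# prediction is penalised heavily.
bce(ŷ, y) = -(y * log(ŷ) + (1 - y) * log(1 - ŷ))
bce(0.9, 1)   # small loss: -log(0.9)
bce(0.1, 1)   # large loss: -log(0.1)
```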
<p>Train the model</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">epoch</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="mi">400</span>
<span class="n">Flux</span><span class="o">.</span><span class="n">train!</span><span class="x">(</span><span class="n">loss</span><span class="x">,</span> <span class="n">Flux</span><span class="o">.</span><span class="n">params</span><span class="x">(</span><span class="n">m</span><span class="x">),</span> <span class="n">data</span><span class="x">,</span> <span class="n">optimizer</span><span class="x">)</span>
<span class="c">#println(loss(x, y).data, " ", accuracy(m(x).>0.5,y))</span>
<span class="n">push!</span><span class="x">(</span><span class="n">loss_h</span><span class="x">,</span> <span class="n">loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">)</span><span class="o">.></span><span class="mf">0.5</span><span class="x">,</span><span class="n">y</span><span class="x">))</span>
<span class="k">end</span>
<span class="n">println</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">)</span><span class="o">.></span><span class="mf">0.5</span><span class="x">)</span>
<span class="n">accuracy</span><span class="x">(</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">)</span><span class="o">.></span><span class="mf">0.5</span><span class="x">,</span><span class="n">y</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Bool[true true true true true false false false false false]
1.0
</code></pre></div></div>
<p>Outputs over 0.5 are considered positive (true) and our final accuracy this time is 100%.</p>
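<p>The thresholding step can be seen on its own with toy probabilities (made-up values, not the model’s actual outputs):</p>

```julia
using Statistics

# Probabilities above 0.5 count as positive; accuracy is the fraction
# of thresholded predictions that match the labels.
accuracy(x, y) = mean(x .== y)
probs  = [0.93, 0.41, 0.78, 0.12]     # toy model outputs
labels = [true, false, true, false]   # toy ground truth
acc = accuracy(probs .> 0.5, labels)
println(acc)
```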
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">figure</span><span class="x">(</span><span class="n">figsize</span><span class="o">=</span><span class="x">(</span><span class="mi">12</span><span class="x">,</span><span class="mi">5</span><span class="x">))</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">121</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Loss"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">loss_h</span><span class="x">)</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">122</span><span class="x">)</span>
<span class="n">PyPlot</span><span class="o">.</span><span class="n">xlabel</span><span class="x">(</span><span class="s">"Epoch"</span><span class="x">)</span>
<span class="n">ylabel</span><span class="x">(</span><span class="s">"Accuracy"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">accuracy_train</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"train"</span><span class="x">)</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj003/output_40_0.png" alt="loss accuracy" /></p>
<p>Note, I think some parts of this could be done more elegantly, let me know if anything could be improved (I’m still learning too).</p>Nigel AdamsA simple example of a pre-trained word embedding layer (GloVe) with Julia and FluxJulia Word Embedding with Dracula2019-08-05T00:00:00+00:002019-08-05T00:00:00+00:00https://spcman.github.io/getting-to-know-julia/nlp/word-embeddings<p>According to Wikipedia “Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers”.</p>
<p>It is word vectors that make technologies such as speech recognition and machine translation possible. The algorithms to create them come from the likes of Google’s (Word2Vec), Facebook’s (FastText) and Stanford University’s (GloVe). For this notebook we will use a pre-trained embedding file built using GloVe.</p>
<p>The embedding file I used below is <code class="language-plaintext highlighter-rouge">glove.6B.50d.txt</code>. This file can be downloaded from <a href="https://nlp.stanford.edu/projects/glove/">GloVe</a> and needs to be in the current working folder for this example.</p>
<p>The ideas explored below come from a brilliant GitHub Post <a href="https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469">Understanding word vectors
… for, like, actual poets. By Allison Parrish</a>. This was a Python notebook and I have basically re-written part of it in Julia. Very little credit goes to me!</p>
<p>Let’s begin by loading the packages we will need.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">Distances</span><span class="x">,</span> <span class="n">Statistics</span>
<span class="k">using</span> <span class="n">MultivariateStats</span>
<span class="k">using</span> <span class="n">PyPlot</span>
<span class="k">using</span> <span class="n">WordTokenizers</span>
<span class="k">using</span> <span class="n">TextAnalysis</span>
<span class="k">using</span> <span class="n">DelimitedFiles</span>
</code></pre></div></div>
<h2 id="load-the-embeddings">Load the Embeddings</h2>
<p>There is a Julia package, <a href="https://github.com/JuliaText/Embeddings.jl">Embeddings.jl</a>, that can load the embeddings in one or two lines of code, but I couldn’t get the package to install. I figured out the code to load the embeddings by delving into its repository.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> load_embeddings</span><span class="x">(</span><span class="n">embedding_file</span><span class="x">)</span>
<span class="kd">local</span> <span class="n">LL</span><span class="x">,</span> <span class="n">indexed_words</span><span class="x">,</span> <span class="n">index</span>
<span class="n">indexed_words</span> <span class="o">=</span> <span class="kt">Vector</span><span class="x">{</span><span class="kt">String</span><span class="x">}()</span>
<span class="n">LL</span> <span class="o">=</span> <span class="kt">Vector</span><span class="x">{</span><span class="kt">Vector</span><span class="x">{</span><span class="kt">Float32</span><span class="x">}}()</span>
<span class="n">open</span><span class="x">(</span><span class="n">embedding_file</span><span class="x">)</span> <span class="k">do</span> <span class="n">f</span>
<span class="n">index</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">line</span> <span class="k">in</span> <span class="n">eachline</span><span class="x">(</span><span class="n">f</span><span class="x">)</span>
<span class="n">xs</span> <span class="o">=</span> <span class="n">split</span><span class="x">(</span><span class="n">line</span><span class="x">)</span>
<span class="n">word</span> <span class="o">=</span> <span class="n">xs</span><span class="x">[</span><span class="mi">1</span><span class="x">]</span>
<span class="n">push!</span><span class="x">(</span><span class="n">indexed_words</span><span class="x">,</span> <span class="n">word</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">LL</span><span class="x">,</span> <span class="n">parse</span><span class="o">.</span><span class="x">(</span><span class="kt">Float32</span><span class="x">,</span> <span class="n">xs</span><span class="x">[</span><span class="mi">2</span><span class="o">:</span><span class="k">end</span><span class="x">]))</span>
<span class="n">index</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">return</span> <span class="n">reduce</span><span class="x">(</span><span class="n">hcat</span><span class="x">,</span> <span class="n">LL</span><span class="x">),</span> <span class="n">indexed_words</span>
<span class="k">end</span>
</code></pre></div></div>
<p>The function above takes the embeddings filename as input and returns two arrays:</p>
<ul>
<li>
<p><strong>embeddings</strong> – A Float32 array in which each column represents one word as a d-dimensional vector</p>
</li>
<li>
<p><strong>vocab</strong> – A string array of all the words</p>
</li>
</ul>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">embeddings</span><span class="x">,</span> <span class="n">vocab</span> <span class="o">=</span> <span class="n">load_embeddings</span><span class="x">(</span><span class="s">"glove.6B.50d.txt"</span><span class="x">)</span>
<span class="n">vec_size</span><span class="x">,</span> <span class="n">vocab_size</span> <span class="o">=</span> <span class="n">size</span><span class="x">(</span><span class="n">embeddings</span><span class="x">)</span>
<span class="n">println</span><span class="x">(</span><span class="s">"Loaded embeddings, each word is represented by a vector with </span><span class="si">$</span><span class="s">vec_size features. The vocab size is </span><span class="si">$</span><span class="s">vocab_size"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Loaded embeddings, each word is represented by a vector with 50 features. The vocab size is 400000
</code></pre></div></div>
<p>Lost? Don’t worry, hang in there! Let’s see what’s in these arrays by way of some simple functions and examples.</p>
<h2 id="functions-well-need">Functions we’ll need</h2>
<p>The function <code class="language-plaintext highlighter-rouge">vec_idx</code> returns the index position of a given word in the vocab. We can see that “cheese” is the 5796th word.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vec_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)</span> <span class="o">=</span> <span class="n">findfirst</span><span class="x">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="o">==</span><span class="n">s</span><span class="x">,</span> <span class="n">vocab</span><span class="x">)</span>
<span class="n">vec_idx</span><span class="x">(</span><span class="s">"cheese"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>5796
</code></pre></div></div>
<p>The function <code class="language-plaintext highlighter-rouge">vec</code> returns the word vector of the given word. Below is the vector for the word “cheese”.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> vec</span><span class="x">(</span><span class="n">s</span><span class="x">)</span>
<span class="k">if</span> <span class="n">vec_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)</span><span class="o">!=</span><span class="nb">nothing</span>
<span class="n">embeddings</span><span class="x">[</span><span class="o">:</span><span class="x">,</span> <span class="n">vec_idx</span><span class="x">(</span><span class="n">s</span><span class="x">)]</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">vec</span><span class="x">(</span><span class="s">"cheese"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>50-element Array{Float32,1}:
-0.053903
-0.30871
-1.3285
-0.43342
0.31779
1.5224
-0.6965
-0.037086
-0.83784
0.074107
-0.30532
-0.1783
1.2337
⋮
1.9502
-0.53274
1.1359
0.20027
0.02245
-0.39379
1.0609
1.585
0.17889
0.43556
0.68161
0.066202
</code></pre></div></div>
<p>It’s pretty difficult to imagine words in a 50-dimensional space, so let’s have a think about how some word vectors might look in 2 dimensions.</p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj002/word-vectors.png" alt="word vectors" /></p>
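<p>One way to actually get such a 2-D picture from 50-dimensional vectors is to project them onto their first two principal components. Below is a minimal sketch of that projection using only the standard library (an SVD-based PCA; the loaded <code class="language-plaintext highlighter-rouge">MultivariateStats</code> package offers the same via its <code class="language-plaintext highlighter-rouge">PCA</code> model). Random vectors stand in for the real embeddings here:</p>

```julia
using LinearAlgebra, Statistics

# Project d-dimensional word vectors to 2-D via PCA computed with an SVD.
# X has one "word" per column, like the embeddings array in the post,
# but filled with random numbers for this self-contained sketch.
X  = rand(Float32, 50, 100)        # 100 toy words, 50 features each
Xc = X .- mean(X, dims=2)          # centre each feature
U, S, V = svd(Xc)
proj = U[:, 1:2]' * Xc             # 2x100: coordinates ready to scatter-plot
println(size(proj))
```

With the real embeddings matrix in place of <code class="language-plaintext highlighter-rouge">X</code>, each column of <code class="language-plaintext highlighter-rouge">proj</code> gives the (x, y) position of one word in a plot like the one above.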
<p>The words that are closer together have a similar meaning or context, and things get interesting when we measure the distances between word vectors. Let’s define a similarity function using the cosine distance between two word vectors and then test it out (back to 50 dimensions!)</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cosine</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="n">y</span><span class="x">)</span><span class="o">=</span><span class="mi">1</span><span class="o">-</span><span class="n">cosine_dist</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
</code></pre></div></div>
<p>The following cell shows that the cosine similarity between dog and puppy is larger than the similarity between trousers and octopus, thereby demonstrating that the vectors behave how we expect them to.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cosine</span><span class="x">(</span><span class="n">vec</span><span class="x">(</span><span class="s">"dog"</span><span class="x">),</span> <span class="n">vec</span><span class="x">(</span><span class="s">"puppy"</span><span class="x">))</span> <span class="o">></span> <span class="n">cosine</span><span class="x">(</span><span class="n">vec</span><span class="x">(</span><span class="s">"trousers"</span><span class="x">),</span><span class="n">vec</span><span class="x">(</span><span class="s">"octopus"</span><span class="x">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>true
</code></pre></div></div>
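<p>The same similarity can be computed without the Distances package; cosine similarity is just the dot product of the two vectors divided by the product of their lengths. A from-scratch sketch:</p>

```julia
using LinearAlgebra

# Cosine similarity from first principles: 1 for vectors pointing the
# same way, 0 for orthogonal ones. Equivalent to 1 - cosine_dist(x, y).
cos_sim(x, y) = dot(x, y) / (norm(x) * norm(y))
cos_sim([1.0, 0.0], [1.0, 0.0])   # identical direction
cos_sim([1.0, 0.0], [0.0, 1.0])   # orthogonal
```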
<p>Now let’s define a function to give us a list of nearest neighbouring words.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> closest</span><span class="x">(</span><span class="n">v</span><span class="x">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">20</span><span class="x">)</span>
<span class="n">list</span><span class="o">=</span><span class="x">[(</span><span class="n">x</span><span class="x">,</span><span class="n">cosine</span><span class="x">(</span><span class="n">embeddings</span><span class="err">'</span><span class="x">[</span><span class="n">x</span><span class="x">,</span><span class="o">:</span><span class="x">],</span> <span class="n">v</span><span class="x">))</span> <span class="k">for</span> <span class="n">x</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">size</span><span class="x">(</span><span class="n">embeddings</span><span class="x">)[</span><span class="mi">2</span><span class="x">]]</span>
<span class="n">topn_idx</span><span class="o">=</span><span class="n">sort</span><span class="x">(</span><span class="n">list</span><span class="x">,</span> <span class="n">by</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="x">[</span><span class="mi">2</span><span class="x">],</span> <span class="n">rev</span><span class="o">=</span><span class="nb">true</span><span class="x">)[</span><span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">]</span>
<span class="k">return</span> <span class="x">[</span><span class="n">vocab</span><span class="x">[</span><span class="n">a</span><span class="x">]</span> <span class="k">for</span> <span class="x">(</span><span class="n">a</span><span class="x">,</span><span class="n">_</span><span class="x">)</span> <span class="k">in</span> <span class="n">topn_idx</span><span class="x">]</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Testing this out on the word “wine”, we can see that the words returned are all related. It’s pretty remarkable given the word relationships were ‘learned’ and not specified by a human thesaurus word boffin.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">closest</span><span class="x">(</span><span class="n">vec</span><span class="x">(</span><span class="s">"wine"</span><span class="x">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>20-element Array{String,1}:
"wine"
"wines"
"tasting"
"coffee"
"beer"
"champagne"
"drink"
"taste"
"grape"
"drinks"
"beers"
"bottled"
"gourmet"
"blend"
"chocolate"
"tastes"
"dessert"
"flavor"
"fruit"
"cooking"
</code></pre></div></div>
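<p>The ranking inside <code class="language-plaintext highlighter-rouge">closest</code> can be seen on a toy vocabulary. The 2-D vectors below are made up for illustration, not real GloVe entries:</p>

```julia
using LinearAlgebra

# Rank a toy vocabulary by cosine similarity to a query vector,
# mirroring what `closest` does over the full 400,000-word matrix.
cos_sim(x, y) = dot(x, y) / (norm(x) * norm(y))
toy_vocab = ["wine", "beer", "carpet"]
toy_vecs  = [[0.9, 0.1], [0.8, 0.3], [-0.5, 0.9]]   # made-up vectors
query = [1.0, 0.0]
order = sortperm([cos_sim(v, query) for v in toy_vecs], rev=true)
toy_vocab[order]   # most similar word first
```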
<h2 id="math-on-words">Math on words</h2>
<h3 id="water--frozen">Water + Frozen</h3>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">closest</span><span class="x">(</span><span class="n">vec</span><span class="x">(</span><span class="s">"water"</span><span class="x">)</span> <span class="o">+</span> <span class="n">vec</span><span class="x">(</span><span class="s">"frozen"</span><span class="x">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>20-element Array{String,1}:
"water"
"frozen"
"dry"
"dried"
"salt"
"milk"
"oil"
"waste"
"liquid"
"ice"
"freezing"
"covered"
"hot"
"drain"
"food"
"sand"
"sugar"
"soil"
"contaminated"
"cold"
</code></pre></div></div>
<p>Amazingly the list contains ice!</p>
<h3 id="halfway-between-day-and-night">Halfway between Day and Night</h3>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">closest</span><span class="x">(</span><span class="n">mean</span><span class="x">([</span><span class="n">vec</span><span class="x">(</span><span class="s">"day"</span><span class="x">),</span> <span class="n">vec</span><span class="x">(</span><span class="s">"night"</span><span class="x">)]))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>20-element Array{String,1}:
"night"
"day"
"days"
"weekend"
"morning"
"sunday"
"afternoon"
"saturday"
"came"
"week"
"evening"
"coming"
"next"
"on"
"before"
"hours"
"weeks"
"went"
"hour"
"time"
</code></pre></div></div>
<p>The list contains morning and afternoon!</p>
<h3 id="blue-is-to-sky-as-x-is-to-grass">Blue is to Sky as X is to Grass</h3>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">blue_to_sky</span> <span class="o">=</span> <span class="n">vec</span><span class="x">(</span><span class="s">"blue"</span><span class="x">)</span> <span class="o">-</span> <span class="n">vec</span><span class="x">(</span><span class="s">"sky"</span><span class="x">)</span>
<span class="n">closest</span><span class="x">(</span><span class="n">blue_to_sky</span> <span class="o">+</span> <span class="n">vec</span><span class="x">(</span><span class="s">"grass"</span><span class="x">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>20-element Array{String,1}:
"grass"
"green"
"leaf"
"cane"
"bamboo"
"trees"
"grasses"
"tree"
"yellow"
"lawn"
"cotton"
"lawns"
"red"
"pink"
"farm"
"turf"
"vine"
"rubber"
"soft"
"chestnut"
</code></pre></div></div>
<p>Green is there at the top!</p>
<h3 id="man---woman--queen">Man - Woman + Queen</h3>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">closest</span><span class="x">(</span><span class="n">vec</span><span class="x">(</span><span class="s">"man"</span><span class="x">)</span> <span class="o">-</span> <span class="n">vec</span><span class="x">(</span><span class="s">"woman"</span><span class="x">)</span> <span class="o">+</span> <span class="n">vec</span><span class="x">(</span><span class="s">"queen"</span><span class="x">))</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>20-element Array{String,1}:
"queen"
"king"
"prince"
"crown"
"coronation"
"royal"
"knight"
"lord"
"lady"
"ii"
"great"
"majesty"
"honour"
"name"
"palace"
"crowned"
"famous"
"throne"
"dragon"
"named"
</code></pre></div></div>
<p>King = Magic!</p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj002/man-woman-queen-king.png" alt="man woman queen king" /></p>
<h2 id="sentence-similarity-with-dracula">Sentence Similarity with Dracula</h2>
<p>Load the book Dracula by Bram Stoker from this website as plain text - <a href="https://www.gutenberg.org/ebooks/345">https://www.gutenberg.org/ebooks/345</a></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">txt</span> <span class="o">=</span> <span class="n">open</span><span class="x">(</span><span class="s">"pg345.txt"</span><span class="x">)</span> <span class="k">do</span> <span class="n">file</span>
<span class="n">read</span><span class="x">(</span><span class="n">file</span><span class="x">,</span> <span class="kt">String</span><span class="x">)</span>
<span class="k">end</span>
<span class="n">println</span><span class="x">(</span><span class="s">"Loaded Dracula, length=</span><span class="si">$</span><span class="s">(length(txt)) characters"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Loaded Dracula, length=883114 characters
</code></pre></div></div>
<p>The next cell tidies up the book’s text by removing punctuation and line breaks, then splits it into an array of lowercased sentences.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">txt</span> <span class="o">=</span> <span class="n">replace</span><span class="x">(</span><span class="n">txt</span><span class="x">,</span> <span class="n">r</span><span class="s">"</span><span class="se">\n</span><span class="s">|</span><span class="se">\r</span><span class="s">|_|,"</span> <span class="o">=></span> <span class="s">" "</span><span class="x">)</span>
<span class="n">txt</span> <span class="o">=</span> <span class="n">replace</span><span class="x">(</span><span class="n">txt</span><span class="x">,</span> <span class="n">r</span><span class="s">"[</span><span class="se">\"</span><span class="s">*();!]"</span> <span class="o">=></span> <span class="s">""</span><span class="x">)</span>
<span class="n">sd</span><span class="o">=</span><span class="n">StringDocument</span><span class="x">(</span><span class="n">txt</span><span class="x">)</span>
<span class="n">prepare!</span><span class="x">(</span><span class="n">sd</span><span class="x">,</span> <span class="n">strip_whitespace</span><span class="x">)</span>
<span class="n">sentences</span> <span class="o">=</span> <span class="n">split_sentences</span><span class="x">(</span><span class="n">sd</span><span class="o">.</span><span class="n">text</span><span class="x">)</span>
<span class="n">i</span><span class="o">=</span><span class="mi">1</span>
<span class="k">for</span> <span class="n">s</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">sentences</span><span class="x">)</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">split</span><span class="x">(</span><span class="n">sentences</span><span class="x">[</span><span class="n">s</span><span class="x">]))</span><span class="o">></span><span class="mi">3</span>
<span class="n">sentences</span><span class="x">[</span><span class="n">i</span><span class="x">]</span><span class="o">=</span><span class="n">lowercase</span><span class="x">(</span><span class="n">replace</span><span class="x">(</span><span class="n">sentences</span><span class="x">[</span><span class="n">s</span><span class="x">],</span> <span class="s">"."</span><span class="o">=></span><span class="s">""</span><span class="x">))</span>
<span class="n">i</span><span class="o">+=</span><span class="mi">1</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="n">sentences</span><span class="x">[</span><span class="mi">1000</span><span class="o">:</span><span class="mi">1010</span><span class="x">]</span>
</code></pre></div></div>
<p>Output of sentences 1000 to 1010:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>11-element Array{SubString{String},1}:
"he seems absolutely imperturbable"
"i can fancy what a wonderful power he must have over his patients"
"he has a curious habit of looking one straight in the face as if trying to read one's thoughts"
"he tries this on very much with me but i flatter myself he has got a tough nut to crack"
"i know that from my glass"
"do you ever try to read your own face?"
"i do and i can tell you it is not a bad study and gives you more trouble than you can well fancy if you have never tried it"
"he says that i afford him a curious psychological study and i humbly think i do"
"i do not as you know take sufficient interest in dress to be able to describe the new fashions"
"dress is a bore"
"that is slang again but never mind arthur says that every day"
</code></pre></div></div>
<p>This next function <code class="language-plaintext highlighter-rouge">sentvec</code> takes the vector of each word in a sentence and returns the mean vector for the whole sentence (a sentinel vector is returned if none of the sentence’s words are in the vocab).</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> sentvec</span><span class="x">(</span><span class="n">s</span><span class="x">)</span>
<span class="kd">local</span> <span class="n">arr</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">split</span><span class="x">(</span><span class="n">sentences</span><span class="x">[</span><span class="n">s</span><span class="x">])</span>
<span class="k">if</span> <span class="n">vec</span><span class="x">(</span><span class="n">w</span><span class="x">)</span><span class="o">!=</span><span class="nb">nothing</span>
<span class="n">push!</span><span class="x">(</span><span class="n">arr</span><span class="x">,</span> <span class="n">vec</span><span class="x">(</span><span class="n">w</span><span class="x">))</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">if</span> <span class="n">length</span><span class="x">(</span><span class="n">arr</span><span class="x">)</span><span class="o">==</span><span class="mi">0</span>
<span class="n">ones</span><span class="x">(</span><span class="kt">Float32</span><span class="x">,</span> <span class="x">(</span><span class="mi">50</span><span class="x">,</span><span class="mi">1</span><span class="x">))</span><span class="o">*</span><span class="mi">999</span>
<span class="k">else</span>
<span class="n">mean</span><span class="x">(</span><span class="n">arr</span><span class="x">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sentences</span><span class="x">[</span><span class="mi">101</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"there was everywhere a bewildering mass of fruit blossom--apple plum pear cherry and as we drove by i could see the green grass under the trees spangled with the fallen petals"
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sentvec</span><span class="x">(</span><span class="mi">100</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>50-element Array{Float32,1}:
0.3447293
0.39965677
-0.054723457
-0.07291292
0.21394199
0.15642972
-0.49596983
-0.24674776
-0.23787305
-0.4288543
-0.314565
-0.18126178
-0.15339927
⋮
0.08461739
-0.20704514
-0.22955278
-0.011368492
0.03529108
0.057512715
-0.0074529666
0.02252327
0.037329756
-0.52179056
-0.076994695
-0.49725753
</code></pre></div></div>
<p>This function returns the n sentences nearest to an input string; no training is involved, just vector arithmetic on the pre-trained embeddings.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> closest_sent</span><span class="x">(</span><span class="n">input_str</span><span class="x">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">20</span><span class="x">)</span>
<span class="n">mean_vec_input</span><span class="o">=</span><span class="n">mean</span><span class="x">([</span><span class="n">vec</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">split</span><span class="x">(</span><span class="n">input_str</span><span class="x">)])</span>
<span class="n">list</span><span class="o">=</span><span class="x">[(</span><span class="n">x</span><span class="x">,</span><span class="n">cosine</span><span class="x">(</span><span class="n">mean_vec_input</span><span class="x">,</span> <span class="n">sentvec</span><span class="x">(</span><span class="n">x</span><span class="x">)))</span> <span class="k">for</span> <span class="n">x</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">sentences</span><span class="x">)]</span>
<span class="n">topn_idx</span><span class="o">=</span><span class="n">sort</span><span class="x">(</span><span class="n">list</span><span class="x">,</span> <span class="n">by</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="x">[</span><span class="mi">2</span><span class="x">],</span> <span class="n">rev</span><span class="o">=</span><span class="nb">true</span><span class="x">)[</span><span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">]</span>
<span class="k">return</span> <span class="x">[</span><span class="n">sentences</span><span class="x">[</span><span class="n">a</span><span class="x">]</span> <span class="k">for</span> <span class="x">(</span><span class="n">a</span><span class="x">,</span><span class="n">_</span><span class="x">)</span> <span class="k">in</span> <span class="n">topn_idx</span><span class="x">]</span>
<span class="k">end</span>
</code></pre></div></div>
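<p>The <code class="language-plaintext highlighter-rouge">cosine</code> function used here measures the similarity of two vectors by the angle between them. If it is not already provided by an earlier package, a minimal hand-rolled version (an assumption, not necessarily the exact implementation used in this post) is:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using LinearAlgebra   # for dot and norm

# Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones.
cosine(a, b) = dot(a, b) / (norm(a) * norm(b))
</code></pre></div></div>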
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">closest_sent</span><span class="x">(</span><span class="s">"my favorite food is strawberry ice cream"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>20-element Array{String,1}:
"we get hot soup or coffee or tea and off we go"
"there is not even a toilet glass on my table and i had to get the little shaving glass from my bag before i could either shave or brush my hair"
"i had for dinner or rather supper a chicken done up some way with red pepper which was very good but thirsty"
"drink it off like a good child"
"no you don't you couldn't with eyebrows like yours"
"oh yes they like the lotus flower make your trouble forgotten"
"this with some cheese and a salad and a bottle of old tokay of which i had two glasses was my supper"
"but lor' love yer 'art now that the old 'ooman has stuck a chunk of her tea-cake in me an' rinsed me out with her bloomin' old teapot and i've lit hup you may scratch my ears for all you're worth and won't git even a growl out of me"
"i know that from my glass"
"i found my dear one oh so thin and pale and weak-looking"
"And I like it not."
"she has more colour in her cheeks than usual and looks oh so sweet"
"i can go with you now if you like"
"make them get heat and fire and a warm bath"
"i felt in my heart a wicked burning desire that they would kiss me with those red lips"
"give me some water my lips are dry and i shall try to tell you"
"oh what a strange meeting and how it all makes my head whirl round i feel like one in a dream"
"so i said:-- you like life and you want life?"
"i had for breakfast more paprika and a sort of porridge of maize flour which they said was mamaliga and egg-plant stuffed with forcemeat a very excellent dish which they call impletata"
"for a little bit her breast heaved softly and her breath came and went like a tired child's"
</code></pre></div></div>
<p>It’s interesting to see the sentences returned - they are indeed mostly similar.</p>
<p>As the sentence similarity function is slow to run, I precomputed an array holding the sentence vector for every sentence.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">drac_sent_vecs</span><span class="o">=</span><span class="x">[]</span>
<span class="k">for</span> <span class="n">s</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">sentences</span><span class="x">)</span>
<span class="n">push!</span><span class="x">(</span><span class="n">drac_sent_vecs</span><span class="x">,</span><span class="n">sentvec</span><span class="x">(</span><span class="n">s</span><span class="x">))</span>
<span class="k">end</span>
</code></pre></div></div>
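<p>The same loop can also be written as a single comprehension:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># One sentence vector per sentence, in a single expression.
drac_sent_vecs = [sentvec(s) for s in 1:length(sentences)]
</code></pre></div></div>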
<p>Save the data to files (to avoid recomputing the sentence vectors next time).</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">writedlm</span><span class="x">(</span> <span class="s">"drac_sent_vec.csv"</span><span class="x">,</span> <span class="n">drac_sent_vecs</span><span class="x">,</span> <span class="sc">','</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">writedlm</span><span class="x">(</span> <span class="s">"drac_sentences.csv"</span><span class="x">,</span> <span class="n">sentences</span><span class="x">,</span> <span class="sc">','</span><span class="x">)</span>
</code></pre></div></div>
<p>Read the files back in. The sentences are read with <code class="language-plaintext highlighter-rouge">'!'</code> as the delimiter, a character not expected as a separator in this text, so each line comes back as a single string even though it may contain commas.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sentences</span><span class="o">=</span><span class="n">readdlm</span><span class="x">(</span><span class="s">"drac_sentences.csv"</span><span class="x">,</span> <span class="sc">'!'</span><span class="x">,</span> <span class="kt">String</span><span class="x">,</span> <span class="n">header</span><span class="o">=</span><span class="nb">false</span><span class="x">)</span>
<span class="n">drac_sent_vecs</span><span class="o">=</span><span class="n">readdlm</span><span class="x">(</span><span class="s">"drac_sent_vec.csv"</span><span class="x">,</span> <span class="sc">','</span><span class="x">,</span> <span class="kt">Float32</span><span class="x">,</span> <span class="n">header</span><span class="o">=</span><span class="nb">false</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>8093×50 Array{Float32,2}:
0.395886 0.136462 0.0393325 … -0.00172208 -0.094155
0.105341 0.298508 -0.108769 -0.11237 0.108809
0.306499 0.372668 0.0499599 0.011585 -0.0269931
0.439134 0.237768 -0.157471 -0.047655 -0.206138
0.479465 0.0339237 0.0574679 -0.0110334 -0.0810052
0.305005 0.236101 -0.167058 … -0.161612 -0.481633
0.274253 -0.103281 -0.0939105 -0.0443089 -0.0691436
0.454941 0.308015 -0.376682 0.118407 -0.017146
0.280243 0.0355603 -0.371213 -0.054871 0.0895917
0.303624 0.24452 -0.259576 -0.0073874 0.372042
0.292713 0.0700706 -0.128396 … -0.0598984 0.0768687
0.427364 0.0626689 -0.00844564 -0.0528361 0.20124
0.42247 0.139159 -0.134028 -0.109309 -0.322777
⋮ ⋱
0.527544 0.0679754 -0.0678955 -0.0834867 -0.141069
0.274218 -0.120684 -0.176243 0.156214 -0.2699
0.364304 0.277423 0.163191 0.00988463 -0.119377
0.386379 0.203583 0.148782 -6.83427e-5 -0.125681
0.0938667 0.214723 0.586457 … -0.0834033 0.454743
-0.66594 -0.6551 0.92148 -0.42447 -0.058735
0.447467 0.25429 -0.151193 -0.0932182 -0.244452
0.215579 0.135113 0.0431876 -0.307311 -0.121217
0.374962 0.121228 -0.172914 -0.106937 -0.301211
0.194821 -0.0167174 -0.0303678 … 0.0276704 0.168872
0.605342 0.221943 0.21447 -0.143455 0.00104976
999.0 999.0 999.0 999.0 999.0
</code></pre></div></div>
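<p>Both <code class="language-plaintext highlighter-rouge">writedlm</code> and <code class="language-plaintext highlighter-rouge">readdlm</code> come from the standard library's <code class="language-plaintext highlighter-rouge">DelimitedFiles</code> module, so the save/load round trip needs an import (assuming it was not already loaded earlier in the notebook):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using DelimitedFiles

# Round trip: write the vectors out, then read them back as a Float32 matrix.
writedlm("drac_sent_vec.csv", drac_sent_vecs, ',')
reloaded = readdlm("drac_sent_vec.csv", ',', Float32, header=false)
</code></pre></div></div>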
<p>Redefine the sentence similarity function to look up rows of the precomputed array instead of recomputing each sentence vector.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> closest_sent_pretrained</span><span class="x">(</span><span class="n">pretrained_arr</span><span class="x">,</span> <span class="n">input_str</span><span class="x">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">20</span><span class="x">)</span>
<span class="n">mean_vec_input</span><span class="o">=</span><span class="n">mean</span><span class="x">([</span><span class="n">vec</span><span class="x">(</span><span class="n">w</span><span class="x">)</span> <span class="k">for</span> <span class="n">w</span> <span class="k">in</span> <span class="n">split</span><span class="x">(</span><span class="n">input_str</span><span class="x">)])</span>
<span class="n">list</span><span class="o">=</span><span class="x">[(</span><span class="n">x</span><span class="x">,</span><span class="n">cosine</span><span class="x">(</span><span class="n">mean_vec_input</span><span class="x">,</span> <span class="n">pretrained_arr</span><span class="x">[</span><span class="n">x</span><span class="x">,</span><span class="o">:</span><span class="x">]))</span> <span class="k">for</span> <span class="n">x</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">length</span><span class="x">(</span><span class="n">sentences</span><span class="x">)]</span>
<span class="n">topn_idx</span><span class="o">=</span><span class="n">sort</span><span class="x">(</span><span class="n">list</span><span class="x">,</span> <span class="n">by</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="x">[</span><span class="mi">2</span><span class="x">],</span> <span class="n">rev</span><span class="o">=</span><span class="nb">true</span><span class="x">)[</span><span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">]</span>
<span class="k">return</span> <span class="x">[</span><span class="n">sentences</span><span class="x">[</span><span class="n">a</span><span class="x">]</span> <span class="k">for</span> <span class="x">(</span><span class="n">a</span><span class="x">,</span><span class="n">_</span><span class="x">)</span> <span class="k">in</span> <span class="n">topn_idx</span><span class="x">]</span>
<span class="k">end</span>
</code></pre></div></div>
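<p>The speed-up comes from swapping a fresh <code class="language-plaintext highlighter-rouge">sentvec</code> call per sentence (which re-splits and re-averages the words every time) for a simple row lookup in the precomputed matrix. A quick way to see the difference (exact timings will vary by machine):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compare one query with and without the precomputed matrix.
@time closest_sent("i walked into a door")                             # recomputes every sentence vector
@time closest_sent_pretrained(drac_sent_vecs, "i walked into a door")  # row lookups only
</code></pre></div></div>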
<p>Test it out and the results are instant this time.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">closest_sent_pretrained</span><span class="x">(</span><span class="n">drac_sent_vecs</span><span class="x">,</span> <span class="s">"i walked into a door"</span><span class="x">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>20-element Array{String,1}:
"with a glad heart i opened my door and ran down to the hall"
"i held my door open as he went away and watched him go into his room and close the door"
"again a shock: my door was fastened on the outside"
"suddenly he called out:-- look madam mina look look i sprang up and stood beside him on the rock he handed me his glasses and pointed"
"then lucy took me upstairs and showed me a room next her own where a cozy fire was burning"
"i keep the key of our door always fastened to my wrist at night but she gets up and walks about the room and sits at the open window"
"just before twelve o'clock i just took a look round afore turnin' in an' bust me but when i kem opposite to old bersicker's cage i see the rails broken and twisted about and the cage empty"
"if he go through a doorway he must open the door like a mortal"
"i went to the door"
"when i came back i found him walking hurriedly up and down the room his face all ablaze with excitement"
"i came back to my room and threw myself on my knees"
"after a few minutes' staring at nothing jonathan's eyes closed and he went quietly into a sleep with his head on my shoulder"
"every window and door was fastened and locked and i returned baffled to the porch"
"i sat down beside him and took his hand"
"bah with a contemptuous sneer he passed quickly through the door and we heard the rusty bolt creak as he fastened it behind him"
"passing through this he opened another door and motioned me to enter"
"Suddenly he called out:-- Look Madam Mina look look I sprang up and stood beside him on the rock he handed me his glasses and pointed."
"just outside stretched on a mattress lay mr morris wide awake"
"i could see easily for we did not leave the room in darkness she had placed a warning hand over my mouth and now she whispered in my ear:-- hush there is someone in the corridor i got up softly and crossing the room gently opened the door"
"i have to be away till the afternoon so sleep well and dream well with a courteous bow he opened for me himself the door to the octagonal room and i entered my bedroom"
</code></pre></div></div>Nigel AdamsMaths on words, word similarity, sentence similarity ... and Dracula?Up and Running! How the website was created2019-08-04T00:00:00+00:002019-08-04T00:00:00+00:00https://spcman.github.io/getting-to-know-julia/firstpost<p>Hello World!</p>
<p>This static website is hosted for free on <a href="https://help.github.com/en/articles/about-github-pages-and-jekyll">GitHub Pages with Jekyll</a></p>
<p>The posts and pages are markdown files (.md) which makes them quick to compose. The super cool thing is that you can save your Notebooks as markdown files from Jupyter and the output file only requires a small amount of editing to make it look good on the Jekyll website.</p>
<p>Big shout out to Michael Rose for the theme <a href="https://mmistakes.github.io/minimal-mistakes/">Minimal Mistakes</a></p>
<p>To get the math equations looking good I use <a href="https://www.mathjax.org/">MathJax</a> in the markdown.</p>Nigel AdamsHello World! My first post and how I created this siteJulia Flux Simple Regression Model2019-08-04T00:00:00+00:002019-08-04T00:00:00+00:00https://spcman.github.io/getting-to-know-julia/deep-learning/fluxsimple<p>Flux is a Neural Network Machine Learning library for the Julia programming language. Flux may be likened to TensorFlow but it shows potential to be easier as there is no additional ‘graphing’ language layer to learn – it’s just plain Julia.</p>
<p>Let’s get started with a simple example.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">Distributions</span><span class="x">,</span> <span class="n">PyPlot</span><span class="x">,</span> <span class="n">Random</span><span class="x">,</span> <span class="n">Flux</span>
</code></pre></div></div>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Display Flux Version</span>
<span class="k">import</span> <span class="n">Pkg</span> <span class="x">;</span> <span class="n">Pkg</span><span class="o">.</span><span class="n">installed</span><span class="x">()[</span><span class="s">"Flux"</span><span class="x">]</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>v"0.7.2"
</code></pre></div></div>
<p>Generate some data randomly distributed about the polynomial function \(-0.1x^2 + 2x\)</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f</span><span class="x">(</span><span class="n">x</span><span class="x">)</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.1</span><span class="o">*</span><span class="n">x</span><span class="o">^</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span>
<span class="n">Random</span><span class="o">.</span><span class="n">seed!</span><span class="x">(</span><span class="mi">1000</span><span class="x">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">collect</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="mi">10</span><span class="x">)</span>
<span class="n">y</span> <span class="o">=</span> <span class="x">[</span><span class="n">f</span><span class="x">(</span><span class="n">i</span><span class="x">)</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="n">x</span><span class="x">]</span> <span class="o">.+</span> <span class="n">rand</span><span class="x">(</span><span class="n">Normal</span><span class="x">(</span><span class="mi">0</span><span class="x">,</span><span class="mf">0.75</span><span class="x">),</span><span class="mi">10</span><span class="x">)</span>
<span class="c">#Plot f(x) and models using n data points</span>
<span class="n">n</span><span class="o">=</span><span class="mi">100</span>
<span class="n">x_rng</span><span class="o">=</span><span class="kt">LinRange</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span> <span class="mi">10</span><span class="x">,</span> <span class="n">n</span><span class="x">)</span>
<span class="n">figure</span><span class="x">(</span><span class="n">figsize</span><span class="o">=</span><span class="x">(</span><span class="mi">3</span><span class="x">,</span><span class="mi">3</span><span class="x">))</span>
<span class="n">scatter</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="n">y</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">x_rng</span><span class="x">,</span><span class="n">f</span><span class="o">.</span><span class="x">(</span><span class="n">x_rng</span><span class="x">),</span> <span class="n">color</span><span class="o">=</span><span class="s">"gray"</span><span class="x">)</span>
<span class="n">show</span><span class="x">()</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj001/output_4_0.png" alt="output" /></p>
<p>The Julia function below takes our ‘random’ data \(x, y\) as inputs and returns one of two trained Flux models. The goal is to predict a fit close to the known polynomial f(x).</p>
<p><strong>Model 1</strong> is the most trivial, with a single dense layer; i.e. \(y = σ.(W * x .+ b)\)</p>
<p><strong>Model 2</strong> has 1 hidden layer with a definable amount of neurons for experimentation</p>
<p>Training is done with the gradient descent optimiser.</p>
<p>NOTE: σ = identity, i.e. the identity activation function (no non-linearity), which is what we want for regression.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> train_model</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">,</span> <span class="n">hl_neurons</span><span class="o">=</span><span class="mi">0</span><span class="x">)</span>
<span class="c"># x must be an `in` × N matrix</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="err">'</span>
<span class="c"># Create data iterator for 1000 epochs</span>
<span class="n">data_iterator</span> <span class="o">=</span> <span class="n">Iterators</span><span class="o">.</span><span class="n">repeated</span><span class="x">((</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">),</span> <span class="mi">1000</span><span class="x">)</span>
<span class="c"># Set-up model layout</span>
<span class="k">if</span> <span class="n">hl_neurons</span><span class="o">==</span><span class="mi">0</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">Chain</span><span class="x">(</span><span class="n">Dense</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span><span class="mi">1</span><span class="x">),</span> <span class="n">identity</span><span class="x">)</span>
<span class="k">else</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">Chain</span><span class="x">(</span><span class="n">Dense</span><span class="x">(</span><span class="mi">1</span><span class="x">,</span> <span class="n">hl_neurons</span><span class="x">,</span> <span class="n">tanh</span><span class="x">),</span>
<span class="n">Dense</span><span class="x">(</span><span class="n">hl_neurons</span><span class="x">,</span> <span class="mi">1</span><span class="x">,</span> <span class="n">identity</span><span class="x">))</span>
<span class="k">end</span>
<span class="c">#Our loss function to minimize</span>
<span class="n">loss</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span> <span class="o">=</span> <span class="n">sum</span><span class="x">((</span><span class="n">m</span><span class="x">(</span><span class="n">x</span><span class="x">)</span> <span class="o">.-</span> <span class="n">y</span><span class="err">'</span><span class="x">)</span><span class="o">.^</span><span class="mi">2</span><span class="x">)</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">Flux</span><span class="o">.</span><span class="n">Descent</span><span class="x">(</span><span class="mf">0.0001</span><span class="x">)</span>
<span class="n">Flux</span><span class="o">.</span><span class="n">train!</span><span class="x">(</span><span class="n">loss</span><span class="x">,</span> <span class="n">Flux</span><span class="o">.</span><span class="n">params</span><span class="x">(</span><span class="n">m</span><span class="x">),</span> <span class="n">data_iterator</span><span class="x">,</span> <span class="n">optimizer</span><span class="x">)</span>
<span class="k">return</span> <span class="n">m</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Make predictions and plot against our source data. Note, in the example I included 10 neurons.</p>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj001/nn_1_10_1.png" alt="Neural Network 1-10-1" /></p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="o">=</span><span class="n">train_model</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="n">y_linear</span><span class="o">=</span><span class="n">reshape</span><span class="x">(</span><span class="n">model</span><span class="x">(</span><span class="n">x</span><span class="err">'</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">,</span> <span class="n">length</span><span class="x">(</span><span class="n">x</span><span class="x">),)</span>
<span class="n">model</span><span class="o">=</span><span class="n">train_model</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">,</span> <span class="mi">10</span><span class="x">)</span>
<span class="n">y_hid</span><span class="o">=</span><span class="n">reshape</span><span class="x">(</span><span class="n">model</span><span class="x">(</span><span class="n">x_rng</span><span class="err">'</span><span class="x">)</span><span class="o">.</span><span class="n">data</span><span class="x">,</span> <span class="n">n</span><span class="x">,)</span>
<span class="n">figure</span><span class="x">(</span><span class="n">figsize</span><span class="o">=</span><span class="x">(</span><span class="mi">12</span><span class="x">,</span><span class="mi">5</span><span class="x">))</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">121</span><span class="x">)</span>
<span class="n">scatter</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="n">y</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">x_rng</span><span class="x">,</span><span class="n">f</span><span class="o">.</span><span class="x">(</span><span class="n">x_rng</span><span class="x">),</span> <span class="n">color</span><span class="o">=</span><span class="s">"gray"</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Source Polynomial f(x)"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="n">y_linear</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Predictions using Dense Layer Model"</span><span class="x">)</span>
<span class="n">legend</span><span class="x">()</span>
<span class="n">subplot</span><span class="x">(</span><span class="mi">122</span><span class="x">)</span>
<span class="n">scatter</span><span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="n">y</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">x_rng</span><span class="x">,</span><span class="n">f</span><span class="o">.</span><span class="x">(</span><span class="n">x_rng</span><span class="x">),</span> <span class="n">color</span><span class="o">=</span><span class="s">"gray"</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Source Polynomial f(x)"</span><span class="x">)</span>
<span class="n">plot</span><span class="x">(</span><span class="n">x_rng</span><span class="x">,</span><span class="n">y_hid</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Predictions using Hidden Layer Model"</span><span class="x">)</span>
<span class="n">legend</span><span class="x">()</span>
<span class="n">show</span><span class="x">()</span>
</code></pre></div></div>
<p><img src="https://spcman.github.io/getting-to-know-julia/images/proj001/output_8_0.png" alt="output" /></p>
<p>The introduction of the hidden layer approximates our function well! By the universal approximation theorem, a neural network with a single hidden layer can approximate any continuous function, given enough neurons. I might put this to the test another day.</p>
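<p>As a sanity check on the hidden-layer model's size, the parameter count of the 1-10-1 network can be worked out by hand (a sketch, assuming the 10-neuron configuration used above):</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Chain(Dense(1,10,tanh), Dense(10,1,identity)):
# the hidden layer has a 10x1 weight matrix and 10 biases,
# the output layer a 1x10 weight matrix and 1 bias.
n_params = (1*10 + 10) + (10*1 + 1)   # 31 trainable parameters
</code></pre></div></div>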
<p>The trained parameters of the model can be obtained with <code class="language-plaintext highlighter-rouge">Flux.params(model)</code>. For the 10-neuron model this gives four parameter arrays: a 10×1 weight matrix and 10 biases for the hidden layer, then a 1×10 weight matrix and a single bias for the output layer. These trained weights cannot be mapped back onto the original polynomial coefficients of f(x).</p>Nigel AdamsOverkill - but a simple introduction to Flux