buleft.blogg.se

Python list of dictionaries artist and album









> for token in sorted(set(stemmed_tokens)):


> AmericanGangster_corpus = create_corpus(AmericanGangster_wordlist, )

Building a concordance gets us into the area of elementary information retrieval (IR): think of a basic search engine. So why do we even need to "normalize" terms? We want to match variant spellings of U.S.A. Also, when we enter roll, we would like to match Roll and rolling. That is, we reduce each term to its base/stem/root form. For example, automate(s), automatic, and automation are all reduced to automat. Most stemmers are pretty basic and just chop off standard affixes indicating things like tense (e.g., "-ed") and possessive forms (e.g., "-'s"). Here, we'll use the most popular English-language stemmer, the Porter stemmer, which comes with NLTK. Once our tokens are stemmed, we can rest easy knowing that roll, Rolling, and Rolls will all stem to roll.
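To make the idea of normalizing and affix chopping concrete, here is a minimal plain-Python sketch. It is a toy stand-in, not the Porter stemmer: the suffix list, the helper names normalize and toy_stem, and the sample tokens are all assumptions made up for illustration.

```python
# Hypothetical, deliberately tiny suffix list. The Porter stemmer applies a
# much larger, ordered set of rules; this only illustrates the idea.
SUFFIXES = ("ing", "ed", "'s", "s")


def normalize(token):
    """Lowercase a token and drop periods, so 'U.S.A.' becomes 'usa'."""
    return token.lower().replace(".", "")


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    word = normalize(token)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Illustrative tokens only.
tokens = ["roll", "Rolling", "Rolls", "U.S.A.", "USA"]
stemmed_tokens = [toy_stem(t) for t in tokens]

for token in sorted(set(stemmed_tokens)):
    print(token)
```

With these sample tokens the loop prints roll and usa: the three surface forms of "roll" collapse into one entry, and so do the two spellings of U.S.A. A real stemmer handles far more cases than this suffix list does.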


Thus, it will count words like turnover, alley-oop, and so on. All this is enabled by NLTK's built-in Conditional Frequency Distribution. You can read more about it here.

From the plot we see that the basketball term "roll" seems to be used extensively in the song Party Life. Let's take a closer look at this phenomenon and determine whether "roll" was used in the "basketball" sense of the term. To do this, we need to see the context in which it was used. The first thing I want to do is create a corpus that contains only words from the American Gangster album.

> AmericanGangster_wordlist = PlaintextCorpusReader(corpus_root, 'JayZ_American Gangster_.*')
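Seeing a word "in context" can be sketched in plain Python as a small keyword-in-context listing. NLTK offers its own concordance view through the Text class; the function below is only a stand-in, and its name, the width parameter, and the sample tokens are assumptions for illustration (the tokens are placeholders, not actual lyrics).

```python
def concordance(words, target, width=3):
    """Return (left, keyword, right) for each case-insensitive match."""
    hits = []
    for i, word in enumerate(words):
        if word.lower() == target.lower():
            # Up to `width` words on each side of the match.
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits


# Placeholder tokens, not actual lyrics.
sample_words = "we roll up to the court then Roll the ball in".split()

for left, keyword, right in concordance(sample_words, "roll"):
    print(f"{left:>20} [{keyword}] {right}")
```

Reading the neighbours of each hit is what lets us judge whether a given "roll" is meant in the basketball sense or not.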


In this section, let's investigate the use of basketball language in Jay Z's lyrics. The file IDs include entries such as 'JayZ_American Gangster_Hello Brooklyn 20.txt' and 'JayZ_American Gangster_Ignorant Shit.txt'. Go ahead and save all the album titles by typing in Albums = wordlist.fileids(). Notice that the text JayZ_ appears before each of the filenames. To get this prefix out of the filename, we slice off the first five characters of fileid.

So let's do a simple analysis of the occurrence of the concept "basketball" in Jay Z's lyrics, as represented by a list of 40 terms that are common when we talk about basketball.

basketball_bag_of_words = ['bounce','crossover','technical',

Let's reduce our investigation of this concept to just the American Gangster album, which is the first 14 songs in the corpus. Remember that Albums is just a list, so we can slice it down to its first 14 entries. The following code converts each word in the lyrics to lowercase using w.lower(), then checks whether it starts with any of the "targets", that is, each of the words in basketball_bag_of_words, using startswith().
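The counting step described here can be sketched without NLTK, using a Counter per song as a rough stand-in for a conditional frequency distribution. Several things below are assumptions: the term list holds only the three terms visible in the text, the tokens are placeholders rather than real lyrics, words_by_fileid is a hypothetical structure, and fileid[5:] is one plausible way to drop the five-character JayZ_ prefix.

```python
from collections import Counter, defaultdict

# Only the three terms visible in the text; the full list is not reproduced.
basketball_bag_of_words = ["bounce", "crossover", "technical"]

# Placeholder tokens keyed by file ID (not real lyrics).
words_by_fileid = {
    "JayZ_American Gangster_Ignorant Shit.txt": ["Bounce", "with", "me", "bounced"],
    "JayZ_American Gangster_Hello Brooklyn 20.txt": ["hello", "Crossover", "technicality"],
}

counts = defaultdict(Counter)  # song title -> Counter of matched targets
for fileid, words in words_by_fileid.items():
    title = fileid[5:]  # assumption: drop the five-character "JayZ_" prefix
    for w in words:
        for target in basketball_bag_of_words:
            # Lowercase first so "Bounce" and "bounced" both match "bounce".
            if w.lower().startswith(target):
                counts[title][target] += 1

for title, counter in counts.items():
    print(title, dict(counter))
```

Because the test is a prefix match, "technicality" is counted under "technical". That is the intended behaviour here, but it also means unrelated words sharing a prefix would be counted too.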


Remember that we had created a wordlist of type PlaintextCorpusReader. This uses the data structure of a nested list. A PlaintextCorpusReader is made up of a list of fileids, each of which is made up of a list of paragraphs, each of which is in turn made up of a list of sentences, each of which is in turn made up of a list of words.

fileids(): list of (list of (list of (list of str)))

['JayZ_American Gangster_American Dreamin.txt', 'JayZ_American Gangster_American Gangster.txt',
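The nesting described above (files, then paragraphs, then sentences, then words) can be mimicked with ordinary Python lists. This is only a sketch of the shape of the data: the file name and the words are placeholders, and the dict named corpus is a hypothetical stand-in for the reader, which exposes similar views through its paras(), sents(), and words() methods.

```python
# Placeholder corpus: file ID -> paragraphs -> sentences -> words.
corpus = {
    "file_a.txt": [
        [["first", "sentence"], ["second", "one"]],  # paragraph 1, two sentences
        [["new", "paragraph"]],                      # paragraph 2, one sentence
    ],
}

paras = corpus["file_a.txt"]                        # list of paragraphs
sents = [sent for para in paras for sent in para]   # flatten one level
words = [word for sent in sents for word in sent]   # flatten again

print(len(paras), len(sents), len(words))
print(words)
```

Each flattening step removes one level of nesting, which is why the same text can be viewed as paragraphs, as sentences, or as a flat list of words.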


This section deals with creating gainful insights. This is where the "arts" come into the sciences. All the code for this section can be found in this file.








