ML / AI Task | Notes | Notebook | Universal Interface / API |
---|---|---|---|
Sentiment Analysis | * Sentiment Analysis - Positive, Neutral, Negative, (Very Positive, Very Negative) * Reputation management * Report sentiment statistics on Twitter for competitors and yourself * Customer support sentiment * Stock price prediction | Pipeline_Sentiment_Analysis.ipynb | CODE > from transformers import pipeline > classifier = pipeline("sentiment-analysis") > classifier("This is such a great movie!") OUTPUT [{'label': 'POSITIVE', 'score': 0.9998759031295776}] |
Text Generation | * Autoregressive language models * Language is a "time series" (i.e. a sequence) of categorical objects * An autoregressive language model is one where we find the conditional distribution of the next word given past words (see the next-token sketch after the table) * Transformers (attention mechanism) have been a key technology (long-range dependencies) * Use cases: we've used them to generate poetry * Use cases: help writing emails / creative writing * Use cases: GitHub Copilot (can generate working code from a text prompt) * Use cases: build full website designs (actual code), compose music, answer medical queries | Pipeline_Text_Generation.ipynb | CODE > !wget -nc https://raw.githubusercontent.com/lazyprogrammer/machine_learning_examples/master/hmm_class/robert_frost.txt > from transformers import pipeline, set_seed > lines = [line.rstrip() for line in open('robert_frost.txt')] > gen = pipeline("text-generation") > gen(lines[0]) OUTPUT [{'generated_text': 'Two roads diverged in a yellow wood, which they had left behind a few yards from where they had cut from. At the end of the road stood a tall red pole and, just out of view, the white-lipped man could see'}] |
Masked Language Modeling | * Example - article spinning (create content with keywords that match users' queries) * Article spinning idea: change enough words (while keeping the article coherent) such that it doesn't match the original (see the fill-mask spinning sketch after the table) * Article spinning is a black-hat SEO technique | Pipeline_Masked_Language_Modeling.ipynb | CODE > !wget -nc https://lazyprogrammer.me/course_files/nlp/bbc_text_cls.csv > from transformers import pipeline > mlm = pipeline('fill-mask') > mlm('Bombardier chief to leave <mask>') OUTPUT [{'score': 0.06950818747282028, 'sequence': 'Bombardier chief to leave job', 'token': 633, 'token_str': ' job'}, {'score': 0.06693071871995926, 'sequence': 'Bombardier chief to leave France', 'token': 1470, 'token_str': ' France'}, {'score': 0.052735257893800735, 'sequence': 'Bombardier chief to leave office', 'token': 558, 'token_str': ' office'}, {'score': 0.025823095813393593, 'sequence': 'Bombardier chief to leave Paris', 'token': 2201, 'token_str': ' Paris'}, {'score': 0.021368568763136864, 'sequence': 'Bombardier chief to leave Canada', 'token': 896, 'token_str': ' Canada'}] |
Named Entity Recognition | * Named entity recognition (NER) allows us to identify (i.e. tag) all the people, places, and companies in a document | Pipeline_NER.ipynb | CODE > from transformers import pipeline > ner = pipeline("ner", aggregation_strategy='simple', device=0) > inputs[9] > from nltk.tokenize.treebank import TreebankWordDetokenizer > detokenizer = TreebankWordDetokenizer() > ner(detokenizer.detokenize(inputs[9])) OUTPUT NER Input = ['He', 'was', 'well', 'backed', 'by', 'England', 'hopeful', 'Mark', 'Butcher', 'who', 'made', '70', 'as', 'Surrey', 'closed', 'on', '429', 'for', 'seven', ',', 'a', 'lead', 'of', '234', '.'] NER Result = [{'entity_group': 'LOC', 'score': 0.99967515, 'word': 'England', 'start': 22, 'end': 29}, {'entity_group': 'PER', 'score': 0.99974275, 'word': 'Mark Butcher', 'start': 38, 'end': 50}, {'entity_group': 'ORG', 'score': 0.9996264, 'word': 'Surrey', 'start': 66, 'end': 72}] |
Text Summarization | * We already do this all the time! - scientific paper abstracts, executive summaries * Paraphrase / summarize * Summarization is a way for learning systems to demonstrate understanding of a concept * Extractive vs. abstractive (see the toy extractive sketch after the table) * Extractive summaries consist of text taken from the original document * Abstractive summaries can contain novel sequences of text not necessarily taken from the input | Pipeline_Text_Summarization.ipynb | CODE > from transformers import pipeline > summarizer = pipeline("summarization") > summarizer(doc.iloc[0].split("\n", 1)[1]) OUTPUT [{'summary_text': ' Retail sales dropped by 1% on the month in December, after a 0.6% rise in November . Clothing retailers and non-specialist stores were the worst hit with only internet retailers showing any significant growth . The last time retailers endured a tougher Christmas was 23 years ago, when sales plunged 1.7% .'}] |
Neural Machine Translation | * Convert phrases from one language to another * Sequence-to-sequence task * Many valid translations exist for the same input * BLEU score is the most popular metric (see the BLEU sketch after the table) | Pipeline_Neural_Machine_Translation.ipynb | CODE > from transformers import pipeline > translator = pipeline("translation", model='Helsinki-NLP/opus-mt-en-es', device=0) > translator("I like eggs and ham") OUTPUT [{'translation_text': 'Me gustan los huevos y el jamón.'}] |
Question Answering | * SQuAD [Stanford Question Answering Dataset] * It is an extractive question answering dataset * The answer is contained in the input, and the model simply "extracts" the portion which makes up the answer * Input format: [CLS] question tokens [SEP] context tokens (see the tokenizer sketch after the table) | Pipeline_Question_Answering.ipynb | CODE > from transformers import pipeline > qa = pipeline("question-answering") > context = "Today I went to the store to purchase a carton of milk." > question = "What did I buy?" > qa(context=context, question=question) OUTPUT {'answer': 'a carton of milk', 'end': 54, 'score': 0.5626223683357239, 'start': 38} |
Zero-Shot Classification | * Classification without labels | Pipeline_Zero_Shot_Classification.ipynb | CODE > from transformers import pipeline > classifier = pipeline("zero-shot-classification", device=0) > text = "Due to the presence of isoforms of its components, there are 12 " + \ "versions of AMPK in mammals, each of which can have different tissue " + \ "localizations, and different functions under different conditions. " + \ "AMPK is regulated allosterically and by post-translational " + \ "modification, which work together." > classifier(text, candidate_labels=["biology", "math", "geology"]) OUTPUT {'labels': ['biology', 'math', 'geology'], 'scores': [0.8908600807189941, 0.06606598943471909, 0.04307396709918976], 'sequence': 'Due to the presence of isoforms of its components, there are 12 versions of AMPK in mammals, each of which can have different tissue localizations, and different functions under different conditions. AMPK is regulated allosterically and by post-translational modification, which work together.'} |
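
The Text Generation row describes an autoregressive model as one that gives the conditional distribution of the next word given past words. A minimal sketch of that idea, assuming the Hugging Face `gpt2` checkpoint (the notebook above does not pin a model, so this choice is an assumption; any causal LM works the same way):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encode a prefix; the model scores p(next token | all tokens so far)
inputs = tokenizer("Two roads diverged in a yellow", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The distribution at the last position is the next-token distribution
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p.item():.4f}")
```

Sampling from this distribution repeatedly, feeding each sample back in as input, is exactly what the `text-generation` pipeline automates.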
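The Masked Language Modeling row sketches article spinning: mask one word at a time and substitute a plausible replacement. A toy version under assumptions (the model name, the sentence, and the choice of which word to mask are illustrative, not from the notes):

```python
from transformers import pipeline

mlm = pipeline("fill-mask", model="distilroberta-base")

sentence = "Bombardier chief to leave company after poor results"
words = sentence.split()
i = 5  # position of "after" - chosen arbitrarily for the demo

# Replace one word with the model's mask token, then ask for candidates
masked = " ".join(words[:i] + [mlm.tokenizer.mask_token] + words[i + 1:])
for candidate in mlm(masked):
    token = candidate["token_str"].strip()
    if token != words[i]:  # keep only genuine substitutions
        print(candidate["sequence"], candidate["score"])
```

A real spinner would loop over many positions and accept substitutions only above some probability threshold, to keep the article coherent while making it diverge from the original.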
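The Text Summarization row contrasts extractive with abstractive summaries; the pipeline in that row is abstractive. For contrast, a toy extractive summarizer that scores sentences by average word frequency and returns the top sentence verbatim (an illustrative technique, not the course's method):

```python
from collections import Counter

def extractive_summary(text, n_sentences=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Score each sentence by the average corpus frequency of its words
    freqs = Counter(w.lower() for s in sentences for w in s.split())
    def score(s):
        words = s.split()
        return sum(freqs[w.lower()] for w in words) / len(words)
    ranked = sorted(sentences, key=score, reverse=True)
    return ". ".join(ranked[:n_sentences]) + "."

doc = ("Retail sales dropped by 1% on the month in December. "
       "Clothing retailers were the worst hit. "
       "Only internet retailers showed any significant growth.")
print(extractive_summary(doc))  # returns one of the input sentences verbatim
```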
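The NMT row names BLEU as the most popular metric. A minimal sketch using NLTK's implementation; the reference translations here are made up for illustration, and BLEU rewards n-gram overlap against any of the references since many translations are valid:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Multiple acceptable references for the same source sentence
references = [
    "me gustan los huevos y el jamón".split(),
    "me encantan los huevos y el jamón".split(),
]
hypothesis = "me gustan los huevos y el jamón".split()

smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
print(sentence_bleu(references, hypothesis, smoothing_function=smooth))
```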
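The Question Answering row gives the extractive-QA input format: [CLS] question tokens [SEP] context tokens. A quick check of that layout, assuming a BERT tokenizer (`bert-base-uncased` is an assumption here; the pipeline's default QA model packs sentence pairs the same way):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
question = "What did I buy?"
context = "Today I went to the store to purchase a carton of milk."

# Passing the pair packs both into one sequence: [CLS] question [SEP] context [SEP]
enc = tokenizer(question, context)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'what', 'did', 'i', 'buy', '?', '[SEP]', 'today', ..., '[SEP]']
```

The model then predicts a start and an end position over the context tokens, which is why the pipeline's answer comes back with `start` and `end` character offsets.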