Lecture 11 Learning from Text -- Alec Radford

Learning From Text: Language Models and More. April 15th 2020 - Berkeley - alec@openai.com
Standard supervised learning requires “machine learning grade” data. There is not a lot of “machine learning grade” data (compared to what current models need). This lecture focuses on a variety of methods for learning from natural language in order to improve the performance of models on standard NLP datasets/tasks. Learning From Text
Autoregressive maximum likelihood language modeling will be the core. But, there are many proxy tasks involving predicting / modeling text somehow, someway that work well (sometimes even better than standard LMs!) Word2Vec / Paragraph2Vec Contrastive Predictive Coding (CPC) BERT ELECTRA A Variety of Methods
Motivation and Intro
A Wild Internet Appears
How to use it? Let’s try word-word co-occurrences

        water   steam   ice     hot
water   32879   ...     ...     ...
steam   250     324     ...     ...
ice     765     23      859     ...
hot     19540   1832    17      48323
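As a rough illustration of where such a table comes from, here is a minimal sketch (the toy corpus and window size are made up) of counting co-occurrences within a fixed window:

from collections import Counter

# Toy corpus; the real matrix would be built from billions of tokens.
corpus = "ice is cold water . steam is hot water . hot water makes steam".split()

window = 2  # symmetric context window
cooc = Counter()
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            cooc[(w, corpus[j])] += 1

print(cooc[("hot", "water")])  # co-occurrence count for the pair (hot, water)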
Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions (Clark et al 2016) How good is counting a bunch of stuff?
It’s still huge! 1 million words x 1 million words x 4 byte int32 = 4 terabytes Want to come up with a much more compact, but faithful representation of the relations between words and the information they represent. Problems working with word-word co-occurrence matrix
Take the matrix X counting word-word co-occurrences (cheap, so do it for 840B tokens!) So entry X_ij would be the count of word i occurring in a context with word j. Learn low-dimensional vector representations of each word such that their dot product = log prob of co-occurring. Goes from MxM to MxN where N is the dimensionality of the word vectors (300 << 1,000,000!) GloVe (Pennington et al 2014)
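To make the objective concrete, here is a rough numpy sketch of the basic idea (fit dot products to log counts with squared error); the real GloVe recipe also adds per-word bias terms and a weighting function over counts, which are omitted here:

import numpy as np

rng = np.random.default_rng(0)
V, N = 1000, 50                         # toy vocab size and vector dimensionality
X = rng.integers(1, 100, (V, V))        # stand-in co-occurrence counts (kept > 0)

W = 0.01 * rng.standard_normal((V, N))  # word vectors
C = 0.01 * rng.standard_normal((V, N))  # context vectors

lr = 1e-2
for step in range(200):
    err = W @ C.T - np.log(X)           # want w_i . c_j ~= log X_ij
    grad_W = 2.0 * err @ C / err.size
    grad_C = 2.0 * err.T @ W / err.size
    W -= lr * grad_W
    C -= lr * grad_C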
Word2Vec
Usefulness of Word Vectors [McCann et al 2017]
Language is a lot more than just counts of words! It has a ton of structure on top of / in addition to words. Context is very important and a fixed static representation of a word is insufficient. 1. I went to the river bank. 2. I made a withdrawal from the bank. 3. “I wouldn’t bank on it” Problems with word vectors
Great, so I’ve got a 1,000,000 x 300 matrix ... now what? How to use it is up to the practitioner. Often involves a lot of task specific models slapped on top. Learning just word vectors is like learning just edge detectors in computer vision. Problems with word vectors
Intro to Language Models
70 years of samples [From Oriol Vinyals’ twitter]
Interpret language as a high-dimensional discrete data distribution to be modeled. Observe a bunch of strings of language and learn a function that can compute the probability of new ones: p(“Is it going to rain today?”) Statistical / Probabilistic Language Modeling
p(The cat sat on the mat.) = ??? What does it mean to compute the probability of a string?
p(The cat sat on the mat.) = ??? Noam Chomsky in 1969: “But it must be recognized that the notion of ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.” What does it mean to compute the probability of a string?
p(The cat sat on the mat.) > p(The cat sats on the mat.) [grammar] Should p(The cat sats on the mat.) be 0? p(The hyena sat on the mat.) < p(The cat sat on the mat.) [world knowledge] Should p(“4” | “2 + 2 =”) be 1? p(“1 star out of 5” | “That movie was terrible! I’d rate it”) [sentiment analysis] How can you use the probability of a string?
Speech Recognition and Machine Translation are supervised tasks Speech Recognition = (audio_1, transcript_1) (audio_2, transcript_2) (audio_3, transcript_3) Machine Translation = (french_1, english_1) (french_2, english_2) (french_3, english_3) A major promise of language modeling is to leverage a bunch of “uncurated” text to help with these problems. How can you use the probability of a string?
Speech Recognition: prune the space of possible transcriptions from an acoustic model (famous example: “wreck a nice beach” vs “recognize speech”) Machine Translation: re-rank possible translations, or integrate directly with the decoder How can you use the probability of a string?
First, maybe do some preprocessing (like lower-casing) “THe CaT SAt oN ThE MAT.” -> “the cat sat on the mat.” How to compute the probability of a string?
Often we’ll set a maximum # of words (or minimum frequency) for computational reasons so: “the cat sat on the countertop.” -> “the cat sat on the <UNK>.” How to compute the probability of a string?
A tokenizer takes a string as input and returns a sequence of tokens: “the cat sat on the mat.” -> [the, cat, sat, on, the, mat, .] [the, cat, sat, on, the, mat, .] -> [23, 1924, 742, 101, 23, 3946, 7] How to compute the probability of a string?
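A minimal sketch of this word-level tokenize-then-index step (the vocabulary and ids below are made up, not the ones on the slide):

text = "the cat sat on the mat."

# Crude word-level tokenization: split on whitespace, treating the final period as its own token.
tokens = text.replace(".", " .").split()   # ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']

# Build a toy vocabulary mapping each token to an integer id, then look the tokens up.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]
print(tokens, ids)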
A tokenizer takes a string as input and returns a sequence of tokens: “the cat sat on the mat.” -> [t, h, e, “ “, c, a, t, “ “, s, a, t, “ “, ...] How to compute the probability of a string?
Character level (throw out non-ascii) Byte level (work on UTF-8 byte stream) Unicode symbols / codepoints Tokenized / pre-processed word level Byte Pair Encoding (Sennrich 2016) All the different ways to dice a string! Example BPE merges:
t h -> th
i n -> in
e d -> ed
a n -> an
th e -> the
o u -> ou
e r -> er
in g -> ing
t o -> to
h e -> he
an d -> and
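For Byte Pair Encoding specifically, here is a compressed sketch of the merge-learning loop, using toy word counts similar to the example in Sennrich et al. 2016 (the end-of-word marker is omitted); a real vocabulary is learned with many thousands of merges over a large corpus:

from collections import Counter

# Word frequencies, with each word pre-split into symbols.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}

def merge_step(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    best = max(pairs, key=pairs.get)
    # Replace every occurrence of the most frequent pair with a single merged symbol.
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                out.append(word[i] + word[i + 1]); i += 2
            else:
                out.append(word[i]); i += 1
        merged[tuple(out)] = freq
    return merged, best

for _ in range(5):
    vocab, best = merge_step(vocab)
    print("merged", best)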
1. Assume a uniform prior over tokens 2. Assume all tokens are independent p(t_0) = 1/vocab size p(t_0, t_1, t_2, t_3) = product of p(t_i) for all i How to compute the probability of a string?
1. Assume a uniform prior over tokens 2. Assume all tokens are independent Estimate the probability of a token by counting its occurrences and normalize this count by the total number of tokens seen. p(t_0, t_1, t_2, t_3, …) = p(t_0)p(t_1)p(t_2)p(t_3)… This is a unigram language model How to compute the probability of a string?
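A minimal count-based version of this unigram model over a toy token stream (note that any unseen token would get probability zero, which is the problem smoothing addresses below):

from collections import Counter
import math

tokens = "the cat sat on the mat . the dog sat on the rug .".split()
counts = Counter(tokens)
total = len(tokens)

def unigram_logprob(sequence):
    # log p(t_0, t_1, ...) = sum_i log p(t_i), with p(t) = count(t) / total
    return sum(math.log(counts[t] / total) for t in sequence)

print(unigram_logprob("the cat sat on the rug .".split()))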
1. Assume a uniform prior over tokens 2. Assume all tokens are independent Estimate the probability of a token conditioned on the previous token by counting how many times it co-occurs with that previous token and normalize this count by the total number of occurrences of that context. p(t_0, t_1, t_2, t_3, …) = p(t_0)p(t_1 | t_0)p(t_2 | t_1)p(t_3 | t_2)… This is a bigram language model How to compute the probability of a string?
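And the corresponding bigram version, where each token is conditioned on the previous one (again a toy sketch; unseen bigrams would still get probability zero):

from collections import Counter
import math

tokens = "the cat sat on the mat . the dog sat on the rug .".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def bigram_logprob(seq):
    # log p(t_0) + sum_i log p(t_i | t_{i-1}), with p(b | a) = count(a, b) / count(a)
    logp = math.log(unigrams[seq[0]] / len(tokens))
    for a, b in zip(seq, seq[1:]):
        logp += math.log(bigrams[(a, b)] / unigrams[a])
    return logp

print(bigram_logprob("the cat sat on the rug .".split()))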
p(self-attention) = 0 = infinite loss… p(self-attention | the cool thing about) = 0 = infinite loss... Generalization?
p(self-attention) = 0 = infinite loss… p(self-attention | the cool thing about) = 0 = infinite loss... Smooth things out by using a mixture model: p_mixture(t_1) = 0.01 * p_uniform(t_1) + 0.99 * p_unigram(t_1) Smoothing
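A sketch of that mixture, using the 0.01 / 0.99 weights from the slide: interpolating with a uniform distribution guarantees every token in the vocabulary gets nonzero probability, so the loss stays finite.

from collections import Counter

tokens = "the cat sat on the mat .".split()
counts = Counter(tokens)
vocab = {"the", "cat", "sat", "on", "mat", ".", "self-attention"}   # includes an unseen word

def p_mixture(t, lam=0.01):
    p_uniform = 1.0 / len(vocab)
    p_unigram = counts[t] / len(tokens)        # 0 for unseen tokens
    return lam * p_uniform + (1 - lam) * p_unigram

print(p_mixture("the"))              # dominated by the count-based estimate
print(p_mixture("self-attention"))   # small but nonzero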
Language model research in the 80s and 90s focused a lot on how to better estimate, smooth, and interpolate n-gram language models Smoothing
Probabilities are often within rounding error of zero. (Language is a huge space!) They are also a function of the length of the string. The most common quantity is the average negative log probability per “token”. Character level LMs use base 2 and report bits per character (can also be per byte). Word level LMs exponentiate and report perplexity. Evaluation Type 1
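A short sketch of how these quantities relate, assuming we already have the per-token probabilities a model assigned to some held-out text (the numbers are made up):

import math

probs = [0.2, 0.05, 0.5, 0.1, 0.25]   # model probability of each observed token

nll_nats = -sum(math.log(p) for p in probs) / len(probs)   # average negative log prob (nats)
bits_per_token = nll_nats / math.log(2)                    # base 2: bits per character/byte/token
perplexity = math.exp(nll_nats)                            # word-level convention

print(nll_nats, bits_per_token, perplexity)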
Working with abstract #s like these can be difficult What’s 1.23 BPC vs 1.21 BPC? (especially important when you just spent 3 months of your life on it!) These quantities are dataset dependent (it’s really easy to guess all 0s - really hard to guess the arXiv) Random guessing gets you -log2(1/256) = 8 bits per character Current human estimates range from ~0.6-1.3 BPC. Best models are now a little lower than 1 BPC so probably closer to 0.6. Grounding bits per character and perplexity
Random guessing PPL is just the vocab size, so with a vocab of 50K = 50K PPL. One way of thinking about perplexity is as a “branching factor of language”: PPL^n = space of possible generations of length n. A model can get 10 PPL by uniformly assigning probability across 10 equally likely next words (and always having the correct word within these top 10). Human level is probably between 5 and 10 from the BPC estimate. Translation is a well constrained space and the best models reach low single-digit PPL. Grounding bits per character and perplexity
There are a lot of ways to use language models. You can evaluate them based on their usefulness for a downstream task. Improve: WER for speech recognition, BLEU for translation, F1 for POS tagging, accuracy for document classification. This is an increasingly common evaluation setting. Evaluation Type 2
History of Neural Language Models
So many things! A neural net Skip connections Learn distributed representation of words Large scale asynchronous SGD A Neural Probabilistic Language Model (Bengio et al. 2003)
Replace MLP with RNN (allows for unbounded context) Showed improvements on speech recognition Recurrent neural network based language model (Mikolov et al 2010)
Character level RNN Approximates a tensor RNN which has a different set of weights for every input character Very complicated optimization scheme Ms . Claire Parters will also have a history temple for him to raise jobs until naked Prodiena to paint baseball partners , provided people to ride both of Manhattan in 1978 , but what was largely directed to China in 1946 , focusing on the trademark period is the sailboat yesterday and comments on whom they obtain overheard within the 120th anniversary , where many civil rights defined , officials said early that forms , ” said Bernard J. Marco Jr. of Pennsylvania , was monitoring New York (not actually a lot better than word level n-gram models) Generating Text with Recurrent Neural Networks (Sutskever et al 2011)
Generating Sequences with Recurrent Neural Networks (Graves 2013)
Proposed using an RNN sequence encoder trained to provide context to an LM as a sentence level text feature extractor. Skip-Thought Vectors (Kiros et al 2015)
Proposes finetuning an LM directly for downstream tasks 1. Use LM objective as a pre-training task 2. Then initialize the parameters of downstream model with LM weights 3. Then train like a normal supervised model Semi-supervised Sequence Learning (Dai et al 2015)
A larger dataset 1BW (Chelba et al 2013) An 8K projection LSTM (Sak et al 2014) Character aware (Kim et al 2015) A large vocab - 800K words Approximate with sampled softmax 32 K40s for 3 weeks 41.0 -> 23.7 perplexity Exploring The Limits of Language Modeling (Jozefowicz et al. 2016)
Was one of the first neural language models (I’m aware of) to generally have ~coherent non-trivial sentences. “With even more new technologies coming onto the market quickly during the past three years , an increasing number of companies now must tackle the ever-changing and ever-changing environmental challenges online .” Exploring The Limits of Language Modeling (Jozefowicz et al. 2016)
There’s a whole internet out there Soooooooooo much information A perfect language model would need to fit the internet into its parameters. This suggests we’re going to need a lot of parameters, compute, and data to get as close to this as possible. Why scale?
This is what a very small charRNN learns: “ Als gambrantr ’s o thkergtre akld teno … tie He Cule a , ssot Goshulan n blne t , to hered arerorinner rrk f . , ate Banat ” The best architecture in the world is useless without capacity. Even classic resources like WordNet are larger than many models trained today. (5.5M relational features and the package is 55MB on disk!) Ungrounded language learning is grotesquely inefficient. How to make peace with this? For now, address it with scale? Why scale?
- Deep Learning Scaling is Predictable, Empirically [Hestness et al. 2017] - GPipe: Efficient Training of Giant Neural Networks [Huang et al. 2018] - AI and Compute [Amodei and Hernandez 2018] These trends have been consistent across many orders of magnitude Why scale?
Maybe data is the bottleneck! Make the dataset bigger -> 80 million product reviews (40 GB of text) 4096 unit byte level mLSTM - 1 month - 4 Pascal Titan X GPUs Model ended up just underfitting by a lot But learned what sentiment is Learning To Generate Reviews and Discovering Sentiment (Radford et al. 2017)
Maybe hidden state size is the bottleneck! Make an LSTM with a much larger state size -> 18,432 units Slightly more efficient than a dense model with the same # of parameters Also better performance on sentiment analysis (when evaluated by a linear model) GPU Kernels for Block-Sparse Weights (Gray et al 2017)
LM pre-training for sentiment analysis [figure; annotation: “Small World LSTM is here”]
Story Cloze Task: UW NLP System (Schwartz et al 2017)
Maybe parameter count is the bottleneck! Make a model with as many parameters as possible -> 137 Billion More efficient than equivalent-compute dense models And a lot of very impressive systems work The Sparsely-Gated Mixture-of-Experts Layer (Shazeer et al 2017)
Replace word vectors with a learned weighted sum of features of deep bi-directional LM Improves baseline models to SOTA Uses the LM from (Jozefowicz et al. 2016) Extends benefits of LMs to a much wider variety of tasks Deep contextualized word representations (Peters et al 2018)
Deep contextualized word representations (Peters et al 2018) [Diagram: each token’s word representation feeds forward and backward LSTM layer 1 and layer 2 states, which are combined into a contextualized representation]
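A rough numpy sketch of the combination step the diagram describes: concatenate forward and backward states per layer, then take a learned softmax-weighted sum across layers (all shapes, weights, and values here are illustrative stand-ins):

import numpy as np

T, d, L = 6, 4, 3    # tokens, per-direction state size, number of layers
rng = np.random.default_rng(0)

# Pretend per-layer representations: forward and backward states concatenated.
layers = [np.concatenate([rng.standard_normal((T, d)),
                          rng.standard_normal((T, d))], axis=-1) for _ in range(L)]

s = np.array([0.2, 1.0, -0.5])        # learned scalar weight per layer
w = np.exp(s) / np.exp(s).sum()       # softmax over layers
gamma = 1.0                           # learned global scale

contextualized = gamma * sum(wi * hi for wi, hi in zip(w, layers))   # shape (T, 2*d)
print(contextualized.shape)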
Transformer based LM 12 self-attention blocks - 12 heads - 768 dim state ~100M params Trained on 7,000 books ~ 5 GB of text (BookCorpus Zhu et al 2015) Fine-tune on supervised tasks (like Dai et al. 2015) Removes the need for task specific architectures Improving Language Understanding by Generative Pre-Training (GPT-1)
Improving Language Understanding by Generative Pre-Training (GPT-1)
A digression into Transformers
the cat sat on
Query: what you want to look for
Key: what you can compare to
Value: information you can retrieve
the cat sat on -> retrieved: “the cat”
Query: what you want to look for
Key: what you can compare to
Value: information you can retrieve
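A minimal numpy sketch of the query / key / value machinery these slides describe, with a causal mask so each position can only attend to earlier ones (the shapes and random projection matrices are toy stand-ins, not the Transformer's full multi-head block):

import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    # x: (T, d) token representations; Wq, Wk, Wv: (d, d) projections.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[-1])            # compare queries to keys
    mask = np.triu(np.ones_like(scores), k=1) == 1     # hide future positions
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: where to look
    return weights @ v                                 # retrieve a mix of values

T, d = 5, 8   # e.g. a 5-token prefix
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))
out = causal_self_attention(x, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)   # (5, 8)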
[Vaswani et al 2017]
Beyond standard LMs
A lot of Improvements! MultiNLI Premise: Hills and mountains are especially sanctified in Jainism. Hypothesis: Jainism hates nature. Label: Contradiction CoLA Sentence: The wagon rumbled down the road. Label: Acceptable Sentence: The car honked down the road. Label: Unacceptable
A lot of Improvements! More on this later! MultiNLI Premise: Hills and mountains are especially sanctified in Jainism. Hypothesis: Jainism hates nature. Label: Contradiction CoLA Sentence: The wagon rumbled down the road. Label: Acceptable Sentence: The car honked down the road. Label: Unacceptable
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al 2018) Left-Right LM: The cat sat on the [mask] -> The cat sat on the mat Right-Left LM: [mask] cat sat on the mat -> The cat sat on the mat Masked LM: The [mask] sat on the [mask] -> The cat sat on the mat
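A toy sketch of how masked-LM training pairs are built (BERT's actual recipe masks about 15% of tokens and sometimes keeps or randomly replaces them instead of always using [MASK]; this only shows the basic input/target shape):

tokens = "the cat sat on the mat".split()
mask_positions = {1, 5}   # chosen at random in real training

inputs  = ["[MASK]" if i in mask_positions else tok for i, tok in enumerate(tokens)]
targets = [tok if i in mask_positions else None for i, tok in enumerate(tokens)]

print(inputs)    # ['the', '[MASK]', 'sat', 'on', 'the', '[MASK]']
print(targets)   # [None, 'cat', None, None, None, 'mat']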
Really well executed refinement / engineering of BERT Better tuned (many HPs) Remove a few hacks (remove annealing context size) Better data generation (online instead of cached) A more flexible vocab scheme (more on this later) Use more compute / train longer (but same model capacity - BERT was undertrained) RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al 2019)
ELECTRA (Clark et al 2019)
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al 2019) Very thorough (50 pages!) exploration of the design space of pretraining with a pleasing task formulation (from McCann et al 2018)
Why we need Unsupervised Learning
Natural Language Inference - SNLI (Bowman et al. 2015) Predict logical relation between two sentences - P and H. Contradiction -> A man inspects a uniform. A man is sleeping. Neutral -> An older and younger man smiling. Two men are smiling at cats playing on the floor. Entailment -> A soccer game with multiple males playing. Some men are playing a sport. Models are near human level according to the standard test set Humans ~ 88.0% ESIM (Chen et al. 2017) ~ 88.0% How well does supervised learning work?
Turkers were paid to create the training data of SNLI They often use a few tricks or heuristics to quickly make data For instance: Words like (not, never, nothing) hint at negation Generic words like (person, animal, sport) hint at entailment Modifiers like (tall, sad, popular) hint at neutral If you train a classifier on only the second sentence... You get ~67.0% compared to the ~33.0% chance baseline ESIM performance drops from ~88% to ~72% on the hard examples Annotation Artifacts In Natural Language Inference Data (Gururangan et al. 2018)
Use known relations between words to construct a new test set The man is holding a {object}. The man is holding a {different object}. Contradiction A little girl is very {adjective}. A little girl is very {synonym}. Entailment Built a new test set of 8,000 examples from 14 categories to probe this. ESIM drops from ~88% to ~66% on this new test set Breaking NLI Systems with Sentences that Require Simple Lexical Inferences (Glockner et al. 2018)
Near SOTA QA model (BERT on SQUAD) drops from 86.5 F1 to: 35.6 F1 on TriviaQA 56.2 F1 on QuAC Learning and Evaluating General Linguistic Intelligence (Yogatama et. al 2019)
Standard training datasets might not encourage generalization Models learn spurious associations in the training set Models exploit distributional bias of the creation of the training set Models “stop learning” once they get to 0 training error Current techniques are brittle Current techniques are closer to memorization than generalization What might be going wrong?
Better models / architectures? More data? Different paths all together? How to make progress?
“The beautiful story of modern deep learning was supposed to be that we cleverly encoded high-level domain knowledge into our architectures and built these larger labeled datasets and then let SGD figure out all the annoying details for us.” How to make progress?
This set us up for a mindset of architecture engineering. There’s a very large design space: Multiply by a sigmoid here Add a temporal max-pool there Convolve with not 1 (or 2) but three different width filters Throw some attention on top of it all for good measure How to make progress?
We really like playing with blocks!
We can encode useful information through the choice of model: Convolution Recurrence Weight Sharing Attention Hierarchy Depth These are all important and impactful How to make progress?
What’s going on?
What we’ve been mostly doing
Huh?
The value of architecture engineering?
Supervised Learning is the dominant approach The largest supervised dataset is JFT-300M (Sun et al. 2017) - 300 million images - 18,000 classes How to learn?
Supervised Learning is the dominant approach The largest supervised dataset is JFT-300M (Sun et al. 2017) - 300 million images - 18,000 classes But JFT is only 530 MB of constraints How to learn?
Spent most of 2015 trying to build what I hoped would be ImageNet for text (to enable impactful transfer learning) 20 Newsgroups but for Reddit: a giant weakly supervised dataset 150M labeled examples across 1,000 communities Trained RNNs to predict the community from the discussion Pursuing this route for language
What we’ve been trying
How to do this instead?
KIM (Chen et al. 2017) Gets 83.5% on the new NLI test set Information (and representation) Engineering alongside Architecture Engineering
Word vectors are the classic approach! GloVe (Pennington et al. 2014) Common Crawl (a good chunk of the internet) Represent co-occurrences of words in 840 billion tokens Information (and representation) Engineering alongside Architecture Engineering
Word vectors are the classic approach! GloVe (Pennington et al. 2014) Common Crawl (a good chunk of the internet) Represent co-occurrences of words in 840 billion tokens Information (and representation) Engineering alongside Architecture Engineering The NLI models were already using word vectors So this hasn’t been figured out yet! But CoVe -> ELMo -> GPT-1 -> BERT helps a ton!
GPT-1 performs similarly to KIM (83.75%) on the new NLI test set BERT is basically SOTA on everything (that I’m aware of) It’s just a “stock” transformer! But it makes up for this with all that it has learned through pre-training. Information Engineering Taking Off (CoVe, ELMo, ULMFiT, GPT-1, BERT)
Instead of manually specifying what to predict through the creation of large supervised datasets… Figure out how to learn from and predict everything “out there”. “You can think of every time we build a dataset as setting the importance of everything else in the world to 0 and the importance of everything in the dataset to 1.” Our poor models! They know so little and yet still have so much hidden from them.
1. High capacity and flexible model classes + 2. Algos for extracting information and learning the structure of domains + 3. An almost infeasible amount of data tiling everything (billions of unlabeled examples?) + 4. An offensive amount of compute with which to learn (peta to exaflops?) = ? A Potential Recipe
1. High capacity and flexible model classes + 2. Algos for extracting information and learning the structure of domains + 3. An almost infeasible amount of data tiling everything (billions of unlabeled examples?) + 4. An offensive amount of compute with which to learn (peta to exaflops?) = Is it time to stop? To call it quits? A Potential Recipe
1. High capacity and flexible model classes + 2. Algos for extracting information and learning the structure of domains + 3. An almost infeasible amount of data tiling everything (billions of unlabeled examples?) + 4. An offensive amount of compute with which to learn (peta to exaflops?) = Or will it drive a good chunk of progress over the next few years? A Potential Recipe
What about Multitask Learning?
GPT-2?
More data 40GB of text 10B tokens 8 million webpages Bigger model Up to 1.5 billion parameters 1024 token context 48 layers, 1600 dim state Just a language model - predicts everything (with some unfortunate restrictions as BERT shows) GPT-2
Performance across tasks
Why it’s working?
Question Answering and Reading Comprehension: 6 Million 5 Ws questions in the dataset Summarization: ~100K TL;DR, In summary… Translation: ~10MB French data Why it’s working?
A concrete example of why unsupervised learning is necessary
Performance not (usually) limited by something a single paper fixes Diminishing returns mean there is always some other bottleneck Fancy model -> compute utilization, trainability Parameters -> compute Data -> capacity Capacity -> data, compute Be pragmatic about scaling If you do everything sensibly - compute will probably be the bottleneck If it’s not… there’s an interesting research problem! Takeaways from scaling language modeling
How to do research on large scale models?
Don’t do research on large scale models How to do research on large scale models?
Don’t do research on large scale models Prototype on models which are 10x smaller and 10x faster Run 10x as many experiments in parallel instead Every behavior in the GPT-2 paper shows up on these models After the proof of concept - then you scale GPT-1 was a proof point on zero-shot task transfer GPT-1 on WebText is already SOTA on several LM tasks Used the same strategy for Sentiment Unit First trained a 512 dim LSTM in a few days Final 4096 dim LSTM took a month How to do research on large scale models?
Develop and test everything quickly at small scale first Tune the hyperparams, decide on a model, check out datasets, etc... Whatever does best at a reasonable scale will also probably do well at large scale Optimize the language model as a language model Log-prob of held-out text Then see what else it can do How to do research on large scale models?
Sometimes issues don’t show up until at scale Plan for something to break about every order of magnitude of scale Will have to re-tune hyperparameters For GPT-2 models this happened at >= 24 self-attention blocks Performance of models appears to saturate Fix was better weight init and pre-activation style residual network Rewon Child figured this out The Gotchas
Self-attention architectures + long sequences = lots of memory Recompute, Half Precision (FP-16), Data Parallelism More Model More Problem
Naive TensorFlow code can now be over 5 times slower than what is achievable on modern hardware Case study: GPT-1 Took 25 days on 8 P6000s (how do you do research on models that take a month to train without going insane?) Trains in 3 days on 8 V100s 1.75x from TF data parallel -> MPI + NCCL AllReduce 1.50x from native TF ops -> Blocksparse ops 3.50x from FP32 Pascal -> FP16 Volta Write Efficient / Smart Code!
Accelerated primitives for common TensorFlow ops Dropout, normalization, optimizers, activations Custom self-attention operations Avoid transposes, fuse operations, sparse compute Targets Volta / Turing hardware Tensorcores allow for 3x+ speedup over previous gen hardware
from blocksparse.transformer import BlocksparseTransformer, softmax_cross_entropy
from blocksparse.optimize import AdamOptimizer, ClipGlobalNorm
from blocksparse.norms import layer_norm
from blocksparse.embed import embedding_lookup
from blocksparse.ewops import bias_relu, dropout
from blocksparse.nccl import allreduce
Blocksparse Library - Scott Gray
If you’re paying for it: A 4x 2080 Ti desktop The results in GPT-2 do show up on models trainable on this hardware (but will take a week) Can ~match BERT-Base in that time too Cost about $6,000 :( If someone else is paying for it: 8 V100s from a cloud provider (AWS, GCE, etc...) What is the Sweet Spot in terms of compute?
Scale matters - go beyond classic datasets like PTB Better results come from combining several sources of improvement Don’t get bottlenecked by something that can be fixed easily Don’t let scale slow you down during development A medium+ language model on a new dataset / domain will probably learn something interesting - but might take some digging to find Most of my research for the past few years has been exploring the capabilities, behaviors, and uses of language models in this regime Takeaways from language modeling
In the next few years language models will be trained on pretty much the whole internet (might as well throw in millions of books too!) Will scaling trends breakdown? How far will this get? Where is this Heading?
In the next few years language models will be trained on pretty much the whole internet (might as well throw in millions of books too!) Will scaling trends breakdown? How far will this get? If trendlines continue… Where is this Heading? [Ian Goodfellow’s twitter]
In the next few years language models will be trained on pretty much the whole internet (might as well throw in millions of books too!) Will scaling trends breakdown? How far will this get? If trendlines continue… It will probably feel unsatisfying, though Where is this Heading? [Ian Goodfellow’s twitter]