Text Mining and Analytics Quiz Answers. In this post you will find the quiz answers for the Text Mining and Analytics course.
Text Mining and Analytics Quiz
Offered by "University of Illinois"
Orientation Quiz
1.
Question 1
This course lasts for ___ weeks.
1 point
- 3
- 6
- 2
- 4
=================================================
2.
Question 2
I am required to read a textbook for this course.
1 point
- False
- True
=================================================
3.
Question 3
Which of the following activities are required each week? Check all that apply.
1 point
- Reading Assignments
- Programming Assignments
- Quizzes
- Forum Assignments
=================================================
4.
Question 4
The following tools will help me use the discussion forums:
1 point
- “Up-voting” posts that are thoughtful, interesting, or helpful.
- “Searching” for your question or a topic before you post a new thread.
- “Following” any forums that are particularly interesting to me.
=================================================
5.
Question 5
If I have a problem in the course I should:
1 point
- Email the instructor
- Drop the class
- Report it to the Learner Help Center (if the problem is technical) or to the Content Issues forum (if the problem is an error in the course materials).
- Call the instructor
Week 1 Quiz
1.
Question 1
True or false? A paradigmatic relation is a relation between two words that tend to co-occur with each other, while a syntagmatic relation is between two words that tend to occur in a similar context.
1 point
False
True
=================================================
2.
Question 2
In a collection of English news articles, which word do you expect to have a higher IDF?
1 point
“the”
“learning”
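As a rough illustration of why a word that appears almost everywhere gets a low IDF, the sketch below computes IDF on a tiny made-up corpus. It assumes the variant IDF(w) = log((N + 1) / df(w)), where N is the number of documents and df(w) is the number of documents containing w; other variants exist. The corpus and the function name `idf` are hypothetical.

```python
import math

# Hypothetical toy corpus; each document is a list of tokens.
docs = [
    ["the", "market", "fell"],
    ["the", "team", "won", "the", "game"],
    ["machine", "learning", "beats", "the", "benchmark"],
    ["the", "weather", "is", "mild"],
]

def idf(word, documents):
    """IDF under the assumed variant log((N + 1) / df)."""
    n_docs = len(documents)
    df = sum(1 for doc in documents if word in doc)  # document frequency
    return math.log((n_docs + 1) / df)

idf_the = idf("the", docs)            # "the" occurs in every document
idf_learning = idf("learning", docs)  # "learning" occurs in one document
print(idf_the, idf_learning)
```

The fewer documents a word occurs in, the larger its IDF, so the rarer word scores higher here.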
=================================================
3.
Question 3
Suppose the pseudo-document representations for the contexts of the terms A and B in the vector space model are given as follows:
dA = (0.10, 0.50, 0.00, 0.40, 0.00, 0.00)
dB = (0.20, 0.40, 0.30, 0.00, 0.10, 0.00)
What is the EOWC (Expected Overlap of Words in Context) similarity score?
1 point
1
0.02
0.20
0.22
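The EOWC score is the dot product of the two context vectors, so it can be checked with a few lines of code. This is a minimal sketch; the function name `eowc` is an illustrative choice.

```python
# Context vectors copied from the question.
d_a = [0.10, 0.50, 0.00, 0.40, 0.00, 0.00]
d_b = [0.20, 0.40, 0.30, 0.00, 0.10, 0.00]

def eowc(x, y):
    """Expected Overlap of Words in Context: the dot product of x and y."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Only the first two dimensions are non-zero in both vectors:
# 0.10 * 0.20 + 0.50 * 0.40 = 0.02 + 0.20
score = eowc(d_a, d_b)
print(round(score, 2))
```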
=================================================
4.
Question 4
True or false? Syntactic analysis (parsing) is an easier task than lexical analysis (part-of-speech tagging).
1 point
False
True
=================================================
5.
Question 5
“A man saw a boy with a telescope.” What kind of ambiguity does the sentence have?
1 point
Word-level ambiguity
Syntactic ambiguity
=================================================
6.
Question 6
In an online text mining application where response time is the key factor to consider, what kind of NLP features can be used? Check all that apply.
1 point
POS-tagging
Word tokenization
Syntactic parsing
Relation extraction
=================================================
7.
Question 7
True or false? Deeper NLP requires more human effort and usually is less accurate.
1 point
False
True
=================================================
8.
Question 8
True or false? Word-based representation is not powerful.
1 point
False
True
=================================================
9.
Question 9
Which of the following is correct about paradigmatic and syntagmatic word relations?
1 point
A syntagmatic relation implies a paradigmatic relation.
“Monday” and “Tuesday” are words in a paradigmatic relation.
Syntagmatically related words have high context similarity.
Paradigmatically related words have high co-occurrence.
=================================================
10.
Question 10
Why does EOWC not work well?
1 point
It favors matching rare words.
It treats words unequally.
It favors matching frequent terms.
Week 2 Quiz
1.
Question 1
You are given a unigram language model θ distributed over a vocabulary set V composed of only 4 words: “the”, “global”, “warming”, and “effects”. The distribution of θ is given in the table below:
w P(w|θ)
the 0.3
global 0.2
warming 0.2
effects X
What is X, i.e., P(“effects”|θ)?
1 point
0.1
0.2
0.3
0
=================================================
2.
Question 2
Assume you are given the same unigram language model as in Question 1. Which of the following is not true?
1 point
P(“global warming”|θ) > P(“warming global”|θ)
P(“global warming”|θ) = 0.04
P(“the global warming effects”|θ) < P(“global warming effects”|θ)
P(“text mining”|θ) = 0
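The unigram model from Questions 1 and 2 can be checked numerically. The sketch below assumes the usual unigram independence assumption (a sequence's probability is the product of its word probabilities) and gives words outside the vocabulary probability 0. The helper name `sequence_prob` is illustrative.

```python
# Known probabilities from the table; "effects" takes the remaining mass.
theta = {"the": 0.3, "global": 0.2, "warming": 0.2}
theta["effects"] = 1.0 - sum(theta.values())

def sequence_prob(text, model):
    """Probability of a word sequence under a unigram language model."""
    p = 1.0
    for word in text.split():
        p *= model.get(word, 0.0)  # out-of-vocabulary words get probability 0
    return p

p_gw = sequence_prob("global warming", theta)
p_wg = sequence_prob("warming global", theta)
p_tgwe = sequence_prob("the global warming effects", theta)
p_gwe = sequence_prob("global warming effects", theta)
p_tm = sequence_prob("text mining", theta)
print(theta["effects"], p_gw, p_wg, p_tgwe, p_gwe, p_tm)
```

Because a unigram model ignores word order, the two orderings of "global" and "warming" receive the same probability.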
=================================================
3.
Question 3
Assume that words are being generated by a mixture of two unigram language models, θ1 and θ2, where P(θ1) = 0.5 and P(θ2) = 0.5. The distributions of the two models are given in the table below:
w            P(w|θ1)   P(w|θ2)
sports       0.35      0.05
basketball   0.2       0.05
fast         0.3       0.3
computer     0.1       0.4
smartphone   0.05      0.2
Then the probability of observing “computer” from this mixture model is: P(“computer”) =
1 point
0.45
0.4
0.05
0.25
=================================================
4.
Question 4
Assume the same setup as in Question 3. We now want to infer which of the two word distributions, θ1 and θ2, was used to generate “computer”, and would thus like to compute the probabilities that it was generated using θ1 and θ2, i.e., P(θ1|“computer”) and P(θ2|“computer”), respectively. The values of P(θ1|“computer”) and P(θ2|“computer”) are:
Hint: Apply Bayes rule.
1 point
0.9 and 0.1
0.1 and 0.9
0.8 and 0.2
0.2 and 0.8
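Questions 3 and 4 use the same two-component mixture, so one short sketch covers both: the marginal probability of a word sums over the components, and Bayes' rule then gives the posterior over components. The dictionary layout and the function names `marginal` and `posterior` are illustrative choices.

```python
# Priors and word distributions copied from the table in Question 3.
prior = {"theta1": 0.5, "theta2": 0.5}
p_word = {
    "theta1": {"sports": 0.35, "basketball": 0.2, "fast": 0.3,
               "computer": 0.1, "smartphone": 0.05},
    "theta2": {"sports": 0.05, "basketball": 0.05, "fast": 0.3,
               "computer": 0.4, "smartphone": 0.2},
}

def marginal(word):
    """P(word) = sum over models of P(model) * P(word | model)."""
    return sum(prior[m] * p_word[m][word] for m in prior)

def posterior(word):
    """P(model | word) by Bayes' rule."""
    z = marginal(word)
    return {m: prior[m] * p_word[m][word] / z for m in prior}

p_computer = marginal("computer")   # 0.5 * 0.1 + 0.5 * 0.4
post_computer = posterior("computer")
print(p_computer, post_computer)
```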
=================================================
5.
Question 5
Suppose words are being generated using a mixture of two unigram language models θ1 and θ2. Let P(w) denote the probability of generating a word w from this mixture model.
If P(θ1) = 1, then which of the following statements is true?
1 point
P(w|θ2) = 0, for any word w
P(w) = P(w|θ1), for any word w
P(w|θ1) = 0, for any word w
=================================================
6.
Question 6
True or false? Let X_text, X_mining, and X_the be binary random variables associated with the words “text”, “mining”, and “the”, respectively. Assume that the probabilities of the random variables are estimated based on a large corpus. Then we should expect H(X_text|X_mining) > H(X_text|X_the).
1 point
False
True
=================================================
7.
Question 7
True or false? I(X;Y)=0 if and only if X and Y are independent.
1 point
False
True
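To make the link between mutual information and independence concrete, the sketch below computes I(X;Y) in bits from a joint distribution table. Both joint tables are made-up examples: the first is built as a product of its marginals, the second is not.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as joint[x][y]."""
    px = [sum(row) for row in joint]         # marginal of X
    py = [sum(col) for col in zip(*joint)]   # marginal of Y
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:  # terms with zero probability contribute nothing
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

# Hypothetical joint with P(X=0) = 0.2 and P(Y=0) = 0.3, constructed as a product.
independent = [[0.06, 0.14], [0.24, 0.56]]
# Hypothetical joint in which X and Y tend to take the same value.
dependent = [[0.4, 0.1], [0.1, 0.4]]

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

The independent table yields a value of zero up to floating-point error, while the dependent table yields a strictly positive value.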
=================================================
8.
Question 8
Let w be a word and X_w be a binary random variable that indicates whether w appears in a text document in the corpus. Assume that the probability P(X_w = 1) is estimated by Count(w)/N, where Count(w) is the number of documents w appears in and N is the total number of documents in the corpus.
You are given that “the” is a very frequent word that appears in 99% of the documents and that “photon” is a very rare word that occurs in 1% of the documents. Which word has a higher entropy?
1 point
“photon”
“the”
Both words have the same entropy.
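The entropy of a binary random variable depends only on p = P(X = 1) and is symmetric around p = 0.5. A minimal sketch, with entropy measured in bits and 0 · log 0 treated as 0:

```python
import math

def binary_entropy(p):
    """H(X) in bits for a binary variable with P(X=1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

h_the = binary_entropy(0.99)     # "the" appears in 99% of documents
h_photon = binary_entropy(0.01)  # "photon" appears in 1% of documents
h_fair = binary_entropy(0.5)     # fair coin
h_certain = binary_entropy(1.0)  # outcome known in advance
print(h_the, h_photon, h_fair, h_certain)
```

The same function shows the two extremes: the entropy is 0 when the outcome is certain and reaches its maximum of 1 bit at p = 0.5.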
=================================================
9.
Question 9
Let X be a binary random variable. Which of the following is not true? Select all that apply.
1 point
If P(X=0)=1, then H(X) = 1
H(X) ≤ 1
If P(X=1)=1, then H(X) = 1
If P(X=0)=1, then H(X) = 0
=================================================
10.
Question 10
True or false? An unbiased coin has a higher entropy than any biased coin.
1 point
True
False
Week 3 Quiz
1.
Question 1
You are given two unigram language models θ1 and θ2 as defined in the table below:
w          P(w|θ1)   P(w|θ2)
concert    0.1       0.4
music      0.1       0.4
data       0.4       0.1
software   0.4       0.1
Suppose we are using a mixture model for document clustering based on the two given unigram language models, θ1 and θ2, such that P(θ1) = 0.5 and P(θ2) = 0.5. To generate a document, first, one of the two language models is chosen according to P(θi), and then all the words in the document are generated based on the chosen language model.
The probability of generating the document d: “music software” using the given mixture model is P(“music software”)=
1 point
0.05
0.04
0.5
0.6
=================================================
2.
Question 2
Assume the same unigram language models, θ1 and θ2, defined as in the table of Question 1 with P(θ1)=0.5 and P(θ2)=0.5.
We now want to generate documents based on the mixture model used in topic modeling. To generate a document, for each word we first choose one of the two language models, θ1 and θ2, and then generate the word according to the chosen model. The probability of generating the document d: “music software” according to this mixture model is P(“music software”)=
1 point
0.125
0.0125
0.0625
0.625
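Questions 1 and 2 differ only in where the model choice happens: once per document (clustering mixture) or once per word (topic-model mixture). The sketch below computes both probabilities for the same document. The function names are illustrative.

```python
# Priors and word distributions copied from the table in Question 1.
prior = {"theta1": 0.5, "theta2": 0.5}
p_word = {
    "theta1": {"concert": 0.1, "music": 0.1, "data": 0.4, "software": 0.4},
    "theta2": {"concert": 0.4, "music": 0.4, "data": 0.1, "software": 0.1},
}

def clustering_mixture_prob(words):
    """One model is chosen per document and generates every word."""
    total = 0.0
    for m in prior:
        p_doc = 1.0
        for w in words:
            p_doc *= p_word[m][w]
        total += prior[m] * p_doc
    return total

def topic_mixture_prob(words):
    """A model is chosen independently for each word."""
    p_doc = 1.0
    for w in words:
        p_doc *= sum(prior[m] * p_word[m][w] for m in prior)
    return p_doc

doc = ["music", "software"]
p_clustering = clustering_mixture_prob(doc)  # 0.5 * (0.1*0.4) + 0.5 * (0.4*0.1)
p_topic = topic_mixture_prob(doc)            # (0.5*0.1 + 0.5*0.4) ** 2
print(p_clustering, p_topic)
```

The per-word mixture assigns this document a higher probability because "music" and "software" can each be drawn from the model that favors it.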
=================================================
3.
Question 3
Let X_w be a random variable denoting whether word w occurs in a text document in a collection of English news articles. Which random variable do you expect to have a lower entropy?
1 point
H(X_learning)
H(X_the)
=================================================
4.
Question 4
We want to run PLSA on a collection of N documents with a fixed number of topics k where the vocabulary size is M. What is the number of parameters that PLSA tries to estimate? Consider each P(w|θj) or πd,j as a separate parameter.
1 point
MNk
Mk+Nk
Nk
Mk
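Counting every P(w|θj) and every πd,j as one parameter, as the question instructs, the total can be written as a small function. The collection sizes in the example are made up.

```python
def plsa_parameter_count(n_docs, vocab_size, n_topics):
    """Count PLSA parameters, treating each P(w|θj) and πd,j separately."""
    word_params = vocab_size * n_topics   # one P(w|θj) per word and topic
    coverage_params = n_docs * n_topics   # one πd,j per document and topic
    return word_params + coverage_params

# Hypothetical sizes: N = 1000 documents, M = 5000 words, k = 10 topics.
small = plsa_parameter_count(n_docs=1000, vocab_size=5000, n_topics=10)
larger = plsa_parameter_count(n_docs=2000, vocab_size=5000, n_topics=10)
print(small, larger)
```

Note that the count grows with the number of documents even when the vocabulary and the number of topics stay fixed.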
=================================================
5.
Question 5
You are given a document d that contains only two words: “the” and “machine”. Assume that this document was generated from a mixture of two unigram language models: a known background language model θB and an unknown topic language model θd. Let P(θB) = λ and P(θd) = 1 − λ, and assume that P(“the”|θB) = 0.9 and P(“machine”|θB) = 0.1. We want to estimate θd using maximum likelihood. Then, as λ increases, P(“machine”|θd) will:
Hint: First get the maximum likelihood estimates of the two words in θd (refer to the lecture on “Probabilistic Topic Models: Mixture Model Estimation”). Then, write P(“machine”|θd) as a function of λ and study the behavior of the function.
1 point
Increase
Remain the same
Decrease
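The hint can be followed numerically. The sketch below assumes each of the two words occurs exactly once in d. Under that assumption the likelihood is the product of the two mixture probabilities, which always sum to 1, so it is maximized when both equal 0.5. Solving 0.1λ + (1 − λ) · P(“machine”|θd) = 0.5 gives P(“machine”|θd) = (0.5 − 0.1λ) / (1 − λ), capped at 1 when that value is not a valid probability. The function name is illustrative.

```python
def p_machine_given_topic(lam, p_machine_bg=0.1):
    """ML estimate of P("machine"|θd) for the document "the machine".

    Assumes each word occurs once and 0 <= lam < 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    unclamped = (0.5 - lam * p_machine_bg) / (1.0 - lam)
    return min(1.0, unclamped)  # a probability cannot exceed 1

lambdas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
estimates = [p_machine_given_topic(lam) for lam in lambdas]
for lam, est in zip(lambdas, estimates):
    print(lam, round(est, 4))
```

As the background model takes over more of the explanation for “the”, the topic model shifts its probability mass toward “machine”.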
=================================================
6.
Question 6
True or false? In general, PLSA using the EM algorithm does not stop until it achieves the global maximum of the likelihood function.
1 point
True
False
=================================================
7.
Question 7
True or false? Let θ1, …, θk be the k unigram language models output by PLSA and V be the vocabulary set. Then, for any i ∈ {1, …, k}, the following relation always holds: ∑_{w∈V} P(w|θi) = 1.
1 point
False
True
=================================================
8.
Question 8
True or false? The EM algorithm cannot decrease the likelihood of the data.
1 point
True
False
=================================================
9.
Question 9
True or false? Assume that the likelihood function of PLSA has multiple local maxima and one global maximum. There exists an initial set of parameters for which PLSA will converge to the global maximum of the likelihood function.
1 point
True
False
=================================================
10.
Question 10
True or false? When using PLSA to mine topics from a text collection, the number of parameters of the PLSA model stays the same as we keep adding new documents to the text collection, assuming that the new documents do not introduce words that have not occurred in the current text collection.
1 point
True
False
Week 4 Quiz
1.
Question 1
What is NOT the motivation for text clustering?
1 point
To quickly get an idea about a large collection of documents
To remove spam documents based on a small collection of human annotated spam documents
To link similar documents and remove duplicated documents
To create structure of text data
=================================================
2.
Question 2
What is TRUE about the mixture model and topic modeling?
1 point
Topic modeling can also be used for document clustering directly.
In topic modeling, the topic of each word is independently sampled, while in the mixture model, only one topic is drawn for each document.
Only topic modeling can learn topics, while the mixture model does not yield such information after learning.
=================================================
3.
Question 3
In the mixture model, if we want to encourage the formation of a large cluster:
1 point
Try different initialization
Add a prior to P(θ) so that the distribution is skewed
Use a smaller number of clusters for training
=================================================
4.
Question 4
In the EM algorithm, which step improves the model likelihood?
1 point
M-step
E-step
=================================================
5.
Question 5
True or false? In the EM algorithm, the model likelihood monotonically increases.
1 point
False
True
=================================================
6.
Question 6
What is the most difficult part of directly applying maximum likelihood estimation to PLSA?
1 point
The objective function needs to sum over all words for each document.
The objective function needs to sum over all topics for each word.
The objective function needs to sum over all documents in the collection.
=================================================
7.
Question 7
For the agglomerative clustering algorithm, which of the following is not TRUE?
1 point
The depth of the hierarchy is always log2(N), where N is the number of items.
It’s a bottom-up algorithm to form a hierarchy.
The user needs to specify a similarity measurement.
=================================================
8.
Question 8
Which evaluation method is best for clustering results of a large collection of documents?
1 point
Use the indirect evaluation method and test performance for an application with or without clustering.
Use the direct evaluation method and create human annotations for each document in the collection.
=================================================
9.
Question 9
Which of the following is NOT sensitive to outliers?
1 point
Complete-link
Single-link
Average-link
=================================================
10.
Question 10
Which of the following is a generative classification algorithm?
1 point
Logistic Regression
SVM
K-NN
Naive Bayes
Week 5 Quiz
1.
Question 1
Assume that documents are being classified into two categories, c1 and c2, such that a document can belong to more than one category. The table below shows the prediction of a classifier, denoted by “y” or “n”, in addition to the true label (ground truth) represented by a “+” or “-”, where a correct prediction is either y (+) or n (-).
c1 c2
D1 y(+) y(+)
D2 n(-) y(+)
D3 n(+) n(-)
D4 y(-) y(+)
D5 n(+) n(-)
Let P(ci) and R(ci) denote the precision and recall associated with category ci, respectively.
The precision and recall of c1 and c2 are:
1 point
P(c1) = 1/2 R(c1) = 1/2 P(c2) = 1 R(c2) = 1
P(c1) = 1/2 R(c1) = 1/2 P(c2) = 1/2 R(c2) = 1/2
P(c1) = 1/2 R(c1) = 1/3 P(c2) = 1 R(c2) = 1
P(c1) = 1/3 R(c1) = 1/2 P(c2) = 1 R(c2) = 1
=================================================
2.
Question 2
Given the same data as in Question 1, the classification accuracy of the classifier is:
1 point
9/10
8/10
3/10
7/10
=================================================
3.
Question 3
Given the same data as in Question 1, what is the recall of the classifier using micro-averaging (i.e., by pooling all decisions together)?
1 point
1
4/5
5/6
2/3
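The per-category precision and recall, the overall accuracy, and the micro-averaged recall asked for in Questions 1 to 3 can all be derived from the same table. The sketch below uses exact fractions. The variable and function names are illustrative.

```python
from fractions import Fraction

# (prediction, truth) pairs for D1..D5, copied from the table in Question 1.
table = {
    "c1": [("y", "+"), ("n", "-"), ("n", "+"), ("y", "-"), ("n", "+")],
    "c2": [("y", "+"), ("y", "+"), ("n", "-"), ("y", "+"), ("n", "-")],
}

def confusion_counts(decisions):
    """Return (TP, FP, FN, TN) for one category."""
    tp = sum(1 for d in decisions if d == ("y", "+"))
    fp = sum(1 for d in decisions if d == ("y", "-"))
    fn = sum(1 for d in decisions if d == ("n", "+"))
    tn = sum(1 for d in decisions if d == ("n", "-"))
    return tp, fp, fn, tn

counts = {c: confusion_counts(d) for c, d in table.items()}
precision = {c: Fraction(tp, tp + fp) for c, (tp, fp, fn, tn) in counts.items()}
recall = {c: Fraction(tp, tp + fn) for c, (tp, fp, fn, tn) in counts.items()}

# Pool all ten decisions for accuracy and micro-averaged recall.
tp_all = sum(c[0] for c in counts.values())
fn_all = sum(c[2] for c in counts.values())
tn_all = sum(c[3] for c in counts.values())
n_decisions = sum(len(d) for d in table.values())
accuracy = Fraction(tp_all + tn_all, n_decisions)
micro_recall = Fraction(tp_all, tp_all + fn_all)
print(precision, recall, accuracy, micro_recall)
```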
=================================================
4.
Question 4
Suppose we are performing document clustering on a collection of N documents using a mixture model as discussed in the lecture Text Clustering: Generative Probabilistic Models (Part 3). Let the number of clusters be K and the vocabulary size be M. What is the number of parameters that the EM algorithm tries to estimate? Consider each P(θi) or P(w|θi) as a separate parameter.
1 point
MNK
KN+MK
K+MK
MK
=================================================
5.
Question 5
Which one of the following statements is not an opinion?
1 point
PLSA is the best method for a topic mining task.
PLSA always performs similarly to LDA.
PLSA is a mixture model.
=================================================
6.
Question 6
True or false? Word unigrams are the best performing features for sentiment classification.
1 point
True
False
=================================================
7.
Question 7
True or false? Suppose we are using logistic regression for binary classification (i.e., k=2) where the number of features is M. Then, the number of parameters to be estimated is M+1.
1 point
False
True
=================================================
8.
Question 8
True or false? Assume we are using word n-grams as features to perform sentiment classification. Then, higher values of n will usually be less prone to overfitting (i.e., for higher values of n, the difference between training and testing accuracies will be smaller).
1 point
True
False
=================================================
9.
Question 9
Why is accuracy sometimes not good for classification evaluation? Check all that apply.
1 point
Computation of accuracy is difficult.
For an imbalanced dataset, high accuracy does not imply good performance.
Some decisions are more serious than others.
=================================================
10.
Question 10
If you want to put more emphasis on precision than recall, how should you adjust the value of β?
1 point
Choose a low value of β
Choose a high value of β
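The effect of β can be seen by scoring two made-up classifiers with the standard F-measure, F_β = (1 + β²)PR / (β²P + R). One classifier has high precision and the other has high recall; the precision and recall values are hypothetical.

```python
def f_beta(precision, recall, beta):
    """Standard F-measure; beta weights recall relative to precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical classifiers: (precision, recall).
precise = (0.9, 0.5)
thorough = (0.5, 0.9)

low_beta = {name: f_beta(p, r, beta=0.5)
            for name, (p, r) in {"precise": precise, "thorough": thorough}.items()}
high_beta = {name: f_beta(p, r, beta=2.0)
             for name, (p, r) in {"precise": precise, "thorough": thorough}.items()}
print(low_beta, high_beta)
```

With β below 1 the high-precision classifier scores better, and with β above 1 the ranking flips in favor of the high-recall one.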
Week 6 Quiz
1.
Question 1
Given a set of restaurant reviews along with the overall numeric rating of every restaurant, you are asked to infer the ratings of each of the restaurants on cleanliness, taste, and value. Which of the following methods is the most suitable to solve such an inference problem?
1 point
Sentiment analysis
Topic modeling
Contextual text mining
Latent Aspect Rating Analysis
=================================================
2.
Question 2
Examine the objective function of NetPLSA in the lecture entitled Contextual Text Mining: Mining Topics with Social Network Context. Increasing λ will:
1 point
Make neighbor nodes have less similar topic coverage
Not affect the topic coverage of neighbor nodes
Make neighbor nodes have more similar topic coverage
=================================================
3.
Question 3
You are given an undirected citation network composed of papers {p1,…,pn} as nodes, where a link between papers pi and pj means that one of the papers cited the other. Suppose you want to use the given data to discover the topics (research areas) of the papers. Which of the following methods is expected to work best?
Hint: Papers that have a citation relationship are more likely to belong to the same research area.
1 point
CPLSA
Sentiment analysis
NetPLSA
PLSA
=================================================
4.
Question 4
You are given a collection of news articles along with their publishing dates and want to reveal which topics have attracted increasing attention in a certain time period. Which of the following methods is most suitable for this task?
1 point
CPLSA
NetPLSA
Sentiment analysis
=================================================
5.
Question 5
Suppose we are performing Latent Aspect Rating Analysis where the number of aspect segments is K and the number of words in each aspect segment is M. What is the total number of parameters for term sentiment weights, i.e., the β values, that have to be estimated?
1 point
MK
M+K
M
K
=================================================
6.
Question 6
Which of the following is true?
1 point
Ordinal logistic regression trains k−1 independent classifiers, k being the number of classes.
Different types of features, such as POS tags and word n-grams, can be combined when performing sentiment analysis.
The objective function of NetPLSA does not try to make neighbor nodes have similar topic coverage.
=================================================
7.
Question 7
Imagine a company is interested in understanding any factors related to their fluctuating sales of a new product in the past year. They collected the companion text data including the consumer reviews of the product from multiple websites with time stamps in the past year and hope to gain potential insights from such text data. Which of the following text mining techniques would you recommend to them?
1 point
Iterative topic modeling with time series supervision
Text clustering
Contextual PLSA (CPLSA)
=================================================
8.
Question 8
The US government implemented a new health care policy in year 2010. Suppose the government is interested in understanding the impact of such a policy and how the policy has affected what people talk about in social media. For this purpose, we can collect social media text data such as forum posts and tweets with time stamps before 2010 and after 2010. Which of the following text mining techniques is most suitable for such a text mining task?
1 point
Iterative Topic Modeling with Time Series Supervision
Contextual PLSA (CPLSA)
Text clustering
=================================================
9.
Question 9
Context can be used to (check all that apply):
1 point
Partition text
Annotate topics
=================================================
10.
Question 10
Which of the following statements about CPLSA is NOT correct?
1 point
It enables contextual text mining.
The EM algorithm can be used for optimization.
CPLSA is an extension of PLSA.
It models the joint probability of text and context.