Monday, August 25, 2014

Continuous or Discrete Latent Structure? Correspondence Analysis vs. Nonnegative Matrix Factorization

A map gives us the big picture, which is why mapping has become so important in marketing research. What is the perceptual structure underlying the European automotive market? All we need is a contingency table with cars as the rows, attributes as the columns, and the cells as counts of the number of times each attribute is associated with each car. As shown in a previous post, correspondence analysis (CA) will produce maps like the following.


Although everything you need to know about this graphic display can be found in that prior post, I do wish to emphasize a few points. First, the representation is a two-dimensional continuous space with coordinates for each row and each column. Second, the rows (cars) are positioned so that the distance between any two rows indicates the similarity of their relative attribute perceptions (i.e., different cars may score uniformly higher or lower, but they have the same pattern of strengths and weaknesses). Correspondingly, the columns (attributes) are located closer to each other when they are used similarly to describe the automobiles. Distances between the rows and columns are not directly shown on this map, yet the cross-tabulation from the original post shows that autos are near the attributes on which they performed the best. The analysis was conducted with the R package anacor; however, the ca R package might provide a gentler introduction to CA, especially when paired with Greenacre's online tutorial.
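To make the row and column coordinates less mysterious, here is a minimal base R sketch of the computation behind a simple CA, assuming a small made-up contingency table (the car and attribute names and all counts are hypothetical, not the data behind the map). It is not the anacor or ca implementation, only the textbook singular value decomposition of the standardized residuals.

```r
# Hypothetical counts: 4 cars (rows) by 3 attributes (columns)
N <- matrix(c(30, 10,  5,
              25, 15, 10,
               5, 20, 30,
              10, 25, 35),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("carA", "carB", "carC", "carD"),
                            c("economical", "practical", "luxurious")))

P      <- N / sum(N)   # correspondence matrix (cell proportions)
r_mass <- rowSums(P)   # row masses
c_mass <- colSums(P)   # column masses

# Standardized residuals from the independence model
S <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
dec <- svd(S)

# Principal coordinates of rows and columns on the first two dimensions
row_coord <- diag(1 / sqrt(r_mass)) %*% dec$u[, 1:2] %*% diag(dec$d[1:2])
col_coord <- diag(1 / sqrt(c_mass)) %*% dec$v[, 1:2] %*% diag(dec$d[1:2])
rownames(row_coord) <- rownames(N)
rownames(col_coord) <- colnames(N)

round(row_coord, 3)
round(col_coord, 3)

# Total inertia: the chi-square statistic divided by the table total
sum(dec$d^2)
```

Rows with similar attribute profiles receive similar coordinates regardless of how many mentions they collect overall, which is the sense in which the map compares relative rather than absolute perceptions.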

CA yields a continuous representation. The first dimension separates economy from luxury vehicles, and the second dimension differentiates between the smaller and the larger cars. Still, one can identify regions or clusters within this continuous space. For example, one could easily group the family cars in the third quadrant. Such an interpretation is consistent with the R package from which the dataset was borrowed (e.g., Slide #6). A probabilistic latent feature model (plfm) assumes that the underlying structure is defined by binary features that are hidden or unobserved.

What is in the mind of our raters? Do they see the vehicles as possessing more or less of the two dimensions from the CA, or are their perceptions driven by a set of on-off features (e.g., small popular, sporty status, spacious family, quality luxury, green, and city car)? If the answer is a latent category structure, then the success of CA stems from its ability to reproduce the row and column profiles from a dimensional representation even when the data were generated from the perceived presence or absence of latent features. Alternatively, the seemingly latent features may well be nothing more than an uneven distribution of rows and columns across the continuous space. We have the appearance of discontinuity simply because there are empty spaces that could be filled by adding more objects and features.

Spoiler alert: An adaptive consumer improvises and adopts whatever representational system works in that context. Dimensional maps provide the overview of the terrain and seem to be employed whenever many objects and/or features need to be considered jointly. Detailed trade-offs focus in on the features. No one should be surprised to discover a pragmatic consumer switching between decision strategies with their associated spatial or category representations over the purchase process as needed to complete their tasks.

Nonnegative Matrix Factorization of Car Perceptions

I will not repeat the comprehensive and easy to follow analysis of this automobile data from the plfm R package. All the details are provided in Section 4.3 of Michel Meulders' Journal of Statistical Software article (see p. 13 for a summary). Instead, I will demonstrate how nonnegative matrix factorization (NMF) produces the same results utilizing a different approach. At the end of my last post, you can find links to all that I have written about NMF. What you will learn is that NMF extracts latent features when it restricts everything to be nonnegative. This is not a necessary result, and one can find exceptions in the literature. However, as we will see later in this post, there are good reasons to believe that NMF will deliver feature-like latent variables with marketing data.

We require very little R code to perform the NMF. As shown below, we attach the plfm package and the dataset named car, which is actually a list of three elements. The cross-tabulation is an element of the list with the name car$freq1. The nmf function from the NMF package takes the data matrix, the number of latent features (plfm set the rank to 6), the method (lee) and the number of times to repeat the analysis with different starting values. Like K-means, NMF can find itself lost in a local minimum, so it is a good idea to rerun the factorization with different random start values and keep the best solution. We are looking for a global minimum; thus, we should set nrun to a number large enough to ensure that one will find a similar result when the entire nmf function is executed again.

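# Attach the car data (a list; the cross-tabulation is car$freq1)
library(plfm)
data(car)

# Fit a rank-6 NMF, rerunning from 20 random starts
library(NMF)
fit<-nmf(car$freq1, 6, "lee", nrun=20)

# Coefficient matrix: rescale each row (latent feature) to a maximum of 1
h<-coef(fit)
max_h<-apply(h,1,function(x) max(x))
h_scaled<-h/max_h
library(psych)
fa.sort(t(round(h_scaled,3)))

# Basis matrix: rescale each row (car) to sum to 1
w<-basis(fit)
wp<-w/apply(w,1,sum)
fa.sort(round(wp,3))

# Heatmaps of the coefficient and basis matrices
coefmap(fit)
basismap(fit)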

Created by Pretty R at inside-R.org
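The advice to rerun from different random starts can be illustrated with a short base R sketch. It assumes the Lee and Seung multiplicative updates for squared-error loss and uses made-up Poisson counts with the same shape as the car table; nmf_once() and nmf_best() are hypothetical helper names, and this is not the NMF package's own code.

```r
# One NMF run from a random start: V is approximated by W %*% H
nmf_once <- function(V, rank, n_iter = 200, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * rank), nrow(V), rank)
  H <- matrix(runif(rank * ncol(V)), rank, ncol(V))
  for (i in seq_len(n_iter)) {
    # Multiplicative updates keep every entry nonnegative
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H, loss = sum((V - W %*% H)^2))
}

# Repeat from several random starts and keep the run with the lowest loss
nmf_best <- function(V, rank, n_run = 20, ...) {
  fits <- lapply(seq_len(n_run), function(i) nmf_once(V, rank, ...))
  fits[[which.min(sapply(fits, function(f) f$loss))]]
}

set.seed(1)
V <- matrix(rpois(14 * 27, lambda = 5), 14, 27)  # made-up counts, 14 rows by 27 columns
best <- nmf_best(V, rank = 6, n_run = 5)
dim(best$W)  # rows-by-latent features
dim(best$H)  # latent features-by-columns
best$loss
```

Comparing the losses across runs is a quick way to judge whether the number of restarts is large enough: if the best few runs agree closely, more restarts are unlikely to change the solution.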

In order not to be confused by the output, one needs to note the rows and columns of the data matrix. The cars are the rows and the features are the columns. The basis is always rows-by-latent features; therefore, our basis will be a cars-by-six-latent-features matrix. The coefficient matrix is always latent features-by-columns, or six latent features-by-observed features. It is convenient to print the transpose of the coefficient matrix since the number of latent features is often much less than the number of observed features.

Basis Matrix

Car                     Green   Family   Luxury  Popular     City   Sporty
Toyota Prius             0.82     0.08     0.08     0.00     0.01     0.02
Renault Espace           0.09     0.71     0.02     0.00     0.19     0.00
Citroen C4 Picasso       0.18     0.58     0.00     0.10     0.14     0.00
Ford Focus Cmax          0.00     0.50     0.04     0.35     0.11     0.00
Volvo V50                0.24     0.39     0.29     0.08     0.00     0.00
Mercedes C-class         0.04     0.01     0.69     0.00     0.11     0.16
Audi A4                  0.00     0.10     0.43     0.14     0.12     0.21
Opel Corsa               0.17     0.00     0.00     0.83     0.01     0.00
Volkswagen Golf          0.00     0.02     0.29     0.67     0.02     0.00
Mini Cooper              0.00     0.00     0.15     0.00     0.70     0.15
Fiat 500                 0.33     0.00     0.00     0.18     0.49     0.00
Mazda MX5                0.01     0.00     0.03     0.00     0.26     0.70
BMW X5                   0.00     0.18     0.26     0.00     0.00     0.56
Nissan Qashgai           0.06     0.35     0.00     0.08     0.00     0.51

Coefficient Matrix

Attribute                     Green   Family   Luxury  Popular     City   Sporty
Environmentally friendly       1.00     0.05     0.08     0.34     0.19     0.00
Technically advanced           0.68     0.00     0.62     0.00     0.00     0.35
Green                          0.66     0.02     0.06     0.06     0.04     0.00
Family Oriented                0.35     1.00     0.24     0.08     0.00     0.00
Versatile                      0.15     0.53     0.27     0.25     0.00     0.16
Luxurious                      0.00     0.10     1.00     0.00     0.12     0.56
Reliable                       0.21     0.27     0.95     0.69     0.06     0.18
Safe                           0.08     0.34     0.88     0.41     0.00     0.10
High trade-in value            0.00     0.00     0.85     0.21     0.00     0.13
Comfortable                    0.08     0.57     0.84     0.15     0.04     0.19
Status symbol                  0.08     0.00     0.81     0.00     0.40     0.60
Sustainable                    0.33     0.23     0.71     0.44     0.00     0.02
Workmanship                    0.24     0.03     0.58     0.00     0.00     0.25
Practical                      0.09     0.60     0.17     1.00     0.52     0.00
City focus                     0.51     0.00     0.00     0.94     0.93     0.00
Popular                        0.00     0.23     0.25     0.94     0.52     0.00
Economical                     0.90     0.13     0.00     0.93     0.27     0.00
Good price-quality ratio       0.35     0.25     0.00     0.88     0.08     0.12
Value for the money            0.12     0.16     0.10     0.60     0.01     0.10
Agile                          0.12     0.06     0.18     0.87     1.00     0.16
Attractive                     0.04     0.08     0.58     0.33     0.79     0.50
Nice design                    0.04     0.10     0.38     0.23     0.77     0.46
Original                       0.36     0.00     0.00     0.03     0.76     0.21
Exclusive                      0.10     0.00     0.13     0.00     0.38     0.26
Sporty                         0.00     0.00     0.40     0.27     0.45     1.00
Powerful                       0.00     0.12     0.70     0.02     0.00     0.74
Outdoor                        0.00     0.29     0.00     0.07     0.00     0.57

As the number of rows and columns increases, these matrices become more and more cumbersome. Although we do not require a heatmap for this cross-tabulation, we will when the rows of the data matrix represent individual respondents. Now is a good time to introduce such a heatmap since we have the basis and coefficient matrices from which they are built. The basis heatmap showing the association between the vehicles and the latent features will be shown first. Lots of yellow is good, for it indicates simple structure. As suggested in an earlier post, NMF is easiest to learn if we use the language of factor analysis, and simple structure implies that each car is associated with only one latent feature (one reddish block per row and the rest pale or yellow).


The Toyota Prius falls at the bottom where it "loads" on only the first column. Looking back at the basis matrix, we can see the actual numbers with the Prius having a weight of 0.82 on the first latent feature that we named "Green" because of its association with the observed features in the Coefficient Matrix that seem to measure an environmental or green construct. The other columns and vehicles are interpreted similarly, and we can see that the heatmap is simply a graphic display of the basis matrix. It is redundant when there are few rows and columns. It will become essential when we have 1000 respondents as the rows of our data matrix.
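Reading the heatmap amounts to finding the largest weight in each row of the basis matrix. As a small sketch, the code below types in four rows of the basis matrix reported above and picks the dominant latent feature for each car; with the full analysis one would apply the same lines to the wp matrix instead of retyping numbers.

```r
features <- c("Green", "Family", "Luxury", "Popular", "City", "Sporty")

# Four rows copied from the basis matrix above (rows sum to about 1)
wp_subset <- rbind(
  "Toyota Prius"     = c(0.82, 0.08, 0.08, 0.00, 0.01, 0.02),
  "Renault Espace"   = c(0.09, 0.71, 0.02, 0.00, 0.19, 0.00),
  "Mercedes C-class" = c(0.04, 0.01, 0.69, 0.00, 0.11, 0.16),
  "Fiat 500"         = c(0.33, 0.00, 0.00, 0.18, 0.49, 0.00)
)
colnames(wp_subset) <- features

# Dominant latent feature and its weight for each car
dominant <- features[apply(wp_subset, 1, which.max)]
names(dominant) <- rownames(wp_subset)
data.frame(feature = dominant, weight = apply(wp_subset, 1, max))
```

The Fiat 500 shows why the full row still matters: its largest weight is on City, but a third of its profile sits on Green, so it departs from simple structure more than the Prius does.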

For completeness, I will add the coefficient heatmap displaying the coefficient matrix before it was transposed. Again, we are looking for simple structure with observed features associated with only one latent feature. We have some degree of success, but you can still see overlap between family (latent feature #2) and luxury (latent feature #3) and between popular (#4) and city (#5).


We observed the same pattern defined by the same six latent features as that reported by Meulders using a probabilistic latent feature model. That is, one can simply compare the estimated object and attribute parameters from the JSS article (p. 12) and the two matrices above to confirm the correspondence with correlations over 0.90 for all six latent variables. However, we have reached the same conclusions via very different statistical models. The plfm is a process model specifying a cognitive model of how object-attribute associations are formed. NMF is a matrix factorization algorithm from linear algebra.

The success of NMF has puzzled researchers for some time. We like to say that the nonnegative constraints direct us toward separating the whole into its component parts (Lee and Seung). Although I cannot tell you why NMF seems to succeed in general, I can say something about why it works with consumer data. Products do well when they deliver communicable benefits that differentiate them from their competitors. Everyone knows the reasons for buying a BMW even if they have no interest in owning or driving the vehicle. Products do not survive in a competitive market unless their perceptions are clear and distinct, nor will the market support many brands occupying the same positioning. Early entries create barriers so that additional "me-too" brands cannot enter. Such is the nature of competitive advantage. As a result, consumer perceptions can be decomposed into their separable brand components with their associated attributes.

Discrete or Continuous Latent Structure?

Of course, my answer has already been given in a prior spoiler alert. We do both using dimensions for the big picture and features for more detailed comparisons. The market is separable into brands offering differentiated benefits. However, this categorization has a dissimilarity structure. The categories are contrastive, which is what creates the dimensions. For example, the luxury-economy dimension from the CA is not a quantity like length or weight or volume in which more is more of the same thing. Two liters of water is just the concatenation of two one-liter volumes of water. Yet, no number of economy cars makes a luxury automobile. These axes are not quantities but dimensions that impose a global ordering on the vehicle types while retaining a local structure defined by the features.

Hopefully, one last example will clarify this notion of dimension-as-ordering-of-distinct-types. Odors clearly fall along an approach-avoidance continuum. Lemons attract and sewers repel. Nevertheless, odors are discrete categories even when they are equally appealing or repulsive. A published NMF analysis of the human odor descriptor space used the term "categorical dimensions" because the "odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner." Brands are also discrete categories that can be ordered along a continuum anchored by the most extreme features at each end. Moreover, these features that we associate with various brands differ in kind and not just intensity. Both the brands and the features can be arrayed along the same dimensions; however, those dimensions contain discontinuities or gaps where there are no intermediate brands or features.

Applying the concept of categorical dimensions to our perceptual data suggests that we may wish to combine the correspondence map and the NMF using a neighborhood interpretation of the map with the neighborhoods determined by the latent features of the NMF. Such a diagram is not uncommon in multidimensional scaling (MDS) where circles are drawn around the points falling into the same hierarchical clusters. Kruskal and Wish give us an example in Figure 14 (page 44). In 1978, when their book was published, hierarchical cluster analysis was the most likely technique for clustering a distance matrix. MDS and hierarchical clustering use the same data matrix, but make different assumptions concerning the distance metric. Yet, as with CA and NMF, when the structure is well-formed, the two methods yield comparable results.
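The MDS-plus-clusters idea is easy to try in base R. The sketch below uses R's built-in eurodist road distances rather than the car data, and the choice of average linkage and four clusters is an arbitrary assumption made only for illustration.

```r
# Classical MDS map and hierarchical clusters from the same distance matrix
coords   <- cmdscale(eurodist, k = 2)
clusters <- cutree(hclust(eurodist, "average"), k = 4)

# One row per city: map position plus cluster membership
map <- data.frame(dim1 = coords[, 1],
                  dim2 = coords[, 2],
                  cluster = clusters)
head(map[order(map$cluster), ])
```

Plotting dim1 against dim2 and coloring or circling the points by cluster gives the neighborhood reading of the map. Replacing the cluster memberships with each car's dominant NMF latent feature would give the analogous display for a correspondence map.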

In the end, we are not forced to decide between categories or dimensions. Both CA and NMF scale rows and columns simultaneously. The dimensions of CA order those rows and columns along a continuum with gaps and clusters. This is the nature of ordinal scales that depend not on intensity or quantity but on the stuff that is being scaled. In a similar manner, the latent features or categories of NMF have a similarity structure and can be ordered. The term "categorical dimensions" captures this hybrid scaling that is not exclusively continuous or categorical.

Thursday, August 21, 2014

Extracting Latent Variables from Rating Scales: Factor Analysis vs. Nonnegative Matrix Factorization

For many of us, factor analysis provides a gateway to learning how to run and interpret nonnegative matrix factorization (NMF). This post will analyze a set of ratings on a 218-item adjective checklist using both principal axis factor analysis and NMF. The entire analysis will be performed in R using fewer than two dozen lines of code located at the end of this post.

The R package mokken contains a dataset called acl with 433 students who looked at 218 adjectives and told us whether the adjectives described themselves using a 5-point rating scale from 0=completely disagree to 4=completely agree. I borrowed this dataset because it has a sizable number of items so that it might pose a challenge for traditional factor analysis. There will be no discussion of the nonparametric item response modeling performed by the mokken package.

Factor Analysis of the Adjective Checklist Ratings

Although R offers many alternatives for running a factor analysis, you might wish to become familiar with the psych package including its extensive documentation and broad coverage of psychometrics. We will start the analysis by attempting to determine the number of factors to extract. The scree plot is presented below.

There appears to be a steep drop for the first six components and then a leveling off. I tried 10 factors, but found that 9 factors yielded an interpretable factor structure. Technically, one might argue that we are not performing a factor analysis since we are not replacing the main diagonal of the correlation matrix with communality estimates. We call this a principal axis or just principal factor analysis for it consists of extracting and rotating principal components.

I have shown only a portion of the factor loadings for the 218 items below. You can reproduce the analysis yourself by installing the mokken package and running the R code at the end of the post. The nine factors were named based on the factor loadings indicating the correlations between the observed adjective ratings and the latent variables. The names have been kept short, and you are free to change them as you deem appropriate. Naming is an art, yet be careful not to add surplus meaning by being overly creative. Several of the factor loadings are negative; for example, an open person is not self-centered or egotistical. The star next to the name indicates that the scaling has been reversed so that unfriendly* actually means friendly and hostile* is not hostile. This is how the dataset handles negatively worded items.

Factors (column order): Open, Calm, Creative, Orderly, Outgoing, Friendly, Smart, Headstrong, Pretty
Each row lists the adjective's loadings from left to right, skipping blank cells.

egotistical: -0.63, 0.22
illiberal*: 0.62, 0.13, 0.20, 0.10, 0.12
hostile*: 0.61, 0.31, 0.24, -0.11
self-centered: -0.60, 0.14, 0.15
unfriendly*: 0.60, 0.29, 0.13
tense*: 0.11, 0.74, 0.14
relaxed: 0.68, 0.17, 0.16
nervous: -0.12, -0.68, -0.15, 0.10
leisurely: 0.62, 0.21, 0.22, 0.11
stable: 0.61, 0.10, 0.22, -0.10, 0.19, 0.29
anxious: -0.19, -0.61, -0.18, 0.11
resourceful: 0.68, 0.16, 0.14
inventive: 0.64, 0.11, 0.15
enterprising: 0.15, 0.64, 0.25, 0.23, 0.18
initiative: 0.17, 0.60, 0.19, 0.23, 0.21, 0.13
versatile: 0.60, 0.12, 0.11, 0.25, 0.14
orderly: 0.77
organized: 0.76
planful: 0.16, 0.71, -0.12, 0.12
slipshod*: 0.70, -0.11, -0.11
disorderly*: 0.13, 0.67
withdrawn*: 0.24, 0.67, 0.10
silent*: 0.18, 0.66, 0.18, 0.11
shy*: 0.28, 0.13, 0.62
inhibited*: 0.11, 0.27, 0.18, 0.62
timid*: 0.17, 0.16, 0.61, 0.17
friendly: 0.31, 0.11, -0.16, 0.58, 0.16, 0.15
sociable: 0.29, 0.18, 0.25, 0.57
appreciative: 0.31, 0.20, 0.10, 0.53, 0.13
cheerful: 0.38, 0.32, 0.11, 0.30, 0.51, -0.11, 0.10
jolly: 0.14, 0.27, 0.27, 0.49, -0.11, -0.14
intelligent: 0.23, 0.12, 0.57, 0.18
clever: 0.29, 0.57, 0.27
rational: -0.11, 0.11, 0.24, 0.54
clear-thinking: 0.16, 0.14, 0.22, 0.17, 0.48, 0.15
realistic: 0.25, 0.28, 0.46
stubborn*: 0.28, -0.14, -0.60
persistent*: 0.23, -0.23, -0.14, -0.59
headstrong: -0.31, -0.12, -0.16, 0.53
opinionated: -0.24, 0.23, -0.13, 0.12, 0.14, 0.50
handsome: -0.14, 0.12, 0.12, 0.15, 0.72
attractive: -0.11, 0.13, 0.15, 0.70
good-looking: -0.14, 0.19, 0.15, 0.17, 0.68
sexy: -0.18, 0.20, 0.15, 0.21, 0.19, 0.59
charming: 0.27, 0.31, 0.17, 0.48

Although the factor loadings are not particularly large, the factor structure is clear. The blank spaces indicate factor loadings with absolute values less than 0.10. I am presenting only the largest loadings in order to avoid 218 rows of decimals. Again, the R code is so easy to copy and paste into RStudio that you ought to replicate these findings. In addition, you might wish to examine the 10-factor solution. The authors of the adjective checklist believed that there were 22 subscales. I could not find them with a factor analysis.

Nonnegative Matrix Factorization of the Adjective Checklist Ratings

Nonnegative matrix factorization (NMF) can be interpreted as if it were a factor analysis. Since our factor analysis is a varimax-rotated principal component analysis, I will use the terms principal component analysis (PCA) and factor analysis interchangeably. Both PCA and NMF are matrix factorizations. For PCA, a singular value decomposition (SVD) factors the correlation matrix (R) into the product of factor loadings (F): R = FF'. NMF, on the other hand, decomposes the data matrix (V) into the product of W and H. The term "nonnegative" indicates that all the cells in all three matrices must be zero or greater. We will not see any negative factor loadings.
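The identity R = FF' can be checked numerically. The sketch below assumes simulated data (200 made-up respondents by 5 items) and builds the unrotated loadings from the eigendecomposition of the correlation matrix, which for a symmetric matrix plays the role of the SVD; Fmat is just an illustrative name.

```r
set.seed(42)
X <- matrix(rnorm(200 * 5), 200, 5)   # simulated ratings, not the checklist data
X[, 2] <- X[, 1] + 0.5 * X[, 2]       # make two items correlate
R <- cor(X)

e    <- eigen(R)
Fmat <- e$vectors %*% diag(sqrt(e$values))   # loadings on all 5 components

all.equal(Fmat %*% t(Fmat), R)   # TRUE: the full set of loadings reproduces R
any(Fmat < 0)                    # TRUE: orthogonal components force mixed signs

# Keeping only the first two components gives an approximation to R
F2 <- Fmat[, 1:2]
round(R - F2 %*% t(F2), 2)
```

NMF replaces this exact, sign-free factorization of R with an approximate factorization of the data matrix itself, V ≈ WH, in which no entry of W or H is allowed to fall below zero.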

PCA has its own set of rules. The SVD creates a series of linear combinations of the variables that extract at each step the maximum variation possible with the restriction that each successive principal component is orthogonal to all previous linear combinations. Some of those coefficients will need to be negative in order for SVD to fulfill its mission. The varimax rotation seeks a simpler structure in which each row of factor loadings will contain as many small values as possible. However, as we saw in the above example, those loadings can be negative.

In general, NMF does not demand orthogonal rows of H. Instead, it keeps W and H nonnegative. Consequently, NMF will not extract bipolar factors such as the first factor Open from our factor analysis. That is, the Open factor is actually Open vs. Self-Centered so that one is more open if they reject egotistical as a descriptor. One scores higher on Openness by both embracing the openness adjectives and dismissing the self-centered items. But this is not the case with NMF. As we shall see below, we will need two latent features, one for self-centered and the other for open.

[Note: Do not be concerned if NMF takes a minute or two to find a solution when n or p or the rank are large.]

For our adjective checklist data, V is the n x p data matrix with n=433 students and p=218 items. Do we need all 218 adjectives to describe each student? For example, do we need to know your rating for handsome once we know that you consider yourself attractive? The above factor loadings suggest that both items load on the same last latent variable, which I named "pretty" to keep the label short.

NMF uses the matrix H to store something like the factor loadings. However, we must remember that NMF is reproducing the data matrix and not the correlation matrix. As a result, the scaling depends on the units in the data matrix. Unlike a PCA of a correlation matrix, which returns factor loadings that are correlations, NMF generates an H matrix that needs to be rescaled for easier interpretation. There are many alternatives, but since zero is the smallest value, it makes sense to rescale H so that the highest value is one (by dividing every row of H by the maximum value in that row). It is all in the R code below, and I encourage you to copy and paste it into RStudio.
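The row rescaling is a one-liner, but it relies on R's recycling rule, so a tiny worked example may help. The 2 x 4 matrix below is hypothetical; sweep() is shown as a more explicit way of writing the same division.

```r
# Hypothetical coefficient matrix: 2 latent features (rows) by 4 items (columns)
h <- rbind(c(4, 2, 0, 1),
           c(0, 3, 6, 3))

max_h    <- apply(h, 1, max)            # largest weight in each row
h_scaled <- h / max_h                   # recycling divides row i by max_h[i]
h_sweep  <- sweep(h, 1, max_h, "/")     # the same operation, stated explicitly

h_scaled
all.equal(h_scaled, h_sweep)
```

The shortcut h / max_h works only because max_h has one entry per row and R fills matrices column by column. To rescale by column maxima instead, sweep() with margin 2 is the safer choice.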

Only because the checklist was designed to measure 22 subscale scores, I set the rank to 22. The R code also includes a 9-latent-variable solution in order for you to make direct comparisons with the factor analysis. However, the 22 latent feature solution illustrates why one might want to replace PCA with NMF.

When printing, H is best transposed so that there are more rows than columns. Instead of the transposed H matrix with 218 rows and 22 columns, I have simply grouped the adjectives with the highest loading on each of the 22 latent features. Of course, you should use the R code listed below and print out the entire H matrix using the fa.sort() function. You are looking for simple structure, that is, a sparse matrix of variable weights. H has been transposed to make it easier to read, and the columns have been rescaled to vary between zero and one. Each column of the transposed H matrix represents a latent feature. If we are successful, those columns will contain only a few adjectives with values near one and most of the remaining values at or near zero.

Adjectives describing happy and reasonable fall into the two groupings in the first column. We find our self-centered terms in the first grouping of the second column, but unfriendly* will not appear until the second grouping in the last column. I make no claim that 22 latent features are needed for this data matrix. It was only a test of the 22 subscales that I could not recover in the factor analysis. It seems that NMF performs better, although we are not close to identifying the hypothesized subscale pattern. In particular, instead of subscales of equal size, we find some large clusters and several two-item pairings (e.g., unambitious* and ambitious).

Still, does not the NMF seem to capture our personality folklore? We have a lot of adjectives available to describe ourselves and others. At times, we search for just the right one. Egotistical captures the high opinion of oneself that self-centered misses. However, the two terms appear to be synonyms in both the PCA and the NMF of this data matrix.

good-natured
self-centered
sexy
distrustful*
dominant
suspicious*
healthy
individualistic
flirtatious
defensive*
shy*
interests narrow*
stubborn*
egotistical
good-looking
fickle*
silent*
intolerant*
sarcastic*
persistent*
charming
deceitful*
bossy
forgiving
contented
indifferent
handsome
calm*
outgoing
poised
appreciative
inventive
attractive
quiet*
outspoken
tolerant
wise
ingenious
practical
worrying*
talkative
steady
spineless
efficient
timid*
modest*
opinionated
snobbish*
hurried*
resentful*
interests wide
unrealistic*
creative
peculiar*
zany*
prejudiced*
realistic
irritable
original
apathetic*
rattlebrained*
self-seeking*
excitable
patient*
fault-finding*
emotional*
rational
impatient
imaginative
moderate*
hostile*
tense*
honest
argumentative
jolly
dull*
selfish*
stable
reliable
quarrelsome
dissatisfied*
shiftless*
unintelligent*
relaxed
gloomy
sociable
unfriendly*
leisurely
fair-minded
self-punishing
spontaneous
forceful
independent*
irresponsible*
anxious
initiative
arrogant
affectionate
orderly
reasonable
self-pitying
resourceful
vindictive
illiberal*
slipshod*
unscrupulous*
confident*
enthusiastic
strong
obnoxious*
precise
sincere
self-denying
moody*
tactless
kind
disorderly*
reflective
nervous
courageous
aggressive
friendly
painstaking
logical
pessimistic
bitter*
tactful*
foolish*
organized
cynical
weak
cheerful
persevering
thankless*
conscientious
clear-thinking
fearful
versatile
quitting*
cold*
planful
hasty*
self-confident*
pleasure-seeking
unkind
pleasant
methodical
intelligent
complaining
energetic
sympathetic
dependable
despondent
optimistic
reserved*
touchy*
curious
active
inhibited*
impulsive
helpful
submissive
spunky
withdrawn*
reckless
mature
cowardly
considerate
aloof*
demanding
self-controlled
high-strung*
adventurous
foresighted
absent-minded*
unambitious*
restless*
enterprising
affected*
confused*
ambitious
prudish*
serious
forgetful*
noisy
cautious*
independent
preoccupied*
cruel*
loud
rebellious
deliberate
dreamy*
whiny
daring
responsible
distractible*
lazy*
infantile
temperamental
understanding
alert
industrious
dependent
fussy*
insightful
witty
nagging*
clever
humorous
mischievous
conceited
determined
headstrong
thorough
sharp-witted
uninhibited
boastful

Does this not look like topic modeling from Figure 4 of Lee and Seung? As you might recall, topic modeling gathers word counts from documents and not respondents. The goal is to learn what topics are covered in the documents, but all we have are words. Students are the documents, and adjectives are the words. We do not have counts of the number of times each adjective was used by each student. Instead, we have a set of ratings of the intensity with which each adjective describes the respondent. One could have conducted a similar analysis using open-ended text and counting words. Self-description can take many forms. As long as the measures are nonnegative, NMF will work, especially when the data matrix is sparse.

Try It!

This is my last appeal. R makes it so easy to fit many models to the same data. Better yet, why not try NMF with your own data? Every time you perform a factor analysis, copy the half dozen lines of R code and see what an NMF yields.

PCA and factor analysis seek global dimensions explaining as much variation as possible with each successive latent variable. Subsequent rotations to simple structure help achieve more interpretable factors. Yet, as we have observed in this example, we still uncover bipolar dimensions that simultaneously push and pull by assigning positive and negative weights to many different variables at the same time. Sometimes, we want bipolar dimensions that separate the sweet from the sour but not when the objects of our study can be both sweet and sour at the same time.

NMF, on the other hand, forces us to keep the variable weighting nonnegative. The result is a type of simple structure without rotation. Consequently, local structures residing within only a small subset of the variables can be revealed. We have separated the sweet from the sour, and our individual respondents can be both friendly and self-centered. PCA and NMF impose different constraints, and those restrictions deliver different representational systems. One might say that PCA returns latent dimensions and NMF generates latent features or categories. I will have more to say about latent dimensions and categories in later posts. Specifically, I will be reviewing the theoretical basis underlying the R package plfm and arguing that NMF produces results that look very much like probabilistic latent features.

Warning: NMF May Transpose the Data Matrix, Reversing the Roles of W and H

While all my data matrices are structured with individuals as rows and variables as columns, this is not the way it is done in topic modeling or with gene expression data. Do not be fooled. Ask yourself, "Where are the latent variables?" When the data matrix V is individuals-by-variables, the latent variables are the rows of H defined by their relationship with the observed variables in the columns of H. When the data matrix is words-by-documents, the latent variables are the columns of W defined by the words in the rows of W. This is why the NMF package refers to W as the basis and H as the mixture coefficient matrix.
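The role reversal is nothing more than the transpose rule for matrix products, which a few lines of base R can confirm. The matrices below are random and purely illustrative.

```r
set.seed(7)
W <- matrix(runif(5 * 2), 5, 2)   # 5 individuals by 2 latent variables
H <- matrix(runif(2 * 4), 2, 4)   # 2 latent variables by 4 observed variables
V <- W %*% H                      # individuals-by-variables data matrix

# Factoring the transposed matrix swaps the two factors
all.equal(t(V), t(H) %*% t(W))

dim(t(H))   # variables-by-latent variables: the left-hand factor for t(V)
dim(t(W))   # latent variables-by-individuals: the right-hand factor for t(V)
```

So a variables-by-individuals layout puts the variable weights in the left-hand factor, while an individuals-by-variables layout puts them in the right-hand factor.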

We can represent our data matrix in a geometric space with the respondents as points and the variables as axes or dimensions (biplots). Since p is often smaller than n (fewer variables than respondents), this space is p-dimensional and the p vectors form the basis for this variable space. When the variables are features, we speak of the p-dimensional feature space. Now, NMF seeks a lower dimensional representation of the data, that is, lower than p (218 items in our example). The 22 groups of items presented above is such a lower dimensional representation with a basis of rank equal to 22. We interpret this new 22-dimensional basis using H because H tells us the contribution of each adjective. The adjective groupings bring together the items with the largest weights on each dimension of the new 22-dimensional basis. Thus, H is the basis matrix for our data matrix.

Do not be confused by the names used in the NMF package. With our data matrices (individuals-by-variables), the basis or "factor loadings" can be found in what NMF calls the mixture coefficient matrix H. Thus, use the coef() and coefmap() functions to learn which variables "load" on which latent variables. For greater detail or to learn more about the other matrix W, you can read my earlier posts on NMF.

How Much Can We Learn from Top Rankings Using NMF?
Uncovering the Preferences Shaping Consumer Data
Taking Inventory
Customer Segmentation Using Purchase History
Exploiting Heterogeneity to Reveal Consumer Preference


R code to run all the analyses in this post:

# Attach acl data file after
# mokken package installed
library(mokken)
data(acl)

# run scree plot
# print eigenvalues
# principal-axis factor analysis
# may need to install GPArotation
library(psych)
eigen<-scree(acl, factors=FALSE)
eigen$pcv
pca<-principal(acl, nfactors=9)
fa.sort(pca$loadings)

# NMF with 9 latent features for comparison with the 9 factors
library(NMF)
fit<-nmf(acl, 9, "lee", nrun=20)
# rescale each row of H so that its largest weight equals 1
h<-coef(fit)
max_h<-apply(h,1,function(x) max(x))
h_scaled<-h/max_h
fa.sort(t(round(h_scaled,3)))

# NMF with 22 latent features, one per hypothesized subscale
fit22<-nmf(acl, 22, "lee", nrun=20)
h<-coef(fit22)
max_h<-apply(h,1,function(x) max(x))
h_scaled<-h/max_h
fa.sort(t(round(h_scaled,3)))
