
TEXT MINING - ARM
AI ART - ART
MOD 2 - ASSOCIATION RULE MINING
This is the second section of the Mod 2 assignment for analyzing the text data. Here I explore association rule mining (ARM) and the resulting lift, support, and confidence measures. ARM will show me which words tend to appear together in documents.
There are a few interesting rules that pop out in this top 15 sorted by confidence. The one that catches my eye is {know, people, work} ⇒ {make}, which has a confidence of 87%. To me, this suggests that discussions about artistic or AI processes often emphasize human knowledge, creativity, and productivity. Rules like this can hint at larger themes in the conversations happening across the posts and article text I collected.
DATA FORMAT
Before I reached this point in the project, I had already done a lot of work cleaning up the data. In the previous step, for clustering, I had to use a document term matrix (DTM). Similarly, for ARM to work, the data needs to be formatted correctly. I took the data from its original format, where each row holds a document, its label, and its content. Then, as seen to the left, I transformed that into transaction data, where each doc’s ‘content’ turns into a basket of items, and each item is a word from the doc! You can see that the baskets are just collections of those same words, separated by commas. During this process, I ran the text through some extra cleaning, just to be sure I caught any possible stragglers. In ARM, it is important to get rid of useless words so that the rules show us something meaningful. We don’t care if people use the word “I” in conjunction with “AM”!
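To make the basket idea concrete, here is a minimal sketch in Python of turning document rows into transaction data. It is not the code used for this project: the stopword list, the two sample documents, and the helper name to_basket are all made up for illustration.

```python
import re

# Hypothetical stopword list, for illustration only (the project's real list is not shown).
STOPWORDS = {"i", "am", "an", "the", "a", "is", "to", "and", "of"}

def to_basket(text, stopwords=STOPWORDS):
    """Turn one document's content into a sorted basket of unique words."""
    words = re.findall(r"[a-z]+", text.lower())   # keep alphabetic tokens only
    return sorted({w for w in words if w not in stopwords})

# Made-up rows in the original format: one document per row, with label and content.
docs = [
    {"label": "art", "content": "I am an artist and people know the work I make"},
    {"label": "ai", "content": "People make art; AI is a tool to make work"},
]

# One comma-separated line per document is the transaction ("basket") format.
for doc in docs:
    print(",".join(to_basket(doc["content"])))
```

Each basket keeps a word only once, since ARM cares about whether a word is present in a document, not how often it appears.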
Two versions of the network chart, one with some new stopwords removed. The left one uses the most current data.
ARM rules are basically a bunch of if-then statements: when the items on the left-hand side are present, what tends to appear on the right-hand side? I managed to scale down to the 48 most influential rules by setting my thresholds high. The rules sorted by support, shown to the left, show that the words {make, people, think} are strongly associated with {work}, suggesting that discussions in the texts often revolve around the concept of work. I know from my exploration of the articles that the concern many artists have on either side of the AI debate often centers on job security and whether AI art affects their ability to get work. There is also likely discussion about how much “work” goes into a piece.
ARM revealed that the associations between key terms are not random but instead reflect central themes in the corpus. For instance, the strong association between terms like “know,” “people,” “work,” and “make” suggests that the creative process—especially concerns about originality and labor—is a critical topic.
Sorting by lift surfaces rules like number 4, {make, people, work} ⇒ {know}. This rule suggests the term "know" is particularly distinctive and central when discussing creativity ("make"), human aspects ("people"), and productivity ("work"). (A lift above 1 means the words co-occur more often than chance would predict, so a lift over 4 is highly distinctive!) It implies that knowledge, maybe of how a piece is made, or the original artist’s expertise, or understanding as a viewer in general, is a central theme in how creative labor and productivity are discussed. To me, this points to conversations in the data about whether people ‘know’ something was made by a person or not.
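For reference, here is how support, confidence, and lift relate to each other for a single rule, sketched in Python. The eight baskets are invented toy data, not my corpus, so the numbers will not match the rule tables above.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def rule_metrics(lhs, rhs, baskets):
    """Return (support, confidence, lift) for the rule lhs => rhs."""
    s_both = support(set(lhs) | set(rhs), baskets)
    confidence = s_both / support(lhs, baskets)   # assumes lhs occurs at least once
    lift = confidence / support(rhs, baskets)     # assumes rhs occurs at least once
    return s_both, confidence, lift

# Invented toy baskets, for illustration only.
baskets = [
    {"make", "people", "work", "know"},
    {"make", "people", "work", "know"},
    {"make", "people", "work"},
    {"art", "tool"},
    {"art", "make"},
    {"people", "tool"},
    {"art", "work"},
    {"tool", "know"},
]

s, c, l = rule_metrics({"make", "people", "work"}, {"know"}, baskets)
print(f"support={s:.3f} confidence={c:.3f} lift={l:.3f}")
```

Lift divides the rule's confidence by how common the right-hand side is overall, which is why a word that is rare in general but frequent alongside the left-hand side gets a high lift.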
Then, when running the Apriori algorithm, I set it up with support = 0.05 and confidence = 0.5. Support determines how many documents a rule must appear in: I have it set so that any rule must apply to at least 5% of the corpus. Confidence is a similar limiting threshold. It means I only kept rules where the right-hand side shows up in at least 50% of the documents that contain the left-hand side.
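Here is a rough sketch of what those two thresholds do, written as a small level-wise (Apriori-style) search in plain Python. This is a simplified illustration and not the library implementation I ran: the function names and the four sample baskets are assumptions, and it only builds rules with a single word on the right-hand side.

```python
def frequent_itemsets(baskets, min_support):
    """Level-wise search: only itemsets that met min_support get extended."""
    n = len(baskets)

    def sup(items):
        return sum(items <= b for b in baskets) / n

    singles = {frozenset([i]) for b in baskets for i in b}
    level = {c: sup(c) for c in singles if sup(c) >= min_support}
    frequent = {}
    while level:
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = {c: sup(c) for c in candidates if sup(c) >= min_support}
    return frequent

def mine_rules(baskets, min_support=0.05, min_confidence=0.5):
    """Return (lhs, rhs, support, confidence) tuples, highest confidence first."""
    frequent = frequent_itemsets(baskets, min_support)
    rules = []
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            conf = s / frequent[lhs]   # every subset of a frequent itemset is frequent
            if conf >= min_confidence:
                rules.append((sorted(lhs), rhs, s, conf))
    return sorted(rules, key=lambda r: -r[3])

# Invented toy baskets, for illustration only.
sample = [{"know", "make"}, {"know", "make"}, {"make", "art"}, {"art"}]

for lhs, rhs, s, conf in mine_rules(sample):
    print(f"{lhs} => {rhs}  support={s:.2f}  confidence={conf:.2f}")
```

Raising min_support cuts rules that show up in too few documents, while raising min_confidence cuts rules whose right-hand side does not follow the left-hand side often enough.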