ALY 6015 Northeastern University Analysis Report Paper

Description

Overview and Rationale

To consolidate your theoretical knowledge into techniques and skills with practical, applicable value, you are asked to analyze a dataset using methods presented in this course.

Report

The report should include the following elements:

Introduction

Introduce the goals of the project, the question that needs to be answered, and the methods used in your analysis. Be sure that the goals, questions, and methods fit together: the questions asked should reflect the goals, and the methods used should answer the questions.

Methods

You must use at least 2 methods presented in this course. Explain why your chosen methods were appropriate and describe the data involved in your analysis. Be as specific as possible.


Format

Your assignment/project should follow APA format, contain at least 750 words (excluding the title page and references page), and include a references page.

Unformatted Attachment Preview

ALY6015 Final Project: Initial Analysis Report

I. Introduction

PersonaPanels combines machine learning with traditional market research to explore a new frontier in virtual market intelligence. Through the creation of ten animated personas, PersonaPanels can offer clients insight into current consumer opinions for a range of key customer segments. These personas interact with the internet to identify material that interests them based on their learned characteristics. Key words are then highlighted in a word cloud to reflect high-level interests.

The dataset provided includes 88,450 observations of 4 variables: persona classification, date of the observation, the key text, and the response (interest score) to that key word. Six of the ten personas are represented in the dataset.

Factor   Persona Classification            # of observations
DCMAM    Don’t Call Me a Millennial        14,700
DTRTM    Do the Right Thing Millennials    14,800
EM       Environmental Millennials         14,750
MINO     Millennials in Name Only          14,750
MM       Millennial Moms                   14,850
TGM      Tech Geek Millennials             14,600

The millennial grouping focuses our analysis on the consumer segment of adults born between 1980 and 1994 (ages 24-39), but how different are the interests of these personas? The persona Millennials in Name Only (MINO) encompasses White, Black, and Hispanic blue-collar workers, both male and female, with a median age of 35. The descriptor indicates that this persona distinguishes itself from the more stereotypical millennial market niche. By examining shared interests among the 6 personas and identifying which two personas share the highest number of key words with MINO, we can begin to test for significant differences or similarities. 145 key text words are identified across the six personas for the period from 9/9/19 to 1/10/20. Are the mean interest scores significantly different for three randomly selected shared key words?

The final analysis consists of three goals:
1. Identifying the number of key words shared between the six millennial personas.
2. Selecting the two personas with the highest number of shared key words with the focus persona, MINO, and drawing a random sample of three shared key words to use in testing.
3. Running ANOVA tests for similarities between the personas based on differences in mean interest scores for the key word sample.

The overarching research question is: Are the mean interest scores for the focus persona, Millennials in Name Only, significantly different from those of similar personas in the millennial grouping?

II. Methods

Each goal requires a series of statistical tools to organize and interpret the data subsets. My first step in organizing the dataset is to check for missing values and outliers. A histogram of the “Response” variable shows a strongly right-skewed distribution, with the majority of responses clustered at or below 0.15. The matching boxplot for the “Response” variable shows that responses greater than 0.15 are potential outliers. However, for the sake of comparing key words with a strong response across different personas, we should not remove these data points. While they do indicate an unusually high response to a key word, they are valuable points of interest for understanding how the different personas react to the same information.

The dataset can then be subset to create separate, smaller datasets for each of the 6 personas. Each persona has 50 top key words listed out of the total 145. The basic table of text words and personas shows which personas record responses to which key texts. From this table we need to generate a matrix showing the number of shared words between each pair of personas.
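The shared-word matrix described in the Methods section can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the report's actual code: the toy data frame and its values are invented, and the column names Persona and Text are taken from the variable names used in Appendix A.

```r
# Toy stand-in for the real data: one row per (persona, key word) observation.
# The values are invented; only the column names follow the appendix code.
toy <- data.frame(
  Persona = c("MINO", "MINO", "MINO", "DCMAM", "DCMAM", "EM", "EM", "EM"),
  Text    = c("music", "jobs", "music", "music", "jobs", "music", "climate", "climate")
)

# Word-by-persona counts, stripped of the "table" class to get a plain matrix.
tab <- unclass(table(toy$Text, toy$Persona))

# Incidence matrix: 1 if the persona has at least one response to the word.
incidence <- (tab > 0) * 1L

# crossprod(X) computes t(X) %*% X, so entry [i, j] counts the words that
# personas i and j both respond to; the diagonal is each persona's word count.
shared <- crossprod(incidence)
print(shared)

# Rank the other personas by their overlap with the focus persona.
focus <- "MINO"
overlap <- sort(shared[focus, colnames(shared) != focus], decreasing = TRUE)
print(overlap)
```

On the real dataset the same steps would presumably be applied to each persona's top 50 key words, so the diagonal of the resulting matrix would be 50.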
         DCMAM  DTRTM   EM  MINO   MM  TGM
DCMAM       50     27   27    37   23   10
DTRTM       27     50   27    29   17   12
EM          27     27   50    31   11   20
MINO        37     29   31    50   18   13
MM          23     17   11    18   50   10
TGM         10     12   20    13   10   50

From this point we can determine which two personas in the matrix share the greatest number of top 50 key words with the focus persona, MINO. The persona with the highest number of shared interests is Don’t Call Me a Millennial (DCMAM), with 37 matching key words. DCMAM is categorized as mainly Hispanic full-time professionals, trending toward males, with a median age of 30. The persona with the second highest number of shared interests is Environmental Millennials (EM), with 31 matching key words. EM is categorized as Black, Hispanic, and Asian males with a median age of 24. The three personas MINO, DCMAM, and EM are described as having differing median incomes, levels of debt, levels of employment, political affiliations, and social patterns.

With the two pairs selected, we pull a random sample of three phrases from the group of matching phrases for each pair. An ANOVA test can be used to determine whether there is a difference in mean response to the key text between personas for each of the three phrases. The one-way ANOVA test is an appropriate tool for this analysis, as it compares two means (the mean response score per word) from two independent groups (the two personas) using the F-distribution.

III. Analysis

The statistical analysis is broken down as follows. In the hypotheses, the subscript letter identifies the persona (M = MINO, D = DCMAM, E = EM) and the number identifies the sampled key word for that comparison.

MINO & DCMAM Comparison
ANOVA Test 1.1   H0: µM.1 = µD.1   Ha: µM.1 ≠ µD.1
ANOVA Test 1.2   H0: µM.2 = µD.2   Ha: µM.2 ≠ µD.2
ANOVA Test 1.3   H0: µM.3 = µD.3   Ha: µM.3 ≠ µD.3

MINO & EM Comparison
ANOVA Test 2.1   H0: µM.1 = µE.1   Ha: µM.1 ≠ µE.1
ANOVA Test 2.2   H0: µM.2 = µE.2   Ha: µM.2 ≠ µE.2
ANOVA Test 2.3   H0: µM.3 = µE.3   Ha: µM.3 ≠ µE.3

In total, six separate ANOVA tests are conducted to determine whether the MINO persona has significantly different levels of interest in common key words when compared to two other unique personas.
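A single one of these tests can be sketched in base R as shown here. This is a hedged illustration rather than the report's analysis: the responses are simulated, the significance level of 0.05 is an assumption (the report does not state one), and majority_rejects is a hypothetical helper that encodes the two-out-of-three rule from the Interpretation section.

```r
set.seed(6015)  # arbitrary seed so the simulated example is reproducible

# Simulated responses for ONE shared key word in two personas.
# These values are invented and are not the PersonaPanels data.
one_word <- data.frame(
  Persona  = factor(rep(c("MINO", "DCMAM"), each = 30)),
  Response = c(rnorm(30, mean = 0.05, sd = 0.02),
               rnorm(30, mean = 0.09, sd = 0.02))
)

# One-way ANOVA: H0 states that the two personas have equal mean response.
fit <- aov(Response ~ Persona, data = one_word)
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

# With only two groups, the ANOVA F test is equivalent to the
# pooled-variance two-sample t test (F = t^2), so the p-values agree.
p_ttest <- t.test(Response ~ Persona, data = one_word, var.equal = TRUE)$p.value

# Hypothetical helper: TRUE when at least two of the three tests in a
# comparison reject H0 at the assumed significance level alpha.
majority_rejects <- function(p_values, alpha = 0.05) {
  sum(p_values < alpha) >= 2
}

print(c(anova = p_anova, t_test = p_ttest))
```

Because each comparison involves only two personas, the F test and the pooled-variance t test give the same p-value; ANOVA becomes the more general choice only when three or more personas are compared at once.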
The key words act as indicators of interest areas based on the internet material browsed by the learning persona. The six ANOVA test results feed into the research question: Are the mean interest scores for the focus persona, Millennials in Name Only, significantly different from those of similar personas in the millennial grouping? The results of each series of ANOVA tests will determine whether we reject or fail to reject the null hypothesis. In each test, the null hypothesis states that the mean response score for MINO equals the mean response score for the comparison persona (DCMAM or EM) for the chosen shared text. The alternative hypothesis states that the mean response score for MINO does not equal the mean response score for the comparison persona for the shared text.

IV. Interpretation

The six separate ANOVA test results are grouped in sets of 3 per persona comparison. If at least two out of three tests fail to reject the null hypothesis, we conclude that there is significant similarity between the interest scores of MINO and the comparison persona. If at least two out of three tests find evidence to reject the null hypothesis, we conclude that there is a significant difference between the interest scores of MINO and the comparison persona. The two sets of comparisons, (MINO & DCMAM) and (MINO & EM), may reach the same or different conclusions. If both comparisons find evidence of a significant difference between the mean interest scores, this would be a strong indication that the focus persona, Millennials in Name Only, offers unique insights into a distinct consumer niche.

V. References

Kabacoff, R. I. (2015). R in action: Data analysis and graphics with R (2nd ed.). Manning Publications Co.

VI. Appendix A: R Code

> View(personas_long_1_)
> dim(personas_long_1_)
> summary(personas_long_1_)
> colSums(is.na(personas_long_1_))   # missing values per column
> # The right-hand sides of the assignments below were lost in extraction;
> # the forms shown are assumptions based on the Methods section.
> Persona <- factor(personas_long_1_$Persona)
> summary(Persona)
> Text <- factor(personas_long_1_$Text)
> summary(Text)
> levels(Text)
> Response <- personas_long_1_$Response
> hist(Response)
> boxplot(Response)
> DTRTM <- subset(personas_long_1_, Persona == "DTRTM")
> DCMAM <- subset(personas_long_1_, Persona == "DCMAM")
> EM <- subset(personas_long_1_, Persona == "EM")
> MINO <- subset(personas_long_1_, Persona == "MINO")
> MM <- subset(personas_long_1_, Persona == "MM")
> TGM <- subset(personas_long_1_, Persona == "TGM")
> table(Text, Persona)

Final Project: Proposal / Dataset Selection Update

The final project for ALY6015: Intermediate Analytics combines theoretical knowledge of statistical concepts with the advanced tools of data science and analysis. For this project I am assigned to Group 3, which consists of myself and fellow student Xuanwen Wang. Xuanwen and I have started a conversation about the first steps of the project but have not reached a consensus on which project tasks best fit our combined skill level and interests.

The corporate sponsor partnering on this assignment is PersonaPanels, a consumer research company offering a truly innovative approach to market learning. The company has combined machine learning with traditional market research to create a variety of animated personas from different market niches to act as virtual consumer reviewers for any given product or message. The animated personas evolve and learn by accessing internet articles that “interest” them based on their original programmed personality traits.

Unfortunately, my background is not in data or computer science, so I feel that I would not be able to contribute successfully to expanding the article database. After briefly reviewing the previous project work, I think my interest lies in extending the analysis done for the top 25 words shared across personas to the top 100. This would be a combination of tasks 2 and 3. I am extremely impressed with the level of analysis and the data visualization tools developed in previous project work and hope in the next 4 weeks to be able to build on the dashboard platforms already proposed.
For example, XNProject_Team1 created distinct functions and dashboards to compare the top 10 and bottom 10 keywords across the personas. This work could be expanded to include the top 25 words with a correlation matrix and regression models. It could also build up to a table of the 100 most common words. I would like further explanation of what the dataset represents, how crucial the time stamps are to the entries, and how the data is processed. As a data novice I may have to scale back my goals to deliver accurate and insightful information within the time constraints of the course.

Explanation & Answer

Please see the answer attached below, in which I expanded the introduction, methodology, and analysis sections. Let me know if you have any questions. Thank you.

Running head: FINAL PROJECT

ALY6015 Final Project:
Name:
Institution affiliation:
Date:

I. Introduction

Currently, every industry includes highly competitive companies. Therefore,
companies need strategies that will give them a competitive advantage in the market.
These strategies are usually geared towards enabling companies to attract and retain
their target customers as they pursue their objectives and long-term goals. Specifically,
companies seek to grow their market share in order to increase revenue and net earnings.
Strategies that companies use to gain a competitive advantage in the market include
cost-effective or differentiation strategies (Mikhailov, 2019). However, before deciding on
the best strategy to adopt as it seeks to thrive in the market, it is essential for a
company to understand its target customers' needs. The company is then in a position
to address those needs and to attract and retain
customers. Customer needs are usually communicated to companies via customer feedback.
Analyzing customer feedback to ascertain needs and wants is usually a tedious and
costly process for most companies (Timoshenko & Hauser, 2017). It is for this reason that
companies need virtual market intelligence that enables them to easily
determine customers' needs and wants. In this project, PersonaPanels, a consumer research
company offering an innovative approach to market learning, was analyzed.
PersonaPanels combines machine learning with traditional market research to explore
a new frontier in virtual market intelligence. Through the cre...
