
For this project, we’re going to use cluster analysis to “tell a story” about our data. I’m asking you to divide the Oregonians in your sample into groups or clusters based on two quantitative variables. Your “story” will be an explanation of your data that highlights some interesting feature(s) or makes a point about the data.

Please note this project will likely take some trial and error. Please relax into it and have some fun with the process: think of it as an exploration. Trial and error is the spice of life!

You will begin by opening a data set. If you don’t have a data set of your own you’d like to explore, I recommend the OregonPUMS_data. Take a small random subset of it (I recommend n=400, so as not to upset XLSTAT too much: some clustering algorithms grind to a halt with large data sets).

Note that “Weight,” just as in Project 1, is probably not what you think it is! I’m not forbidding “Weight,” but realize you’ll have to do a bunch of research about what it means in order to write about it well and get credit!

Process:

Step 1: Select your sample of n=400. Lucky for us, XLSTAT is quite good at taking a random sample. Check out the “Simple Random Sampling in XLSTAT” tutorial; alternatively, there are several tutorials online for taking a random sample with regular Excel.
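If you are curious what a simple random sample looks like outside of XLSTAT, here is a minimal Python sketch of sampling without replacement. The `population` list and its `id`, `age`, and `income` fields are made up for illustration; they are not the real OregonPUMS column names.

```python
import random


def simple_random_sample(rows, n, seed=None):
    """Draw n rows without replacement; every row is equally likely."""
    if n > len(rows):
        raise ValueError("sample size exceeds number of rows")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(rows, n)


# Hypothetical stand-in for the full data set: 5,000 invented person records.
population = [
    {"id": i, "age": 18 + i % 70, "income": 1000 * (i % 90)}
    for i in range(5000)
]

sample = simple_random_sample(population, 400, seed=2022)
print(len(sample))  # number of sampled rows
```

Fixing the seed is a design choice: it lets you regenerate the same 400 rows later if you need to redo a graph.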

Step 2: Choose two QUANTITATIVE variables that you would like to work with. Copy and paste your two variables and their corresponding sampled data (there should be 400 rows of data, two columns) into a new sheet. I prefer to do this so that I am not overwhelmed by variables that I am not using. Next, remove any rows with missing observations. This will save time later when you go to plot your clusters.
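The clean-up in Step 2 can be sketched in Python as well. This is only an illustration under assumptions: the column names `age` and `income` and the example rows are hypothetical, and "missing" is taken to mean an empty cell, a `None`, a NaN, or a non-numeric entry such as "NA".

```python
import math


def keep_complete_rows(rows, x_col, y_col):
    """Keep only the two chosen columns and drop rows missing either value."""
    cleaned = []
    for row in rows:
        x, y = row.get(x_col), row.get(y_col)
        if x in (None, "") or y in (None, ""):
            continue  # empty or absent cell
        try:
            fx, fy = float(x), float(y)
        except ValueError:
            continue  # non-numeric entry such as "NA"
        if math.isnan(fx) or math.isnan(fy):
            continue  # NaN also counts as missing here
        cleaned.append((fx, fy))
    return cleaned


# Hypothetical rows: only the first and last are complete.
rows = [
    {"age": "34", "income": "52000", "county": "Lane"},
    {"age": "", "income": "100", "county": "Polk"},
    {"age": "51", "income": None, "county": "Linn"},
    {"age": "NA", "income": "5", "county": "Coos"},
    {"age": 20, "income": 0, "county": "Lake"},
]

pairs = keep_complete_rows(rows, "age", "income")
print(pairs)
```

Note that a true zero (for example, an income of 0) is kept; only genuinely missing values are dropped.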

For the following steps, be sure that you have installed the XLSTAT add-in. Click on the XLSTAT tab on the top of your Excel sheet.

Step 3: Use different options in the software to create 5 different “data stories”: if you’re overwhelmed about what to pick, you can use these options:

*Scatterplots will have to be created separately using the Results by Object output. Under the colors tab, use whatever colors you would like, but be sure they are bold and distinct. For example, it would be a bad idea to use white, or to use both red and red-orange.
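To give a feel for what one clustering option does under the hood, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not XLSTAT's implementation, and the six (age, income) points are invented. The sketch standardizes both variables first, on the assumption that you want both to matter: with raw values, the variable with the larger numbers dominates the distance, and the clusters tend to come out as stripes.

```python
import random
from statistics import mean, pstdev


def standardize(points):
    """Rescale each variable to mean 0 and standard deviation 1."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs) or 1.0, pstdev(ys) or 1.0  # guard against zero spread
    return [((x - mx) / sx, (y - my) / sy) for x, y in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = (
                    mean(m[0] for m in members),
                    mean(m[1] for m in members),
                )
    return labels, centers


# Invented (age, income) pairs forming two obvious groups.
points = [(25, 30000), (27, 32000), (26, 31000),
          (60, 90000), (62, 88000), (61, 91000)]

scaled = standardize(points)
labels, centers = kmeans(scaled, k=2, seed=1)
print(labels)  # one cluster number per row, in the same order as points
```

The `labels` list plays the same role as a per-row cluster column: pair it with your two variables and color the scatterplot by cluster number.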

Step 4: Write up your project! Which clustering method out of the five did you prefer? Why?

For your final report, compare and contrast each of the five clustering methods. You may choose to use your XLSTAT output or use Tableau/other software to make a prettier graph. Tell your story using your preferred clustering method, and how the clustering supports that story. Who are these groups? What does this clustering tell us about the people in Oregon? How could a business or entrepreneur use their understanding of this clustering story to further their goals?

Rubric for Project (40 points)

Maximum 5 points total if your variables are not quantitative! You must have two quantitative variables!

15 points: at least 5 different scatter plot graphs, all using the same basic variables (Step 2) but different clustering choices (Step 3). Data process and data product both discussed, particularly for Method 5. Clusters must vary, and at least one method shouldn’t be “just stripes,” i.e. both variables should matter.

10 points: your narration of the progression of your thinking (data process story).

5 points: Instructor’s subjective take on the product story. Was it gripping, interesting, well done?

5 points: graph conventions, labels, etc.

5 points: conventions: correct punctuation, sentences, etc.

Data Note: Be careful about the “Person’s Weight” variable. This does not mean “how much this person weighs”; it means “how much weight to assign this person’s answers.” If you’re curious (not required), you can read about statistical weighting here: http://www.applied-survey-methods.com/weight.html
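A tiny worked example may help show what “how much weight to assign this person’s answers” means in practice. The numbers below are invented, and reading each weight as “the number of people this respondent stands for” is an assumption made for the sketch.

```python
def weighted_mean(values, weights):
    """Average in which each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented data: three respondents and how many people each one represents.
incomes = [20000, 50000, 80000]
weights = [10, 30, 60]

plain = sum(incomes) / len(incomes)         # every respondent counts equally
weighted = weighted_mean(incomes, weights)  # respondents count by weight
print(plain, weighted)  # 50000.0 65000.0
```

The weighted figure is pulled toward the third respondent because that answer stands for the most people, which is why a weight column behaves very differently from a measurement such as height or income.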

https://help.xlstat.com/s/article/agglomerative-hi…

https://help.xlstat.com/s/article/k-means-clustering-in-excel-tutorial?language=en_US

https://help.xlstat.com/s/article/scatter-plot-with-confidence-ellipses-in-excel?language=en_US

Requirements: around 3 pages   |   .doc file
