Tableau Lab

Open the HomePrices.xlsx Excel file in Tableau.

Drag the Home Prices Spreadsheet into the data connection canvas.

Run the Data Interpreter (check Use Data Interpreter).

Review the Excel file that Tableau generates. (Note that this dataset is already formatted correctly.)

Create a bar chart showing the distribution of building types:

  • Click on Sheet 1
  • Double-click on Number of Records (or drag it onto the Rows shelf). Note that the aggregation SUM(Number of Records) is placed on the Rows shelf.
  • Drag Bldg Type to the Columns Shelf
    • Question 1: What have you learned about the distribution of building types?  Which Building Type has the most homes for sale?  The least? 
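For reference, the bar chart above is a count of rows per category. The sketch below shows the same idea outside Tableau. The labels "A", "B", and "C" and the helper name count_records are made up for illustration and are not values or names from HomePrices.xlsx.

```python
from collections import Counter

# Hypothetical category labels (NOT taken from HomePrices.xlsx).
bldg_types = ["A", "B", "A", "C", "A", "B"]


def count_records(values):
    """Return {category: number of rows}, analogous to SUM(Number of Records) per category."""
    return dict(Counter(values))


counts = count_records(bldg_types)

# List categories from most to fewest rows, as you would read them off the bar chart.
for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(category, n)
```

The tallest bar in Tableau corresponds to the category with the largest count here.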

Create a histogram of Sale Price:

  • Click on Sheet 2
  • Double-click on Sale Price (or drag it onto the Rows shelf)
  • Select the histogram in Show Me
    • Question 2: What field did Tableau create to make the histogram?  Why?
    • Question 3:  What is the shape of the histogram?  Will the mean or median have a higher value?  Which value (mean or median) would be a better measure for the center of the distribution?
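If it helps to see what a histogram does with the data, the sketch below groups values into fixed-width bins and then computes the mean and median. The prices and the bin width of 50 are invented for illustration; they are not from HomePrices.xlsx, and Tableau chooses its own bin size.

```python
import statistics
from collections import Counter

# Hypothetical sale prices in thousands (NOT taken from HomePrices.xlsx).
prices = [100, 120, 130, 150, 160, 180, 400]


def bin_counts(values, width):
    """Count how many values fall in each bin; a bin is labeled by its lower edge."""
    lower_edges = ((v // width) * width for v in values)
    return dict(sorted(Counter(lower_edges).items()))


hist = bin_counts(prices, 50)
mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print("bins:", hist)
print("mean:", mean_price)
print("median:", median_price)
```

Try changing the largest value and watch how the mean and the median each respond before answering Question 3 for the real data.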

Create a box plot of Sale Price:

  • Create a new sheet
  • Double-click on Sale Price (or drag it onto the Rows shelf)
  • Disaggregate the measures (de-select Aggregate Measures in the Analysis menu)
  • Select the box plot from the Show Me menu
  • Drag Bldg Type and House Style to Tooltip on the Marks card
    • Question 4: Are there outliers present?  If yes, are outliers above or below the center of the distribution?
    • Question 5: What house style has the highest sale price?
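A box plot flags outliers by comparing each point to the quartiles. The sketch below assumes the common 1.5 × IQR rule and the "inclusive" quartile method from Python's statistics module; Tableau's exact quartile calculation may differ slightly. The prices and the helper name iqr_outliers are hypothetical.

```python
import statistics

# Hypothetical sale prices in thousands (NOT taken from HomePrices.xlsx).
prices = [100, 120, 130, 150, 160, 180, 400]


def iqr_outliers(values, k=1.5):
    """Return (q1, median, q3, outliers) using the k * IQR fence rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return q1, median, q3, outliers


q1, median, q3, outliers = iqr_outliers(prices)
print("Q1:", q1, "median:", median, "Q3:", q3)
print("outliers:", outliers)
```

Points beyond the fences are the ones a box plot draws as individual marks past the whiskers.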

Create a set of box plots of Sale Price for Building Type:

  • Duplicate the Sale Price box plot sheet (right-click its tab and select Duplicate from the menu)
  • Drag Bldg Type to the Columns shelf
    • Question 6:  Which building type has the lowest median Sale Price?
    • Question 7:  Which building type has no outliers in Sale Price?
    • Question 8:  Which distribution has the largest spread?  The smallest spread?

Part II: Scatter Plots and Regression

You are interested in determining what factors influence the house sale price (SalePrice). To investigate this question, do the following: 

Construct three scatter plots.  For each plot, place the Explanatory variable on the x-axis and the Response variable on the y-axis.

Variable combinations:

  • YearBuilt and SalePrice
  • 1stFlrSF and SalePrice
  • LotArea and SalePrice
    • Question 9: Evaluate the regression conditions for each plot.  Explain why it is or is not appropriate to run a regression analysis on each plot.  Please address all conditions covered in this module (Hint: Slide 25 in the lecture notes).

For plots that meet the regression conditions, add a linear trend line. 
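For reference, a linear trend line is a least-squares fit. The sketch below computes the slope, intercept, and R-squared by hand, assuming ordinary least squares with one explanatory variable. The x and y values and the helper name fit_line are made up for illustration and are not from HomePrices.xlsx.

```python
# Hypothetical (x, y) pairs (NOT taken from HomePrices.xlsx).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(x, y)

# Residuals (observed minus fitted) are what you inspect when checking regression conditions.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("slope:", slope, "intercept:", intercept, "R^2:", r_squared)
print("residuals:", residuals)
```

In Tableau, hovering over the trend line shows the fitted equation and R-squared, which you can compare against a calculation like this one.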

Please submit your .twbx file and a screenshot of your notebook here, and respond to the questions.