Fundamentals of Data Science

The final exam will comprise a mix of theoretical questions and practical applications covering the entire course. This exam is designed to assess your understanding of various statistical models, their applications, and your proficiency in using statistical software tools such as RapidMiner, Python, and Excel for data analysis. The exam is worth 95 points in total.

The final exam will be divided into two main sections: theoretical questions and practical application. Each question will be clearly marked with the number of points it is worth.

• Theoretical Section (45 points): This section will consist of short-answer questions that assess your understanding of various statistical models and concepts.
1. Explain the difference between data and information. Provide examples where relevant. (5 points)
2. What are the different types of variables? Explain with examples. (5 points)
3. Discuss different scales of measurement and their implications for data analysis. (4 points)
4. Define the term ‘frequency distribution’ and explain its importance in data analysis. (3 points)
5. What is Bayes’ theorem and how is it used in statistics? (3 points)
6. Define and distinguish between discrete and continuous random variables. Provide examples. (3 points)
7. Discuss the concept of point and interval estimation. Provide an example where these concepts are applicable. (3 points)
8. What is a non-parametric test? Why and when should such tests be used? (3 points)
9. Explain cluster analysis and its application in data analysis. (3 points)
10. Discuss the concept of Principal Component Factor Analysis. (3 points)
11. Explain what regression models are and the differences between linear and non-linear regression models. (3 points)
12. Discuss binary and multinomial logistic regression models and provide an example of their application. (3 points)
13. Explain Poisson and Negative Binomial Regression models. When should they be used? (3 points)
• Practical Application Section (45 points): This section will require you to use statistical software tools such as RapidMiner, Python, and Excel to analyze provided datasets. You will need to interpret the results, draw conclusions, and provide insights.
1. You are given a dataset with variables related to student performance (such as attendance, hours of study, family income, and grades). Conduct a correlation analysis using a statistical software tool such as RapidMiner, Python, or Excel. Interpret the results. (15 points)
2. You are given a dataset that records responses to a survey, with one variable being ‘satisfaction’ (very unsatisfied, unsatisfied, neutral, satisfied, very satisfied). Conduct a frequency distribution analysis using a statistical software tool such as RapidMiner, Python, or Excel, and interpret the results. (10 points) No download link is provided, because customer satisfaction datasets are generally proprietary and confidential; you may use any suitable dataset or construct your own. The dataset should describe the satisfaction levels of customers who have interacted with a service provider and include variables such as income, satisfaction level, interaction duration, and issues resolved.
3. Using the same dataset as in question 2, conduct a chi-square test for independence to examine the relationship between ‘satisfaction’ and ‘income level’. Interpret the results. (10 points)
4. Given a dataset with factors that could influence the sales of a product (such as advertising spend, price, and competitor price), use a multiple regression model to predict sales. Interpret the results. (10 points)
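
For students who choose Python, the following is a minimal sketch of how the four practical tasks might be set up. It is an illustration under stated assumptions, not a model answer: it assumes NumPy, pandas, and SciPy are installed, all data are synthetic placeholders generated inside the script, and every column name (hours_of_study, income_level, ad_spend, and so on) is hypothetical. Replace the synthetic data with the dataset you actually use.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 200                         # number of synthetic records (assumption)

# Task 1: correlation analysis on synthetic student data (hypothetical columns)
hours = rng.uniform(0, 20, n)
attendance = rng.uniform(50, 100, n)
grades = 40 + 2.0 * hours + 0.2 * attendance + rng.normal(0, 5, n)
students = pd.DataFrame(
    {"hours_of_study": hours, "attendance": attendance, "grades": grades}
)
corr_matrix = students.corr(method="pearson")  # pairwise Pearson correlations

# Task 2: frequency distribution of the ordinal 'satisfaction' variable
levels = ["very unsatisfied", "unsatisfied", "neutral", "satisfied", "very satisfied"]
survey = pd.DataFrame(
    {
        "satisfaction": rng.choice(levels, n),
        "income_level": rng.choice(["low", "medium", "high"], n),
    }
)
# Count each level and keep the natural ordering of the scale
freq = survey["satisfaction"].value_counts().reindex(levels, fill_value=0)
rel_freq = freq / freq.sum()  # relative frequencies (proportions)

# Task 3: chi-square test of independence, satisfaction vs. income level
table = pd.crosstab(survey["satisfaction"], survey["income_level"])
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Task 4: multiple regression fitted by ordinary least squares
ad_spend = rng.uniform(1, 10, n)
price = rng.uniform(5, 15, n)
competitor_price = rng.uniform(5, 15, n)
sales = (50 + 3.0 * ad_spend - 2.0 * price + 1.5 * competitor_price
         + rng.normal(0, 2, n))
X = np.column_stack([np.ones(n), ad_spend, price, competitor_price])  # intercept first
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
predicted = X @ coef
ss_res = np.sum((sales - predicted) ** 2)
ss_tot = np.sum((sales - sales.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(corr_matrix.round(2))
print(pd.DataFrame({"count": freq, "proportion": rel_freq.round(3)}))
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3f}")
print("coefficients (intercept, ad_spend, price, competitor_price):", coef.round(2))
print(f"R^2={r_squared:.3f}")
```

When interpreting the output, report the sign and strength of each correlation, the most and least common satisfaction levels, whether the chi-square p-value falls below your chosen significance level, and what each regression coefficient implies about sales when the other predictors are held constant.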

Note: The datasets referenced are

Submission Format: Your submission should be a maximum of 2000 words. Submit your assignment in APA format as a Word document or a PDF file.