Data Cleaning

Data, even from an enterprise data warehouse, are always imperfect. Being able to identify corrupt or imperfect data, understand the analytical consequences, and mitigate the problems as far as possible are important skills for the potential business leader. In this assignment, you will identify the data cleaning needs of a data set, clean the data, and provide a summary analysis of the clean data and its business implications.

General Requirements:

Use the following information to ensure successful completion of the assignment:

Refer to “Bank Marketing Data Set,” located in the Course Add-Ons for this course.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
Doctoral learners are required to use APA style for their writing assignments. The APA Style Guide is located in the Student Success Center. An abstract is not required.
This assignment requires at least three scholarly research sources related to this topic, with at least one in-text citation from each source. Support for decisions should include appropriate current (within the last 3 years) or foundational, peer-reviewed, and professional research.
You are required to submit this assignment to LopesWrite. Refer to the directions in the Student Success Center.

Inspect the “Bank Marketing Data Set,” determine what cleaning the data set requires to be suitable for analysis, and write a paper (750-1,000 words), supported with graphs, charts, and/or tables as appropriate, that addresses the data cleaning requirements associated with the data set. Include the following in your paper:

A statement identifying what data, if any, are missing from the data set and a means to manage the missing data. Can you perform a valid analysis with the data missing? How will this affect your analysis?
A statement identifying what outliers, if any, are in the data set and how you would manage these outliers. Are there ethical considerations surrounding your chosen method of managing the outliers? How will the chosen method of managing the outliers affect your analysis?
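
As one possible starting point for the two statements above, the following Python sketch counts missing values per column and flags outliers with the 1.5 × IQR rule. It is an illustration under stated assumptions, not a prescribed method: the small data frame is made up, the column names (age, job, balance) and the "unknown" placeholder for missing categories are hypothetical and may not match the course file, and the IQR rule is only one of several common outlier criteria.

```python
import pandas as pd

# Hypothetical sample standing in for the course data set. The column names
# and the "unknown" placeholder are assumptions, not taken from the assignment.
df = pd.DataFrame({
    "age": [34, 41, 29, 95, 38, 45, 52, 31],
    "job": ["admin.", "unknown", "technician", "retired",
            "services", "unknown", "management", "admin."],
    "balance": [1200.0, 350.0, None, 800.0, 98000.0, 410.0, 1500.0, 640.0],
})


def missing_summary(frame, placeholder="unknown"):
    """Count missing cells per column, treating a placeholder string as missing."""
    is_missing = frame.isna() | frame.isin([placeholder])
    return is_missing.sum()


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; missing values are not flagged."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


missing_counts = missing_summary(df)              # one count per column
balance_outliers = iqr_outlier_mask(df["balance"])
age_outliers = iqr_outlier_mask(df["age"])

print(missing_counts)
print(df[balance_outliers | age_outliers])        # rows to review, not to delete blindly
```

The sketch only flags rows; it does not drop or alter them. Keeping the flagged rows visible makes it easier to justify, in the paper, whether each one is removed, capped, or retained, and to compare summary statistics with and without them.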