School

University of North Texas

Course

5600

Subject

Computer Science

Date

Feb 20, 2024

Project Proposal Hotel Booking Data Analysis Introduction The team project to conduct an in-depth analysis of hotel booking data for the period between July 2015 and August 2017, with the great motivation to provide the agency with significant information for their decision making. Hotel booking dataset is made up of 32 variables, among them are categorical, numeric and binary data. It is a significantly big dataset with 119,391 observations which would in-turn reduce result biasness. The proposed analysis will major on ANOVA and regression analyses to answer five key research questions which would be handy in helping the agency make informed decisions. Research Questions In order to explore the data, we will formulate five guiding research questions aimed at uncovering meaningful insights from the data. These questions will include; 1) What is the effect of the day of the week on average daily rate (ADR)? 2) Is lead time affected by the month of arrival? 3) Do significant differences exist in ADR based on hotel type? 4) How does meal type influence repeated guests? 5) Does correlation exist between current and previous booking cancellations? Exploratory Data Analysis (EDA) To clearly understand the data, we would begin with performing exploratory data analysis. Exploratory data analysis is the art of representing and understanding the data in a pictorial manner. It uncovers the patterns and associations between variables in a dataset thus simplifying the decision-making process of selecting the most appropriate analyses to use. We
will employ a range of visualization techniques depending on the type of variables covered and the goal of the visual. For categorical variables which cover the biggest percentage of the dataset, we will utilize pivot tables and graphical representations such as pie charts and bar plots to elucidate counts and proportions, enabling us to extract pictorial insights. To visualize the numeric variables like average daily rate, we will use histogram and box plots. A comparison can also be made on two numeric variables using scatter plots, to gain a comprehensive understanding of the data's distribution, central tendencies, and the association between variables. Data Analysis We shall perform appropriate analysis on each research question depending on the variables involved and the goal of the question. These are the analyses to be deployed; a) ANOVA Analysis: Analysis of variance is a statistical technique used to compare variances across averages of different groups or categories (Canbolat et al., 2019). We will employ ANOVA tests in comparing numerical variables based on categorical variable groups. In our analysis, this could involve assessing if there are significant differences in ADR across various hotel types or identifying variations in lead time based on the month of arrival. b) Regression Analysis: Regression is a powerful statistical technique used to investigate relationship between two or more variables of interest ( Chicco et al., 2021) . From the definition, we will make good use of the technique to examine the association between applicable variables and point out other variables or categories of variables in the dataset that affect then. For example, to examine the effect of the day of the week on ADR, we will model a regression model that involve variables such as day of the week, ADR, and other possible variables affecting ADR, like type of customer and booking cancellations. 
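
The ANOVA comparison in (a) can be illustrated with a minimal sketch. It is written in Python for illustration rather than in the team's planned R workflow. The helper name `one_way_anova_f`, the group labels, and the ADR values are hypothetical toy inputs, not data from the hotel booking dataset. The function computes the one-way F statistic, which is the ratio of the between-group mean square to the within-group mean square. A large F, judged against the F distribution with (k - 1, n - k) degrees of freedom, suggests that at least one group mean differs. In R, `summary(aov(...))` reports the same statistic together with a p-value.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a dict {group label: list of numeric values}.

    Hypothetical helper for illustration. Returns (F, df_between, df_within).
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = fmean(all_values)
    means = {label: fmean(values) for label, values in groups.items()}
    k = len(groups)      # number of groups (e.g. hotel types)
    n = len(all_values)  # total number of observations

    # Between-group sum of squares: distance of each group mean from the
    # grand mean, weighted by the group's size.
    ss_between = sum(len(values) * (means[label] - grand_mean) ** 2
                     for label, values in groups.items())

    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - means[label]) ** 2
                    for label, values in groups.items() for x in values)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ADR values (hypothetical, NOT taken from the hotel booking dataset).
toy_adr = {
    "hotel_type_a": [100.0, 110.0, 120.0],
    "hotel_type_b": [80.0, 90.0, 100.0],
}
f_stat, df_b, df_w = one_way_anova_f(toy_adr)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # prints "F(1, 4) = 6.00"
```

In this toy case, the group means of 110 and 90 sit around a grand mean of 100. The between-group sum of squares is therefore 600 on 1 degree of freedom, and the within-group sum of squares is 400 on 4 degrees of freedom. That gives F = 600 / (400 / 4) = 6.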
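
The regression in (b) can be sketched in two steps. First, each categorical predictor is dummy (indicator) coded against a baseline level. Then ordinary least squares is fitted by solving the normal equations (X^T X) b = X^T y. This is a minimal Python illustration under stated assumptions. The helper names `dummy_code` and `ols` are hypothetical, and the six rows are constructed toy data rather than bookings from the dataset. Day of week and customer type are assumed to be available as categorical columns, or derived from the arrival date. Each coefficient on a day dummy estimates the difference in mean ADR from the baseline day, holding customer type fixed. In R, `lm()` performs the dummy coding of factor variables and the least-squares fit automatically.

```python
def dummy_code(values, levels):
    """Hypothetical helper: 0/1 indicator columns for every level except the baseline levels[0]."""
    return [[1.0 if v == level else 0.0 for level in levels[1:]] for v in values]


def ols(X, y):
    """Hypothetical helper: least-squares coefficients solving (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]

    for col in range(p):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Toy rows (hypothetical, NOT from the dataset): (day of week, customer type, ADR).
rows = [
    ("Mon", "Transient", 100.0), ("Mon", "Group", 90.0),
    ("Fri", "Transient", 105.0), ("Fri", "Group", 95.0),
    ("Sat", "Transient", 120.0), ("Sat", "Group", 110.0),
]
day_levels = ["Mon", "Fri", "Sat"]        # "Mon" is the baseline day
customer_levels = ["Transient", "Group"]  # "Transient" is the baseline customer type

day_dummies = dummy_code([r[0] for r in rows], day_levels)
customer_dummies = dummy_code([r[1] for r in rows], customer_levels)
X = [[1.0] + d + c for d, c in zip(day_dummies, customer_dummies)]  # intercept column first
y = [r[2] for r in rows]

coefs = ols(X, y)
names = ["intercept", "day=Fri", "day=Sat", "customer=Group"]
for name, b in zip(names, coefs):
    print(f"{name:>15}: {b:7.2f}")
```

The toy ADR values were constructed as 100 + 5 (Fri) + 20 (Sat) - 10 (Group). The fitted coefficients therefore come out as 100, 5, 20 and -10, up to floating-point rounding. On real data, the model would also report standard errors and p-values, which this sketch omits.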

Analysis Tool

We will perform both the exploratory data analysis and the hypothesis tests in R, an open-source programming language widely used for statistical analysis and data visualization. Since team members have prior experience with R, we agreed that it is the best tool for this analysis.

Members’ Contributions

The team has agreed to divide the tasks: some members will carry out data preparation, others will produce the visualizations, and the rest will run the statistical tests. The entire team will then interpret the results collaboratively and decide on the format of the project document.

Conclusion

We are confident that the proposed data exploration and analysis will provide useful information that helps streamline the agency's decision-making process and supports the most beneficial recommendations.

References

Canbolat, A. S., Bademlioglu, A. H., Arslanoglu, N., & Kaynakli, O. (2019). Performance optimization of absorption refrigeration systems using Taguchi, ANOVA and Grey Relational Analysis methods. Journal of Cleaner Production, 229, 874-885.

Chicco, D., Warrens, M. J., & Jurman, G. (2021). The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Computer Science, 7, e623.