Lab: Bayesian Classifiers

1. Give a brief overview of the selected dataset and provide a link to the source. Specify the meaning of records. (Similar to the assignment for week 1.)

We collected the dataset from data.world. The dataset is a PUBG game collection containing 5,000 matches started by random players.

Link: https://data.world/darrylhofer/pubgdata/workspace/file?filename=newNewPubg+-+Sheet1.csv

Meaning of records: In our dataset, each row represents a separate game or match played by a player.

2. Specify a data analytics question for your lab.

We want to predict whether a player will be in the top 10 percent of a game, given the player's match kill rank, the number of energy drinks (boosts) used, the number of assists given to teammates, the number of heals used in that match, and the number of weapons acquired in the match.

3. Explain your variables (need more than 4 variables in total).

Response variable: The response variable is Winplaceperc_new (win place percentage), which we derived from Winplaceperc, an existing attribute in our dataset. The pre-processing steps used to derive this new attribute are described in the following sections. Type of variable: categorical.

Explaining the response variable using an appropriate exploratory data analysis technique:

Fig. 1. Frequency of Winplaceperc_new

Figure 1 shows how many players are in the top 10 percent across the entire dataset. Here, 0 represents players below the 90th percentile and 1 represents players in the top 10% of a game. We can see that fewer than 1,000 players in our entire dataset are in the top 10%.
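For illustration, the derivation of the response variable and the Figure 1 frequency plot could be reproduced in R roughly as follows. This is only a sketch: it assumes the sheet still contains the raw Winplaceperc column, and the file name is taken from the appendix.

# sketch: derive the binary response and plot its class frequencies (cf. Fig. 1)
pubg <- read.csv("newNewPubg - sheet2.csv")

# binarization: 1 = top 10% (win place percentage >= 0.90), 0 = below 90%
pubg$Winplaceperc_new <- ifelse(pubg$Winplaceperc >= 0.90, 1, 0)

# frequency of the two classes, as summarized in Figure 1
barplot(table(pubg$Winplaceperc_new),
        main = "Frequency of Winplaceperc_new",
        xlab = "Winplaceperc_new (0 = below 90%, 1 = top 10%)",
        ylab = "Count")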
Explanatory variables: Our explanatory variables are:

Match kill rank
This is the rank of a player in a specific match based on the number of kills he or she achieved. We treat it as a ratio variable, since a player with 0 kills is placed at the lowest possible rank for the match and that position has a value of zero. In our dataset this column ranges from 1 to 98, although values from 0 to 100 are possible, with a mean of 37.87. When two or more players have the same number of kills, the rank is determined by other factors influencing the kills: distance of the kill, headshots (headshots are considered difficult in the game, so they are given priority), time of the kill (the player who achieved the kill first has the advantage), and the damage dealt for the kill (the amount of damage done to the enemy during the fight).

Fig. 2. Frequency of Matchkillrank

The histogram in Figure 2 shows the frequency of match kill rank. From the graph, we can say that the dataset contains more players with low kill ranks; in other words, players with higher ranks are rarer in our dataset. Around 95 of the 5,000 records have a match kill rank of 16.

Boosts
This refers to the consumption of boost items (such as energy drinks and painkillers) by players during matches. It is a ratio variable. In our dataset the values lie between 0 and 13 with an average of 1.43, which means most of the data is concentrated at the lower values, mainly 0. If a player stays in the game longer, there is a high chance he is killing players or trying to win the game, and in the process he loses energy. To regain energy, he takes the boosts available in the game. In other words, the more boosts a player uses, the higher the chances of a win. This is why we consider boosts as one of the attributes in this analysis.

Fig. 3. Frequency of Boosts

The bar chart in Figure 3 is right-skewed, meaning that most players use few boosts. This is consistent with the frequency graph of Winplaceperc_new above, where category 1 has few players: players who use many boosts are rare, which in turn means few players reach the top 10% in our dataset. These are only our assumptions, and we cannot conclude anything until we complete the analysis.

Assists
An assist is credited when a player deals damage to an enemy but the final strike that results in the kill is made by a teammate. It is a ratio variable. Since assists only occur when a teammate is helped, there are no assists in solo match types. In our dataset the values lie between 0 and 7 with an average of 0.33. If a player or the team participates in many fights and the team's kill count is high, the probability of the player getting more kills also increases. Higher assist values indicate that the team is actively participating in fights, which should also increase the probability of winning.
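Frequency plots like Figures 2 and 3 (and Figures 4-6 below) can be produced with base R graphics. A minimal sketch, assuming the column names used in the appendix code:

# frequency of match kill rank (cf. Fig. 2)
hist(pubg$Matchkillrank, breaks = 50,
     main = "Frequency of Matchkillrank", xlab = "Match kill rank")

# frequency of boosts and assists (cf. Figs. 3 and 4)
barplot(table(pubg$Boosts), main = "Frequency of Boosts", xlab = "Boosts used")
barplot(table(pubg$Assists), main = "Frequency of Assists", xlab = "Assists")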
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Fig. 4 – Frequency of assists

Figure 4 is right-skewed, which means most players have few assists; in other words, relatively few players in our dataset recorded assists. There is a negligible number of players with 0 assists, and around 900 players with 1 assist, which is the highest count.

Heals
This is the number of heals taken by a player during a match, which involves consuming items such as first aid kits, bandages, and med kits. It is a ratio variable. The minimum and maximum values are 0 and 44, respectively, and the average is 1.803.

Fig. 5 – Frequency of Heals

Figure 5 shows the frequency of heals. The graph is right-skewed, so there are more players with a small number of heals. From the graph, we can also see a relatively large number of players with 5 heals.

Weapons acquired
This is the number of weapons a player acquired during the match. Frequent weapon changes indicate the player's knowledge of weapons and the ability to switch weapons based on the stage of the game; for example, at the end of a match the play area is very small, so a player does not need a sniper rifle (a sniper is not optimal in a small area and is better suited to long-range fights). There are several slots a player can equip at once: 2 main weapons, 1 pocket weapon, and 1 melee weapon, for a total of 4 slots. This is a ratio variable, with a minimum value of 0, a maximum value of 42, and an average of 4.143.

Fig. 6 – Frequency of weapons acquired

At first look, Figure 6 appears to follow a normal curve. There are a few smaller bars on its right, but the players are almost equally distributed on both sides, and those small counts should not affect our analysis. Around 3,500 players acquired 4 weapons, which is the category with the greatest number of players.

4. Explore the relationship between your variables, discussing if they will be useful in answering your question using the selected method.

Before visualizing the data, we expect an S-shaped curve between each of our predictors and the response variable. The following are the graphs we obtained after plotting each predictor against the response variable.
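These predictor-versus-response plots can be generated with a short loop in base R; a minimal sketch, assuming the column names used in the appendix code and that the predictors are still numeric at this stage:

# scatter plots of each predictor against the binary response (cf. Figs. 7-11)
predictors <- c("Matchkillrank", "Boosts", "Assists", "Heals", "Weaponsacquired")
for (p in predictors) {
  plot(pubg[[p]], pubg$Winplaceperc_new,
       xlab = p, ylab = "Winplaceperc_new",
       main = paste("Scatter plot of", p, "vs Winplaceperc_new"))
}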
Fig. 7. Relation between Matchkillrank and Winplaceperc_new

In Figure 7, the data points at 0 on the Y-axis are spread over a wide range, but we cannot conclude anything from the graph alone, because the density of points may be higher on the right side of the graph than on the left; in that case we may still get a good S-curve. The graph is not exactly what we expected, but it can be considered useful to some extent. If we look at the value 1 for Winplaceperc_new, we can see that players with better kill ranks are in the top 10%, whereas for the value 0 the data points are scattered across the entire axis. So the points at 1 on the Y-axis are as expected, while the points at 0 are scattered.

Fig. 8. Scatter plot between Boosts and Winplaceperc_new

In Figure 8, we expected Winplaceperc_new to be 1 for higher numbers of boosts and 0 for lower numbers. Density matters in this plot too: the data points are scattered across almost all possible values on the X-axis, but we may still get a good S-shaped curve if the density is concentrated on the expected side of the boosts range.

Fig. 9 – Scatter plot between assists and Winplaceperc_new

Figure 9 shows the distribution of assists on the X-axis. Here, too, the density matters. We expect more density at low assist values for class 0 and more density at higher assist values for class 1, but we cannot conclude anything until we analyze further.

Fig. 10 – Scatterplot between heals and Winplaceperc_new

Figure 10 is the scatterplot between heals and Winplaceperc_new. We expected an S-curve, thinking that players with more heals would have a winning place of 1 and players with fewer heals a winning place of 0. The graph does not look quite as expected, but we cannot conclude anything until we complete the analysis.

Fig. 11 – Scatterplot between weaponsacquired and Winplaceperc_new

Figure 11 shows the distribution of weapons acquired on the X-axis with respect to Winplaceperc_new. It appears to be an S-curve. We thought players with more weapons would have a higher chance of Winplaceperc_new being 1; the pattern turned out to be the opposite of our expectation, but it is still a good variable to use in the analysis, and we need to see its performance during the analysis. After looking at our confusion matrix, we will be able to say whether the selected attributes are good for our analysis or not. After visualizing Figures 7 through 11, we are still hoping to get a good model.

5. Specify any data pre-processing needed for your analysis. Make sure to use data pre-processing vocabulary.

RStudio was not recognizing the data types properly: variables such as walk distance and ride distance were read as strings in R because of the commas in those columns (12,340 instead of 12340). So we went back to Excel, changed the cell type from General to Number, and loaded the sheet into R again; this time R recognized these columns as numerical values.

The output column (Winplaceperc_new) is derived from the existing column Winplaceperc. For our analysis, the response variable needs to be categorical (1 and 0). The existing column Winplaceperc has values between 0.1 and 0.99, and our aim is to identify whether a player is in the top 10% of the game. Since the existing values are
percentages, we converted the decimal values greater than or equal to 0.90 to 1 and the rest of the values to 0, storing the result in a new column named Winplaceperc_new (binarization).

Since the predictors have different orders of magnitude, they are standardized. We changed Winplaceperc_new to yes/no (instead of 1/0) to avoid issues with the R functions we selected. There are no missing values or outliers in the dataset, so no data pre-processing techniques were used other than the conversions mentioned above. As usual, before performing the analysis in R we converted the file from .xlsx to .csv (comma-separated values).

6. Specify the learning algorithm selected to answer your question.

We used the Naïve Bayes algorithm to predict whether a player is in the top 10% of the game.

7. Discuss your model selection approach and report your model.

There are no parameters to select in a Naïve Bayes model.

Reporting the model:

Table 1 – Partial conditional probability table for attribute "Matchkillrank"

Matchkillrank   Winplaceperc = 0   Winplaceperc = 1
1               0.005407232        0.068661972
2               0.008110848        0.054577465
3               0.007096992        0.044014085
4               0.010476512        0.045774648
5               0.009124704        0.033450704
6               0.011828320        0.044014085
7               0.013180128        0.029929577
8               0.012166272        0.026408451
9               0.014869888        0.007042254

Table 2 – Partial conditional probability table for attribute "Boosts"

Boosts          Winplaceperc = 0   Winplaceperc = 1
0               0.5079972184       0.0701030928
1               0.1995827538       0.0948453608
2               0.1387343533       0.1422680412
3               0.0709318498       0.1649484536
4               0.0403337969       0.1855670103
5               0.0212100139       0.1278350515
6               0.0118219750       0.0948453608
7               0.0048678720       0.0494845361
8               0.0017385257       0.0309278351

Table 3 – Partial conditional probability table for attribute "Assists"

Assists         Winplaceperc = 0   Winplaceperc = 1
0               0.7870338097       0.5209205021
1               0.1704426629       0.2615062762
2               0.0324154758       0.1380753138
3               0.0073196236       0.0460251046
4               0.0013942140       0.0230125523
5               0.0010456605       0.0062761506
7               0.0003485535       0.0041841004
Table 4 – Partial conditional probability table for attribute "Heals"

Heals           Winplaceperc = 0   Winplaceperc = 1
0               0.5325259516       0.1322645291
1               0.1816608997       0.1863727455
2               0.0868512111       0.1563126253
3               0.0491349481       0.1102204409
4               0.0287197232       0.0921843687
5               0.0328719723       0.0741482966
6               0.0235294118       0.0380761523
7               0.0190311419       0.0340681363
8               0.0086505190       0.0220440882

Table 5 – Partial conditional probability table for attribute "Weapons acquired"

Weaponsacquired Winplaceperc = 0   Winplaceperc = 1
0               0.0086625087       0.0020202020
1               0.0963270963       0.0060606061
2               0.1735966736       0.0282828283
3               0.2051282051       0.1010101010
4               0.1756756757       0.1595959596
5               0.1309771310       0.2141414141
6               0.0918225918       0.1898989899
7               0.0450450450       0.1090909091
8               0.0370755371       0.0707070707

Table 6 – Class counts used for the prior probabilities

Winplaceperc = 0   Winplaceperc = 1
2862               471

8. Briefly describe the software package and functions used to implement the classification model (detailed code should be included in the appendix).

Packages used:

caTools: an R package that provides several utility functions for data splitting. We used it to divide our dataset into training and testing sets.

e1071: we used this library for Naïve Bayes modelling.

Functions and parameters used:

set.seed(): sets the seed of the random number generator so that the random split is reproducible.

sample.split(): splits the data into training and testing sets. We passed two parameters: the first is the response variable column of the dataset and the second is the ratio in which to split it. We split our data with a 2/3 training ratio.

as.factor(): takes one parameter and converts the given vector or variable into a factor (categorical data, which is what the Naïve Bayes model expects).
naiveBayes(): performs the Naïve Bayes analysis on the parameters passed. We passed three parameters: the first is a formula with the response variable and the explanatory variables separated by a ~, where the explanatory variables are joined by plus signs (+); the second is the dataset in which these variables are found; and the third turns on Laplace smoothing, i.e., laplace = 1.

predict(): predicts the test dataset after the model is trained on the training dataset. We passed two parameters: the first is the output of the naiveBayes() function and the second is the held-out test set.

table(): creates the confusion matrix. We passed two parameters: the first is the actual response variable in the test dataset and the second is the output of the predict() function.

9. Evaluate the quality of the model.

Table 7 – Confusion matrix for the Naïve Bayes model

               Predicted 0   Predicted 1
Actual 0       1275          156
Actual 1       80            156

Table 8 – Confusion matrix analysis for the Naïve Bayes model

Accuracy   0.86
TPR        0.66
FPR        0.11

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (1275 + 156) / (1275 + 156 + 80 + 156) = 1431 / 1667 = 0.86
True positive rate = TP / (TP + FN) = 156 / (156 + 80) = 0.66
False positive rate = FP / (FP + TN) = 156 / (156 + 1275) = 0.11

We achieved a good accuracy of 86% in predicting whether a player will be in the top 10% or not, meaning that 86% of the time our model correctly classifies players based on our explanatory variables. Our model has a true positive rate of 66%: of the players who are actually in the top 10% of the game, the model correctly identifies 66%.
Our model has a false positive rate of 11%: of the players who are actually not in the top 10%, the model incorrectly predicts 11% as being in the top 10%.

10. Explain if you would recommend the model to answer your question and support decisions?

Apart from correctly predicting the positive instances, our model performed noticeably well on all the remaining metrics, and even the TPR is not bad in our case. It is true that the model has flaws in predicting whether a player belongs to the top 10% of the game, but it does its job in the prediction. We cannot recommend this model to accurately predict whether a player belongs to the top 10%, but it is useful for players who want to improve their gameplay. For instance, if they play multiple games and observe their results with this model before playing an actual tournament, they will get an idea of whether their number of kills or weapons acquired is sufficient to get into the top 10% of a game.

11. Regardless of the assessed quality, give one example of how a person without R or Python would use the model to answer a specific prediction question.

Question: Will a player fall in the top 10 percent if he has a match kill rank of 8, 4 boosts, 5 assists, 3 heals, and 4 acquired weapons?

Given probabilities for each attribute value for "Top 10%":

P(Kill Rank = 8 | Top 10) = 0.045774648
P(Boosts = 4 | Top 10) = 0.1855670103
P(Assists = 5 | Top 10) = 0.0062761506
P(Heals = 3 | Top 10) = 0.0921843687
P(Weapons = 4 | Top 10) = 0.1595959596

Assuming independence (the Naive Bayes assumption):

P(Attributes | Top 10) = P(Kill Rank = 8 | Top 10) × P(Boosts = 4 | Top 10) × P(Assists = 5 | Top 10) × P(Heals = 3 | Top 10) × P(Weapons = 4 | Top 10)
                       = 0.045774648 × 0.1855670103 × 0.0062761506 × 0.0921843687 × 0.1595959596

Next, calculate the corresponding likelihood for not being in the top 10%:

P(Attributes | Not Top 10) = P(Kill Rank = 8 | Not Top 10) × P(Boosts = 4 | Not Top 10) × P(Assists = 5 | Not Top 10) × P(Heals = 3 | Not Top 10) × P(Weapons = 4 | Not Top 10)

Given probabilities for each attribute value for "Not Top 10%":

P(Kill Rank = 8 | Not Top 10) = 0.010476512
P(Boosts = 4 | Not Top 10) = 0.0403337969
P(Assists = 5 | Not Top 10) = 0.0010456605
P(Heals = 3 | Not Top 10) = 0.0491349481
P(Weapons = 4 | Not Top 10) = 0.1756756757
Calculate the above probability:

P(Attributes | Not Top 10) = 0.010476512 × 0.0403337969 × 0.0010456605 × 0.0491349481 × 0.1756756757

Now, calculate the final (unnormalized) posterior scores:

P(Top 10 | Attributes) ∝ P(Attributes | Top 10) × P(Top 10)
P(Not Top 10 | Attributes) ∝ P(Attributes | Not Top 10) × P(Not Top 10)

Given prior probabilities:

P(Top 10) = 471 / (2862 + 471) = 0.14131413
P(Not Top 10) = 2862 / (2862 + 471) = 0.85868587

Multiplying everything out gives a score of roughly 1.1 × 10^-7 for Top 10 and roughly 3.3 × 10^-9 for Not Top 10. Since the Top 10 score is larger, with these probabilities the model classifies the given player as being in the top 10% of the game.

APPENDIX

Code used in R:

pubg <- read.csv("newNewPubg - sheet2.csv")

# convert the predictors to factors (categorical variables) for naiveBayes()
pubg$Matchkillrank = as.factor(pubg$Matchkillrank)
pubg$Boosts = as.factor(pubg$Boosts)
pubg$Assists = as.factor(pubg$Assists)
pubg$Heals = as.factor(pubg$Heals)
pubg$Weaponsacquired = as.factor(pubg$Weaponsacquired)

# split the data into training (2/3) and testing (1/3) sets
library(caTools)
set.seed(100)
split = sample.split(pubg$Winplaceperc_new, SplitRatio = 2/3)
pubg_train = subset(pubg, split == TRUE)
pubg_test = subset(pubg, split == FALSE)

# NAIVE BAYES
# Objective: predict whether a player is in the top 10 percent or not based on
# "Matchkillrank", "Boosts", "Assists", "Heals", "Weaponsacquired".
library(e1071)
nb_winplace = naiveBayes(Winplaceperc_new ~ Matchkillrank + Boosts + Assists +
                           Heals + Weaponsacquired,
                         data = pubg_train, laplace = 1)

nb_winplace$tables   # look at the conditional probability tables
nb_winplace$apriori  # class counts in the training data (used for the priors)

# predict the test set and build the confusion matrix
nb_pred = predict(nb_winplace, newdata = pubg_test)
confusionTnb <- table(pubg_test$Winplaceperc_new, nb_pred)
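The appendix code stops after building confusionTnb. As a possible continuation (a sketch, not part of the original submission), the Table 8 metrics could be computed from that matrix as follows, assuming the actual classes are in the rows and the predictions in the columns, with the positive class (top 10%) as the second factor level:

# compute accuracy, TPR and FPR from the confusion matrix
# rows = actual classes, columns = predicted classes
accuracy <- sum(diag(confusionTnb)) / sum(confusionTnb)
TPR <- confusionTnb[2, 2] / sum(confusionTnb[2, ])  # TP / (TP + FN)
FPR <- confusionTnb[1, 2] / sum(confusionTnb[1, ])  # FP / (FP + TN)
round(c(accuracy = accuracy, TPR = TPR, FPR = FPR), 2)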
Fig. 12
Figure 12 shows the conditional probability tables of the given variables.

Fig. 13
Figure 13 is a screenshot of the apriori table obtained in RStudio.

Fig. 14
Figure 14 is a screenshot of the confusion matrix from RStudio.
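As a complement to the hand calculation in question 11, a reader who does have R could obtain the same kind of answer directly from the fitted model. The following is only a sketch (not part of the original submission): the new record is wrapped in factor() with the training levels so that predict() matches the values against the correct columns of the probability tables, and it assumes these values occur in the training data.

# predict a single hypothetical player with the fitted Naive Bayes model
new_player <- data.frame(
  Matchkillrank   = factor(8, levels = levels(pubg_train$Matchkillrank)),
  Boosts          = factor(4, levels = levels(pubg_train$Boosts)),
  Assists         = factor(5, levels = levels(pubg_train$Assists)),
  Heals           = factor(3, levels = levels(pubg_train$Heals)),
  Weaponsacquired = factor(4, levels = levels(pubg_train$Weaponsacquired))
)
predict(nb_winplace, newdata = new_player)                # predicted class
predict(nb_winplace, newdata = new_player, type = "raw")  # class probabilities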