Shaun McKellar Jr
IST 707- Applied Machine Learning
HW 4: Clustering
Use Clustering to Solve a Mystery in History
Introduction:
The Federalist Papers consisted of a collection of 85 essays aimed at persuading the people of New York to support the adoption of the newly proposed U.S. Constitution. These essays, authored by Alexander Hamilton, James Madison, and John Jay, were initially published anonymously under the pseudonym "Publius" in New York newspapers during 1787 and 1788. Although a bound version of the essays emerged in 1788, it wasn't until the 1818 edition, printed by Jacob Gideon, that the true authors were disclosed. The Federalist Papers hold immense significance as a key resource for interpreting the original intentions behind the Constitution.
Among these 85 essays, Alexander Hamilton is credited with writing 51, James Madison with 15, John Jay with 5, and Hamilton and Madison collaborated on 3. However, there is ongoing debate about the authorship of the remaining 11 essays. Historians have long grappled with whether these essays should be attributed to Hamilton or Madison.
About the Data
The Federalist Papers data set was used to conduct this analysis. This data set initially contained 85 rows and 72 columns. Each row referred to a paper written by one of the authors, and 70 of the columns represented a word used within the paper. The value within the cells referred to the word’s relative frequency within a particular document. The remaining two columns referred to the author’s name and the file’s name/the paper in question.
The data set contained no missing values, but data cleansing and transformation were still necessary. The columns containing the author’s name and the file’s name were not necessary for the clustering analyses but having the file’s name as the row label was necessary to identify a particular observation. However, the file names were very long, and this could decrease utility when attempting to identify observations from the clustering analyses/graphs.
In this R code, a series of data preprocessing and exploratory steps was carried out. To begin with, several R libraries were loaded, such as wordcloud, quanteda, arules, and ggplot2, providing tools for text mining, data analysis, and visualization. The working directory was set to a specific location on the desktop, ensuring that R could locate and save files. The "Federalist Papers" dataset was loaded from a CSV file, and a backup copy called "FederalistPapers_Orig" was created to preserve the original data.
The dataset was then explored using the View function to interactively examine its contents, and a check for missing values was conducted to ensure data completeness. To prepare the text data for analysis, thresholds for term frequency were set to filter out overly common and extremely rare words. Additionally, a list of stop words, including common English words, was defined to exclude them from the analysis.
Furthermore, a summary of the "Federalist Papers" dataset was generated to gain insights into its structure and content. Lastly, available transformations were inspected. These preprocessing steps are crucial in text mining and natural language processing projects, as they lay the foundation for meaningful analysis by addressing data quality, term frequency, and stop words to focus on relevant patterns and insights within the text data.
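The loading, backup, and missing-value steps described above can be illustrated with a minimal sketch. It is written in Python with only the standard library, not the R code used for this report, and the three-row CSV, its values, and the column names "author" and "filename" are assumptions made for illustration.

```python
import copy
import csv
import io

# Hypothetical three-row stand-in for the Federalist Papers CSV. The column
# names "author" and "filename" and every value here are assumptions.
RAW_CSV = """author,filename,a,and,the
Hamilton,Hamilton_fed_1.txt,0.213,0.280,0.910
Madison,Madison_fed_10.txt,0.369,0.326,0.741
dispt,dispt_fed_49.txt,0.280,0.052,1.419
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per paper."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Count empty or absent cells across all rows."""
    return sum(1 for row in rows for value in row.values() if value in (None, ""))


def to_feature_table(rows, label_col="filename", drop_cols=("author",)):
    """Use the file name (minus its extension) as the row label and keep
    only the numeric word-frequency columns."""
    table = {}
    for row in rows:
        label = row[label_col].rsplit(".", 1)[0]
        table[label] = {
            col: float(val)
            for col, val in row.items()
            if col != label_col and col not in drop_cols
        }
    return table


rows = load_rows(RAW_CSV)
rows_orig = copy.deepcopy(rows)  # backup copy, in the spirit of FederalistPapers_Orig
missing = count_missing(rows)
features = to_feature_table(rows)
print("missing cells:", missing)
print("row labels:", sorted(features))
```

Keeping the file name as the row label while dropping it from the feature columns means each observation stays identifiable in later plots without the label influencing the distance calculations.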
Model/Results
The centroid values on these dimensions should ideally be distant from each other to effectively differentiate the clusters. In the context of k-Means clustering, attributes like "a," "and," "as," "be," "by," "in," "is," "of," "that," and "the" are deemed the most valuable for this clustering process.
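The idea that a useful attribute is one on which the centroids sit far apart can be sketched as follows. This is a small Python illustration of k-Means (Lloyd's algorithm) with Euclidean distance, not the R code from this analysis. The six two-feature points and the fixed starting centroids are hypothetical and chosen only so the result is easy to follow.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    recompute centroids, until the centroids stop moving."""
    centroids = list(init_centroids)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def centroid_spread(centroids):
    """Per-feature gap between the largest and smallest centroid value."""
    return [max(col) - min(col) for col in zip(*centroids)]


# Hypothetical data: feature 0 separates the two groups, feature 1 does not.
points = [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75),
          (2.25, 0.75), (2.5, 0.5), (2.75, 0.25)]
labels, centroids = kmeans(points, init_centroids=[points[0], points[3]])
spread = centroid_spread(centroids)
print("labels:", labels)
print("centroids:", centroids)
print("spread per feature:", spread)
```

A feature with a large spread contributes to telling the clusters apart, while a feature whose centroids coincide contributes nothing, which is the sense in which some words are more valuable to the clustering than others.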
The graph above demonstrates a common method for determining the optimal number of clusters in unsupervised learning, particularly for algorithms like k-means, which require the user to specify the number of clusters (k) before the algorithm is run. The analysis of clustering in this context reveals several important insights. Firstly, there is a notable drop in the silhouette score when transitioning from 2 to 3 clusters, indicating that the data does not naturally align with 3 clusters as well as it does with 2 clusters. Following this drop, as the number of clusters increases from 3 to 10, the silhouette scores stabilize, demonstrating only a slight decreasing trend. This plateau signifies that expanding the number of clusters beyond 3 doesn't yield significant improvements in the clustering structure.
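To make the silhouette comparison concrete, the sketch below computes the mean silhouette score for a 2-cluster and a 3-cluster labeling of the same points. It is a Python illustration rather than the R code behind the graph. The six points and both labelings are hypothetical and assigned by hand, and giving singleton clusters a score of 0 is a common convention assumed here.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean silhouette over all points: s = (b - a) / max(a, b), where a is
    the mean distance to the point's own cluster and b is the mean distance
    to the nearest other cluster."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data with two natural groups.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
labels_k2 = [0, 0, 0, 1, 1, 1]  # matches the natural grouping
labels_k3 = [0, 0, 0, 1, 1, 2]  # forces a third cluster by splitting a group

score_k2 = silhouette_score(points, labels_k2)
score_k3 = silhouette_score(points, labels_k3)
print("k=2:", round(score_k2, 3))
print("k=3:", round(score_k3, 3))
```

Forcing a third cluster onto data with two natural groups lowers the score, which mirrors the drop seen when moving from 2 to 3 clusters.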
The bar chart presented above visually represents the outcomes of a cluster analysis applied to resolve the authorship question surrounding the Federalist Papers. Upon examining the chart, I made several observations. Firstly, the majority of the papers are attributed to Hamilton and are distributed across various clusters. Secondly, Madison's papers are also identifiable, though they are notably fewer than those attributed to Hamilton. Jay, on the other hand, has the fewest papers linked to him, aligning with historical records that indicate he authored only a limited number of papers. Lastly, the disputed papers, labeled "dispt," are dispersed among different clusters.
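The counts behind a chart of this kind come from cross-tabulating author labels against cluster assignments. The Python sketch below shows one way to do that. The author list and cluster numbers are hypothetical and do not reproduce the assignments from this analysis.

```python
from collections import Counter


def author_cluster_counts(authors, clusters):
    """Count how many papers fall into each (author, cluster) pair."""
    return Counter(zip(authors, clusters))


def majority_cluster(counts, author):
    """Return the cluster holding the most papers for a given author."""
    per_cluster = {c: n for (a, c), n in counts.items() if a == author}
    return max(per_cluster, key=per_cluster.get)


# Hypothetical labels for nine papers.
authors = ["Hamilton", "Hamilton", "Hamilton", "Madison", "Madison",
           "Jay", "dispt", "dispt", "dispt"]
clusters = [1, 1, 2, 2, 2, 3, 2, 2, 1]

counts = author_cluster_counts(authors, clusters)
for (author, cluster), n in sorted(counts.items()):
    print(f"{author:<9} cluster {cluster}: {n}")
print("Madison majority cluster:", majority_cluster(counts, "Madison"))
```

Reading the table by row shows how each author's papers spread across clusters, and reading it by column shows which authors share a cluster with the disputed papers.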
In this section of R code, another Hierarchical Agglomerative Clustering (HAC) analysis is performed on the "FederalistPapers" dataset, using a different distance metric and linkage method.
The code calculates the distance matrix "distance3" using the Manhattan distance method, which measures the absolute differences between data points along each dimension. This distance matrix is then used for hierarchical clustering with the complete linkage method, creating the hierarchical structure of the data.
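The combination of Manhattan distance and complete linkage can be sketched in a few lines. This is a plain Python illustration of agglomerative clustering and is not the R code described above. The six integer points are hypothetical, and the function names are chosen for this example only.

```python
def manhattan(p, q):
    """Manhattan distance: sum of absolute differences along each dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def complete_linkage(c1, c2, points):
    """Complete linkage: the largest pairwise distance between two clusters."""
    return max(manhattan(points[i], points[j]) for i in c1 for j in c2)


def hac(points, n_clusters):
    """Agglomerative clustering: start with one cluster per point and keep
    merging the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members of the merged cluster, merge height)
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = complete_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(clusters[x] + clusters[y]), d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters], merges


# Hypothetical data with two well-separated groups.
points = [(0, 0), (1, 0), (0, 3), (8, 8), (9, 8), (8, 11)]
final_clusters, merges = hac(points, n_clusters=2)
heights = [d for _, d in merges]
print("clusters:", final_clusters)
print("merge heights:", heights)
```

The merge heights are what a dendrogram draws on its vertical axis, so a large jump between successive heights marks a natural place to cut the tree.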
Although it is difficult to see, the dendrogram shows clustering results that are similar to those of the k-Means analysis. HAC was also performed using average linkage (as opposed to complete linkage), and similar results were achieved. The key takeaway from the HAC is that the majority of disputed papers were clustered on branches containing Madison's papers.
Conclusion
Therefore, despite instances of imperfect clustering, the data suggests that Madison was the author of the 11 disputed papers. This brings the distribution of the 85 papers to 51 written by Hamilton, 26 written by Madison, 5 written by Jay, and 3 written by both Hamilton and Madison. Even so, the mystery surrounding the authorship of the disputed Federalist Papers remains unsolved by my analysis. While the clustering results offer valuable insights and suggest potential authorship patterns, they do not provide conclusive evidence. Confirming the findings would require further research using more advanced text analysis techniques, as well as additional information about the papers. Nonetheless, I do believe Hamilton wrote most of the papers.
The primary difference between the two clustering exercises lies in the methods and distance metrics used. The k-Means clustering relied on Euclidean distance and silhouette scores, whereas the Hierarchical Agglomerative Clustering (HAC) used Manhattan distance and complete linkage. The choice of these metrics can influence the clustering results. However, both exercises showed similar patterns regarding the disputed papers' proximity to Madison's work.
The most significant takeaway from this exercise is that text analysis and clustering methods can shed light on historical authorship questions. It underscores the complexity of the authorship attribution challenge and the limitations of relying solely on computational approaches to solve historical mysteries.