lab09-AgeUniverse
html
keyboard_arrow_up
School
Temple University *
*We aren’t endorsed by this school
Course
1013
Subject
Astronomy
Date
Dec 6, 2023
Type
html
Pages
19
Uploaded by samzahroun
The Age of the Universe
¶
Welcome to Lab 9! Elements of Data Science adapted from Berkeley Data8
Sometimes, the primary purpose of regression analysis is to learn something about the slope or intercept of the best-fitting line. When we use a sample of data to estimate the slope or intercept, our estimate is subject to random error, just as in the simpler case of the mean of a random sample.
In this lab, we'll use regression to get an accurate estimate for the age of the universe, using pictures of exploding stars. Our estimate will come from a sample of all exploding stars. We'll compute a confidence interval to quantify the error caused by sampling.
In [1]:
name = "Sam Z."
In [2]:
## import statements
# These lines load the tests. from gofer.ok import check
import numpy as np
from datascience import *
import pandas as pd
import matplotlib
from matplotlib import patches
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter('ignore', FutureWarning)
plt.style.use('fivethirtyeight')
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
import os
user = os.getenv('JUPYTERHUB_USER')
The Actual Big Bang Theory
¶
In the early 20th century, the most popular cosmological theory suggested that the universe had always existed at a fixed size. Today, the Big Bang theory prevails: Our universe started out very small and is still expanding.
A consequence of this is Hubble's Law, which says that the expansion of the universe creates the appearance that every celestial object that's reasonably far away from Earth (for example, another galaxy) is moving away from us at a constant speed. If we extrapolate that motion backwards to the time when everything in the universe was in the same place, that time is (roughly) the beginning of the universe!
Scientists have used this fact, along with measurements of the current location
and movement speed
of other celestial objects, to estimate when the universe started.
The cell below simulates a universe in which our sun is the center and every other star is
moving away from us. Each star starts at the same place as the sun, then moves away from it over time. Different stars have different directions and speeds
; the arrows indicate the direction and speed of travel.
Run the cell, then move the slider to see how things change over time.
Question 1
¶
When did the universe start, in this example?
In [3]:
# Just run this cell. (The simulation is actually not
# that complicated; it just takes a lot of code to draw
# everything. So you don't need to read this unless you
# have time and are curious about more advanced plotting.)
num_locations = 15
example_velocities = Table().with_columns(
"x", np.random.normal(size=num_locations),
"y", np.random.normal(size=num_locations))
start_of_time = -2
def scatter_after_time(t, start_of_time, end_of_time, velocities, center_name, other_point_name, make_title):
max_location = 1.1*(end_of_time-start_of_time)*max(max(abs(velocities.column("x"))), max(abs(velocities.column("y"))))
new_locations = velocities.with_columns(
"x", (t-start_of_time)*velocities.column("x"),
"y", (t-start_of_time)*velocities.column("y"))
plt.scatter(make_array(0), make_array(0), label=center_name, s=100, c="yellow")
plt.scatter(new_locations.column("x"), new_locations.column("y"), label=other_point_name)
for i in np.arange(new_locations.num_rows):
plt.arrow(
new_locations.column("x").item(i),
new_locations.column("y").item(i),
velocities.column("x").item(i),
velocities.column("y").item(i),
fc='black',
ec='black',
head_width=0.025*max_location,
lw=.15)
plt.xlim(-max_location, max_location)
plt.ylim(-max_location, max_location)
plt.gca().set_aspect('equal', adjustable='box')
plt.gca().set_position(make_array(0, 0, 1, 1))
plt.legend(bbox_to_anchor=(1.6, .7))
plt.title(make_title(t))
plt.show()
interact(
scatter_after_time,
t=widgets.FloatSlider(min=start_of_time, max=5, step=.05, value=0, msg_throttle=1),
start_of_time=fixed(start_of_time),
end_of_time=fixed(5),
velocities=fixed(example_velocities),
center_name=fixed("our sun"),
other_point_name=fixed("other star"),
make_title=fixed(lambda t: "The world {:01g} year{} in the {}".format(abs(t), "" if abs(t) == 1 else "s", "past" if t < 0 else "future")));
interactive(children=(FloatSlider(value=0.0, description='t', max=5.0, min=-2.0, step=0.05), Output()), _dom_c…
According to this example, the universe started 2 years ago.
Question 2
¶
After 5 years (with the slider all the way to the right), stars with longer arrows are further
away from the Sun. Why?
The stars are further away from the sun because of the passage of time and according to
this code, the universe started with the sun and other star also came from our sun and then they get further away over time. .
Analogy: driving
¶
Here's an analogy to illustrate how scientists use information about stars to estimate the age of the universe.
Suppose that at some point in the past, our friend Mei started driving in a car going at a steady speed of 60 miles per hour straight east. We're still standing where she started.
We want to know how long she's been driving, but we forgot to record the time when she
left. If we find out that she's 120 miles away, and she's been going 60 miles per hour the whole time, we can infer that she left 2 hours ago.
One way we can compute that number is by fitting a line to a scatter plot of our locations
and speeds. It turns out that the slope
of that line is the amount of time that has passed. Run the next cell to see a picture:
In [4]:
# Run this cell to see a picture of Mei's locations over time.
mei_velocity = Table().with_columns("x", make_array(60), "y", make_array(0))
interact(
scatter_after_time,
t=widgets.FloatSlider(min=-2, max=1, step=.05, value=0, msg_throttle=1),
start_of_time=fixed(-2),
end_of_time=fixed(1),
velocities=fixed(mei_velocity),
center_name=fixed("Us"),
other_point_name=fixed("Mei"),
make_title=fixed(lambda t: "Mei's position {:01g} hour{} in the {}".format(abs(t), "" if abs(t) == 1 else "s", "past" if t < 0 else "future")));
interactive(children=(FloatSlider(value=0.0, description='t', max=1.0, min=-2.0, step=0.05), Output()), _dom_c…
The slope of the line is 2 hours. (The units are vertical-axis units divided by horizontal-
axis units, which are $\frac{\texttt{miles}}{\texttt{miles} / \texttt{hour}}$, or hours.) So that's our answer.
Imagine that you don't know Mei's exact distance or speed, only rough estimates. Then if
you drew this line, you'd get a slightly bad estimate of the time since she left. But if you measured the distance and speed of hundreds of people who left you at the same time going different speeds, and drew a line through them, the slope of that line would be a pretty good estimate of the time they left, even if the individual measurements weren't exactly right.
The drivers.csv
dataset contains the speeds and distances-from-start of 100 drivers. They all left the same starting location at the same time, driving at a fixed speed on a straight line away from the start. The measurements aren't exact, so they don't fit exactly on a line. We've created a scatter plot and drawn a line through the data.
In [5]:
# Just run this cell to plot the data.
small_driving_example = Table().with_columns(
"Name", make_array("Us", "Mei"),
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
"Speed moving away from us (miles per hour)", make_array(0, 60),
"Current distance from us (miles)", make_array(0, 120))
x = small_driving_example.column("Speed moving away from us (miles per hour)")
y = small_driving_example.column("Current distance from us (miles)")
plt.scatter(x,y, s=200)
# Now compute the fit and overlay
linear_model=np.polyfit(x,y,1)
linear_model_fn=np.poly1d(linear_model)
x_s=np.arange(0,60)
plt.plot(x_s,linear_model_fn(x_s),color="green")
plt.show()
Question 3
¶
By looking at the fit line, estimate how long ago (in hours) Mei left.
In [6]:
# Just run this cell.
drivers= Table.read_table("drivers.csv")
xd,yd = drivers.column("Speed moving away from us (miles per hour)"),drivers.column("Current distance from us (miles)")
drivers
Out[6]:
Speed moving away from us (miles per hour)
Current distance from us (miles)
29.5047 47.6285
Speed moving away from us (miles per hour)
Current distance from us (miles)
64.8771 110.753 15.9152 22.8718 13.9501 23.0032 72.5393 126.281 38.43 59.2222 29.9099 51.3957 46.18 75.077 62.1933 105.435 23.4237 40.9736 ... (90 rows omitted)
In [7]:
plt.scatter(xd,yd, color='red')
# Second approach to plotting
#create scatterplot with regression line and confidence interval lines using Seaborn module
sns.regplot(x=xd, y=yd)
plt.show()
In [8]:
##### Fill in the start time you infer from the above line.
driving_start_time_hours = (100-60)/(60-35)
driving_start_time_hours
Out[8]:
1.6
In [9]:
check('tests/q3.py')
Out[9]:
All tests passed!
Back to cosmology
¶
To do the same thing for the universe, we need to know the distance-from-Earth and speed-away-from-Earth of many celestial objects. Using pictures taken by very accurate telescopes and a lot of physics, astronomers have been able to estimate both. It turns out that nearby supernovae
-- stars that have recently died and exploded -- are among the best sources of this data, because they are very easy to see. This picture taken by the Hubble telescope shows an entire galaxy, with a single supernova - as bright by itself
as billions of stars - at the bottom left.
Our astronomical data for today will come from the Supernova Cosmology Project
at Lawrence Berkeley Lab. The original dataset is here
, with (brief) documentation here
. Each row in the table corresponds to a supernova near Earth that was observed by astronomers. From pictures like the one above, the astronomers deduced how far away each supernova was from Earth and how fast it was moving away from Earth. Their deductions were good, but not perfect.
Run the cell below to load the data into a table called close_novas
and make a scatter
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
plot. (If you prefer, you can also use the name close_novae
; both are correct.)
Question 4
¶
Looking this plot, make a guess at the age of the universe.
Note
: Make sure you get the units right! In case you need to know what a parsec is, it's a big unit of distance, equivalent to 30.86 trillion kilometers.
In [10]:
# Just run this cell.
close_novas = Table.read_table("close_novas.csv")
close_novae = close_novas
xn,yn = close_novae.column("Speed (parsecs/year)"),close_novae.column("Distance (million parsecs)")
sns.regplot(x=xn, y=yn)
plt.show()
close_novas
Out[10]:
Speed (parsecs/year)
Distance (million parsecs)
0.00873361 117.305 0.0153418 217.007 0.0162256 230.961
Speed (parsecs/year)
Distance (million parsecs)
0.00528131 85.2853 0.0129474 185.051 0.0138862 212.841 0.0111837 151.728 0.0060085 82.6121 0.00838228 104.029 0.00812078 124.778 ... (146 rows omitted)
In [11]:
# Fill this in manually by examining the line above.
first_guess_universe_age_years = (10*10**9)
# This just shows your guess as a nice string, in billions of years.
"{:,} billion years".format(round(first_guess_universe_age_years / 1e9, 2))
Out[11]:
'10.0 billion years'
In [12]:
check('tests/q4.py')
Out[12]:
All tests passed!
Fitting the line yourself
¶
Displaying a fit line is visually helpful but we need to be able to calculate the slope as a number. Recall that the least-squares regression line for our supernova data is:
•
the line •
with the smallest average (over all the supernovae we observe) •
error, •
squared, •
where the error is $$\text{the supernova's actual distance from Earth} - \text{the height of the line at that supernova's speed.}$$ In [13]:
## TEST Fitting a practice line with slope and intercept that you pick slope = 4
intercept = 3
predict = close_novas.column('Speed (parsecs/year)')*slope+intercept
tbl = close_novas.with_column('predict',predict/1e6)
#tbl = tbl
Out[13]:
Speed (parsecs/year)
Distance (million parsecs)
predict
0.00873361 117.305 3.03493e-06
0.0153418 217.007 3.06137e-06
0.0162256 230.961 3.0649e-06 0.00528131 85.2853 3.02113e-06
0.0129474 185.051 3.05179e-06
0.0138862 212.841 3.05554e-06
0.0111837 151.728 3.04473e-06
0.0060085 82.6121 3.02403e-06
0.00838228 104.029 3.03353e-06
0.00812078 124.778 3.03248e-06
... (146 rows omitted)
Question 5
¶
Define a function called errors
. It should take three arguments:
1. a table like close_novas
(with the same column names and meanings, but not necessarily the same data) 2. the slope of a line (a number) 3. the intercept of a line (a number). It should return an array of the errors made when a line with that slope and intercept is used to predict distance from speed for each supernova in the given table. (The error is the actual distance minus the predicted distance.)
In [14]:
def errors(tbl, slope, intercept):
## extract values from table X_values = tbl.column("Speed (parsecs/year)")
Y_values = tbl.column("Distance (million parsecs)")
## predicted values from line predicted_y = (slope * X_values) + intercept
## errors error_values = Y_values - predicted_y
return error_values
Question 6
¶
Using errors
, compute the errors for the line with slope 16000
and intercept 0
on the close_novas
dataset. Name that array example_errors
. Then make a scatter plot of the
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
errors.
Hint:
To make a scatter plot of the errors, plot the error for each supernova in the dataset. Put the actual speed on the horizontal axis and the error on the vertical axis.
In [15]:
example_errors = errors(close_novas,16000,0)
example_errors
plt.scatter(close_novas[0], example_errors)
Out[15]:
<matplotlib.collections.PathCollection at 0x7fe882bc0d60>
In [16]:
## Check sign of error
np.round(example_errors.item(0), 2)
Out[16]:
-22.43
In [17]:
check('tests/q6.py')
Out[17]:
All tests passed!
You should find that the errors are almost all negative. That means our line is a little bit too steep. Let's find a better one.
Question 7
¶
Define a function called fit_line
. It should take a table like close_novas
(with the same column names and meanings) as its argument. It should return an array containing
the slope (as item 0) and intercept (as item 1) of the least-squares regression line predicting distance from speed for that table.
Here we will minimize
square errors.
Note: If you haven't tried to use the minimize
function
yet, now is a great time to
practice. Here's an example from the textbook, Inferential Thinking 15.3
.
First we will define an mse
function to get the mean squared error.
In [18]:
def mse(tbl,xlabel,ylabel,any_slope,any_intercept):
xdata, ydata = tbl.column(xlabel), tbl.column(ylabel)
fitted = any_slope * xdata + any_intercept
mse = np.mean((ydata - fitted) ** 2)
print("Root mean squared error:", mse ** 0.5)
return mse
In [19]:
mse(close_novas,'Speed (parsecs/year)','Distance (million parsecs)',1600,0)
Root mean squared error: 129.154916893
Out[19]:
16680.992557632489
In order to use datascience minimize
function we need to be able to vary all function parameters which is impossible for table name and data column labels so we creat a related function below.
In [20]:
def mse_c(any_slope,any_intercept):
tbl = close_novas
xlabel = 'Speed (parsecs/year)'
ylabel = 'Distance (million parsecs)'
xdata, ydata = tbl.column(xlabel), tbl.column(ylabel)
fitted = any_slope * xdata + any_intercept
mse = np.mean((ydata - fitted) ** 2)
print("Root mean squared error:", mse ** 0.5)
return mse
Determine the mse (mean squared error) for three values of any_slope and any_intercept
¶
In [21]:
###
mse_c(slope, intercept)
Root mean squared error: 142.499211976
Out[21]:
20306.025413737752
Now use minimize to find minimum of mse for any_slope and any_intercept
In [22]:
minimize(mse_c)
Root mean squared error: 145.306173836
Root mean squared error: 145.306173836
Root mean squared error: 145.296075973
Root mean squared error: 145.279737294
Root mean squared error: 143.49888016
Root mean squared error: 10.5223813535
Root mean squared error: 231.799957245
Root mean squared error: 89.0721365992
Root mean squared error: 55.6676595884
Root mean squared error: 10.6217148189
Root mean squared error: 10.6217148206
Root mean squared error: 10.5223813535
Root mean squared error: 10.5362970345
Root mean squared error: 10.6996457522
Root mean squared error: 10.5612213794
Root mean squared error: 10.5164811094
Root mean squared error: 10.5164425214
Root mean squared error: 10.5164431154
Root mean squared error: 10.5164431154
Root mean squared error: 145.956039279
Root mean squared error: 10.5164425214
Root mean squared error: 10.5167617359
Root mean squared error: 10.5159466466
Root mean squared error: 10.5113724236
Root mean squared error: 10.5233436836
Root mean squared error: 10.5131198561
Root mean squared error: 10.5120399177
Root mean squared error: 10.5113729308
Root mean squared error: 10.5113729308
Root mean squared error: 10.5113724236
Root mean squared error: 10.5302291223
Root mean squared error: 10.6809654332
Root mean squared error: 10.547214309
Root mean squared error: 10.5073515629
Root mean squared error: 10.5070443118
Root mean squared error: 10.5070447447
Root mean squared error: 10.5070447447
Root mean squared error: 10.4991277292
Root mean squared error: 10.5070443118
Root mean squared error: 10.4991277292
Root mean squared error: 10.489464887
Root mean squared error: 10.481839294
Root mean squared error: 10.5017916282
Root mean squared error: 10.4847526658
Root mean squared error: 10.4829521985
Root mean squared error: 10.4818418175
Root mean squared error: 10.4818418175
Root mean squared error: 10.481839294
Root mean squared error: 10.529432795
Root mean squared error: 10.6059883561
Root mean squared error: 10.500043856
Root mean squared error: 10.4887965472
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.484497273
Root mean squared error: 10.4822271298
Root mean squared error: 10.4818958793
Root mean squared error: 10.4818475497
Root mean squared error: 10.4818404984
Root mean squared error: 10.4818394697
Root mean squared error: 10.4818393196
Root mean squared error: 10.4818392977
Root mean squared error: 10.4818392945
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.5070443118
Root mean squared error: 10.5476993189
Root mean squared error: 10.491473901
Root mean squared error: 10.4855204312
Root mean squared error: 10.481839294
Root mean squared error: 10.4818393401
Root mean squared error: 10.4818393007
Root mean squared error: 10.4818392951
Root mean squared error: 10.4818392942
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Out[22]:
array([ 1.40945338e+04, 2.40907951e+00])
In [23]:
values = minimize(mse_c)
print(values,values[0],values[1]) # print and extract values
Root mean squared error: 145.306173836
Root mean squared error: 145.306173836
Root mean squared error: 145.296075973
Root mean squared error: 145.279737294
Root mean squared error: 143.49888016
Root mean squared error: 10.5223813535
Root mean squared error: 231.799957245
Root mean squared error: 89.0721365992
Root mean squared error: 55.6676595884
Root mean squared error: 10.6217148189
Root mean squared error: 10.6217148206
Root mean squared error: 10.5223813535
Root mean squared error: 10.5362970345
Root mean squared error: 10.6996457522
Root mean squared error: 10.5612213794
Root mean squared error: 10.5164811094
Root mean squared error: 10.5164425214
Root mean squared error: 10.5164431154
Root mean squared error: 10.5164431154
Root mean squared error: 145.956039279
Root mean squared error: 10.5164425214
Root mean squared error: 10.5167617359
Root mean squared error: 10.5159466466
Root mean squared error: 10.5113724236
Root mean squared error: 10.5233436836
Root mean squared error: 10.5131198561
Root mean squared error: 10.5120399177
Root mean squared error: 10.5113729308
Root mean squared error: 10.5113729308
Root mean squared error: 10.5113724236
Root mean squared error: 10.5302291223
Root mean squared error: 10.6809654332
Root mean squared error: 10.547214309
Root mean squared error: 10.5073515629
Root mean squared error: 10.5070443118
Root mean squared error: 10.5070447447
Root mean squared error: 10.5070447447
Root mean squared error: 10.4991277292
Root mean squared error: 10.5070443118
Root mean squared error: 10.4991277292
Root mean squared error: 10.489464887
Root mean squared error: 10.481839294
Root mean squared error: 10.5017916282
Root mean squared error: 10.4847526658
Root mean squared error: 10.4829521985
Root mean squared error: 10.4818418175
Root mean squared error: 10.4818418175
Root mean squared error: 10.481839294
Root mean squared error: 10.529432795
Root mean squared error: 10.6059883561
Root mean squared error: 10.500043856
Root mean squared error: 10.4887965472
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.484497273
Root mean squared error: 10.4822271298
Root mean squared error: 10.4818958793
Root mean squared error: 10.4818475497
Root mean squared error: 10.4818404984
Root mean squared error: 10.4818394697
Root mean squared error: 10.4818393196
Root mean squared error: 10.4818392977
Root mean squared error: 10.4818392945
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.5070443118
Root mean squared error: 10.5476993189
Root mean squared error: 10.491473901
Root mean squared error: 10.4855204312
Root mean squared error: 10.481839294
Root mean squared error: 10.4818393401
Root mean squared error: 10.4818393007
Root mean squared error: 10.4818392951
Root mean squared error: 10.4818392942
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
Root mean squared error: 10.481839294
[ 1.40945338e+04 2.40907951e+00] 14094.5337865 2.40907950644
Now define the function fit_line
below to use minimize function
In [24]:
def fit_line(tbl):
# Your code may need more than 1 line below here.
xdata,ydata = tbl.column(0), tbl.column(1)
def mse_c(slope,intercept):
fitted = slope * xdata + intercept
return np.mean((ydata - fitted)**2)
values = minimize(mse_c)
slope = values[0]
intercept = values[1]
return make_array(slope, intercept)
# Here is an example call to your function. To test your function,
# figure out the right slope and intercept by hand.
example_table = Table().with_columns(
"Speed (parsecs/year)", make_array(0, 1),
"Distance (million parsecs)", make_array(1, 3))
fit_line(example_table)
Out[24]:
array([ 2., 1.])
In [25]:
check('tests/q7.py')
Out[25]:
All tests passed!
Question 8
¶
Use your function to fit a line to close_novas
.
Then, set new_errors
equal to the errors that we get calling errors
with our new line. The following line will graph the corresponding residual plot with a best fit line.
Make sure that the residual plot makes sense (Hint: what qualities should the best fit line
of a residual plot have?)
In [26]:
best_line = fit_line(close_novas)
best_line_slope = best_line.item(0)
best_line_intercept = best_line.item(1)
new_errors = errors(close_novas, best_line_slope, best_line_intercept)
# This code displays the residual plot, given your values for the best_line_slope and best_line_intercept
best=Table().with_columns("Speed (parsecs/year)", close_novas.column("Speed (parsecs/year)"), "Distance errors (million parsecs)", new_errors
)
xb,yb = best.column("Speed (parsecs/year)"),best.column("Distance errors (million parsecs)")
sns.regplot(x=xb, y=yb)
plt.show()
# This just shows your answer as a nice string, in billions of years.
"Slope: {:g} (corresponding to an estimated age of {:,} billion years)".format(best_line_slope, round(best_line_slope/1000, 4))
Out[26]:
'Slope: 14094.5 (corresponding to an estimated age of 14.0945 billion years)'
That slope (multiplied by 1 million) is an estimate of the age of the universe. The current best estimate of the age of the universe (using slightly more sophisticated techniques) is
13.799 billion years.
What was our determined age? Did we get close?
The determined age is 14.0945 billion years, Yes, this is accurate.
One reason our answer might be a little off is that we are using a sample of only some of the supernovae in the universe. Our sample isn't exactly random, since astronomers
presumably chose the novae that were easiest to measure (or used some other nonrandom criteria). But let's assume it is. How can we produce a confidence interval for the age of the universe?
Question 9
¶
It's time to bootstrap so that we can quantify the variability in our estimate! Simulate 1000 resamples from close_novas
. For each resample, compute the slope of the least-
squares regression line, and multiply it by 1 million to compute an estimate of the age of the universe. Store these ages in an array called bootstrap_ages
, and then use them to compute a 95% confidence interval for the age of the universe.
Note:
This might take up to a minute, and more repetitions will take even longer. See text: Inferential Thinking 13.1
In [46]:
bootstrap_ages = make_array()
for i in np.arange(1000):
sampled_novas = fit_line(close_novas.sample())
sampled_age = sampled_novas[0] * 1000000
bootstrap_ages = np.append(bootstrap_ages, sampled_age)
lower_end = percentile(2.5,bootstrap_ages)
upper_end = percentile(97.5,bootstrap_ages)
Table().with_column("Age estimate", bootstrap_ages*1e-9).hist(bins=np.arange(12, 16, .1), unit="billion years")
print("95% confidence interval for the age of the universe: [{:g}, {:g}] billion years".format(lower_end*1e-9, upper_end*1e-9))
95% confidence interval for the age of the universe: [13.614, 14.518] billion years
In [47]:
check('tests/q9.py')
Out[47]:
All tests passed!
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Nice work, data astronomer! You can compare your result to the Planck project 2015 results
, which estimated the age of the universe to be 13.799±0.021 billion years. Please remember to submit!
*
In [48]:
# For your convenience, you can run this cell to run all the tests at once!
import glob
from gofer.ok import check
correct = 0
checks = [3,4,6,7,9]
total = len(checks)
for x in checks:
print('Testing question {}: '.format(str(x)))
g = check('tests/q{}.py'.format(str(x)))
if g.grade == 1.0:
print("Passed")
correct += 1
else:
print('Failed')
display(g)
print('Grade: {}'.format(str(correct/total)))
print("Nice work ",name, user)
import time;
localtime = time.asctime( time.localtime(time.time()) )
print("Submitted @ ", localtime)
Testing question 3: Passed
Testing question 4: Passed
Testing question 6: Passed
Testing question 7: Passed
Testing question 9: Passed
Grade: 1.0
Nice work Sam Z. tul06187@temple.edu
Submitted @ Fri Nov 17 03:06:35 2023
In [ ]: