lab 10
html
keyboard_arrow_up
School
Temple University *
*We aren’t endorsed by this school
Course
853
Subject
Computer Science
Date
Jan 9, 2024
Type
html
Pages
51
Uploaded by ProfFreedom5002
Machine Learning and prediction
¶
Elements of Data Science
In this laboratory we will use training data to predict outcomes. We will first test these ideas using our Old Faithful data again. Next we will look at data on the iris flower to classify iris' based on sepal width and length. In our culminating activity we will predict molecular acidity using data computed by Prof. Vince Voelz
in the Temple Chemistry department and a graduate student, Robert Raddi. See their paper: Stacking Gaussian processes to improve pKa predictions in the SAMPL7 challenge
.
In [1]:
Your_name = "Sarmad"
Learning from training data
¶
A key concept in machine learning is using a subset of a dataset to train an algorithm to make estimates on a separate set of test data. The quality of the machine learning and algorithm can be assesed based on the accuracy of the predictions made on test data. Many times there are also parameters sometimes termed hyper-parameters which can be optimized through an iterative approach on test or validation data. In practice a dataset is randomly split into training and test sets using sampling.
k nearest neighbor
¶
We will examine one machine learning algorithm in the laboratory, k nearest neighbor. Many of the concepts are applicable to the broad range of machine learning algorithms available.
In [2]:
## import statements
# These lines load the tests. from gofer.ok import check
import numpy as np
from datascience import *
import pandas as pd
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import warnings
warnings.simplefilter('ignore', UserWarning)
#from IPython.display import Image
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
# Fix for datascience collections Iterable
import collections as collections
import collections.abc as abc
collections.Iterable = abc.Iterable
!pip install jupyterquiz
from jupyterquiz import display_quiz
import json
from IPython.core.display import HTML
Requirement already satisfied: jupyterquiz in /opt/conda/lib/python3.10/site-packages (2.1.5)
k nearest neighbor regression
¶
We will use the k nearest neighbor algorithm to make predictions of wait time in minutes
following an eruption duration ofa given number of minutes (independent variable).
In [3]:
faithful = Table.read_table("data/faithful.csv")
faithful.scatter(0, 1, fit_line=True)
Question 1
¶
Use the datascience .split(n) Table method to split the dataset into 80% training and 20% test. The argument for .split(n) method,n, needs to be an integer. See datascience documentation
In [4]:
trainf, testf = faithful.split(int(faithful.num_rows * 0.8))
print(trainf.num_rows, 'training and', testf.num_rows, 'test instances.')
217 training and 55 test instances.
In [5]:
check('tests/q1.py')
Out[5]:
All tests passed!
Nearest neighbor concept
¶
The training examines the chreacteristics of k
nearest neighbors to the data point for which a prediction will be made. Nearness is measured using several different metrics
with Euclidean distance being a common one for numerical attributes.
Euclidean distance:
1-D $$ d(p,q) = \sqrt{(p-q)^{2}} $$
2-D $$ d(p,q) = \sqrt{(p_1-q_1)^{2}+(p_2-q_2)^{2}} $$
For multiple points (rows): 2-D $$ d(p,q) = \sum{{\sqrt{((p_1-q_1)^{2}+(p_2-
q_2)^{2}}}} $$
Try different attribute values in the following 2D Euclidean distance example code below to get a feel for the computation
¶
In [6]:
# Example code to compute an Euclidean distance between two 2-D points
d_p_q = np.sqrt(sum((make_array(2,3)-make_array(4,3))**2))
d_p_q
Out[6]:
2.0
To get values from Table row as an array as is done in row_distance. Note in the faithful data case we will only consider the duration column in nearest neighbor computation but
in examples below we will use a 2-D array of attributes with the iris data and a 10-D array in the chemistry and molecular acidity case.
In [7]:
f_array = np.array(faithful.row(0))
f_array
Out[7]:
array([ 3.6, 79. ])
A couple quick review questions about nearest neighbor below, select the best answer (multiple tries ok). Execute the below cell to reveal the self-check quiz.
¶
In [8]:
with open("questions.json", "r") as file:
questions=json.load(file) display_quiz(questions)
Question 2
¶
Define a function which is the Euclidean distance between two values. Use the last two example code cells above as inspiration. This is where we will compute the distance between two duration
values.
In [9]:
def distance(pt1, pt2):
"""The distance between two points, represented as arrays."""
return np.sqrt(sum((pt1 - pt2)**2))
In [10]:
check('tests/q2.py')
Out[10]:
All tests passed!
Rest of the nearest neighbor algorithm
¶
Execute these cells to create the complete algorithm
In [11]:
def row_distance(row1, row2):
"""The distance between two rows of a table."""
return distance(np.array(row1), np.array(row2)) # Need to convert rows into arrays
def distances(training, example, output):
"""Compute the distance from example for each row in training."""
dists = []
attributes = training.drop(output)
for row in attributes.rows:
dists.append(row_distance(row, example))
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
return training.with_column('Distance', dists)
def closest(training, example, k, output):
"""Return a table of the k closest neighbors to example."""
return distances(training, example, output).sort('Distance').take(np.arange(k))
Question 3
¶
Take an example row from the test data (testf), drop the prediction column and use the closest function to see the top 10 closest points to the target in the training data.
In [12]:
example_row = testf.row(1)
example_row # This should display data contained in selected row in testf table.
Out[12]:
Row(duration=3.75, wait=75.0)
In [13]:
k = 10 # Number of nearest neighbors
closest(testf,example_row,k,'wait')
Out[13]:
duration wait Distance
4.933 86 70.077 4.9 82 70.1094 4.9 89 70.1094 4.767 78 70.2404 4.716 90 70.2906 4.7 73 70.3064 4.7 80 70.3064 4.667 84 70.339 4.633 81 70.3725 4.583 77 70.4219 In [14]:
check('tests/q3.py')
Out[14]:
All tests passed!
Question 4
¶
Predict the value for this row using the defined predict_nn function below and compare to the value reported for wait in the test data. How do they compare?
In [15]:
def predict_nn(example):
"""Return the majority class among the k nearest neighbors."""
k = 10
return np.average(closest(trainf, example, k , 'wait').column('wait'))
In [16]:
predictionf = predict_nn(example_row) # This is the value predicted for wait using the average of the k nearest neighbors in the test set
actual = example_row
print(predictionf,actual)
84.0 Row(duration=3.75, wait=75.0)
The prediction value and the value reported for wait in the test data are close, but still off by a noticeable amount. In [17]:
check('tests/q4.py')
Out[17]:
All tests passed!
Question 5
Predictions
¶
Now we will make predictions for the whole data set using the apply Table method. We will then look at the root mean squared error (RMSE) for the nearest neighbor fit and a scatter plot. Try adjusting the value of k in the predict_nn function to see it's effect on the quality of fit by rerunning these cells. Are the predicted points in a perfect
straight line, why or why not?
In [18]:
testf = testf.with_columns("predict",testf.apply(predict_nn,"duration"))
nn_test_predictions = testf.column("predict")
test_wait = testf.column("wait")
rmse_nn = np.mean((test_wait - nn_test_predictions) ** 2) ** 0.5
print('Test set RMSE for nearest neighbor regression:', round(rmse_nn,2))
Test set RMSE for nearest neighbor regression: 6.1
In [19]:
testf.scatter("duration")
There appears to be a trend but it's not very strong because there are two clusters of data points. Classify iris data with machine learning
¶
Next we will take on the problem of classifying iris data into three categories, setosa, versicolor, and virginica. Here we will also learn the basics of the k nearest neighbor algorithm.
The first data set we will look at consists of 50 samples from three species of Iris (Iris Setosa, Iris virginica, and Iris versicolor). Four features were measured including the length and the width of the sepals and petals, in centimeters for each observation. Iris stainglass, J.R. Smith
In [20]:
n_neighbors = 15
# Load iris data
iris = datasets.load_iris()
# We only take the first two features. iris_table = Table().with_columns("Name",iris.target,iris.feature_names[0],iris.data[:,0],iris.feature_
names[1],iris.data[:,1])
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
iris_table
Out[20]:
Name
sepal length (cm)
sepal width (cm)
0 5.1 3.5 0 4.9 3 0 4.7 3.2 0 4.6 3.1 0 5 3.6 0 5.4 3.9 0 4.6 3.4 0 5 3.4 0 4.4 2.9 0 4.9 3.1 ... (140 rows omitted)
In [21]:
iris.target_names
Out[21]:
array(['setosa', 'versicolor', 'virginica'],
dtype='<U10')
Question 6
¶
Train and test split the iris_table @ 80% as above.
In [22]:
train_i, test_i = iris_table.split(int(iris_table.num_rows * 0.8))
print(train_i.num_rows, 'training and', test_i.num_rows, 'test instances.')
120 training and 30 test instances.
In [23]:
check('tests/q6.py')
Out[23]:
All tests passed!
Question 7
¶
With classification we need to use training data to decide how to classify data given a set
of attributes, sepal length and sepal width in this case. Create a function which returns the majority classification among three possibilities in "Name" coded as 0, 1, 2 (setosa, versicolor, and virginica respectively). The and
below combines two conditionals. For example, (twos > ones) and ...
In [24]:
def majority(topkclasses):
twos = topkclasses.where('Name', are.equal_to(2)).num_rows
ones = topkclasses.where('Name', are.equal_to(1)).num_rows
zeros = topkclasses.where('Name', are.equal_to(0)).num_rows
# Now test to see what the majority name for each k class
if twos and ones:
return 2
elif ones and zeros:
return 1
else:
return 0
In [25]:
check('tests/q7.py')
Out[25]:
All tests passed!
In [26]:
def classify(training, new_point, k):
closestk = closest(training, new_point, k,"Name")
topkclasses = closestk.select('Name')
return majority(topkclasses)
In [27]:
test_row = 21
print("Prediction: ",classify(train_i,example_row,test_row)," Actual: ",test_i.select("Name").row(test_row))
Prediction: 0 Actual: Row(Name=0)
In [28]:
def predict(train, test_attributes, k):
pred = []
for i in np.arange(test_attributes.num_rows):
pred.append(classify(train,test_attributes.row(i),k))
return pred
Question 8
¶
Make a new table called prediction which includes original columns of test Table but also includes a "predict" column.
In [29]:
k = 10
prediction = test_i.with_columns("predict",predict(train_i, test_i.drop('Name'), k))
prediction.show(30)
Name
sepal length (cm)
sepal width (cm)
predict
1 5.5 2.3 2 2 7.7 3.8 2 2 7.7 3 2 0 4.8 3 0 1 6.7 3.1 2
Name
sepal length (cm)
sepal width (cm)
predict
1 6 2.7 2 1 6.9 3.1 2 2 7.2 3.6 2 1 5.6 2.5 2 0 5.3 3.7 0 1 5.6 2.7 2 1 5 2 2 2 6.3 2.7 2 1 5.8 2.7 2 2 6.5 3 2 0 4.3 3 0 2 6.3 3.3 2 1 5.5 2.6 2 1 4.9 2.4 2 0 5.1 3.5 0 0 5.1 3.3 0 0 4.6 3.1 0 2 5.7 2.5 2 1 6.1 2.8 2 0 5.5 4.2 0 1 6.8 2.8 2 0 4.7 3.2 0
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Name
sepal length (cm)
sepal width (cm)
predict
0 4.7 3.2 0 2 6.4 3.2 2 0 5.2 4.1 0 In [30]:
check('tests/q8.py')
Out[30]:
All tests passed!
Plot decision outcomes for test set
¶
Question 9
¶
Use above prediction Table to make a scatter plot of the color coded predictions based on the tweo attributes(use colors="predict" in scatter plot after specifying x and y axis based on attributes).
In [31]:
colors = "predict"
prediction.drop("Name").scatter('sepal length (cm)','sepal width (cm)', colors, group = 2)
Fancy plot showing color coded decision boundaries
¶
We can make a more informative plot by predicting on a grid of attribute values as shown below. Seaborn is an add-on to the Matplotlib plotting we have been using which provides more control of plotting. Execute (this may take a minute+) and study the below input and resulting output for your information.
In [32]:
def make_colors(iris, y, cmap):
colors = []
cdict = {'setosa':0, 'virginica':2, 'versicolor':1}
for x in iris.target_names[y]:
colors.append(cmap[cdict[x]])
return colors
In [33]:
import seaborn as sns
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
h = .1 # step size in the mesh
k = 10
x_min, x_max = iris.data[:, 0].min() - 1, iris.data[:, 0].max() + 1
y_min, y_max = iris.data[:, 1].min() - 1, iris.data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
## Create a grid of predictions in a Table
attribute_grid = Table().with_columns(iris.feature_names[0],np.c_[xx.ravel(), yy.ravel()]
[:,0],iris.feature_names[1],np.c_[xx.ravel(), yy.ravel()][:,1])
Z = np.array(predict(train_i,attribute_grid,k))
# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=cmap_light)
# Plot the test points but convert to numpy arrays
predictions = prediction.column('predict')
attribute1 = prediction.column(1)
attribute2 = prediction.column(2)
plt.scatter(x=attribute1, y=attribute2, c = make_colors(iris, predictions, cmap_bold), alpha=1.0, edgecolor="black")
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = %i')"
% (k))
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
Out[33]:
Text(0, 0.5, 'sepal width (cm)')
Use scikit learn
¶
Scikit learn
is a standard state of the art machine learning library. For demonstration purposes execute the below commands to classify and generate a comparable output.
In [34]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
In [35]:
clf = neighbors.KNeighborsClassifier(k) # Initiate the classifier
x_train, x_test, y_train, y_test = train_test_split(iris.data[:,:2], iris.target, random_state=22) # scikit split
# Now fit
clf.fit(x_train, y_train)
Out[35]:
KNeighborsClassifier(n_neighbors=10)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
KNeighborsClassifier
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
KNeighborsClassifier(n_neighbors=10)
In [36]:
import seaborn as sns
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
h = .1 # step size in the mesh
x_min, x_max = iris.data[:, 0].min() - 1, iris.data[:, 0].max() + 1
y_min, y_max = iris.data[:, 1].min() - 1, iris.data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=cmap_light)
# Plot also the training points
y = y_test
plt.scatter(x=x_test[:, 0], y=x_test[:, 1], c = make_colors(iris, y, cmap_bold),
alpha=1.0, edgecolor="black")
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = %i')"
% (k))
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
Out[36]:
Text(0, 0.5, 'sepal width (cm)')
Question 10
¶
Comment on the quality of the predictions by 1. Your nearest neighjbor algorithm 2. scikit learn 3. Comparison 1.The nearest neighbor algorithm quality is good because in the graph the each color lines up well with its
classification 2. The scikit learn quality is good as well, since it looks accurate compared to the data 3. The scikit learn is more accurate than the neighjbor algorithm, although they both have good quality of predictions. ¶
Molecules and predicting acidity measured by pKa
¶
Within the Jupyter notebook we can also analyze molecules and their molecular data using the library RDKit. RDKit adds the ability to visualize 2D and 3D molecular structures. We can apply many of the data science tools we have learned to molecular data as well. First we will briefly look at acid-base chemistry and how acidity is defined. pH is a measure of the acidity of a water-based (aqueous) solution. A pH of 1 is acidic, a pH of 7 is neutral and a pH of 14 is basic. Next we will use some computed atributes of a large set of molecules to train a k nearest neighbor model to predict acidity. We will use a range of attributes including the partial charges on atoms adjacent to the acidic proton, molecular weight, solvent accessible surface area (SASA), carbon-oxygen bond order,
and some thermochemistry measures all of which may help predict acidity with a lower pKa indicating a stronger (weak) acid.
Acid-base and pKa background
¶
A very brief background on acid - base equilibria demonstrated for glycine. See OpenStax Chemistry
for details based on interest. RDKit
¶
RDKit
is a specialized library to handle the complexities of molecules within Python.
In [37]:
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole #Needed to show molecules
from rdkit.Chem.Draw.MolDrawing import MolDrawing, DrawingOptions #Only needed if modifying defaults
from rdkit.Chem import rdRGroupDecomposition
from rdkit.Chem import rdDepictor
from rdkit.Chem import PandasTools
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit import DataStructs
# Options
DrawingOptions.bondLineWidth=1.8
rd =True
Load detailed molecular data for 2000 molecules
¶
In [38]:
url = "https://raw.githubusercontent.com/robraddi/GP-SAMPL7/main/pKaDatabase/OChem/ochem0-
2000.csv"
data = Table.read_table(url)
data=data.sort('N')
data.show(5)
SMILES
CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAM
1
[O-]C1=C2C=C
C=CC2=NC=N
1
- R1207641
M20829 - - 4-
Hydroxyquinazoline - OC1=CC2=CN
=CN=C2C=C1 - R1207643
M1107327 - - 6-
hydroxyquinazoline - OC1=CC2=NC
=NC=C2C=C1 - R1207645
M1107328 - - 7-
hydroxyquinazoline - CC1=C2C=CC= - R1207650
M46729 - - 8-hydroxy-4-
-
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
SMILES
CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAM
1
C(O)C2=NC=N
1
methylquinazoline
[S-]c1ncc2cccc
c2[n+]1 - R1207652
M1158202 - - 2-
mercaptoquinazoline
- ... (1995 rows omitted)
Question 11.
Select an amino acid
¶
Use the Table above to view data for an amino acid of your selection from the 21 amino acids which are building blocks of protiens. See web page
for possible choices. Hint: use are.containing within the .where() Table method. For example below we can find compounds which contain a trimethyl group (3 CH$_3$) groups. We get 11 rows (records).
In [39]:
trimethyl = data.where("NAME",are.containing("trimethyl"))
trimethyl
Out[39]:
SMILES CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAME.
1
INT
CC1=NC
(C)=C(C
)C(=N1)
S([O-])
(=O)=O
- R1207394
M1158149 - - 2,4,5-trimethyl-6-
sulphopyrimidine - Koe
CC1=NC
(C)=C(C
)N=N1 - R1207401
M1158154 - - 3,5,6-trimethyl-
1,2,4-triazine - Koe
CC1=NC - R1207338
M1106856 - - 2-hydroxy-4,5,6-
- Koe
SMILES CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAME.
1
INT
([O-])=N
C(C)=C1
C trimethylpyrimidine Cc1nc([
O-])
[n+]c(C)
c1C - R1207339
M1158110 - - 2-hydroxy-4,5,6-
trimethylpyrimidine - Koe
CCC1=C
(C)N=N
C(C)=C1
C - R1207140
M1157998 - - 4-ethyl-3,5,6-
trimethylpyridazine - Koe
CC1=CC
(C)=C(C
)N=N1 - R1207154
M1158007 - - 3,4,6-
trimethylpyridazine - Koe
CNC(=N
)N(C)C - R1206678
M1106709 - - N,N',N'-
trimethylaguanidine - Koe
CNN(C)C - R1206476
M1016037 - - trimethylhydrazine - Koe
CC1=C(
C)C(C)=
NO1 - R1206399
M32590 - - Isoxazoline, 3,4,5-
trimethyl- - Koe
CC1(C)C
(CCC1(C
)C(O)=O
)C(O)=O
- R1203666
M6436 - - cyclopentan-1,3-
dicarboxlic acid-
1,2,2-trimethyl
- Koe
... (1 rows omitted)
In [40]:
amino = data.where("NAME",are.containing("lysine"))
amino
Out[40]:
SMILES
CASRN RECORDID MOLECULEID EXTERNALID N NAME
NAME.
1
INTRODUCER
M
[NH2+]C
CCCC([N
H2+])C(
O)=O
- R1203836
M1157713 - - lysine
- Koerner c
NC(CCCC
[NH2+])
C([O-])=
O - R1203837
M1157714 - - lysine
- Koerner c
NCCCCC(
N)C([O-])
=O - R1203838
M17151 - - lysine
- Koerner c
In [41]:
check('tests/q11.py')
Out[41]:
All tests passed!
Display molecular structure
¶
SMILES
is a shorthand language to describe molecular structure. Execute each structure below.
In [42]:
Chem.MolFromSmiles("[H]-O-[H]") #Water
Out[42]:
In [43]:
Chem.MolFromSmiles("[CH3]") #Methyl radical
Out[43]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
In [44]:
Chem.MolFromSmiles("C-C-O") #Ethanol
Out[44]:
In [45]:
Chem.MolFromSmiles("[NH2+]CC(O)=O") # Glycine
Out[45]:
Selected amino acid 2D molecular structure
¶
Try it out for fun! Use the same above syntax and the SMILES string from your above Table to display a 2D amino acid structurefrom your selection. Even if there is no rdkit, try your hand at the SMILES molecular description.
In [50]:
smile_struct = ("[NH2+]CCCCC([NH2+])C(O)=O")
Chem.MolFromSmiles(smile_struct)
Out[50]:
Code to create a grid of molecular images with labels
¶
Execute and study the below code
In [51]:
mols = [Chem.MolFromSmiles(x) for x in amino.column("SMILES") if x is not None] #Iterator
mols
name = amino.column("NAME")
for i,m in enumerate(mols):
m.SetProp("Name",name[i])
p = Draw.MolsToGridImage( [mols[x] for x in range(0,3)] , legends=[x.GetProp("Name") for x
in mols],molsPerRow=3,subImgSize=(300,250), useSVG=True )
p
Out[51]:
Use pandas to add 2D structures to dataframe
¶
We can convert our Table to pandas then use the RDKit AddMoleculeColumnToFrame method to add structures. One row has an anomolous nitrogen atom, N, so don't be alarmed by the error presented. Occasionally the 2D images of the structures fail to appear, unfortunate but not a cause for concern either.
In [52]:
df = data.to_df()
df
Out[52]:
SMILES
CASRN RECORDID MOLECULEID EXTERNALID
N
NA
0
[O-]C1=C2C=C
C=CC2=NC=N
1
-
R1207641
M20829
-
-
4-Hydroxyquinazo
1
OC1=CC2=CN
=CN=C2C=C1
-
R1207643
M1107327
-
-
6-hydroxyquinazo
2
OC1=CC2=NC
=NC=C2C=C1
-
R1207645
M1107328
-
-
7-hydroxyquinazo
3
CC1=C2C=CC
=C(O)C2=NC=
N1
-
R1207650
M46729
-
-
8-hydroxy-4-meth
4
[S-]c1ncc2cccc
c2[n+]1
-
R1207652
M1158202
-
-
2-mercaptoquina
...
...
...
...
...
...
...
...
SMILES
CASRN RECORDID MOLECULEID EXTERNALID
N
NA
1995
OC(=O)COC1=
CC=CC(=C1)
[N+]([O-])=O
1878-
88-2
R2182408
M896
-
99
m-Nitrophenoxya
1996
COC1=CC=CC(
OC)=C1C(=O)N
[C@H]1[C@H]2
SC(C)(C)[C@...
61-32-5 R1509608
M9792
-
99
6-({[2,6-
bis(methyloxy)ph
mino)-...
1997
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OC[C@@H]
(CO[P@@](O)
(=...
-
R1321912
M663928
-
99
1,2-Dioleoylphosp
1998
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OCC(COP(O)
(=O)OCCN)OC..
.
-
R1321798
M659539
-
99
1,2-
Dioleoylphosphat
e
1999
NC1=CC=C(N)
C(O)=C1N
-
R2172809
M2608463
-
nan -
2000 rows × 32 columns
In [53]:
df = data.to_df() # Convert Table to pandas dataframe
PandasTools.AddMoleculeColumnToFrame(df,smilesCol='SMILES',molCol='Molecule',includeFinger
prints=True)
col = df.pop("NAME")
df.insert(0, col.name, col) # Move structure to first column
col = df.pop("Molecule")
df.insert(1, col.name, col) # Move structure to first column
df
[00:51:55] Explicit valence for atom # 11 N, 4, is greater than permitted
Out[53]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
NAME
Molecule
SMILES
CASRN REC
0
4-Hydroxyquinazoline
[O-]C1=C2C=C
C=CC2=NC=N
1
-
R12
1
6-hydroxyquinazoline
OC1=CC2=CN
=CN=C2C=C1
-
R12
2
7-hydroxyquinazoline
OC1=CC2=NC
=NC=C2C=C1
-
R12
3
8-hydroxy-4-methylquinazoline
CC1=C2C=CC
=C(O)C2=NC=
N1
-
R12
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
NAME
Molecule
SMILES
CASRN REC
4
2-mercaptoquinazoline
[S-]c1ncc2cccc
c2[n+]1
-
R12
...
...
...
...
...
...
1995
m-Nitrophenoxyacetic Acid
OC(=O)COC1=
CC=CC(=C1)
[N+]([O-])=O
1878-
88-2
R21
1996
6-({[2,6-
bis(methyloxy)phenyl]carbonyl}a
mino)-...
COC1=CC=CC(
OC)=C1C(=O)N
[C@H]1[C@H]2
SC(C)(C)[C@...
61-32-5 R15
1997
1,2-Dioleoylphosphatidylethano
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OC[C@@H]
(CO[P@@](O)
(=...
-
R13
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
NAME
Molecule
SMILES
CASRN REC
1998
1,2-
Dioleoylphosphatidylethanolamin
e
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OCC(COP(O)
(=O)OCCN)OC..
.
-
R13
1999
-
NC1=CC=C(N)
C(O)=C1N
-
R21
2000 rows × 33 columns
pKa data examination
¶
Now we will look at a data set derived from the above data but with computed molecular
attributes for our machine learning. This data set is computed and described by Prof. Vince Voelz
in the Temple Chemistry department and a graduate student, Robert Raddi. See their paper: Stacking Gaussian processes to improve pKa predictions in the SAMPL7 challenge
.
In [54]:
db = pd.read_pickle("data/pKaDatabaseF22.pkl")
db_table = Table().from_df(db) # Datascience Table from pandas dataframe
db
Out[54]:
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1
par
cha
(dep
ato
0
methyclothiazide_mic
ro001
methyclothiazide_mic
ro000
C[N@]1[C@@H
]
([N-]c2cc(c(cc2
C[N@]1[C@@
H]
(Nc2cc(c(cc2S
-0.79657
-0.55
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1
par
cha
(dep
ato
S1(=O)=O)S(=
O)(=O)N...
1(=O)=O)S(=
O)(=O)N)Cl...
1
sulpiride_micro001
sulpiride_micro000
CC[N@]1CCC[C
@H]1C[N-]C(=
O)c2cc(ccc2OC
)S(=O)(=O)N
CC[N@]1CCC[
C@H]1CNC(=
O)c2cc(ccc2O
C)S(=O)
(=O)N
-0.53903
-0.59
2
celecoxib_micro001
celecoxib_micro000
Cc1ccc(cc1)c2c
c(nn2c3ccc(cc3
)S(=O)(=O)
[NH-])C(...
Cc1ccc(cc1)c2
cc(nn2c3ccc(c
c3)S(=O)
(=O)N)C(F)
(F)F
-1.02861
-1.31
3
metolazone_micro00
1
metolazone_micro00
0
Cc1ccccc1N2[C
@@H]
([N-]c3cc(c(cc3
C2=O)S(=O)
(=O)...
Cc1ccccc1N2[
C@@H]
(Nc3cc(c(cc3C
2=O)S(=O)
(=O)N)Cl)C
-0.74198
-0.55
4
polythiazide_micro00
1
polythiazide_micro00
0
C[N@]1[C@@H
]
([N-]c2cc(c(cc2
S1(=O)=O)S(=
O)(=O)N...
C[N@]1[C@@
H]
(Nc2cc(c(cc2S
1(=O)=O)S(=
O)(=O)N)Cl...
-0.79841
-0.56
...
...
...
...
...
...
...
137
sulfadimethoxine_mic
ro001
sulfadimethoxine_mic
ro000
COc1cc([N-]S(
=O)
(=O)c2ccc(N)cc
2)nc(OC)n1
N([H])
(c1nc(OC([H])
([H])
[H])nc(OC([H]
)([H])[H]...
-0.86987
-0.98
138
sulfamethoxydiazine_
micro001
sulfamethoxydiazine_
micro000
COc1cnc([N-]S(
=O)
(=O)c2ccc(N)cc
2)nc1
N([H])
(c1nc([H])c(O
C([H])([H])
[H])c([H])n1)S
(=...
-0.85693
-0.86
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1
par
cha
(dep
ato
139
sulfisomidine_micro0
01
sulfisomidine_micro0
00
Cc1cc([N-]S(=
O)
(=O)c2ccc(N)cc
2)nc(C)n1
N([H])
(c1nc(C([H])
([H])
[H])nc(C([H])
([H])[H])c...
-0.87755
-0.90
140
sulfamethazine_micr
o001
sulfamethazine_micr
o000
Cc1cc(C)nc([N-
]S(=O)
(=O)c2ccc(N)cc
2)n1
N([H])
(c1nc(C([H])
([H])
[H])c([H])c(C([
H])([H])...
-0.85927
-0.94
141
sulfapyridine_micro0
01
sulfapyridine_micro0
00
Nc1ccc(S(=O)
(=O)
[N-]c2ccccn2)c
c1
N([H])
(c1nc([H])c([H
])c([H])c1[H])
S(=O)
(=O)c1c...
-0.88207
-0.94
3456 rows × 34 columns
We can look at the structures and data on several derivatives of acetic acid by executing the code below
¶
In [55]:
glycine=db_table.where("protonated microstate ID",are.containing("acetic acid")) # Select those data containing acetic acid in the name.
mols = [Chem.MolFromSmiles(x) for x in glycine.column("protonated microstate smiles") if x
is not None]
pKa = glycine.column("pKa")
for i,m in enumerate(mols):
m.SetProp("Name","pKa: "+str(pKa[i]))
p = Draw.MolsToGridImage( [mols[x] for x in range(0,3)] , legends=[x.GetProp("Name") for x
in mols],molsPerRow=2,subImgSize=(300,250), useSVG=True )
p
Out[55]:
protonated microstate smiles
¶
Here we place the pKa which we will predict in the first column. Use SMILES format to display structures. Execute the below cells.
In [56]:
#PandasTools.AddMoleculeColumnToFrame(db,smilesCol='protonated microstate smiles',molCol='Molecule',includeFingerprints=True)
PandasTools.AddMoleculeColumnToFrame(db, smilesCol='protonated microstate smiles', molCol='protonated molecule')
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
PandasTools.AddMoleculeColumnToFrame(db, smilesCol='deprotonated microstate smiles', molCol='deprotonated molecule')
col = db.pop('protonated molecule')
db.insert(0, col.name, col)
col = db.pop('deprotonated molecule')
db.insert(1, col.name, col)
col = db.pop("pKa")
db.insert(0, col.name, col)
db=db.sort_values(by='pKa',ascending=False)
db.head()
Out[56]:
pKa
protonated molecule
deprotonated
molecule
deprotonated
microstate ID
protonated
microstate
ID
d
735
19.2
<rdkit.Chem.rdchem.Mol
object at 0x7f734d9136f0>
<rdkit.Chem.rdchem.Mol
object at 0x7f734d65ec00>
2-Methyl-2-
propanol_micro
001
2-Methyl-2-
propanol_micr
o000
C
1323
17.6
<rdkit.Chem.rdchem.Mol
object at 0x7f734d903a70>
<rdkit.Chem.rdchem.Mol
object at 0x7f734d652f80>
2-
butanol_micro0
01
2-
butanol_micro
000
C
[O
832
17.1
<rdkit.Chem.rdchem.Mol
object at 0x7f734d95e1f0>
<rdkit.Chem.rdchem.Mol
object at 0x7f734d5a5700>
2-
Propanol_micro
001
2-
Propanol_micr
o000
C
1200
16.6
<rdkit.Chem.rdchem.Mol
object at 0x7f734d9004a0>
<rdkit.Chem.rdchem.Mol
object at 0x7f734d59b920>
3-
methylindole_
micro001
3-
methylindole_
micro000
C
c
1473
16.4
<rdkit.Chem.rdchem.Mol
object at 0x7f734d661850>
<rdkit.Chem.rdchem.Mol
object at 0x7f734d5e4d60>
Mandelic acid_micro001
Mandelic acid_micro00
0
c
[C
[O
5 rows × 34 columns
Examine a few acidity and molecular weight distribution
¶
The below code will generate histograms for acidity as measured by pKa and molar molecular weight measured in grams per mole. Execute code and examine output.
In [57]:
fig = plt.figure()
ax = plt.subplot(2,2,1)
ax1 = plt.subplot(2,2,2)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
db.sort_values('Weight', ascending=True)
ax = db["Weight"].plot.hist(rot=0, figsize=(14, 4), bins=25, edgecolor='black', linewidth=1.2, ax=ax)
ax.set_xlabel("molecular weight", size=16)
ax.set_ylabel("", size=12)
ax.axvline(x=db['Weight'].mean(), linewidth=4, color='r')
ax1 = db["pKa"].plot.hist(rot=0, figsize=(14, 4), bins=25, edgecolor='black', linewidth=1.2, ax=ax1)#, subplots=True, layout=(2,2))
ax1.set_xlabel(r"$pK_{a}$", size=16)
ax1.set_ylabel("", size=12)
ax1.axvline(x=4, linewidth=4, color='r')
ax1.axvline(x=9, linewidth=4, color='r')
fig = ax1.get_figure()
fig.savefig("MW_dist.pdf")
Look at pKa and molecular attribute relationships
¶
Here we will plot some of the attributes to see if there is a relationship between their values and the pKa we are trying to predict. Execute these cells.
In [58]:
db.plot.scatter("Weight","pKa")
Out[58]:
<Axes: xlabel='Weight', ylabel='pKa'>
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
In [59]:
db.plot.scatter("AM1BCC partial charge (prot. atom)","pKa")
Out[59]:
<Axes: xlabel='AM1BCC partial charge (prot. atom)', ylabel='pKa'>
In [60]:
db.plot.scatter("Bond Order","pKa")
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Out[60]:
<Axes: xlabel='Bond Order', ylabel='pKa'>
Nearest Neighbor
¶
Let's restrict our consideration to acids with 0 < pKa < 7. The reason may be evident from the distribution in the histogram of pKa's above with two peaks, one around 4 and another at 8. The negative pKa's are outliers which also will be difficult to predict. For machine learning we will also drop the 2D molecular structures.
In [61]:
dblow = db[db["pKa"].values<7]
dblow = dblow[dblow["pKa"].values>0]
List attribute columns with index
In [62]:
for (i, item) in enumerate(list(dblow.columns)):
print(i, item)
0 pKa
1 protonated molecule
2 deprotonated molecule
3 deprotonated microstate ID
4 protonated microstate ID
5 deprotonated microstate smiles
6 protonated microstate smiles
7 AM1BCC partial charge (prot. atom)
8 AM1BCC partial charge (deprot. atom)
9 AM1BCC partial charge (prot. atoms 1 bond away)
10 AM1BCC partial charge (deprot. atoms 1 bond away)
11 AM1BCC partial charge (prot. atoms 2 bond away)
12 AM1BCC partial charge (deprot. atoms 2 bond away)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
13 Gasteiger partial charge (prot. atom)
14 Gasteiger partial charge (deprot. atom)
15 Gasteiger partial charge (prot. atoms 1 bond away)
16 Gasteiger partial charge (deprot. atoms 1 bond away)
17 Gasteiger partial charge (prot. atoms 2 bond away)
18 Gasteiger partial charge (deprot. atoms 2 bond away)
19 Extented Hückel partial charge (prot. atom)
20 Extented Hückel partial charge (deprot. atom)
21 Extented Hückel partial charge (prot. atoms 1 bond away)
22 Extented Hückel partial charge (deprot. atoms 1 bond away)
23 Extented Hückel partial charge (prot. atoms 2 bond away)
24 Extented Hückel partial charge (deprot. atoms 2 bond away)
25 ∆G_solv (kJ/mol) (prot-deprot)
26 SASA (Shrake)
27 SASA (Lee)
28 Bond Order
29 Change in Enthalpy (kJ/mol) (prot-deprot)
30 href
31 Weight
32 num ionizable groups
33 pKa source
Selection of attributes/features for training and prediction
¶
We need too select the features that we will use in the training. These will include the charges computed for key atoms adjacent to the acidic proton (H+) in columns (7-12) using the AM1BCC method, ∆G_solv (kJ/mol) (prot-deprot) in column 25,solvent accessible surface area (SASA) in column 26, bond order in column 28, Change in Enthalpy (kJ/mol) (prot-deprot) in column 29, and molecular weight in column 32. These are the 11 attributes features we will use. We also keep the labels and SMILES as well as the pKa we will train on.
In [63]:
molecular = Table().from_df(dblow) # Now back to Table
molecular = molecular.select(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 25, 26, 28, 29, 32)
molecular
Out[63]:
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.98
Fenpropimorph_micro
001 Fenpropimorph_micro
000 C[C@@H]1CN(
C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
C[C@@H]1C[N
H+](C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
-0.68511 -0
6.95 Imidazole_micro001 Imidazole_micro000 c1cnc[n-]1 c1cnc[nH]1 -0.32082 -0
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.95
3-methoxy-6-
mercaptopyridazine_
micro001 3-methoxy-6-
mercaptopyridazine_
micro000 COc1ccc(nn1)
[S-] COc1ccc(nn1)S -0.31734 -0
6.95
2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro001 2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro000 C[N@@]
(CCCc1ccccc1)
CCC#N C[N@@H+]
(CCCc1ccccc1)
CCC#N -0.6868 -0
6.95
1-
Methylimidazole_micr
o001 1-
Methylimidazole_micr
o000 Cn1ccnc1 Cn1cc[nH+]c1 -0.13447 -0
6.95
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro001
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro000
CCC(=O)C(CCN
1CCOCC1)
(c2ccccc2)c3cc
ccc3 CCC(=O)C(CC[
NH+]1CCOCC1)
(c2ccccc2)c3cc
ccc3 -0.68628 -0
6.95
2-bis(2-
chloroethyl)aminopro
pane_micro001 2-bis(2-
chloroethyl)aminopro
pane_micro000 CC(C)N(CCCl)C
CCl CC(C)[NH+]
(CCCl)CCCl -0.70654 -0
6.94
4-
mercaptopyrimidine_
micro001 4-
mercaptopyrimidine_
micro000 c1cncnc1[S-] c1cncnc1S -0.31642 -0
6.94
Barbituric acid_micro001 Barbituric acid_micro000 CC[C@]1(C(=O)
NC(=O)
[N-]C1=O)c2cc
c(cc2)[N+](=O)
[O-] CCC1(C(=O)NC
(=O)NC1=O)c2
ccc(cc2)[N+]
(=O)[O-] -0.59474 -0
6.92
2,3-
diaminobutane_micro
001 2,3-
diaminobutane_micro
000 C[C@@H]
([C@H](C)
[NH-])[NH3+] C[C@@H]
([C@H](C)N)
[NH3+] -0.92502 -0
... (2035 rows omitted)
Train, test split
¶
Question 12
¶
Split the molecular Table into train and test data using 80% for training and remembering that the split must be an integer using int() function. Again we will select
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
certain rowsas attributes.
In [64]:
train, test = molecular.split(int(molecular.num_rows*0.8))
print(train.num_rows, 'training and', test.num_rows, 'test instances.')
train.show(3)
1636 training and 409 test instances.
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1BC
partial
charge
(deprot
atom)
3.46
3-(3-cyano-6-
hydroxyphenyl)-
propenoic acid_micro001
3-(3-cyano-6-
hydroxyphenyl)-
propenoic acid_micro000
c1cc(c(cc1C#N
)C=CC(=O)
[O-])O c1cc(c(cc1C
#N)C=CC(=
O)O)O -0.60477 -0.82975
3.75
Cloquintocet-
mexyl_micro001 Cloquintocet-
mexyl_micro000 CCCCC[C@H]
(C)OC(=O)COc
1ccc(c2c1nccc
2)Cl
CCCCC[C@H
]
(C)OC(=O)C
Oc1ccc(c2c1
[nH+]ccc2)C
l
-0.07103 -0.62426
3.03
2-hydroxy-5-methyl-
benzoic acid_micro001 2-hydroxy-5-methyl-
benzoic acid_micro000 Cc1ccc(c(c1)C(
=O)[O-])O Cc1ccc(c(c1)
C(=O)O)O -0.6163 -0.82444
... (1633 rows omitted)
In [65]:
check('tests/q12.py')
Out[65]:
All tests passed!
Our k nearest neighbors code
¶
Remember our the k nearest neighbor code from above which wewill again use here.
def row_distance(row1, row2): """The distance between two rows of a table.""" return distance(np.array(row1), np.array(row2)) def distances(training, example, output): """Compute the distance from example for each row in training.""" dists = [] attributes = training.drop(output) for row in attributes.rows: dists.append(row_distance(row, example)) return training.with_column('Distance', dists)
def closest(training, example, k, output): """Return a table of the k closest neighbors to example.""" return distances(training, example, output).sort('Distance').take(np.arange(k))
Test algorithm
¶
Execute these cells to define the predict_nn function for pKa, pick an example row,
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
predict and compare.
In [66]:
def predict_nn(example):
"""Return the majority class among the k nearest neighbors."""
k = 10
return np.average(closest(train.drop(1,2,3,4), example, k, 'pKa').column('pKa'))
Examine 1 row in test set to try to predict
In [67]:
test.drop(1,2,3,4).take(100)
Out[67]:
pKa
AM1BCC
partial
charge
(prot.
atom)
AM1BCC
partial
charge
(deprot.
atom)
AM1BCC
partial
charge
(prot.
atoms 1
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 1
bond
away)
AM1BCC
partial
charge
(prot.
atoms 2
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 2
bond
away)
∆G_solv
(kJ/mol)
(prot-
deprot)
SASA
(Shrake
)
Bond
Order
3.49 -0.8211 -0.72665 0.14415 0.1976 -0.158735 -0.28565 -0.599491 10.5909 0.71504
In [68]:
# Look at closest in training set to test row, need to drop pKa from test
k = 10
closest(train.drop(1,2,3,4), test.drop(0,1,2,3,4).row(100), k, 'pKa').select('pKa','Distance')
Out[68]:
pKa
Distance
2.53 0.377212
2.64 0.398179
0.35 0.40699 2.98 0.425523
2.68 0.434558
1.15 0.455964
1.76 0.46183 2.472 0.536042
4.53 0.553159
4.82 0.586634
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
If we use test data in both cases we get exact match (Distance = 0) and no training, not machine learning but matching!
¶
In [69]:
closest(test.drop(1,2,3,4), test.drop(0,1,2,3,4).row(100), k, 'pKa').select('pKa','Distance')
Out[69]:
pKa Distance
3.49 0 4.97 0.576611
3.62 0.606185
5.5 1.02594 3.96 1.03533 3.41 1.21428 6.74 1.32101 5.34 1.3256 3.58 1.35756 4.73 1.37847 Histogram of experimental acidity to be predicted
¶
Question
¶
Make a histogram of acidity measured by pKa in the training data
In [70]:
train.hist('pKa')
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Question 13
Prediction time
¶
Now predict the pKa of the 10 molecule in the test_nn dataset using predict. We need to drop the experimental pKa and descriptors in the first 4 columns to create and example_nn_row with the attributes for the k nearest neighbor. Discuss the quality of the
fit and the name of the name of the molecule from column 1. Repeat for two more rows and discuss the prediction quality. Keep in mind that the prediction of pKa is a very challenging task for machine learning.
In [71]:
example_nn_row = test.drop(0,1,2,3,4).row(9)
example_nn_row
Out[71]:
Row(AM1BCC partial charge (prot. atom)=-0.61282002615431941, AM1BCC partial charge (deprot. atom)=-0.84631001713740472, AM1BCC partial charge (prot. atoms 1 bond away)=0.63823998284836614, AM1BCC partial charge (deprot. atoms 1 bond away)=0.91762000557911749, AM1BCC partial charge (prot. atoms 2 bond away)=-
0.32465500819186366, AM1BCC partial charge (deprot. atoms 2 bond away)=-
0.51761501046883707, ∆G_solv (kJ/mol) (prot-deprot)=307.16503297431643, SASA (Shrake)=17.487361346942222, Bond Order=0.58100399209836706, Change in Enthalpy (kJ/mol) (prot-deprot)=1.1443968508527818, num ionizable groups=1.0)
In [72]:
example_nn_row_table = test.drop(0,1,2,3,4).take(9) # For display and verification
example_nn_row_table
Out[72]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
AM1BCC
partial
charge
(prot.
atom)
AM1BCC
partial
charge
(deprot.
atom)
AM1BCC
partial
charge
(prot.
atoms 1
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 1
bond
away)
AM1BCC
partial
charge
(prot.
atoms 2
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 2
bond
away)
∆G_sol
v
(kJ/mol
) (prot-
deprot)
SASA
(Shrake
)
Bond
Order
Ch
Ent
(kJ
(p
de
-0.61282 -0.84631 0.63824 0.91762 -0.324655 -0.517615 307.165 17.4874 0.581004 1.14
In [73]:
predict_nn(example_nn_row)
Out[73]:
4.3054799999999993
In [74]:
print('Experimental pKa:', test.column('pKa').item(9))
print('Predicted pKa using nearest neighbors:', round(predict_nn(example_nn_row),2))
Experimental pKa: 4.72
Predicted pKa using nearest neighbors: 4.31
In [75]:
check('tests/q13.py')
Out[75]:
All tests passed!
Now let's plot knn prediction success
¶
Execute the next three cells
In [76]:
exp_pKa = make_array()
predict_pKA = make_array()
In [77]:
# This takes a while!
for i in np.arange(test.num_rows):
exp_pKa = np.append(exp_pKa,test.column('pKa').item(i))
example_nn_row = test.drop(0,1,2,3,4).row(i)
predict_pKA = np.append(predict_pKA,predict_nn(example_nn_row) )
In [78]:
plt.scatter(exp_pKa,predict_pKA)
#calculate equation for regression line
z = np.polyfit(exp_pKa,predict_pKA, 1)
p = np.poly1d(z)
#add trendline to plot
plt.plot(exp_pKa, p(exp_pKa),'blue',label="{}".format(p)) # Equation of line placed in legend from label
plt.xlabel("Experimental pKa")
plt.ylabel("Predicted pKa")
plt.legend(fontsize="small")
plt.show()
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Conclusions on our k nearest neighbor model
¶
Question 14
¶
Evaluate the overall quality of our machine learning prediction based on the above plot and your 3 predictions above.
Our machine learning prediction is good quality as there are strong correlations between
my predictions and the 3 predictions.
*** Now we will try a few values for k to try to optimize this hyperparameter. We need a new
version of predict_nn
that also has an argument of k.
In [79]:
def predict_knn(example,k):
"""Return the majority class among the k nearest neighbors."""
return np.average(closest(train.drop(1,2,3,4), example, k, 'pKa').column('pKa'))
In [80]:
for k in [5,7,10,15,20]:
exp_pKa = make_array()
predict_pKA = make_array()
for i in np.arange(test.num_rows):
exp_pKa = np.append(exp_pKa,test.column('pKa').item(i))
example_nn_row = test.drop(0,1,2,3,4).row(i)
predict_pKA = np.append(predict_pKA,predict_knn(example_nn_row,k) )
plt.scatter(exp_pKa,predict_pKA)
z = np.polyfit(exp_pKa,predict_pKA, 1)
p = np.poly1d(z)
plt.plot(exp_pKa, p(exp_pKa),'blue',label="{}".format(p)) # Equation of line placed in
legend from label
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
plt.xlabel("Experimental pKa")
plt.ylabel("Predicted pKa")
plt.title("k = "+str(k))
plt.legend(fontsize="small")
plt.show(5)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Question:
Which value of k
makes the best estimation?
In [81]:
k = 10
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
In [82]:
check('tests/q14.py')
Out[82]:
All tests passed!
Extra (advanced): Try to vary the set of parameters/attributes by using fewer attributes or where choices
exist such as using Gasteiger partial charges instead of AM1BCC or removing moolecular weight or other attributes.
¶
In [83]:
for (i, item) in enumerate(list(dblow.columns)): # List of attributes
print(i, item)
0 pKa
1 protonated molecule
2 deprotonated molecule
3 deprotonated microstate ID
4 protonated microstate ID
5 deprotonated microstate smiles
6 protonated microstate smiles
7 AM1BCC partial charge (prot. atom)
8 AM1BCC partial charge (deprot. atom)
9 AM1BCC partial charge (prot. atoms 1 bond away)
10 AM1BCC partial charge (deprot. atoms 1 bond away)
11 AM1BCC partial charge (prot. atoms 2 bond away)
12 AM1BCC partial charge (deprot. atoms 2 bond away)
13 Gasteiger partial charge (prot. atom)
14 Gasteiger partial charge (deprot. atom)
15 Gasteiger partial charge (prot. atoms 1 bond away)
16 Gasteiger partial charge (deprot. atoms 1 bond away)
17 Gasteiger partial charge (prot. atoms 2 bond away)
18 Gasteiger partial charge (deprot. atoms 2 bond away)
19 Extented Hückel partial charge (prot. atom)
20 Extented Hückel partial charge (deprot. atom)
21 Extented Hückel partial charge (prot. atoms 1 bond away)
22 Extented Hückel partial charge (deprot. atoms 1 bond away)
23 Extented Hückel partial charge (prot. atoms 2 bond away)
24 Extented Hückel partial charge (deprot. atoms 2 bond away)
25 ∆G_solv (kJ/mol) (prot-deprot)
26 SASA (Shrake)
27 SASA (Lee)
28 Bond Order
29 Change in Enthalpy (kJ/mol) (prot-deprot)
30 href
31 Weight
32 num ionizable groups
33 pKa source
Selection of attributes
In [84]:
molecular = Table().from_df(dblow) # Now back to Table
molecular=molecular.select(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 25, 26, 28, 29, 32) # Change these
molecular
Out[84]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.98
Fenpropimorph_micro
001 Fenpropimorph_micro
000 C[C@@H]1CN(
C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
C[C@@H]1C[N
H+](C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
-0.68511 -0
6.95 Imidazole_micro001 Imidazole_micro000 c1cnc[n-]1 c1cnc[nH]1 -0.32082 -0
6.95
3-methoxy-6-
mercaptopyridazine_
micro001 3-methoxy-6-
mercaptopyridazine_
micro000 COc1ccc(nn1)
[S-] COc1ccc(nn1)S -0.31734 -0
6.95
2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro001 2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro000 C[N@@]
(CCCc1ccccc1)
CCC#N C[N@@H+]
(CCCc1ccccc1)
CCC#N -0.6868 -0
6.95
1-
Methylimidazole_micr
o001 1-
Methylimidazole_micr
o000 Cn1ccnc1 Cn1cc[nH+]c1 -0.13447 -0
6.95
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro001
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro000
CCC(=O)C(CCN
1CCOCC1)
(c2ccccc2)c3cc
ccc3 CCC(=O)C(CC[
NH+]1CCOCC1)
(c2ccccc2)c3cc
ccc3 -0.68628 -0
6.95
2-bis(2-
chloroethyl)aminopro
pane_micro001 2-bis(2-
chloroethyl)aminopro
pane_micro000 CC(C)N(CCCl)C
CCl CC(C)[NH+]
(CCCl)CCCl -0.70654 -0
6.94
4-
mercaptopyrimidine_
micro001 4-
mercaptopyrimidine_
micro000 c1cncnc1[S-] c1cncnc1S -0.31642 -0
6.94
Barbituric acid_micro001 Barbituric acid_micro000 CC[C@]1(C(=O)
NC(=O)
[N-]C1=O)c2cc
c(cc2)[N+](=O)
[O-] CCC1(C(=O)NC
(=O)NC1=O)c2
ccc(cc2)[N+]
(=O)[O-] -0.59474 -0
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.92
2,3-
diaminobutane_micro
001 2,3-
diaminobutane_micro
000 C[C@@H]
([C@H](C)
[NH-])[NH3+] C[C@@H]
([C@H](C)N)
[NH3+] -0.92502 -0
... (2035 rows omitted)
Now test by copying appropriate code from above
In [85]:
exp_pKa = make_array()
predict_pKA = make_array()
In [86]:
# This takes a while!
for i in np.arange(test.num_rows):
exp_pKa = np.append(exp_pKa,test.column('pKa').item(i))
example_nn_row = test.drop(0,1,2,3,4).row(i)
predict_pKA = np.append(predict_pKA,predict_nn(example_nn_row) )
In [89]:
# Plot sns.regplot(x=predict_pKA,y=exp_pKa)
Out[89]:
<Axes: >
In [90]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
# Comments: Now demonstrate knn from scikit learn
¶
scikit learn is a standard and powerful machine learning library. Below is a demonstration for your information of the same machine learning task using scikit learn. Execute the below cells.
In [91]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
In [92]:
knn = KNeighborsRegressor(n_neighbors=15, weights='distance',p=1)
In [93]:
X = make_array()
attributes = train.drop('pKa',1,2,3,4)
for i in np.arange(attributes.num_rows):
X=np.append(X,np.array(attributes.row(i)))
X=X.reshape(attributes.num_rows,len(attributes))
In [94]:
y=train.column('pKa')
y
Out[94]:
array([ 3.46 , 3.75 , 3.03 , ..., 5.3 , 4.2 , 4.155])
In [95]:
knn.fit(X,y)
Out[95]:
KNeighborsRegressor(n_neighbors=15, p=1, weights='distance')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
KNeighborsRegressor
KNeighborsRegressor(n_neighbors=15, p=1, weights='distance')
Now test attributes
¶
In [96]:
attributes = test.drop('pKa',1,2,3,4)
Xtest = make_array()
for i in np.arange(attributes.num_rows):
Xtest=np.append(Xtest,np.array(attributes.row(i)))
Xtest=Xtest.reshape(attributes.num_rows,len(attributes))
In [97]:
ytest=test.column('pKa')
In [98]:
y_predicted = knn.predict(Xtest)
predict_nn = test.with_columns("pKa predict",y_predicted)
In [99]:
plt.scatter(ytest,y_predicted)
#calculate equation for regression line
z = np.polyfit(ytest,y_predicted, 1)
p = np.poly1d(z)
# Label with equation
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
print(p)
#add trendline to plot
plt.plot(ytest, p(ytest),'red',label="{}".format(p))
plt.legend(fontsize="small")
plt.show()
0.2769 x + 2.771
Return the coefficient of determination of the prediction.
¶
The coefficient of determination $R^2$ is defined as $$ (1-\frac{u}{v}) $$ u is the residual sum of squares ((y_true - y_pred)** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a $R^2 = 0.0 $
In [100]:
knn.score(Xtest, ytest) # knn score 0-1 Out[100]:
0.28932128481557273
Final fancy plotting of select molecules
¶
Question 15
¶
Use a part of a molecular name to see if it is included in the test data and then execute the code to examine structures. Structures will be default structures if rdkit is not available.
In [101]:
molecular_name = 'phenyl'
predict_nn.where("protonated microstate ID",are.containing(molecular_name))
Out[101]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated microstate
ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1B
parti
charg
(prot
atom
3.32 2-(2-bromophenyl)-2-
hydroxy-acetic acid_micro001 2-(2-bromophenyl)-2-
hydroxy-acetic acid_micro000 c1ccc(c(c1)
[C@H](C(=O)
[O-])O)Br c1ccc(c(c1)
[C@H]
(C(=O)O)O)
Br -0.5848
3.82 (4-
nitro)phenylhydrazine_mi
cro001 (4-
nitro)phenylhydrazine_mi
cro000 c1cc(ccc1N[NH
-])[N+](=O)[O-]
c1cc(ccc1NN
)[N+](=O)
[O-] -0.6586
4.2 b'4-[(3aS,4R,8aS,8bR)-2-
[(4-fluorophenyl)methyl]-
1,3-dio ...
b'4-[(3aS,4R,8aS,8bR)-2-
[(4-fluorophenyl)methyl]-
1,3-dio ...
c1cc(ccc1CN2C
(=O)
[C@H]3[C@@H
]4CCC[N@]4[C
@H]
([C@H]3C2=O)
c ...
c1cc(ccc1CN
2C(=O)
[C@H]3[C@
@H]4CCC[N
@]4[C@H]
([C@H]3C2=
O)c ...
-0.9484
2.5 2-
phenylthioaniline_micro0
01 2-
phenylthioaniline_micro0
00 c1ccc(cc1)Sc2c
cccc2[NH-] c1ccc(cc1)S
c2ccccc2N -0.8278
2.94 4-methoxy-5-
phenylpyrimidine_micro0
01 4-methoxy-5-
phenylpyrimidine_micro0
00 COc1c(cncn1)c
2ccccc2 COc1c(c[nH
+]cn1)c2ccc
cc2 -0.2925
4.545
beta,beta'-diphenyladipic
acid_micro001 beta,beta'-diphenyladipic
acid_micro000 c1ccc(cc1)
[C@@H]
(CC(=O)O)
[C@H](CC(=O)
[O-])c2ccccc2 c1ccc(cc1)
[C@@H]
(CC(=O)O)
[C@H]
(CC(=O)O)c
2ccccc2 -0.6154
4.58 beta-(ortho-
chlorophenyl)-propionic acid_micro001 beta-(ortho-
chlorophenyl)-propionic acid_micro000 c1ccc(c(c1)CCC
(=O)[O-])Cl c1ccc(c(c1)C
CC(=O)O)Cl -0.6105
5.05 (4-
bromo)phenylhydrazine_
micro001 (4-
bromo)phenylhydrazine_
micro000 c1cc(ccc1N[NH
-])Br c1cc(ccc1NN
)Br -0.6617
4.36 3-(3,5-dimethoxyphenyl)- 3-(3,5-dimethoxyphenyl)- COc1cc(cc(c1)
COc1cc(cc(c -0.6042
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated microstate
ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1B
parti
charg
(prot
atom
propenoic acid_micro001 propenoic acid_micro000 OC)C=CC(=O)
[O-] 1)OC)C=CC(
=O)O 4.15 1-(p-
ethoxycarbonylphenyl)-1-
methylhydrazine_micro0
01 1-(p-
ethoxycarbonylphenyl)-1-
methylhydrazine_micro0
00 CCOC(=O)c1cc
c(cc1)[N@](C)
[NH-] CCOC(=O)c1
ccc(cc1)
[N@](C)N -0.5762
... (38 rows omitted)
In [102]:
check('tests/q15.py')
Out[102]:
All tests passed!
In [103]:
glycine=predict_nn.where("protonated microstate ID",are.containing(molecular_name))
mols = [Chem.MolFromSmiles(x) for x in glycine.column("protonated microstate smiles") if x
is not None]
molde = [Chem.MolFromSmiles(x) for x in glycine.column("deprotonated microstate smiles") if x is not None]
mol = [None] * 2 * glycine.num_rows
mol[0::2]=mols
mol[1::2]=molde
label = [None] * 2 * glycine.num_rows
lpred = [None] * 2 * glycine.num_rows
exp = glycine.column("pKa")
pred = glycine.column("pKa predict")
label[0::2] = exp
label[1::2] = exp
lpred[0::2] = pred
lpred[1::2] = pred
for i,m in enumerate(mol):
m.SetProp("Name","pKa: "+str(np.round(label[i],2))+" knn: "+str(np.round(lpred[i],2)))
p = Draw.MolsToGridImage( [mol[x] for x in range(0,(2 * glycine.num_rows))] , legends=[x.GetProp("Name") for x in mol],molsPerRow=2,subImgSize=(300,250))
print("\t\tProtonated","\t\t\tDeprotonated")
p
Protonated Deprotonated
Out[103]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
All finished...
¶
Run checks and submit .html and .ipynb files after downloading.
In [105]:
# For your convenience, you can run this cell to run all the tests at once!
import glob
from gofer.ok import check
correct = 0
checks = [1,2,3,4,6,7,8,11,12,13,14,15]
total = len(checks)
for x in checks:
print('Testing question {}: '.format(str(x)))
g = check('tests/q{}.py'.format(str(x)))
if g.grade == 1.0:
print("Passed")
correct += 1
else:
print('Failed')
display(g)
print('Grade: {}'.format(str(correct/total)))
print("Nice work",Your_name)
import time;
localtime = time.asctime( time.localtime(time.time()) )
print("Submitted @ ", localtime)
Testing question 1: Passed
Testing question 2: Passed
Testing question 3: Passed
Testing question 4: Passed
Testing question 6: Passed
Testing question 7: Passed
Testing question 8: Passed
Testing question 11: Passed
Testing question 12: Passed
Testing question 13: Passed
Testing question 14: Passed
Testing question 15: Passed
Grade: 1.0
Nice work Sarmad
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Submitted @ Wed Apr 19 01:08:02 2023
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Documents
Recommended textbooks for you

Principles of Information Systems (MindTap Course...
Computer Science
ISBN:9781305971776
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning

Fundamentals of Information Systems
Computer Science
ISBN:9781337097536
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage

Information Technology Project Management
Computer Science
ISBN:9781337101356
Author:Kathy Schwalbe
Publisher:Cengage Learning

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole
Recommended textbooks for you
- Principles of Information Systems (MindTap Course...Computer ScienceISBN:9781305971776Author:Ralph Stair, George ReynoldsPublisher:Cengage LearningFundamentals of Information SystemsComputer ScienceISBN:9781337097536Author:Ralph Stair, George ReynoldsPublisher:Cengage LearningNp Ms Office 365/Excel 2016 I NtermedComputer ScienceISBN:9781337508841Author:CareyPublisher:Cengage
- Information Technology Project ManagementComputer ScienceISBN:9781337101356Author:Kathy SchwalbePublisher:Cengage LearningOperations Research : Applications and AlgorithmsComputer ScienceISBN:9780534380588Author:Wayne L. WinstonPublisher:Brooks Cole

Principles of Information Systems (MindTap Course...
Computer Science
ISBN:9781305971776
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning

Fundamentals of Information Systems
Computer Science
ISBN:9781337097536
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage

Information Technology Project Management
Computer Science
ISBN:9781337101356
Author:Kathy Schwalbe
Publisher:Cengage Learning

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole