Machine Learning and Chemistry
html
keyboard_arrow_up
School
Temple University *
*We aren’t endorsed by this school
Course
853
Subject
Computer Science
Date
Jan 9, 2024
Type
html
Pages
51
Uploaded by ProfFreedom5002
Machine Learning and prediction
¶
Elements of Data Science
In this laboratory we will use training data to predict outcomes. We will first test these ideas using our Old Faithful data again. Next we will look at data on the iris flower to classify iris' based on sepal width and length. In our culminating activity we will predict molecular acidity using data computed by Prof. Vince Voelz
in the Temple Chemistry department and a graduate student, Robert Raddi. See their paper: Stacking Gaussian processes to improve pKa predictions in the SAMPL7 challenge
.
In [1]:
Your_name = "Madlyn Anglin"
Learning from training data
¶
A key concept in machine learning is using a subset of a dataset to train an algorithm to make estimates on a separate set of test data. The quality of the machine learning and algorithm can be assesed based on the accuracy of the predictions made on test data. Many times there are also parameters sometimes termed hyper-parameters which can be optimized through an iterative approach on test or validation data. In practice a dataset is randomly split into training and test sets using sampling.
k nearest neighbor
¶
We will examine one machine learning algorithm in the laboratory, k nearest neighbor. Many of the concepts are applicable to the broad range of machine learning algorithms available.
In [2]:
## import statements
# These lines load the tests. from gofer.ok import check
import numpy as np
from datascience import *
import pandas as pd
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import warnings
warnings.simplefilter('ignore', UserWarning)
#from IPython.display import Image
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
# Fix for datascience collections Iterable
import collections as collections
import collections.abc as abc
collections.Iterable = abc.Iterable
!pip install jupyterquiz
from jupyterquiz import display_quiz
import json
from IPython.core.display import HTML
Requirement already satisfied: jupyterquiz in /opt/conda/lib/python3.10/site-packages (2.6.1)
k nearest neighbor regression
¶
We will use the k nearest neighbor algorithm to make predictions of wait time in minutes
following an eruption duration ofa given number of minutes (independent variable).
In [3]:
faithful = Table.read_table("data/faithful.csv")
faithful.scatter(0, 1, fit_line=True)
Question 1
¶
Use the datascience .split(n) Table method to split the dataset into 80% training and 20% test. The argument for .split(n) method,n, needs to be an integer. See datascience documentation
In [4]:
trainf, testf = faithful.split(int(faithful.num_rows * 0.8))
print(trainf.num_rows, 'training and', testf.num_rows, 'test instances.')
217 training and 55 test instances.
In [5]:
check('tests/q1.py')
Out[5]:
All tests passed!
Nearest neighbor concept
¶
The training examines the chreacteristics of k
nearest neighbors to the data point for which a prediction will be made. Nearness is measured using several different metrics
with Euclidean distance being a common one for numerical attributes.
Euclidean distance:
1-D $$ d(p,q) = \sqrt{(p-q)^{2}} $$
2-D $$ d(p,q) = \sqrt{(p_1-q_1)^{2}+(p_2-q_2)^{2}} $$
For multiple points (rows): 2-D $$ d(p,q) = \sum{{\sqrt{((p_1-q_1)^{2}+(p_2-
q_2)^{2}}}} $$
Try different attribute values in the following 2D Euclidean distance example code below to get a feel for the computation
¶
In [6]:
# Example code to compute an Euclidean distance between two 2-D points
d_p_q = np.sqrt(sum((make_array(2,3)-make_array(4,3))**2))
d_p_q
Out[6]:
2.0
To get values from Table row as an array as is done in row_distance. Note in the faithful data case we will only consider the duration column in nearest neighbor computation but
in examples below we will use a 2-D array of attributes with the iris data and a 10-D array in the chemistry and molecular acidity case.
In [7]:
f_array = np.array(faithful.row(0))
f_array
Out[7]:
array([ 3.6, 79. ])
A couple quick review questions about nearest neighbor below, select the best answer (multiple tries ok). Execute the below cell to reveal the self-check quiz.
¶
In [8]:
with open("questions.json", "r") as file:
questions=json.load(file) display_quiz(questions)
Question 2
¶
Define a function which is the Euclidean distance between two values. Use the last two example code cells above as inspiration. This is where we will compute the distance between two duration
values.
In [11]:
def distance(pt1, pt2):
"""The distance between two points, represented as arrays."""
return np.sqrt(sum((pt1 - pt2)**2))
In [12]:
check('tests/q2.py')
Out[12]:
All tests passed!
Rest of the nearest neighbor algorithm
¶
Execute these cells to create the complete algorithm
In [13]:
def row_distance(row1, row2):
"""The distance between two rows of a table."""
return distance(np.array(row1), np.array(row2)) # Need to convert rows into arrays
def distances(training, example, output):
"""Compute the distance from example for each row in training."""
dists = []
attributes = training.drop(output)
for row in attributes.rows:
dists.append(row_distance(row, example))
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
return training.with_column('Distance', dists)
def closest(training, example, k, output):
"""Return a table of the k closest neighbors to example."""
return distances(training, example, output).sort('Distance').take(np.arange(k))
Question 3
¶
Take an example row from the test data (testf), drop the prediction column and use the closest function to see the top 10 closest points to the target in the training data.
In [14]:
example_row = testf.row(1)
example_row # This should display data contained in selected row in testf table.
Out[14]:
Row(duration=3.7669999999999999, wait=83.0)
In [15]:
k = 10 # Number of nearest neighbors
closest(testf,example_row,k,'wait')
Out[15]:
duration wait Distance
5.033 77 77.9773 4.9 82 78.1082 4.817 77 78.1901 4.8 76 78.2068 4.767 78 78.2394 4.716 90 78.2898 4.7 84 78.3056 4.7 83 78.3056 4.7 88 78.3056 4.7 73 78.3056 In [16]:
check('tests/q3.py')
Out[16]:
All tests passed!
Question 4
¶
Predict the value for this row using the defined predict_nn function below and compare to the value reported for wait in the test data. How do they compare?
In [17]:
def predict_nn(example):
"""Return the majority class among the k nearest neighbors."""
k = 10
return np.average(closest(trainf, example, k , 'wait').column('wait'))
In [18]:
predictionf = predict_nn(example_row) # This is the value predicted for wait using the average of the k nearest neighbors in the test set
actual = example_row
print(predictionf,actual)
85.8 Row(duration=3.7669999999999999, wait=83.0)
The prediction value and the reported value for wait in the test data are somewhat close,
but not perfect. In [19]:
check('tests/q4.py')
Out[19]:
All tests passed!
Question 5
Predictions
¶
Now we will make predictions for the whole data set using the apply Table method. We will then look at the root mean squared error (RMSE) for the nearest neighbor fit and a scatter plot. Try adjusting the value of k in the predict_nn function to see it's effect on the quality of fit by rerunning these cells. Are the predicted points in a perfect
straight line, why or why not?
In [20]:
testf = testf.with_columns("predict",testf.apply(predict_nn,"duration"))
nn_test_predictions = testf.column("predict")
test_wait = testf.column("wait")
rmse_nn = np.mean((test_wait - nn_test_predictions) ** 2) ** 0.5
print('Test set RMSE for nearest neighbor regression:', round(rmse_nn,2))
Test set RMSE for nearest neighbor regression: 6.81
In [21]:
testf.scatter("duration")
The points are not in a straight line. It looks like there is a slight trend but there are still two clusters of data points. Classify iris data with machine learning
¶
Next we will take on the problem of classifying iris data into three categories, setosa, versicolor, and virginica. Here we will also learn the basics of the k nearest neighbor algorithm.
The first data set we will look at consists of 50 samples from three species of Iris (Iris Setosa, Iris virginica, and Iris versicolor). Four features were measured including the length and the width of the sepals and petals, in centimeters for each observation. Iris stainglass, J.R. Smith
In [22]:
n_neighbors = 15
# Load iris data
iris = datasets.load_iris()
# We only take the first two features. iris_table = Table().with_columns("Name",iris.target,iris.feature_names[0],iris.data[:,0],iris.feature_
names[1],iris.data[:,1])
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
iris_table
Out[22]:
Name
sepal length (cm)
sepal width (cm)
0 5.1 3.5 0 4.9 3 0 4.7 3.2 0 4.6 3.1 0 5 3.6 0 5.4 3.9 0 4.6 3.4 0 5 3.4 0 4.4 2.9 0 4.9 3.1 ... (140 rows omitted)
In [23]:
iris.target_names
Out[23]:
array(['setosa', 'versicolor', 'virginica'],
dtype='<U10')
Question 6
¶
Train and test split the iris_table @ 80% as above.
In [24]:
train_i, test_i = iris_table.split(int(iris_table.num_rows * 0.8))
print(train_i.num_rows, 'training and', test_i.num_rows, 'test instances.')
120 training and 30 test instances.
In [25]:
check('tests/q6.py')
Out[25]:
All tests passed!
Question 7
¶
With classification we need to use training data to decide how to classify data given a set
of attributes, sepal length and sepal width in this case. Create a function which returns the majority classification among three possibilities in "Name" coded as 0, 1, 2 (setosa, versicolor, and virginica respectively). The and
below combines two conditionals. For example, (twos > ones) and ...
In [26]:
def majority(topkclasses):
twos = topkclasses.where('Name', are.equal_to(2)).num_rows
ones = topkclasses.where('Name', are.equal_to(1)).num_rows
zeros = topkclasses.where('Name', are.equal_to(0)).num_rows
# Now test to see what the majority name for each k class
if twos and ones:
return 2
elif ones and zeros:
return 1
else:
return 0
In [27]:
check('tests/q7.py')
Out[27]:
All tests passed!
In [28]:
def classify(training, new_point, k):
closestk = closest(training, new_point, k,"Name")
topkclasses = closestk.select('Name')
return majority(topkclasses)
In [29]:
test_row = 21
print("Prediction: ",classify(train_i,example_row,test_row)," Actual: ",test_i.select("Name").row(test_row))
Prediction: 0 Actual: Row(Name=0)
In [30]:
def predict(train, test_attributes, k):
pred = []
for i in np.arange(test_attributes.num_rows):
pred.append(classify(train,test_attributes.row(i),k))
return pred
Question 8
¶
Make a new table called prediction which includes original columns of test Table but also includes a "predict" column.
In [31]:
k = 10
prediction = test_i.with_columns("predict",predict(train_i, test_i.drop('Name'), k))
prediction.show(30)
Name
sepal length (cm)
sepal width (cm)
predict
0 4.8 3.4 0 1 5.6 2.7 2 0 4.9 3.1 0 2 6 3 2 1 6.4 2.9 2
Name
sepal length (cm)
sepal width (cm)
predict
1 6.1 3 2 0 5.5 3.5 0 1 5.9 3.2 2 1 5.7 2.6 2 0 5.4 3.4 1 1 5.6 2.5 2 2 5.9 3 2 2 7.7 2.8 2 0 4.8 3.1 0 2 6.4 2.8 2 1 5.7 2.9 2 0 5.4 3.9 0 2 6.5 3 2 1 5.5 2.4 2 0 5 3 1 2 6.7 3.3 2 0 5 3.2 0 0 4.7 3.2 0 2 6.3 2.5 2 2 7.4 2.8 2 0 5 3.3 0 1 6.9 3.1 2
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Name
sepal length (cm)
sepal width (cm)
predict
2 7.9 3.8 2 1 5.2 2.7 2 0 4.3 3 0 In [32]:
check('tests/q8.py')
Out[32]:
All tests passed!
Plot decision outcomes for test set
¶
Question 9
¶
Use above prediction Table to make a scatter plot of the color coded predictions based on the tweo attributes(use colors="predict" in scatter plot after specifying x and y axis based on attributes).
In [34]:
prediction.drop("Name").scatter('sepal length (cm)','sepal width (cm)', colors, group = 2)
colors = "predict"
Fancy plot showing color coded decision boundaries
¶
We can make a more informative plot by predicting on a grid of attribute values as shown below. Seaborn is an add-on to the Matplotlib plotting we have been using which provides more control of plotting. Execute (this may take a minute+) and study the below input and resulting output for your information.
In [35]:
def make_colors(iris, y, cmap):
colors = []
cdict = {'setosa':0, 'virginica':2, 'versicolor':1}
for x in iris.target_names[y]:
colors.append(cmap[cdict[x]])
return colors
In [36]:
import seaborn as sns
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
h = .1 # step size in the mesh
k = 10
x_min, x_max = iris.data[:, 0].min() - 1, iris.data[:, 0].max() + 1
y_min, y_max = iris.data[:, 1].min() - 1, iris.data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
## Create a grid of predictions in a Table
attribute_grid = Table().with_columns(iris.feature_names[0],np.c_[xx.ravel(), yy.ravel()]
[:,0],iris.feature_names[1],np.c_[xx.ravel(), yy.ravel()][:,1])
Z = np.array(predict(train_i,attribute_grid,k))
# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=cmap_light)
# Plot the test points but convert to numpy arrays
predictions = prediction.column('predict')
attribute1 = prediction.column(1)
attribute2 = prediction.column(2)
plt.scatter(x=attribute1, y=attribute2, c = make_colors(iris, predictions, cmap_bold), alpha=1.0, edgecolor="black")
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = %i')"
% (k))
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
Out[36]:
Text(0, 0.5, 'sepal width (cm)')
Use scikit learn
¶
Scikit learn
is a standard state of the art machine learning library. For demonstration purposes execute the below commands to classify and generate a comparable output.
In [37]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
In [38]:
clf = neighbors.KNeighborsClassifier(k) # Initiate the classifier
x_train, x_test, y_train, y_test = train_test_split(iris.data[:,:2], iris.target, random_state=22) # scikit split
# Now fit
clf.fit(x_train, y_train)
Out[38]:
KNeighborsClassifier(n_neighbors=10)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
KNeighborsClassifier
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
KNeighborsClassifier(n_neighbors=10)
In [39]:
import seaborn as sns
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
h = .1 # step size in the mesh
x_min, x_max = iris.data[:, 0].min() - 1, iris.data[:, 0].max() + 1
y_min, y_max = iris.data[:, 1].min() - 1, iris.data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=cmap_light)
# Plot also the training points
y = y_test
plt.scatter(x=x_test[:, 0], y=x_test[:, 1], c = make_colors(iris, y, cmap_bold),
alpha=1.0, edgecolor="black")
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = %i')"
% (k))
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
Out[39]:
Text(0, 0.5, 'sepal width (cm)')
Question 10
¶
Comment on the quality of the predictions by 1. Your nearest neighbor algorithm 2. scikit learn 3. Comparison The nearest neighbor algorithm quality is good because when looking at the graph each color lines up with its classification. The scikit learn quality is also good when compared to the data. Both are good quality but the scikit learn is more accurate than the neighbor algorithm.
¶
Molecules and predicting acidity measured by pKa
¶
Within the Jupyter notebook we can also analyze molecules and their molecular data using the library RDKit. RDKit adds the ability to visualize 2D and 3D molecular structures. We can apply many of the data science tools we have learned to molecular data as well. First we will briefly look at acid-base chemistry and how acidity is defined. pH is a measure of the acidity of a water-based (aqueous) solution. A pH of 1 is acidic, a pH of 7 is neutral and a pH of 14 is basic. Next we will use some computed atributes of a large set of molecules to train a k nearest neighbor model to predict acidity. We will use a range of attributes including the partial charges on atoms adjacent to the acidic proton, molecular weight, solvent accessible surface area (SASA), carbon-oxygen bond order, and some thermochemistry measures all of which may help predict acidity with a lower
pKa indicating a stronger (weak) acid.
Acid-base and pKa background
¶
A very brief background on acid - base equilibria demonstrated for glycine. See OpenStax Chemistry
for details based on interest. RDKit
¶
RDKit
is a specialized library to handle the complexities of molecules within Python.
In [40]:
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole #Needed to show molecules
from rdkit.Chem.Draw.MolDrawing import MolDrawing, DrawingOptions #Only needed if modifying defaults
from rdkit.Chem import rdRGroupDecomposition
from rdkit.Chem import rdDepictor
from rdkit.Chem import PandasTools
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit import DataStructs
# Options
DrawingOptions.bondLineWidth=1.8
rd =True
Load detailed molecular data for 2000 molecules
¶
In [41]:
url = "https://raw.githubusercontent.com/robraddi/GP-SAMPL7/main/pKaDatabase/OChem/ochem0-
2000.csv"
data = Table.read_table(url)
data=data.sort('N')
data.show(5)
SMILES
CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAM
1
[O-]C1=C2C=C
C=CC2=NC=N
1
- R1207641
M20829 - - 4-
Hydroxyquinazoline - OC1=CC2=CN
=CN=C2C=C1 - R1207643
M1107327 - - 6-
hydroxyquinazoline - OC1=CC2=NC
=NC=C2C=C1 - R1207645
M1107328 - - 7-
hydroxyquinazoline - CC1=C2C=CC=
C(O)C2=NC=N
- R1207650
M46729 - - 8-hydroxy-4-
-
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
SMILES
CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAM
1
1
methylquinazoline
[S-]c1ncc2cccc
c2[n+]1 - R1207652
M1158202 - - 2-
mercaptoquinazoline
- ... (1995 rows omitted)
Question 11.
Select an amino acid
¶
Use the Table above to view data for an amino acid of your selection from the 21 amino acids which are building blocks of protiens. See web page
for possible choices. Hint: use are.containing within the .where() Table method. For example below we can find compounds which contain a trimethyl group (3 CH$_3$) groups. We get 11 rows (records).
In [46]:
trimethyl = data.where("NAME",are.containing("trimethyl"))
trimethyl
Out[46]:
SMILES CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAME.
1
INT
CC1=NC
(C)=C(C
)C(=N1)
S([O-])
(=O)=O
- R1207394
M1158149 - - 2,4,5-trimethyl-6-
sulphopyrimidine - Koe
CC1=NC
(C)=C(C
)N=N1 - R1207401
M1158154 - - 3,5,6-trimethyl-
1,2,4-triazine - Koe
CC1=NC
([O-])=N
- R1207338
M1106856 - - 2-hydroxy-4,5,6-
- Koe
SMILES CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAME.
1
INT
C(C)=C1
C trimethylpyrimidine Cc1nc([
O-])
[n+]c(C)
c1C - R1207339
M1158110 - - 2-hydroxy-4,5,6-
trimethylpyrimidine - Koe
CCC1=C
(C)N=N
C(C)=C1
C - R1207140
M1157998 - - 4-ethyl-3,5,6-
trimethylpyridazine - Koe
CC1=CC
(C)=C(C
)N=N1 - R1207154
M1158007 - - 3,4,6-
trimethylpyridazine - Koe
CNC(=N
)N(C)C - R1206678
M1106709 - - N,N',N'-
trimethylaguanidine - Koe
CNN(C)C - R1206476
M1016037 - - trimethylhydrazine - Koe
CC1=C(
C)C(C)=
NO1 - R1206399
M32590 - - Isoxazoline, 3,4,5-
trimethyl- - Koe
CC1(C)C
(CCC1(C
)C(O)=O
)C(O)=O
- R1203666
M6436 - - cyclopentan-1,3-
dicarboxlic acid-
1,2,2-trimethyl
- Koe
... (1 rows omitted)
In [47]:
amino = data.where("NAME",are.containing("phenylalanine"))
amino
Out[47]:
SMILES CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAME.
1
[NH2+]C
(CC1=C
C=CC=
C1F)C(O
)=O - R1204234
M1157740 - - o-Fluorophenylalanine - [NH2+]C
(CC1=C
C(F)=CC
=C1)C(O
)=O - R1204236
M1157741 - - m-Fluorophenylalanine - [NH2+]C
(CC1=C
C=C(F)C
=C1)C(O
)=O - R1204238
M1157742 - - p=fluorophenylalanine - [NH2+]C
(CC1=C
C=CC=
C1Cl)C(
O)=O - R1204240
M1157743 - - O-Chlorophenylalanine - [NH2+]C
(CC1=C
C(Cl)=C
C=C1)C(
O)=O - R1204242
M1157744 - - m-chlorophenylalanine - [NH2+]C
(CC1=C
C=C(O)
C(O)=C1
)C(O)=O
- R1204248
M1157747 - - 3,4-
Dihydroxyphenylalanin
e
- [NH2+]C
(CC1=C
C=C(O)
- R1204249
M1157747 - - 3,4-
Dihydroxyphenylalanin
e
-
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
SMILES CASRN RECORDID MOLECULEID EXTERNALID N
NAME
NAME.
1
C(O)=C1
)C([O-])
=O
NC(CC1
=CC=C(
[O-])C(O
)=C1)C([
O-])=O - R1204250
M12590 - - 3,4-
Dihydroxyphenylalanin
e
- In [48]:
check('tests/q11.py')
Out[48]:
All tests passed!
Display molecular structure
¶
SMILES
is a shorthand language to describe molecular structure. Execute each structure below.
In [49]:
Chem.MolFromSmiles("[H]-O-[H]") #Water
Out[49]:
In [50]:
Chem.MolFromSmiles("[CH3]") #Methyl radical
Out[50]:
In [51]:
Chem.MolFromSmiles("C-C-O") #Ethanol
Out[51]:
In [52]:
Chem.MolFromSmiles("[NH2+]CC(O)=O") # Glycine
Out[52]:
Selected amino acid 2D molecular structure
¶
Try it out for fun! Use the same above syntax and the SMILES string from your above Table to display a 2D amino acid structurefrom your selection. Even if there is no rdkit, try your hand at the SMILES molecular description.
In [53]:
smile_struct = '[NH2+]C(CC1=CC=CC=C1Cl)C(O)=O'
Chem.MolFromSmiles(smile_struct)
Out[53]:
Code to create a grid of molecular images with labels
¶
Execute and study the below code
In [54]:
mols = [Chem.MolFromSmiles(x) for x in amino.column("SMILES") if x is not None] #Iterator
mols
name = amino.column("NAME")
for i,m in enumerate(mols):
m.SetProp("Name",name[i])
p = Draw.MolsToGridImage( [mols[x] for x in range(0,3)] , legends=[x.GetProp("Name") for x
in mols],molsPerRow=3,subImgSize=(300,250), useSVG=True )
p
Out[54]:
Use pandas to add 2D structures to dataframe
¶
We can convert our Table to pandas then use the RDKit AddMoleculeColumnToFrame method to add structures. One row has an anomolous nitrogen atom, N, so don't be alarmed by the error presented. Occasionally the 2D images of the structures fail to appear, unfortunate but not a cause for concern either.
In [55]:
df = data.to_df()
df
Out[55]:
SMILES
CASRN RECORDID MOLECULEID EXTERNALID
N
NA
0
[O-]C1=C2C=C
C=CC2=NC=N
1
-
R1207641
M20829
-
-
4-Hydroxyquinazo
1
OC1=CC2=CN
=CN=C2C=C1
-
R1207643
M1107327
-
-
6-hydroxyquinazo
2
OC1=CC2=NC
=NC=C2C=C1
-
R1207645
M1107328
-
-
7-hydroxyquinazo
3
CC1=C2C=CC
=C(O)C2=NC=
N1
-
R1207650
M46729
-
-
8-hydroxy-4-meth
4
[S-]c1ncc2cccc
c2[n+]1
-
R1207652
M1158202
-
-
2-mercaptoquina
...
...
...
...
...
...
...
...
1995
OC(=O)COC1=
CC=CC(=C1)
[N+]([O-])=O
1878-
88-2
R2182408
M896
-
99
m-Nitrophenoxya
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
SMILES
CASRN RECORDID MOLECULEID EXTERNALID
N
NA
1996
COC1=CC=CC(
OC)=C1C(=O)N
[C@H]1[C@H]2
SC(C)(C)[C@...
61-32-5 R1509608
M9792
-
99
6-({[2,6-
bis(methyloxy)ph
mino)-...
1997
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OC[C@@H]
(CO[P@@](O)
(=...
-
R1321912
M663928
-
99
1,2-Dioleoylphosp
1998
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OCC(COP(O)
(=O)OCCN)OC..
.
-
R1321798
M659539
-
99
1,2-
Dioleoylphosphat
e
1999
NC1=CC=C(N)
C(O)=C1N
-
R2172809
M2608463
-
nan -
2000 rows × 32 columns
In [56]:
df = data.to_df() # Convert Table to pandas dataframe
PandasTools.AddMoleculeColumnToFrame(df,smilesCol='SMILES',molCol='Molecule',includeFinger
prints=True)
col = df.pop("NAME")
df.insert(0, col.name, col) # Move structure to first column
col = df.pop("Molecule")
df.insert(1, col.name, col) # Move structure to first column
df
[02:03:33] Explicit valence for atom # 11 N, 4, is greater than permitted
Out[56]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
NAME
Molecule
SMILES
CASRN REC
0
4-Hydroxyquinazoline
[O-]C1=C2C=C
C=CC2=NC=N
1
-
R12
1
6-hydroxyquinazoline
OC1=CC2=CN
=CN=C2C=C1
-
R12
2
7-hydroxyquinazoline
OC1=CC2=NC
=NC=C2C=C1
-
R12
3
8-hydroxy-4-methylquinazoline
CC1=C2C=CC
=C(O)C2=NC=
N1
-
R12
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
NAME
Molecule
SMILES
CASRN REC
4
2-mercaptoquinazoline
[S-]c1ncc2cccc
c2[n+]1
-
R12
...
...
...
...
...
...
1995
m-Nitrophenoxyacetic Acid
OC(=O)COC1=
CC=CC(=C1)
[N+]([O-])=O
1878-
88-2
R21
1996
6-({[2,6-
bis(methyloxy)phenyl]carbonyl}a
mino)-...
COC1=CC=CC(
OC)=C1C(=O)N
[C@H]1[C@H]2
SC(C)(C)[C@...
61-32-5 R15
1997
1,2-Dioleoylphosphatidylethano
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OC[C@@H]
(CO[P@@](O)
(=...
-
R13
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
NAME
Molecule
SMILES
CASRN REC
1998
1,2-
Dioleoylphosphatidylethanolamin
e
CCCCCCCC\
C=C\
CCCCCCCC(=O
)OCC(COP(O)
(=O)OCCN)OC..
.
-
R13
1999
-
NC1=CC=C(N)
C(O)=C1N
-
R21
2000 rows × 33 columns
pKa data examination
¶
Now we will look at a data set derived from the above data but with computed molecular
attributes for our machine learning. This data set is computed and described by Prof. Vince Voelz
in the Temple Chemistry department and a graduate student, Robert Raddi. See their paper: Stacking Gaussian processes to improve pKa predictions in the SAMPL7 challenge
.
In [57]:
db = pd.read_pickle("data/pKaDatabaseF22.pkl")
db_table = Table().from_df(db) # Datascience Table from pandas dataframe
db
Out[57]:
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1
par
cha
(dep
ato
0
methyclothiazide_mic
ro001
methyclothiazide_mic
ro000
C[N@]1[C@@H
]
([N-]c2cc(c(cc2
C[N@]1[C@@
H]
(Nc2cc(c(cc2S
-0.79657
-0.55
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1
par
cha
(dep
ato
S1(=O)=O)S(=
O)(=O)N...
1(=O)=O)S(=
O)(=O)N)Cl...
1
sulpiride_micro001
sulpiride_micro000
CC[N@]1CCC[C
@H]1C[N-]C(=
O)c2cc(ccc2OC
)S(=O)(=O)N
CC[N@]1CCC[
C@H]1CNC(=
O)c2cc(ccc2O
C)S(=O)
(=O)N
-0.53903
-0.59
2
celecoxib_micro001
celecoxib_micro000
Cc1ccc(cc1)c2c
c(nn2c3ccc(cc3
)S(=O)(=O)
[NH-])C(...
Cc1ccc(cc1)c2
cc(nn2c3ccc(c
c3)S(=O)
(=O)N)C(F)
(F)F
-1.02861
-1.31
3
metolazone_micro00
1
metolazone_micro00
0
Cc1ccccc1N2[C
@@H]
([N-]c3cc(c(cc3
C2=O)S(=O)
(=O)...
Cc1ccccc1N2[
C@@H]
(Nc3cc(c(cc3C
2=O)S(=O)
(=O)N)Cl)C
-0.74198
-0.55
4
polythiazide_micro00
1
polythiazide_micro00
0
C[N@]1[C@@H
]
([N-]c2cc(c(cc2
S1(=O)=O)S(=
O)(=O)N...
C[N@]1[C@@
H]
(Nc2cc(c(cc2S
1(=O)=O)S(=
O)(=O)N)Cl...
-0.79841
-0.56
...
...
...
...
...
...
...
137
sulfadimethoxine_mic
ro001
sulfadimethoxine_mic
ro000
COc1cc([N-]S(
=O)
(=O)c2ccc(N)cc
2)nc(OC)n1
N([H])
(c1nc(OC([H])
([H])
[H])nc(OC([H]
)([H])[H]...
-0.86987
-0.98
138
sulfamethoxydiazine_
micro001
sulfamethoxydiazine_
micro000
COc1cnc([N-]S(
=O)
(=O)c2ccc(N)cc
2)nc1
N([H])
(c1nc([H])c(O
C([H])([H])
[H])c([H])n1)S
(=...
-0.85693
-0.86
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1
par
cha
(dep
ato
139
sulfisomidine_micro0
01
sulfisomidine_micro0
00
Cc1cc([N-]S(=
O)
(=O)c2ccc(N)cc
2)nc(C)n1
N([H])
(c1nc(C([H])
([H])
[H])nc(C([H])
([H])[H])c...
-0.87755
-0.90
140
sulfamethazine_micr
o001
sulfamethazine_micr
o000
Cc1cc(C)nc([N-
]S(=O)
(=O)c2ccc(N)cc
2)n1
N([H])
(c1nc(C([H])
([H])
[H])c([H])c(C([
H])([H])...
-0.85927
-0.94
141
sulfapyridine_micro0
01
sulfapyridine_micro0
00
Nc1ccc(S(=O)
(=O)
[N-]c2ccccn2)c
c1
N([H])
(c1nc([H])c([H
])c([H])c1[H])
S(=O)
(=O)c1c...
-0.88207
-0.94
3456 rows × 34 columns
We can look at the structures and data on several derivatives of acetic acid by executing the code below
¶
In [58]:
glycine=db_table.where("protonated microstate ID",are.containing("acetic acid")) # Select those data containing acetic acid in the name.
mols = [Chem.MolFromSmiles(x) for x in glycine.column("protonated microstate smiles") if x
is not None]
pKa = glycine.column("pKa")
for i,m in enumerate(mols):
m.SetProp("Name","pKa: "+str(pKa[i]))
p = Draw.MolsToGridImage( [mols[x] for x in range(0,3)] , legends=[x.GetProp("Name") for x
in mols],molsPerRow=2,subImgSize=(300,250), useSVG=True )
p
Out[58]:
protonated microstate smiles
¶
Here we place the pKa which we will predict in the first column. Use SMILES format to display structures. Execute the below cells.
In [59]:
#PandasTools.AddMoleculeColumnToFrame(db,smilesCol='protonated microstate smiles',molCol='Molecule',includeFingerprints=True)
PandasTools.AddMoleculeColumnToFrame(db, smilesCol='protonated microstate smiles', molCol='protonated molecule')
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
PandasTools.AddMoleculeColumnToFrame(db, smilesCol='deprotonated microstate smiles', molCol='deprotonated molecule')
col = db.pop('protonated molecule')
db.insert(0, col.name, col)
col = db.pop('deprotonated molecule')
db.insert(1, col.name, col)
col = db.pop("pKa")
db.insert(0, col.name, col)
db=db.sort_values(by='pKa',ascending=False)
db.head()
Out[59]:
pKa
protonated molecule
deprotonated
molecule
deprotonated
microstate ID
protonated
microstate
ID
d
735
19.2
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf84eac70>
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf81de1f0>
2-Methyl-2-
propanol_micro
001
2-Methyl-2-
propanol_micr
o000
C
1323
17.6
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf84aeff0>
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf815e570>
2-
butanol_micro0
01
2-
butanol_micro
000
C
[O
832
17.1
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf8499770>
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf80f4cf0>
2-
Propanol_micro
001
2-
Propanol_micr
o000
C
1200
16.6
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf84c7990>
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf81baf10>
3-
methylindole_
micro001
3-
methylindole_
micro000
C
c
1473
16.4
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf840cdd0>
<rdkit.Chem.rdchem.Mol
object at 0x7fcdf82300b0>
Mandelic acid_micro001
Mandelic acid_micro00
0
c
[C
[O
5 rows × 34 columns
Examine a few acidity and molecular weight distribution
¶
The below code will generate histograms for acidity as measured by pKa and molar molecular weight measured in grams per mole. Execute code and examine output.
In [60]:
fig = plt.figure()
ax = plt.subplot(2,2,1)
ax1 = plt.subplot(2,2,2)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
db.sort_values('Weight', ascending=True)
ax = db["Weight"].plot.hist(rot=0, figsize=(14, 4), bins=25, edgecolor='black', linewidth=1.2, ax=ax)
ax.set_xlabel("molecular weight", size=16)
ax.set_ylabel("", size=12)
ax.axvline(x=db['Weight'].mean(), linewidth=4, color='r')
ax1 = db["pKa"].plot.hist(rot=0, figsize=(14, 4), bins=25, edgecolor='black', linewidth=1.2, ax=ax1)#, subplots=True, layout=(2,2))
ax1.set_xlabel(r"$pK_{a}$", size=16)
ax1.set_ylabel("", size=12)
ax1.axvline(x=4, linewidth=4, color='r')
ax1.axvline(x=9, linewidth=4, color='r')
fig = ax1.get_figure()
fig.savefig("MW_dist.pdf")
Look at pKa and molecular attribute relationships
¶
Here we will plot some of the attributes to see if there is a relationship between their values and the pKa we are trying to predict. Execute these cells.
In [61]:
db.plot.scatter("Weight","pKa")
Out[61]:
<Axes: xlabel='Weight', ylabel='pKa'>
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
In [62]:
db.plot.scatter("AM1BCC partial charge (prot. atom)","pKa")
Out[62]:
<Axes: xlabel='AM1BCC partial charge (prot. atom)', ylabel='pKa'>
In [63]:
db.plot.scatter("Bond Order","pKa")
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Out[63]:
<Axes: xlabel='Bond Order', ylabel='pKa'>
Nearest Neighbor
¶
Let's restrict our consideration to acids with 0 < pKa < 7. The reason may be evident from the distribution in the histogram of pKa's above with two peaks, one around 4 and another at 8. The negative pKa's are outliers which also will be difficult to predict. For machine learning we will also drop the 2D molecular structures.
In [64]:
dblow = db[db["pKa"].values<7]
dblow = dblow[dblow["pKa"].values>0]
List attribute columns with index
In [65]:
for (i, item) in enumerate(list(dblow.columns)):
print(i, item)
0 pKa
1 protonated molecule
2 deprotonated molecule
3 deprotonated microstate ID
4 protonated microstate ID
5 deprotonated microstate smiles
6 protonated microstate smiles
7 AM1BCC partial charge (prot. atom)
8 AM1BCC partial charge (deprot. atom)
9 AM1BCC partial charge (prot. atoms 1 bond away)
10 AM1BCC partial charge (deprot. atoms 1 bond away)
11 AM1BCC partial charge (prot. atoms 2 bond away)
12 AM1BCC partial charge (deprot. atoms 2 bond away)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
13 Gasteiger partial charge (prot. atom)
14 Gasteiger partial charge (deprot. atom)
15 Gasteiger partial charge (prot. atoms 1 bond away)
16 Gasteiger partial charge (deprot. atoms 1 bond away)
17 Gasteiger partial charge (prot. atoms 2 bond away)
18 Gasteiger partial charge (deprot. atoms 2 bond away)
19 Extented Hückel partial charge (prot. atom)
20 Extented Hückel partial charge (deprot. atom)
21 Extented Hückel partial charge (prot. atoms 1 bond away)
22 Extented Hückel partial charge (deprot. atoms 1 bond away)
23 Extented Hückel partial charge (prot. atoms 2 bond away)
24 Extented Hückel partial charge (deprot. atoms 2 bond away)
25 ∆G_solv (kJ/mol) (prot-deprot)
26 SASA (Shrake)
27 SASA (Lee)
28 Bond Order
29 Change in Enthalpy (kJ/mol) (prot-deprot)
30 href
31 Weight
32 num ionizable groups
33 pKa source
Selection of attributes/features for training and prediction
¶
We need too select the features that we will use in the training. These will include the charges computed for key atoms adjacent to the acidic proton (H+) in columns (7-12) using the AM1BCC method, ∆G_solv (kJ/mol) (prot-deprot) in column 25,solvent accessible surface area (SASA) in column 26, bond order in column 28, Change in Enthalpy (kJ/mol) (prot-deprot) in column 29, and molecular weight in column 32. These are the 11 attributes features we will use. We also keep the labels and SMILES as well as the pKa we will train on.
In [66]:
molecular = Table().from_df(dblow) # Now back to Table
molecular = molecular.select(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 25, 26, 28, 29, 32)
molecular
Out[66]:
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.98
Fenpropimorph_micro
001 Fenpropimorph_micro
000 C[C@@H]1CN(
C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
C[C@@H]1C[N
H+](C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
-0.68511 -0
6.95 Imidazole_micro001 Imidazole_micro000 c1cnc[n-]1 c1cnc[nH]1 -0.32082 -0
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.95
3-methoxy-6-
mercaptopyridazine_
micro001 3-methoxy-6-
mercaptopyridazine_
micro000 COc1ccc(nn1)
[S-] COc1ccc(nn1)S -0.31734 -0
6.95
2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro001 2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro000 C[N@@]
(CCCc1ccccc1)
CCC#N C[N@@H+]
(CCCc1ccccc1)
CCC#N -0.6868 -0
6.95
1-
Methylimidazole_micr
o001 1-
Methylimidazole_micr
o000 Cn1ccnc1 Cn1cc[nH+]c1 -0.13447 -0
6.95
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro001
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro000
CCC(=O)C(CCN
1CCOCC1)
(c2ccccc2)c3cc
ccc3 CCC(=O)C(CC[
NH+]1CCOCC1)
(c2ccccc2)c3cc
ccc3 -0.68628 -0
6.95
2-bis(2-
chloroethyl)aminopro
pane_micro001 2-bis(2-
chloroethyl)aminopro
pane_micro000 CC(C)N(CCCl)C
CCl CC(C)[NH+]
(CCCl)CCCl -0.70654 -0
6.94
4-
mercaptopyrimidine_
micro001 4-
mercaptopyrimidine_
micro000 c1cncnc1[S-] c1cncnc1S -0.31642 -0
6.94
Barbituric acid_micro001 Barbituric acid_micro000 CC[C@]1(C(=O)
NC(=O)
[N-]C1=O)c2cc
c(cc2)[N+](=O)
[O-] CCC1(C(=O)NC
(=O)NC1=O)c2
ccc(cc2)[N+]
(=O)[O-] -0.59474 -0
6.92
2,3-
diaminobutane_micro
001 2,3-
diaminobutane_micro
000 C[C@@H]
([C@H](C)
[NH-])[NH3+] C[C@@H]
([C@H](C)N)
[NH3+] -0.92502 -0
... (2035 rows omitted)
Train, test split
¶
Question 12
¶
Split the molecular Table into train and test data using 80% for training and remembering that the split must be an integer using int() function. Again we will select
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
certain rowsas attributes.
In [67]:
train, test = molecular.split(int(molecular.num_rows*0.8))
print(train.num_rows, 'training and', test.num_rows, 'test instances.')
train.show(3)
1636 training and 409 test instances.
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
AM1BCC
partial
charge
(deprot.
atom)
AM
p
c
(
at
a
4.37
4-Methyl-benzoic acid_micro001 4-Methyl-benzoic acid_micro000 Cc1ccc(cc1)C(
=O)[O-] Cc1ccc(cc1)
C(=O)O -0.6085 -0.83399 0.6
2.45
2-iodo-4-
methylthioaniline_
micro001
2-iodo-4-
methylthioaniline_
micro000
CSc1ccc(c(c1)I)
[NH-] CSc1ccc(c(c
1)I)N -0.87739 -0.68227 0.2
3.2 2-
Fluoroaniline_micr
o001 2-
Fluoroaniline_micr
o000 c1ccc(c(c1)
[NH-])F c1ccc(c(c1)N
)F -0.81357 -0.72807 0.1
... (1633 rows omitted)
In [68]:
check('tests/q12.py')
Out[68]:
All tests passed!
Our k nearest neighbors code
¶
Remember our the k nearest neighbor code from above which wewill again use here.
def row_distance(row1, row2): """The distance between two rows of a table.""" return distance(np.array(row1), np.array(row2)) def distances(training, example, output): """Compute the distance from example for each row in training.""" dists = [] attributes = training.drop(output) for row in attributes.rows: dists.append(row_distance(row, example)) return training.with_column('Distance', dists)
def closest(training, example, k, output): """Return a table of the k closest neighbors to example.""" return distances(training, example, output).sort('Distance').take(np.arange(k))
Test algorithm
¶
Execute these cells to define the predict_nn function for pKa, pick an example row, predict and compare.
In [69]:
def predict_nn(example):
"""Return the majority class among the k nearest neighbors."""
k = 10
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
return np.average(closest(train.drop(1,2,3,4), example, k, 'pKa').column('pKa'))
Examine 1 row in test set to try to predict
In [70]:
test.drop(1,2,3,4).take(100)
Out[70]:
pKa
AM1BCC
partial
charge
(prot.
atom)
AM1BCC
partial
charge
(deprot.
atom)
AM1BCC
partial
charge
(prot.
atoms 1
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 1
bond
away)
AM1BCC
partial
charge
(prot.
atoms 2
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 2
bond
away)
∆G_sol
v
(kJ/mol
) (prot-
deprot)
SASA
(Shrake
)
Bond
Order
5.3 -0.8744 -0.86723 0.19836 0.314295 -0.3455 -0.423077 3.61438 8.62053 0.715061
In [71]:
# Look at closest in training set to test row, need to drop pKa from test
k = 10
closest(train.drop(1,2,3,4), test.drop(0,1,2,3,4).row(100), k, 'pKa').select('pKa','Distance')
Out[71]:
pKa Distance
4.26 1.73079 6.08 2.15534 0.98 2.17089 4.75 2.20239 1.33 2.27305 4.98 2.46338 4.66 2.70047 2.49 2.70827 5.16 2.72613 5.4 2.76962 If we use test data in both cases we get exact match (Distance = 0) and no training, not machine learning but matching!
¶
In [72]:
closest(test.drop(1,2,3,4), test.drop(0,1,2,3,4).row(100), k, 'pKa').select('pKa','Distance')
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Out[72]:
pKa Distance
5.3 0 0.33 1.07805 4.45 2.95984 5.41 2.99021 4.7 3.09566 2.72 3.09574 2.58 3.11376 0.2 3.1951 1.4 3.3731 5.95 3.38494 Histogram of experimental acidity to be predicted
¶
Question
¶
Make a histogram of acidity measured by pKa in the training data
In [74]:
train.hist('pKa')
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Question 13
Prediction time
¶
Now predict the pKa of the 10 molecule in the test_nn dataset using predict. We need to drop the experimental pKa and descriptors in the first 4 columns to create and example_nn_row with the attributes for the k nearest neighbor. Discuss the quality of the
fit and the name of the name of the molecule from column 1. Repeat for two more rows and discuss the prediction quality. Keep in mind that the prediction of pKa is a very challenging task for machine learning.
In [75]:
example_nn_row = test.drop(0,1,2,3,4).row(9)
example_nn_row
Out[75]:
Row(AM1BCC partial charge (prot. atom)=-0.61150002394887537, AM1BCC partial charge (deprot. atom)=-0.8293499943046343, AM1BCC partial charge (prot. atoms 1 bond away)=0.65188002671030432, AM1BCC partial charge (deprot. atoms 1 bond away)=0.90706998145296458, AM1BCC partial charge (prot. atoms 2 bond away)=-
0.32791000892492861, AM1BCC partial charge (deprot. atoms 2 bond away)=-
0.46774499827907201, ∆G_solv (kJ/mol) (prot-deprot)=285.65048792778322, SASA (Shrake)=16.009556162693585, Bond Order=0.58754159243616033, Change in Enthalpy (kJ/mol) (prot-deprot)=2.8655543841587572, num ionizable groups=1.0)
In [77]:
example_nn_row_table = test.drop(0,1,2,3,4).take(9) # For display and verification
example_nn_row_table
Out[77]:
AM1BCC
partial
charge
(prot.
atom)
AM1BCC
partial
charge
(deprot.
atom)
AM1BCC
partial
charge
(prot.
atoms 1
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 1
bond
away)
AM1BCC
partial
charge
(prot.
atoms 2
bond
away)
AM1BCC
partial
charge
(deprot.
atoms 2
bond
away)
∆G_sol
v
(kJ/mo
l)
(prot-
depro
t)
SASA
(Shrake
)
Bond
Order
Chan
in
Entha
(kJ/m
(pro
depr
-0.6115 -0.82935 0.65188 0.90707 -0.32791 -0.467745 285.65 16.0096 0.587542 2.865
In [78]:
predict_nn(example_nn_row)
Out[78]:
3.5556000000000005
In [79]:
print('Experimental pKa:', test.column('pKa').item(9))
print('Predicted pKa using nearest neighbors:', round(predict_nn(example_nn_row),2))
Experimental pKa: 4.17
Predicted pKa using nearest neighbors: 3.56
In [80]:
check('tests/q13.py')
Out[80]:
All tests passed!
Now let's plot knn prediction success
¶
Execute the next three cells
In [81]:
exp_pKa = make_array()
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
predict_pKA = make_array()
In [82]:
# This takes a while!
for i in np.arange(test.num_rows):
exp_pKa = np.append(exp_pKa,test.column('pKa').item(i))
example_nn_row = test.drop(0,1,2,3,4).row(i)
predict_pKA = np.append(predict_pKA,predict_nn(example_nn_row) )
In [83]:
plt.scatter(exp_pKa,predict_pKA)
#calculate equation for regression line
z = np.polyfit(exp_pKa,predict_pKA, 1)
p = np.poly1d(z)
#add trendline to plot
plt.plot(exp_pKa, p(exp_pKa),'blue',label="{}".format(p)) # Equation of line placed in legend from label
plt.xlabel("Experimental pKa")
plt.ylabel("Predicted pKa")
plt.legend(fontsize="small")
plt.show()
Conclusions on our k nearest neighbor model
¶
Question 14
¶
Evaluate the overall quality of our machine learning prediction based on the above plot and your 3 predictions above.
I think our machine learning prediction overall is good quality since there are strong correlations between the predictions on the above plot.
Now we will try a few values for k to try to optimize this hyperparameter. We need a new
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
version of predict_nn
that also has an argument of k.
In [84]:
def predict_knn(example,k):
"""Return the majority class among the k nearest neighbors."""
return np.average(closest(train.drop(1,2,3,4), example, k, 'pKa').column('pKa'))
In [88]:
for k in [5,7,10,15,20]:
exp_pKa = make_array()
predict_pKA = make_array()
for i in np.arange(test.num_rows):
exp_pKa = np.append(exp_pKa,test.column('pKa').item(i))
example_nn_row = test.drop(0,1,2,3,4).row(i)
predict_pKA = np.append(predict_pKA,predict_knn(example_nn_row,k) )
plt.scatter(exp_pKa,predict_pKA)
z = np.polyfit(exp_pKa,predict_pKA, 1)
p = np.poly1d(z)
plt.plot(exp_pKa, p(exp_pKa),'blue',label="{}".format(p)) # Equation of line placed in
legend from label
plt.xlabel("Experimental pKa")
plt.ylabel("Predicted pKa")
plt.title("k = "+str(k))
plt.legend(fontsize="small")
plt.show()
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Question:
Which value of k
makes the best estimation?
In [89]:
k = 10
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
In [90]:
check('tests/q14.py')
Out[90]:
All tests passed!
Extra (advanced): Try to vary the set of parameters/attributes by using fewer attributes or where choices
exist such as using Gasteiger partial charges instead of AM1BCC or removing moolecular weight or other attributes.
¶
In [91]:
for (i, item) in enumerate(list(dblow.columns)): # List of attributes
print(i, item)
0 pKa
1 protonated molecule
2 deprotonated molecule
3 deprotonated microstate ID
4 protonated microstate ID
5 deprotonated microstate smiles
6 protonated microstate smiles
7 AM1BCC partial charge (prot. atom)
8 AM1BCC partial charge (deprot. atom)
9 AM1BCC partial charge (prot. atoms 1 bond away)
10 AM1BCC partial charge (deprot. atoms 1 bond away)
11 AM1BCC partial charge (prot. atoms 2 bond away)
12 AM1BCC partial charge (deprot. atoms 2 bond away)
13 Gasteiger partial charge (prot. atom)
14 Gasteiger partial charge (deprot. atom)
15 Gasteiger partial charge (prot. atoms 1 bond away)
16 Gasteiger partial charge (deprot. atoms 1 bond away)
17 Gasteiger partial charge (prot. atoms 2 bond away)
18 Gasteiger partial charge (deprot. atoms 2 bond away)
19 Extented Hückel partial charge (prot. atom)
20 Extented Hückel partial charge (deprot. atom)
21 Extented Hückel partial charge (prot. atoms 1 bond away)
22 Extented Hückel partial charge (deprot. atoms 1 bond away)
23 Extented Hückel partial charge (prot. atoms 2 bond away)
24 Extented Hückel partial charge (deprot. atoms 2 bond away)
25 ∆G_solv (kJ/mol) (prot-deprot)
26 SASA (Shrake)
27 SASA (Lee)
28 Bond Order
29 Change in Enthalpy (kJ/mol) (prot-deprot)
30 href
31 Weight
32 num ionizable groups
33 pKa source
Selection of attributes
In [92]:
molecular = Table().from_df(dblow) # Now back to Table
molecular=molecular.select(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 25, 26, 28, 29, 32) # Change these
molecular
Out[92]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.98
Fenpropimorph_micro
001 Fenpropimorph_micro
000 C[C@@H]1CN(
C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
C[C@@H]1C[N
H+](C[C@@H]
(O1)C)C[C@H]
(C)Cc2ccc(cc2)
C(C)(C)C
-0.68511 -0
6.95 Imidazole_micro001 Imidazole_micro000 c1cnc[n-]1 c1cnc[nH]1 -0.32082 -0
6.95
3-methoxy-6-
mercaptopyridazine_
micro001 3-methoxy-6-
mercaptopyridazine_
micro000 COc1ccc(nn1)
[S-] COc1ccc(nn1)S -0.31734 -0
6.95
2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro001 2-(N-(2-cyanoethyl)-
N-
methyl)aminopropylb
enzene_micro000 C[N@@]
(CCCc1ccccc1)
CCC#N C[N@@H+]
(CCCc1ccccc1)
CCC#N -0.6868 -0
6.95
1-
Methylimidazole_micr
o001 1-
Methylimidazole_micr
o000 Cn1ccnc1 Cn1cc[nH+]c1 -0.13447 -0
6.95
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro001
Morpholine, N-(3-
ethylcarbonyl-3,3-
diphenyl)propyl-
_micro000
CCC(=O)C(CCN
1CCOCC1)
(c2ccccc2)c3cc
ccc3 CCC(=O)C(CC[
NH+]1CCOCC1)
(c2ccccc2)c3cc
ccc3 -0.68628 -0
6.95
2-bis(2-
chloroethyl)aminopro
pane_micro001 2-bis(2-
chloroethyl)aminopro
pane_micro000 CC(C)N(CCCl)C
CCl CC(C)[NH+]
(CCCl)CCCl -0.70654 -0
6.94
4-
mercaptopyrimidine_
micro001 4-
mercaptopyrimidine_
micro000 c1cncnc1[S-] c1cncnc1S -0.31642 -0
6.94
Barbituric acid_micro001 Barbituric acid_micro000 CC[C@]1(C(=O)
NC(=O)
[N-]C1=O)c2cc
c(cc2)[N+](=O)
[O-] CCC1(C(=O)NC
(=O)NC1=O)c2
ccc(cc2)[N+]
(=O)[O-] -0.59474 -0
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated
microstate ID
protonated
microstate ID
deprotonated
microstate
smiles
protonated
microstate
smiles
AM1BCC
partial
charge
(prot.
atom)
A
p
c
(d
6.92
2,3-
diaminobutane_micro
001 2,3-
diaminobutane_micro
000 C[C@@H]
([C@H](C)
[NH-])[NH3+] C[C@@H]
([C@H](C)N)
[NH3+] -0.92502 -0
... (2035 rows omitted)
Now test by copying appropriate code from above
In [93]:
exp_pKa = make_array()
predict_pKA = make_array()
In [94]:
# This takes a while!
for i in np.arange(test.num_rows):
exp_pKa = np.append(exp_pKa,test.column('pKa').item(i))
example_nn_row = test.drop(0,1,2,3,4).row(i)
predict_pKA = np.append(predict_pKA,predict_nn(example_nn_row) )
In [95]:
# Plot sns.regplot(x=predict_pKA,y=exp_pKa)
Out[95]:
<Axes: >
In [96]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
# Comments: Now demonstrate knn from scikit learn
¶
scikit learn is a standard and powerful machine learning library. Below is a demonstration for your information of the same machine learning task using scikit learn. Execute the below cells.
In [97]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
In [98]:
knn = KNeighborsRegressor(n_neighbors=15, weights='distance',p=1)
In [99]:
X = make_array()
attributes = train.drop('pKa',1,2,3,4)
for i in np.arange(attributes.num_rows):
X=np.append(X,np.array(attributes.row(i)))
X=X.reshape(attributes.num_rows,len(attributes))
In [100]:
y=train.column('pKa')
y
Out[100]:
array([ 4.37, 2.45, 3.2 , ..., 3.14, 2.85, 3.54])
In [101]:
knn.fit(X,y)
Out[101]:
KNeighborsRegressor(n_neighbors=15, p=1, weights='distance')
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
KNeighborsRegressor
KNeighborsRegressor(n_neighbors=15, p=1, weights='distance')
Now test attributes
¶
In [102]:
attributes = test.drop('pKa',1,2,3,4)
Xtest = make_array()
for i in np.arange(attributes.num_rows):
Xtest=np.append(Xtest,np.array(attributes.row(i)))
Xtest=Xtest.reshape(attributes.num_rows,len(attributes))
In [103]:
ytest=test.column('pKa')
In [104]:
y_predicted = knn.predict(Xtest)
predict_nn = test.with_columns("pKa predict",y_predicted)
In [105]:
plt.scatter(ytest,y_predicted)
#calculate equation for regression line
z = np.polyfit(ytest,y_predicted, 1)
p = np.poly1d(z)
# Label with equation
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
print(p)
#add trendline to plot
plt.plot(ytest, p(ytest),'red',label="{}".format(p))
plt.legend(fontsize="small")
plt.show()
0.2374 x + 2.937
Return the coefficient of determination of the prediction.
¶
The coefficient of determination $R^2$ is defined as $$ (1-\frac{u}{v}) $$ u is the residual sum of squares ((y_true - y_pred)** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a $R^2 = 0.0 $
In [106]:
knn.score(Xtest, ytest) # knn score 0-1 Out[106]:
0.22954352412772705
Final fancy plotting of select molecules
¶
Question 15
¶
Use a part of a molecular name to see if it is included in the test data and then execute the code to examine structures. Structures will be default structures if rdkit is not available.
In [107]:
molecular_name = 'methyl'
predict_nn.where("protonated microstate ID",are.containing(molecular_name))
Out[107]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
pKa
deprotonated microstate
ID
protonated microstate ID
deprotonated
microstate
smiles
protonate
microstat
smiles
3.56 b'(2-butanoyloxy-4-hydroxy-
4-oxobutyl)-
trimethylazanium' ...
b'(2-butanoyloxy-4-hydroxy-
4-oxobutyl)-
trimethylazanium' ...
CCCC(=O)O[C
@H](CC(=O)
[O-])C[N+](C)
(C)C CCCC(=O)O
@H]
(CC(=O)O)C
+](C)(C)C 4.61 2-amino-4-
methylaniline_micro001 2-amino-4-
methylaniline_micro000 Cc1ccc(c(c1)N)
[NH-] Cc1ccc(c(c1
)N 3.16 3-cyano-2-methyl-benzoic acid_micro001 3-cyano-2-methyl-benzoic acid_micro000 Cc1c(cccc1C(=
O)[O-])C#N Cc1c(cccc1C
=O)O)C#N 3.45 2-methylthioaniline_micro001 2-methylthioaniline_micro000
CSc1ccccc1[NH
-] CSc1ccccc1
3.305
methylene-bis-thioacetic acid_micro001 methylene-bis-thioacetic acid_micro000 C(C(=O)O)SCS
CC(=O)[O-] C(C(=O)O)S
CC(=O)O 4.84 alpha,alpha-diphenyl-adipic acid-alpha-methylhalfester_m
...
alpha,alpha-diphenyl-adipic acid-alpha-methylhalfester_m
...
COC(=O)C(CCC
C(=O)[O-])
(c1ccccc1)c2cc
ccc2 COC(=O)C(C
CC(=O)O)
(c1ccccc1)c
cccc2 4.88 2,4-dimethyl-8-
methylthioquinoline_micro00
1 2,4-dimethyl-8-
methylthioquinoline_micro00
0 Cc1cc(nc2c1cc
cc2SC)C Cc1cc([nH+
2c1cccc2SC
5.36 trans-hexahydro-3,6-
endomethylene-o-phthalic acid_micro001 trans-hexahydro-3,6-
endomethylene-o-phthalic acid_micro000 C1C[C@H]2C[C
@@H]1[C@H]
([C@H]2C(=O)
[O-])C(=O)[O-]
C1C[C@H]2
C@@H]1[C@
]
([C@H]2C(=
[O-])C(=O)O
3.12 2-amino-7-
methylsulphonylnaphthalene
_micro001 2-amino-7-
methylsulphonylnaphthalene
_micro000 CS(=O)
(=O)c1ccc2ccc(
cc2c1)[NH-] CS(=O)
(=O)c1ccc2
c(cc2c1)N 2 Morpholine, N-cyanomethyl-
_micro001 Morpholine, N-cyanomethyl-
_micro000 C1COCCN1CC#
N C1COCC[NH
1CC#N ... (78 rows omitted)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
In [108]:
check('tests/q15.py')
Out[108]:
All tests passed!
In [109]:
glycine=predict_nn.where("protonated microstate ID",are.containing(molecular_name))
mols = [Chem.MolFromSmiles(x) for x in glycine.column("protonated microstate smiles") if x
is not None]
molde = [Chem.MolFromSmiles(x) for x in glycine.column("deprotonated microstate smiles") if x is not None]
mol = [None] * 2 * glycine.num_rows
mol[0::2]=mols
mol[1::2]=molde
label = [None] * 2 * glycine.num_rows
lpred = [None] * 2 * glycine.num_rows
exp = glycine.column("pKa")
pred = glycine.column("pKa predict")
label[0::2] = exp
label[1::2] = exp
lpred[0::2] = pred
lpred[1::2] = pred
for i,m in enumerate(mol):
m.SetProp("Name","pKa: "+str(np.round(label[i],2))+" knn: "+str(np.round(lpred[i],2)))
p = Draw.MolsToGridImage( [mol[x] for x in range(0,(2 * glycine.num_rows))] , legends=[x.GetProp("Name") for x in mol],molsPerRow=2,subImgSize=(300,250))
print("\t\tProtonated","\t\t\tDeprotonated")
p
Protonated Deprotonated
Out[109]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
All finished...
¶
Run checks and submit .html and .ipynb files after downloading.
In [110]:
# For your convenience, you can run this cell to run all the tests at once!
import glob
from gofer.ok import check
correct = 0
checks = [1,2,3,4,6,7,8,11,12,13,14,15]
total = len(checks)
for x in checks:
print('Testing question {}: '.format(str(x)))
g = check('tests/q{}.py'.format(str(x)))
if g.grade == 1.0:
print("Passed")
correct += 1
else:
print('Failed')
display(g)
print('Grade: {}'.format(str(correct/total)))
print("Nice work ",Your_name)
import time;
localtime = time.asctime( time.localtime(time.time()) )
print("Submitted @ ", localtime)
Testing question 1: Passed
Testing question 2: Passed
Testing question 3: Passed
Testing question 4: Passed
Testing question 6: Passed
Testing question 7: Passed
Testing question 8: Passed
Testing question 11: Passed
Testing question 12: Passed
Testing question 13: Passed
Testing question 14: Passed
Testing question 15: Passed
Grade: 1.0
Nice work Madlyn Anglin
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Submitted @ Fri Dec 1 02:16:01 2023
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Documents
Recommended textbooks for you

Principles of Information Systems (MindTap Course...
Computer Science
ISBN:9781305971776
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning

Fundamentals of Information Systems
Computer Science
ISBN:9781337097536
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage

Information Technology Project Management
Computer Science
ISBN:9781337101356
Author:Kathy Schwalbe
Publisher:Cengage Learning

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole
Recommended textbooks for you
- Principles of Information Systems (MindTap Course...Computer ScienceISBN:9781305971776Author:Ralph Stair, George ReynoldsPublisher:Cengage LearningFundamentals of Information SystemsComputer ScienceISBN:9781337097536Author:Ralph Stair, George ReynoldsPublisher:Cengage LearningNp Ms Office 365/Excel 2016 I NtermedComputer ScienceISBN:9781337508841Author:CareyPublisher:Cengage
- Information Technology Project ManagementComputer ScienceISBN:9781337101356Author:Kathy SchwalbePublisher:Cengage LearningOperations Research : Applications and AlgorithmsComputer ScienceISBN:9780534380588Author:Wayne L. WinstonPublisher:Brooks Cole

Principles of Information Systems (MindTap Course...
Computer Science
ISBN:9781305971776
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning

Fundamentals of Information Systems
Computer Science
ISBN:9781337097536
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage

Information Technology Project Management
Computer Science
ISBN:9781337101356
Author:Kathy Schwalbe
Publisher:Cengage Learning

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole