Absolute Quantitation


A common goal in chromatography is to quantify the concentration of an analyte in a solution in physically meaningful units. Chromatography will not give you that directly off the instrument, but you can prepare a “standard curve”: a set of solutions in which the concentration of the analyte of interest is known. With a properly configured machine, the integrated area of a peak is directly and linearly related to the concentration of the analyte. In this tutorial, we will use hplc-py to quantify a standard curve of lactose solutions and then use the .map_peaks method of the Chromatogram object to test our calibration curve.
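Written as an equation, with A the integrated peak area, c the analyte concentration, and m and b the slope and intercept fitted from the standards (symbols chosen here for illustration), the calibration model and its inversion are:

```latex
A = m\,c + b
\qquad\Longrightarrow\qquad
c = \frac{A - b}{m}
```

The left-hand form is what gets fit to the standards; the right-hand form is what turns a measured area into a concentration.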

Generating a Calibration Curve

Here, we will use hplc-py to quantify aqueous solutions of lactose at different concentrations. These files have been preprocessed so that each file name contains the known lactose concentration.
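Because the concentration lives in the file name, it has to be parsed back out. The loops below do this with `f.split('_')[-1][:-4]`, which assumes the name ends in a four-character extension such as `.csv`. A slightly more defensive alternative is sketched here; `parse_concentration` is a hypothetical helper (not part of hplc-py), and it assumes names of the form `lactose_mM_<value>.csv`.

```python
import re
from pathlib import Path

# Hypothetical helper. Assumes the concentration is the last
# underscore-separated token of the file name, before the extension,
# e.g. "lactose_mM_6.csv" or "lactose_mM_1.5.csv".
_CONC_PATTERN = re.compile(r"_(\d+(?:\.\d+)?)$")


def parse_concentration(path):
    """Return the concentration encoded at the end of a file name's stem."""
    stem = Path(path).stem  # "lactose_mM_6" for "data/calibration/lactose_mM_6.csv"
    match = _CONC_PATTERN.search(stem)
    if match is None:
        raise ValueError(f"no concentration found in file name: {path!r}")
    return float(match.group(1))


print(parse_concentration("data/calibration/lactose_mM_6.csv"))  # 6.0
```

Unlike the slicing approach, this raises a clear error on an unexpected name instead of silently returning a wrong number or failing inside `float()`.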

[1]:

import glob

# Get the list of files
files = glob.glob('data/calibration/lactose*.csv')
print(files[0])

data/calibration/lactose_mM_6.csv

We can load this file into memory as a chromatogram using the load_chromatogram function from the hplc.io module and instantiate a Chromatogram object.

[2]:

from hplc.io import load_chromatogram
from hplc.quant import Chromatogram

# Load and display the first file.
df = load_chromatogram(files[0], cols=['time', 'signal'])
chrom = Chromatogram(df)
chrom.show()

[2]:

[<Figure size 640x480 with 1 Axes>, <Axes: xlabel='time', ylabel='signal'>]

[Figure: raw chromatogram of the 6 mM lactose standard (signal vs. time)]

As a reminder, we can quickly quantify this single peak by calling the .fit_peaks method.

[3]:

# Quantify the peak
peaks = chrom.fit_peaks(verbose=False)
chrom.show()
peaks.head()

[3]:

	retention_time	scale	skew	amplitude	area	signal_maximum	peak_id
0	13.56	0.281228	1.654595	8004.240816	960508.897906	15977.970977	1

[Figure: the 6 mM lactose chromatogram after peak fitting]

While it’s useful to know the various parameters returned by the fitting, we are fundamentally interested in the integrated area of the peak (integrated over the entire duration of the chromatogram). Using a for loop and parsing the concentration of lactose from each file name, we can generate a new Pandas DataFrame that holds the calibration information.
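As an illustration of what “integrated area” means, the sketch below builds a synthetic skewed peak on a uniform time grid and integrates it numerically with the trapezoid rule. The skew-normal shape, the sampling interval, and the parameter values are assumptions chosen for illustration; this is not a description of how hplc-py computes its `area` column.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import skewnorm

# Illustrative parameters (assumed, loosely modeled on the fitted peak above).
peak_integral = 8000.0            # analytic area of the synthetic peak
loc, scale, skew = 13.56, 0.28, 1.65
dt = 0.01                         # assumed sampling interval (time units)
time = np.arange(0, 30, dt)

# The skew-normal density integrates to 1, so scaling it by `peak_integral`
# gives a peak whose exact area is `peak_integral`.
signal = peak_integral * skewnorm.pdf(time, skew, loc=loc, scale=scale)

# Numerical integration over the entire time window.
area = trapezoid(signal, time)
print(f"analytic area: {peak_integral:.1f}, numerical area: {area:.1f}")
```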

[4]:

import pandas as pd
# Set up a blank dataframe for the calibration curve.
cal_curve = pd.DataFrame([])

# Iterate through each file and perform the quantitation
for f in files:
    df = load_chromatogram(f, cols=['time', 'signal'])
    chrom = Chromatogram(df)
    peaks = chrom.fit_peaks(verbose=False)

    # Get the concentration of lactose from the file name
    conc = float(f.split('_')[-1][:-4])

    # Add the concentration to the peak table and add it
    # to the instantiated calibration dataframe
    peaks['conc_mM'] = conc
    cal_curve = pd.concat([cal_curve, peaks])

cal_curve

[4]:

retention_time	scale	skew	amplitude	area	signal_maximum	peak_id	conc_mM
13.56	0.281228	1.654595	8004.240816	960508.897906	15977.970977	1	6.0
13.56	0.278886	1.627672	747.107260	89652.871210	1496.949321	1	0.5
13.56	0.278874	1.629961	1540.315414	184837.849638	3087.620065	1	1.0
13.56	0.280349	1.644179	3896.489630	467578.755562	7787.982871	1	3.0

We can now plot the integrated peak area as a function of lactose concentration, which we expect to be linear.

[5]:

import matplotlib.pyplot as plt

# Plot the calibration curve.
plt.plot(cal_curve['conc_mM'], cal_curve['area'], 'o', markersize=10)
plt.xlabel('lactose concentration [mM]')
plt.ylabel('integrated peak area [a.u.]')

[5]:

Text(0, 0.5, 'integrated peak area [a.u.]')

[Figure: integrated peak area vs. lactose concentration for the calibration standards]

We can perform a simple regression on these data to get a calibration curve.
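`linregress`, used in the next cell, performs ordinary least squares. As a cross-check, the sketch below recomputes the slope and intercept from the closed-form expressions m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄, using the four (concentration, area) pairs printed in the calibration table above, and also computes R² as a rough check of linearity.

```python
import numpy as np
from scipy.stats import linregress

# (concentration [mM], integrated area [a.u.]) copied from the calibration table above
conc = np.array([6.0, 0.5, 1.0, 3.0])
area = np.array([960508.897906, 89652.871210, 184837.849638, 467578.755562])

# Closed-form ordinary least squares
dx = conc - conc.mean()
dy = area - area.mean()
slope_ols = np.sum(dx * dy) / np.sum(dx ** 2)
intercept_ols = area.mean() - slope_ols * conc.mean()

# Coefficient of determination: 1 - SS_residual / SS_total
residuals = area - (intercept_ols + slope_ols * conc)
r_squared = 1 - np.sum(residuals ** 2) / np.sum(dy ** 2)

# These should agree with scipy's result to floating-point precision
fit = linregress(conc, area)
print(slope_ols, fit.slope)
print(intercept_ols, fit.intercept)
print(r_squared, fit.rvalue ** 2)
```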

[6]:

import numpy as np
from scipy.stats import linregress

# Compute the best-fit calibration curve
fit_params = linregress(cal_curve['conc_mM'], cal_curve['area'])
slope = fit_params.slope
intercept = fit_params.intercept

# Plot the fit over the data
conc_range = np.linspace(0, 8, 100)
cal = intercept + slope * conc_range
plt.plot(cal_curve['conc_mM'], cal_curve['area'], 'o', markersize=10, label='measurement')
plt.plot(conc_range, cal, '-', color='k', label='fit')
plt.xlabel('lactose concentration [mM]')
plt.ylabel('integrated peak area [a.u.]')
plt.legend()

[6]:

<matplotlib.legend.Legend at 0x14fca6860>

[Figure: calibration measurements with the linear fit overlaid]

Testing the Calibration

We also have a set of lactose solutions of known concentration that were not used to fit the calibration curve. By quantifying these test data with the .map_peaks method, we can check whether the concentrations it reports match the concentrations we know the peaks represent.
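Quantifying an unknown amounts to inverting the calibration line, c = (A − b)/m. The sketch below applies that inversion by hand to the test-set peak areas printed in the output further down, using a slope and intercept refit from the calibration table. That `map_peaks` uses exactly this expression internally is an assumption here; `area_to_concentration` is a hypothetical helper, not part of hplc-py.

```python
import numpy as np
from scipy.stats import linregress

# Calibration standards (concentration [mM], area) from the calibration table
cal_conc = np.array([6.0, 0.5, 1.0, 3.0])
cal_area = np.array([960508.897906, 89652.871210, 184837.849638, 467578.755562])
fit = linregress(cal_conc, cal_area)


def area_to_concentration(area, slope, intercept):
    """Invert the linear calibration A = slope * c + intercept for c."""
    return (np.asarray(area) - intercept) / slope


# Test-set peak areas as printed (rounded) in the test output table
test_area = np.array([1.285847e06, 6.379770e05, 3.120318e05, 2.584809e05])
estimated = area_to_concentration(test_area, fit.slope, fit.intercept)
print(np.round(estimated, 4))
```

If the assumption holds, these values should closely reproduce the `concentration` column of the test output.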

[7]:

# Load the test data
files = glob.glob('data/test/lactose*.csv')

# Instantiate a dataframe to store the results
test_data = pd.DataFrame([])

# Iterate through each file and quantify the peaks
for f in files:
    df = load_chromatogram(f, cols=['time', 'signal'])
    chrom = Chromatogram(df)
    peaks = chrom.fit_peaks(verbose=False)

    # Now, use the map_peaks method to quantify the signal based off our
    # calibration curve
    mapping = {'lactose': {'retention_time': 13.56,
                           'slope': slope,
                           'intercept': intercept,
                           'unit': 'mM'}}
    measured_conc = chrom.map_peaks(params=mapping)

    # Parse the known concentration from the file name
    known_conc = float(f.split('_')[-1][:-4])

    # Add it to the dataframe and concatenate
    measured_conc['true_conc_mM'] = known_conc
    test_data = pd.concat([test_data, measured_conc])
test_data

[7]:

retention_time	scale	skew	amplitude	area	signal_maximum	peak_id	compound	concentration	unit	true_conc_mM
13.56	0.281437	1.664340	10715.395193	1.285847e+06	21414.173961	1	lactose	8.118513	mM	8.0
13.56	0.280571	1.649106	5316.475249	6.379770e+05	10627.640549	1	lactose	3.981019	mM	4.0
13.56	0.279941	1.638944	2600.265354	3.120318e+05	5201.011260	1	lactose	1.899435	mM	2.0
13.56	0.279515	1.636055	2154.007783	2.584809e+05	4312.738273	1	lactose	1.557443	mM	1.5

The measured concentrations look to be in good agreement with the known values! We can confirm this by plotting the measured value against the true value; if they agree, every point should fall on the identity line.
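To put a number on “good agreement”, the sketch below computes the relative error of each test measurement and the root-mean-square error (RMSE), using the known and measured concentrations copied from the test table above. Which error metric to report is a choice made here for illustration, not something hplc-py prescribes.

```python
import numpy as np

# Known and measured lactose concentrations [mM] from the test table above
true_mM = np.array([8.0, 4.0, 2.0, 1.5])
measured_mM = np.array([8.118513, 3.981019, 1.899435, 1.557443])

# Relative error of each measurement, and the overall RMSE in mM
relative_error = (measured_mM - true_mM) / true_mM
rmse = np.sqrt(np.mean((measured_mM - true_mM) ** 2))

for t, m, e in zip(true_mM, measured_mM, relative_error):
    print(f"true {t:4.1f} mM  measured {m:6.3f} mM  error {e:+.1%}")
print(f"RMSE: {rmse:.3f} mM")
```

For these four points, every measurement lands within about 5% of its known value.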

[8]:

# Plot the measured versus known value of the test set
plt.plot(test_data['true_conc_mM'], test_data['concentration'], 'o',
         markersize=10, label='measurements')
plt.plot([0, 10], [0, 10], 'k--', label='equivalence')
plt.xlabel('true lactose concentration [mM]')
plt.ylabel('measured lactose concentration [mM]')
plt.legend()

[8]:

<matplotlib.legend.Legend at 0x14fd33f40>

[Figure: measured vs. true lactose concentration for the test set, with the identity line]