Absolute Quantitation

Notebook code license: MIT | Notebook prose license: CC BY 4.0


A common goal in chromatography is to quantify the concentration of an analyte in a solution in physically meaningful units. While chromatography will not give you that directly off the instrument, you can prepare a “standard curve”: a set of solutions in which the concentration of the analyte of interest is known. With a properly configured instrument, one can establish a direct linear relationship between the integrated area of a peak and the concentration of the analyte. In this tutorial, we will use hplc-py to build a standard curve from lactose solutions of known concentration and then use the .map_peaks method of the Chromatogram object to test our calibration curve.
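Concretely, if $A$ denotes the integrated peak area and $c$ the analyte concentration (notation introduced here only for illustration), a linear standard curve with slope $m$ and intercept $b$ can be inverted to turn a measured area into a concentration:

```latex
A = m\,c + b
\qquad\Longrightarrow\qquad
c = \frac{A - b}{m}
```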

Generating a Calibration Curve

Here, we will use hplc-py to quantify aqueous lactose solutions at different concentrations. These files have been preprocessed so that the known lactose concentration appears in each file name.

[1]:
import glob

# Get the list of files
files = glob.glob('data/calibration/lactose*.csv')
print(files[0])
data/calibration/lactose_mM_6.csv

We can load this file into memory using the load_chromatogram function from the hplc.io module and use the resulting DataFrame to instantiate a Chromatogram object.

[2]:
from hplc.io import load_chromatogram
from hplc.quant import Chromatogram

# Load and display the first file.
df = load_chromatogram(files[0], cols=['time', 'signal'])
chrom = Chromatogram(df)
chrom.show()
[2]:
[<Figure size 640x480 with 1 Axes>, <Axes: xlabel='time', ylabel='signal'>]
[Figure: raw chromatogram of lactose_mM_6.csv, signal versus time]

As a reminder, we can quickly quantify this single peak by calling the .fit_peaks method.

[3]:
# Quantify the peak
peaks = chrom.fit_peaks(verbose=False)
chrom.show()
peaks.head()
[3]:
   retention_time     scale      skew    amplitude           area  signal_maximum  peak_id
0           13.56  0.281228  1.654595  8004.240816  960508.897906    15977.970977        1
[Figure: chromatogram of lactose_mM_6.csv after peak fitting]
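The peak table describes the fitted peak through its retention_time, scale, skew, and amplitude. To illustrate what an "integrated area" is, the sketch below assumes the peak is an amplitude-weighted skew-normal distribution (our assumption based on these column names, built with scipy.stats.skewnorm rather than hplc-py's own code) and integrates it numerically with the trapezoidal rule over a time window that contains the whole peak.

```python
import numpy as np
from scipy.stats import skewnorm

# Fitted parameters copied from the peak table above.
retention_time = 13.56
scale = 0.281228
skew = 1.654595
amplitude = 8004.240816

# Assumption: peak shape = amplitude * skew-normal probability density.
# This illustrates the idea; it is not hplc-py's internal implementation.
t = np.linspace(10, 20, 10_001)  # time grid with a step of 0.001
signal = amplitude * skewnorm.pdf(t, skew, loc=retention_time, scale=scale)

# Trapezoidal rule: sum the areas of the trapezoids between grid points.
area_time_integral = np.sum((signal[1:] + signal[:-1]) / 2 * np.diff(t))

print(f"time-integrated area: {area_time_integral:.1f}")
print(f"amplitude:            {amplitude:.1f}")
```

Under this assumed model the time integral recovers the amplitude, because the skew-normal density integrates to one. The area column reported by hplc-py is on a different numerical scale, so treat this as an illustration of peak integration rather than a reproduction of hplc-py's numbers.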

While it’s useful to know the various parameters returned by the fit, we are fundamentally interested in the integrated area of the peak (integrated over the entire duration of the chromatogram). Using a for loop and parsing the lactose concentration from each file name, we can build a new pandas DataFrame that holds the calibration data.

[4]:
import pandas as pd
# Set up a blank dataframe for the calibration curve.
cal_curve = pd.DataFrame([])

# Iterate through each file and perform the quantitation
for f in files:
    df = load_chromatogram(f, cols=['time', 'signal'])
    chrom = Chromatogram(df)
    peaks = chrom.fit_peaks(verbose=False)

    # Get the concentration of lactose from the file name
    conc = float(f.split('_')[-1][:-4])

    # Add the concentration to the peak table and add it
    # to the instantiated calibration dataframe
    peaks['conc_mM'] = conc
    cal_curve = pd.concat([cal_curve, peaks])

cal_curve
[4]:
   retention_time     scale      skew    amplitude           area  signal_maximum  peak_id  conc_mM
0           13.56  0.281228  1.654595  8004.240816  960508.897906    15977.970977        1      6.0
0           13.56  0.278886  1.627672   747.107260   89652.871210     1496.949321        1      0.5
0           13.56  0.278874  1.629961  1540.315414  184837.849638     3087.620065        1      1.0
0           13.56  0.280349  1.644179  3896.489630  467578.755562     7787.982871        1      3.0
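The expression float(f.split('_')[-1][:-4]) relies on every file name ending in _<concentration>.csv. A slightly more defensive sketch, assuming the same lactose_mM_<value>.csv naming, uses a regular expression and fails loudly on unexpected names. The helper conc_from_filename and the example file names are hypothetical. Because glob makes no guarantee about the order of its results, the sketch also sorts the files by concentration.

```python
import re
from pathlib import Path

# Hypothetical pattern for names like "lactose_mM_6.csv" or "lactose_mM_0.5.csv".
CONC_PATTERN = re.compile(r"_mM_(\d+(?:\.\d+)?)\.csv$")


def conc_from_filename(path):
    """Return the concentration encoded in a file name as a float."""
    match = CONC_PATTERN.search(Path(path).name)
    if match is None:
        raise ValueError(f"no concentration found in {path!r}")
    return float(match.group(1))


# Example names following the pattern of lactose_mM_6.csv (hypothetical).
example_files = ["data/calibration/lactose_mM_6.csv",
                 "data/calibration/lactose_mM_0.5.csv",
                 "data/calibration/lactose_mM_1.csv",
                 "data/calibration/lactose_mM_3.csv"]

# Sort by the parsed concentration so the calibration table reads in order.
for f in sorted(example_files, key=conc_from_filename):
    print(f, conc_from_filename(f))
```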

We can now plot the integrated peak area as a function of lactose concentration, which we expect to be linear.

[5]:
import matplotlib.pyplot as plt

# Plot the calibration curve.
plt.plot(cal_curve['conc_mM'], cal_curve['area'], 'o', markersize=10)
plt.xlabel('lactose concentration [mM]')
plt.ylabel('integrated peak area [a.u.]')
[5]:
Text(0, 0.5, 'integrated peak area [a.u.]')
[Figure: integrated peak area versus lactose concentration for the calibration standards]

We can perform a simple linear regression on these data to obtain the calibration curve.

[6]:
import numpy as np
from scipy.stats import linregress

# Compute the best-fit calibration curve: area = slope * conc + intercept
fit_params = linregress(cal_curve['conc_mM'], cal_curve['area'])
slope = fit_params.slope
intercept = fit_params.intercept

# Plot the fit over the data
conc_range = np.linspace(0, 8, 100)
cal = intercept + slope * conc_range
plt.plot(cal_curve['conc_mM'], cal_curve['area'], 'o', markersize=10, label='measurement')
plt.plot(conc_range, cal, '-', color='k', label='fit')
plt.xlabel('lactose concentration [mM]')
plt.ylabel('integrated peak area [a.u.]')
plt.legend()
[6]:
<matplotlib.legend.Legend at 0x14fca6860>
[Figure: calibration measurements with the best-fit line]
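For a single predictor, the least-squares slope and intercept that linregress returns have a closed form. The sketch below recomputes them from the four calibration points in the table above using NumPy alone, and adds the coefficient of determination, R^2, as a quick check on linearity. It is an illustration rather than a replacement for linregress (squaring its rvalue attribute gives the same R^2).

```python
import numpy as np

# Concentrations [mM] and integrated areas copied from the calibration table.
conc = np.array([6.0, 0.5, 1.0, 3.0])
area = np.array([960508.897906, 89652.871210, 184837.849638, 467578.755562])

# Ordinary least squares for area = slope * conc + intercept.
x_mean, y_mean = conc.mean(), area.mean()
slope = np.sum((conc - x_mean) * (area - y_mean)) / np.sum((conc - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
residuals = area - (slope * conc + intercept)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((area - y_mean) ** 2)

print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, R^2 = {r_squared:.5f}")
```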

Testing the Calibration

We also have a set of lactose solutions of known concentration that we did not use when fitting the calibration curve. By quantifying these test data with the .map_peaks method, we can check whether the concentrations it reports match the known values.

[7]:
# Load the test data
files = glob.glob('data/test/lactose*.csv')

# Instantiate a dataframe to store the results
test_data = pd.DataFrame([])

# Iterate through each file and quantify the peaks
for f in files:
    df = load_chromatogram(f, cols=['time', 'signal'])
    chrom = Chromatogram(df)
    peaks = chrom.fit_peaks(verbose=False)

    # Now, use the map_peaks method to quantify the signal based on our
    # calibration curve
    mapping = {'lactose': {'retention_time': 13.56,
                           'slope': slope,
                           'intercept': intercept,
                           'unit': 'mM'}}
    measured_conc = chrom.map_peaks(params=mapping)

    # Parse the known concentration from the file name
    known_conc = float(f.split('_')[-1][:-4])

    # Add it to the dataframe and concatenate
    measured_conc['true_conc_mM'] = known_conc
    test_data = pd.concat([test_data, measured_conc])
test_data
[7]:
   retention_time     scale      skew     amplitude          area  signal_maximum  peak_id  compound  concentration  unit  true_conc_mM
0           13.56  0.281437  1.664340  10715.395193  1.285847e+06    21414.173961        1   lactose       8.118513    mM           8.0
0           13.56  0.280571  1.649106   5316.475249  6.379770e+05    10627.640549        1   lactose       3.981019    mM           4.0
0           13.56  0.279941  1.638944   2600.265354  3.120318e+05     5201.011260        1   lactose       1.899435    mM           2.0
0           13.56  0.279515  1.636055   2154.007783  2.584809e+05     4312.738273        1   lactose       1.557443    mM           1.5
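The concentration column can be reproduced by hand: the reported values are consistent with inverting the calibration line, concentration = (area - intercept) / slope, for the peak at the mapped retention time. The sketch below is a hypothetical helper, map_peak_to_conc, written only to illustrate that arithmetic. Its nearest-peak rule and tolerance argument are assumptions, not a description of how .map_peaks is implemented internally.

```python
import numpy as np
import pandas as pd


def map_peak_to_conc(peaks, retention_time, slope, intercept, tolerance=0.5):
    """Hypothetical helper (not part of hplc-py): pick the peak closest to
    `retention_time`, provided it lies within `tolerance`, and convert its
    area to a concentration by inverting area = slope * conc + intercept."""
    offsets = (peaks["retention_time"] - retention_time).abs().to_numpy()
    nearest = int(offsets.argmin())
    if offsets[nearest] > tolerance:
        raise ValueError("no peak within tolerance of the expected retention time")
    peak_area = peaks["area"].iloc[nearest]
    return (peak_area - intercept) / slope


# Calibration line refit from the four standards in the calibration table.
conc = np.array([6.0, 0.5, 1.0, 3.0])
area = np.array([960508.897906, 89652.871210, 184837.849638, 467578.755562])
slope, intercept = np.polyfit(conc, area, 1)

# First row of the test table: one peak at 13.56 with area 1.285847e+06.
test_peaks = pd.DataFrame({"retention_time": [13.56], "area": [1.285847e6]})
print(map_peak_to_conc(test_peaks, 13.56, slope, intercept))
```

Applied to the first test sample it returns roughly 8.12 mM, matching the first row of the table.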

The concentrations reported by .map_peaks look to be in good agreement with the known values! We can confirm this by plotting the measured versus the true concentrations; if they agree, the points should fall on the identity line (y = x).

[8]:
# Plot the measured versus known value of the test set
plt.plot(test_data['true_conc_mM'], test_data['concentration'], 'o',
         markersize=10, label='measurements')
plt.plot([0, 10], [0, 10], 'k--', label='equivalence')
plt.xlabel('true lactose concentration [mM]')
plt.ylabel('measured lactose concentration [mM]')
plt.legend()

[8]:
<matplotlib.legend.Legend at 0x14fd33f40>
[Figure: measured versus true lactose concentration for the test set, with the identity line]
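To put a number on the agreement, one option is to compute the relative error of each test sample and the root-mean-square error (RMSE) over the set. The sketch below uses values copied from the test table; in the notebook, test_data['concentration'] and test_data['true_conc_mM'] could be used directly.

```python
import numpy as np

# Measured and true concentrations [mM] copied from the test table.
measured = np.array([8.118513, 3.981019, 1.899435, 1.557443])
true = np.array([8.0, 4.0, 2.0, 1.5])

# Relative error per sample and root-mean-square error over the whole set.
rel_error = (measured - true) / true
rmse = np.sqrt(np.mean((measured - true) ** 2))

for t, m, e in zip(true, measured, rel_error):
    print(f"true {t:4.1f} mM  measured {m:.3f} mM  error {e:+.1%}")
print(f"RMSE: {rmse:.3f} mM")
```

For these four samples the relative errors stay within about 5% and the RMSE is below 0.1 mM.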

© Griffin Chure, 2024. This notebook and the code within are released under a Creative-Commons CC-BY 4.0 and GPLv3 license, respectively.