COmbinatorial PEptide POoling Design for TCR specificity

CopepodTCR is a tool for the design of combinatorial peptide pooling schemes for TCR specificity assays.

CopepodTCR guides the user through all stages of experiment design and interpretation:

  • selection of parameters for the experiment (Balance check)
  • examination of peptides (Overlap check)
  • generation of pooling scheme (Pooling scheme)
  • generation of punched cards for efficient peptide mixing (STL files)
  • results interpretation using a hierarchical Bayesian model (Activated pools)

Cite as

Kovaleva V. A., et al. "copepodTCR: Identification of Antigen-Specific T Cell Receptors with combinatorial peptide pooling." bioRxiv (2023): 2023-11.

Or use the following BibTeX entry:

@article{
	kovaleva2023copepodtcr,
	title        = {copepodTCR: Identification of Antigen-Specific T Cell Receptors with combinatorial peptide pooling},
	author       = {Kovaleva, Vasilisa A and Pattinson, David J and Barton, Carl and Chapin, Sarah R and Minerva, Anastasia A and Richards, Katherine A and Sant, Andrea J and Thomas, Paul G and Pogorelyy, Mikhail V and Meyer, Hannah V},
	year         = 2023,
	journal      = {bioRxiv},
	publisher    = {Cold Spring Harbor Laboratory},
	pages        = {2023--11}
}

Description

Identifying the cognate peptide for a TCR of interest is crucial for biomedical research. Current computational approaches to TCR specificity prediction have not yet produced a reliable tool, so testing large peptide libraries against T cells bearing the TCR of interest remains the main approach in the field.

Testing each peptide against a TCR individually is reagent- and time-consuming. A more efficient approach is to mix peptides into pools according to a combinatorial scheme. Each peptide is added to a unique subset of pools (its "address"), so T cells stimulated with the combinatorial pools produce an activation pattern that matches the address of the recognized peptide.

An efficient combinatorial peptide pooling (CPP) scheme must implement:

  • use of overlapping peptides in the assay to cover the whole protein space;
  • error detection.

Here, we present copepodTCR, a tool for the design of CPP schemes. CopepodTCR detects experimental errors and, coupled with a hierarchical Bayesian model for unbiased interpretation of results, identifies the response-eliciting peptide for a TCR of interest among hundreds of tested peptides using a simple experimental set-up.

The experimental setup starts with defining the protein/proteome of interest and obtaining synthetic peptides tiling its space.

This set of peptides, which overlap by a constant length, is entered into copepodTCR. The tool creates a peptide pooling scheme and, optionally, provides a pipetting scheme to generate the desired pools, either as 384-well plate layouts or as punch card models that can be 3D printed and overlaid on the physical plate or pipette tip box.

Following this scheme, the peptides are mixed, and the resulting peptide pools are tested in a T cell activation assay. T cell activation is measured for each peptide pool (experimental layout, activation assay, and experimental read out) with the assay of choice, such as flow cytometry- or microscopy-based activation assays detecting transcription and translation of a reporter gene.

The experimental measurements for each pool are entered back into copepodTCR, which employs a Bayesian mixture model to identify activated pools. Based on the activation patterns, it returns the set of overlapping peptides leading to T cell activation (Results interpretation).

Branch-and-Bound algorithm

For a detailed description of the algorithm and its development, refer to Kovaleva et al. (2023).

The Branch-and-Bound part of copepodTCR generates a peptide mixing scheme by optimizing the distribution of peptides across a predefined number of pools n (n = 6 in the Figure). The distribution of each peptide is encoded as an address (edges in the graph); the edges connect nodes (circles), each of which represents the union of two addresses. The peptide mixing scheme is a path through these unions and connecting addresses that ensures a balanced pool design.
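To make the address/union picture concrete, here is a minimal sketch that checks whether a given list of addresses has the properties described above. It is not the Branch-and-Bound search itself, and the function name `check_address_path` is hypothetical. As an illustrative assumption, it requires consecutive (overlapping) peptides to share all but one pool, so that each union contains occurrence + 1 pools, and it requires all addresses and all unions to be distinct.

```python
def check_address_path(addresses, n_pools, occurrence):
    """Hypothetical validator for a peptide-ordered list of addresses (sets of pool indices).

    Assumed rules (illustrative, not copepodTCR's exact criteria):
    - each address uses exactly `occurrence` pools out of range(n_pools);
    - all addresses are distinct;
    - consecutive addresses share occurrence - 1 pools, so each union has occurrence + 1 pools;
    - all unions are distinct.
    """
    all_pools = set(range(n_pools))
    if any(len(a) != occurrence or not set(a) <= all_pools for a in addresses):
        return False
    if len({frozenset(a) for a in addresses}) != len(addresses):
        return False
    # Union of each pair of consecutive addresses (a node in the graph).
    unions = [frozenset(a) | frozenset(b) for a, b in zip(addresses, addresses[1:])]
    if any(len(u) != occurrence + 1 for u in unions):
        return False
    return len(set(unions)) == len(unions)


# n = 6 pools, each peptide added to 3 pools
scheme = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(check_address_path(scheme, n_pools=6, occurrence=3))  # True
```

Balance (an equal number of peptides per pool) is what the optimization aims for; this sketch only checks decodability, not balance.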

Activation model

For a detailed description of the model, refer to Kovaleva et al. (2023).

To accurately interpret the results of a T cell activation assay, copepodTCR uses a Bayesian mixture model.

The model assumes that the activation signal is drawn from two distinct distributions, one arising from activated pools and one from non-activated pools, and uses the probability that each value was drawn from either distribution as the criterion for pool classification.
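The classification step can be illustrated with a two-component mixture whose parameters are already known. This is only a sketch: copepodTCR fits a hierarchical Bayesian model (the package lists PyMC among its requirements) and infers the parameters from the data, whereas here the prior weight, means, and standard deviations are fixed by hand, and the use of normal components is an assumption made for illustration. The posterior is computed in log space to avoid underflow for extreme signals.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log-density of a normal distribution with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def activation_probability(x, w_act, mu_act, sd_act, mu_non, sd_non):
    """Posterior probability that signal x came from the activated component.

    w_act (0 < w_act < 1) is the prior fraction of activated pools. All parameters
    are fixed here, whereas a Bayesian model would infer them from the data.
    """
    log_act = math.log(w_act) + log_normal_pdf(x, mu_act, sd_act)
    log_non = math.log(1 - w_act) + log_normal_pdf(x, mu_non, sd_non)
    d = log_non - log_act
    # Numerically stable logistic function: p = 1 / (1 + exp(d)).
    if d > 0:
        e = math.exp(-d)
        return e / (1 + e)
    return 1 / (1 + math.exp(d))


# Made-up parameters: non-activated pools near 1 %, activated pools near 10 %.
for signal in (0.8, 5.0, 11.0):
    print(signal, round(activation_probability(signal, 0.3, 10.0, 3.0, 1.0, 0.5), 3))
```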

CopepodTCR Python package

It can be installed with

pip install copepodTCR

or

conda install -c vasilisa.kovaleva copepodTCR

Requirements

Required packages should be installed automatically together with the copepodTCR package.

If they were not, here is the list of requirements:

	pip install "pandas>=1.5.3"
	pip install "numpy>=1.23.5"
	pip install "cvxpy>=1.3.2"
	pip install "trimesh>=3.23.5"
	pip install "pymc>=5.9.2"
	pip install "arviz>=0.16.1"

Usage

The tool consists of five distinct parts, each of which can be used separately. Together, these parts cover the stages of experiment design and implementation.

Balance check

First, select an appropriate number of pools and peptide occurrence (the number of pools each peptide is added to).

Peptide occurrence affects the number of peptides in each pool; therefore, too high a peptide occurrence may lead to greater dilution of each individual peptide. In Kovaleva et al. (2023), we were able to detect a signal with the cognate peptide diluted to 1.58 μM.

Peptide dilution can be mitigated by increasing the number of pools; however, this may increase the complexity of the experimental set-up. Consequently, these parameters should be chosen carefully.

To assist with this process, copepodTCR suggests possible peptide occurrence values based on the given number of pools and number of tested peptides. It also calculates the resulting distribution of peptides across pools and compares it with the perfect scenario, in which the number of peptides per pool is completely balanced.
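A rough version of the balance check can be sketched in a few lines. The function names are hypothetical, and the rule used to suggest occurrence values (enough distinct addresses of `occurrence` pools, and enough distinct unions of `occurrence + 1` pools, for the given number of peptides) is an assumed necessary condition, not necessarily the criterion copepodTCR applies. In a perfectly balanced scheme, every pool receives `n_peptides * occurrence / n_pools` peptides.

```python
from math import comb


def possible_occurrences(n_pools, n_peptides):
    """Occurrence values meeting an assumed necessary condition for unique decoding."""
    return [
        r
        for r in range(1, n_pools)
        # enough distinct addresses, and enough distinct unions of neighbouring addresses
        if comb(n_pools, r) >= n_peptides and comb(n_pools, r + 1) >= n_peptides - 1
    ]


def ideal_pool_size(n_pools, n_peptides, occurrence):
    """Peptides per pool in a perfectly balanced scheme."""
    return n_peptides * occurrence / n_pools


def balance_deviation(addresses, n_pools, occurrence):
    """Deviation of each pool's peptide count from the perfectly balanced value."""
    counts = [0] * n_pools
    for address in addresses:
        for pool in address:
            counts[pool] += 1
    ideal = ideal_pool_size(n_pools, len(addresses), occurrence)
    return [count - ideal for count in counts]


print(possible_occurrences(6, 10))  # [2, 3]
print(ideal_pool_size(6, 10, 3))  # 5.0
print(balance_deviation([{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}], 6, 3))
# [-1.0, 0.0, 1.0, 1.0, 0.0, -1.0]
```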

Overlap check

Inconsistent overlap length in the list of tested peptides can lead to imprecise interpretation of results. In copepodTCR, the user can check overlap consistency across the entire list of tested peptides.

Upon peptide entry, copepodTCR returns the list of observed overlap lengths and the corresponding number of peptide pairs. If more than one overlap length is observed, it also returns the peptide pairs whose overlap length differs from the most common one.
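The overlap check amounts to measuring, for each pair of neighbouring peptides, the longest suffix of the first that is also a prefix of the second. The sketch below assumes the peptides are listed in their order along the protein; `check_overlaps` is a hypothetical name, not the package's API.

```python
from collections import Counter


def overlap_length(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def check_overlaps(peptides):
    """Count overlap lengths between neighbouring peptides and flag unusual pairs."""
    pairs = list(zip(peptides, peptides[1:]))
    if not pairs:
        return Counter(), []
    lengths = [overlap_length(a, b) for a, b in pairs]
    counts = Counter(lengths)
    most_common = counts.most_common(1)[0][0]
    odd_pairs = [(a, b, k) for (a, b), k in zip(pairs, lengths) if k != most_common]
    return counts, odd_pairs


peptides = ["MKTAYIAK", "YIAKQRQI", "QRQISFVK", "SFVKSHFS", "HFSRQWLE"]
counts, odd_pairs = check_overlaps(peptides)
print(dict(counts))  # {4: 3, 3: 1}
print(odd_pairs)  # [('SFVKSHFS', 'HFSRQWLE', 3)]
```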

Pooling scheme

After selecting the parameters and checking the peptides, the user can enter them into copepodTCR to obtain a peptide pooling scheme.

CopepodTCR returns three tables:

  • peptide pooling scheme pool-wise (i.e. the table with peptides in each pool)
  • peptide pooling scheme peptide-wise (i.e. the table with pools for each peptide)
  • simulation table
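The pool-wise and peptide-wise tables hold the same information. As a sketch, with hypothetical names and a plain dictionary standing in for the tables, the pool-wise view can be derived from the peptide-wise one:

```python
def pools_from_addresses(peptide_wise):
    """Turn {peptide: set of pools} into {pool: [peptides]}, sorted by pool index."""
    pool_wise = {}
    for peptide, pools in peptide_wise.items():
        for pool in sorted(pools):
            pool_wise.setdefault(pool, []).append(peptide)
    return dict(sorted(pool_wise.items()))


peptide_wise = {"MKTAYIAK": {0, 1, 2}, "YIAKQRQI": {1, 2, 3}}
print(pools_from_addresses(peptide_wise))
# {0: ['MKTAYIAK'], 1: ['MKTAYIAK', 'YIAKQRQI'], 2: ['MKTAYIAK', 'YIAKQRQI'], 3: ['YIAKQRQI']}
```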

During the simulation step, copepodTCR simulates the results of the experiment for every possible epitope of the provided length and returns a table listing each possible epitope and all pools in which it is present.
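The simulation without drop-outs can be sketched as follows, assuming that an epitope activates every pool containing any peptide that carries it. Column names mirror the simulation table columns listed in this section, but the function `simulate` and its inputs are hypothetical, and only the columns that do not depend on drop-outs are computed.

```python
def simulate(peptides, addresses, epitope_length):
    """Simulated read-out without drop-outs for every epitope of `epitope_length`.

    peptides: sequences in scheme order; addresses: matching list of pool sets.
    """
    rows = []
    for peptide, address in zip(peptides, addresses):
        for start in range(len(peptide) - epitope_length + 1):
            epitope = peptide[start:start + epitope_length]
            # Every pool holding a peptide that carries this epitope is activated.
            act_pools = set()
            for other, other_address in zip(peptides, addresses):
                if epitope in other:
                    act_pools |= set(other_address)
            rows.append({
                "Peptide": peptide,
                "Address": sorted(address),
                "Epitope": epitope,
                "Act pools": sorted(act_pools),
                "# of pools": len(act_pools),
            })
    # Rows sharing the same Act pools cannot be told apart by the read-out.
    for row in rows:
        same = [r for r in rows if r["Act pools"] == row["Act pools"]]
        row["# of epitopes"] = len({r["Epitope"] for r in same})
        row["# of peptides"] = len({r["Peptide"] for r in same})
    return rows


peptides = ["MKTAYIAK", "YIAKQRQI", "QRQISFVK", "SFVKSHFS"]
addresses = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
table = simulate(peptides, addresses, epitope_length=4)
row = next(r for r in table if r["Epitope"] == "YIAK")
print(row["Act pools"], row["# of peptides"])  # [0, 1, 2, 3] 2
```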

The function has two regimes: with and without drop-outs. Without drop-outs, it returns a table assuming there were no experimental errors, i.e. no pools that erroneously failed to activate. With drop-outs, it returns a table covering all possible errors (i.e. all possible combinations of erroneously non-activated pools). This regime takes longer to generate, usually several minutes, depending on the number of peptides and the peptide occurrence.
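For the drop-out regime, each simulated combination of activated pools can be expanded into every combination that would remain if some of its pools failed to activate. Below is a minimal sketch using `itertools.combinations`; `dropout_variants` and its `max_lost` cap are hypothetical, and the sketch does not reproduce how copepodTCR then recovers peptides from each variant.

```python
from itertools import combinations


def dropout_variants(act_pools, max_lost=None):
    """Every non-empty set of pools left after losing 1..max_lost of `act_pools`."""
    act_pools = sorted(act_pools)
    limit = len(act_pools) - 1  # keep at least one activated pool
    max_lost = limit if max_lost is None else min(max_lost, limit)
    variants = []
    for lost in range(1, max_lost + 1):
        for remained in combinations(act_pools, len(act_pools) - lost):
            variants.append({"Remained": list(remained), "# of lost": lost})
    return variants


print(len(dropout_variants([0, 1, 2, 3])))  # 14
print(dropout_variants([0, 1, 2], max_lost=1))
```

The number of variants grows quickly with the number of activated pools, which illustrates why this regime is slower than the one without drop-outs.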

Without drop-outs

The simulation table without drop-outs contains the following columns:

  • Peptide — peptide sequence
  • Address — pool indices where this peptide should be added
  • Epitope — checked epitope from this peptide
  • Act pools — list with pool indices where this epitope is present
  • # of pools — number of pools where this epitope is present
  • # of epitopes — number of epitopes that are present in the same pools (= number of possible epitopes upon activation of such pools)
  • # of peptides — number of peptides in which there are epitopes that are present in the same pools (= number of possible peptides upon activation of such pools)
  • Remained — only in the "with drop-outs" regime; list of pools remaining after a drop-out
  • # of lost — only in the "with drop-outs" regime; number of dropped pools
  • Right peptide — True or False, whether the peptide is present in the list of possible peptides
  • Right epitope — True or False, whether the epitope is present in the list of possible epitopes

To interpret the results of the experiment, the user can find all rows where the Act pools column matches the observed combination of activated pools. This yields all possible peptides and epitopes that could lead to the activation of that combination of pools.
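Looking up an observed combination in the simulation table can be sketched as an exact match on the Act pools column. The rows below form a hand-made toy table, and `candidates` is a hypothetical helper, not part of the package.

```python
def candidates(simulation_rows, activated_pools):
    """Peptides and epitopes whose simulated Act pools equal the observed activated pools."""
    target = sorted(activated_pools)
    hits = [row for row in simulation_rows if row["Act pools"] == target]
    peptides = sorted({row["Peptide"] for row in hits})
    epitopes = sorted({row["Epitope"] for row in hits})
    return peptides, epitopes


# Toy excerpt of a simulation table (hand-made, for illustration only).
rows = [
    {"Peptide": "MKTAYIAK", "Epitope": "YIAK", "Act pools": [0, 1, 2, 3]},
    {"Peptide": "YIAKQRQI", "Epitope": "YIAK", "Act pools": [0, 1, 2, 3]},
    {"Peptide": "MKTAYIAK", "Epitope": "MKTA", "Act pools": [0, 1, 2]},
]
print(candidates(rows, {3, 2, 1, 0}))  # (['MKTAYIAK', 'YIAKQRQI'], ['YIAK'])
```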

Without drop-outs, # of peptides should equal the number of peptides sharing an epitope. For end-position peptides, it will be smaller. However, if # of peptides is larger for some epitopes, then these peptides have a longer overlap than the others.

If the observed combination of activated pools is not present in the table, the simulation with drop-outs can be checked.

With drop-outs

The simulation table with drop-outs additionally fills in the Remained and # of lost columns.

The Right peptide and Right epitope columns show whether the scheme remains interpretable after a pool drop-out. Right peptide should always be "True"; otherwise, recovery was unsuccessful.

The results of the simulation can be assessed with the generated histplot:

  • it shows # of peptides for each combination of activated pools; generally, this number should equal the number of peptides sharing the same epitope; the two end-position peptides (the first and the last) will have fewer; any other result indicates inconsistent overlap length across the peptide list;
  • it shows whether there are epitopes erroneously recovered after a drop-out (Right peptide = False).

STL files

To avoid mixing pools manually, the user can 3D print special punched cards from the generated model files.

Each card represents one pool, with holes positioned at the coordinates of the wells containing the peptides designated for that pool.

The printed punched card is placed on an empty tip box, and the open holes are filled with tips. This patterned pipette tip array is then used to transfer peptides from the plate to the corresponding pool.

The user can adjust parameters to fit their plate:

  • number of rows — number of rows in the plate
  • number of columns — number of columns in the plate
  • length — length of the plate (in mm)
  • width — width of the plate (in mm)
  • thickness — thickness of the plate (in mm)
  • hole radius — diameter of the well divided by 2
  • X offset — margin along the X axis for the A1 well, in mm
  • Y offset — margin along the Y axis for the A1 well, in mm
  • well spacing — distance between wells, in mm

To help orient the tip pattern, the user can add an extra hole at the last well (with coordinates m-k). This option should be used only if the corresponding well contains no peptide.
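The hole layout of a punched card follows directly from the plate parameters listed above. The sketch below computes hole centres in mm for one pool, assuming well A1 sits at (X offset, Y offset) and wells lie on a regular grid with the given well spacing. The helper `hole_centers` is hypothetical, only single-letter row labels are handled (up to 26 rows, enough for 96- and 384-well plates), and "m-k" is interpreted here as the last row and last column. Building the STL geometry itself (which would also use thickness and hole radius) is not shown.

```python
def hole_centers(wells, n_rows, n_cols, x_offset, y_offset, well_spacing,
                 add_orientation_hole=False):
    """Hole centres (x, y) in mm for one pool's punched card.

    wells: labels such as "A1" or "B3" for the peptides added to this pool.
    Assumes A1 is at (x_offset, y_offset) and wells lie on a regular grid.
    """
    centers = []
    for well in wells:
        row = ord(well[0].upper()) - ord("A")
        col = int(well[1:]) - 1
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise ValueError(f"well {well} is outside a {n_rows}x{n_cols} plate")
        centers.append((x_offset + col * well_spacing, y_offset + row * well_spacing))
    if add_orientation_hole:
        # Extra hole at the last row and last column ("m-k"); use only if that well is empty.
        centers.append((x_offset + (n_cols - 1) * well_spacing,
                        y_offset + (n_rows - 1) * well_spacing))
    return centers


# Illustrative 96-well geometry (8 x 12, 9 mm spacing); offsets are made-up round numbers.
print(hole_centers(["A1", "B3"], 8, 12, 14.0, 11.0, 9.0, add_orientation_hole=True))
# [(14.0, 11.0), (32.0, 20.0), (113.0, 74.0)]
```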

Activated pools

The experiment can be conducted using flow cytometry or microscopy.

After the experiment, copepodTCR can help with data analysis. A basic approach to interpreting the experiment is described in the Pooling scheme section, in the explanation of the simulation step.

Alternatively, the user can analyze the results with the Bayesian mixture model. This model returns the probability of each pool being activated (green) or not (gray).

To enter the experimental results into the model, the user needs to prepare a CSV table with two columns: Pool and Percentage. The experiment can be conducted with replicates; in that case, all replicates of one pool should have the same name in the Pool column. Percentage is the percentage of activated T cells in a given pool (for microscopy, the user can divide the number of activated T cells per well by the total number of activated T cells in the experiment).
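A minimal sketch of reading such a CSV table with the standard library, grouping replicate measurements by pool name. The helper name is hypothetical and the CSV content is a made-up example; in practice, the text would come from the user's file. The per-pool mean is printed only for inspection and is not meant to describe how the model itself uses replicates.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def read_readout(csv_text):
    """Group Percentage values by Pool; replicates share the same Pool name."""
    replicates = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        replicates[row["Pool"].strip()].append(float(row["Percentage"]))
    return dict(replicates)


# Made-up read-out with two replicates per pool.
readout = "Pool,Percentage\n1,0.4\n1,0.6\n2,12.5\n2,11.5\n"
replicates = read_readout(readout)
print(replicates)  # {'1': [0.4, 0.6], '2': [12.5, 11.5]}
print({pool: mean(values) for pool, values in replicates.items()})  # {'1': 0.5, '2': 12.0}
```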

Then the user enters the table with peptide addresses (produced during the Pooling scheme step), the experimental read-out (CSV table), the number of pools, the peptide occurrence used in the experiment, and the expected epitope length.

After fitting the model to the data (which may take some time), copepodTCR returns the list of activated pools and the peptides responsible for their activation.
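Once per-pool activation probabilities are available, the final step can be sketched as thresholding them and matching the activated set against single addresses and against unions of consecutive (overlapping) peptides. The 0.5 threshold, the helper names, and the exact-match rule are assumptions for illustration, not copepodTCR's documented decision rule.

```python
def activated_pools(probabilities, threshold=0.5):
    """Pools whose activation probability exceeds `threshold` (assumed cut-off)."""
    return {pool for pool, p in probabilities.items() if p > threshold}


def decode(addresses, active):
    """Single peptides or overlapping neighbour pairs whose pools equal `active`.

    addresses: list of (peptide, set of pools) in scheme order.
    """
    hits = [(peptide,) for peptide, pools in addresses if set(pools) == active]
    for (p1, a1), (p2, a2) in zip(addresses, addresses[1:]):
        if set(a1) | set(a2) == active:
            hits.append((p1, p2))
    return hits


addresses = [("MKTAYIAK", {0, 1, 2}), ("YIAKQRQI", {1, 2, 3}), ("QRQISFVK", {2, 3, 4})]
probabilities = {0: 0.97, 1: 0.99, 2: 0.95, 3: 0.91, 4: 0.03, 5: 0.02}
active = activated_pools(probabilities)
print(sorted(active))  # [0, 1, 2, 3]
print(decode(addresses, active))  # [('MKTAYIAK', 'YIAKQRQI')]
```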
