
Manage a cell type registry

Cell types classify cells based on public and private knowledge gained from studying transcription, morphology, function, and other properties. Established cell types have well-characterized markers and properties; cell subtypes and states, however, are continuously being discovered, refined, and better understood.

In this notebook, we manage an immune cell type registry from CellTypist, a computational tool used for cell type classification in scRNA-seq data. We’ll walk you through the following steps:

  1. Create a cell type registry seeded from cell types supported by CellTypist.

  2. Use CellTypist to classify cell types of a previously unannotated dataset and track the dataset with LaminDB.

  3. Demonstrate how datasets are queryable by cell types using LaminDB.

Setup

To run this notebook, you need to load a LaminDB instance that has the bionty schema mounted.

Here, we’ll create a test instance (skip if you’d like to run it using your own instance):

!lamin init --storage ./celltypist --schema bionty
💡 creating schemas: core==0.46.3 bionty==0.30.2 
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-08-30 13:54:07)
✅ saved: Storage(id='luStdUTJ', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/celltypist', type='local', updated_at=2023-08-30 13:54:07, created_by_id='DzTjkKse')
✅ loaded instance: testuser1/celltypist
💡 did not register local instance on hub (if you want, call `lamin register`)

# filter warnings from celltypist
import warnings

warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*")
import lamindb as ln
import lnschema_bionty as lb
import celltypist
import pandas as pd

lb.settings.species = "human"  # globally set species
✅ loaded instance: testuser1/celltypist (lamindb 0.51.2)


ln.track()
💡 notebook imports: celltypist==1.6.0 lamindb==0.51.2 lnschema_bionty==0.30.2 pandas==2.1.0
✅ saved: Transform(id='s5mkN5NQ1ttIz8', name='Manage a cell type registry', short_name='celltypist', version='0', type=notebook, updated_at=2023-08-30 13:54:12, created_by_id='DzTjkKse')
✅ saved: Run(id='UGCjQ982CqMSOPegtRHX', run_at=2023-08-30 13:54:12, transform_id='s5mkN5NQ1ttIz8', created_by_id='DzTjkKse')

For a start, let’s take a look at the public Cell Ontology.

celltype_bt = lb.CellType.bionty()  # equivalent to bionty.CellType()


celltype_bt
CellType
Species: all
Source: cl, 2023-04-20
#terms: 2862

📖 CellType.df(): ontology reference table
🔎 CellType.lookup(): autocompletion of terms
🎯 CellType.search(): free text search of terms
✅ CellType.validate(): strictly validate values
🧐 CellType.inspect(): full inspection of values
👽 CellType.standardize(): convert to standardized names
🪜 CellType.diff(): difference between two versions
🔗 CellType.ontology: Pronto.Ontology object

Create an in-house registry of CellTypist terms

Fetch CellTypist’s immune cell encyclopedia

As a first step, we read in CellTypist’s immune cell encyclopedia:

description = "CellTypist Pan Immune Atlas v2: basic cell type information"
celltypist_source_v2_url = "https://github.com/Teichlab/celltypist_wiki/raw/main/atlases/Pan_Immune_CellTypist/v2/tables/Basic_celltype_information.xlsx"

# our source data
celltypist_file = ln.File.filter(description=description).one_or_none()

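if celltypist_file is None:
    celltypist_df = pd.read_excel(celltypist_source_v2_url)
    # pass the description so that the filter above finds the file on re-runs
    celltypist_file = ln.File(celltypist_df, description=description)
    celltypist_file.save()
else:
    # load the full table (not only its head) so later cells see all records
    celltypist_df = celltypist_file.load()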
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/pr46UxJnerq0OlC153ra.parquet')
💡 data is a dataframe, consider using .from_df() to link column names as features
✅ storing file 'pr46UxJnerq0OlC153ra' at '.lamindb/pr46UxJnerq0OlC153ra.parquet'

It provides an ontology_id of the public Cell Ontology for the majority of records.

celltypist_df.head()
   High-hierarchy cell types  Low-hierarchy cell types               Description                                        Cell Ontology ID  Curated markers
0  B cells                    B cells                                B lymphocytes with diverse cell surface immuno...  CL:0000236        CD79A, MS4A1, CD19
1  B cells                    Follicular B cells                     resting mature B lymphocytes found in the prim...  CL:0000843        CXCR5, TNFRSF13B, CD22
2  B cells                    Proliferative germinal center B cells  proliferating germinal center B cells              CL:0000844        MKI67, SUGCT, AICDA
3  B cells                    Germinal center B cells                proliferating mature B cells that undergo soma...  CL:0000844        POU2AF1, CD40, SUGCT
4  B cells                    Memory B cells                         long-lived mature B lymphocytes which are form...  CL:0000787        CR2, CD27, MS4A1

A single “Cell Ontology ID” can be associated with multiple “Low-hierarchy cell types”:

celltypist_df.set_index(["Cell Ontology ID", "Low-hierarchy cell types"]).head(10)
Cell Ontology ID  Low-hierarchy cell types               High-hierarchy cell types  Description                                        Curated markers
CL:0000236        B cells                                B cells                    B lymphocytes with diverse cell surface immuno...  CD79A, MS4A1, CD19
CL:0000843        Follicular B cells                     B cells                    resting mature B lymphocytes found in the prim...  CXCR5, TNFRSF13B, CD22
CL:0000844        Proliferative germinal center B cells  B cells                    proliferating germinal center B cells              MKI67, SUGCT, AICDA
                  Germinal center B cells                B cells                    proliferating mature B cells that undergo soma...  POU2AF1, CD40, SUGCT
CL:0000787        Memory B cells                         B cells                    long-lived mature B lymphocytes which are form...  CR2, CD27, MS4A1
                  Age-associated B cells                 B cells                    CD11c+ T-bet+ memory B cells associated with a...  FCRL2, ITGAX, TBX21
CL:0000788        Naive B cells                          B cells                    mature B lymphocytes which express cell-surfac...  IGHM, IGHD, TCL1A
CL:0000818        Transitional B cells                   B cells                    immature B cell precursors in the bone marrow ...  CD24, MYO1C, MS4A1
CL:0000817        Large pre-B cells                      B-cell lineage             proliferative B lymphocyte precursors derived ...  MME, CD24, MKI67
                  Small pre-B cells                      B-cell lineage             non-proliferative B lymphocyte precursors deri...  MME, CD24, IGLL5
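
To see at a glance which ontology IDs are shared by several CellTypist names, the rows can be grouped by ID. The sketch below uses plain Python on a handful of (ID, name) pairs copied from the table above; the variable names are illustrative and not part of CellTypist or LaminDB.

```python
from collections import defaultdict

# (Cell Ontology ID, low-hierarchy cell type) pairs copied from the table above
rows = [
    ("CL:0000236", "B cells"),
    ("CL:0000843", "Follicular B cells"),
    ("CL:0000844", "Proliferative germinal center B cells"),
    ("CL:0000844", "Germinal center B cells"),
    ("CL:0000787", "Memory B cells"),
    ("CL:0000787", "Age-associated B cells"),
    ("CL:0000817", "Large pre-B cells"),
    ("CL:0000817", "Small pre-B cells"),
]

# collect all names per ontology ID, preserving row order
names_by_id = defaultdict(list)
for ontology_id, name in rows:
    names_by_id[ontology_id].append(name)

# keep only the IDs that map to more than one name
shared = {oid: names for oid, names in names_by_id.items() if len(names) > 1}
for oid, names in shared.items():
    print(oid, "->", names)
```

On the full dataframe, grouping by the "Cell Ontology ID" column with pandas would give the same kind of overview.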

Validate terms with the public Cell Ontology

For any cell type record that can be validated against the public Cell Ontology, we’d like to ensure that it actually is validated.

This prevents us from referring to the same cell type with different identifiers.

Let’s see how well the CellTypist reference data can be validated.

All ontology IDs provided by CellTypist validate against the public Cell Ontology:

celltype_bt.inspect(celltypist_df["Cell Ontology ID"], celltype_bt.ontology_id);
68 terms (100.00%) are validated for ontology_id

However, when inspecting the names, most of them don’t validate:

celltype_bt.inspect(celltypist_df["Low-hierarchy cell types"], celltype_bt.name);
1 term (1.00%) is validated for name
97 terms (99.00%) are not validated for name: B cells, Follicular B cells, Proliferative germinal center B cells, Germinal center B cells, Memory B cells, Age-associated B cells, Naive B cells, Transitional B cells, Large pre-B cells, Small pre-B cells, Pre-pro-B cells, Pro-B cells, Cycling B cells, Cycling DCs, Cycling gamma-delta T cells, Cycling monocytes, Cycling NK cells, Cycling T cells, DC, DC1, ...
💡    detected 9 terms with synonyms: DC1, DC2, ETP, CMP, ELP, GMP, ILC2, ILC3, pDC
💡 →  standardize terms via .standardize()
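
Conceptually, validating a column against a field means checking which values occur exactly in the reference. The following is a simplified sketch in plain Python, not Bionty's implementation; `validate_names` and the tiny `reference` set are hypothetical and only illustrate why plural names fail an exact match against singular ontology names.

```python
def validate_names(values, reference):
    """Split the unique values into those found in `reference` and those not found."""
    unique = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    validated = [v for v in unique if v in reference]
    not_validated = [v for v in unique if v not in reference]
    return validated, not_validated


# a tiny stand-in for the (singular) names of a public ontology
reference = {"B cell", "memory B cell", "naive B cell"}
# plural names, as CellTypist writes them, plus one exact match
values = ["B cells", "Memory B cells", "B cell", "B cells"]

validated, not_validated = validate_names(values, reference)
share = 100 * len(validated) / (len(validated) + len(not_validated))
print(f"{len(validated)} validated ({share:.2f}%), not validated: {not_validated}")
```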

A search shows that terms named in the plural in CellTypist appear under a singular name in the Cell Ontology:

celltypist_df["Low-hierarchy cell types"][0]
'B cells'
celltype_bt.search(celltypist_df["Low-hierarchy cell types"][0]).head(2)
name        ontology_id  definition                                         synonyms                                           parents       __agg__     __ratio__
B cell      CL:0000236   A Lymphocyte Of B Lineage That Is Capable Of B...  B-cell|B lymphocyte|B-lymphocyte                   [CL:0000945]  b cell      92.307692
B-1 B cell  CL:0000819   A B Cell Of Distinct Lineage And Surface Marke...  B1 B cell|B-1 B lymphocyte|B1 cell|B-1 B-cell|...  [CL:0000785]  b-1 b cell  85.714286

Let’s strip the trailing "s" and check whether more names validate. A few more do:

celltype_bt.inspect(
    [i.rstrip("s") for i in celltypist_df["Low-hierarchy cell types"]],
    celltype_bt.name,
);
5 terms (5.10%) are validated for name
93 terms (94.90%) are not validated for name: Follicular B cell, Proliferative germinal center B cell, Germinal center B cell, Memory B cell, Age-associated B cell, Naive B cell, Transitional B cell, Large pre-B cell, Small pre-B cell, Pre-pro-B cell, Pro-B cell, Cycling B cell, Cycling DC, Cycling gamma-delta T cell, Cycling monocyte, Cycling NK cell, Cycling T cell, DC, DC1, DC2, ...
💡    detected 34 terms with inconsistent casing/synonyms: Follicular B cell, Germinal center B cell, Memory B cell, Naive B cell, Transitional B cell, Small pre-B cell, Pro-B cell, DC1, DC2, Endothelial cell, Epithelial cell, Erythrocyte, ETP, Fibroblast, Granulocyte, Neutrophil, CMP, ELP, GMP, ILC2, ...
💡 →  standardize terms via .standardize()
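
One caveat of `str.rstrip("s")` is that it removes every trailing "s", not just one, so a name ending in a double "s" would lose both. The sketch below shows a slightly more defensive helper; the name `singularize` and the "Cell mass" example are hypothetical, and this is still a naive heuristic rather than real singularization.

```python
def singularize(name: str) -> str:
    """Drop one trailing 's', leaving names that end in 'ss' untouched."""
    if name.endswith("s") and not name.endswith("ss"):
        return name[:-1]
    return name


print(singularize("B cells"))      # one trailing "s" is removed
print(singularize("Cycling DCs"))  # works for abbreviations as well
print("Cell mass".rstrip("s"))     # rstrip removes both trailing "s" characters
print(singularize("Cell mass"))    # the helper leaves the name unchanged
```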

Every “low-hierarchy cell type” has an ontology ID, and most “high-hierarchy cell types” also appear as “low-hierarchy cell types” in the CellTypist table. Four, however, don’t, and therefore don’t have an ontology ID.

high_terms = celltypist_df["High-hierarchy cell types"].unique()
low_terms = celltypist_df["Low-hierarchy cell types"].unique()

high_terms_nonval = set(high_terms).difference(low_terms)
high_terms_nonval
{'B-cell lineage', 'Cycling cells', 'Erythroid', 'T cells'}

Register CellTypist records in LaminDB

Let’s first add the “High-hierarchy cell types” as a column "parent".

This enables LaminDB to populate the parents and children fields, which will enable you to query for hierarchical relationships.

celltypist_df["parent"] = celltypist_df.pop("High-hierarchy cell types")

# if high and low terms are the same, no parents
celltypist_df.loc[
    (celltypist_df["parent"] == celltypist_df["Low-hierarchy cell types"]), "parent"
] = None

# rename columns, drop markers
celltypist_df.drop(columns=["Curated markers"], inplace=True)
celltypist_df.rename(
    columns={"Low-hierarchy cell types": "name", "Cell Ontology ID": "ontology_id"},
    inplace=True,
)
celltypist_df.columns = celltypist_df.columns.str.lower()
celltypist_df.head(2)
   name                description                                        ontology_id  parent
0  B cells             B lymphocytes with diverse cell surface immuno...  CL:0000236   None
1  Follicular B cells  resting mature B lymphocytes found in the prim...  CL:0000843   B cells
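
The `parent` column encodes a simple tree in which each name points to at most one parent. As a rough illustration of that hierarchy, independent of how LaminDB stores parents and children, the sketch below inverts a few (name, parent) pairs into a parent-to-children mapping. The pairs are taken from rows shown earlier; the variable names are illustrative.

```python
from collections import defaultdict

# (name, parent) pairs; None means the term has no parent in the CellTypist table
pairs = [
    ("B cells", None),
    ("Follicular B cells", "B cells"),
    ("Memory B cells", "B cells"),
    ("Large pre-B cells", "B-cell lineage"),
    ("Small pre-B cells", "B-cell lineage"),
]

# invert the child -> parent relation into parent -> list of children
children = defaultdict(list)
for name, parent in pairs:
    if parent is not None:
        children[parent].append(name)

for parent, names in children.items():
    print(parent, "->", names)
```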

Now, let’s create records from the public ontology:

public_records = lb.CellType.from_values(
    celltypist_df.ontology_id, lb.CellType.ontology_id
)
✅ created 68 CellType records from Bionty matching ontology_id: CL:0000236, CL:0000843, CL:0000844, CL:0000787, CL:0000788, CL:0000818, CL:0000817, CL:0002046, CL:0000826, CL:0001056, CL:0000798, CL:0000576, CL:0000623, CL:0000084, CL:0000990, CL:0000840, CL:0001029, CL:0002489, CL:0000809, CL:0000553, ...

Let’s now amend the public ontology records so that they carry any additional annotations that CellTypist provides.

records_names = {}
public_records_dict = {r.ontology_id: r for r in public_records}

for _, row in celltypist_df.iterrows():
    name = row["name"]
    ontology_id = row["ontology_id"]
    public_record = public_records_dict[ontology_id]

    # if both name and ontology_id match public record, use public record
    if name.lower() == public_record.name.lower():
        records_names[name] = public_record
        continue
    else:  # when ontology_id matches the public record and name doesn't match
        # if singular form of the Celltypist name matches public name
        if name.lower().rstrip("s") == public_record.name.lower():
            # add the Celltypist name to the synonyms of the public ontology record
            public_record.add_synonym(name)
            records_names[name] = public_record
            continue
        if public_record.synonyms is not None:
            synonyms = [s.lower() for s in public_record.synonyms.split("|")]
            # if any public synonym matches the CellTypist name or its singular form
            if any(
                [
                    i.lower() in {name.lower(), name.lower().rstrip("s")}
                    for i in synonyms
                ]
            ):
                # add the Celltypist name to the synonyms of the public ontology record
                public_record.add_synonym(name)
                records_names[name] = public_record
                continue

        # create a record only based on Celltypist metadata
        records_names[name] = lb.CellType(
            name=name, ontology_id=ontology_id, description=row.description
        )
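
The decision logic in the loop above can be looked at separately from the database calls. The following is a sketch in plain Python with a hypothetical `match_kind` helper and hand-written inputs (the synonym strings are shortened or omitted for brevity). It returns which branch a CellTypist name would take: an exact name match, a singular-form match, a synonym match, or a new record.

```python
from typing import Optional


def match_kind(name: str, public_name: str, public_synonyms: Optional[str]) -> str:
    """Classify how a CellTypist name relates to the public record with the same ID."""
    lowered = name.lower()
    singular = lowered.rstrip("s")  # same naive plural handling as in the loop above
    if lowered == public_name.lower():
        return "exact"
    if singular == public_name.lower():
        return "singular"
    if public_synonyms is not None:
        # synonyms are stored as a single "|"-separated string
        synonyms = {s.lower() for s in public_synonyms.split("|")}
        if lowered in synonyms or singular in synonyms:
            return "synonym"
    return "new"


print(match_kind("B cells", "B cell", "B-cell|B lymphocyte|B-lymphocyte"))
print(match_kind("GMP", "granulocyte monocyte progenitor cell", "GMP|CFU-GM"))
# synonyms omitted here for brevity
print(match_kind("Age-associated B cells", "memory B cell", None))
```

In the loop itself, the first three outcomes reuse the public record (adding the CellTypist name as a synonym where needed), while the last one creates a record from CellTypist metadata only.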

You can see that certain records are created by adding the CellTypist name to the synonyms of the public record:

records_names["GMP"]
CellType(id='f5eAsw0p', name='granulocyte monocyte progenitor cell', ontology_id='CL:0000557', synonyms='granulocyte-macrophage progenitor|granulocyte/monocyte progenitor|GMP|CFU-GM|granulocyte/monocyte precursor|colony forming unit granulocyte macrophage', description='A Hematopoietic Progenitor Cell That Is Committed To The Granulocyte And Monocyte Lineages. These Cells Are Cd123-Positive, And Do Not Express Gata1 Or Gata2 But Do Express C/Ebpa, And Pu.1.', bionty_source_id='UHv4', created_by_id='DzTjkKse')

Other records are created based on CellTypist metadata:

records_names["Age-associated B cells"]
CellType(id='00ieV0IG', name='Age-associated B cells', ontology_id='CL:0000787', description='CD11c+ T-bet+ memory B cells associated with autoimmunity and aging', created_by_id='DzTjkKse')

Let’s save them to our database:

records = set(records_names.values())

ln.save(records)
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='NJ07Q1hX', name='plasmablast', ontology_id='CL:0000980', synonyms='CD27-positive, CD38-positive, CD20-negative B cell|Plasmablasts', description='An Activated Mature (Naive Or Memory) B Cell That Is Secreting Immunoglobulin, Typified By Being Cd27-Positive, Cd38-Positive, Cd138-Negative.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000785
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='0I51jgPp', name='mature B cell', ontology_id='CL:0000785', synonyms='mature B lymphocyte|mature B-cell|mature B-lymphocyte', description='A B Cell That Is Mature, Having Left The Bone Marrow. Initially, These Cells Are Igm-Positive And Igd-Positive, And They Can Be Activated By Antigen.', updated_at=2023-08-30 13:54:14, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0001201
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='CIS4VJI0', name='B cell, CD19-positive', ontology_id='CL:0001201', synonyms='CD19+ B cell|B lymphocyte, CD19-positive|B-lymphocyte, CD19-positive|CD19-positive B cell|B-cell, CD19-positive', description='A B Cell That Is Cd19-Positive.', updated_at=2023-08-30 13:54:15, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='RSjQq98q', name='pro-B cell', ontology_id='CL:0000826', synonyms='progenitor B cell|progenitor B-lymphocyte|pro-B lymphocyte|pro-B-cell|progenitor B-cell|pro-B-lymphocyte|Pro-B cells|progenitor B lymphocyte', description='A Progenitor Cell Of The B Cell Lineage, With Some Lineage Specific Activity Such As Early Stages Of Recombination Of B Cell Receptor Genes, But Not Yet Fully Committed To The B Cell Lineage Until The Expression Of Pax5 Occurs.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000838
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='vQ9N8BKH', name='lymphoid lineage restricted progenitor cell', ontology_id='CL:0000838', description='A Progenitor Cell Restricted To The Lymphoid Lineage.', updated_at=2023-08-30 13:54:15, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002031
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='xl6RtpG9', name='hematopoietic lineage restricted progenitor cell', ontology_id='CL:0002031', description='A Hematopoietic Progenitor Cell That Is Capable Of Developing Into Only One Lineage Of Hematopoietic Cells.', updated_at=2023-08-30 13:54:16, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: CL:0008001, CL:0000988
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Q0aQr5JB', name='hematopoietic cell', ontology_id='CL:0000988', synonyms='haematopoietic cell|hemopoietic cell|haemopoietic cell', description='A Cell Of A Hematopoietic Lineage.', updated_at=2023-08-30 13:54:17, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: CL:0002371, CL:0000548
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='QMAH6IlS', name='somatic cell', ontology_id='CL:0002371', description='A Cell Of An Organism That Does Not Pass On Its Genetic Material To The Organism'S Offspring (I.E. A Non-Germ Line Cell).', updated_at=2023-08-30 13:54:17, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000548
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000003
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='VT73gpK2', name='native cell', ontology_id='CL:0000003', description='A Cell That Is Found In A Natural Setting, Which Includes Multicellular Organism Cells 'In Vivo' (I.E. Part Of An Organism), And Unicellular Organisms 'In Environment' (I.E. Part Of A Natural Environment).', updated_at=2023-08-30 13:54:18, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000000
💡 also saving parents of CellType(id='H0taCt24', name='animal cell', ontology_id='CL:0000548', synonyms='metazoan cell', description='A Native Cell That Is Part Of Some Metazoa.', updated_at=2023-08-30 13:54:17, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000255
💡 also saving parents of CellType(id='0d3ym06W', name='hematopoietic precursor cell', ontology_id='CL:0008001', description='Any Hematopoietic Cell That Is A Precursor Of Some Other Hematopoietic Cell Type.', updated_at=2023-08-30 13:54:17, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='b5k0suF0', name='erythrocyte', ontology_id='CL:0000232', synonyms='Erythrocytes|RBC|red blood cell', description='A Red Blood Cell. In Mammals, Mature Erythrocytes Are Biconcave Disks Containing Hemoglobin Whose Function Is To Transport Oxygen.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='ppLUhJWx', name='non-classical monocyte', ontology_id='CL:0000875', synonyms='patrolling monocyte|resident monocyte|Non-classical monocytes', description='A Type Of Monocyte Characterized By Low Expression Of Ccr2, Low Responsiveness To Monocyte Chemoattractant Ccl2/Mcp1, Low Phagocytic Activity, And Decrease Size Relative To Classical Monocytes, But Increased Co-Stimulatory Activity. May Also Play A Role In Tissue Repair.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='Iywg7lUq', name='early lymphoid progenitor', ontology_id='CL:0000936', synonyms='lymphoid-primed multipotent progenitor|LMPP|ELP', description='A Lymphoid Progenitor Cell That Is Found In Bone Marrow, Gives Rise To B Cells, T Cells, Natural Killer Cells And Dendritic Cells, And Has The Phenotype Lin-Negative, Kit-Positive, Sca-1-Positive, Flt3-Positive, Cd34-Positive, Cd150 Negative, And Glya-Negative.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='cx8VcggA', name='B cell', ontology_id='CL:0000236', synonyms='B-cell|B cells|B-lymphocyte|B lymphocyte', description='A Lymphocyte Of B Lineage That Is Capable Of B Cell Mediated Immunity.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000945
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Z0yFV7vU', name='lymphocyte of B lineage', ontology_id='CL:0000945', description='A Lymphocyte Of B Lineage With The Commitment To Express An Immunoglobulin Complex.', updated_at=2023-08-30 13:54:19, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000542
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='g8slxY8X', name='lymphocyte', ontology_id='CL:0000542', description='A Lymphocyte Is A Leukocyte Commonly Found In The Blood And Lymph That Has The Characteristics Of A Large Nucleus, A Neutral Staining Cytoplasm, And Prominent Heterochromatin.', updated_at=2023-08-30 13:54:20, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000738
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='MkrH0gsX', name='leukocyte', ontology_id='CL:0000738', synonyms='white blood cell|leucocyte', description='An Achromatic Cell Of The Myeloid Or Lymphoid Lineages Capable Of Ameboid Movement, Found In Blood Or Other Tissue.', updated_at=2023-08-30 13:54:21, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='nd6Qaf38', name='Hofbauer cell', ontology_id='CL:3000001', synonyms='Hofbauer cells', description='Oval Eosinophilic Histiocytes With Granules And Vacuoles Found In Placenta, Which Are Of Mesenchymal Origin, In Mesoderm Of The Chorionic Villus, Particularly Numerous In Early Pregnancy.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='4fOuOYtl', name='endothelial cell', ontology_id='CL:0000115', synonyms='endotheliocyte|Endothelial cells', description='An Endothelial Cell Comprises The Outermost Layer Or Lining Of Anatomical Structures And Can Be Squamous Or Cuboidal. In Mammals, Endothelial Cell Has Vimentin Filaments And Is Derived From The Mesoderm.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: CL:0000213, CL:0002078
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='AHV57RuN', name='lining cell', ontology_id='CL:0000213', synonyms='boundary cell', description='A Cell Within An Epithelial Cell Sheet Whose Main Function Is To Act As An Internal Or External Covering For A Tissue Or An Organism.', updated_at=2023-08-30 13:54:21, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000215
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='gON03kRx', name='barrier cell', ontology_id='CL:0000215', description='A Cell Whose Primary Function Is To Prevent The Transport Of Stuff Across Compartments.', updated_at=2023-08-30 13:54:22, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='nGEtVlKq', name='meso-epithelial cell', ontology_id='CL:0002078', synonyms='epithelial mesenchymal cell', description='Epithelial Cell Derived From Mesoderm Or Mesenchyme.', updated_at=2023-08-30 13:54:21, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='xfxkvliE', name='granulocyte', ontology_id='CL:0000094', synonyms='granular leucocyte|Granulocytes|polymorphonuclear leukocyte|granular leukocyte', description='A Leukocyte With Abundant Granules In The Cytoplasm.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000766
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='40onq0tm', name='myeloid leukocyte', ontology_id='CL:0000766', description='A Cell Of The Monocyte, Granulocyte, Or Mast Cell Lineage.', updated_at=2023-08-30 13:54:23, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='ePsFBu6n', name='transitional stage B cell', ontology_id='CL:0000818', synonyms='transitional stage B-lymphocyte|transitional B cell|transitional stage B-cell|Transitional B cells|transitional stage B lymphocyte', description='An Immature B Cell Of An Intermediate Stage Between The Pre-B Cell Stage And The Mature Naive Stage With The Phenotype Surface Igm-Positive And Cd19-Positive, And Are Subject To The Process Of B Cell Selection. A Transitional B Cell Migrates From The Bone Marrow Into The Peripheral Circulation, And Then To The Spleen.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='8lTrmDbK', name='neutrophil', ontology_id='CL:0000775', synonyms='neutrocyte|neutrophil leukocyte|neutrophilic leucocyte|neutrophilic leukocyte|neutrophil leucocyte|Neutrophils', description='Any Of The Immature Or Mature Forms Of A Granular Leukocyte That In Its Mature Form Has A Nucleus With Three To Five Lobes Connected By Slender Threads Of Chromatin, And Cytoplasm Containing Fine Inconspicuous Granules And Stainable By Neutral Dyes.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='Q2BH279Q', name='classical monocyte', ontology_id='CL:0000860', synonyms='inflammatory monocyte|Classical monocytes', description='A Monocyte That Responds Rapidly To Microbial Stimuli By Secreting Cytokines And Antimicrobial Factors And Which Is Characterized By High Expression Of Ccr2 In Both Rodents And Humans, Negative For The Lineage Markers Cd3, Cd19, And Cd20, And Of Larger Size Than Non-Classical Monocytes.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='YzV7Qgmj', name='monocyte', ontology_id='CL:0000576', synonyms='Monocytes', description='Myeloid Mononuclear Recirculating Leukocyte That Can Act As A Precursor Of Tissue Macrophages, Osteoclasts And Some Populations Of Tissue Dendritic Cells.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='3rJgLble', name='conventional dendritic cell', ontology_id='CL:0000990', synonyms='dendritic reticular cell|DC1|type 1 DC|cDC', description='Conventional Dendritic Cell Is A Dendritic Cell That Is Cd11C-High.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000451
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='9JGbXeUA', name='dendritic cell', ontology_id='CL:0000451', description='A Cell Of Hematopoietic Origin, Typically Resident In Particular Tissues, Specialized In The Uptake, Processing, And Transport Of Antigens To Lymph Nodes For The Purpose Of Stimulating An Immune Response Via T Cell Activation. These Cells Are Lineage Negative (Cd3-Negative, Cd19-Negative, Cd34-Negative, And Cd56-Negative).', updated_at=2023-08-30 13:54:23, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='l0R9X3Bs', name='promyelocyte', ontology_id='CL:0000836', synonyms='Promyelocytes', description='A Precursor In The Granulocytic Series, Being A Cell Intermediate In Development Between A Myeloblast And Myelocyte, That Has Distinct Nucleoli, A Nuclear-To-Cytoplasmic Ratio Of 5:1 To 3:1, And Containing A Few Primary Cytoplasmic Granules. Markers For This Cell Are Fucosyltransferase Fut4-Positive, Cd33-Positive, Integrin Alpha-M-Negative, Low Affinity Immunoglobulin Gamma Fc Region Receptor Iii-Negative, And Cd24-Negative.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002191
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='odstSt5D', name='granulocytopoietic cell', ontology_id='CL:0002191', description='A Cell Involved In The Formation Of A Granulocyte.', updated_at=2023-08-30 13:54:24, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000839
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Q0v4wWyZ', name='myeloid lineage restricted progenitor cell', ontology_id='CL:0000839', description='A Progenitor Cell Restricted To The Myeloid Lineage.', updated_at=2023-08-30 13:54:24, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='f5eAsw0p', name='granulocyte monocyte progenitor cell', ontology_id='CL:0000557', synonyms='granulocyte-macrophage progenitor|granulocyte/monocyte progenitor|GMP|CFU-GM|granulocyte/monocyte precursor|colony forming unit granulocyte macrophage', description='A Hematopoietic Progenitor Cell That Is Committed To The Granulocyte And Monocyte Lineages. These Cells Are Cd123-Positive, And Do Not Express Gata1 Or Gata2 But Do Express C/Ebpa, And Pu.1.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002032
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='JYQl3RX8', name='hematopoietic oligopotent progenitor cell', ontology_id='CL:0002032', description='A Hematopoietic Oligopotent Progenitor Cell That Has The Ability To Differentiate Into Limited Cell Types But Lacks Lineage Cell Markers And Self Renewal Capabilities.', updated_at=2023-08-30 13:54:25, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='loo3Xanl', name='common myeloid progenitor', ontology_id='CL:0000049', synonyms='CMP|common myeloid precursor', description='A Progenitor Cell Committed To Myeloid Lineage, Including The Megakaryocyte And Erythroid Lineages.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='YN0gzDt3', name='Kupffer cell', ontology_id='CL:0000091', synonyms='macrophagocytus stellatus|Kupffer cells|von Kupffer cell|hepatic macrophage|littoral cell of hepatic sinusoid|liver macrophage|stellate cell of von Kupffer', description='A Tissue-Resident Macrophage Of The Reticuloendothelial System Found On The Luminal Surface Of The Hepatic Sinusoids Involved In Erythrocyte Clearance. Markers Include F4/80+, Cd11B-Low, Cd68-Positive, Sialoadhesin-Positive, Cd163/Srcr-Positive. Irregular, With Long Processes Including Lamellipodia Extending Into The Sinusoid Lumen, Have Flattened Nucleus With Cytoplasm Containing Characteristic Invaginations Of The Plasma Membrane (Vermiform Bodies); Lie Within The Sinusoid Lumen Attached To The Endothelial Surface; Derived From The Bone Marrow, Form A Major Part Of The Body'S Mononuclear Phagocyte System.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000864
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='pXnRaLwJ', name='tissue-resident macrophage', ontology_id='CL:0000864', synonyms='resting histiocyte|fixed macrophage', description='A Macrophage Constitutively Resident In A Particular Tissue Under Non-Inflammatory Conditions, And Capable Of Phagocytosing A Variety Of Extracellular Particulate Material, Including Immune Complexes, Microorganisms, And Dead Cells.', updated_at=2023-08-30 13:54:26, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='Z7uMAWUF', name='regulatory T cell', ontology_id='CL:0000815', synonyms='regulatory T-cell|regulatory T lymphocyte|Regulatory T cells|regulatory T-lymphocyte|Treg', description='A T Cell Which Regulates Overall Immune Responses As Well As The Responses Of Other T Cell Subsets Through Direct Cell-Cell Contact And Cytokine Release.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002419
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='2C5PhwrW', name='mature T cell', ontology_id='CL:0002419', synonyms='mature T-cell|CD3e-positive T cell', description='A T Cell That Expresses A T Cell Receptor Complex And Has Completed T Cell Selection.', updated_at=2023-08-30 13:54:26, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='XjG8T0GY', name='fibroblast', ontology_id='CL:0000057', synonyms='Fibroblasts', description='A Connective Tissue Cell Which Secretes An Extracellular Matrix Rich In Collagen And Other Macromolecules. Flattened And Irregular In Outline With Branching Processes; Appear Fusiform Or Spindle-Shaped.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002320
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='4zAzIMBQ', name='connective tissue cell', ontology_id='CL:0002320', description='A Cell Of The Supporting Or Framework Tissue Of The Body, Arising Chiefly From The Embryonic Mesoderm And Including Adipose Tissue, Cartilage, And Bone.', updated_at=2023-08-30 13:54:27, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='0JvRwfVm', name='plasma cell', ontology_id='CL:0000786', synonyms='plasmocyte|plasmacyte|plasma B-cell|plasma B cell|Plasma cells', description='A Terminally Differentiated, Post-Mitotic, Antibody Secreting Cell Of The B Cell Lineage With The Phenotype Cd138-Positive, Surface Immunonoglobulin-Negative, And Mhc Class Ii-Negative. Plasma Cells Are Oval Or Round With Extensive Rough Endoplasmic Reticulum, A Well-Developed Golgi Apparatus, And A Round Nucleus Having A Characteristic Cartwheel Heterochromatin Pattern And Are Devoted To Producing Large Amounts Of Immunoglobulin.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000946
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='hFMWJcWc', name='antibody secreting cell', ontology_id='CL:0000946', description='A Lymphocyte Of B Lineage That Is Devoted To Secreting Large Amounts Of Immunoglobulin.', updated_at=2023-08-30 13:54:27, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='S4Urkinl', name='macrophage', ontology_id='CL:0000235', synonyms='Macrophages|histiocyte', description='A Mononuclear Phagocyte Present In Variety Of Tissues, Typically Differentiated From Monocytes, Capable Of Phagocytosing A Variety Of Extracellular Particulate Material, Including Immune Complexes, Microorganisms, And Dead Cells.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='FMTngXKK', name='follicular B cell', ontology_id='CL:0000843', synonyms='Fo B-cell|follicular B-cell|Follicular B cells|follicular B lymphocyte|Fo B cell|follicular B-lymphocyte', description='A Resting Mature B Cell That Has The Phenotype Igm-Positive, Igd-Positive, Cd23-Positive And Cd21-Positive, And Found In The B Cell Follicles Of The White Pulp Of The Spleen Or The Corticol Areas Of The Peripheral Lymph Nodes. This Cell Type Is Also Described As Being Cd19-Positive, B220-Positive, Aa4-Negative, Cd43-Negative, And Cd5-Negative.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='X458vtJX', name='naive B cell', ontology_id='CL:0000788', synonyms='naive B lymphocyte|Naive B cells|naive B-cell|naive B-lymphocyte', description='A Naive B Cell Is A Mature B Cell That Has The Phenotype Surface Igd-Positive, Surface Igm-Positive, Cd20-Positive, Cd27-Negative And That Has Not Yet Been Activated By Antigen In The Periphery.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='P6E7yrc7', name='epithelial cell', ontology_id='CL:0000066', synonyms='Epithelial cells|epitheliocyte', description='A Cell That Is Usually Found In A Two-Dimensional Sheet With A Free Surface. The Cell Has A Cytoskeleton That Allows For Tight Cell To Cell Contact And For Cell Polarity Where Apical Part Is Directed Towards The Lumen And The Basal Part To The Basal Lamina.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='67zMsufW', name='memory B cell', ontology_id='CL:0000787', synonyms='memory B-cell|Memory B cells|memory B-lymphocyte|memory B lymphocyte', description='A Memory B Cell Is A Mature B Cell That Is Long-Lived, Readily Activated Upon Re-Encounter Of Its Antigenic Determinant, And Has Been Selected For Expression Of Higher Affinity Immunoglobulin. This Cell Type Has The Phenotype Cd19-Positive, Cd20-Positive, Mhc Class Ii-Positive, And Cd138-Negative.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='TENASE93', name='alveolar macrophage', ontology_id='CL:0000583', synonyms='Alveolar macrophages|dust cell', description='A Tissue-Resident Macrophage Found In The Alveoli Of The Lungs. Ingests Small Inhaled Particles Resulting In Degradation And Presentation Of The Antigen To Immunocompetent Cells. Markers Include F4/80-Positive, Cd11B-/Low, Cd11C-Positive, Cd68-Positive, Sialoadhesin-Positive, Dectin-1-Positive, Mr-Positive, Cx3Cr1-Negative.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000864
✅ created 1 CellType record from Bionty matching ontology_id: CL:1001603
💡 also saving parents of CellType(id='i20ionW5', name='mast cell', ontology_id='CL:0000097', synonyms='histaminocyte|labrocyte|Mast cells|mastocyte', description='A Cell That Is Found In Almost All Tissues Containing Numerous Basophilic Granules And Capable Of Releasing Large Amounts Of Histamine And Heparin Upon Activation. Progenitors Leave Bone Marrow And Mature In Connective And Mucosal Tissue. Mature Mast Cells Are Found In All Tissues, Except The Bloodstream. Their Phenotype Is Cd117-High, Cd123-Negative, Cd193-Positive, Cd200R3-Positive, And Fceri-High. Stem-Cell Factor (Kit-Ligand; Scf) Is The Main Controlling Signal Of Their Survival And Development.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000766
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002274
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='70wMh2r7', name='histamine secreting cell', ontology_id='CL:0002274', description='A Cell Type That Secretes Histamine.', updated_at=2023-08-30 13:54:29, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000457
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='TpKvGjqi', name='biogenic amine secreting cell', ontology_id='CL:0000457', updated_at=2023-08-30 13:54:29, bionty_source_id='UHv4', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000151
💡 also saving parents of CellType(id='g2Rk2xkb', name='myelocyte', ontology_id='CL:0002193', synonyms='Myelocytes', description='A Cell Type That Is The First Of The Maturation Stages Of The Granulocytic Leukocytes Normally Found In The Bone Marrow. Granules Are Seen In The Cytoplasm. The Nuclear Material Of The Myelocyte Is Denser Than That Of The Myeloblast But Lacks A Definable Membrane. The Cell Is Flat And Contains Increasing Numbers Of Granules As Maturation Progresses.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='64kIG7So', name='gamma-delta T cell', ontology_id='CL:0000798', synonyms='gammadelta T cell|gamma-delta T-cell|gamma-delta T cells|gamma-delta T-lymphocyte|gamma-delta T lymphocyte', description='A T Cell That Expresses A Gamma-Delta T Cell Receptor Complex.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='uMLhrmbZ', name='germinal center B cell', ontology_id='CL:0000844', synonyms='germinal center B lymphocyte|GC B-lymphocyte|GC B cell|GC B-cell|germinal center B-cell|Germinal center B cells|germinal center B-lymphocyte|GC B lymphocyte', description='A Rapidly Cycling Mature B Cell That Has Distinct Phenotypic Characteristics And Is Involved In T-Dependent Immune Responses And Located Typically In The Germinal Centers Of Lymph Nodes. This Cell Type Expresses Ly77 After Activation.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')

Add parent-child relationships of the records from CellTypist#

We still need to add the remaining 4 high-hierarchy terms:

list(high_terms_nonval)
['T cells', 'B-cell lineage', 'Erythroid', 'Cycling cells']

Let’s get the top hits from a search:

for term in list(high_terms_nonval):
    print(f"Term: {term}")
    display(celltype_bt.search(term).head(1))
Term: T cells

ontology_id definition synonyms parents __agg__ __ratio__
name
T cell CL:0000084 A Type Of Lymphocyte Whose Defining Characteri... T-lymphocyte|T-cell|T lymphocyte [CL:0000542] t cell 92.307692
Term: B-cell lineage

ontology_id definition synonyms parents __agg__ __ratio__
name
obsolete cell by lineage CL:0000220 None None [] obsolete cell by lineage 73.684211
Term: Erythroid

ontology_id definition synonyms parents __agg__ __ratio__
name
erythrocyte CL:0000232 A Red Blood Cell. In Mammals, Mature Erythrocy... RBC|red blood cell [CL:0000764] erythrocyte 70.0
Term: Cycling cells

ontology_id definition synonyms parents __agg__ __ratio__
name
circulating cell CL:0000080 A Cell Which Moves Among Different Tissues Of ... None [] circulating cell 75.862069
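
The `__ratio__` column above reads like a string-similarity score in percent. As a rough illustration of why "T cells" finds a convincing hit while the other three terms do not, here is a minimal sketch that uses `difflib` from the Python standard library as a stand-in scorer. It is an assumption that this resembles the scoring behind `search`; the helper names `similarity` and `best_match` and the short candidate list are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(query: str, candidate: str) -> float:
    """Case-insensitive similarity between two strings, in percent (0-100)."""
    matcher = SequenceMatcher(None, query.lower(), candidate.lower())
    return 100 * matcher.ratio()


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate with the highest similarity to the query."""
    return max(candidates, key=lambda c: similarity(query, c))


# Hypothetical, tiny stand-in for the public ontology names.
ontology_names = ["T cell", "B cell", "erythrocyte", "circulating cell"]

for term in ["T cells", "B-cell lineage", "Erythroid", "Cycling cells"]:
    hit = best_match(term, ontology_names)
    print(f"{term!r} -> {hit!r} ({similarity(term, hit):.1f})")
```

A high score only says that two strings look alike. Whether the top hit means the same thing biologically still has to be judged by a person, which is what the next step does.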

So we decide to:

  • Add “T cells” to the synonyms of the public “T cell” record

  • Create the remaining 3 terms using only their names (we think “B-cell lineage” shouldn’t be identified with “B cell”)

for name in high_terms_nonval:
    if name == "T cells":
        record = lb.CellType.from_bionty(name="T cell")
        record.add_synonym(name)
        record.save()
    else:
        record = lb.CellType(name=name)
        record.save()
    records_names[name] = record
✅ created 1 CellType record from Bionty matching name: T cell
💡 also saving parents of CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T cells|T lymphocyte|T-lymphocyte|T-cell', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-08-30 13:54:31, bionty_source_id='UHv4', created_by_id='DzTjkKse')
❗ records with similar names exist! did you mean to load one of them?
name              id        synonyms  __ratio__
Cycling B cells   ibzfn1zQ            92.857143
Cycling T cells   TTziQpub            92.857143
Cycling NK cells  rC47wc9h            89.655172

Now let’s add the parent records:

for _, row in celltypist_df.iterrows():
    record = records_names[row["name"]]
    if row["parent"] is not None:
        parent_record = records_names[row["parent"]]
        record.parents.add(parent_record)
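
The loop above relies on a missing parent being stored as `None`. If the table was read from a file by pandas, a missing value may instead be a float `NaN`, which passes an `is not None` check. The following minimal sketch shows the same linking logic on plain dictionaries with a guard for both cases. The rows and the helper `is_missing` are hypothetical stand-ins, not the notebook's `celltypist_df`.

```python
import math


def is_missing(value) -> bool:
    """True for None and float NaN, the two usual 'no parent' markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Hypothetical rows standing in for the (name, parent) columns of the table.
rows = [
    {"name": "B cells", "parent": None},
    {"name": "Memory B cells", "parent": "B cells"},
    {"name": "Cycling cells", "parent": float("nan")},
]

# Map each name to the set of its direct parents.
parents_of = {row["name"]: set() for row in rows}
for row in rows:
    if not is_missing(row["parent"]):
        parents_of[row["name"]].add(row["parent"])

print(parents_of)
```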

Access the in-house CellType registry#

The cell type registry built from CellTypist is now available in LaminDB. To retrieve the full registry as a pandas DataFrame, call .filter() without arguments followed by .df():

lb.CellType.filter().df()
name ontology_id abbr synonyms description bionty_source_id updated_at created_by_id
id
V0WRkNEN Intermediate macrophages CL:0000235 None None TNIP3+ CCL2+ macrophages which are largely res... None 2023-08-30 13:54:13 DzTjkKse
0z4U7Y9A Transitional NK CL:0000823 None None immature natural killer cells which originate ... None 2023-08-30 13:54:13 DzTjkKse
LZFcinwt Treg(diff) CL:0000815 None None unconventional T lymphocyte subpopulation in t... None 2023-08-30 13:54:13 DzTjkKse
LIJ5jLyj Tem/Effector helper T cells PD1+ CL:0000905 None None CD4+ helper T lymphocyte subpopulation in the ... None 2023-08-30 13:54:13 DzTjkKse
NyvLMjOH MNP CL:0000113 None None mononuclear phagocytes including dendritic cel... None 2023-08-30 13:54:13 DzTjkKse
... ... ... ... ... ... ... ... ...
wVT2qeb9 secretory cell CL:0000151 None None A Cell That Specializes In Controlled Release ... UHv4 2023-08-30 13:54:30 DzTjkKse
BxNjby0x T cell CL:0000084 None T cells|T lymphocyte|T-lymphocyte|T-cell A Type Of Lymphocyte Whose Defining Characteri... UHv4 2023-08-30 13:54:31 DzTjkKse
IWApAp8k B-cell lineage None None None None None 2023-08-30 13:54:31 DzTjkKse
W2AIYF7R Erythroid None None None None None 2023-08-30 13:54:31 DzTjkKse
QXXcsYW6 Cycling cells None None None None None 2023-08-30 13:54:31 DzTjkKse

132 rows × 8 columns

This enables us to look for cell types by creating a lookup object from our new CellType registry.

db_lookup = lb.CellType.lookup()
db_lookup.memory_b_cell
CellType(id='67zMsufW', name='memory B cell', ontology_id='CL:0000787', synonyms='memory B-cell|Memory B cells|memory B-lymphocyte|memory B lymphocyte', description='A Memory B Cell Is A Mature B Cell That Is Long-Lived, Readily Activated Upon Re-Encounter Of Its Antigenic Determinant, And Has Been Selected For Expression Of Higher Affinity Immunoglobulin. This Cell Type Has The Phenotype Cd19-Positive, Cd20-Positive, Mhc Class Ii-Positive, And Cd138-Negative.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')
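
The attribute `memory_b_cell` corresponds to the record named "memory B cell". A lookup object of this kind presumably derives attribute names by lowercasing the record name and replacing characters that are not valid in Python identifiers. The sketch below illustrates that idea under this assumption; `to_lookup_key` is a hypothetical helper and not LaminDB's actual implementation.

```python
import re


def to_lookup_key(name: str) -> str:
    """Turn a record name into a lowercase, underscore-separated key."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    key = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return key.strip("_").lower()


for name in ["memory B cell", "Tcm/Naive helper T cells", "Treg(diff)"]:
    print(f"{name!r} -> {to_lookup_key(name)!r}")
```

With such a scheme, two different names can collapse to the same key, so a lookup is most convenient for interactive exploration, while filtering by `name` or `ontology_id` stays unambiguous.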

See cell type hierarchy:

db_lookup.memory_b_cell.view_parents()
https://d33wubrfki0l68.cloudfront.net/69f03b5c7ace2b44adb09013d63ac6b64bd34b47/ca8de/_images/ba5ecf6cea1545b9fab50592c22b79af0b09f1ec9db5490408f5923a3ac3aec5.svg

Access parents of a record:

db_lookup.memory_b_cell.parents.list()
[CellType(id='0I51jgPp', name='mature B cell', ontology_id='CL:0000785', synonyms='mature B lymphocyte|mature B-cell|mature B-lymphocyte', description='A B Cell That Is Mature, Having Left The Bone Marrow. Initially, These Cells Are Igm-Positive And Igd-Positive, And They Can Be Activated By Antigen.', updated_at=2023-08-30 13:54:14, bionty_source_id='UHv4', created_by_id='DzTjkKse'),
 CellType(id='cx8VcggA', name='B cell', ontology_id='CL:0000236', synonyms='B-cell|B cells|B-lymphocyte|B lymphocyte', description='A Lymphocyte Of B Lineage That Is Capable Of B Cell Mediated Immunity.', updated_at=2023-08-30 13:54:13, bionty_source_id='UHv4', created_by_id='DzTjkKse')]
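
`.parents` returns only the direct parents. To collect all ancestors, the hierarchy has to be walked upwards. Here is a minimal sketch of that traversal on a plain dictionary. The function `ancestors` is a hypothetical helper, and the toy hierarchy is illustrative: only the two parents of "memory B cell" are taken from the output above, the other edges are assumed.

```python
def ancestors(term: str, parents_of: dict[str, list[str]]) -> set[str]:
    """Collect all direct and indirect parents of a term."""
    seen: set[str] = set()
    stack = list(parents_of.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:  # guards against visiting a term twice
            seen.add(current)
            stack.extend(parents_of.get(current, ()))
    return seen


# Toy hierarchy; edges other than those of "memory B cell" are assumptions.
toy_parents = {
    "memory B cell": ["mature B cell", "B cell"],
    "mature B cell": ["B cell"],
    "B cell": ["lymphocyte"],
}

print(sorted(ancestors("memory B cell", toy_parents)))
```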

Annotate a dataset with cell types using CellTypist#

Annotate cell types predicted with CellTypist#

We now demonstrate how to predict cell types with CellTypist and add them to LaminDB. We use the sample dataset that ships with CellTypist together with the Immune_All_Low model.

input_file = celltypist.samples.get_sample_csv()
input_file
'/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/celltypist/data/samples/sample_cell_by_gene.csv'
predictions = celltypist.annotate(
    input_file, model="Immune_All_Low.pkl", majority_voting=True
)
🔎 No available models. Downloading...
📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 31
📂 Storing models in /home/runner/.celltypist/data/models
💾 Downloading model [1/31]: Immune_All_Low.pkl
💾 Downloading model [2/31]: Immune_All_High.pkl
💾 Downloading model [3/31]: Adult_CynomolgusMacaque_Hippocampus.pkl
💾 Downloading model [4/31]: Adult_Mouse_Gut.pkl
💾 Downloading model [5/31]: Adult_Mouse_OlfactoryBulb.pkl
💾 Downloading model [6/31]: Adult_Pig_Hippocampus.pkl
💾 Downloading model [7/31]: Adult_RhesusMacaque_Hippocampus.pkl
💾 Downloading model [8/31]: Autopsy_COVID19_Lung.pkl
💾 Downloading model [9/31]: COVID19_HumanChallenge_Blood.pkl
💾 Downloading model [10/31]: COVID19_Immune_Landscape.pkl
💾 Downloading model [11/31]: Cells_Fetal_Lung.pkl
💾 Downloading model [12/31]: Cells_Intestinal_Tract.pkl
💾 Downloading model [13/31]: Cells_Lung_Airway.pkl
💾 Downloading model [14/31]: Developing_Human_Brain.pkl
💾 Downloading model [15/31]: Developing_Human_Hippocampus.pkl
💾 Downloading model [16/31]: Developing_Human_Thymus.pkl
💾 Downloading model [17/31]: Developing_Mouse_Brain.pkl
💾 Downloading model [18/31]: Developing_Mouse_Hippocampus.pkl
💾 Downloading model [19/31]: Healthy_COVID19_PBMC.pkl
💾 Downloading model [20/31]: Healthy_Mouse_Liver.pkl
💾 Downloading model [21/31]: Human_AdultAged_Hippocampus.pkl
💾 Downloading model [22/31]: Human_IPF_Lung.pkl
💾 Downloading model [23/31]: Human_Longitudinal_Hippocampus.pkl
💾 Downloading model [24/31]: Human_Lung_Atlas.pkl
💾 Downloading model [25/31]: Human_PF_Lung.pkl
💾 Downloading model [26/31]: Lethal_COVID19_Lung.pkl
💾 Downloading model [27/31]: Mouse_Dentate_Gyrus.pkl
💾 Downloading model [28/31]: Mouse_Isocortex_Hippocampus.pkl
💾 Downloading model [29/31]: Mouse_Postnatal_DentateGyrus.pkl
💾 Downloading model [30/31]: Nuclei_Lung_Airway.pkl
💾 Downloading model [31/31]: Pan_Fetal_Human.pkl
📁 Input file is '/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/celltypist/data/samples/sample_cell_by_gene.csv'
⏳ Loading data
🔬 Input data has 559 cells and 32786 genes
🔗 Matching reference genes in the model
🧬 5313 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!

Now that we’ve predicted all cell types, we create an AnnData object that we will eventually track with LaminDB.

adata_annotated = predictions.to_adata()
adata_annotated.obs
          predicted_labels          over_clustering  majority_voting         conf_score
Cell_1    Intermediate macrophages  3                Age-associated B cells  0.979577
Cell_2    Trm cytotoxic T cells     3                Age-associated B cells  0.073008
Cell_3    pDC                       9                Macrophages             0.020744
Cell_4    Follicular B cells        36               Age-associated B cells  0.167273
Cell_5    Trm cytotoxic T cells     36               Age-associated B cells  0.430877
...       ...                       ...              ...                     ...
Cell_555  Alveolar macrophages      5                Alveolar macrophages    0.152075
Cell_556  Alveolar macrophages      0                Alveolar macrophages    0.901491
Cell_557  Tcm/Naive helper T cells  5                Alveolar macrophages    0.092006
Cell_558  Alveolar macrophages      5                Alveolar macrophages    0.747148
Cell_559  Alveolar macrophages      0                Alveolar macrophages    0.060108

559 rows × 4 columns
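
The `majority_voting` column can differ from `predicted_labels` because, with `majority_voting=True`, each cell takes on the dominant label of its over-cluster. The sketch below illustrates that idea in plain Python. It is a simplified illustration rather than CellTypist's implementation; the function `majority_vote` and the four example cells are hypothetical, and ties are resolved by first occurrence here.

```python
from collections import Counter, defaultdict


def majority_vote(predicted_labels: list[str], clusters: list[int]) -> list[str]:
    """Assign each cell the most frequent predicted label in its cluster."""
    votes: dict[int, Counter] = defaultdict(Counter)
    for label, cluster in zip(predicted_labels, clusters):
        votes[cluster][label] += 1
    # The winning label per cluster is the one with the highest count.
    winner = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [winner[c] for c in clusters]


# Hypothetical per-cell predictions and over-cluster assignments.
predicted = [
    "Alveolar macrophages",
    "Tcm/Naive helper T cells",
    "Alveolar macrophages",
    "Alveolar macrophages",
]
clusters = [5, 5, 5, 0]

voted = majority_vote(predicted, clusters)
print(voted)
```

In this toy input the single "Tcm/Naive helper T cells" call in cluster 5 is outvoted by its neighbours, which mirrors what happens to Cell_557 in the table.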

Validate cell types:

celltypes = lb.CellType.from_values(
    adata_annotated.obs.predicted_labels, lb.CellType.name
)
✅ loaded 19 CellType records matching name: Intermediate macrophages, Trm cytotoxic T cells, pDC, Tcm/Naive helper T cells, T(agonist), Age-associated B cells, DC, Tem/Temra cytotoxic T cells, CD16- NK cells, Double-positive thymocytes, Tem/Effector helper T cells, CD16+ NK cells, NKT cells, Mono-mac, Type 17 helper T cells, MNP, Erythrophagocytic macrophages, DC2, Cycling T cells
✅ loaded 11 CellType records matching synonyms: Follicular B cells, Macrophages, B cells, Classical monocytes, Alveolar macrophages, Memory B cells, Regulatory T cells, Myelocytes, NK cells, Non-classical monocytes, Monocytes
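
The log shows two passes: 19 labels matched a record name directly and 11 matched a synonym. A minimal sketch of such a name-then-synonym resolution is shown below, assuming synonyms are stored as a pipe-separated string as in the records printed earlier. The function `resolve` and the three-record registry are hypothetical and do not reproduce how `from_values` is implemented.

```python
# Hypothetical miniature registry; synonyms are pipe-separated strings.
registry = [
    {"name": "Intermediate macrophages", "synonyms": ""},
    {"name": "macrophage", "synonyms": "Macrophages|histiocyte"},
    {
        "name": "memory B cell",
        "synonyms": "memory B-cell|Memory B cells|memory B-lymphocyte|memory B lymphocyte",
    },
]


def resolve(values, registry):
    """Map each distinct value to a record name, trying names before synonyms."""
    by_name = {r["name"]: r for r in registry}
    by_synonym = {s: r for r in registry for s in r["synonyms"].split("|") if s}
    matched, unmatched = {}, []
    for value in dict.fromkeys(values):  # de-duplicate, keep first-seen order
        record = by_name.get(value) or by_synonym.get(value)
        if record is not None:
            matched[value] = record["name"]
        else:
            unmatched.append(value)
    return matched, unmatched


labels = ["Intermediate macrophages", "Memory B cells", "Macrophages", "Unknown"]
matched, unmatched = resolve(labels, registry)
print(matched)
print(unmatched)
```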

Track the annotated dataset in LaminDB#

Register features#

features = ln.Feature.from_df(adata_annotated.obs)
ln.save(features)

Create a file record for the AnnData object. We also give it a description, which makes the dataset easier to identify and can be queried later.

file_annotated = ln.File.from_anndata(
    adata_annotated, description="Examplary CellTypist file", var_ref=lb.Gene.symbol
)
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/H5VWxBGaeIxuodM7SpVl.h5ad')
💡 parsing feature names of X stored in slot 'var'
32786 terms (100.00%) are not validated for symbol: MIR1302-10, FAM138A, OR4F5, RP11-34P13.7, RP11-34P13.8, AL627309.1, RP11-34P13.14, RP11-34P13.9, AP006222.2, RP4-669L17.10, OR4F29, RP4-669L17.2, RP5-857K21.15, RP5-857K21.1, RP5-857K21.2, RP5-857K21.3, RP5-857K21.4, RP5-857K21.5, OR4F16, RP11-206L10.3, ...
❗    no validated features, skip creating feature set
💡 parsing feature names of slot 'obs'
4 terms (100.00%) are validated for name
✅    linked: FeatureSet(id='M9RvoLyFSlprZ5oSmSNF', n=4, registry='core.Feature', hash='xwEKP7sHlreprEDwxV2p', modality_id='RzyfTOnw', created_by_id='DzTjkKse')
file_annotated.save()
✅ saved 1 feature set for slot: 'obs'
✅ storing file 'H5VWxBGaeIxuodM7SpVl' at '.lamindb/H5VWxBGaeIxuodM7SpVl.h5ad'
file_annotated.add_labels(celltypes, feature="predicted_labels")
✅ linked feature 'predicted_labels' to registry 'bionty.CellType'
file_annotated.describe()
💡 File(id='H5VWxBGaeIxuodM7SpVl', suffix='.h5ad', accessor='AnnData', description='Examplary CellTypist file', size=75080752, hash='F_6OoLZOZ9Ppm019zJyHLx', hash_type='sha1-fl', updated_at=2023-08-30 13:55:58)

Provenance:
    🗃️ storage: Storage(id='luStdUTJ', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/celltypist', type='local', updated_at=2023-08-30 13:54:07, created_by_id='DzTjkKse')
    💫 transform: Transform(id='s5mkN5NQ1ttIz8', name='Manage a cell type registry', short_name='celltypist', version='0', type=notebook, updated_at=2023-08-30 13:55:58, created_by_id='DzTjkKse')
    👣 run: Run(id='UGCjQ982CqMSOPegtRHX', run_at=2023-08-30 13:54:12, transform_id='s5mkN5NQ1ttIz8', created_by_id='DzTjkKse')
    👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-08-30 13:54:07)
Features:
  obs (metadata):
    🔗 predicted_labels (30, bionty.CellType): ['Intermediate macrophages', 'MNP', 'non-classical monocyte', 'Trm cytotoxic T cells', 'B cell']
file_annotated.view_flow()
https://d33wubrfki0l68.cloudfront.net/0baa12a53c8a1f556c91b992a97c99d14afe67d8/dc334/_images/54311214a55026171d8a1cc3d2a35010f05dfbf6bb9040fe675cad8d3d2f33a7.svg

Now we can query a file by a specific cell type:

ln.File.filter(cell_types=db_lookup.tcm_naive_helper_t_cells).df()
storage_id key suffix accessor description version initial_version_id size hash hash_type transform_id run_id updated_at created_by_id
id
H5VWxBGaeIxuodM7SpVl luStdUTJ None .h5ad AnnData Examplary CellTypist file None None 75080752 F_6OoLZOZ9Ppm019zJyHLx sha1-fl s5mkN5NQ1ttIz8 UGCjQ982CqMSOPegtRHX 2023-08-30 13:55:58 DzTjkKse

Or find the notebook in which the file was annotated with CellTypist:

ln.Transform.filter(files__description__icontains="CellTypist").df()
name short_name version initial_version_id type reference updated_at created_by_id
id
s5mkN5NQ1ttIz8 Manage a cell type registry celltypist 0 None notebook None 2023-08-30 13:55:58 DzTjkKse
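
The keyword `files__description__icontains` follows the Django lookup convention: double underscores traverse the relation from transforms to their files and then to the `description` field, and `icontains` requests a case-insensitive substring match. Here is a minimal plain-Python sketch of that matching rule. The helper `icontains` and the two example transforms are hypothetical.

```python
def icontains(text, needle: str) -> bool:
    """Case-insensitive substring test that treats None as 'no match'."""
    return text is not None and needle.lower() in text.lower()


# Hypothetical transforms with the descriptions of their linked files.
transforms = [
    {
        "name": "Manage a cell type registry",
        "file_descriptions": ["Examplary CellTypist file"],
    },
    {"name": "Other notebook", "file_descriptions": ["raw counts", None]},
]

# Keep a transform if any of its files has a matching description.
hits = [
    t["name"]
    for t in transforms
    if any(icontains(d, "celltypist") for d in t["file_descriptions"])
]
print(hits)
```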
# clean up test instance
!lamin delete --force celltypist
!rm -r ./celltypist
💡 deleting instance testuser1/celltypist
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--celltypist.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/lamin-usecases/lamin-usecases/docs/celltypist