
Skin Atlas

The skin is an amazing and complex organ that comprises multiple layers and cell types that are functionally distinct.

The aim of the Skin Atlas is to characterize the molecular composition of healthy human skin by creating an atlas of all proteins expressed in healthy skin, resolved by spatial location (skin layer) and by major cell type.

This atlas, comprising the identification of a global proteomic composition of human skin, will provide an important resource and serve as a basis for future studies comparing the proteomes of inflammatory and oncologic skin diseases.

The Skin Atlas library defines the proteomic baseline of:

5 different skin layers:

  • Stratum corneum
  • Stratum lucidum to stratum basale
  • Dermis
  • Subcutis
  • Superficial subcutaneous fat

Skin layers

Fat cells from 2 locations:

  • Superficial subcutaneous fat
  • Deep subcutaneous fat

5 different immune cells:

  • Dendritic cells
  • Dermal dendritic cells
  • Macrophages
  • Mast cells
  • Monocyte derived macrophages

4 cell subsets:

  • Fibroblasts
  • Keratinocytes
  • Melanocytes
  • Endothelial cells


Experimental Design


Exploratory analysis of the library

In [230]:
from IPython.display import HTML
from ipywidgets import interact
import pandas as pd
import seaborn as sns
import re
import numpy as np
from multi_key_dict import multi_key_dict
from report_manager.analyses import basicAnalysis
from report_manager.plots import basicFigures
from report_manager import utils
from plotly.offline import iplot, init_notebook_mode
from graphdb_connector import connector
import venn
%matplotlib inline
import warnings

In [170]:
proteinGroups_file = '../../data/library_files/proteinGroups.txt'
sorted_samples = ['Fibroblast',
                  'Endothelial cell',
                  'Dendritic cell', 
                  'Dermal dendritic cell', 
                  'Mast cell',
                  'Monocyte derived macrophages',
                  'Stratum corneum', 
                  'Stratum basale', 
                  'Superficial subcutaneous fat', 
                  'Deep subcutaneous fat']
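cell_types = multi_key_dict()
# Map every sample name to its category (used later as the ANOVA grouping)
cell_types['Fibroblast',
           'Keratinocyte',
           'Melanocyte',
           'Endothelial cell']='cell subsets'
cell_types['Dendritic cell', 
           'Dermal dendritic cell', 
           'Macrophage',
           'Mast cell',
           'Monocyte derived macrophages',
           'Pbmc'] ='immune cells'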
cell_types['Stratum corneum', 'Stratum basale', 'Subcutis', 'Dermis']='skin layers'
cell_types['Superficial subcutaneous fat', 'Deep subcutaneous fat']= 'Fat cells'

colors = {'Dendritic cell': '#b10026', 
          'Dermal dendritic cell': '#f03b20', 
          'Macrophage': '#fd8d3c', 
          'Monocyte derived macrophages': '#feb24c',
          'Pbmc': '#fed976',          
          'Mast cell': '#ffffb2',
          'Fibroblast': '#54278f', 
          'Keratinocyte': '#756bb1', 
          'Endothelial cell': '#9e9ac8',          
          'Melanocyte': '#bcbddc',
          'Stratum corneum': '#08519c', 
          'Stratum basale': '#3182bd', 
          'Subcutis': '#6baed6', 
          'Superficial subcutaneous fat': '#c51b8a', 
          'Deep subcutaneous fat': '#fa9fb5',
          'immune cells': '#b10026',
          'skin layers':'#08519c',
          'Fat cells': '#c51b8a',
          'cell subsets':'#54278f'}

known_layer_markers = {'Stratum corneum':['Q15517', 'O75635', 'Q9Y337', 'Q9UBX7', 'Q96PI1', 'O75342'],
                      'Stratum basale':['P19012', 'O15350', 'A6ND36', 'Q9H3D4', 'Q02487', 'Q14574', 'P22607', 'Q02388'],
                      'Dermis':['P04264', 'Q7Z794', 'P31944', 'O00515', 'P05549', 'Q8N271', 'O15020', 'Q9BYG4', 'P35052']}

staining_markers = {'Stratum basale':['P13647','P02533'],
                   'Stratum corneum':['P04264','P13645'],
                   'Mast cell':['P20231'],
                   'Endothelial cell':['P16284'],
                   'Monocyte derived macrophages':['P08571'],
                   'Dendritic cell':['P06126']}
In [171]:
protein_groups = pd.read_csv(proteinGroups_file, sep='\t', low_memory=False)

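# Drop decoy (reverse) hits and groups identified only by a modification site
protein_groups = protein_groups[protein_groups[['Reverse','Only identified by site']].isnull().all(axis=1)].copy()
# Keep the razor+unique peptide count of the first protein in each group
protein_groups['Peptide counts (razor+unique)'] = [int(pep.split(';')[0]) for pep in protein_groups['Peptide counts (razor+unique)'].values.tolist()]
# Require at least two peptides per protein group
protein_groups = protein_groups.loc[protein_groups['Peptide counts (razor+unique)'] >= 2,:].copy()

# Use the first majority protein ID as the identifier of each group
protein_groups['Majority protein IDs'] = [p.split(';')[0] for p in protein_groups['Majority protein IDs'].tolist()]
protein_groups = protein_groups.set_index('Majority protein IDs')
protein_groups.index.name = 'proteins'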

proteins = list(protein_groups.index)
genes = [g.split(';')[0] if not isinstance(g, float) else proteins[i] for i,g in enumerate(protein_groups['Gene names'])]
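protein_mapping = pd.DataFrame(
    {'gene': genes,
     'protein': proteins
    }).set_index('protein')
# Lookup used by later cells: protein accession -> {'gene': gene name}
protein_mapping_dict = protein_mapping.to_dict('index')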

cols = [c for c in protein_groups.columns if c.startswith('Intensity ') or c.startswith('iBAQ') or c=="Majority protein IDs"]
protein_groups = protein_groups[cols]

intensity_cols = [c for c in protein_groups.columns if c.startswith('Intensity ')]
iBAQ_cols = [c for c in protein_groups.columns if c.startswith('iBAQ ')]

intensities = protein_groups[intensity_cols]
new_cols = [c.replace("Intensity ", '') for c in intensities.columns]
intensities.columns = new_cols

iBAQs = protein_groups[iBAQ_cols]
new_cols = [c.replace("iBAQ ", '') for c in iBAQs.columns]
iBAQs.columns = new_cols

num_nan = iBAQs.replace({0:np.nan}).isnull().sum(axis = 0).to_frame()
num_nan.columns = ['num_nan']

selected_columns = ['DC', 'Dermal dendritic cell', 'Deep Fat', 'Dermis', 
                    'Endothelial cell', 'Fat', 'Fibroblast',
                    'Keratinocyte', 'Macrophage', 
                    'Mast cell', 'Melanocyte', 'Monocyte derived macrophages', 
                    'PBMC', 'Stratum Basale', 'Stratum Corneum', 'Subcutis']
intensities = intensities[selected_columns]
cols = ['Dendritic cell','Dermal dendritic cell', 'Deep subcutaneous fat', 
        'Dermis', 'Endothelial cell', 'Superficial subcutaneous fat']
cols.extend([c.lower().capitalize() for c in list(intensities.columns)[6:]])
intensities.columns = cols

intensities = intensities.transpose()
intensities = intensities.replace({0:np.nan})

iBAQs = iBAQs[selected_columns]
iBAQs.columns = cols

iBAQs = iBAQs.transpose()
iBAQs = iBAQs.replace({0:np.nan})

Keratin and collagen protein mass percentage

Protein mass percentage of total keratins and collagens across five skin layers. For each layer, the percentage was calculated by dividing the summed LFQ intensities of all keratins or all collagens by the summed LFQ intensity of all proteins in that layer. Red dots represent keratins and blue dots represent collagens.
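
The calculation described above can be sketched on a toy table. The gene names and intensity values here are made up for illustration; only the name-based selection of keratins and collagens and the ratio of summed intensities follow the text.

```python
import pandas as pd

# Toy intensities (rows: skin layers, columns: genes); values are illustrative only.
toy = pd.DataFrame(
    {"KRT1": [60.0, 10.0], "COL1A1": [5.0, 50.0], "ACTB": [35.0, 40.0]},
    index=["Stratum corneum", "Dermis"],
)

# Select gene families by name pattern (KRT or COL followed by a digit).
is_keratin = toy.columns.str.contains(r"KRT\d")
is_collagen = toy.columns.str.contains(r"COL\d")

# Fraction of the summed intensity per layer contributed by each family.
total = toy.sum(axis=1)
ratio_keratins = toy.loc[:, is_keratin].sum(axis=1) / total
ratio_collagens = toy.loc[:, is_collagen].sum(axis=1) / total
```

In this toy table keratins dominate the stratum corneum and collagens dominate the dermis, which is the kind of contrast the plot is meant to show.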

In [172]:
# Raw strings keep '\d' as a regex digit class instead of an invalid string escape
mask_collagens = protein_mapping['gene'].str.contains(r'COL\d')
mask_keratines = protein_mapping['gene'].str.contains(r'KRT\d')

layers_intensities = intensities.loc[['Stratum corneum', 'Stratum basale', 'Dermis', 'Subcutis','Superficial subcutaneous fat'],].dropna(axis=1, how='all')
layers_intensities = layers_intensities.reindex(['Superficial subcutaneous fat', 'Subcutis', 'Dermis', 'Stratum basale','Stratum corneum'])

collagen_intensities = layers_intensities.loc[:,mask_collagens].dropna(axis=1, how='all')
#print([protein_mapping_dict[p]['gene'] for p in collagen_intensities.columns])

keratine_intensities = layers_intensities.loc[:,mask_keratines].dropna(axis=1, how='all')
#print([protein_mapping_dict[p]['gene'] for p in keratine_intensities.columns])

percentage = pd.DataFrame()
percentage['layers'] = layers_intensities.index
percentage['Sum_collagens'] = list(collagen_intensities.sum(axis=1))
percentage['Sum_keratins'] = list(keratine_intensities.sum(axis=1))
percentage['Sum_all'] = list(layers_intensities.sum(axis=1))
percentage = percentage.assign(ratio_collagens = lambda percentage : percentage.Sum_collagens / percentage.Sum_all)
percentage = percentage.assign(ratio_keratins = lambda percentage : percentage.Sum_keratins / percentage.Sum_all)

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(4, 6))
plt.plot(percentage['ratio_keratins'], percentage['layers'], 'ro--');
plt.plot(percentage['ratio_collagens'], percentage['layers'], 'bo--');
plt.xlabel('Protein mass percentage', fontsize=12);
plt.legend(['Keratins', 'Collagens']);
#plt.savefig('layers_collagen_keratin.png', dpi=120, bbox_inches='tight')
In [173]:
cols = [protein_mapping_dict[p]['gene'] for p in collagen_intensities.columns]
aux = collagen_intensities.copy()
aux.columns = cols
aux.to_csv('collagen_proteins_intensities.tsv', sep='\t', doublequote=False)
In [174]:
cols = [protein_mapping_dict[p]['gene'] for p in keratine_intensities.columns]
aux = keratine_intensities.copy()
aux.columns = cols
aux.to_csv('keratin_proteins_intensities.tsv', sep='\t', doublequote=False)

Ranking of proteins (intensities)

The plots below illustrate the ranked relative abundance of proteins and the number of identified proteins in a subset of libraries. Immunohistochemical stainings of a characteristic protein for each skin layer or cell subset are shown.
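
As a rough sketch of how such a ranking can be built for one sample, the snippet below log2-transforms the intensities and orders proteins from most to least abundant. The protein names and values are placeholders, and the column names of the resulting table are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Toy intensities for one sample; NaN marks a protein that was not identified.
sample = pd.Series({"P1": 1024.0, "P2": 16.0, "P3": np.nan, "P4": 256.0})

# Drop unidentified proteins, log2-transform, and sort from most to least abundant.
ranked = np.log2(sample.dropna()).sort_values(ascending=False)

# Rank 0 is the most abundant protein; markers can then be highlighted by name.
ranking = pd.DataFrame(
    {"rank": range(len(ranked)), "name": ranked.index, "log2_intensity": ranked.values}
)
```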

Highlighting markers tested in the staining assays

In [175]:
intensity_plots_staining = {}
for index in staining_markers:
    gintensities = intensities.loc[index, :].dropna()
    gintensities = gintensities.apply(lambda x: np.log2(x)).sort_values(ascending=False).to_frame().reset_index().reset_index()
    gintensities.columns = ['x','name','y']
    if len(set(gintensities['name'].values.tolist()).intersection(staining_markers[index])) > 0:
        gintensities['colors'] = [colors[index] if index in staining_markers and p in staining_markers[index] else 'lightgrey' for p in gintensities['name'].tolist()]
        gintensities['size'] = [20 if index in staining_markers and p in staining_markers[index] else 5 for p in gintensities['name'].tolist()]
        plot = basicFigures.get_simple_scatterplot(gintensities, 'rank', {'labels':gintensities['name'].tolist(), 
                                                           'title': 'Rank of identified proteins in '+index, 
                                                           'x_title':'rank of proteins', 
                                                            'width':700 })
        intensity_plots_staining[index] = plot.figure
In [176]:
#for name in intensity_plots_staining:
#    iplot(intensity_plots_staining[name], filename=name+"_ranking.png")

Stratum corneum: Keratin 1 is one of the most abundant keratins in the stratum corneum. The immunohistochemical staining of healthy skin shows keratin 1 in the stratum corneum and keratin 14 in the deeper layers of the epidermis.

Dermis: We have identified all known collagens in healthy skin. The highest abundance of collagens is found in the dermis. The immunohistochemical stainings of healthy skin show collagen III and collagen IV.

Endothelial cell: Endothelial cells line the interior of blood and lymph vessels and form the barrier between vessels and surrounding tissue. The cells are immunohistochemically stained with antibodies to platelet endothelial cell adhesion molecule (PECAM-1), also known as cluster of differentiation 31 (CD31). CD31 can also be expressed by other cell types.

Melanocyte: Melanocytes are pigment (melanin)-producing cells found in the basal layer of the skin. The MART-1/melan-A antigen is specific for the melanocyte lineage.

Monocyte derived macrophages: Monocyte derived macrophages are one of many subsets of immune cells observed in skin. They are believed to be of monocyte origin based, among other things, on their absence in monocyte deficiency states. CD14 is a pattern recognition receptor found on monocyte derived macrophages as well as on other macrophages and on neutrophils.

Standardization of iBAQ values

1) Log2-transform the iBAQ values

2) Standardize by median centering: subtract each sample's median from its values
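
The two steps can be sketched on a toy table laid out like the iBAQ matrix (samples as rows, proteins as columns). The sample names, protein names, and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy iBAQ table (rows: samples, columns: proteins); zeros mark missing values.
toy_ibaq = pd.DataFrame(
    {"P1": [8.0, 2.0], "P2": [16.0, 0.0], "P3": [32.0, 8.0]},
    index=["sample A", "sample B"],
)

# 1) Log2-transform after turning zeros into NaN so they are skipped downstream.
log_ibaq = np.log2(toy_ibaq.replace({0: np.nan}))

# 2) Subtract each sample's median (taken over its proteins, ignoring NaN).
centered = log_ibaq.sub(log_ibaq.median(axis=1), axis=0)
```

After centering, every sample has a median of zero, so values are comparable across samples with different overall signal.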

In [177]:
norm_dataset = iBAQs.apply(lambda x: np.log2(x))
norm_dataset.index.name = 'samples'
In [178]:
norm_dataset = norm_dataset.sub(norm_dataset.median(axis=1), axis=0).reset_index()
In [179]:
proteins samples A0A024R0K5 A0A087WWR4 A0A024R216 A0A024R368 A0A024R412 A0A024R4E5 A0A087WZA9 A0A024R4M0 A0A024R571 ... S4R303 S4R3H4 S4R3N1 V9GYM3 V9GZ54 V9HW75 W0Z7M9 X5CMH5 X6R5Z6 X6RAL5
0 Dendritic cell -7.731910 -0.724164 -1.093414 -5.539224 -1.944459 2.658899 0.159803 7.024504 4.297308 ... NaN 2.623953 6.603221 -1.686704 NaN 2.176259 -0.199126 -0.350370 3.449053 1.820072
1 Dermal dendritic cell NaN 0.927700 1.393727 -4.763047 -0.655005 2.757749 -0.953016 7.455600 4.361176 ... NaN 2.362542 5.173962 1.172942 -0.335231 2.258161 -1.483458 0.783550 2.698655 3.376036
2 Deep subcutaneous fat NaN NaN 3.781337 -2.219621 -2.344501 2.892155 -0.896123 4.240351 5.088018 ... NaN 3.697669 7.151917 11.694794 2.050960 1.099861 -4.816365 -6.266575 0.380932 1.173863
3 Dermis -3.870581 -2.983131 2.218958 NaN -2.899023 2.651910 0.626955 5.540395 4.169112 ... -2.313265 1.862005 4.215861 6.838484 0.305360 3.203403 0.096236 -3.026000 0.785161 5.642438
4 Endothelial cell NaN -2.340857 5.796140 -4.530302 -3.318189 2.788615 0.510982 6.841195 6.020740 ... -4.298611 1.974418 5.150898 1.496522 -1.336365 2.031198 -3.330048 0.303793 1.796782 2.902793

5 rows × 10976 columns

In [180]:
norm_dataset.to_csv('normalized_iBAQ_values_dataset.tsv', index=False, doublequote=False, sep='\t')

Number of proteins identified

The Skin Atlas comprises 10976 identified proteins across skin layers, primary cell cultures and skin-associated immune cells.

In [181]:
args = {'group':'samples', 'x':'samples', 'y':'counts', 'title':'Number of identified proteins', 
        "x_title":'skin atlas', 'y_title':'Number of proteins', 'colors':colors, 'height': 600, 'width': 900}
barplot_data = norm_dataset[['samples']].copy()
barplot_data['counts'] = len(list(norm_dataset.columns)[1:]) - norm_dataset.iloc[:,1:].T.isnull().sum()
plot = basicFigures.get_barplot(barplot_data, identifier='barplot', args=args)

Skin layers protein overlaps

Venn diagram representing unique and overlapping proteins in skin layers.

In [182]:
skin_layers = ['Stratum corneum', 'Stratum basale', 'Subcutis', 'Dermis']
prots_cell_type = {}
list_colors = []
data = norm_dataset.set_index('samples').loc[skin_layers,:]
for name, group in data.reset_index().groupby('samples'):
    prots_cell_type[name] = set(data.loc[name,:].dropna().index)

labels = venn.generate_petal_labels(prots_cell_type.values())
fig, ax = venn.venn4(labels, names=prots_cell_type.keys(), colors=list_colors,figsize=(10,10))

Fat cells protein overlaps

Venn diagram representing unique and overlapping proteins in superficial and deep subcutaneous fat.

In [183]:
fat_layers = ['Superficial subcutaneous fat', 'Deep subcutaneous fat']
prots_cell_type = {}
list_colors = []
data = norm_dataset.set_index('samples').loc[fat_layers,:]
for name, group in data.reset_index().groupby('samples'):
    prots_cell_type[name] = set(data.loc[name,:].dropna().index)

labels = venn.generate_petal_labels(prots_cell_type.values())
fig, ax = venn.venn2(labels, names=prots_cell_type.keys(), colors=list_colors,figsize=(10,10))
In [184]:
unique_prots = {}
for i in range(len(prots_cell_type.keys())):
    vals = set(range(len(prots_cell_type.keys()))).difference([i])
    prots = [list(prots_cell_type.values())[v] for v in vals]
    unique_prots[list(prots_cell_type.keys())[i]] = list(prots_cell_type.values())[i].difference(*prots)
In [185]:
unique_prots_df = pd.DataFrame(dict([ (k,pd.Series(list(v))) for k,v in unique_prots.items() ]))
unique_prots_df.to_csv('../../data/results_library/unique_proteins_skin_layers.tsv', doublequote=False, index=False, sep='\t')
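
The idea behind the unique-protein lists can be shown with plain Python sets. This is a minimal sketch: the set names and accession strings are placeholders, not data from the library.

```python
# Toy protein sets per sample; the accession strings are placeholders.
sets = {
    "layer A": {"P1", "P2", "P3"},
    "layer B": {"P2", "P3", "P4"},
    "layer C": {"P3", "P5"},
}

# Proteins unique to a set: remove everything seen in any other set.
unique = {
    name: members.difference(*(other for key, other in sets.items() if key != name))
    for name, members in sets.items()
}

# Proteins shared by all sets (the centre of the Venn diagram).
shared = set.intersection(*sets.values())
```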

Sample stratification

In [186]:
imputed_dataset = basicAnalysis.imputation_normal_distribution(norm_dataset, index=['samples'], shift = 1.8, nstd = 0.3)
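
The internals of `imputation_normal_distribution` are not shown in this notebook. One common way to use `shift` and `nstd` parameters of this kind is to draw missing values from a normal distribution that is shifted below the observed mean and narrowed relative to the observed spread. The function below is a hypothetical sketch of that approach, not the helper's actual implementation.

```python
import numpy as np

def impute_downshifted_normal(values, shift=1.8, nstd=0.3, seed=0):
    """Fill NaNs in a 1-D array with draws from a narrowed, down-shifted normal.

    Hypothetical illustration; the name and behaviour are assumptions.
    """
    values = np.asarray(values, dtype=float).copy()
    missing = np.isnan(values)
    # Mean and spread of the observed (non-missing) values.
    mean, std = np.nanmean(values), np.nanstd(values)
    rng = np.random.default_rng(seed)
    # Centre the draws `shift` standard deviations below the mean,
    # with a width of `nstd` standard deviations.
    values[missing] = rng.normal(mean - shift * std, nstd * std, size=missing.sum())
    return values

observed = np.array([10.0, 12.0, np.nan, 14.0, np.nan])
filled = impute_downshifted_normal(observed)
```

Drawing from the low end of the distribution reflects the assumption that values are missing mostly because the protein was below the detection limit.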

PCA analysis

Principal component analysis of protein expression variances between skin layers and cell subsets.
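
For readers unfamiliar with the method, a minimal PCA can be written with NumPy alone. This sketch uses a random toy matrix and a singular value decomposition; it is not the implementation behind `basicAnalysis.run_pca`, whose internals are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy matrix: 6 samples x 20 proteins, random values for illustration only.
X = rng.normal(size=(6, 20))

# Centre each protein (column) so components describe variance around the mean.
Xc = X - X.mean(axis=0)

# SVD of the centred matrix; rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project samples onto the first two components and compute explained variance.
scores = Xc @ Vt[:2].T
explained = (S ** 2) / np.sum(S ** 2)
```

Plotting the two columns of `scores` against each other gives the kind of scatter plot shown below, with `explained[:2]` as the share of variance on each axis.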

In [187]:
results, args = basicAnalysis.run_pca(imputed_dataset.reset_index(), drop_cols=[], group='samples', components=2)
In [188]:
args['title'] = 'PCA plot'
args['colors'] = colors
args['height'] = 900
args['width'] = 950
plot = basicFigures.get_scatterplot(results['pca'], identifier='pca', args=args)
In [189]:

Anova results on groups of cell types

In [190]:
groups = {c:cell_types[c] for c in imputed_dataset.index}
groups_dataset = imputed_dataset.join(pd.DataFrame.from_dict(groups, orient='index'))
cols = list(imputed_dataset.columns)
cols.append('group')  # name the joined column so it can serve as the ANOVA grouping variable
groups_dataset.columns = cols
In [191]:
aov_result = basicAnalysis.anova(groups_dataset, alpha=0.05, drop_cols=[], group='group', permutations=0)
In [192]:
aov_result = aov_result.set_index('identifier').join(protein_mapping)
cols = list(aov_result.columns)[0:-1]
aov_result.columns = cols
In [193]:
                                             index=True, doublequote=False, sep='\t')
In [194]:
args={'alpha':0.05, 'fc':2, 'colorscale':'Blues', 'showscale': False, 'marker_size':6, 'x_title':'log2FC', 'y_title':'-log10(pvalue)'}
figures = basicFigures.run_volcano(aov_result, identifier='volcano', args=args)
In [195]:
for figure in figures: