Can Python draw beautiful complex heatmaps too?
WeChat official account: "Computational Epigenetics"
We cover bioinformatics and computational epigenetics. For questions or suggestions, please leave a message on the official account.
Researchers who routinely plot in R are probably familiar with ComplexHeatmap (https://jokergoo.github.io/ComplexHeatmap-reference/book/). The heatmaps it draws look both professional and polished.
Python, unfortunately, has long lacked a package that draws good-looking complex heatmaps. We also need heatmaps when doing machine learning or processing large datasets in Python, and switching back and forth between Python and R is tedious and inefficient.
Today we introduce a package that produces ComplexHeatmap-style figures in Python: PyComplexHeatmap (https://github.com/DingWB/PyComplexHeatmap). The code and figures below come from the tutorial at https://github.com/DingWB/PyComplexHeatmap/blob/main/examples.ipynb:
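1. Import the packages
import os, sys
import numpy as np
import pandas as pd
import PyComplexHeatmap
from PyComplexHeatmap import *
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
import matplotlib.pylab as plt
plt.rcParams['figure.dpi'] = 120
plt.rcParams['savefig.dpi'] = 300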
2. Quick start
# Generate an example dataset: 10 samples with categorical and numeric annotations
df = pd.DataFrame(['AAAA1'] * 5 + ['BBBBB2'] * 5, columns=['AB'])
df['CD'] = ['C'] * 3 + ['D'] * 3 + ['G'] * 4
df['EF'] = ['E'] * 6 + ['F'] * 2 + ['H'] * 2
df['F'] = np.random.normal(0, 1, 10)
df.index = ['sample' + str(i) for i in range(1, df.shape[0] + 1)]
df_box = pd.DataFrame(np.random.randn(10, 4), columns=['Gene' + str(i) for i in range(1, 5)])
df_box.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['TMB1', 'TMB2'])
df_bar.index = ['sample' + str(i) for i in range(1, df_bar.shape[0] + 1)]
df_scatter = pd.DataFrame(np.random.uniform(0, 10, 10), columns=['Scatter'])
df_scatter.index = ['sample' + str(i) for i in range(1, df_scatter.shape[0] + 1)]
# Heatmap matrix: 50 features (rows) x 10 samples (columns)
df_heatmap = pd.DataFrame(np.random.randn(50, 10), columns=['sample' + str(i) for i in range(1, 11)])
df_heatmap.index = ["Fea" + str(i) for i in range(1, df_heatmap.shape[0] + 1)]
df_heatmap.iloc[1, 2] = np.nan  # include one missing value
plt.figure(figsize=(6, 12))
# Column annotations (axis=1), passed to the heatmap as top_annotation
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True), axis=1,
                           CD=anno_simple(df.CD, colors={'C': 'red', 'D': 'yellow', 'G': 'green'}, add_text=True),
                           Exp=anno_boxplot(df_box, cmap='turbo'),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar),
                           )
cm = ClusterMapPlotter(data=df_heatmap, top_annotation=row_ha, col_split=2, row_split=3, col_split_gap=0.5,
                       row_split_gap=1, col_dendrogram=False, plot=True,
                       tree_kws={'col_cmap': 'Set1', 'row_cmap': 'Dark2'})
plt.savefig("example1_heatmap.pdf", bbox_inches='tight')
plt.show()
3. Drawing row/column annotations
3.1 Drawing only the row/column annotations
plt.figure(figsize=(6, 4))
# plot=True draws the annotations on their own, without a heatmap
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True, legend=True), axis=1,
                           CD=anno_simple(df.CD, colors={'C': 'red', 'D': 'gray', 'G': 'yellow'},
                                          add_text=True, legend=True),
                           Exp=anno_boxplot(df_box, cmap='turbo', legend=True),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar, legend=True),
                           plot=True, legend=True, legend_gap=5
                           )
plt.savefig("col_annotation.pdf", bbox_inches='tight')
plt.show()

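anno_label:
anno_label adds row/column annotation information (for example sample sex, group, or subtype) as a separate line of text, such as the slanted AAAA1 and BBBBB2 in the figure above. The merge parameter controls whether the labels of two or more adjacent cells are merged into one when they are identical. If merge is not True, a label is drawn for every column, which can look crowded.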
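To make the effect of merge concrete, here is a minimal sketch of collapsing runs of identical adjacent labels so that one text is drawn per run. The helper name merge_adjacent_labels is hypothetical and is not part of PyComplexHeatmap; the library's internal implementation may differ.

```python
from itertools import groupby


def merge_adjacent_labels(labels):
    """Collapse runs of identical neighbouring labels.

    Returns (label, start, end) tuples with an inclusive end index, so a
    single text can be placed at the centre of each run instead of once
    per cell. Hypothetical helper, for illustration only.
    """
    runs = []
    position = 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, position, position + length - 1))
        position += length
    return runs


labels = ["AAAA1"] * 5 + ["BBBBB2"] * 5
for label, start, end in merge_adjacent_labels(labels):
    # The midpoint is where a merged label would be centred
    print(label, start, end, (start + end) / 2)
```

Only neighbouring cells are merged: a label that reappears after a different one starts a new run, which matches the "adjacent cells" wording above.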
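anno_simple:
anno_simple adds a simple annotation, such as the AB and CD color bars in the figure above. The cmap parameter can be categorical (for example Set1, Dark2, tab10) or continuous (for example jet, turbo, parula). The add_text parameter controls whether text is drawn on the cells, such as the letters C, D, and G on the CD row and the annotation text on the AB row. If the text color and font size are not specified, the function chooses them automatically: light text on a dark background and dark text otherwise (the G in the CD row, for instance, is automatically set to black). The text color can also be changed with text_kws={'color': your_color}, for example: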
plt.figure(figsize=(5, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True, legend=True, text_kws={'color': 'gold'}), axis=1,
                           CD=anno_simple(df.CD, add_text=True, legend=True, text_kws={'color': 'purple'}),
                           Exp=anno_boxplot(df_box, cmap='turbo', legend=True),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar, legend=True),
                           plot=True, legend=True, legend_gap=5)
plt.show()
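The automatic choice between light and dark text described for anno_simple can be pictured as a brightness test on the background color. The sketch below is an assumption for illustration only: the article does not show how PyComplexHeatmap decides, and the function names, the weighting formula, and the 0.5 threshold are all hypothetical.

```python
def perceived_brightness(hex_color):
    """Approximate brightness of a '#RRGGBB' color (0 = black, 1 = white).

    Uses the common 0.299/0.587/0.114 weighting of the red, green and blue
    channels; this is an assumed rule, not the library's documented one.
    """
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def pick_text_color(background, threshold=0.5):
    """Return dark text for bright backgrounds and light text for dark ones."""
    return "black" if perceived_brightness(background) > threshold else "white"


for name, color in [("red", "#FF0000"), ("gray", "#808080"), ("yellow", "#FFFF00")]:
    print(name, pick_text_color(color))
```

Under this rule a yellow cell gets black text, which agrees with the G label mentioned above; passing text_kws overrides whatever the automatic rule would pick.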

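A single pandas DataFrame is enough to add annotations quickly
When a DataFrame df is passed, every column in it is drawn as a separate anno_simple annotation. For example, the DataFrame df below has four columns (AB, CD, EF, F), and all four are drawn automatically as column annotations. Annotations that are not simple color bars, such as box plots or scatter plots describing each column (for example a score for each tumor sample, or the expression distribution of selected genes), can be added with anno_boxplot or anno_scatterplot.
plt.figure(figsize=(3, 3))
row_ha = HeatmapAnnotation(df=df, plot=True, legend=True)
plt.show()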

3.2 Separating the figure from the legend
Sometimes we need only the figure without the legend, or want to draw the legend on its own. PyComplexHeatmap supports this: set plot_legend=False, then create a new figure and call row_ha.plot_legends() to draw the legend separately.
plt.figure(figsize=(6, 4))
row_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True),
                           AB=anno_simple(df.AB, add_text=True, legend=True), axis=1,
                           CD=anno_simple(df.CD, add_text=True, legend=True),
                           Exp=anno_boxplot(df_box, cmap='turbo', legend=True),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar, legend=True),
                           plot=True, legend=True, plot_legend=False,
                           legend_gap=5
                           )
plt.savefig("col_annotation.pdf", bbox_inches='tight')
plt.show()
# Draw the legends alone in a new figure
plt.figure()
row_ha.plot_legends()
plt.savefig("legend.pdf", bbox_inches='tight')
plt.show()

No ax was provided, using plt.gca()
4. Drawing a cluster map with row/column annotations
Here we use the example dataset provided with the PyComplexHeatmap package:
!wget https://github.com/DingWB/PyComplexHeatmap/raw/main/data/influence_of_snp_on_beta.pickle
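Where wget is not available (for example outside a notebook, or on Windows), the same file can be fetched from Python with the standard library. This is a minimal sketch: the helper name download_if_missing is hypothetical, and the URL is the one from the wget command above.

```python
import os
import urllib.request

# URL taken from the wget command in this section
DATA_URL = ("https://github.com/DingWB/PyComplexHeatmap/raw/main/data/"
            "influence_of_snp_on_beta.pickle")


def download_if_missing(url, path):
    """Download url to path unless the file already exists; return path.

    Hypothetical helper, for illustration only.
    """
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Usage (needs network access):
# download_if_missing(DATA_URL, "influence_of_snp_on_beta.pickle")
```

Skipping the download when the file is already present makes the notebook cell safe to re-run.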
import pickle
# Load the example dataset downloaded above
with open("influence_of_snp_on_beta.pickle", 'rb') as f:
    data = pickle.load(f)
beta, snp, df_row, df_col, col_colors_dict, row_colors_dict = data
# beta is the matrix of DNA methylation beta values; df_row and df_col are the row and
# column annotations; col_colors_dict and row_colors_dict hold the annotation colors
print(beta.iloc[:, list(range(5))].head(5))
print(df_row.head(5))
print(df_col.head(5))
# Keep a random subset of 2000 rows and subset snp and df_row to match
beta = beta.sample(2000)
snp = snp.loc[beta.index.tolist()]
df_row = df_row.loc[beta.index.tolist()]
                 204875570030_R01C02  204875570030_R04C01  \
cg30848532_TC21             0.525089             0.419515
cg30147375_BC21             0.803776             0.585928
cg46239718_BC21             0.443958             0.517514
cg36100119_BC21             0.351977             0.528846
cg42738582_BC21             0.783958             0.724901

                 204875570030_R05C01  204875570030_R06C01  204875570035_R05C02
cg30848532_TC21             0.483276             0.460750             0.390317
cg30147375_BC21             0.510269             0.831463             0.550146
cg46239718_BC21             0.535909             0.450167             0.564107
cg36100119_BC21             0.524896             0.374422             0.551200
cg42738582_BC21             0.802178             0.848621             0.850481

                   chr  Target  CpG  ExtensionBase  ProbeDesign  CON  mapFlag  \
cg30848532_TC21  chr12       1    1              0           II    C       16
cg30147375_BC21  chr11       0    0              0           II    C        0
cg46239718_BC21   chr8       1    1              0           II    C        0
cg36100119_BC21  chr19       1    1              0           II    C       16
cg42738582_BC21   chr5       0    0              0           II    C       16

                                        Group  \
cg30848532_TC21      Suboptimal hybridization
cg30147375_BC21                     No Effect
cg46239718_BC21  Artificial low meth. reading
cg36100119_BC21      Suboptimal hybridization
cg42738582_BC21      Suboptimal hybridization

                                                  Type
cg30848532_TC21  1-1-0-CG-GG-II-C-16-GA-chr12-79760438
cg30147375_BC21  0-0-0-ca-ac-II-C-0-AG-chr11-109557651
cg46239718_BC21   1-1-0-cg-gt-II-C-0-GA-chr8-117860829
cg36100119_BC21   1-1-0-CG-GG-II-C-16-GA-chr19-5877949
cg42738582_BC21  0-0-0-AA-AA-II-C-16-AG-chr5-122031379

                       Strain              Tissue     Sex
204875570030_R01C02  MOLF_EiJ  Frontal Lobe Brain  Female
204875570030_R04C01  CAST_EiJ  Frontal Lobe Brain    Male
204875570030_R05C01  CAST_EiJ  Frontal Lobe Brain  Female
204875570030_R06C01  MOLF_EiJ  Frontal Lobe Brain    Male
204875570035_R05C02  CAST_EiJ               Liver    Male
# Row annotations (axis=0), drawn to the left of the heatmap
row_ha = HeatmapAnnotation(Target=anno_simple(df_row.Target, colors=row_colors_dict['Target'], rasterized=True),
                           Group=anno_simple(df_row.Group, colors=row_colors_dict['Group'], rasterized=True),
                           axis=0)
# Column annotations (axis=1), drawn above the heatmap
col_ha = HeatmapAnnotation(label=anno_label(df_col.Strain, merge=True, rotation=15),
                           Strain=anno_simple(df_col.Strain, add_text=True),
                           Tissue=df_col.Tissue, Sex=df_col.Sex, axis=1)  # df=df_col.loc[:,['Strain','Tissue','Sex']]
plt.figure(figsize=(6, 10))
cm = ClusterMapPlotter(data=beta, top_annotation=col_ha, left_annotation=row_ha,
                       show_rownames=False, show_colnames=False,
                       row_dendrogram=False, col_dendrogram=False,
                       row_split=df_row.loc[:, ['Target', 'Group']],
                       col_split=df_col['Strain'], cmap='parula',
                       rasterized=True, row_split_gap=1, legend=True,
                       tree_kws={'col_cmap': 'Set1'})
plt.savefig("clustermap.pdf", bbox_inches='tight')
plt.show()

Key features:
row_split and col_split partition the rows and columns into separate blocks by label. Each can be a number (how many subgroups to split into), or a pandas DataFrame or Series giving the category of each sample.
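As a rough picture of what splitting by a Series of labels means, the sketch below groups sample names by their category so that each group would become one block. The helper split_by_labels is hypothetical and only mimics the grouping step; how PyComplexHeatmap orders the blocks or clusters within each block is not shown in this article. The sample names and strains are the five rows of df_col printed earlier.

```python
def split_by_labels(names, labels):
    """Group names by label, keeping first-seen label order.

    A label can be a single value (like one Series entry) or a tuple of
    values (like one DataFrame row). Hypothetical helper for illustration.
    """
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return groups


samples = ["204875570030_R01C02", "204875570030_R04C01", "204875570030_R05C01",
           "204875570030_R06C01", "204875570035_R05C02"]
strains = ["MOLF_EiJ", "CAST_EiJ", "CAST_EiJ", "MOLF_EiJ", "CAST_EiJ"]

for strain, members in split_by_labels(samples, strains).items():
    print(strain, members)
```

Splitting on several columns at once, as with row_split=df_row.loc[:, ['Target', 'Group']], corresponds to using one tuple per row as the label, so each distinct combination forms its own block.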
5. Combining multiple heatmaps (cluster maps) horizontally or vertically
row_ha = HeatmapAnnotation(Target=anno_simple(df_row.Target, colors=row_colors_dict['Target'], rasterized=True),
                           Group=anno_simple(df_row.Group, colors=row_colors_dict['Group'], rasterized=True),
                           axis=0)
col_ha = HeatmapAnnotation(label=anno_label(df_col.Strain, merge=True, rotation=15),
                           Strain=anno_simple(df_col.Strain, add_text=True),
                           Tissue=df_col.Tissue, Sex=df_col.Sex,
                           axis=1)  # df=df_col.loc[:,['Strain','Tissue','Sex']]
# plot=False: build each heatmap without drawing it; composite() draws them together
cm1 = ClusterMapPlotter(data=beta, top_annotation=col_ha, left_annotation=row_ha,
                        show_rownames=False, show_colnames=False,
                        row_dendrogram=False, col_dendrogram=False,
                        row_split=df_row.loc[:, ['Target', 'Group']],
                        col_split=df_col['Strain'], cmap='parula',
                        rasterized=True, row_split_gap=1, legend=True,
                        plot=False, label='beta',
                        tree_kws={'col_cmap': 'Set1'})
cm2 = ClusterMapPlotter(data=snp, top_annotation=col_ha, left_annotation=row_ha,
                        show_rownames=False, show_colnames=False,
                        row_dendrogram=False, col_dendrogram=False,
                        col_cluster_method='ward', row_cluster_method='ward',
                        col_cluster_metric='jaccard', row_cluster_metric='jaccard',
                        row_split=df_row.loc[:, ['Target', 'Group']],
                        col_split=df_col['Strain'],
                        rasterized=True, row_split_gap=1, legend=True,
                        plot=False, cmap='Greys', label='SNP',
                        tree_kws={'col_cmap': 'Set1'})
cmlist = [cm1, cm2]
plt.figure(figsize=(10, 12))
composite(cmlist=cmlist, main=1, legendpad=0, legend_y=0.8)
plt.savefig("beta_snp.pdf", bbox_inches='tight')
plt.show()
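The second heatmap clusters the snp matrix with the 'jaccard' metric. As a reminder of what that metric measures, here is a minimal sketch of the Jaccard distance between two binary vectors. It assumes snp holds 0/1 values, which this article does not state explicitly, and the function below is a plain-Python illustration rather than the routine the library calls.

```python
def jaccard_distance(u, v):
    """Jaccard distance between two equal-length binary vectors.

    Among positions where at least one vector is non-zero, return the
    fraction where the two disagree. Two all-zero vectors are treated as
    identical (distance 0.0), which is a convention chosen here.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    either = sum(1 for a, b in zip(u, v) if a or b)
    differ = sum(1 for a, b in zip(u, v) if bool(a) != bool(b))
    return differ / either if either else 0.0


probe_a = [1, 0, 1, 1, 0]
probe_b = [1, 1, 0, 1, 0]
print(jaccard_distance(probe_a, probe_b))  # 2 disagreements over 4 informative positions -> 0.5
```

Positions where both vectors are zero are ignored, so two probes are considered close when they carry SNPs in the same samples, not merely when both are mostly empty.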

We hope this article is helpful. Search for the Computational Epigenetics official account to follow us; we regularly share articles on bioinformatics and computational epigenetics.