Matching information with Python

2024/7/8 2:24:18 Tags: python

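I recently needed to compare records by sample ID, so I wrote a script to handle this kind of routine matching. The first draft of the script is below: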

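# -*- coding: utf-8 -*-
# date: 2020-05-04

import argparse
import csv
import io
import os

import xlrd  # note: xlrd >= 2.0 reads only .xls; .xlsx needs xlrd < 2.0


def pre_prepration(cur_path, sample_list):
    """Read the sample list file (one ID per line) into sample_list."""
    # args is the module-level namespace parsed in the __main__ block
    with open("%s/%s" % (cur_path, args.list), "r") as sample_list_file:
        for s in sample_list_file:
            if s.strip():  # skip blank lines
                sample_list.append(s.strip())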

def match_tumor_con(cur_path, sample_list):
    """Export SampleID/Tumor_Con from the workbook, then match the sample list."""
    tumor_con_dict = {}
    header = ["SampleID", "Tumor_Con"]
    # read the xls/xlsx file
    xls_file = xlrd.open_workbook("%s/%s" % (cur_path, args.excel))
    # write the same table in both txt (tab-separated) and csv format
    with io.open("%s/%s.txt" % (cur_path, args.outfile), "w", encoding="utf-8") as out_txt, \
            open("%s/%s.csv" % (cur_path, args.outfile), "w", newline="", encoding="utf-8") as out_csv:
        out_txt.write("SampleID\tTumor_Con\n")
        csv_writer = csv.DictWriter(out_csv, fieldnames=header)
        csv_writer.writeheader()
        for i in range(2):  # the first two sheets are read
            data = xls_file.sheet_by_index(i)
            for m in range(1, data.nrows):  # row 0 is the header
                cell = data.cell(m, 0)
                # numeric IDs come back as floats (ctype 2); drop the trailing ".0"
                if cell.ctype == 2 and cell.value % 1 == 0:
                    sample_id = str(int(cell.value))
                else:
                    sample_id = str(cell.value)
                # column 4 holds the tumor content; keys and values are stored as
                # strings so the lookup and the concatenation below both work
                raw_con = data.cell(m, 4).value
                tumor_con_dict[sample_id] = str(raw_con)
                tumor_con = "-" if raw_con == "" else str(raw_con)
                out_txt.write(sample_id + "\t" + tumor_con + "\n")
                csv_writer.writerow({"SampleID": sample_id, "Tumor_Con": tumor_con})

    with open("%s/%s_match.txt" % (cur_path, args.outfile), "w", encoding="utf-8") as match_file:
        match_file.write("Lib_ID\tTumor_Con\n")
        for sample in sorted(set(sample_list)):
            # match on the first 9 characters of each entry in the sample list
            if sample[:9] in tumor_con_dict:
                match_file.write(sample + "\t" + tumor_con_dict[sample[:9]] + "\n")

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # all three options are needed to build the input and output paths
    parser.add_argument("-e", "--excel", type=str, required=True, help="input Excel file, e.g. .xls or .xlsx")
    parser.add_argument("-l", "--list", type=str, required=True, help="sample list to match against the tumor concentration table")
    parser.add_argument("-o", "--outfile", type=str, required=True, help="output file name (without extension)")
    args = parser.parse_args()
    sample_list = []
    cur_path = os.getcwd()  # inputs and outputs live in the current directory
    pre_prepration(cur_path, sample_list)
    match_tumor_con(cur_path, sample_list)
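
The matching step in the script above can be tried without an Excel file. The following is only a minimal sketch: the helper names `normalize_id` and `match_samples`, the `prefix_len` parameter, and all sample data are made up for illustration and are not part of the original script. It shows why numeric IDs need to be converted to strings before the lookup: a whole-number cell read from Excel arrives as a float such as `200512345.0`, which would never equal the 9-character string prefix taken from the sample list.

```python
def normalize_id(value):
    """Turn a cell value into a string key; whole-number floats lose the '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()


def match_samples(library_ids, tumor_con, prefix_len=9):
    """Return (library_id, tumor_con) pairs whose ID prefix is a known sample ID."""
    matches = []
    for lib_id in sorted(set(library_ids)):  # de-duplicate, stable order
        key = lib_id[:prefix_len]
        if key in tumor_con:
            value = tumor_con[key]
            matches.append((lib_id, "-" if value == "" else str(value)))
    return matches


# Hypothetical rows: (sample ID cell, tumor content cell)
rows = [(200512345.0, 0.3), ("LC2005001", ""), ("LC2005002", "40%")]
tumor_con = {normalize_id(sid): con for sid, con in rows}

# Hypothetical sample list, with one duplicate and one unknown ID
library_ids = ["200512345-L1", "LC2005001-L2", "LC2005001-L2", "XX0000000-L1"]
for lib_id, con in match_samples(library_ids, tumor_con):
    print(lib_id, con, sep="\t")
```

Unlike the script, this sketch also prints "-" for an empty value in the matched output; whether that is wanted depends on how the match file is used downstream.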
