【实用】精简Commands（更新）

## 1、处理blast_m6_result.txt
提取blast_m6的比对结果中subject序列名所在fasta中的的序列到单个文件中（并将序列名作为文件名）
```shell
for i in `cat blast_m6_result.txt | awk '{print $2}'`;do sed -n "/$i/,/^>/p" species.fasta | sed '$d' >$i.fa; done
```

##2、sort的简洁用法
```
awk -F '_' '{split($3,x,"");print $1,$2,x[1],x[2]}' File.txt | sort -k 1,2 -k 4 |awk '{print $1"_"$2"_"$3$4}' >output.txt
```
>File.txt    output.txt
A_1_F1      A_1_F1
A_1_R1      A_1_R1
A_3_R2      A_3_F2
A_3_F2      A_3_R2 
TB_9_R1     TB_9_F1
TB_9_F1     TB_9_R1

## 3、vcf文件中提取信息
```
grep -v '^#' file.vcf | grep -v 'INDEL' | cut -f 1,2,4,5,8 | awk '{OFS="\t";split($5,x,";");split(x[1],y,"=");print $1,$2,$3,$4,y[2]}' >new.txt
```
> result: chr pos REF ALT 6(DP)

## 4、文件如下：
sdfsadfasfj;sdaf [asdfasfddsfasdfjkjksldf [fdas : asdf
只去掉符号“[”前面的一个空格
`awk -F ' \\\[' '{if(NR%2==1){print $1"["$2}else{print $0}}' file`
或者
`awk -F '[[:space:]]\\\[' '{if(NR%2==1){print $1"["$2}else{print $0}}' file`
> 解释：-----> 第一轮转义为\[，然后第二次转义为[
同理，如果文件中[变成\，我们还是只要去 掉空格不要去掉\的话，那就应该用 -F ' \\\\' 或 -F '[[:space:]]\\\\'

或者使用perl的命令行
```
perl -e 'while(<>){$a=1;if($a%2){@a=split(/ \[/,$_);print $a[0],"[",$a[1]}else{print $_};$a+=1;}' new
```
简写：
`perl -ne '{$a=1;if($a%2){@a=split(/ \[/,$_);print $a[0],"[",$a[1]}else{print $_};$a+=1;}' new`

## 5、两个文件分别只有一列数据，如下
69091690926909369094
两个文件之间会有相同的行（即有交集），如何提取相同的部分？
`perl -lne 'chomp;if(exists($h{$_})){print $_;}else{$h{$_}=1;}' file1 file2 >>same.txt`
注意：在每个文件中不允许有重复的数据，否则会导致得到的交集结果偏多。
脚本实现：
```python
#!/usr/bin/python
f3=open("same.txt",'w')
dict={}
for line in open("us_old_chr1"):
    a=line.strip()
    dict[a]=1
#for k in dict:
#    print k
for line in open("dbNSFP_chr1"):
    b=line.strip()
    if dict.has_key(b):
        f3.write(b)
        f3.write("\n")
f3.close()
```
## 6、已经有了chr1.sh脚本，要复制并修改里面的chr信息为对应的chr"$i".sh脚本中：
`for i in $(seq 2 29);do cp chr1.sh chr"$i".sh2;sed "s/chr1/chr${i}/g" chr"$i".sh>tmp;cp tmp chr"$i".sh;donerm *.sh2 tmp`

## 7、获取vcf文件中的DP=\d+数据。
`grep -v '^#' GS90000355-FS3-L01L02L03L04.samtools.non_N.non_poly.non_indel.d2Q20.vcf|gawk '{if(match($8,/DP=[0-9]+/,m)) print m[0]}'|le`

## 8、批量替换
1）文件内多处替换： vim file
通过命令： `:%s/old/new/g`
2）多个文件都要替换：vim
通过命令：  `:args *.list` 打开所有list文件，然后 `:argdo %s/old/new/g |update` 即可完成 （由于是直接写入的，所以要注意备份，以防出错）

## 9、批量创建文件夹，写入脚步到文件中，并将文件移到文件夹中 （参考：/ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/biosoft/ucsc_liftOver/psl_test/psl_3）
```
for i in $(seq 300 399);do mkdir psl_"$i";echo "for i in /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/biosoft/ucsc_liftOver/test1/new_chuck$i*.fa;do /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/biosoft/ucsc_liftOver/blat /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/project/goat_pro/00.ref.v1/goat.2bit \$i \`basename \$i .fa\`.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap;done"> blat_work"$i".sh;mv blat_work"$i".sh psl_"$i";done
```
生成文件夹psl_300，里面有一个blat_work300.sh脚本，内容包含
```
for i in /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/biosoft/ucsc_liftOver/test1/new_chuck300*.fa;do /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/biosoft/ucsc_liftOver/blat /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/project/goat_pro/00.ref.v1/goat.2bit $i `basename $i .fa`.psl-tileSize=12 -minScore=100 -minIdentity=98 -fastMap;done
```
在通过一句命令统统运行：
`for i in $(seq 300 399);do cd psl_"$i";q4 blat_work"$i".sh;cd ..;done`

## 关于编号问题
for((i=1;i<=111;i++)); do a=$((1000+$i)); echo 537${a:1}; done
## 上面生成537000-537111编号
10、查看那些命令成功在大型机上运行（qw看到都是51780开头的）
`for i in $(seq 0 22);do a=$(($i+5178000));qstatj $a|grep 'job_name';done`
得到如下：
```
。。。省略。。。
job_name:                   zebra_551FH07181N2_L01.pre_NEW_Vchr14.sh
job_name:                   zebra_551FH07181N2_L01.pre_NEW_Vchr15.sh
job_name:                   zebra_551FH07181N2_L01.pre_NEW_Vchr16.sh
Following jobs do not exist:
5178003
job_name:                   zebra_551FH07181N2_L01.pre_NEW_Vchr18.sh
job_name:                   zebra_551FH07181N2_L01.pre_NEW_Vchr19.sh
job_name:                   zebra_551FH07181N2_L01.pre_NEW_Vchr20.sh
job_name:                   zebra_551FH07181N2_L01.pre_NEW_Vchr21.sh
```
另一个更霸气的：
`qw | sed '1,2d' |awk '{print $1}'| for i in `cat -`;do qstat -j $i|grep 'job_name';done`
`qw | grep '^[ 0-9]'|awk '{print $1}'| for i in `cat -`;do qstat -j $i|grep 'cwd';done`

## 11、Linux下批量删除空文件（大小等于0的文件）的方法
`find . -name "*" -type f -size 0c | xargs -n 1 rm -f`

## 12、SNP后期过滤（gatk得到的）
```
0) grep '^#' NEW_Vchr9.raw.snps.indels.vcf >chr_all.raw.snps.indels.vcf;for i in $(seq 1 35);do grep -v '^#' NEW_Vchr"$i".raw.snps.indels.vcf >>chr_all.raw.snps.indels.vcf;done
1) non_N: grep '^#' chr_all.raw.snps.indels.vcf >chr_all.raw.snps.indels.non_N.vcf; grep -v '^#' chr_all.raw.snps.indels.vcf  | awk '{if($4!~/N/){print $0}}' >>chr_all.raw.snps.indels.non_N.vcf
2) non_poly: grep '^#' chr_all.raw.snps.indels.non_N.vcf >chr_all.raw.snps.indels.non_N.non_poly.vcf; grep -v '^#' chr_all.raw.snps.indels.non_N.vcf | awk '{if($5!~/,/){print $0}}' >>chr_all.raw.snps.indels.non_N.non_poly.vcf
3) non_indel: grep '^#' chr_all.raw.snps.indels.non_N.non_poly.vcf>chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf; grep -v '^#' chr_all.raw.snps.indels.non_N.non_poly.vcf  |awk '{if(length($4)==1 && length($5)==1){print $0}}' >>chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf
4) filter_d2Q10: perl /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/project/Scripts/vcfutils.pl varFilter -d 2 -Q 10 chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf >chr_all.raw.snps.indels.non_N.non_poly.non_indel.d2Q10.vcf
grep '^#' NEW_Vchr9.recalibrated.vcf >chr_all.recalibrated.vcf;for i in $(seq 1 35);do grep -v '^#' NEW_Vchr"$i".recalibrated.vcf >>chr_all.recalibrated.vcf;done
5) PASS: grep '^#' chr_all.recalibrated.vcf >chr_all.recalibrated.PASS.vcf; grep -v '^#' chr_all.recalibrated.vcf | awk '{if($7~/PASS/){print $0}}' >>chr_all.recalibrated.PASS.vcf
6) VQSR: grep '^#' chr_all.recalibrated.vcf >chr_all.recalibrated.VQSR.vcf; grep -v '^#' chr_all.recalibrated.vcf | awk '{if($7~/VQSR/){print $0}}' >>chr_all.recalibrated.VQSR.vcf
7) LowQual: grep '^#' chr_all.recalibrated.vcf >chr_all.recalibrated.LowQual.vcf; grep -v '^#' chr_all.recalibrated.vcf | awk '{if($7~/LowQual/){print $0}}' >>chr_all.recalibrated.LowQual.vcf
放到shell脚本中：SNP_STAT.sh
grep '^#' chromosome01.raw.snps.indels.vcf >chr_all.raw.snps.indels.vcf;for i in $(seq 1 9);do grep -v '^#' chromosome0"$i".raw.snps.indels.vcf >>chr_all.raw.snps.indels.vcf;done;for i in $(seq 0 2);do grep -v '^#' chromosome1"$i".raw.snps.indels.vcf >>chr_all.raw.snps.indels.vcf;done
grep '^#' chr_all.raw.snps.indels.vcf >chr_all.raw.snps.indels.non_N.vcf; grep -v '^#' chr_all.raw.snps.indels.vcf  | awk '{if($4!~/N/){print $0}}' >>chr_all.raw.snps.indels.non_N.vcf
grep '^#' chr_all.raw.snps.indels.non_N.vcf >chr_all.raw.snps.indels.non_N.non_poly.vcf; grep -v '^#' chr_all.raw.snps.indels.non_N.vcf | awk '{if($5!~/,/){print $0}}' >>chr_all.raw.snps.indels.non_N.non_poly.vcf
grep '^#' chr_all.raw.snps.indels.non_N.non_poly.vcf>chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf; grep -v '^#' chr_all.raw.snps.indels.non_N.non_poly.vcf  |awk '{if(length($4)==1 && length($5)==1){print $0}}' >>chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf
nohup perl /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/project/Scripts/vcfutils.pl varFilter -d 2 -Q 10 chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf >chr_all.raw.snps.indels.non_N.non_poly.non_indel.d2Q10.vcf &
nohup perl /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/project/Scripts/vcfutils.pl varFilter -d 2 -Q 20 chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf >chr_all.raw.snps.indels.non_N.non_poly.non_indel.d2Q20.vcf &
nohup perl /ifs1/ST_BREED/PMO/F13ZOOYJSY1418/lvlaihui/project/Scripts/vcfutils.pl varFilter -d 2 -Q 30 chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf >chr_all.raw.snps.indels.non_N.non_poly.non_indel.d2Q30.vcf &
grep '^#' chromosome01.recalibrated.vcf >chr_all.recalibrated.vcf;for i in $(seq 1 9);do grep -v '^#' chromosome0"$i".recalibrated.vcf >>chr_all.recalibrated.vcf;done;for i in $(seq 0 2);do grep -v '^#' chromosome1"$i".recalibrated.vcf >> chr_all.recalibrated.vcf;done
grep '^#' chr_all.recalibrated.vcf >chr_all.recalibrated.PASS.vcf; grep -v '^#' chr_all.recalibrated.vcf | awk '{if($7~/PASS/){print $0}}' >>chr_all.recalibrated.PASS.vcf
grep '^#' chr_all.recalibrated.vcf >chr_all.recalibrated.VQSR.vcf; grep -v '^#' chr_all.recalibrated.vcf | awk '{if($7~/VQSR/){print $0}}' >>chr_all.recalibrated.VQSR.vcf
grep '^#' chr_all.recalibrated.vcf >chr_all.recalibrated.LowQual.vcf; grep -v '^#' chr_all.recalibrated.vcf | awk '{if($7~/LowQual/){print $0}}' >>chr_all.recalibrated.LowQual.vcf

echo "total:" >SNP_STAT.stat;
grep -v '^#' chr_all.raw.snps.indels.vcf |wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.raw.snps.indels.non_N.vcf |wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.raw.snps.indels.non_N.non_poly.vcf |wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.raw.snps.indels.non_N.non_poly.non_indel.vcf |wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.raw.snps.indels.non_N.non_poly.non_indel.d2Q10.vcf |wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.raw.snps.indels.non_N.non_poly.non_indel.d2Q20.vcf |wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.raw.snps.indels.non_N.non_poly.non_indel.d2Q30.vcf |wc -l >>SNP_STAT.stat
echo "recalibrated:" >>SNP_STAT.stat; 
grep -v '^#' chr_all.recalibrated.vcf|wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.recalibrated.PASS.vcf|wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.recalibrated.VQSR.vcf|wc -l >>SNP_STAT.stat
grep -v '^#' chr_all.recalibrated.LowQual.vcf|wc -l >>SNP_STAT.stat
```

## 13、合并两文件
```
File1内容如下：
a aa
a bb
b vv
c nn
c nn
File2内容如下：
a +
b -
c +
```
解决方法：（也可以用join命令： join file1 file2）
```
perl -lane' BEGIN { $x=shift; %h = map +split, <>; print %h;@ARGV=$x } print $_, " : ", $h{$F[0]} if $h{$F[0]}' file1 file2
```
得到结果：
 a aa : +
 a bb : +
 b vv : -
 c nn : +
 c nn : +
 
## 14、批量修改文件名
```
for i in `ls` ; do NEW=`echo $i |sed 's/_1//'`;mv $i $NEW ; done
```
将所有文件带有"_1"的去掉了。

## 15、提取fasta文件中连续为Ns的区间
```
perl -ne 'chomp;if( />(.*)/){$head = $1; $i=0; next};@a=split("",$_); foreach(@a){$i++;if($_ eq "N" && $s ==0 ){print "$head\t$i"; $s =1}elsif($s==1 && $_ ne "N"){print "\t$i\n";$s=0}}' /dpan/humandb/GRCh38.fasta >GRCh38.Ns.bed
```

## 16、perl依据命令按染色体拆分bed文件
```
perl -ane 'open $out{$F[0]}, ">", $F[0].".bed" unless $out{$F[0]}; print { $out{$F[0]} } $_;' hg19WES_AgilentV6_IDTxGen_iGeneTechV1V4_Twist_BGI_clinvar20200803_hgmd201901.v20210514.bed
```

## 17、bam转cram，默认创建索引
```shell
samp=11745_1; samtools view -CS -T /data2/Database/b37/human_g1k_v37.fasta -@ 10 --write-index ${samp}.bam -o ${samp}.cram && mv ${samp}.bam ${samp}-1.bam && mv ${samp}.bam.bai ${samp}-1.bam.bai
```

## 18、针对所有列判断tag=和Id=开头的内容输出
```shell
awk -F',' '{
    tag=""; id=""
    for(i=1;i<=NF;i++){
        if($i ~ /^tag=/) tag=$i
        if($i ~ /^Id=/)  id=$i
    }
    if(tag!="" && id!="") print tag, id     # 都找到才打印
}' file.csv
```

## 19、查看某列中拆分后tag=开头的字段，如NCBI GFF文件中RefSeq转录本
```shell
zcat GCF_000001405.25_GRCh37.p13_genomic.gff.gz|grep "ID=rna-"|grep "tag=RefSeq "|awk -v FS="\t" '{split($9, x, /;/); for (i = 1; i in x; i++){if (x[i] ~ /^tag=/) {print x[1], x[i];break}};}'
```