awk
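Strictly speaking, awk is a programming language, so it is worth a closer look.
I am writing this post so that I myself understand how to use the command; if it unfortunately ends up helping someone else, I will be terribly disappointed.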
This tutorial is based on tutorialspoint-awk.
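Understanding the awk workflow
For example, take a text file marks.txt with the following content:
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89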
To add a header line, use the BEGIN keyword (END likewise adds a footer after the last line):
awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print} END{printf "end line\n"}' marks.txt
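The flow behind that command: awk runs the BEGIN block once before reading any input, runs the main block once for every input line (record), and runs the END block once after the last line. A minimal sketch that makes the three phases visible; it recreates marks.txt with single spaces between fields, which is an assumption about the file's exact spacing:

```shell
# Recreate the sample file (single-spaced; the exact padding is an assumption)
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
  '4) Kedar English 85' '5) Hari History 89' > marks.txt

# BEGIN: once, before any input is read
# { ... }: once per input line
# END: once, after the last line
awk 'BEGIN { print "start" } { n++ } END { print "lines:", n }' marks.txt
```

This prints "start" followed by "lines: 5".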
Basic usage
- On the command line:
awk [options] 'program' file ...
- From a script file:
awk [options] -f awk-script-file file ...
For example, if command.awk contains:
{print}
run it with:
awk -f command.awk marks.txt
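A self-contained sketch of the -f workflow: it writes a small command.awk (in awk, `#` starts a comment) and feeds it two lines on standard input instead of a file operand:

```shell
# Write the program to a file; awk reads it with -f
cat > command.awk <<'EOF'
# print every input line unchanged
{ print }
EOF

# Input comes from a pipe here instead of a file operand
printf '%s\n' 'line one' 'line two' | awk -f command.awk
```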
Options
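-v
v stands for variable: it assigns a value to a variable before the program starts running, so the value is already set inside BEGIN.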
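For contrast with -v, a sketch that also uses a name=value operand placed among the file arguments: awk applies that kind of assignment only when it reaches it in the argument list, after BEGIN has finished. The variable names greeting and name are illustrative:

```shell
# Two -v assignments, both visible inside BEGIN
awk -v name=Jerry -v greeting=Hello 'BEGIN { printf "%s, %s\n", greeting, name }'

# name=Tom is an operand, applied just before reading "-" (standard input),
# so BEGIN still sees an empty value while the main block sees Tom
printf 'x\n' | awk 'BEGIN { printf "BEGIN: [%s]\n", name }
                    { printf "body: [%s]\n", name }' name=Tom -
```

The first command prints "Hello, Jerry"; the second prints "BEGIN: []" and then "body: [Tom]".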
awk -v name=Jerry 'BEGIN{printf "Name = %s\n", name}'
Name = Jerry
--dump-variables[=file]
Writes all global variables to a file, awkvars.out by default. This is a gawk extension, as are --lint and --profile.
[jimo@jimo-pc shell]$ awk --dump-variables ''
[jimo@jimo-pc shell]$ cat awkvars.out
ARGC: 1
ARGIND: 0
ARGV: array, 1 elements
BINMODE: 0
CONVFMT: "%.6g"
ENVIRON: array, 60 elements
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: ""
FNR: 0
FPAT: "[^[:space:]]+"
FS: " "
FUNCTAB: array, 41 elements
IGNORECASE: 0
LINT: 0
NF: 0
NR: 0
OFMT: "%.6g"
OFS: " "
ORS: "\n"
PREC: 53
PROCINFO: array, 28 elements
RLENGTH: 0
ROUNDMODE: "N"
RS: "\n"
RSTART: 0
RT: ""
SUBSEP: "\034"
SYMTAB: array, 28 elements
TEXTDOMAIN: "messages"
--lint
Warns about dubious or non-portable constructs; --lint=fatal turns those warnings into fatal errors.
[jimo@jimo-pc shell]$ awk '' /bin/ls
[jimo@jimo-pc shell]$ awk --lint '' /bin/ls
awk: cmd. line:1: warning: empty program text on command line
awk: cmd. line:1: warning: source file does not end in newline
awk: warning: no program text at all!
--profile[=file]
Writes a pretty-printed copy of the program, with execution counts in the left margin, to a file (awkprof.out by default).
[jimo@jimo-pc shell]$ awk --profile 'BEGIN{printf"---|Header|--\n"} {print} END{printf"---|Footer|---\n"}' marks.txt > /dev/null
[jimo@jimo-pc shell]$ cat awkprof.out
# gawk profile, created Thu Dec 21 20:45:30 2017
# BEGIN rule(s)
BEGIN {
1 printf "---|Header|--\n"
}
# Rule(s)
5 {
5 print $0
}
# END rule(s)
END {
1 printf "---|Footer|---\n"
}
Basic examples
Input file:
[jimo@jimo-pc shell]$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Printing columns
awk '{print $3 "\t" $4}' marks.txt
Physics 80
Maths 90
Biology 87
English 85
History 89
Printing all lines
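By default awk prints every line that matches the pattern. The example below uses a regular expression as the pattern, so only lines containing a lowercase a are printed.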
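A pattern can also be negated with ! or combined with || and &&. A small sketch on three lines taken from marks.txt:

```shell
# ! inverts a pattern: print lines that do NOT contain a lowercase "a"
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '5) Hari History 89' | awk '!/a/'

# || combines patterns: print lines mentioning Rahul or Hari
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '5) Hari History 89' | awk '/Rahul/ || /Hari/'
```

Only the Amit line has no lowercase a, so the first command prints just that line.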
awk '/a/ {print $0}' marks.txt
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
This is equivalent to the following, because print $0 is the default action:
awk '/a/' marks.txt
Counting
awk '/a/{++cnt} END {print "Count=",cnt}' marks.txt
Count= 4
Printing the number of characters in each line
awk '{print length($0)}' marks.txt
23
23
23
23
23
Every line reports 23 because the columns in marks.txt are padded with spaces for alignment.
awk built-in variables
Knowing a few built-in variables makes awk programs quicker to write.
ARGC
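The number of command-line arguments. Why 5 when only four were passed? Because ARGV[0], the program name awk, is counted too (see ARGV below).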
awk 'BEGIN {print "Arguments =", ARGC}' One Two Three Four
Arguments = 5
ARGV
An array holding each command-line argument.
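Because ARGV is an ordinary array, a program can also edit it. A sketch of a common idiom, assuming the first operand should be a value rather than a file name: BEGIN reads ARGV[1] and then sets it to the empty string, and awk skips empty ARGV entries when opening input files. The file input.txt and the variable tag are hypothetical:

```shell
printf 'data line\n' > input.txt

# Read the first operand as a value, then blank it so it is not opened as a file
awk 'BEGIN { tag = ARGV[1]; ARGV[1] = "" } { print tag ": " $0 }' 42 input.txt
```

This prints "42: data line".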
awk 'BEGIN {
   for (i = 0; i < ARGC; ++i) {
      printf "ARGV[%d] = %s\n", i, ARGV[i]
   }
}' one two three four
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
CONVFMT
The format used when a number is converted to a string; the default is %.6g.
awk 'BEGIN {print "Conversion format = ",CONVFMT}'
Conversion format =  %.6g
ENVIRON
An associative array of the environment variables.
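A sketch that sets a variable for a single awk process and reads it back; GREETING is a made-up name, and the key has to match the variable name exactly, including case:

```shell
# GREETING=hi applies only to this awk process's environment
GREETING=hi awk 'BEGIN { print ENVIRON["GREETING"] }'

# Names are case-sensitive: this key does not exist, so an empty line is printed
GREETING=hi awk 'BEGIN { print ENVIRON["greeting"] }'
```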
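awk 'BEGIN {print ENVIRON["user"]}'
This prints only an empty line: environment variable names are case-sensitive, and the variable is called USER, not user. awk 'BEGIN {print ENVIRON["USER"]}' typically prints the current user name.
FILENAME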
The name of the current input file.
awk 'END {print FILENAME}' marks.txt
marks.txt
FS
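Field separator for input. The default is a single space, which awk treats specially: fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored. It can be changed with -F.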
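A sketch of a non-default separator on comma-separated input (the sample lines are illustrative), once with -F and once by setting FS in BEGIN:

```shell
# -F ',' sets FS to a comma, so each comma-separated value becomes a field
printf '%s\n' 'Amit,Physics,80' 'Rahul,Maths,90' | awk -F ',' '{ print $1 " scored " $3 }'

# Same result, setting FS inside BEGIN instead of on the command line
printf '%s\n' 'Amit,Physics,80' 'Rahul,Maths,90' | awk 'BEGIN { FS = "," } { print $1 " scored " $3 }'
```

Both commands print "Amit scored 80" and "Rahul scored 90".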
[jimo@jimo-pc shell]$ awk 'BEGIN {print "FS=" FS}' | cat -vte
FS= $
[jimo@jimo-pc shell]$ awk -F "," 'BEGIN {print "FS=" FS}'
FS=,
NF
Number of fields: the number of fields in the current record.
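Since NF is a number, $NF refers to the last field of the line. A minimal sketch:

```shell
# NF is the field count, so $NF is the last field of each line
printf '%s\n' 'One Two' 'One Two Three' | awk '{ print NF, $NF }'
```

This prints "2 Two" and "3 Three".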
echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NF > 3'
One Two Three Four
NR
Number of records: the number of input records read so far, i.e. the current line number.
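When awk reads several files, NR keeps counting across all of them, while FNR, its per-file counterpart, starts over for each file. A sketch with two throwaway files, first.txt and second.txt (hypothetical names):

```shell
printf 'a\nb\n' > first.txt
printf 'c\n' > second.txt

# Print the current file name, the overall record number and the per-file one
awk '{ print FILENAME, NR, FNR }' first.txt second.txt
```

The output is "first.txt 1 1", "first.txt 2 2", "second.txt 3 1".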
echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NR < 3'
One Two
One Two Three
FNR
Like NR, but counted relative to the current file, which is useful when awk processes several files: FNR is reset for each new file.
OFMT
The output format for numbers printed with print; the default is %.6g.
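A sketch contrasting OFMT with CONVFMT: print formats a non-integer number with OFMT, while converting a number to a string (for example by concatenating it with "") uses CONVFMT. The formats chosen here are only for illustration:

```shell
awk 'BEGIN {
    x = 3.14159265
    OFMT = "%.2f"; CONVFMT = "%.3f"
    print x          # printed as a number, so OFMT applies
    print (x "")     # concatenation converts x with CONVFMT
}'
```

This prints "3.14" and then "3.142".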
awk 'BEGIN {print "OFMT = " OFMT}'
OFMT = %.6g
OFS
Output field separator, placed between the comma-separated items of a print statement; the default is also a space.
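A sketch of where OFS shows up: between items separated by commas in print, and when $0 is rebuilt after any field is assigned (the `$1 = $1` trick):

```shell
# Commas in print put OFS between the items
printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }'

# Assigning to a field, even $1 = $1, rebuilds $0 with OFS between all fields
printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }'
```

Both commands print "a-b-c".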
awk 'BEGIN {print "OFS = " OFS}' | cat -vte
OFS =  $
ORS
Output record separator, written after each record (line) that print outputs; the default is \n.
awk 'BEGIN {print "ORS = " ORS}' | cat -vte
ORS = $
$
RS
Input record separator, which splits the input into records (lines); the default is \n.
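A sketch with a single-character RS other than newline, so each semicolon-separated chunk becomes one record. The input deliberately has no trailing newline, otherwise the last record would be "c" followed by a newline:

```shell
# RS = ";" : records are separated by semicolons instead of newlines
printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }'
```

This prints "1: a", "2: b", "3: c".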
awk 'BEGIN {print "RS = " RS}' | cat -vte
RS = $
$
RLENGTH
The length of the string matched by the match function.
awk 'BEGIN { if (match("One Two Three", "re")) { print RLENGTH } }'
2
RSTART
The position (counting from 1) where the string matched by match starts.
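RSTART and RLENGTH are usually used together with substr to pull out the matched text. A minimal sketch:

```shell
# match() sets RSTART and RLENGTH; substr() then extracts the matched text
awk 'BEGIN {
    s = "One Two Three"
    if (match(s, /T[a-z]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}'
```

The leftmost match is "Two", so this prints "5 3 Two".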
awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
9
SUBSEP
The subscript separator for multidimensional arrays; the default is \034.
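A sketch of where SUBSEP is used: a subscript like grid[1, 2] is stored under a single key made of "1", SUBSEP and "2", and split can recover the parts (grid and idx are illustrative names):

```shell
awk 'BEGIN {
    grid[1, 2] = "x"            # stored under the key "1" SUBSEP "2"
    for (k in grid) {
        split(k, idx, SUBSEP)   # recover the two subscripts from the key
        print idx[1], idx[2], grid[k]
    }
    if ((1, 2) in grid)         # membership test with a multi-part subscript
        print "found"
}'
```

This prints "1 2 x" and then "found".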
awk 'BEGIN { print "SUBSEP = " SUBSEP }' | cat -vte
SUBSEP = ^\$
$0
The entire input record.
awk '{print $0}' marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
$n
The nth field, as split by FS.
awk '{print $3 "\t" $4}' marks.txt
Physics 80
Maths 90
Biology 87
English 85
History 89
GNU AWK-specific variables
ERRNO
A string describing the last error, for example when getline or close fails.
awk 'BEGIN { ret = getline < "junk.txt"; if (ret == -1) print "Error:", ERRNO }'
Error: No such file or directory
IGNORECASE
When set to a nonzero value, regular-expression matching and string comparisons ignore case.
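IGNORECASE only has this effect in gawk. A portable sketch for other awk implementations lowercases the line with tolower before matching:

```shell
# Lowercase the whole line, then match against a lowercase regex
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' | awk 'tolower($0) ~ /amit/'
```

This prints "1) Amit Physics 80".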
awk 'BEGIN{IGNORECASE = 1} /amit/' marks.txt
1) Amit Physics 80
LINT
Setting LINT to a nonzero value turns on lint warnings at run time, as if --lint had been given.
awk 'BEGIN {LINT = 1; a}'
awk: cmd. line:1: warning: reference to uninitialized variable `a'
awk: cmd. line:1: warning: statement has no effect
PROCINFO
An array with information about the running process, such as its UID and PID.
awk 'BEGIN { print PROCINFO["pid"] }'
14206
TEXTDOMAIN
The text domain of the awk program, used to look up localized translations of the program's strings.
awk 'BEGIN { print TEXTDOMAIN }'
messages
awk operators
The operators are mostly the same as in other programming languages such as C.
Arithmetic: + - * / %
[jimo@jimo-pc shell]$ awk 'BEGIN { a = 50; b = 20; print "(a + b) = ", (a + b) }'
Increment and decrement: ++, --
[jimo@jimo-pc shell]$ awk 'BEGIN { a = 10; b = ++a; printf "a = %d, b = %d\n", a, b }'
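The position of ++ matters when its value is used. A sketch comparing the prefix and postfix forms:

```shell
# Prefix ++ increments first and yields the new value
awk 'BEGIN { a = 10; b = ++a; printf "prefix: a = %d, b = %d\n", a, b }'

# Postfix ++ yields the old value and increments afterwards
awk 'BEGIN { a = 10; b = a++; printf "postfix: a = %d, b = %d\n", a, b }'
```

This prints "prefix: a = 11, b = 11" and "postfix: a = 11, b = 10".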
Assignment shorthands (+=, -=, *=, /=, %=, ^=); **= is a gawk extension, POSIX awk uses ^= instead:
[jimo@jimo-pc shell]$ awk 'BEGIN { cnt = 2; cnt **= 4; print "Counter =", cnt }'
Comparison operators: >, <, ==, !=, >=, <=
Logical operators: &&, ||, !
Ternary operator: condition ? expression1 : expression2
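A sketch that puts these operators to work on three lines from marks.txt, saved to a hypothetical scores.txt; the thresholds 80, 88 and 90 are arbitrary:

```shell
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' > scores.txt

# Ternary: label each student by comparing the mark in field 4
awk '{ print $2, ($4 >= 88 ? "high" : "normal") }' scores.txt

# && in a pattern: marks strictly between 80 and 90
awk '$4 > 80 && $4 < 90 { print $2 }' scores.txt
```

The first command prints "Amit normal", "Rahul high", "Shyam normal"; the second prints "Shyam".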
Unary operators:
[jimo@jimo-pc shell]$ awk 'BEGIN { a = -10; a = +a; print "a =", a }'
Exponentiation:
[jimo@jimo-pc shell]$ awk 'BEGIN { a = 10; a = a ^ 2; print "a =", a }'
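String concatenation works a little differently: there is no operator, you simply write the values next to each other, separated by a space.
[jimo@jimo-pc shell]$ awk 'BEGIN { str1 = "Hello,"; str2 = "World"; str3 = str1 str2; print str3 }'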
The in operator:
[jimo@jimo-pc shell]$ awk 'BEGIN {
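A minimal, self-contained sketch of both uses of in: testing whether a key exists, and looping over all keys (arr is an illustrative name):

```shell
awk 'BEGIN {
    arr["x"] = 1; arr["y"] = 2
    # (key in array) tests membership without creating the element
    if ("x" in arr) print "x present"
    if (!("z" in arr)) print "z absent"
    # for (key in array) visits every key; the order is unspecified
    n = 0
    for (k in arr) n++
    print n, "keys"
}'
```

This prints "x present", "z absent" and "2 keys".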
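Regex operators: these are not about writing regular expressions themselves, but about matching (~) and not matching (!~).
Lines containing 9: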
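A sketch of both operators on the marks.txt data, recreated here with single spaces between fields (an assumption about the exact spacing):

```shell
printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
  '4) Kedar English 85' '5) Hari History 89' > marks.txt

# ~ : lines containing 9
awk '$0 ~ /9/' marks.txt

# !~ : lines whose name field ($2) does not start with a capital A
awk '$2 !~ /^A/' marks.txt
```

The first command prints the Rahul and Hari lines; the second prints every line except Amit's.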
The awk split function
Split a string by a separator, then print one of the pieces:
# echo 1+2+3 | awk '{split($0,a,"+");print a[2]}'
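split also returns the number of pieces, and a separator longer than one character is treated as a regular expression. A sketch (parts is an illustrative name):

```shell
# split() returns how many pieces it produced
echo '1+2+3' | awk '{ n = split($0, parts, "+"); print n, parts[1], parts[3] }'

# A multi-character separator is a regex: here, one or more digits
echo 'a1b22c' | awk '{ n = split($0, parts, "[0-9]+"); print n, parts[2] }'
```

This prints "3 1 3" and then "3 b".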