Ade Malsasa Akbar
Senior author, Open Source enthusiast.
Thursday, July 28, 2016 at 11:09


This episode introduces some basic regex and POSIX character class examples. We include a short table of POSIX character classes so you can practice them more easily in this episode. Both will make your GNU sed work much easier. This is the sixth episode, so if you don't want to miss anything, we recommend reading the first through fifth episodes first. Happy practicing!


POSIX Character Classes


Regular expressions are standardized by POSIX. POSIX also defines character classes: fixed names for particular sets of characters. Character classes let you avoid confusing backslashes and long lists of characters, and they are easier to read. A class name is only valid inside a bracket expression, which is why the commands in this episode write `[[:upper:]]` rather than `[:upper:]`. These are some of the POSIX character classes:

`[:alpha:]` = uppercase and lowercase letters, a-z and A-Z. Same as [[:upper:][:lower:]] or [a-zA-Z].
`[:alnum:]` = digits 0-9 plus uppercase and lowercase letters. Same as [[:alpha:][:digit:]] or [a-zA-Z0-9].
`[:digit:]` = digits 0-9. Same as [0-9].
`[:punct:]` = punctuation such as ,.:;!?-
`[:upper:]` = uppercase letters. Same as [A-Z].
`[:lower:]` = lowercase letters. Same as [a-z].
`[:space:]` = whitespace characters.
`[:blank:]` = space and tab only.
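
To see how a class from this table is used in a command, here is a minimal sketch. The sample string is made up for illustration and is not part of the article's text files. It also shows that, for plain ASCII input like this, `[[:digit:]]` and `[0-9]` give the same result.

```shell
# Sample input invented for this illustration.
line='Room 42, Floor 7'

# [[:digit:]] is a bracket expression that contains the digit class.
digits_masked=$(printf '%s\n' "$line" | sed 's/[[:digit:]]/#/g')
echo "$digits_masked"   # Room ##, Floor #

# The explicit range matches the same characters for this ASCII input.
range_masked=$(printf '%s\n' "$line" | sed 's/[0-9]/#/g')
echo "$range_masked"    # Room ##, Floor #
```

The outer brackets form the bracket expression and the inner `[:digit:]` names the class, so both pairs are needed.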

Text Examples


text14.txt:
text15.txt:

51. Print Only Lines Between Two Patterns


Command Examples:

  1. sed -n '/gnu/,/dunix/p' text15.txt
  2. sed -n '/dunix/,/sunos/Ip' text15.txt

Output Examples:

(1)
master@master:/tmp$ sed -n '/gnu/,/dunix/p' text15.txt

gnu ,.

ULTRIX : ;

1 xenix

2 minix

3 dunix

master@master:/tmp$


(2)

master@master:/tmp$ sed -n '/dunix/,/sunos/Ip' text15.txt

3 dunix

4 5 6 7 8



this is tab

yes, this is tab



HP-UX ; OS X ; SUNOS

master@master:/tmp$

Explanation:

The first command prints only the lines from the first line containing the string `gnu` through the next line containing `dunix`. This is two-address (range) addressing for the ‘p’ command, with a regex as each address. The `-n` option suppresses sed's automatic printing, so only the lines printed by ‘p’ appear.

The second command does the same for the lines from the one containing “dunix” through the one containing “sunos”. It uses the ‘I’ (case-insensitive) modifier again, so the lowercase `sunos` matches the uppercase “SUNOS”, much as a Google search ignores case.
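
If you don't have text15.txt at hand, the following sketch reproduces the same idea on a small sample. The file contents are invented for this illustration, and the ‘I’ modifier assumes GNU sed.

```shell
# Hypothetical sample file, not the article's text15.txt.
sample=$(mktemp)
printf '%s\n' 'unix' 'gnu' 'ULTRIX' 'dunix' 'minix' 'SUNOS' 'bsd' > "$sample"

# Print from the first line matching "gnu" through the next line matching "dunix".
first_range=$(sed -n '/gnu/,/dunix/p' "$sample")
echo "$first_range"

# The I modifier (a GNU extension) lets /sunos/ match "SUNOS".
second_range=$(sed -n '/dunix/,/sunos/Ip' "$sample")
echo "$second_range"

rm -f "$sample"
```

The first command prints `gnu`, `ULTRIX`, and `dunix`. The second prints `dunix`, `minix`, and `SUNOS`.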

52. Delete Only Comment Lines Between Address Range


Command Examples:

  1. sed '10,19{/\/\*/,/\*\//d}' text14.txt
  2. sed '10,19{/\/\//d}' text14.txt

Output Examples:

(1)

master@master:/tmp$ sed '10,19{/\/\*/,/\*\//d}' text14.txt

/* this is a free software

this software is licensed as GNU General Public License v2

*/



int main()

{

printf("hello\n");





// this is another style of comment line to be deleted



printf("new hello\n");

return 0;

}

master@master:/tmp$



(2)

master@master:/tmp$ sed '10,19{/\/\//d}' text14.txt

/* this is a free software

this software is licensed as GNU General Public License v2

*/



int main()

{

printf("hello\n");



/*

this is a new block of comment lines

to be deleted by sed

with the address range and delete commands

*/





printf("new hello\n");

return 0;

}

master@master:/tmp$

Explanation:

This example is basically the same as examples 41, 42, and 43 in Episode 5, refined to act only on a specific line range. Notice the pair of braces (`{}`) that groups the inner address and the ‘d’ command, and the `[begin_number],[end_number]` address range in front of it. This kind of addressing is extremely useful. The general syntax is

‘[begin_number],[end_number]{[sed_command]}’

One benefit of this line addressing is that you can apply a complex ‘d’ command to a specific range of lines only.

The first command deletes slash-asterisk style comments only within lines 10 to 19. That’s why the slash-asterisk comment in lines 1 to 5 is not deleted.

The second command works on the same range of lines (10 to 19) but deletes double-slash style comments instead.
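
The line-range-plus-braces pattern can be tried on a much smaller input. This is a sketch only: the six-line sample and the range 2 to 4 are invented for the illustration.

```shell
# Hypothetical six-line C-like sample, invented for this sketch.
src=$(mktemp)
printf '%s\n' \
  '// header comment' \
  'int a;' \
  '// inner comment' \
  'int b;' \
  '// trailing comment' \
  'int c;' > "$src"

# Delete "//" comment lines, but only when they fall within lines 2 to 4.
trimmed=$(sed '2,4{/\/\//d}' "$src")
echo "$trimmed"

rm -f "$src"
```

Only `// inner comment` (line 3) is removed. The comments on lines 1 and 5 are outside the range and stay.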


53. Delete Only Comment Lines Between Two Patterns


Command Examples:

  1. sed '/printf/,/printf/{/\/\*/,/\*\//d}' text14.txt
  2. sed '/^/,/int main()/{/\/\*/,/\*\//d}' text14.txt

Output Examples:

(1)

master@master:/tmp$ sed '/printf/,/printf/{/\/\*/,/\*\//d}' text14.txt

/* this is a free software

this software is licensed as GNU General Public License v2

*/



int main()

{

printf("hello\n");





// this is another style of comment line to be deleted



printf("new hello\n");

return 0;

}

master@master:/tmp$





(2)

master@master:/tmp$ sed '/^/,/int main()/{/\/\*/,/\*\//d}' text14.txt



int main()

{

printf("hello\n");





// this is another style of comment line to be deleted



printf("new hello\n");

return 0;

}

master@master:/tmp$

Explanation:

This demonstrates a regex address range. It is basically the same as example 52, except that the range is given by two regexes instead of two line numbers. We write //,// before the {} block so that the command inside {} runs only on lines within the range those two regexes select.

The first command uses the range from the first “printf” line through the next “printf” line, so only the slash-asterisk comment block between the two printf lines is deleted. The comment block at the top of the file stays.

The second command starts its range at `/^/`, which matches the very first line, and ends it at the line containing “int main()”, so the first comment block is deleted. Note, however, that `/^/` matches every line: as soon as the range closes at “int main()”, the next line opens a new range that runs to the end of the file. That is why the second comment block below “int main()” is missing from the output too. To stop at “int main()” for good, use a line number as the first address, as in `1,/int main()/`.
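
One detail of regex ranges is worth checking with a small experiment: a regex first address can match again after the range has closed, while a numeric first address cannot. The sketch below uses an invented eight-line sample, not text14.txt.

```shell
# Hypothetical sample, invented for this sketch (not text14.txt).
demo=$(mktemp)
printf '%s\n' \
  '/* license' \
  '*/' \
  'int main()' \
  '{' \
  '/* body note' \
  '*/' \
  'return 0;' \
  '}' > "$demo"

# /^/ matches every line, so a new range opens right after "int main()"
# closes the first one; both comment blocks end up deleted.
regex_start=$(sed '/^/,/int main()/{/\/\*/,/\*\//d}' "$demo")
echo "$regex_start"

# A numeric first address opens the range only once, at line 1,
# so the comment block after "int main()" survives.
line_start=$(sed '1,/int main()/{/\/\*/,/\*\//d}' "$demo")
echo "$line_start"

rm -f "$demo"
```

With `/^/` as the first address, no comment lines remain. With `1` as the first address, only the top comment block is removed.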

54. Edit Only Matched Lines Between Address Range


Command Example:

sed '4,7{s/x/[X]/Ig}' text15.txt

Output Example:

master@master:/tmp$ sed '4,7{s/x/[X]/Ig}' text15.txt

unix ?

bsd !

gnu ,.

ULTRI[X] : ;

1 [X]eni[X]

2 mini[X]

3 duni[X]

4 5 6 7 8



this is tab

yes, this is tab



HP-UX ; OS X ; SUNOS

master@master:/tmp$

Explanation:

This works like examples 51, 52, and 53, but with the ‘s’ command. The substitution (‘s’) is applied only to the given address range (lines 4 to 7). Because of the ‘I’ flag, both “x” and “X” match inside the range, so “ULTRIX” on line 4 changes as well. Outside the range, the “x” in “unix” and the “X” in “HP-UX” and “OS X” are left alone. Only the selected lines have changed.
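
Here is a small sketch of a substitution limited to a line range. The five sample lines and the range 2 to 3 are invented for the illustration, and the ‘I’ flag assumes GNU sed.

```shell
# Hypothetical sample lines, invented for this sketch.
names=$(mktemp)
printf '%s\n' 'unix' 'ULTRIX' 'xenix' 'minix' 'OS X' > "$names"

# Replace every x or X (I = case-insensitive, g = all matches), lines 2 to 3 only.
edited=$(sed '2,3{s/x/[X]/Ig}' "$names")
echo "$edited"

rm -f "$names"
```

Lines 2 and 3 become `ULTRI[X]` and `[X]eni[X]`. The lines outside the range (`unix`, `minix`, `OS X`) are printed unchanged, even though they contain an x.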

55. Edit All Uppercase Characters (POSIX Character Class)


Command Example:

sed 's/[[:upper:]]/[X]/g' text15.txt

Output Example:

master@master:/tmp$ sed 's/[[:upper:]]/[X]/g' text15.txt

unix ?

bsd !

gnu ,.

[X][X][X][X][X][X] : ;

1 xenix

2 minix

3 dunix

4 5 6 7 8


this is tab

yes, this is tab



[X][X]-[X][X] ; [X][X] [X] ; [X][X][X][X][X]
master@master:/tmp$

Explanation:

This command uses the `[:upper:]` character class, so it replaces only the uppercase letters throughout the text.

56. Edit All Lowercase Characters (POSIX Character Class)


Command Example:

sed 's/[[:lower:]]/[X]/g' text15.txt

Output Example:

master@master:/tmp$ sed 's/[[:lower:]]/[X]/g' text15.txt

[X][X][X][X] ?

[X][X][X] !

[X][X][X] ,.

ULTRIX : ;

1 [X][X][X][X][X]

2 [X][X][X][X][X]

3 [X][X][X][X][X]

4 5 6 7 8


[X][X][X][X] [X][X] [X][X][X]

[X][X][X], [X][X][X][X] [X][X] [X][X][X]



HP-UX ; OS X ; SUNOS

master@master:/tmp$

Explanation:

This command is the counterpart of example 55. It replaces only the lowercase letters throughout the text.

57. Edit All Punctuation Characters (POSIX Character Class)


Command Example: 

sed 's/[[:punct:]]/[X]/g' text15.txt

Output Example: 

master@master:/tmp$ sed 's/[[:punct:]]/[X]/g' text15.txt

unix [X]

bsd [X]

gnu [X][X]

ULTRIX [X] [X]

1 xenix

2 minix

3 dunix

4 5 6 7 8


this is tab

yes[X] this is tab



HP[X]UX [X] OS X [X] SUNOS

master@master:/tmp$ 

Explanation: 

This example uses the `[:punct:]` character class, so it replaces every punctuation character in the text. Because this is a substitution, the output shows `[X]` in place of every comma, dot, colon, semicolon, dash, exclamation mark, and question mark.
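
`[:punct:]` covers more than sentence punctuation: symbols such as the underscore and the at sign belong to it as well. The following sketch, with a sample string made up for the illustration, deletes them instead of replacing them.

```shell
# Sample input invented for this illustration.
text='user_name@host: done!'

# An empty replacement removes every punctuation character.
stripped=$(printf '%s\n' "$text" | sed 's/[[:punct:]]//g')
echo "$stripped"   # usernamehost done
```

The space is kept because it belongs to `[:space:]` and `[:blank:]`, not to `[:punct:]`.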


58. Edit All Uppercase & Lowercase Characters (POSIX Character Class)


Command Example:

sed 's/[[:alpha:]]/[X]/g' text15.txt

Output Example:

master@master:/tmp$ sed 's/[[:alpha:]]/[X]/g' text15.txt
[X][X][X][X] ?
[X][X][X] !
[X][X][X] ,.
[X][X][X][X][X][X] : ;
1 [X][X][X][X][X]
2 [X][X][X][X][X]
3 [X][X][X][X][X]
4 5 6 7 8
[X][X][X][X] [X][X] [X][X][X]
[X][X][X], [X][X][X][X] [X][X] [X][X][X]

[X][X]-[X][X] ; [X][X] [X] ; [X][X][X][X][X]
master@master:/tmp$

Explanation:
This command uses the `[:alpha:]` character class, which matches both uppercase and lowercase letters. The characters left untouched are digits, punctuation, and whitespace.

59. Edit All Numeric Characters (POSIX Character Class)


Command Example:

sed 's/[[:digit:]]/[X]/g' text15.txt

Output Example:

master@master:/tmp$ sed 's/[[:digit:]]/[X]/g' text15.txt
unix ?
bsd !
gnu ,.
ULTRIX : ;
[X] xenix
[X] minix
[X] dunix
[X] [X] [X] [X] [X]
this is tab
yes, this is tab

HP-UX ; OS X ; SUNOS
master@master:/tmp$

Explanation:

This example uses the `[:digit:]` character class. It replaces every digit in the text, one digit at a time.
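
Since `[[:digit:]]` matches a single digit, a multi-digit number produces one replacement per digit. A minimal sketch of the difference, using a sample string invented for the illustration:

```shell
# Sample input invented for this illustration.
line='port 8080 and port 22'

# One replacement per digit.
per_digit=$(printf '%s\n' "$line" | sed 's/[[:digit:]]/N/g')
echo "$per_digit"    # port NNNN and port NN

# One digit followed by zero or more digits: one replacement per whole number.
per_number=$(printf '%s\n' "$line" | sed 's/[[:digit:]][[:digit:]]*/N/g')
echo "$per_number"   # port N and port N
```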

60. Edit All Space & Tab Characters (POSIX Character Class)


Command Example:

sed 's/[[:blank:]]/[X]/g' text15.txt

Output Example:

master@master:/tmp$ sed 's/[[:blank:]]/[X]/g' text15.txt
unix[X]?
bsd[X]!
gnu[X],.
ULTRIX[X]:[X];
1[X]xenix
2[X]minix
3[X]dunix
4[X]5[X]6[X]7[X]8
[X]
[X]this[X]is[X]tab
[X]yes,[X]this[X]is[X]tab

HP-UX[X];[X]OS[X]X[X];[X]SUNOS
master@master:/tmp$

Explanation:

This example uses the `[:blank:]` character class, so it replaces only spaces and tabs. Every space and every tab has been replaced with `[X]`, including the tabs at the start of the “this is tab” lines.
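
To see that `[:blank:]` covers tabs as well as spaces, compare it with a pattern that contains only a literal space. This is a sketch on a sample line invented for the illustration.

```shell
# Sample line invented for this sketch: a tab after "a", a space after "b".
line=$(printf 'a\tb c')

# [[:blank:]] matches both the tab and the space.
blanks=$(printf '%s\n' "$line" | sed 's/[[:blank:]]/_/g')
echo "$blanks"        # a_b_c

# A literal space in the pattern leaves the tab alone.
spaces_only=$(printf '%s\n' "$line" | sed 's/ /_/g')
echo "$spaces_only"
```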

61. Combine Multiple POSIX Character Classes


Command Examples:

  1. sed 's/[[:upper:][:digit:]]/[X]/g' text15.txt
  2. sed 's/[[:lower:][:punct:]]/[X]/g' text15.txt

Output Examples:

(1)
master@master:/tmp$ sed 's/[[:upper:][:digit:]]/[X]/g' text15.txt
unix ?
bsd !
gnu ,.
[X][X][X][X][X][X] : ;
[X] xenix
[X] minix
[X] dunix
[X] [X] [X] [X] [X]
this is tab
yes, this is tab

[X][X]-[X][X] ; [X][X] [X] ; [X][X][X][X][X]
master@master:/tmp$

(2)
master@master:/tmp$ sed 's/[[:lower:][:punct:]]/[X]/g' text15.txt
[X][X][X][X] [X]
[X][X][X] [X]
[X][X][X] [X][X]
ULTRIX [X] [X]
1 [X][X][X][X][X]
2 [X][X][X][X][X]
3 [X][X][X][X][X]
4 5 6 7 8
[X][X][X][X] [X][X] [X][X][X]
[X][X][X][X] [X][X][X][X] [X][X] [X][X][X]

HP[X]UX [X] OS X [X] SUNOS
master@master:/tmp$

Explanation:

This example demonstrates how to use more than one character class at once. Put all of them inside a single pair of square brackets (`[]`). The combination `[[:upper:][:digit:]]` matches any character that is an uppercase letter or a digit. The combination `[[:lower:][:punct:]]` matches any character that is a lowercase letter or a punctuation character.
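
The same two combinations can be tried on a single line. This is only a sketch: the sample string is invented for the illustration, and `#` is used as the replacement to keep the output short.

```shell
# Sample input invented for this illustration.
line='Build 7 on HP-UX'

# Uppercase letters or digits.
upper_digit=$(printf '%s\n' "$line" | sed 's/[[:upper:][:digit:]]/#/g')
echo "$upper_digit"   # #uild # on ##-##

# Lowercase letters or punctuation.
lower_punct=$(printf '%s\n' "$line" | sed 's/[[:lower:][:punct:]]/#/g')
echo "$lower_punct"   # B#### 7 ## HP#UX
```

Each bracket expression matches one character at a time, so the classes inside it act as alternatives rather than as a sequence.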