Ade Malsasa Akbar
Senior author, Open Source enthusiast.
Thursday, September 24, 2015 at 11:41

This article is a simple introduction to regular expressions (regex), shown through the search & replace feature of a text editor. I use Kate here, but the same approach applies to other text editors such as Gedit. I will show you a few short examples to keep it easy. It is not comprehensive, but I hope it gives you a general overview of regex in Linux text editors.


My Text



This is the text being edited. It is basically just a table of contents I created in LibreOffice. It contains many lines, and each line consists of beginning numbers, then a subtitle, then dots, then a page number. We will "play" with these parts, using this text as the example. You can download this text file from http://pastebin.com/Z6SpwWyn.


Enable Regex Searching


In Kate, press Ctrl+R to open the Search & Replace facility. Then select Regular expression from the Mode selection.
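If you want to try these searches outside the editor, a script works too. The sketch below uses Python's `re` module as a stand-in for Kate's own regex engine (an assumption; the simple patterns in this article should behave the same in both). One difference matters right away: an editor applies `^` and `$` to every line, while Python applies them to the whole string unless you pass `re.MULTILINE`. The sample lines are made up to imitate the table of contents described above; they are not the contents of the pastebin file.

```python
import re

# Hypothetical sample lines imitating the table of contents (not the real file).
SAMPLE = (
    "1.1. Background ........ 1\n"
    "1.2. Problem Statement ........ 2\n"
    "1.10. Outline ........ 12\n"
)

# Without MULTILINE, $ only matches at the end of the whole string
# (or just before a final newline), so only the last page number is found.
whole_string = re.findall(r"[0-9]+$", SAMPLE)

# With MULTILINE, $ matches at the end of every line, like in a text editor.
per_line = re.findall(r"[0-9]+$", SAMPLE, flags=re.MULTILINE)

print(whole_string)  # ['12']
print(per_line)      # ['1', '2', '12']
```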



Find The Page Numbers


[0-9]*$

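Because every page number consists of digits, we use the [0-9] regex, which matches any single digit. Because a page number may have one, two, or three digits, we add the * regex, which repeats the previous item any number of times. Note that * also allows zero repetitions; [0-9]+ would require at least one digit. Because the page number is always at the end of the line, we add the $ regex, which matches the end of a line. Finally, we put \s (any whitespace character, such as a space or a tab) in front, so the match must begin with the whitespace right before the page number. So, the complete regex will be \s[0-9]*$. See picture below.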
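A quick way to check the page-number regex, again using Python's `re` as a stand-in for the editor and made-up sample lines (the last one deliberately has no page number). The helper `page_number` is a hypothetical name used only for this sketch. It tests each line separately, as the editor does, and compares the article's pattern with `[0-9]*$` and `[0-9]+$`.

```python
import re

# Hypothetical sample lines; the last one has no page number.
LINES = [
    "1.1. Background ........ 1",
    "1.10. Outline ........ 12",
    "Appendix",
]

def page_number(line, pattern=r"\s[0-9]*$"):
    """Return the text the pattern matches in one line, or None if nothing matches."""
    m = re.search(pattern, line)
    return m.group() if m else None

# The article's pattern: the match includes the whitespace before the number.
article = [page_number(line) for line in LINES]
# Without \s, * happily matches zero digits, so "Appendix" gives an empty match.
star_only = [page_number(line, r"[0-9]*$") for line in LINES]
# With +, at least one digit is required, so "Appendix" does not match at all.
plus = [page_number(line, r"[0-9]+$") for line in LINES]

print(article)    # [' 1', ' 12', None]
print(star_only)  # ['1', '12', '']
print(plus)       # ['1', '12', None]
```

Because \s is part of the match, replacing with \s[0-9]*$ in the editor also removes the space before the page number.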



Find The Dots


\.\.\.*

The dots in the middle of the lines do not come in a fixed number: there are many of them, and each line has a different count. But every line has at least three consecutive dots, so we use the \.\.\. regex and add the * regex at the end. We use several dots to distinguish them from the single dots in the beginning numbers. A single unescaped dot ( . ) has a special meaning in regex: it matches any character, so we don't use it. Instead we use a backslash followed by a dot ( \. ) as an escape, which matches a literal dot in the text. Note that * repeats only the item right before it, so \.\.\.* actually matches two or more consecutive dots; that is still enough here, because the beginning numbers never contain two dots in a row. See picture below.
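The sketch below checks the dot regex with Python's `re` (a stand-in for the editor) on two hypothetical lines. It shows that `\.\.\.*` finds the dot leader but skips the single dots in `1.10.`, that it also accepts just two dots, and what an unescaped `.` does. The `{3,}` quantifier, which Python's `re` supports, is one way to require three or more dots.

```python
import re

LINE = "1.10. Outline ........ 12"   # hypothetical TOC line
SHORT = "Note .. see below"          # hypothetical line with only two dots

# The article's pattern: \. \. \.* (the * repeats only the last \.)
leader = re.findall(r"\.\.\.*", LINE)
# The same pattern also matches a run of just two dots.
two = re.findall(r"\.\.\.*", SHORT)
# {3,} means "three or more times", so two dots are not enough.
three_plus = re.findall(r"\.{3,}", SHORT)
# Unescaped, . matches any character, so digits get matched too.
wildcard = re.findall(r"...", "1.1.")

print(leader)      # ['........']
print(two)         # ['..']
print(three_plus)  # []
print(wildcard)    # ['1.1']
```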



Find The Beginning Numbers


^[0-9]\.[0-9]?[0-9]\.

The beginning numbers in each line have the format X.X. (note the trailing dot) or X.XX. (note the trailing dot too). So, the regex starts with [0-9] for the first single digit. Then we add \. to match the real dot in the middle. Then we add [0-9][0-9], but with the ? regex between them, giving [0-9]?[0-9]. The question mark ( ? ) makes the previous item optional, so this part matches either one digit (1) or two digits (11). We add \. again at the end to match the trailing dot noted above. Finally, we add the ^ regex at the front so the match must start exactly at the beginning of the line. This gives the final regex ^[0-9]\.[0-9]?[0-9]\. shown at the top of this section. See picture below.
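Since the article is about search & replace, here is a sketch that finds the beginning numbers and then replaces them with nothing, similar to leaving the Replace field empty in the editor. It uses Python's `re` as a stand-in and hypothetical sample text. The third line, `10.1.`, is deliberately outside the X.X. / X.XX. format, so the regex leaves it alone.

```python
import re

# Hypothetical TOC text; "10.1." does not follow the X.X. / X.XX. format.
TEXT = (
    "1.1. Background ........ 1\n"
    "1.10. Outline ........ 12\n"
    "10.1. Too Long ........ 30\n"
)

# MULTILINE makes ^ match at the start of every line, as in the editor.
NUMBERS = re.compile(r"^[0-9]\.[0-9]?[0-9]\.", flags=re.MULTILINE)

found = NUMBERS.findall(TEXT)
# Replace each match with an empty string; the space after it remains.
stripped = NUMBERS.sub("", TEXT)

print(found)     # ['1.1.', '1.10.']
print(stripped)
```

Adding \s to the end of the regex would also remove the space that follows the numbers.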



Reference