Ade Malsasa Akbar contact
Senior author, Open Source enthusiast.
Thursday, May 31, 2018 at 23:40


If you have an XML sitemap file of a website, you can read it up on LibreOffice Calc easily. For example, you want to display the data for only the list of URLs. Before I tried it, I didn't realize how easy to do it. Here's how:




Preparation


Prepare a sitemap.xml of a website. For example, you can create a sitemap.xml of http://ubuntubuzz.com. In this tutorial, I use my Indonesia blog https://kursusteknoplasma.wordpress.com as example to extract from it a table of contents (list of URLs) only for the section Transcript Download. You can create it at no cost from online service like https://www.xml-sitemaps.com.

Do It


1) Open your LibreOffice Calc.

2) Go to main menu Data > XML Source.

3) A dialog appears > press Open Folder button there > open the sitemap.xml you prepared.

4) The dialog will show you the summary of the content of the sitemap.xml. In this example, select ns0:url > select first cell of your sheet > the box of Mapped cell: will automatically show the address of the cell > press Import button.

Importing the XML sitemap

5) The result is the data from sitemap.xml is added to your sheet. Finally, that is the table of contents (i.e. list of all URLs) of the website!

Final Result

Note: if you find your Data > XML Source is greyed (disabled), perhaps you are using LibreOffice 6.0, and to enable it just go to menu Tools > Options > Advanced > give check mark to Enable experimental features > OK > accept Restart LibreOffice.

Processing The Data


If you have the final result above, you have all, now you can process the data as you wish. For example, you can show only the URL of certain articles, or certain pages, or certain tags, simply by using Find & Replace. To demonstrate this, let's take the 'unduh-transkrip' sections only from the bunch of URLs available.

1) Press Ctrl+H > enter the keyword unduh-transkrip > press Find All > a new dialog appears showing selected cells > OK.

2) Now the targeted URLs are all selected > press Ctrl+C.

3) Create new sheet.

4) Press Ctrl+V.

5) Now you have a list of URLs under 'unduh-transkrip' keyword.

6) Select all URLs from the new sheet > menu Data > Sort Ascending > now all URLs sorted.

Processed data, sorted, finished

By this, for example, now we have all URLs (table of contents) of the website being processed. We can even process further to make these URLs all live (as hyperlinks) by moving them to LibreOffice Writer.

1) Select all cells of URLs from the sheet > press Ctrl+C.

2) Go to LibreOffice Writer > right-click > Paste Special > Unformatted Text > all URLs copied.

3) Select all URLs > menu Tools > AutoCorrect > Apply > now all URLs are live and clickable. Finish.

Finally, a table of contents of my own blog under 'transcript download' section




This article is licensed under CC BY-SA 3.0.