Scraping content from the web is getting easier and easier. Sometimes for the data to be useful you need to break the data up into multiple lines. Today’s post explores how you can split column data by Carriage Return to separate the cell contents into different rows.
Recently I tried retrieving the share prices of an investment fund from a website. Power Query imported all the share prices neatly. Yet all the data resides in a single cell, separated by carriage returns.
The share prices are available on a public website. To get them into Power Query
- Open Power Query -> Select new source -> from Web
A pop up will open that allows you to fill in a website.
- Fill in http://www.behr.nl/Beurs/Slotkoersen/slotkoersen.php?fd=nn.energ.f
- Press OK
You end up at below screen.
- Select the Table called “Document” -> press OK
Power Query now imports the above line. By itself the line is not very useful. Yet when you look closer, you see that the column ‘Data’ includes a Table object. You can use this Table Object to navigate to the part of the table containing the share prices. You can find more information on Table Objects here.
It may take some trial and error to find the right table. Perform the following steps:
- Click the word Table in the column ‘Data’
- Click the word Table in the column ‘Children’
- Click the word Table in the column ‘Children’ in the Row containing the name Header (row 2)
- Keep clicking the word Table in the most upper row in the column called ‘Children’
The result is the table below.
Split Column by Carriage Return
At first sight, the results look pleasing. The Column named Text contains all the share prices we requested. Yet all the data resides in a single row. So let’s try to split the cell contents into rows.
- In the ribbon select Split Column -> By Delimiter
Below menu appears.
- As delimiter, select –Custom–
- Select Each occurrence of the delimiter
- Split into Rows
- Select Split into special characters
- From the drop down menu make sure to select Carriage Return
As result the data is now split into rows.
As the earlier drop down showed, you split your cell contents by other special characters. Below you find the list of special characters followed by their M-code equivalents.
- Tab: #(tab)
- Carriage Return: #(cr)
- Line Feed: #(lf)
- Carriage Return and Line Feed: #(cr)#(lf)
- Non-Breaking Space: #(00A0)
Most people are aware they can split a cell into columns. But many don’t know you can split cells into rows as well. This post showed how you can split cell contents into rows.
I hope you liked this post. If you did, I would love it if you shared it with your network. And please leave a comment below with questions or suggestions.