Ruby Scraping
- Christina Williams
- Jul 28, 2020
One thing I would remind anyone who is scraping is not to forget about the arrays in some of the HTML. I got stuck at one point: I could not figure out what I was scraping, and my output included extra information that was on the same line as the info I wanted to capture. Then I realized that if I broke the matches down and numbered each item (its position in the array), I could capture only what I needed.
For example:
acceptance_rate = school_xml.css("div.search-result-fact")[0].text # acceptance rate
cost = school_xml.css("div.search-result-fact")[1].text # yearly cost
The first line above scrapes the acceptance rate for each of the top 25 colleges for theatre arts, and the second line scrapes the yearly cost. These items appeared next to each other on the same line, and the HTML for the two looks identical (both are div elements with the class search-result-fact). That is why the index matters: css returns every element that matches the selector as an array-like collection (a Nokogiri NodeSet), and calling .text on the whole collection joins the text of all the matches together. If you do not include the index of the item you want (and don't forget to start with zero), your output will include everything on that line that matches the same HTML. At first, I was not adding the [0] and [1], and I was getting the acceptance rate and cost together on each line of output. After I put the indexes in, I got only the data I wanted on each line.
I used the Nokogiri gem for the scraping. To find what to scrape, I opened Google Chrome's developer tools (View > Developer > Developer Tools). I then highlighted the information I wanted to scrape, right-clicked, and chose Inspect. Finally, I copied the code that was left highlighted and placed it in my project code.

My final project is available to view on GitHub here:
https://github.com/ChristinaXT/Niche_TS


