[Solved] Help With Regular Expressions/Scraping

 
Greenhorn
Posts: 23
I'm trying to extract the text inside the <td> tags (such as "West Orange, NJ", "Saint Barnabas Health Care System", and "Manager Field Services North") and the contents of the href attribute from data scraped by a PHP script. The script itself works; I just don't know how to formulate the expressions.
Here's a sample of HTML that contains the job info:




This is what I've tried: $location= '/location (.+?)/'; but it just gives back: array(2) { [0]=> string(10) "location j" [1]=> string(1) "j" } j


Here's the scraper too in case you need to see that: curl_scraper.php
Thanks.
 
Brandon Golway
Greenhorn
Posts: 23
I got it to output "West Orange, NJ" using this expression: $regex_location= '/<td class=\"location\">(.+?)<\/td>/';

There's more data in there, since I get array(2) { [0]=> string(41) "<td class="location">West Orange, NJ</td>" [1]=> string(15) "West Orange, NJ" } when I do var_dump($scraped_location_data), but I don't know how to access it.
 
Ranch Hand
Posts: 733
>There's more data in there, since I get array(2) { [0]=> string(41) "<td class="location">West Orange, NJ</td>" [1]=> string(15) "West Orange, NJ" } when I do var_dump($scraped_location_data), but I don't know how to access it.
That is the usual return structure of the matches argument. Element [0] holds the text matched by the whole pattern, and element [1] holds the text captured by the one pair of round brackets (group/backreference) in the pattern, in this case the "(.+?)" part. Accessing it is as simple as reading index 1 of the matches array, unless you have something more sophisticated in mind.
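For anyone following along, here is a minimal sketch of reading the captured group. The original HTML sample did not survive in this thread, so the `$html` string below is a hypothetical one-row stand-in; only the `location` class name and the pattern come from the posts above.

```php
<?php
// Hypothetical sample row (assumption); the thread's real HTML is not shown.
$html = '<tr><td class="location">West Orange, NJ</td></tr>';

// Pattern as posted above.
$regex_location = '/<td class=\"location\">(.+?)<\/td>/';

// preg_match fills $matches and returns 1 on a match, 0 on no match.
$found = preg_match($regex_location, $html, $matches);

// $matches[0] is the whole matched text, tags included;
// $matches[1] is only what the (.+?) group captured.
$full_match = $found ? $matches[0] : null;
$location   = $found ? $matches[1] : null;

echo $location, PHP_EOL; // prints "West Orange, NJ"
```

The full match is 41 characters long because it still includes the opening and closing <td> tags, which is why var_dump reports string(41) for element [0] and string(15) for element [1].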
 
Brandon Golway
Greenhorn
Posts: 23
Got it to work. All I was missing was the _all on preg_match, i.e. I needed preg_match_all instead of preg_match.
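A rough sketch of what that change looks like, assuming a page with more than one location cell. The two-row `$html` sample and the second town name are made up for illustration; the pattern is the one posted earlier in the thread.

```php
<?php
// Hypothetical two-row sample (assumption); only the "location" class
// name and the pattern are taken from the thread.
$html = <<<HTML
<tr><td class="location">West Orange, NJ</td></tr>
<tr><td class="location">Livingston, NJ</td></tr>
HTML;

$regex_location = '/<td class=\"location\">(.+?)<\/td>/';

// preg_match stops after the first hit; preg_match_all collects every hit
// and returns the number of full matches.
$count = preg_match_all($regex_location, $html, $matches);

// With the default PREG_PATTERN_ORDER flag, $matches[1] is the list of
// texts captured by the first group, one entry per match.
$locations = $matches[1];

foreach ($locations as $location) {
    echo $location, PHP_EOL;
}
```

The same approach should carry over to the other cells and the href attribute by swapping in a pattern for each, though for anything beyond simple, regular markup an HTML parser such as DOMDocument tends to be more robust than regular expressions.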

 