PHP Classes

Sorcerer: Scrape Web page content using regular expressions

Version: sorcerer 1.0.0
License: MIT/X Consortium ...
PHP version: 5
Categories: PHP 5, Web services
Description

This class can scrape Web page content using regular expressions.

It takes a given page URL and retrieves its contents.

Given a list of regular expressions, the class extracts the matching page content and can save the matches to a given file.
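
The behaviour described above can be approximated with PHP's built-in regex functions. The following is a minimal sketch, not the class's actual source: the function name `extract_matches` is hypothetical, and it operates on an HTML string so it can be tried without network access (in practice the string might come from something like `file_get_contents( $url )`).

```php
<?php
// Hypothetical helper, not part of Sorcerer: applies each regex to the
// page source and collects the full-pattern matches per regex.
function extract_matches($html, array $regexes)
{
    $results = [];
    foreach ($regexes as $regex) {
        $matches = [];
        // preg_match_all() returns false on failure (e.g. an invalid pattern).
        if (preg_match_all($regex, $html, $matches) === false) {
            $results[$regex] = [];
            continue;
        }
        // $matches[0] holds the text matched by the whole pattern.
        $results[$regex] = $matches[0];
    }
    return $results;
}

// Sample markup, used only for illustration.
$html  = '<p>Hello <b>world</b> and <b>PHP</b></p>';
$found = extract_matches($html, ['/<b>[^<]*<\/b>/i']);
print_r($found);
```

Keying the result by pattern keeps the matches for each regular expression separate; whether Sorcerer groups its output this way is an assumption.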

Name: Gavin Gordon Markowski
Country: Canada

Documentation

Sorcerer

Description

An easy-to-use PHP class for scraping webpages' source code.

Usage

Installation

	$ composer require gavinggordon/sorcerer

Examples

Instantiation

	include( 'vendor/autoload.php' );

	use GGG\Http\Data\Collection\Sorcerer as Sorcerer;
	
	$scraper = new Sorcerer();

Configuration

	$url = 'http://www.testurl.com/index.php';
	
	$regexes = [
		'/\<a\s?[^\>]+?\>(.+)\<\/a\>/i',
		'/\<img\s?([^\>]+?)[\s\/]*?\>/i'
	];
	
	$savefile = __DIR__ . '/testurl-scrapedata.txt';
	
	$scraper->configure( $url, $regexes, $savefile );
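
To see what the two patterns above capture, they can be run directly with `preg_match_all()`. This is a sketch using PHP's built-in functions rather than Sorcerer itself; the sample markup and variable names are made up for illustration.

```php
<?php
// The two patterns from the configuration example, copied verbatim.
$linkRegex = '/\<a\s?[^\>]+?\>(.+)\<\/a\>/i';
$imgRegex  = '/\<img\s?([^\>]+?)[\s\/]*?\>/i';

// Hypothetical sample markup, used only for illustration.
$html = '<a href="/about">About</a>' . "\n"
      . '<img src="logo.png" alt="Logo" />';

preg_match_all($linkRegex, $html, $links);
preg_match_all($imgRegex, $html, $images);

print_r($links[1]);  // link text: About
print_r($images[1]); // attribute string: src="logo.png" alt="Logo"

// Caveat: (.+) is greedy, so two links on one line are captured as one match.
$sameLine = '<a href="/a">A</a> <a href="/b">B</a>';
preg_match_all($linkRegex, $sameLine, $greedy);
print_r($greedy[1]); // a single element: A</a> <a href="/b">B
```

If links can share a line in the target page, a lazy quantifier such as `(.+?)` in the first pattern would capture each link's text separately.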

Run

If no filepath was set for "$savefile",...

	$data = $scraper->scrape();
	
	print_r( $data );

...the scraped data will be returned.

If a filepath was set for "$savefile",...

	$scraper->scrape();

...the scraped data will be saved to the file which you specified.
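
The two modes described above (return the data, or save it to a file) can be pictured with a small stand-alone sketch. The function `deliver_results` is hypothetical and is not Sorcerer's implementation; the `print_r` dump format and the `null` return value in save mode are assumptions.

```php
<?php
// Hypothetical helper illustrating the two modes described above;
// it is not Sorcerer's actual implementation.
function deliver_results(array $data, $savefile = null)
{
    if ($savefile === null) {
        // No file path configured: hand the data back to the caller.
        return $data;
    }
    // File path configured: write a readable dump (format is an assumption).
    if (file_put_contents($savefile, print_r($data, true)) === false) {
        throw new RuntimeException("Could not write to {$savefile}");
    }
    return null;
}

$data = ['links' => ['About', 'Contact']];

// Mode 1: the data is returned.
$returned = deliver_results($data);

// Mode 2: the data is written to a temporary file.
$tmp      = tempnam(sys_get_temp_dir(), 'scrape');
$saved    = deliver_results($data, $tmp);
$contents = file_get_contents($tmp);
unlink($tmp);
```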

Issues

If you have any issues at all, please post your findings in the issues page at https://github.com/gavinggordon/sorcerer/issues.

License

This package is released under the MIT License.


Files

.travis.yml (Data: auxiliary data)
composer.json (Data: auxiliary data)
LICENSE.txt (Doc.: documentation)
phpunit.xml (Data: auxiliary data)
README.md (Doc.: documentation)
src/Http/Data/Collection/Sorcerer.php (Class: class source)
