Email Crawler Script PHP MySQL

This script crawls a website recursively and collects the email addresses it finds. The class is very easy to call and start crawling with.

Note: I found this script somewhere on the internet and put it here for my own future reference. It may also be useful to people who want this type of script.

emailcrawler.php

<?php
/*
Written by: Aziz S. Hussain
Email: azizsaleh@gmail.com
Website: www.azizsaleh.com
Produced under GPL License
*/
/**
 * Email address scraper based on a URL.
 * Uses mysqli prepared statements (the old mysql_* functions were removed
 * in PHP 7) and a loop instead of calling itself once per page.
 */
class scraper
{
	// Database connection holding the emaillist, finishedurls and workingurls tables
	private $db;

	// URL of the page currently being scraped (must be absolute, e.g. http://example.com/)
	public $startURL;

	// Extensions of non-page files; links to these are skipped
	public $skippedExtensions = array('css','xml','rss','ico','js','gif','jpg','jpeg','png','bmp','wmv','avi','mp3','flash','swf');

	// Start path (scheme://host), used for links that are relative
	public $startPath;

	public function __construct(mysqli $db)
	{
		$this->db = $db;
	}

	// Set start path; without an argument it is extracted from the current URL
	public function setStartPath($path = NULL)
	{
		if ($path !== NULL) {
			$this->startPath = rtrim($path, '/');
		} else {
			$parts = parse_url((string) $this->startURL);
			$port  = isset($parts['port']) ? ':'.$parts['port'] : '';
			$this->startPath = $parts['scheme'].'://'.$parts['host'].$port;
		}
	}

	// Add the start URL
	public function startURL($theURL)
	{
		$this->startURL = $theURL;
	}

	// Function to get URL contents; returns '' if the request fails
	public function getContents($url)
	{
		$ch = curl_init(); // initialize curl handle
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_VERBOSE, 0);
		curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
		curl_setopt($ch, CURLOPT_AUTOREFERER, false);
		curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 7);
		curl_setopt($ch, CURLOPT_REFERER, (string) $this->startPath);
		curl_setopt($ch, CURLOPT_URL, $url);          // page to fetch
		curl_setopt($ch, CURLOPT_FAILONERROR, 1);     // treat HTTP errors (>= 400) as failures
		curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // allow redirects
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return into a variable
		curl_setopt($ch, CURLOPT_TIMEOUT, 50);        // times out after 50s
		curl_setopt($ch, CURLOPT_HTTPGET, 1);         // plain GET request
		$buffer = curl_exec($ch); // run the whole process
		curl_close($ch);
		return ($buffer === false) ? '' : $buffer;
	}

	// Scrape pages until no unscraped URLs are left
	public function startScraping()
	{
		// Prepared once and reused; INSERT IGNORE silently skips duplicates
		$addEmail  = $this->db->prepare("INSERT IGNORE INTO `emaillist` (`emailadd`) VALUES (?)");
		$markDone  = $this->db->prepare("INSERT IGNORE INTO `finishedurls` (`urlname`) VALUES (?)");
		// Queue a URL only if it has not been scraped already
		$addURL    = $this->db->prepare("INSERT IGNORE INTO `workingurls` (`urlname`) SELECT ? FROM DUAL WHERE NOT EXISTS (SELECT 1 FROM `finishedurls` WHERE `urlname` = ?)");
		$removeURL = $this->db->prepare("DELETE FROM `workingurls` WHERE `urlname` = ? LIMIT 1");

		while ($this->startURL !== NULL) {
			$current = $this->startURL;
			echo 'Scraping URL: '.$current.PHP_EOL;
			// Get page content
			$pageContent = $this->getContents($current);

			// Get list of all emails on page (a practical pattern, not full RFC 5322)
			preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $pageContent, $results);
			$insertCount = 0;
			foreach (array_unique($results[0]) as $curEmail) {
				$addEmail->bind_param('s', $curEmail);
				$addEmail->execute();
				$insertCount += $addEmail->affected_rows; // 1 if new, 0 if already stored
			}
			echo 'Emails found: '.number_format($insertCount).PHP_EOL;

			// Mark the page done
			$markDone->bind_param('s', $current);
			$markDone->execute();

			// Get list of new page URLs and queue the ones not scraped yet
			preg_match_all('/href="([^"]+)"/Umis', $pageContent, $results);
			$insertURLCount = 0;
			foreach ($this->cleanListURLs($results[1]) as $curURL) {
				$checkURL = $curURL;
				$addURL->bind_param('ss', $curURL, $checkURL);
				$addURL->execute();
				$insertURLCount += $addURL->affected_rows;
			}
			echo 'URLs found: '.number_format($insertURLCount).PHP_EOL;

			// Pick a random queued URL and remove it from the queue
			$row = $this->db->query("SELECT `urlname` FROM `workingurls` ORDER BY RAND() LIMIT 1")->fetch_assoc();
			$this->startURL = $row ? $row['urlname'] : NULL;
			if ($this->startURL !== NULL) {
				$next = $this->startURL;
				$removeURL->bind_param('s', $next);
				$removeURL->execute();
				// Get the new page ready
				$this->setStartPath();
			}
			// Clean vars
			unset($results, $pageContent);
		}
	}

	// Function to clean input URLs: drop unusable links, make relative links absolute
	public function cleanListURLs($linkList)
	{
		$cleanList = array();
		foreach ($linkList as $url) {
			// Decode &amp; etc. and drop any #fragment (pure "#anchor" links become empty)
			$url = preg_replace('/#.*/s', '', trim(html_entity_decode($url, ENT_QUOTES)));
			// Need at least 2 characters ("/" alone is just the site root)
			if (strlen($url) <= 1) { continue; }
			// Skip javascript:, mailto:, tel: and any other non-HTTP scheme
			$isHttp = preg_match('#^https?://#i', $url) === 1;
			if (!$isHttp && preg_match('#^[a-z][a-z0-9+.-]*:#i', $url)) { continue; }
			// Skip links to images, scripts, stylesheets and other non-page files
			$ext = strtolower(pathinfo((string) parse_url($url, PHP_URL_PATH), PATHINFO_EXTENSION));
			if (in_array($ext, $this->skippedExtensions)) { continue; }

			// If the URL is relative, make it absolute
			if (!$isHttp) {
				if (substr($url, 0, 2) == '//') {        // protocol-relative: //host/path
					$url = parse_url($this->startPath, PHP_URL_SCHEME).':'.$url;
				} elseif ($url[0] == '/') {              // root-relative: /path
					$url = $this->startPath.$url;
				} elseif ($url[0] == '?') {              // same page, new query string
					$url = strtok($this->startURL, '?#').$url;
				} else {                                 // relative to the current page's folder
					$path = (string) parse_url($this->startURL, PHP_URL_PATH);
					$dir  = ($path === '') ? '/' : substr($path, 0, strrpos($path, '/') + 1);
					$url  = $this->startPath.$dir.$url;
				}
			}
			// The url columns are varchar(255)
			if (strlen($url) > 255) { continue; }
			$cleanList[] = $url;
		}
		return array_unique($cleanList);
	}
}
?>

database.sql

CREATE TABLE IF NOT EXISTS `emaillist` (
 `emailadd` varchar(255) NOT NULL,
 PRIMARY KEY (`emailadd`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='List of all gotten emails';

CREATE TABLE IF NOT EXISTS `finishedurls` (
 `urlname` varchar(255) NOT NULL,
 PRIMARY KEY (`urlname`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='List of finished urls';

CREATE TABLE IF NOT EXISTS `workingurls` (
 `urlname` varchar(255) NOT NULL,
 PRIMARY KEY (`urlname`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='List of current working urls';

start.php

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
	<head>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	</head>
	<body>
<pre>
<?php
	// A crawl can run far longer than PHP's default 30-second limit
	set_time_limit(0);

	$DB_USER = 'root';
	$DB_PASSWORD = '';
	$DB_HOST = 'localhost';
	$DB_NAME = 'test';

	// Throw an exception on connection or query errors instead of failing silently
	mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
	$db = new mysqli($DB_HOST, $DB_USER, $DB_PASSWORD, $DB_NAME);
	$db->set_charset('utf8');

	include('emailcrawler.php');

	$new = new scraper($db);
	$new->startURL('http://geekiest.net/beautifulmails/');
	// Start path can be empty, in which case it is extracted from the start URL,
	// so call setStartPath() after startURL()
	$new->setStartPath();
	//$new->setStartPath('http://geekiest.net');
	$new->startScraping();
?>
</pre>
	</body>
</html>
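The crawl logic (take a URL from the queue, save its emails, queue its links, repeat) can also be sketched outside PHP and MySQL, which makes it easy to test. Below is a minimal JavaScript sketch for Node.js, assuming an in-memory queue and visited set in place of the three MySQL tables and a hypothetical fetchPage(url) function supplied by the caller, so it runs without a network or database. extractEmails, extractLinks and crawl are illustrative names, not part of the original script, and links are resolved with the standard URL class rather than by string handling.

```javascript
// Sketch of the crawl loop with in-memory structures instead of MySQL tables.
// extractEmails, extractLinks, crawl and the injected fetchPage are hypothetical names.

// Practical email pattern (not full RFC 5322)
const EMAIL_RE = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi;
const HREF_RE = /href="([^"]+)"/gi;
// Extensions of non-page files whose links are skipped
const SKIPPED = new Set(["css", "xml", "rss", "ico", "js", "gif", "jpg", "jpeg",
  "png", "bmp", "wmv", "avi", "mp3", "flash", "swf"]);

// Unique email addresses found in a page
function extractEmails(html) {
  return [...new Set(html.match(EMAIL_RE) || [])];
}

// Absolute http(s) page links found in a page, resolved against pageUrl
function extractLinks(html, pageUrl) {
  const links = new Set();
  for (const [, rawHref] of html.matchAll(HREF_RE)) {
    let url;
    try {
      // new URL() resolves "/x", "x.html", "?q=1" and "//host/x" against the page
      url = new URL(rawHref.replace(/&amp;/g, "&"), pageUrl);
    } catch {
      continue; // malformed href
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // mailto:, javascript:, ...
    url.hash = ""; // "#section" links point back to the same page
    const lastSegment = url.pathname.split("/").pop();
    const dot = lastSegment.lastIndexOf(".");
    if (dot >= 0 && SKIPPED.has(lastSegment.slice(dot + 1).toLowerCase())) continue;
    links.add(url.href);
  }
  return [...links];
}

// Breadth-first crawl: a loop over a queue, stopping after maxPages pages
async function crawl(startUrl, fetchPage, maxPages = 100) {
  const emails = new Set();
  const visited = new Set(); // plays the role of the finishedurls table
  const queue = [startUrl];  // plays the role of the workingurls table
  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    let html;
    try {
      html = await fetchPage(url);
    } catch {
      continue; // unreachable page: skip it and move on
    }
    for (const email of extractEmails(html)) emails.add(email);
    for (const link of extractLinks(html, url)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return { emails: [...emails], visited: [...visited] };
}
```

To crawl a real site with Node 18 or later, something like async (url) => (await fetch(url)).text() could be passed as fetchPage. Like the PHP script, this sketch also follows links to other hosts, so the maxPages limit keeps a test run from wandering across the whole web.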

 


JavaScript Validator Script

Hi Friends,

Today I will write about JavaScript validation. When I started my technical career, I had to write JavaScript validation for every form on every website, and it took a lot of my time. So I started looking for a ready-made script on the internet.

Here is one of the scripts available on the internet:

JavaScript Form Validation Script

This script is very easy to work with. There are a few easy steps to make it work:

  1. Download the script from here: javascript_form. The main link where I found it is Link.
  2. Include “gen_validatorv4.js” in your form page:
    <script src="gen_validatorv4.js" type="text/javascript"></script>
  3. Display the form.
  4. Attach the validator to the form element (after the form has been defined).

Here is an example:

HTML File

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
	<title>
		JavaScript form validator Example
	</title>
	<style type="text/css">
		BODY, P,TD{ font-family: Arial,Verdana,Helvetica, sans-serif; font-size: 10pt }
		A{font-family: Arial,Verdana,Helvetica, sans-serif;}
		B { font-family : Arial, Helvetica, sans-serif; font-size : 12px; font-weight : bold;}
	</style>
	<script src="gen_validatorv4.js" type="text/javascript"></script>
</head>
<body>
	<form action="" name="myform" id="myform">
		<table cellspacing="2" cellpadding="2" border="0">
		<tr>
			<td align="right">First Name</td>
			<td><input type="text" name="FirstName" /></td>
		</tr>
		<tr>
			<td align="right">Last Name</td>
			<td><input type="text" name="LastName" /></td>
		</tr>
		<tr>
			<td align="right">EMail</td>
			<td><input type="text" name="Email" /></td>
		</tr>
		<tr>
			<td align="right">Phone</td>
			<td><input type="text" name="Phone" /></td>
		</tr>
		<tr>
			<td align="right">Address</td>
			<td><textarea cols="20" rows="5" name="Address"></textarea></td>
		</tr>
		<tr>
			<td align="right">Country</td>
			<td><select name="Country">
				<option value="000" selected="selected">[choose yours]</option>
				<option value="008">Albania</option>
				<option value="012">Algeria</option>
				<option value="016">American Samoa</option>
				<option value="020">Andorra</option>
				<option value="024">Angola</option>
				<option value="660">Anguilla</option>
				<option value="010">Antarctica</option>
				<option value="028">Antigua And Barbuda</option>
				<option value="032">Argentina</option>
				<option value="051">Armenia</option>
				<option value="533">Aruba</option></select></td>
		</tr>
		<tr>
			<td align="right"></td>
			<td><input type="submit" value="Submit" /></td>
		</tr>
		</table>
	</form>
	<script type="text/javascript">//<![CDATA[
	//You should create the validator only after the definition of the HTML form
	var frmvalidator = new Validator("myform");
	frmvalidator.addValidation("FirstName","req","Please enter your First Name");
	frmvalidator.addValidation("FirstName","maxlen=20", "Max length for FirstName is 20");
	frmvalidator.addValidation("FirstName","alpha","Alphabetic chars only");

	frmvalidator.addValidation("LastName","req","Please enter your Last Name");
	frmvalidator.addValidation("LastName","maxlen=20","Max length is 20");

	frmvalidator.addValidation("Email","maxlen=50");
	frmvalidator.addValidation("Email","req");
	frmvalidator.addValidation("Email","email");

	frmvalidator.addValidation("Phone","maxlen=50");
	frmvalidator.addValidation("Phone","numeric");

	frmvalidator.addValidation("Address","maxlen=50");
	frmvalidator.addValidation("Country","dontselect=000");//]]></script>
</body>
</html>
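gen_validatorv4.js performs these checks in the browser. To give a rough idea of what rules such as req, maxlen=N, alpha, numeric, email and dontselect=VALUE check, here is a small hypothetical re-implementation in plain JavaScript. It is not the library's code, and its exact rules (for example, which characters count as "numeric") are assumptions. It works on a plain object of field values, so it can be tried outside a browser.

```javascript
// Hypothetical mini-validator modeled on the rule names used above.
// This is NOT gen_validatorv4.js; the exact rules below are assumptions.
const RULES = {
  req: (v) => v.trim().length > 0,                 // required: not empty
  maxlen: (v, n) => v.length <= Number(n),         // maxlen=N
  alpha: (v) => /^[A-Za-z]*$/.test(v),             // letters only
  numeric: (v) => /^[0-9]*$/.test(v),              // digits only
  email: (v) => v === "" || /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(v), // empty is left to "req"
  dontselect: (v, bad) => v !== bad,               // dontselect=VALUE
};

// values: { fieldName: "entered text" }
// validations: [[fieldName, "rule" or "rule=arg", optional message], ...]
// Returns the error messages in order; an empty array means the form is valid.
function validate(values, validations) {
  const errors = [];
  for (const [field, spec, message] of validations) {
    const eq = spec.indexOf("=");
    const name = eq < 0 ? spec : spec.slice(0, eq);
    const arg = eq < 0 ? undefined : spec.slice(eq + 1);
    const rule = RULES[name];
    if (!rule) throw new Error(`Unknown rule: ${name}`);
    if (!rule(values[field] ?? "", arg)) errors.push(message || `${field} failed ${spec}`);
  }
  return errors;
}

const errors = validate(
  { FirstName: "John3", LastName: "", Email: "john@example.com", Country: "000" },
  [
    ["FirstName", "req", "Please enter your First Name"],
    ["FirstName", "alpha", "Alphabetic chars only"],
    ["LastName", "req", "Please enter your Last Name"],
    ["Email", "email"],
    ["Country", "dontselect=000", "Please select your country"],
  ]
);
// errors holds: "Alphabetic chars only", "Please enter your Last Name", "Please select your country"
console.log(errors);
```

As with any client-side check, the same rules should be enforced again on the server, because a user can switch JavaScript off or post the form data directly.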

 

Thank you

 

 

Transfer MySQL Database Server to Server Using cPanel

Hello Friends,

A few days ago I had a project to transfer a MySQL database from one server to another. I didn’t want to use my internet bandwidth for this large database by downloading it to my local computer and then uploading it to the other server. Instead, I wanted to transfer it directly from one server to the other.

Below is the syntax to do that.

[In Mysql Bin Folder]>mysqldump -h[source-hostname-or-IP] -u[username] -p[password] [Database-name] | mysql -h[destination-hostname-or-IP] -u[username] -p[password] [target-Database-name]

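Before using this, you have to set up a few things on your local machine. Run this way, the dump still travels through your machine (from the source server to you, then on to the destination), but it is never saved to disk; to keep the traffic entirely between the two servers, run the same command on one of the servers instead, for example over SSH.

  • Install MySQL (you need its mysqldump and mysql client programs).
  • Open the command prompt.
  • In the command prompt, change to the “bin” directory of the MySQL installation.
  • Create the target database on the destination server.
  • Allow remote connections to MySQL on both the source and destination servers (in cPanel, add your IP address under Remote MySQL).
  • Run the command with your own parameters.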
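If you would rather script the transfer than type it each time, here is a minimal Node.js sketch that starts mysqldump and pipes its output straight into mysql, without a shell or a temporary file. It assumes both client programs are on the PATH and uses only the -h, -u and -p options from the command above; connectionArgs, buildTransferArgs and transferDatabase are hypothetical helper names.

```javascript
// Sketch: mysqldump | mysql driven from Node.js, without a shell or temporary file.
// connectionArgs, buildTransferArgs and transferDatabase are hypothetical helper names.
const { spawn } = require("child_process");

// Connection options in the same form as the command above: -hHOST -uUSER -pPASSWORD
function connectionArgs({ host, user, password }) {
  const args = [`-h${host}`, `-u${user}`];
  if (password) args.push(`-p${password}`); // a bare -p would prompt for the password
  return args;
}

function buildTransferArgs(source, destination) {
  return {
    dump: [...connectionArgs(source), source.database],
    load: [...connectionArgs(destination), destination.database],
  };
}

// Pipe mysqldump's output into mysql; resolves once both exit with code 0
function transferDatabase(source, destination) {
  const { dump, load } = buildTransferArgs(source, destination);
  return new Promise((resolve, reject) => {
    const dumper = spawn("mysqldump", dump, { stdio: ["ignore", "pipe", "inherit"] });
    const loader = spawn("mysql", load, { stdio: ["pipe", "inherit", "inherit"] });
    dumper.stdout.pipe(loader.stdin); // stream the SQL straight across
    loader.stdin.on("error", () => {}); // if mysql exits early, its exit code reports it
    let running = 2;
    const onClose = (name) => (code) => {
      if (code !== 0) reject(new Error(`${name} exited with code ${code}`));
      else if (--running === 0) resolve();
    };
    dumper.on("error", reject); // e.g. mysqldump not found on the PATH
    loader.on("error", reject);
    dumper.on("close", onClose("mysqldump"));
    loader.on("close", onClose("mysql"));
  });
}
```

A call would look like transferDatabase({ host: "source-host", user: "u", password: "p", database: "shop" }, { host: "target-host", user: "u", password: "p", database: "shop_copy" }) with your own values. Keep in mind that a password given as -pPASSWORD on the command line may be visible to other users of the machine in the process list; a MySQL option file is a safer place for it.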

Please feel free to ask any questions.

Thank you.

 

Streaming File to the Browser

Hi Guys,

I hope you are all well. Today I am going to write a post on how to let users download a file from another website without showing them that the file is actually coming from another server.

This can help many beginner PHP users hide a file's URL location in PHP. Here is an example of how you can fetch a file from another website and serve it to the user as if it were being downloaded from our website.

Here is the code:

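<?php
	$file = "http://example.com/example.jpg"; // replace this URL with your URL
	// readfile() on a remote URL requires allow_url_fopen = On

	// filesize() cannot stat a remote http:// URL, so read the size
	// from the remote server's response headers instead (if it sends one)
	$remote = get_headers($file, 1);
	if ($remote === false) {
		http_response_code(502);
		exit('Could not reach the remote file.');
	}
	$remote = array_change_key_case($remote, CASE_LOWER);
	$length = isset($remote['content-length']) ? $remote['content-length'] : null;
	if (is_array($length)) { $length = end($length); } // after redirects, the final response is last

	header('Content-Description: File Transfer');
	header('Content-Type: application/octet-stream');
	header('Content-Disposition: attachment; filename="'.basename($file).'"');
	header('Content-Transfer-Encoding: binary');
	header('Expires: 0');
	header('Cache-Control: must-revalidate');
	header('Pragma: public');
	if ($length !== null) {
		header('Content-Length: '.$length);
	}
	// Discard any buffered output so that only the file's bytes are sent
	while (ob_get_level() > 0) { ob_end_clean(); }
	flush();
	readfile($file);
	exit;
?>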
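For comparison, here is the same idea as a small Node.js (18 or later) sketch. It fetches the remote file and streams it to the browser with a Content-Disposition: attachment header, so the browser only ever talks to our server. REMOTE_URL, the helper names and the port are placeholders, not part of the original post.

```javascript
// Sketch: serve a remote file as a download from our own server (Node.js 18+).
// REMOTE_URL, downloadHeaders, handleDownload and startServer are placeholder names.
const http = require("http");
const path = require("path");
const { Readable } = require("stream");

const REMOTE_URL = "http://example.com/example.jpg"; // replace this URL with your URL

// Response headers that make the browser save the file under the remote file's name
function downloadHeaders(remoteUrl, contentLength) {
  const name = path.posix.basename(new URL(remoteUrl).pathname) || "download";
  const headers = {
    "Content-Type": "application/octet-stream",
    "Content-Disposition": `attachment; filename="${name}"`,
    "Cache-Control": "must-revalidate",
    Expires: "0",
  };
  if (contentLength) headers["Content-Length"] = contentLength; // only when the size is known
  return headers;
}

// Fetch the remote file and stream it to the client without buffering it all in memory
async function handleDownload(req, res) {
  const upstream = await fetch(REMOTE_URL);
  if (!upstream.ok || !upstream.body) {
    res.writeHead(502).end("Could not fetch the remote file.");
    return;
  }
  // fetch() decompresses content-encoded bodies, so the remote Content-Length
  // only matches what we send when the response was not content-encoded
  const length = upstream.headers.get("content-encoding") ? null : upstream.headers.get("content-length");
  res.writeHead(200, downloadHeaders(REMOTE_URL, length));
  Readable.fromWeb(upstream.body).pipe(res);
}

// Start the download server, e.g. startServer(8080)
function startServer(port) {
  return http.createServer((req, res) => {
    handleDownload(req, res).catch(() => res.destroy());
  }).listen(port);
}
```

Calling startServer(8080) and opening http://localhost:8080/ should download example.jpg through our server. As in the PHP version, the remote URL is fixed in the code on purpose: taking it from the request would turn the script into an open proxy.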

It is very easy to understand.

Ask if you have any questions.

Thank you! 🙂

 

URL Rewriting Using an .htaccess File with PHP and Apache

Hello friends,

For a long time I wondered how URL rewriting works. I have seen many websites with pretty URLs, and I also found that URL rewriting helps with SEO (Search Engine Optimization).

After reading a bunch of tutorials and forums, I think it is quite easy: there are just a few steps to follow, and you need some knowledge of regular expressions.

Steps for implementing URL rewriting on your website:

  1. Enable mod_rewrite in the Apache server.
  2. Create an .htaccess file.
  3. Call the page using the new URL you defined.

Enabling mod_rewrite in Apache Server

To enable mod_rewrite on the Apache server, follow these steps:

  1. Find the httpd.conf file of your Apache server. Generally this file resides in the conf folder of the Apache installation path. In XAMPP on Windows the path is C:\xampp\apache\conf. In WAMP it is usually inside the versioned Apache folder, such as C:\wamp\bin\apache\apache2.x.x\conf.
  2. Find the line #LoadModule rewrite_module modules/mod_rewrite.so in the “httpd.conf” file. You can do this easily by searching for the keyword “mod_rewrite” with the Find menu.
  3. Uncomment the line: httpd.conf uses # to comment out a line, so remove the # from the beginning of the line. Also make sure the <Directory> block for your web root has AllowOverride All (or at least AllowOverride FileInfo); otherwise Apache ignores the rewrite rules in .htaccess files.
  4. Restart the server. In WAMP, click the WAMP icon in the system tray at the bottom right of the Windows taskbar and restart the services. In XAMPP, open the XAMPP Control Panel and restart Apache.
  5. You can check that mod_rewrite is enabled using phpinfo(): when PHP runs as an Apache module, mod_rewrite is listed under Loaded Modules in the apache2handler section.

Create HTACCESS file

Let's consider a real-world example for creating and understanding an .htaccess file and its rules. An .htaccess file uses the RewriteRule directive to define the rules for URL rewriting. This example is also a good introduction to regular expressions and rewrite rules.

For example, create a file testhtaccess.php containing:

"This is PHP HTACCESS Test File"

Create a second file, testhtaccess.html, containing:

"This is HTML HTACCESS Test File"

These are simple texts written in both files so that, when we rewrite the URL, we can tell which file was actually called.

Create a third file, .htaccess, in the same folder on the server as these two files, containing:

RewriteEngine On
RewriteRule ^/?testhtaccess\.html$ testhtaccess.php [L]

The first line turns the rewrite engine on, which allows the server to rewrite URLs. The second line is the rule for URL rewriting, and its first parameter is the pattern: “^” anchors the match at the beginning of the requested path, “/?” allows an optional leading slash (in an .htaccess file the path is matched without its leading slash, so the rule works either way), “testhtaccess\.html” is the file name with the dot escaped by \ so that it matches a literal dot rather than any character, and “$” anchors the end of the path. The second parameter tells which file to call when the URL matches, and the [L] flag tells mod_rewrite to stop processing further rules.

So when we request testhtaccess.html, Apache rewrites the URL and serves testhtaccess.php instead. This is done internally, so the URL in the browser stays the same.
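To see exactly which request paths this pattern accepts, the same regular expression can be tried in JavaScript; for these simple constructs JavaScript and Apache's PCRE patterns behave alike. The rewrite() helper and the sample paths are hypothetical and only mimic this one rule.

```javascript
// Mimics: RewriteRule ^/?testhtaccess\.html$ testhtaccess.php [L]
// rewrite() is a hypothetical helper, not part of Apache.
const pattern = /^\/?testhtaccess\.html$/;

// Returns the internal target for a request path, or null if the rule does not match
function rewrite(requestPath) {
  return pattern.test(requestPath) ? "testhtaccess.php" : null;
}

console.log(rewrite("testhtaccess.html"));     // testhtaccess.php (.htaccess context, no leading slash)
console.log(rewrite("/testhtaccess.html"));    // testhtaccess.php (with a leading slash)
console.log(rewrite("testhtaccessXhtml"));     // null: \. only matches a literal dot
console.log(rewrite("old/testhtaccess.html")); // null: ^ anchors the start of the path
```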

Now let's make it a little more complex. Say we have a URL with a query string like this:

http://mywebsite.com/details.php?city=cityname&company=companyname

and we want to rewrite it as:

http://mywebsite.com/cityname/companyname

Now we have to tell Apache that whenever the second (rewritten) URL is requested, it should internally call the first URL with its query string, so that the details.php script can read and parse the query string as usual. To tell mod_rewrite how to match the two URIs, we need regular expressions, so you should have some knowledge of them. Below I explain some key parts of regular expressions.

. (dot) – matches any single character, e.g. c.t will match cat, cut and cot

+ (plus) – repeats the previous match one or more times

* (asterisk) – repeats the previous match zero or more times

? (question mark) – makes the previous match optional

^ (anchor) – matches the beginning of the string

$ (anchor) – matches the end of the string

( ) – groups several characters into a single unit (and captures what they match)

[ ] – a character class: matches any one of the listed characters; a range such as a-z can be used

[^ ] – a negated character class: matches any character that is not listed

Note: in mod_rewrite, the ! character can be placed before a pattern to negate it.
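Each of these metacharacters can be tried out directly in JavaScript, whose regular expressions treat these basic constructs the same way as Apache's PCRE patterns. The ! negation is mod_rewrite syntax rather than part of the regular expression, so it is not shown. The test strings are made up for illustration.

```javascript
// Each entry: [pattern, test string, expected result of pattern.test(string)]
const examples = [
  [/^c.t$/, "cut", true],       // . matches any single character
  [/^c.t$/, "ct", false],       // but exactly one character
  [/^ab+c$/, "abbbc", true],    // + one or more of the previous item
  [/^ab+c$/, "ac", false],      // + needs at least one "b"
  [/^ab*c$/, "ac", true],       // * zero or more
  [/^colou?r$/, "color", true], // ? makes the "u" optional
  [/^(ab)+$/, "ababab", true],  // ( ) groups "ab" so + repeats the pair
  [/^[a-c]+$/, "cab", true],    // [ ] class with the range a-c
  [/^[^0-9]+$/, "abc", true],   // [^ ] anything except a digit
  [/^[^0-9]+$/, "a1c", false],
];

for (const [pattern, input, expected] of examples) {
  const result = pattern.test(input);
  console.log(`${pattern} on "${input}": ${result}`, result === expected ? "" : "(unexpected)");
}
```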

Now for our example we can use

^([a-zA-Z_]+)/([a-zA-Z_]+)$

or

(.*)/(.*)
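The two patterns are not interchangeable. A quick check in JavaScript (whose syntax agrees with Apache's for these constructs; the groups() helper and the test paths are made up for illustration) shows that .* is greedy and also matches slashes, while the letters-only pattern rejects anything unexpected.

```javascript
// The two candidate patterns for "cityname/companyname"
const strict = /^([a-zA-Z_]+)\/([a-zA-Z_]+)$/;
const loose = /(.*)\/(.*)/;

// Captured groups ($1, $2, ...) as an array, or null when the pattern does not match
function groups(pattern, path) {
  const match = path.match(pattern);
  return match ? match.slice(1) : null;
}

console.log(groups(strict, "delhi/acme"));       // ["delhi", "acme"]
console.log(groups(loose, "delhi/acme"));        // ["delhi", "acme"]
console.log(groups(strict, "delhi/acme/extra")); // null: more than one slash
console.log(groups(loose, "delhi/acme/extra"));  // ["delhi/acme", "extra"]: greedy .* also eats "/"
console.log(groups(strict, "new-york/acme"));    // null: "-" is not in [a-zA-Z_]
```

For pretty URLs the stricter pattern is usually the safer choice, since it will not accidentally capture paths you never meant to rewrite.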

to map the rewritten URL to the actual URL. Using ( ) we can capture part of the match in a variable that can be used later in the rule. These variables are accessed as

$1, $2 and so on

numbered automatically in the order in which the ( ) groups occur in the pattern.

RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)$ details.php?city=$1&company=$2 [L]

This maps the rewritten URL to the actual one. [a-zA-Z_]+ matches one or more lowercase or uppercase letters or underscores. Because it is in parentheses ( ), the first match is stored in $1 and the second match in $2.
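The $1 and $2 back-references work like numbered capture groups in a JavaScript replace(), so the rule can be mimicked to check what details.php will receive. rewritePretty() and the sample paths are made up for illustration; real mod_rewrite also deals with the directory prefix and query-string flags, which this sketch ignores.

```javascript
// Mimics: RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)$ details.php?city=$1&company=$2 [L]
// rewritePretty() is a hypothetical helper, not part of Apache.
const rule = /^\/?([a-zA-Z_]+)\/([a-zA-Z_]+)$/;

// Internal URL for a pretty path, or null when the rule does not match
function rewritePretty(path) {
  // In String.prototype.replace, $1 and $2 refer to the captured groups, as in mod_rewrite
  return rule.test(path) ? path.replace(rule, "details.php?city=$1&company=$2") : null;
}

console.log(rewritePretty("delhi/acme"));    // details.php?city=delhi&company=acme
console.log(rewritePretty("/new_york/ibm")); // details.php?city=new_york&company=ibm
console.log(rewritePretty("delhi/acme2"));   // null: digits are not in [a-zA-Z_]
```

With the rule in place, details.php reads the values with $_GET['city'] and $_GET['company'] as before.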

As we can see, this is quite simple. I have written this post for beginners like me: it covers what I learned from other forums and blogs, which seemed very complex to understand.

Any corrections and advice from you, friends, will be appreciated.

Thank you! Please comment here if you have any questions. 🙂

How to Improve MySQL Large Database Performance

Hello friends,

This is my first post for the blog. Recently I worked on two large database projects with a MySQL backend. During these projects I came across many problems and somehow managed to solve them all.

Many of you have probably faced the same MySQL performance problems, such as queries running slowly. During this period I read many blogs and forums to solve these problems, and I have collected some tips, or points, that we should take care of.

Database Engine (MyISAM vs. InnoDB)

Both MySQL storage engines have their own pros and cons, so we have to decide which engine will work best for us. Below are some considerations for these two engines:

MyISAM

  • MyISAM uses less memory
  • It supports full-text search (FULLTEXT indexes)
  • It locks the whole table while writing
  • It is useful for applications with many reads and few writes

InnoDB

  • It uses more memory
  • It supports full-text search only from MySQL 5.6 onward
  • It often performs better when many clients read and write at the same time
  • It locks only the affected rows (row-level locking) while writing to the table
  • It works great for applications that make extensive use of both reads and writes, and it supports transactions and foreign keys

 

Good Database Design

Good database design is the backbone of application performance; a bad design makes the application slower. Tables should be normalized. The data structure is the main factor and must be designed carefully: every developer should take time over each table and field to make a good design, and give each field a proper data type. Once the database is created, you can ask MySQL to analyze a table with this command:

ANALYZE TABLE <table-name>;

ANALYZE TABLE updates the key distribution statistics that the query optimizer uses to choose indexes. If you want MySQL to suggest optimal data types for each column instead, MySQL 5.x provides SELECT * FROM <table-name> PROCEDURE ANALYSE(); (removed in MySQL 8.0).

You can find the full description here: http://dev.mysql.com/doc/refman/5.5/en/analyze-table.html

Indexes

Many of us know that indexes help increase query speed, but indexes can also cause confusion. Creating the proper indexes for a table is necessary, but don't overload a table with indexes: they take space on disk, and every write has to update them as well, which increases the disk workload. On my project I often had to add and delete columns on large tables, which took a lot of time. For bulk data operations such as large inserts, we can switch index updates off and on: before starting, I disable the keys, run the operation, and enable the keys again when it is complete. MySQL then rebuilds the indexes once at the end instead of updating them for every row. Note that this only affects the non-unique indexes of MyISAM tables; it has no effect on InnoDB tables. To turn indexes off and on, I use the syntax below:

Disable Indexes: ALTER TABLE table_name DISABLE KEYS;
Enable Indexes: ALTER TABLE table_name ENABLE KEYS;

 

Tuning MySQL

Hardware plays as important a role as anything else in database tuning: it needs just as much attention and tuning as our database and scripts do. It is also worth checking the MySQL configuration file (my.cnf on Linux, my.ini on Windows) to see what changes we should make to the configuration. There is a tool available for this, a Perl script: you can download it and run it on the server, and it tells you which configuration changes you can make to improve performance.

 

Note: These are the simple things that I find useful to take care of for better performance.