Category Archives: PHP

PHP Best Practices: Models and Data Mining

Most PHP developers utilizing MVC frameworks would argue that the best approach for models is a class with any number of getters and setters.  After a long time working with PHP, MVC frameworks, and data mining/retrieval, I am of a completely different opinion, and I will discuss why.

First let’s start with the traditional approach which I will refer to as a “heavy model”.  Consider the following:

class Person {
    private $id = 0;
    private $first_name = null;
    private $last_name = null;
 
    public function setId($id){
        $this->id = $id;
    }
 
    public function getId(){
        return $this->id;
    }
 
    public function setFirstName($first_name){
        $this->first_name = $first_name;
    }
 
    public function getFirstName(){
        return $this->first_name;
    }
 
    public function setLastName($last_name){
        $this->last_name = $last_name;
    }
 
    public function getLastName(){
        return $this->last_name;
    }
}

Admittedly, this is a very rudimentary data model; however, it does follow the standard approach most MVC frameworks take with regard to model layout: private properties with corresponding getters and setters.  Its usage might then look something like this:

$p = new Person();
$p->setId(1);
$p->setFirstName('Bob');
$p->setLastName('Dobalina');

Usability-wise, the above approach is very clean and makes perfect sense from an object-oriented standpoint, even if it is rather remedial. However, having lived in PHP for quite some time now, I have two problems with it. The first concerns data retrieval, or to be more precise, retrieving data from a database and populating models for use. The other is flexibility and code maintenance. For example, let’s say we’re using PDO to connect to a MySQL database:

$conn = /* get a database connection */;
$sql = 'SELECT * FROM person';
$stmt = $conn->prepare($sql);
$stmt->execute();
 
$people = array();
while($result = $stmt->fetch(PDO::FETCH_ASSOC)){
    $p = new Person();
    $p->setId($result['id']);
    $p->setFirstName($result['first_name']);
    $p->setLastName($result['last_name']);
    $people[] = $p;
}
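The “get a database connection” placeholder above is left to you; a minimal sketch, assuming a local MySQL database named test and purely hypothetical credentials, might look like:

```php
//hypothetical connection details - substitute your own host, database and credentials
$dsn = 'mysql:host=localhost;dbname=test;charset=utf8';

try {
    $conn = new PDO($dsn, 'db_user', 'db_password');
    //throw exceptions on errors instead of failing silently
    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
} catch (PDOException $e) {
    //in a real application log this rather than exposing the message to end users
    $conn = null;
    echo 'Connection failed: ' . $e->getMessage();
}
```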

So if we examine the above, we see an array of Person objects being populated from a database result set. Ignoring the semantics, this is a pretty common way to retrieve data and populate models. Sure, there are variations, helper methods on the model, etc. But in essence they are all pretty much performing this kind of loop somewhere. What if I told you there was a more efficient way to do the above, one that executed faster, was more flexible, and required less code?

Now consider this example, a modification to the above model, which I will refer to as the “light model”.

class Person {
    public $id = 0;
    public $first_name = null;
    public $last_name = null;
}

Now I know a lot of developers who see this are currently cringing, just stay with me for a minute. The above acts more like a structure than the traditional model, but it has quite a few advantages. Let me demonstrate with the following data mining code:

$conn = /* get a database connection */;
$sql = 'SELECT * FROM person';
$stmt = $conn->prepare($sql);
$stmt->execute();
 
$people = array();
while($result = $stmt->fetch(PDO::FETCH_ASSOC)){
    $p = new Person();
    foreach($result as $field_name=>$field_value)
        $p->{$field_name} = $field_value;
    $people[] = $p;
}

If you’re unfamiliar with the foreach notation inside the while loop, all it is doing is using each result-set field name and value to dynamically populate the model’s matching property. Here’s why I find the light model a much better practice, especially when combined with the above while/foreach mining pattern. First, the light model populates faster, since no functions are invoked; each function call in the heavy model takes an additional performance hit, which can be validated quite easily using timestamps and loops. Second, the mining pattern paints the model with whatever values come out of the database, which means that if the table changes going forward, only the properties of the model change; the data mining pattern still works with no code changing. With the earlier heavy model, a database change would mean updating both the model and all mining procedures with the new result field names and their respective set methods, or at the very least updating some model method.
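The performance claim above can be checked with a quick (and admittedly crude) timing loop. This sketch uses two stand-in classes rather than the Person classes from earlier, since both versions need to coexist in one script:

```php
//crude micro-benchmark: setter calls vs. direct property assignment
class HeavyPerson {
    private $first_name = null;
    public function setFirstName($v){ $this->first_name = $v; }
}

class LightPerson {
    public $first_name = null;
}

$iterations = 100000;

$start = microtime(true);
for($i = 0; $i < $iterations; $i++){
    $p = new HeavyPerson();
    $p->setFirstName('Bob'); //populate via function call
}
$heavy = microtime(true) - $start;

$start = microtime(true);
for($i = 0; $i < $iterations; $i++){
    $p = new LightPerson();
    $p->first_name = 'Bob'; //populate via direct assignment
}
$light = microtime(true) - $start;

echo "heavy: {$heavy}s, light: {$light}s\n";
```

The exact numbers will vary by machine and PHP version, but the function-call overhead shows up consistently over enough iterations.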

Finally, to come full circle, what about the CRUD methods usually attached to models, such as save() or get()? Instead of creating instance methods on models, with the overhead of object instantiation, how about static methods on companion classes, which I’ll term “business objects”? For example:

class PersonBO{
 
    public static function save($person){
        /* do save here */
    }
 
    public static function get($person){
        /* do get here */
    }
 
}

This example performs the same functionality usually attached to the model; however, it uses static methods and executes faster, with less overhead, than its heavy model counterpart. It also adds a layer of abstraction between the model and its functional business-object counterpart, which lends itself to clean and easily maintainable code.
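Here’s a hedged sketch of what a save() implementation might look like against the light Person model from above, using PDO. Note that I’m passing the connection in explicitly to keep the example self-contained, which differs from the signature shown above, and the table/column names are the ones from the earlier SELECT example:

```php
class PersonBO {

    public static function save($conn, $person){
        //assumes a PDO connection and the person table from the earlier examples
        $sql = 'INSERT INTO person (id, first_name, last_name) VALUES (:id, :first_name, :last_name)';
        $stmt = $conn->prepare($sql);
        return $stmt->execute(array(
            ':id' => $person->id,
            ':first_name' => $person->first_name,
            ':last_name' => $person->last_name
        ));
    }

}
```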

In summary, light models used in conjunction with the data mining pattern demonstrated above cut out quite a bit of retrieval code and add to the codebase’s overall flexibility. This pattern is currently in use within several very large enterprise codebases and has been nothing but a pleasure to work with.

Special thanks to Robert McFrazier for his assistance in developing this pattern.

How To Use FQL With The Facebook PHP SDK

Foreword

So the other day, while building a Facebook application, I needed to get some information that I just couldn’t find any way to get other than through the use of FQL.  Needless to say, examples of FQL are plentiful.  However, examples that use FQL with the Facebook PHP SDK are not.  So I thought I’d put one together.

Getting Started

If you haven’t already done so, make sure you’ve installed the latest Facebook SDK and registered your Facebook application.

FQL And The Facebook SDK

Here’s an example call to get all the photos that belong to the logged in user:

$appInfo = array(
    'appId' => 'XXXXXXXXXXXXX',
    'secret' => 'XXXXXXXXXXXXXXXXXXXXX' //note: the PHP SDK expects the key 'secret'
);
 
$facebook = new Facebook($appInfo);
 
$result = $facebook->api( array('method' => 'fql.query', 'query' => 'SELECT src, caption FROM photo WHERE owner=me()') );
foreach($result as $photo){
    //each $photo row contains the 'src' and 'caption' fields selected above
}

And there you have it. Hopefully this will save someone the amount of time it took for me to figure it out.

Twitter OAuth PHP Tutorial

Foreword

Trying to get a dial tone with Twitter’s new OAuth can be a frustrating and daunting task, especially given the utter lack of proper documentation on just connecting to Twitter using OAuth.  This tutorial will walk you through the steps required to make calls using Twitter’s OAuth in PHP.

Getting Started

OK, first things first.  You’ll need to have your web application registered with Twitter, as well as the associated username and password of the account.  If you haven’t registered your application with Twitter yet, here’s what you’ll need to do:

First visit http://dev.twitter.com/

Click on the “Register an app” link.

Then fill out all the appropriate information requested.  Make sure that you’ve selected “Browser” as your “Application Type”.

You will also need to register a callback URL.  This is the URL where people will be redirected to after they have authorized your website/application for use with their Twitter account.  This is also where you will receive validation information directly from Twitter which will be required to make calls on behalf of the Twitter user.

Once you have filled out the form and registered your application, you’ll be presented with the details of your registration including your “Consumer Key” and “Consumer Secret”.  You’ll be using those shortly so keep a browser instance open or copy them down.

Twitter Application Registration

Now that the prerequisites are done, it’s time to begin the battle with OAuth.

Beginning OAuth

First let’s understand what needs to happen to get OAuth working.

The gist is simple enough: we need to create a header with authorization data in it and POST it to Twitter, then get a token back from Twitter letting us know we’re registered and everything is groovy.  Next we’ll retrieve the authorization URL to allow users to authorize our application with their account and finally, do something… uh, Twittery, Twitter-esque… I don’t know, don’t ask.

Now, the most important thing to remember here is that Twitter is super picky about how everything is encoded… super picky.  So if you make one mistake, you’ll get a forbidden/rejected response with very little help from Twitter as to why.

So let’s start by getting the request token from Twitter, which will let us know we’re on the right track.

Getting The Request Token

To get the request token from Twitter we need to POST a call to:

https://api.twitter.com/oauth/request_token

But first we need to sign and encode an authorization header, and yes… it is a pain.  Lucky for you, I’ve already been through this so you don’t have to deal with figuring it all out.  Here we go.

The authorization header required for the request token requires the following fields:

  • oauth_callback – the url to be redirected to after authorization
  • oauth_consumer_key – this is the consumer key you get after registering your application with Twitter
  • oauth_nonce – this is a unique value that you generate to reduce the chance of someone hijacking your session
  • oauth_signature_method – this is the method used to sign the base string, we’ll get to this in a bit, but for now the default value is “HMAC-SHA1”
  • oauth_timestamp – this is the current timestamp.
  • oauth_version – this is going to be “1.0”

An easy way to deal with the authorization header and its nuances is to load all the OAuth header values into an associative array and pass them to functions that will sign, encode, etc.

$nonce = time();
$timestamp = time();
$oauth = array('oauth_callback' => 'http://yourdomain.com/callback_page',
              'oauth_consumer_key' => 'yourconsumerkey',
              'oauth_nonce' => $nonce,
              'oauth_signature_method' => 'HMAC-SHA1',
              'oauth_timestamp' => $timestamp,
              'oauth_version' => '1.0');

Just to clarify what I’ve done so far: the $nonce variable can be pretty much whatever you want; I thought for the purposes of this tutorial time() would be the easiest to understand.  The $timestamp, well, that should be pretty obvious.  The $oauth array is our associative array containing all the fields and values required to get us started.
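As an aside, if you want something less guessable than time() for the nonce, a hash of a pseudo-random value works; this is just one common approach, not something Twitter mandates:

```php
//a slightly stronger nonce than time(): hash a pseudo-random value
//(any sufficiently unique string will do)
$nonce = md5(mt_rand() . microtime());
echo $nonce; //a 32 character hexadecimal string
```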

Now that we have our oauth array, we need to create our base string.  The base string is basically a signature of the action we want to perform which later we will sign using HMAC-SHA1 and our composite key.  The result of which will be our oauth_signature.  I know it sounds a bit confusing, but don’t worry.  I’ll walk you through the entire process step by step.

So let’s build the base string.  The base string has the format of:

METHOD&BASEURI&OAUTH_PARAMS_SORTED_AND_ENCODED

For example, here’s a fully encoded base string:

POST&https%3A%2F%2Fapi.twitter.com%2Foauth%2Frequest_token&oauth_callback%3Dhttp%253A%252F%252Flocalhost%253A3005%252Fthe_dance%252Fprocess_callback%253Fservice_provider_id%253D11%26oauth_consumer_key%3DGDdmIQH6jhtmLUypg82g%26oauth_nonce%3DQP70eNmVz8jvdPevU3oJD2AfF7R7odC2XJcn4XlZJqk%26oauth_signature_method%3DHMAC-SHA1%26oauth_timestamp%3D1272323042%26oauth_version%3D1.0

Yeah, it’s ugly.  And here’s how you create it.  As I said before, Twitter is very picky about encoding, so to ensure everything is encoded the same way each and every time, let’s use a function that will do it for us using our $oauth array.

/**
 * Method for creating a base string from an array and base URI.
 * @param string $baseURI the URI of the request to twitter
 * @param array $params the OAuth associative array
 * @return string the encoded base string
**/
function buildBaseString($baseURI, $params){
    $r = array(); //temporary array
    ksort($params); //sort params alphabetically by keys
    foreach($params as $key=>$value){
        $r[] = "$key=" . rawurlencode($value); //create key=value strings
    }//end foreach
 
    return 'POST&' . rawurlencode($baseURI) . '&' . rawurlencode(implode('&', $r)); //return complete base string
}//end buildBaseString()

So here’s what the buildBaseString() function does.  It takes the $baseURI, in this case “https://api.twitter.com/oauth/request_token”, and the $oauth array we created, and hands back the encoded base string.  It does this by first sorting the $oauth array by key (this is required by Twitter, one of the gotchas), then creating “key=value” strings using rawurlencode (e.g. oauth_signature_method=HMAC-SHA1; rawurlencode is another gotcha), then assembling and handing back the base string after yet another full rawurlencode of all the joined “key=value” strings (this was a major gotcha, took me forever to figure out).  I can’t stress enough how absolutely imperative it is that the base string gets created exactly like this.  Any deviation from this can easily result in forbidden/unauthorized rejections from Twitter.
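A quick sanity check with a tiny parameter array makes the sorting and double encoding visible (the function is repeated here so the snippet runs on its own; the key values are obviously fake):

```php
//buildBaseString() from above, repeated so this snippet stands alone
function buildBaseString($baseURI, $params){
    $r = array();
    ksort($params); //Twitter requires the params sorted by key
    foreach($params as $key=>$value){
        $r[] = "$key=" . rawurlencode($value);
    }
    return 'POST&' . rawurlencode($baseURI) . '&' . rawurlencode(implode('&', $r));
}

$check = buildBaseString('https://api.twitter.com/oauth/request_token',
    array('oauth_version' => '1.0', 'oauth_consumer_key' => 'abc'));

echo $check;
// POST&https%3A%2F%2Fapi.twitter.com%2Foauth%2Frequest_token&oauth_consumer_key%3Dabc%26oauth_version%3D1.0
```

Note how the keys come out alphabetically and the “=” and “&” separating the params are themselves percent-encoded by the final rawurlencode.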

Now that we have a way to create a base string let’s create one.

$baseString = buildBaseString($baseURI, $oauth);

And voilà, a base string… yay!  Now we need to sign it.  To sign the base string we’ll need to do a couple of things.  The first is to create the composite key.  The composite key is the rawurlencode of the consumer secret, followed by an “&”, followed by the rawurlencode of the token.  Now, since we are trying to retrieve the token, we don’t have it just yet, so it’s set to null.  But for all calls after we get our request token, we’ll need to make sure the token is included in the composite key, otherwise the call will be rejected.  Once again, to make life easy, let’s create a function called getCompositeKey().

/**
 * Method for creating the composite key.
 * @param string $consumerSecret the consumer secret authorized by Twitter
 * @param string $requestToken the request token from Twitter
 * @return string the composite key.
**/
function getCompositeKey($consumerSecret, $requestToken){
    return rawurlencode($consumerSecret) . '&' . rawurlencode($requestToken);
}//end getCompositeKey()

That function is pretty self-explanatory.  Now, to actually sign the base string, we’ll use the composite key with the binary form of HMAC-SHA1 and encode the result as base64.  This creates our oauth_signature, which we will need to finalize our request to Twitter.  This is what you’ll need:

$consumerSecret = 'consumer-secret'; //put your actual consumer secret here, it will look something like 'MCD8BKwGdgPHvAuvgvz4EQpqDAtx89grbuNMRd7Eh98'
 
$compositeKey = getCompositeKey($consumerSecret, null); //first request, no request token yet
$oauth_signature = base64_encode(hash_hmac('sha1', $baseString, $compositeKey, true)); //sign the base string
$oauth['oauth_signature'] = $oauth_signature; //add the signature to our oauth array

First we create our composite key by passing in the consumer secret that you received when you registered your application with Twitter.  We pass in null for the $requestToken argument because that’s what we’re going to retrieve.  Next we sign our base string.  Then add the $oauth_signature to our $oauth array to be used in our authorization header.  Now, let’s build us a header buddy…yee haw!

The header once constructed will need to have the following format:

Authorization: OAuth key=value, key=value, …

Once again, to make life easy, why don’t we build a function that constructs the Authorization header for us.

/**
 * Method for building the OAuth header.
 * @param array $oauth the oauth array.
 * @return string the authorization header.
**/
function buildAuthorizationHeader($oauth){
    $r = 'Authorization: OAuth '; //header prefix
 
    $values = array(); //temporary key=value array
    foreach($oauth as $key=>$value)
        $values[] = "$key=\"" . rawurlencode($value) . "\""; //encode key=value string
 
    $r .= implode(', ', $values); //reassemble
    return $r; //return full authorization header
}//end buildAuthorizationHeader()

This function simply takes the $oauth array and transforms it into the Authorization header format.  The only caveat here is that the values need to be run through rawurlencode (yet another gotcha).  Now all we really need to do is construct our header, place a call out to Twitter, and get the request token back.  To make the request across the line to Twitter, I used cURL.  PHP makes it easy to use, so let’s wrap it in a function that we’ll use for all Twitter calls going forward.  Now there’s a bit of a gotcha here that’s undocumented (of course): when you build the header you will also need to pass an “Expect:” header option, otherwise Twitter will complain and fail.  Something like:

/**
 * Method for sending a request to Twitter.
 * @param array $oauth the oauth array
 * @param string $baseURI the request URI
 * @return string the response from Twitter
**/
function sendRequest($oauth, $baseURI){
    $header = array( buildAuthorizationHeader($oauth), 'Expect:'); //create header array and add 'Expect:'
 
    $options = array(CURLOPT_HTTPHEADER => $header, //use our authorization and expect header
                     CURLOPT_HEADER => false, //don't retrieve the header back from Twitter
                     CURLOPT_URL => $baseURI, //the URI we're sending the request to
                     CURLOPT_POST => true, //this is going to be a POST - required
                     CURLOPT_RETURNTRANSFER => true, //return content as a string, don't echo out directly
                     CURLOPT_SSL_VERIFYPEER => false); //don't verify SSL certificate, just do it
 
    $ch = curl_init(); //get a channel
    curl_setopt_array($ch, $options); //set options
    $response = curl_exec($ch); //make the call
    curl_close($ch); //hang up
 
    return $response;
}//end sendRequest()

Please note: you may need to add CURLOPT_POSTFIELDS => '' depending on the version of PHP you are using. ~thanks to Thomas Krantz

So this function sends a request to Twitter (obviously): it constructs the header using the buildAuthorizationHeader() function and adds the ‘Expect:’ directive required to communicate with Twitter successfully.  We set the cURL options, make the call, hang up and return the result.  Now let’s put that into action.

$baseString = buildBaseString($baseURI, $oauth); //build the base string
 
$compositeKey = getCompositeKey($consumerSecret, null); //first request, no request token yet
$oauth_signature = base64_encode(hash_hmac('sha1', $baseString, $compositeKey, true)); //sign the base string
 
$oauth['oauth_signature'] = $oauth_signature; //add the signature to our oauth array
 
$response = sendRequest($oauth, $baseURI); //make the call

Now, if all went well your response should contain the “oauth_token” and the “oauth_token_secret”.  If not, you must have typed something wrong, keep trying until you get the token and secret handed back, or drop me a line and I’ll make sure I didn’t mistype something.

Now, take the “oauth_token” and the “oauth_token_secret” and store them.  We’re going to use them to make a call to get our authorization URL.
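Twitter hands the tokens back as a URL-encoded query string, so one quick way to pull the values out is PHP’s parse_str(); the $response value below is a placeholder standing in for what sendRequest() returns:

```php
//the raw response from sendRequest() looks something like this placeholder
$response = 'oauth_token=XXXX&oauth_token_secret=YYYY&oauth_callback_confirmed=true';

parse_str($response, $parts); //split the query string into an associative array

$oauth_token = $parts['oauth_token'];
$oauth_token_secret = $parts['oauth_token_secret'];
```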

Constructing The Authorization URL

To construct the authorization URL, we’re going to be using the request token we received back from Twitter in the previous section Getting The Request Token, the “oauth_token”.  The authorization url has the following format:

http://api.twitter.com/oauth/authorize?oauth_token=$oauth_token

Where $oauth_token is the actual value you received from Twitter, parameterized as “oauth_token”.  Redirect the browser to this URL and it will prompt the user for their Twitter username/password if they’re not logged in, then ask permission to authorize your registered application for use with their Twitter account.  You should see something similar to the following:

Twitter Connect Authorization

Once the user has authorized use, the browser will be redirected back to the “oauth_callback” URL you specified earlier in your request-token call.  Here is where you will receive the user’s “screen_name”, “user_id”, “oauth_token” and “oauth_token_secret”.  Store these values and use them to make calls to Twitter on behalf of the user.  Hopefully this has helped you get over the initial shell shock of the Twitter OAuth hurdle.  From this point forward, you’ll want to reference http://dev.twitter.com/pages/auth to find out more about making calls to Twitter resources on behalf of the user.
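One point worth calling out: for every call after the callback, the user’s “oauth_token” goes into the $oauth array and the “oauth_token_secret” must be included in the composite key, replacing the null we passed for the very first request. The secrets below are placeholders, and getCompositeKey() is repeated so the snippet stands alone:

```php
//getCompositeKey() from earlier, repeated so this snippet stands alone
function getCompositeKey($consumerSecret, $requestToken){
    return rawurlencode($consumerSecret) . '&' . rawurlencode($requestToken);
}

//placeholder secrets - substitute the values you stored after the callback
$consumerSecret = 'consumer-secret';
$oauth_token_secret = 'users-oauth-token-secret';

//after the callback the token secret goes in; compare with the null used for the first request
$compositeKey = getCompositeKey($consumerSecret, $oauth_token_secret);
echo $compositeKey; //consumer-secret&users-oauth-token-secret
```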

The source code for this example is available for download and use:

Download Project Files

PHP Daemons Tutorial

Foreword

The following is a tutorial for creating PHP scripted daemons.  All examples will be performed on CentOS linux.  The daemon code itself will be written in PHP with a BASH script wrapper.

Getting Started

So to get started, let’s create a simple PHP file that will be executed as our daemon. Essentially, we’re going to create your everyday, run-of-the-mill command line PHP file; we’ll worry about turning it into a daemon a little later.  For this example, let’s create a simple PHP script that puts text into a log file every second.  Create a file called “Daemon.php” and put the following in it:

#!/usr/bin/php
 
<?php
 
while(true){
    file_put_contents('/var/log/Daemon.log', 'Running...', FILE_APPEND);
    sleep(1);
}//end while
 
?>

Ok, that wasn’t so bad.  The first line tells the interpreter what to execute the file against, in this case we want the file to be interpreted as PHP.  Next we create a simple infinite loop that writes “Running…” to the “/var/log/Daemon.log” file, then sleeps for a second.  Now let’s test it, but first we need to make it executable.

user@computer:$ chmod a+x Daemon.php

Now let’s test it.

user@computer:$ ./Daemon.php

Now that the script is running let’s check the log file.  Open a new terminal and issue the following command to verify the output.

user@computer:$ tail -f /var/log/Daemon.log

If all has gone well you should see live updates that read “Running…Running…Running…”.

Now, let’s enhance the script a little to make it more user friendly.  Let’s add the ability to pass in command line arguments and display a help message.

#!/usr/bin/php
 
<?php
 
$log = '/var/log/Daemon.log';
 
/**
 * Method for displaying the help and default variables.
 **/
function displayUsage(){
    global $log;
 
    echo "\n";
    echo "Process for demonstrating a PHP daemon.\n";
    echo "\n";
    echo "Usage:\n";
    echo "\tDaemon.php [options]\n";
    echo "\n";
    echo "\toptions:\n";
    echo "\t\t--help display this help message\n";
    echo "\t\t--log=<filename> The location of the log file (default '$log')\n";
    echo "\n";
}//end displayUsage()
 
//configure command line arguments
if($argc > 0){
    foreach($argv as $arg){
        $args = explode('=',$arg);
        switch($args[0]){
            case '--help':
                return displayUsage();
            case '--log':
                $log = $args[1];
                break;
        }//end switch
    }//end foreach
}//end if
 
//the main process
while(true){
	file_put_contents($log, 'Running...', FILE_APPEND);
	sleep(1);
}//end while
 
?>

So now we have an elegant way to pass command line arguments or options to our daemon process, along with a nice display-usage function which you can use to brand your process with author info, etc.  Now that’s all fine and dandy, but what does that have to do with creating a PHP daemon?  Hold on, we’re getting to that.  Passing in command line flags easily and elegantly will come in handy once we get our daemon up and running.  So now that we have a basic daemon process, let’s actually get it working as a daemon.  To do this we’re going to need a BASH daemon launcher, which we’ll eventually put in the “/etc/init.d” directory.  So now, the BASH daemon controller.

BASH Daemon Controller

Let’s start by creating a file called “Daemon” which we’ll use to control our “Daemon.php” file.  So first I’ll give you the entire BASH script, then I’ll explain it.

#!/bin/bash
#
#	/etc/init.d/Daemon
#
# Starts the at daemon
#
# chkconfig: 345 95 5
# description: Runs the demonstration daemon.
# processname: Daemon
 
# Source function library.
. /etc/init.d/functions
 
#startup values
log=/var/log/Daemon.log
 
#verify that the executable exists
test -x /home/godlikemouse/Daemon.php || exit 0
RETVAL=0
 
#
#	Set prog, proc and bin variables.
#
prog="Daemon"
proc=/var/lock/subsys/Daemon
bin=/home/godlikemouse/Daemon.php
 
start() {
	# Check if Daemon is already running
	if [ ! -f $proc ]; then
	    echo -n $"Starting $prog: "
	    daemon $bin --log=$log
	    RETVAL=$?
	    [ $RETVAL -eq 0 ] && touch $proc
	    echo
	fi
 
	return $RETVAL
}
 
stop() {
	echo -n $"Stopping $prog: "
	killproc $bin
	RETVAL=$?
	[ $RETVAL -eq 0 ] && rm -f $proc
	echo
        return $RETVAL
}
 
restart() {
	stop
	start
}	
 
reload() {
	restart
}	
 
status_at() {
 	status $bin
}
 
case "$1" in
start)
	start
	;;
stop)
	stop
	;;
reload|restart)
	restart
	;;
condrestart)
        if [ -f $proc ]; then
            restart
        fi
        ;;
status)
	status_at
	;;
*)
 
echo $"Usage: $0 {start|stop|restart|condrestart|status}"
	exit 1
esac
 
exit $?

Ok, so the above BASH script is the daemon controller responsible for starting, stopping, restarting, etc.  Initially it sets itself up for use with chkconfig so that it can be set to start when the OS boots.  Next it includes the basic BASH functions file.  Afterward, we check that the executable file actually exists before we try to start it.  Next, we set up the default variables for our program, including the proc filename, the bin or executable, and the program name, and define some basic default parameters to pass to the PHP file when starting up.  You could also modify this script to read variables from an “/etc/Daemon” configuration file, but for now we’ll just set them directly in the BASH file.  Lastly, the PHP file is invoked via the daemon command.  The rest of the file simply reads the user’s input and handles start, restart, etc.  Next, let’s make the BASH Daemon file executable so we can use it in just a bit.

user@computer:$ chmod a+x Daemon

PHP Daemon

So now that we have our controller, we’ll need to modify our PHP script to work as a daemon process.  To do that we’ll need to fork, so we can establish a child process that continues to run while the parent returns a value to our controller letting it know we started properly. Here’s the modification to the PHP file.

#!/usr/bin/php
 
<?php
 
$log = '/var/log/Daemon.log';
 
/**
 * Method for displaying the help and default variables.
 **/
function displayUsage(){
    global $log;
 
    echo "\n";
    echo "Process for demonstrating a PHP daemon.\n";
    echo "\n";
    echo "Usage:\n";
    echo "\tDaemon.php [options]\n";
    echo "\n";
    echo "\toptions:\n";
    echo "\t\t--help display this help message\n";
    echo "\t\t--log=<filename> The location of the log file (default '$log')\n";
    echo "\n";
}//end displayUsage()
 
//configure command line arguments
if($argc > 0){
    foreach($argv as $arg){
        $args = explode('=',$arg);
        switch($args[0]){
            case '--help':
                return displayUsage();
            case '--log':
                $log = $args[1];
                break;
        }//end switch
    }//end foreach
}//end if
 
//fork the process to work in a daemonized environment
file_put_contents($log, "Status: starting up.n", FILE_APPEND);
$pid = pcntl_fork();
if($pid == -1){
	file_put_contents($log, "Error: could not daemonize process.n", FILE_APPEND);
	return 1; //error
}
else if($pid){
	return 0; //success
}
else{
    //the main process
    while(true){
        file_put_contents($log, 'Running...', FILE_APPEND);
        sleep(1);
    }//end while
}//end if
 
?>

You’ll notice that I’ve added a few more calls to file_put_contents() just to let us know how we’re doing.  Now for the guts of the operation: we call pcntl_fork() to generate a child process and let the parent caller return a value back to the BASH daemon controller.  The first if determines whether the fork call worked at all; if it returns -1, it’s a failure, so we report the error and return a failed status back to the controller.  If $pid contains a valid number, we’re good to go, but we’re still in the parent process, so we return 0 to let the controller know all is well in the universe.  The else executes in the child process, and this is where the main part of our program runs.

Now, if all has gone well we should be able to start the daemon in normal daemon fashion.  If you’re running this in your /home/{username} directory then execute the following:

user@computer:$ ./Daemon start

You can also copy the Daemon BASH script to the “/etc/init.d/” directory, in which case you can start the daemon using the “service” command:

user@computer:$ service Daemon start

Starting Daemon:              [ OK ]

Now to verify that everything is working correctly.  First let’s check the log file:

user@computer:$ tail -f /var/log/Daemon.log

Status: starting up.
Running…Running…Running…

Yep, all good there.  Now let’s check our process.

user@computer:$ ps ax | grep Daemon

14886 pts/0    S      0:00 /usr/bin/php /home/godlikemouse/Daemon.php --log=/var/log/Daemon.log
14944 pts/2    R+     0:00 grep Daemon

Ok, the process is running correctly.  Now, let’s stop it and verify again.  Issue the following command:

user@computer:$ ./Daemon stop

or if you copied the controller to the “/etc/init.d” directory

user@computer:$ service Daemon stop

Stopping Daemon:              [ OK ]

Now let’s verify that our process has stopped:

user@computer:$ ps ax | grep Daemon

14997 pts/2    R+     0:00 grep Daemon

Yep, all good…and there you have it.  Now you can write PHP daemons until you turn blue and pass out.  Hopefully this tutorial has been helpful, good luck.

The files used in this tutorial are available for download and use:

Download Project Files

Quick Start Solr

Foreword

The following is a quick start guide for getting Solr configured, up and running in a few minutes.  All of my examples will be performed on CentOS Linux.

Getting Started

First things first, make sure you have Java installed and ready.  Next download a Solr release:

http://www.apache.org/dyn/closer.cgi/lucene/solr/

Then extract it:

user@computer:$ tar -xpf apache-solr-1.4.1.tgz

This is just preference, but I like to work off of a short name by creating a symbolic link to the full version directory like so:

user@computer:$ ln -s apache-solr-1.4.1 apache-solr

Now we’re going to need a solid web server; I suggest using something like Tomcat or another comparable Java server.  Download Tomcat at:

http://tomcat.apache.org/download-60.cgi

Then extract it:

user@computer:$ tar xpf apache-tomcat-6.0.29.tar.gz

Once again, not a requirement but for personal ease of use, I create a symbolic link.

user@computer:$ ln -s apache-tomcat-6.0.29 apache-tomcat

Next let’s copy out the WAR file to tomcat:

user@computer:$ cp apache-solr/example/webapps/solr.war apache-tomcat/webapps/

Now we need to copy out an example Solr configuration directory:

user@computer:$ cp -R apache-solr/example/solr .

Good, now that’s it for the prep work. Now all we have to do is configure Solr for our specific needs.

Configuring Solr

First let’s set up the configuration file that tells Solr where to store its data files.  Open solr/conf/solrconfig.xml in an editor:

user@computer:$ vi solr/conf/solrconfig.xml

Find the section that looks like this:

<!-- Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration. -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>

Change the <dataDir> contents to point to where the data will be stored.  In our case use the location of the solr directory.  Something like:

<dataDir>/home/godlikemouse/solr/data</dataDir>

Otherwise, when you start Tomcat, it will just create the data directory inside whatever your current directory happens to be.  Not a good idea.

Now it’s on to the fun part.  We need to tell Solr how to index our data.  That is, what form to store the data in.  Open solr/conf/schema.xml in an editor.

user@computer:$ vi solr/conf/schema.xml

Now look for this section:

<fields>
<!-- Valid attributes for fields:
name: mandatory - the name for the field
type: mandatory - the name of a previously defined type from the
<types> section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
compressed: [false] if this field should be stored using gzip compression
(this will only apply if the field type is compressable; among
the standard field types, only TextField and StrField are)
multiValued: true if this field may contain multiple values per document
omitNorms: (expert) set to true to omit the norms associated with
this field (this disables length normalization and index-time
boosting for the field, and saves some memory).  Only full-text
fields or fields that need an index-time boost need norms.
termVectors: [false] set to true to store the term vector for a
given field.
When using MoreLikeThis, fields used for similarity should be
stored for best performance.
termPositions: Store position information with the term vector.
This will increase storage costs.
termOffsets: Store offset information with the term vector. This
will increase storage costs.
default: a value that should be used if no value is specified
when adding a document.
-->

This is going to be a bit confusing if this is your first time dealing with Solr, but just hang in there, it’s not really as bad as it seems.

Inside the <fields> node is where we will define how and what data will be stored.  Let’s say for instance that all you wanted to store was:

id
first_name
last_name

Say, fields from a database user table, for instance.  In that case we could remove all of the current field definitions, that is, all of the <field name="..." /> nodes, and replace them with our own.  In this case what we’d want is:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="first_name" type="string" indexed="true" stored="true" required="true" />
<field name="last_name" type="string" indexed="true" stored="true" required="true" />

Here’s what the above defines:

Each node defines a field to be used by Solr (obviously).
The “name” attribute tells us, well…the name.
The “type” attribute specifies the type of the data.
The “indexed” attribute determines whether or not the field is searchable.  We can actually create fields that get kept by Solr, but are not searchable.
The “stored” attribute tells Solr to keep the data when it’s received.
The “required” attribute tells Solr to reject any document that does not contain all of the required fields.  In other words, anything we mark as required had better be there.

Now that we’ve defined what our data will look like, we need to create a simple search field that aggregates all of the values together for searching purposes.  What we have so far lets us search specifically on “first_name”, “last_name” or “id”, but what if we don’t yet know which field we’re searching?  Let’s create a field that we can copy the first and last name into to make the data more broadly searchable.  To do this create the following node:

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

All of the concatenated search values will be stored in this “text” field.  Now if you scroll down in your editor a bit, you’ll see this:

<!-- Field to use to determine and enforce document uniqueness.
Unless this field is marked with required="false", it will be a required field
-->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>

This was already set up for us.  It tells Solr to use the “id” field as the unique index for all the records we are going to send to it.  It also tells Solr that if we don’t specify a specific field to search, it should just use the “text” field we defined.

Alright, now let’s tell Solr to copy the “first_name” and “last_name” fields into our searchable “text” field.  Scroll down a bit in your editor and find the <copyField> nodes similar to this:

<copyField source="cat" dest="text"/>
<copyField source="name" dest="text"/>
<copyField source="manu" dest="text"/>
<copyField source="features" dest="text"/>
<copyField source="includes" dest="text"/>
<copyField source="manu" dest="manu_exact"/>

Go ahead and remove all of those, we’re going to define our own.  We only want our “first_name” and “last_name” fields to get copied into our default searchable “text” field:

<copyField source="first_name" dest="text"/>
<copyField source="last_name" dest="text"/>

Now we’re good to go, let’s spin up the server.

user@computer:$ ./apache-tomcat/bin/catalina.sh run

You should see a bunch of output from Tomcat.  The “run” argument tells Tomcat to launch in single-process debug mode.  If you have made any errors up to this point, Tomcat will show them to you.  Hopefully you haven’t, but if you have, just read through the output carefully and correct any mistakes as you find them.

Assuming all went well, you can open up a browser and go to the admin page for Solr at: http://localhost:8080/solr

Solr Admin Welcome Screen

Once you see the “Welcome to Solr!” screen, you know you’ve arrived. 🙂

The only thing left to do is to start populating your Solr instance with data and try searching it.  Shall we?

Data Importing

There are a number of ways to import data into Solr, one of the easiest is to send and post directly to Solr.  For this example, let’s build a simple PHP XML command line script called “update-solr.php”:

#!/usr/bin/php
<?php
/**
 * Simple general purpose function for updating Solr
 * @param string $url Solr URL for updating data
 * @param string $postXml XML containing update information
 * @return string Solr response
 */
function updateSolr($url, $postXml){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postXml);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
 
    $header[] = "MIME-Version: 1.0";
    $header[] = "Content-type: text/xml; charset=utf-8";
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
 
    $r = curl_exec($ch);
    if(!$r)
        echo "Error: " . curl_error($ch) . "\n";
    curl_close($ch);
    return $r;
}//end updateSolr()
 
// Build a simple XML string to add a new record.
// PHP command line parameters are received in the $argv array:
// $argv[1] = Solr update URL, $argv[2] = id, $argv[3] = first name, $argv[4] = last name.
$xml = "
<add>
<doc>
<field name=\"id\">$argv[2]</field>
<field name=\"first_name\">$argv[3]</field>
<field name=\"last_name\">$argv[4]</field>
</doc>
</add>";
 
updateSolr($argv[1], $xml);
 
// Lastly let's commit our changes
updateSolr($argv[1], "<commit />");

The above command line PHP script should be pretty self explanatory, but just in case it is not, here’s the explanation.  The “updateSolr” function makes the call to Solr and posts the XML to it.  The record’s values are passed via the command line upon invocation.  So let’s try making an update:

First let’s make the script executable:

user@computer:$ chmod a+x update-solr.php

Next let’s update Solr with a few test records:

user@computer:$ ./update-solr.php http://localhost:8080/solr/update 1 John Doe
user@computer:$ ./update-solr.php http://localhost:8080/solr/update 2 Jane Doe
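If you need to load more than a couple of records, note that Solr accepts multiple <doc> nodes inside a single <add>, so a whole batch can be posted in one request followed by a single commit.  A hypothetical helper for building that batch XML (buildAddXml is an illustrative name, not part of the script above):

```php
<?php
// Build one <add> block containing a <doc> per record, so a whole
// batch can be sent to Solr in a single post.
function buildAddXml(array $records) {
    $xml = "<add>";
    foreach ($records as $record) {
        $xml .= "<doc>";
        foreach ($record as $name => $value) {
            // Escape the value so characters like & or < don't break the XML.
            $xml .= '<field name="' . $name . '">' . htmlspecialchars($value) . '</field>';
        }
        $xml .= "</doc>";
    }
    return $xml . "</add>";
}

$records = array(
    array('id' => 1, 'first_name' => 'John', 'last_name' => 'Doe'),
    array('id' => 2, 'first_name' => 'Jane', 'last_name' => 'Doe'),
);
echo buildAddXml($records), PHP_EOL;
```

The result would then be posted with the same updateSolr function from the script above, followed by one updateSolr($url, "<commit />") call.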

Next let’s view what’s been imported into Solr so far.  Open a browser and go to http://localhost:8080/solr, next click the “Solr Admin” link.

Solr Admin Page

Next replace the Query String field (currently displaying “solr”) with a value to search for, something like “John” or “Jane”.  Then click the “Search” button and the results are displayed as XML.  That’s all there is to it.  If you’d like to try searching from your application with a GET request, simply copy the URL currently displayed in the address bar for the search results and modify it as you see fit.
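If you’d rather issue that GET from PHP than from the browser, here is a minimal sketch.  The /solr/select path and the q, rows and wt parameters are standard Solr search parameters; the function name itself is just for illustration:

```php
<?php
// Build a Solr select URL for a given query string.
function buildSolrSearchUrl($baseUrl, $query, $rows = 10) {
    $params = http_build_query(array(
        'q'    => $query,      // falls back to defaultSearchField ("text") when no field is given
        'rows' => $rows,       // maximum number of results to return
        'wt'   => 'standard',  // response writer; "standard" returns XML
    ));
    return rtrim($baseUrl, '/') . '/select?' . $params;
}

// A search for "John" against the local instance from this tutorial:
echo buildSolrSearchUrl('http://localhost:8080/solr', 'John'), PHP_EOL;
```

Fetching the results is then a single call, e.g. $xml = file_get_contents($url); or the same cURL pattern used in updateSolr.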

Using Hadoop And PHP

Getting Started

So first things first.  If you haven’t used Hadoop before you’ll first need to download a Hadoop release and make sure you have Java and PHP installed.  To download Hadoop head over to:

http://hadoop.apache.org/common/releases.html

Click on download a release and choose a mirror.  I suggest choosing the most recent stable release.  Once you’ve downloaded Hadoop, unzip it.

user@computer:$ tar xpf hadoop-0.20.2.tar.gz

I like to create a symlink to the hadoop-<release> directory to make things easier to manage.

user@computer:$ ln -s hadoop-0.20.2 hadoop

Now you should have everything you need to start creating a Hadoop PHP job.

Creating The Job

For this example I’m going to create a simple Map/Reduce job for Hadoop.  Let’s start by understanding what we want to happen.

  1. We want to read from an input system – this is our mapper
  2. We want to do something with what we’ve mapped – this is our reducer

At the root of your development directory, let’s create another directory called script.  This is where we’ll store our PHP mapper and reducer files.

user@computer:$ ls

.
..
hadoop-0.20.2
hadoop-0.20.2.tar.gz
hadoop
user@computer:$ mkdir script

Now let’s begin creating our mapper script in PHP.  Go ahead and create a PHP file called mapper.php under the script directory.

user@computer:$ touch script/mapper.php

Now let’s look at the basic structure of a PHP mapper.

#!/usr/bin/php
<?php
//this can be anything from reading input from files, to retrieving database content, soap calls, etc.
//for this example I'm going to create a simple php associative array.
$a = array(
'first_name' => 'Hello',
'last_name' => 'World'
);
//it's important to note that anything written to STDOUT becomes this mapper's output and is handed to the reducer.
//it's also important to note: do not forget to end every record sent to STDOUT with a PHP_EOL, this will save you a lot of pain.
echo serialize($a), PHP_EOL;
?>

So this example is extremely simple.  Create a simple associative array and serialize it.  Now onto the reducer.  Create a PHP file in the script directory called reducer.php.

user@computer:$ touch script/reducer.php

Now let’s take a look at the layout of a reducer.

#!/usr/bin/php
 
<?php
 
//Remember when I said anything sent to STDOUT in our mapper would go to the reducer?
//Well, now we read from STDIN to get the result of our mapper.
//iterate all lines of output from our mapper
while (($line = fgets(STDIN)) !== false) {
    //remove leading and trailing whitespace, just in case 🙂
    $line = trim($line);
    //now recreate the array we serialized in our mapper
    $a = unserialize($line);
    //Now, we do whatever we need to with the data.  Write it out again so another process can pick it up,
    //send it to the database, soap call, whatever.  In this example, just change it a little and
    //write it back out.
    $a['middle_name'] = 'Jason';
    //do not forget the PHP_EOL
    echo serialize($a), PHP_EOL;
}//end while
?>

So now we have a very simple mapper and reducer ready to go.
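Since streaming simply connects the mapper’s STDOUT to the reducer’s STDIN, you can sanity-check the pair before involving Hadoop at all.  Below is a rough in-process sketch of that pipe, using the same serialize/unserialize framing as the two scripts above; the function names are illustrative only:

```php
<?php
// Emulate what streaming does: the mapper writes one serialized record
// per line, the reducer reads lines back, unserializes, modifies,
// and re-serializes them.
function runMapper() {
    $a = array('first_name' => 'Hello', 'last_name' => 'World');
    return serialize($a) . PHP_EOL;   // what mapper.php sends to STDOUT
}

function runReducer($input) {
    $out = '';
    foreach (explode(PHP_EOL, trim($input)) as $line) {
        $a = unserialize(trim($line));
        $a['middle_name'] = 'Jason';  // same tweak as reducer.php
        $out .= serialize($a) . PHP_EOL;
    }
    return $out;
}

echo runReducer(runMapper());
```

On the command line, the equivalent check is simply piping one script into the other: ./script/mapper.php | ./script/reducer.php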

Execution

So now let’s run it and see what happens.  But first, a little prep work.  We need to specify the input directory that will be used when the job runs.

user@computer:$ mkdir input
user@computer:$ touch input/conf

Ok, that was difficult.  We have an input directory and we’ve created an empty conf file.  The empty conf file is just something that the mapper will use to get started; for now, don’t worry about it.  Now let’s run this bad boy.  Make sure you have JAVA_HOME set (usually /usr); you can set it by running export JAVA_HOME=/usr.

user@computer:$ hadoop/bin/hadoop jar hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -mapper script/mapper.php -reducer script/reducer.php -input input/* -output output

So here’s what the command does.  The first part executes the Hadoop launch script.  The “jar” argument tells Hadoop to run a jar, in this case “hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar”.  Next we pass the mapper and reducer arguments to the job and specify the input and output directories.  If we wanted to, we could pass configuration information or files to the mapper via the input directory; we would then use the same line-reading structure in the mapper that we used in the reducer to get that information.  For this example, though, we pass nothing.  The output directory will contain the output of our reducer; if everything works out correctly, it will contain the PHP serialized form of our modified $a array.  If all goes well you should see something like this:

user@computer:$ hadoop/bin/hadoop jar hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -mapper script/mapper.php -reducer script/reducer.php -input input/* -output output

10/12/10 12:53:56 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/12/10 12:53:56 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
10/12/10 12:53:56 INFO mapred.FileInputFormat: Total input paths to process : 1
10/12/10 12:53:56 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
10/12/10 12:53:56 INFO streaming.StreamJob: Running job: job_local_0001
10/12/10 12:53:56 INFO streaming.StreamJob: Job running in-process (local Hadoop)
10/12/10 12:53:56 INFO mapred.FileInputFormat: Total input paths to process : 1
10/12/10 12:53:56 INFO mapred.MapTask: numReduceTasks: 1
10/12/10 12:53:56 INFO mapred.MapTask: io.sort.mb = 100
10/12/10 12:53:57 INFO mapred.MapTask: data buffer = 79691776/99614720
10/12/10 12:53:57 INFO mapred.MapTask: record buffer = 262144/327680
10/12/10 12:53:57 INFO streaming.PipeMapRed: PipeMapRed exec [/root/./script/mapper.php]
10/12/10 12:53:57 INFO streaming.PipeMapRed: MRErrorThread done
10/12/10 12:53:57 INFO streaming.PipeMapRed: Records R/W=0/1
10/12/10 12:53:57 INFO streaming.PipeMapRed: MROutputThread done
10/12/10 12:53:57 INFO streaming.PipeMapRed: mapRedFinished
10/12/10 12:53:57 INFO mapred.MapTask: Starting flush of map output
10/12/10 12:53:57 INFO mapred.MapTask: Finished spill 0
10/12/10 12:53:57 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
10/12/10 12:53:57 INFO mapred.LocalJobRunner: Records R/W=0/1
10/12/10 12:53:57 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/12/10 12:53:57 INFO mapred.LocalJobRunner:
10/12/10 12:53:57 INFO mapred.Merger: Merging 1 sorted segments
10/12/10 12:53:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 70 bytes
10/12/10 12:53:57 INFO mapred.LocalJobRunner:
10/12/10 12:53:57 INFO streaming.PipeMapRed: PipeMapRed exec [/root/./script/reducer.php]
10/12/10 12:53:57 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
10/12/10 12:53:57 INFO streaming.PipeMapRed: Records R/W=1/1
10/12/10 12:53:57 INFO streaming.PipeMapRed: MROutputThread done
10/12/10 12:53:57 INFO streaming.PipeMapRed: MRErrorThread done
10/12/10 12:53:57 INFO streaming.PipeMapRed: mapRedFinished
10/12/10 12:53:57 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
10/12/10 12:53:57 INFO mapred.LocalJobRunner:
10/12/10 12:53:57 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
10/12/10 12:53:57 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/root/output
10/12/10 12:53:57 INFO mapred.LocalJobRunner: Records R/W=1/1 > reduce
10/12/10 12:53:57 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
10/12/10 12:53:57 INFO streaming.StreamJob:  map 100%  reduce 100%
10/12/10 12:53:57 INFO streaming.StreamJob: Job complete: job_local_0001
10/12/10 12:53:57 INFO streaming.StreamJob: Output: output

If you get errors where it’s complaining about the output directory, just remove the output directory and try again.

Result

Once you’ve got something similar to the above and no errors, we can check out the result.

user@computer:$ cat output/*

a:3:{s:10:"first_name";s:5:"Hello";s:9:"last_name";s:5:"World";s:11:"middle_name";s:5:"Jason";}

There we go, a serialized form of our modified PHP array $a.  That’s all there is to it.  Now, go forth and Hadoop.

Encryption and Decryption Between .NET and PHP

I recently worked on a project that required encryption and decryption by and between .NET and PHP. By default, the two technologies don’t mesh very well. Since the data was originally being encrypted and decrypted by .NET, I had to write PHP code that worked with the encryption schema being used. One of the main problems I ran into was the use of padding, in my case PKCS7, which .NET uses by default. The first thing to do was to make sure the encryption schemas were the same. For example, when using DES, the .NET default mode is CBC, which corresponds to MCRYPT_MODE_CBC in PHP. Once that was settled, I could initialize the mcrypt libraries.

$module = mcrypt_module_open(MCRYPT_DES, '', MCRYPT_MODE_CBC, '');
 
if($module === false)
    die("DES module could not be opened");
 
$blockSize = mcrypt_get_block_size(MCRYPT_DES, MCRYPT_MODE_CBC);

The $blockSize variable is used later for padding and padding removal using pkcs7. Next to encrypt data I had to implement the following:

//encryption
$key = substr($key, 0, 8);
 
$iv = $key;
$rc = mcrypt_generic_init($module, $key, $iv);
 
//apply pkcs7 padding
$value_length = strlen($value);
$padding = $blockSize - ($value_length % $blockSize);
$value .= str_repeat( chr($padding), $padding);
 
$value = mcrypt_generic($module, $value);
$value = base64_encode($value);
mcrypt_generic_deinit($module);

//value now encrypted

Basically, the encryption scheme the .NET side was using was: set the IV to the key, pad the data, encrypt it, then base64 encode it. (Reusing the key as the IV is not good cryptographic practice, but matching the existing .NET behavior was the requirement here.) So here I’ve done the same thing in PHP. Next I needed to do the exact same thing in reverse for decryption:

//Decryption
$key = substr($key, 0, 8);
$iv = $key;
$rc = mcrypt_generic_init($module, $key, $iv); 
 
$value = base64_decode($value);
$value = mdecrypt_generic($module, $value); 
 
//apply pkcs7 padding removal
$padding = ord($value[strlen($value) - 1]);
if($padding && $padding < $blockSize){
    for($P = strlen($value) - 1; $P >= strlen($value) - $padding; $P--){
        if(ord($value[$P]) != $padding){
            $padding = 0;
        }//end if
    }//end for
}//end if

$value = substr($value, 0, strlen($value) - $padding);
 
mcrypt_generic_deinit($module); 
 
//value now decrypted

This is basically the same as encryption, but in reverse. The only real difference is the PKCS7 padding removal. Hopefully this tidbit helps a few others out there who run into encryption and decryption issues between .NET and PHP.
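For reference, the pad and unpad steps can be pulled out into two small helpers (illustrative names, with the block size passed in, matching the $blockSize obtained from mcrypt earlier). Note that this version also accepts a full block of padding, i.e. a padding byte equal to the block size, which PKCS7 produces when the plaintext length is an exact multiple of the block size:

```php
<?php
// Apply PKCS7 padding: append N bytes, each of value N, where N is
// the distance to the next block boundary (a full extra block when
// the input is already block-aligned).
function pkcs7Pad($value, $blockSize) {
    $padding = $blockSize - (strlen($value) % $blockSize);
    return $value . str_repeat(chr($padding), $padding);
}

// Remove PKCS7 padding: read the last byte, verify that many trailing
// bytes all carry that value, then strip them.
function pkcs7Unpad($value, $blockSize) {
    $padding = ord($value[strlen($value) - 1]);
    if ($padding < 1 || $padding > $blockSize) {
        return $value; // not valid padding, leave the value untouched
    }
    for ($i = strlen($value) - $padding; $i < strlen($value); $i++) {
        if (ord($value[$i]) !== $padding) {
            return $value; // malformed padding byte, leave untouched
        }
    }
    return substr($value, 0, strlen($value) - $padding);
}
```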