:: urbansheep (urbansheep) wrote,
:: urbansheep

[ QU ] phpHoo Tutorial



Jonathan Eisenzopf has done a great job showing how easy it can be to create a Yahoo like directory using Perl. perlHoo's elegant design makes for a very useful Web application that adds significant value to a Web site, while maintaining a high degree of simplicity and ease of maintenance.

Laurence Gold has shown us how easy it can be to create the same simplicity and elegance using ASP technology. Although the end results (perlHoo and Yahasp) are models of simplicity, they both have made design decisions that, ultimately, lead to the failure of both of these programs as viable long-term solution to the "Yahoo like directory" goal. This tutorial will address and solve these limitations by using PHP and a relational database. (MySQL). In part I of this tutorial, we'll start out with the basics of using MySQL with PHP, and create a program with the same functionality found in part II of the perlHoo tutorial. We'll also be adding a few new features.


Designing phpHoo

A program's initial design is probably the most critical phase of programming. When you sit down to write a program, you must take the time to design it correctly from the very beginning. Take a look at the design decisions of perlHoo and Yahasp.

The design criteria was as follows:

  1. Organizes information hierarchically
  2. Contains information about resources on the Internet
  3. Allows users to suggest new resources
  4. Allows editors to modify the directory content
  5. Is simple to maintain
  6. Takes less than 1 week to develop
  7. Exports and imports directory content from/to multiple sources
The perlHoo initial design solution to accomplish goal #1 was to create a directory and file structure, where each Category became a directory on the file system, and links were stored in text files. This solution is not scalable and is not feasible for sites with limited resources.
  1. Your provider may place a cap on the number of files you are allowed to have. (Quotas) The number of files and directories your programs require now become very important.
  2. Regardless of how the program is written or what language you use, file system access will always be the single most resource intensive operation in your program. Your program performance will be directly proportional to the number of file system calls it makes. The more files accessed by the program, the slower it gets. Guaranteed. Creating a program designed around file system calls is a recipe for disaster.
  3. The more useful your directory becomes (more categories with more links), the slower it will be. (More files and directories to search through)
And that is the magic word, isn't it? Search. Program design becomes even more important when you consider the ultimate goal of your program - to eventually implement some means of searching this directory for the information you want. Face it, Yahoo's site would be virtually useless without the ability to search it. Using flat-text files and dozens of directories are no longer viable options when you consider this goal.

phpHoo Download source | Try It
phpHoo is a PHP version of PerlHoo. It is a bit more advanced and requires the MySQL database.


This tutorial will cover the basics of integrating PHP with a database, in this case, MySQL. MySQL is arguably the fastest SQL server available, and the price can't be beat: It's free for non-commercial purposes. (See the MySQL license for more details). It's my personal favorite amongst SQL engines when performance and development time are factors. The original perlHoo was developed in less than a week. phpHoo was developed to match perlHoo part I in -three hours-, and exceeded perlHoo part II in less than 6 hours over the course of 2 days.


Notes about this Tutorial

PHP is a more structured language than Perl. As such, a linear discussion of phpHoo's internals is not possible. perlHoo is a "top down" linear program. The program begins executing on line one and progresses through the code until the end, with required subroutines at the bottom of the file. PHP is not a linear language. It requires that all user defined functions and methods be defined before they can be called, which means that they appear at the beginning of the source file and the logic which uses them is at the bottom of the file. phpHoo is also a much larger tutorial with a more complex program design. We're covering a lot of topics here, from basic MySQL functionality and database design concepts to PHP constructs, Classes and Object Oriented programming. We'll begin by creating the MySQL database and a Class file to work with that database. After we've done all of that, we can actually sit down and work on phpHoo's internals.


Getting PHP and MySQL installed

This is a usage tutorial and as such, instructions for the installation of PHP or MySQL is beyond the realm of this tutorial. PHP and MySQL will both work on Win32 or Unix platforms. MySQL can be obtained from the MySQL home at http://www.mysql.com. PHP can be obtained at http://www.php.net/. Both sites include pointers and instructions on how to install the software and common problems associated with their install or usage. There are also web archives of the PHP and MySQL mailing lists, which can be a treasure trove of information on how to install and use the software. One searchable site that archives all the major PHP and MySQL mailing lists is at Progressive-Comp, http://www.progressive-comp.com/Lists/.


MySQL - Creating our database

The first thing we'll need to do with MySQL is create our database. The easiest way to do this is to first connect to the MySQL server:

mysql -u username -p
Enter password: *******
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 387 to server version: 3.21.32a

Type 'help' for help.


You should now be at the MySQL prompt. If you receive an error at this point, it may be that you entered your database username or password incorrectly. Check with your server administrator to resolve this problem.

To create our database, issue the create command:

mysql> create database phpHoo;
Query OK, 1 row affected (0.00 sec)

If you do not receive the "Query OK" response, check the syntax of your create command. If you've entered it correctly but still receive errors, it may be that you have not been granted "Create" permissions in the database permissions table. If this is the case, you'll need to discuss the problem with your server administrator. Teaching you SQL administrative tasks is beyond the scope of this tutorial, so we'll assume your database is created.

Now if you log off (quit) the MySQL server and want to access this database directly, the command would be:

mysql -u username -p phpHoo

For now, let's use the same session. To tell MySQL to use the new database, issue the 'use' command:

mysql> use phpHoo;
Database changed

MySQL will use the phpHoo database for all future queries and operations. We are now ready to start creating our tables. In relational databases, data is stored in a table->column->row->element format. It can best be illustrated as a grid:

TABLE    -----------------------------------------
COLUMN   -   CatID  -  CatName   -   CatParent   -
ROW 0    -     1    -    Art     -      NULL     -
ROW 1    -     2    -  Computers -      NULL     -
ROW 2    -     3    -  Religion  -      NULL     -
etc..    -----------------------------------------

This is a graphic (albeit poor) representation of the Table->Column->Row->Element data structure. In this case, it's the Categories table that we're about to create. The category name of row 1 of this structure would look like this:

    Table       ->    Column    ->    Row #    ->    Element
    Categories  ->    CatName   ->    Row 1    ->    Computers

Now that you have a clearer idea of how our data is going to be represented, we need to design and create the structure of our Categories table. Here is the CREATE statement we'll use:

    mysql> create table Categories (
           CatID bigint(21) NOT NULL auto_increment,
           CatName varchar(32) NOT NULL,
           CatParent bigint(21),
           PRIMARY KEY(CatID),
           UNIQUE (CatName)
    Query OK, 0 rows affected (0.00 sec)
This table, named Categories, will have 3 Columns, a unique numeric identifier for this category ( CatID ), A unique variable length string for the category name ( CatName ), and a numeric identifier for this categories "Parent Category" ( CatParent ). Let's take a look at the reasons behind the decisions made here.

Since the Category listing is going to be hierarchical, there are going to be Categories with sub-categories. Those sub-categories are quite possibly going to have more sub-categories. Each category needs to have it's own unique identifier AND sub-categories need to keep track of their parent categories.

CatID bigint(21) NOT NULL auto_increment,

This defines a column named CatID. The element for this column will be a big integer up to 21 bytes (digits) in length. 'NOT NULL' tells MySQL that this element can never be empty. 'auto_increment' tells MySQL to automatically increment this ID number by 1 every time we add a new category. So the first category we add is going to be CatID #1, the second will automatically be CatID #2 etc.

CatName varchar(32) NOT NULL,

Each category needs to have it's own name. 'varchar(32)' tells MySQL that this element will be a variable length string, with a maximum of 32 characters. If you feel this is too short for your Category names, feel free to make it a larger size. Once again, we use 'NOT NULL' to tell MySQL that this element can never be empty. (This is a bit of a misnomer, since MySQL secretly ignores the "NOT NULL" directive on string columns, but it's still a good practice to specify it as NOT NULL since not all SQL servers handle string columns in this fashion.. now back to our regularly scheduled program...)

CatParent bigint(21),

This is where we store this Categories "Parent" category ID. In our working example, the "Computers" category has several sub-categories. (Dynamic HTML, E-Commerce, etc..) When we enter these sub-categories, the database needs to know that 'E-Commerce' is a "child" to 'Computers', so we use the 'CatParent' column to keep track of these category associations. Notice with this entry, we've omitted the 'NOT NULL'? This is because in our hierarchy, we're going to have several Categories that are at the "Top" level, and won't have a CatParent. For these categories, we'll assign a NULL value to the CatParent column to show that these are Top Level categories.


Although database indexing is beyond the scope of this tutorial, it's a good idea to always assign a PRIMARY KEY to your tables. This tells MySQL what element in the table is the item that other tables and associations will key off of. The element that you assign to be the Primary Key must always be unique, so we'll assign it to the CatID, since that's automatically incremented by MySQL and will never have more than one entry for each Category.

UNIQUE(CatName) );

The last item in our table tells MySQL that we never want to duplicate the 'CatName' column. You may or may not want to do this in your own table, but having two Categories with the same name is going to lead to problems. Let's avoid that by making it UNIQUE. Now if you try to add more than one category with the same name, MySQL will refuse to enter it. This completes the structure of our 'Categories' table, let's move on to the second table in our database.


The Links Table

Let's create a table to hold all of our Link data. First, the actual command:
    mysql> create table Links (
           LinkID bigint(21) NOT NULL auto_increment,
           CatID bigint(21) NOT NULL,
           Url varchar(255) NOT NULL,
           LinkName varchar(64) NOT NULL,
           Description varchar(255) NOT NULL,
           PRIMARY KEY (LinkID),
           UNIQUE (Url)

Most of this you should be able to discern for yourself. Every link in the database will have it's own unique identifier (LinkID). Every link will be associated with a Category ID (CatID). Each Link will have a Url, LinkName, and Description using variable length strings. The primary key for this table is the LinkID, and we set the Url to be UNIQUE so that no two entries in the database can contain the same Url.

Something to keep in mind when designing your tables. Don't try to think of everything you might need in your data structure at the outset. Your data structures are always going to change while you're programming as you think of things you need in the database. You can come back and modify these tables at any time, and in fact, we're going to be changing the Links table later in this tutorial.

Our data structures are completed, so let's give them some data to work with. (Server's responses omitted)

mysql> insert into Categories (CatName) values ('Art');
mysql> insert into Categories (CatName) values ('Computers');
mysql> insert into Categories (CatName) values ('Religion');

This creates 3 top level categories. Since we did not specify a 'CatParent' for these entries, the CatParent entry was set to NULL automatically. We still need to create some sub-categories under the 'Computers' category but before we do so, we need to know the 'Computers' category ID number:

mysql> select * from Categories;
| CatID | CatName      | CatParent |
|     1 | Art          |      NULL |
|     2 | Computers    |      NULL |
|     3 | Religion     |      NULL |
3 rows in set (0.01 sec)

So 'Computers' is Category ID number 2. We can now assign the CatParent of the sub-categories:

mysql> insert into Categories (CatName,CatParent) values ('Dynamic HTML',2);

Let's see if everything is still OK:

mysql> select * from Categories;
| CatID | CatName      | CatParent |
|     1 | Art          |      NULL |
|     2 | Computers    |      NULL |
|     3 | Religion     |      NULL |
|     4 | Dynamic HTML |         2 |
4 rows in set (0.01 sec)

Let's fill in the rest of the "Computers" sub-categories:

mysql> insert into Categories (CatName,CatParent) values ('E-Commerce',2);
mysql> insert into Categories (CatName,CatParent) values ('Graphics',2);
mysql> insert into Categories (CatName,CatParent) values ('Internet',2);
mysql> insert into Categories (CatName,CatParent) values ('Javascript',2);
mysql> insert into Categories (CatName,CatParent) values ('Perl',2);

If we want to know what sub-categories are under Computers, we could do this:

mysql> select * from Categories where CatParent = 2;
| CatID | CatName      | CatParent |
|     4 | Dynamic HTML |         2 |
|     5 | E-Commerce   |         2 |
|     6 | Graphics     |         2 |
|     7 | Internet     |         2 |
|     8 | Javascript   |         2 |
|     9 | Perl         |         2 |
6 rows in set (0.00 sec)

To find the names of all the "Top" level categories, we need to write our select statement to handle NULL values. We can't use "where CatParent = NULL" because it's an oxymoron. ("Something" can never equal "Nothing") So we need to use the "IS NULL" construct:

mysql> select * from Categories where CatParent IS NULL;
| CatID | CatName   | CatParent |
|     1 | Art       |      NULL |
|     2 | Computers |      NULL |
|     3 | Religion  |      NULL |
3 rows in set (0.01 sec)

Now that we have some data in the database to work with, let's get down to writing our program to access this data.


Enter PHP

PHP is an embedded scripting language similar to ASP in that the programs are written as HTML pages. The code in the page is executed along with any HTML parsing. One of PHP's major strengths is it's ability to easily integrate with many different SQL servers. The most popular pairing is PHP+MySQL. The other major advantage of PHP is that it was specifically designed from the ground up to work on the Web. Remember, this program was developed to match perlHoo part I in less than 3 hours? This is because PHP makes it very easy to work on the web, as you're about to see.

Classes and Object Oriented Programming

The Class, as defined in most object oriented languages, is a mechanism by which a programmer expresses the design and manipulation of a custom data type. In this case, the data type our class describes and manipulates is our phpHoo database. The Class's combined variables and methods, is known as an Object. The concepts behind Object Oriented programming can get very confusing for new programmers, so let's give a real world analogy to OOP.

The next time you walk into a video store, start thinking of it as a giant sized Object. The store is a living, breathing, thinking thing. The store knows about every video sitting on it's shelves. It knows when new releases are going to be placed on a shelf. It knows what movies are available, which ones have been checked out, and when they're due back. All of the employees in this "Video Store Object" are actually methods and functions used to manipulate the store's videos and sales. (It's data) There's an employee that opens (Creates) the store in the morning. (The Object's "Constructor Method"). There are employees that restock the shelves, employees that handle checking out a video, inventory, receiving new video shipments - every method required to keep the "Object" functioning properly.

As the programmer (Customer) that accesses this Object, you could care less -how- it does any of this. You only care about the "public" methods available to you as the user of this Object. You can ask an employee (method) if a video is available, and that employee will answer your question. You don't care how the employee obtains the answer to your question. You don't care if it has to look up the video in the database, or if it already knows the answer. This is "abstraction". The beauty of abstraction is that it saves the programmer from having to know anything at all about the inner workings of the "video store class". As we all know, there are thousands of video stores out there... but the "Object" is always the same! The Object has a different name, thousands of completely different videos (data), but guess what?

The methods you use on the "Video Store" object are always the same!

This get's better though, since this same "Object" can be used on any store! Grocery Stores, Clothing stores, automotive stores, it doesn't matter. The data changes, the location changes, the name changes, but we can use the same Object to work with any of these stores. There's always some guy that shows up at dawn and opens the store. There are always employees stocking the shelves and maintaining inventory and there are always employees answering questions from annoying programmers. The true power of Object Oriented programming is it's ability to adapt. The same Class used to create and maintain the Video Store is the same Class used to create and maintain the Chevy dealership on the corner. This is why Object Oriented programming has taken the programming community by storm. The ability to re-use the same "object" over and over again, the ability to have multiple independent copies of the same object at the same time, all without writing a single additional line of code, is programming nirvana.

The first thing we'll need to do in our phpHoo program is to create the database routines PHP will use to manipulate the data in our database. I decided to encapsulate all of the MySQL access for this tutorial into a Class file. Class files are similar in concept to perl Modules like the CGI.pm perl module. They are an easy way of organizing functions that deal with the same data. We're going to create a PHP class to handle the majority of the MySQL data.

Lines 3 thru 11 of mysql.php3 create the initial Class definition, and set up a few global variables. The four variables you'll need to work with are DBASE, USER, PASS, and SERVER. DBASE is the name of the database we'll be working with. (phpHoo). The USER and PASS variables should contain your database username and password respectively. The SERVER variable contains the method used to connect to the MySQL server. If the web server and the MySQL server are being handled on the same machine, it'll be set to 'localhost'. If however the MySQL server is hosted by a different machine, you'll need to specify that hostname in the SERVER definition. Once you've finished defining these 4 variables, we're done with the configuration of phpHoo! No paths, no file specifications, this is it.

Lines 10 and 11 and 13 set up some more "global" variables in this Class, TRAIL, HITS, and AUTOAPPROVE. Lines 15 through 21 is our first "method" in the class. The error() method is an ultra-simplistic error routine. If we encounter an error anywhere in this Class we'll call this method to output the error message and exit the program, so obviously we're only going to call this method if something truly catastrophic occurs.

The init() method on lines 23 through 37 is the first method we'll call in the MySQL Class. Lines 25 through 28 grab the global variables USER, PASS, SERVER, and DBASE. Pay particular attention to the way we obtain these values.

Line 25: $user = $this->USER;

Before we continue with the program, let's talk a bit about Object Oriented Programming (OOP), and how it applies to PHP. If you're not familiar with Object Oriented Programming, please stop here and take a look at the SIDEBAR.

All functions in PHP, as in most structured languages, are self-contained. The variables a function manipulates exist only within that function. This is called "Variable Scope". Here's a very quick example of this:

function tweak_Vals($a,$b)
    echo "
\nFunction Start
\n"; echo "A equals [$a], B equals [$b]
\n"; $a = $a + $b; $a = $a * $a; $b = $a - $b; echo "A now equals [$a], B is now [$b]
\n"; return $a; } $a = 2.2; $b = 5.0; tweak_Vals($a,$b); echo "Function Returned

\n"; echo "But A is still [$a] and B is still [$b]
\n"; echo "Now if we make an assignment...
\n"; $a = tweak_Vals($a,$b); echo "A becomes [$a] yet B is still [$b]

When this program runs, it outputs the following:

    Function Start 
    A equals [2.2], B equals [5] 
    A now equals [51.84], B is now [46.84]
    Function Returned
    But A is still [2.2] and B is still [5] 
    Now if we make an assignment...
    Function Start 
    A equals [2.2], B equals [5] 
    A now equals [51.84], B is now [46.84]
    Function Returned
    A becomes [51.84] yet B is still [5] 

As you can see, functions work on the values they are given or obtain, but do not make changes to those variables outside of the function's "scope" unless we force it to. This is a MAJOR difference between PHP and Perl. In Perl, variable scope is optional. In PHP, it's mandatory.

[ Side note: Notice what PHP did with the float value '$b', which was set to '5.0'. PHP will quietly and automatically make type conversions like this on an as-needed basis. Since integer math is always faster than floating point math, PHP converts a value to an integer whenever it can get away with it. Although we assigned $b to '5.0', a floating point decimal value, PHP recognized it as a true integer and treated it as such until decimal math was required (in the function). ]

Since we want the "USER", "PASS", and other global variables to be available at all times, we define them as global with the 'var $VARIABLE' construct at the beginning of the class. However, since each function is autonomous, we need to tell the function where to get at those global variables. In PHP, the keyword used to describe the "current instance of this class" is the '$this' construct. '$this->USER' tells PHP that the variable exists within this instance of the class and is otherwise self-contained. '$this' can also point to other methods within the class. Be sure to keep this in mind (pun intended) when writing your function names and variable names to avoid conflicts.

'$this->method-name()' is used to have one method within an instance of the class access another method within the same instance. (If you just got hopelessly confused, it's time to stop here and go read the SIDEBAR. One method accessing another method is analogous to one employee asking another where a video is within the store).

The init() method grabs the global variables and gives them local scope on lines 25 through 28. We can use PHP to open a connection to the database on line 30 using these values. If the mysql_connect() function succeeds, it returns a positive integer value which we assign to '$conn'. '$conn' is going to be our "Connection Identifier". All future calls to the MySQL database require this Connection Identifier. If the connection fails, it returns a "false" value and the error() method is called. The error() method on lines 15 through 21 assemble a human-readable error message and prints it out to the client (web browser). Since we only plan on calling the error() method if something truly catastrophic occurs, this method will exit the program if it's called.

If the mysql_connect is successful, we use the mysql_select_db() method to tell PHP which database on the MySQL server to use. Notice that the method requires our connection identifier? This is why the lack of a connection identifier is considered a fatal error. Everything in our program from this point on requires the database and if it's not there, there's no point in continuing with the program is there? The mysql_select_db() method returns TRUE upon success and FALSE upon failure, and if it returns false, it's considered a fatal error and the program exists with the error message "Dbase Select failed", along with the exact error number and mysql error message. If everything is successful in the init() method, we assign the global variable $CONN to hold our Connection Identifier and return a true value from the method.


The Categories Table

Now that we have our setup routines completed, we can actually start writing some code to work on the data. The two major SQL routines we'll be working with are SELECT and INSERT. The select statement in SQL is the means by which data is retrieved from the database. The insert statement in SQL is the means by which data is placed in the database. Since the lion's share of the work is done by select, we'll go over that first.

Lines 46 through 69 will handle almost all of the select statements we send to the database. Since this class is intended to abstract as much of the grunt work as possible, our "select" method is going to be completely self sufficient. It will send our SQL query to the database, retrieve the values (if any) and always return a multi-dimensional array containing the query results. By always knowing what kind of data we're going to get back, it makes writing our code much easier. Our "select" method is going to accept one piece of data, a pre-written SQL query string. Calls to this method will look something like this:

$sql = "SELECT * FROM Links";
$results = $this->select($sql);

The results of our query ("SELECT * FROM Links") will always return one of two values - FALSE upon failure or empty data set, or, a multi-dimensional array of values. The first two things we do in the "select" method is validate our data. If the SQL query string is empty, we return false immediately since there's nothing to ask the database. Like-wise if the SQL query statement isn't a "select" statement, we don't want to use this method at all so we check for those two things.

We then check to make sure we have a connection to the database on line 54. If we do, we use it on line 56 to actually send the query to the MySQL server and store the results in the '$results' variable. Line 56 does the majority of work in our program, it also happens to be a point of confusion for new PHP programmers, so let's spend some time with it.

56 $results = mysql_query($sql,$conn);

Seems simple enough, doesn't it? Send the SQL query to the server using the connection identifier, and store the results in the '$results' variable.

Oops. It doesn't do that. This is the single most common assumption made by new PHP programmers. The mysql_query() function does send the query to the server, but it does NOT return the results of that query!. More questions are posted to the PHP mailing lists and forums about this particular misconception than any other so it's very important to get this clearly understood. mysql_query() returns a result identifier as a positive integer value. To actually obtain the results of our query, we have to use some more functions that work with this result identifier.

Please be sure you fully understand that 'mysql_query' does NOT return the results of our query before continuing. It's very important. ALL mysql SELECT query operations return RESULT IDENTIFIERS and NOT the RESULTS. It may seem like I'm beating a dead horse here but believe me when I tell you this is the single most common error new PHP / MySQL programmers make.

OK - now that we have this firmly established, the $results of our query is just a result identifier. Since it's a SELECT statement, it must return a positive integer value. If the result identifier is not a positive integer value, then something was wrong with our SQL and the server returned an error. Since MySQL organizes data in Row->Column->Element format, we need to loop through all the returned rows and retrieve the returned column names and elements. The loop on lines 63 through 67 grabs each row of the results (mysql_fetch_array), stores them in a multi-dimensional associative array, clears the results on line 68, and returns the array on line 69. That's all there is to it. The two most common "fetch" routines are mysql_fetch_row() and mysql_fetch_array(). The major difference between the two is that mysql_fetch_row returns an enumerated array of column elements while mysql_fetch_array returns the row in an associative array where each Column name is a key, and the column data becomes that key's values. Since it's always easier to refer to columns by their names, we'll use mysql_fetch_array().

As you can see, retrieving data from the MySQL database is very easy. Placing data in the database is just as simple and uses almost the exact same function. The insert() method on lines 72 through 86 handle the insertion of data in the database. It should be noted here that all MySQL operations (INSERT, DELETE, UPDATE, and SELECT ) use the mysql_query() function. The difference is that where SELECT returns a result identifier, INSERT, DELETE, and UPDATE return TRUE or FALSE only. There is no result identifier for these operations. When we INSERT data into a table with an auto_incremented column, it is sometimes useful to know what the result of the auto_increment was. We can obtain this value using the mysql_insert_id() function, which returns the ID generated for an auto_incremented field after an INSERT operation. We'll use that as our return value. Since the insert ID is going to be a positive integer value, we can use it in a true/false test or use it's value in other queries.

With the completion of our insert method, we've completed all of the MySQL specific routines required to get phpHoo off the ground. The rest of the program will create SQL query statements, send them to "select" or "insert", and make decisions based on the results.

phpHoo is organized by Category, so the first function we'll need is one that retrieves all of the "Top" level categories. The get_Cats() function on lines 92 through 103 is what we need. We'll write this function so that it can handle the retrieval of Top Level categories, or sub-categories. Take a look at line 92.

function get_Cats ($CatParent= "")

This sets up a default value for the functions only argument, $CatParent. The default "empty" value and the logic on lines 94 through 99 allow us to grab all the top level categories by calling the get_Cats() method without any arguments, which results in the following SQL query:

SELECT CatID,CatName FROM Categories WHERE CatParent IS NULL

Remember that the Computers category is Category ID number 2? If we wanted to retrieve all of the sub-categories to "Computers" we would call get_Cats(2), which would result in the following SQL query:

SELECT CatID,CatName FROM Categories WHERE CatParent = 2

Notice how small this function is? By putting the lion's share of the MySQL logic within the Select and Insert methods, we've greatly reduced the amount of work required for other methods that require those operations. Line 100 of get_Cats formats the SQL query string, line 101 sends it to our select() method, and line 102 returns the results. Three lines of code to perform database operations from here on out.

This being the case, we're now going to pick up the pace. One of the things we'll need to do in phpHoo is find out the Parent category of any given sub category (if any). For instance, if we know the ID of the Perl category (Category ID #9) , we need to find out what the Parent CatID is. We can do that with the following SQL query:

    mysql> SELECT CatParent FROM Categories where CatID = 9;
    | CatParent |
    |         2 |

Hmmmm, but what is the parent for CatID 2?

    mysql> SELECT CatParent FROM Categories WHERE CatID = 2;
    | CatParent |
    |      NULL |
    1 row in set (0.00 sec)

OK, now we know that the hierarchy to the Perl sub category is:

    NULL      2       9
    TOP   Computers->Perl

So, given a specific CatID, we now know how we're going to find out where we are in the hierarchy, and what the "bread crumb" trail is back to the "Top". We also know that in order to obtain this information, we submitted the same SQL query more than once, using the results of the previous query to create the next one. That's the definition of a recursive loop. (One iteration of the loop creates the data needed for the next iteration of the same loop.)

Now is the time to show you a bit of "magic" programming that will make you the envy of all your friends. Every time phpHoo runs, it's going to need to find out "where" it is in the hierarchy, which means making multiple SQL queries to determine the trail back to the Top. We'll use two functions to do this. Lines 106 through 112 is the get_ParentsInt() function. This function clears the global variable "TRAIL", re-initializes it, and then calls another method, get_Parents(). The reason we have this little 4 line "magic method" is made clear in the get_Parents() method.



Warning, what you are about to see here is "Powerful Stuff" *tm*. As Spider Man always said, with Great Power comes Great Responsibility. Put the coffee cup down, remove the cat from your lap, and concentrate. One false step here and it'll be the last time your database administrator lets you anywhere near the server, so pay attention.

When I first sat down to write phpHoo, I dropped these recursive methods in as a matter of course. It is by -far- the fastest and most efficient means of obtaining relational data and I use it routinely. Then I debated with myself whether to leave it in what is essentially a beginners guide to using PHP with MySQL. I finally decided that the benefits outweigh the potential for disaster when you sit down to write your own programs. Done properly, recursive routines will make your server admin or host smile in relief at how efficient and clean your code is.

Done improperly and you'll be lucky to escape with your skin intact. A faulty recursive query can and will consume all the resources available on the server. All the file descriptors, all the open ports, all the memory and all the processor time. Most admins have resource limits in place to prevent this from happening, but the potential is still there. Keep this in mind before you decide on implementing these types of methods in your own programs.

I ran into a snag in one of my own programs at work one day. When I described the problem I was having, my boss and resident MySQL guru quipped "use a recursive query" and went back to doing whatever it was he was doing. I did a patented Homer Simpson "Doh!, of course" and went back to work. Not 10 minutes later we were implementing stricter resource limits on our development server. Luckily, I hadn't crashed the thing, just robbed it of all it's resources for over 3 minutes. The development server is a dual PII 450 with a gigabyte of RAM. Six lines of code was all it took.

Be careful out there.

On line 117 we start the get_Parents() method. After checking to make sure the CatID isn't a NULL value, we define almost the same SQL query shown above:

SELECT CatID,CatParent,CatName from Categories where CatID = $CatID

We're going to need the raw output from this query, not the associative array returned by the select() method, so we then call mysql_query() directly using this SQL statement. It's the while() loop on the results where the fun begins.

129         while ( $row = mysql_fetch_array($results))
130         {
131             $trail = $this->TRAIL; 
132             $count = count($trail);
133             $trail[$count] = $row; 
134             $this->TRAIL = $trail;
135             $id = $row["CatParent"];
136             $this->get_Parents($id);   
137         }

The first thing we do when we get our result row is grab the current value of the global TRAIL array. Since we just cleared it using the get_ParentsInt() method, it should be empty. We find out how many elements are in the trail with the count() function on line 132. Since an array is zero based where array element number 1 is index number 0, the number returned by "count" will actually point to the next open slot in the array, or to zero if the array is completely empty. Perfect. Line 133 drops the row data into the trail array at the next open slot in the array. It then immediately re-assigns this value to the global TRAIL array on line 134. Line 135 looks at the current returned row, grabs the value for that rows CatParent column and assigns it to the $id variable. Line 136 then calls get_Parents($id).

Wait a second, we're IN the get_Parents() method already!

Yup. You're now using the power of Recursion. This is why we're using a global variable to track the results. Relational databases are perfect breeding grounds for recursive operations. The two methods here, using recursion, can be given any category ID number at any level in the hierarchy and faster than you can say "Bob's your Uncle" the recursive query will roll through the entire database and return an array telling you exactly what the "path" is back to the top!

The dangers: Infinite Loops. Unless you think things out completely at the start, an infinite loop is not only possible, it's the most likely result of a recursive operation. You are deliberately creating a loop that calls itself! ALWAYS provide yourself at LEAST two "outs" from a recursive query - preferably three. In get_Parents(), we have a total of 3 "outs" from the loop:

    119: if( (empty($CatID)) or ("$CatID" == "NULL")) { return false; }
124-127: if( (!$results) or (empty($results)) ... (return false)
    129: while ( $row = mysql_fetch_array($results))

None of these will save you if your logic is flawed however. Recursion is one of the most powerful tools I can show you, and it is by far the most dangerous. See (SIDEBAR)

The rest of the Category methods, get_CatIDFromName() and get_CatNames() are two quick methods we'll need later in our program. The two methods are used to obtain relational data. get_CatNames($CatID) is a multi-purpose method. Given no arguments, it will return an array of all the available Category Names. Given a specific Category ID, it'll return only that Categories CatName. If asked for CatID zero it'll return "Top". This completes the Category section of the MySQL class. Now we can concentrate on functions for use in the Links table.


The Links Table

The methods we use on the Links table are going to look very familiar. There are only two methods we need for the Links table. (Actually 3 if you count the one that adds links to the database, but we'll work on that later.)

Lines 174 through 186 comprise the get_Links() method. Given a specific category ID, it will return an array of all the Links in that category. If the category ID is not specified or is zero, it returns the Links found in the "Top" category. Very simple, very easy. The select() method we wrote earlier is proving it's worth as we write more methods that work on the database. Most of the work is already done for us. New methods just need to create an SQL query statement and send it off to the server.

Lines 188 through 198 comprise the get_CatFromLink() method. Given a specific LinkID, this method will tell us what category ID the Link belongs to. Again, a very simple method made even easier by the select() method we wrote earlier. This method will be used when we search the database. The search routine will return Link data, and we want to be able to tell the user what category that Link was found in.

There are two more methods in the MySQL class, suggest() and search(). Since these two methods won't make much sense to the reader without a better understanding of what's happening in phpHoo, we'll go over phpHoo's code before coming back to these two methods.


phpHoo - Getting the data to the web.

Congratulations! The majority of our work is done. The MySQL database, and the Class we wrote to handle it's data, is the lion's share of the work to be done to make phpHoo a reality.

The first thing to do in our PHP program is include the MySQL class and connect to the database. Lines 1 through 8 handle the inclusion of our MySQL class. Line 4 creates the MySQL "Object", which we assign to the variable '$db'. All future calls to methods or variables within the MySQL class will require the Object Identifier '$db'. Lines 5 through 8 shows an example of this:

    05 if(!$db->init()) {
    06  echo "Well this sucks<BR>\n";
    07  exit;
    08 }

Line 5 calls the init() method, which as you'll recall creates a connection to the MySQL server and selects the phpHoo database. The check here for success is really not needed, since if the connection fails the program will exit. It's still good practice to check for it anyway.

Remember at the beginning of the tutorial I described that PHP is -not- a linear top-down scripting language. All functions must be created before they can be used. So the very next thing we encounter is the function that creates our 'bread crumb' trail at the top of each page phpHoo generates. The breadcrumb() function on lines 10 through 30 tells our visitors where they are within the hierarchy of the phpHoo database.

Lines 12 and 13 are new. Remember variable scoping rules? A function cannot use variables outside it's own scope unless it is given those variables in it's arguments, or obtains them from an outside source. In this case, we tell the breadcrumb() function that $db and $PHP_SELF are global variables.

$PHP_SELF: One of those things that makes PHP programmers happy to work on the web. Unlike Perl and ASP, PHP was written specifically for use on the web. Remember this line from perlHoo?

my $baseurl = '/cgi-bin/perlhoo.pl';

And these lines from Yahasp?

Const website = ""
Const baseURL = "/yahasp/yahasp2.asp"

Both of these lines hard code the path and filename to their respective programs. This means that if you decide to change the name of the program or change it's location, you'll need to edit the source to make it work again. The PHP developers decided, in their infinite wisdom, that this was a fairly silly idea. PHP programs -always- know how to find themselves again by using the constant $PHP_SELF environment variable. This means that a PHP program, if written properly, can change it's name or location - even move to an entirely new server, and not have to change a single line of code.

Line 16 of breadcrumbs() calls our recursive query get_ParentsInt($CatID). The idea is that, given a specific category ID, the breadcrumbs() function will find the trail back to the top. The category ID submitted to this function is the "current" category the user is visiting. Since get_ParentsInt doesn't return any data, we have to grab the results from the $db->TRAIL variable. The '$path' we get back is an associative array. The while loop between lines 20 and 25 walk through the array and create the HTML needed for each segment of the breadcrumb trail. The "first" key and value we get is the "last" part of the trail, so on line 24 we assemble the trail by recursively adding segments to the beginning of the trail. We return the finished html on line 29. Part of this HTML is the "search" form, which we'll go into detail on later in the tutorial.

Line 32 through 54 is the "start_page()" function. This function is mostly just HTML output - starting the page using a dynamic $title, optional client $msg, and the current category ID number. Using the CatID we call breadcrumbs() from within this function and output the $trail as the last part of the start_page() function.

Lines 56 through 97 , "start_browse()", is the main function in phpHoo. Lines 58 and 59 again make $PHP_SELF and $db available to the function. start_browse() accepts a CatID and calls the two MySQL methods get_Cats($CatID) and get_Links($CatID). Remember that get_Cats actually gets the sub categories for the specified CatID. If CatID is empty or not specified, get_Cats() and get_Links() will both return the "Top" level Categories and Links. Lines 64 through 72 comprise some simple logic checking for "Top" level results.

Lines 74 through 93 look complex, but it's all very simple. Based on the results of the two arrays returned by get_Cats and get_Links, we walk through the returned arrays and output the HTML associated with the Categories and Links for this page. If the two arrays are empty, we won't print anything. Lines 94 through 96 finish off the HTML output by offering the "suggest" link.

Line 100 is where program execution actually begins. Before we can go into much detail here, we need to look at how PHP handles Form data.


Form Data in PHP

Unlike Perl and ASP, PHP was written for the web. (Yes, beating a dead horse is one of my favorite recreational activities). The perl version of this program requires the CGI.pm module to handle the query string and form elements. Yahasp is closer with it's "Request" objects, but PHP makes it even easier. PHP automatically parses POST and GET form methods and assigns variables with the forms "NAME" attribute. here's an example:

      echo "<FORM METHOD=POST ACTION=\"$PHP_SELF\">\n";
      echo "<INPUT TYPE=TEXT NAME=HiThere VALUE=\"\">\n";
      echo "<INPUT TYPE=SUBMIT NAME=SUBMIT VALUE=\"Submit\">\n";
          echo "Results of your form submission<BR>\n";
          echo "You entered [$HiThere] in the form<BR>\n";

This simple program outputs an HTML form. When you enter something in the text box and hit submit, it will print the form and also tell you what you entered in the HiThere box. When the form is submitted, PHP goes through the entered form elements and assigns variables using the form element's NAME. The value of the variable will be whatever the user entered into the form. As an example, if you entered 'PHP Rules' in the text box and hit submit, you would see the form printed as well as:

Results of your form submission
You entered [PHP Rules] in the form

Now back to phpHoo. On line 101 , we check to see if there's a query string available. On line 103 , we check for two conditions: There is a '$viewCat' variable, or, there is no query string or form POST entry. If either of these conditions are true, we call start_page($viewCat) and start_browse($viewCat). If there is no $viewCat, the two functions will print out the very beginning of phpHoo. If there is a $viewCat, (which would be a category ID number) we'll see that specific category.

On line 109 , we check for $add. $add is created automatically by PHP if the user has clicked on the "Suggest new link" link, which as you can see from the start_browse function looks like this:


If we're at the top, then $currentID will equal "Top", so we check for that and if this is the case we set the Category for this suggestion to zero on line 111. We could have let the link say 'add=0' and saved ourselves a check here but I though add=Top would be more intuitive for the user of the form. We then grab the category name for this category ID. We then print out the HTML form to the visitor and ask for all the usual stuff. The Url, LinkName etc.

Line 133 looks for the '$suggest' variable. $suggest is set if the user clicks on the submit button of the 'Suggest new link' form. Line 136 calls the MySQL class method with something we haven't seen yet:


$HTTP_POST_VARS is an array that is created automatically by PHP as the result of a POST action on a form. If we had set our 'suggest new link' form ACTION to GET, then we would use the $HTTP_GET_VARS array. $HTTP_POST_VARS is an associative array of Key = Value pairs created by the form NAME= and VALUE= pairs. Now it's time to take a look at the MySQL class Suggest method.


Inserting data in MySQL

The 'suggest' method on lines 244 through 277 of the MySQL class will handle the entry of data in our database. The method accepts one argument - an array of key=value pairs. (Our $HTTP_POST_VARS array). The first thing we need to do is validate all of the data. Line 251 first makes sure we have an array to work with. If there's no data or it's not an array, it immediately bails out. Lines 253 through 259 gets the data from the array by associative name and sets the SubmitDate to the current time(). Lines 261 through 265 makes sure that the data is complete. If any of these elements are missing, the method will abort the entry.

Lines 267 and 268 are special cases. At the beginning of the MySQL class, we assigned the AUTOAPPROVE global to be "true". If you want your visitors to be able to enter new links in the database automatically, leave it set to "true". If you want to manually approve each entry, set AUTOAPPROVE to "false". If AUTOAPPROVE is true, suggested links will automatically and instantly be available to your visitors with no work at all on your part. If set to false, you'll need to approve new links before they are made available to your visitors.

Remember I mentioned that trying to determine what your table needs would be from the outset is usually futile? Currently the Links table has no idea which links have been approved, and which links have not been. To tell the difference, we're going to need a new column in the Links table. Connect to your MySQL database and alter the Links table as follows:

mysql> ALTER TABLE Links ADD COLUMN Approved tinyint(8) DEFAULT '0';

Since we're accepting submissions, it would be nice to be able to track who is submitting what to our database, so let's add a few more columns:

    ALTER TABLE Links ADD COLUMN SubmitName varchar(64) NOT NULL;
    ALTER TABLE Links ADD COLUMN SubmitEmail varchar(64) NOT NULL;
    ALTER TABLE Links ADD COLUMN SubmitDate bigint(21) NOT NULL;

If you do a 'show columns from Links' command, you should see the following:

mysql> show columns from Links;
| Field       | Type         | Null | Key | Default | Extra          |
| LinkID      | bigint(21)   |      | PRI | 0       | auto_increment |
| CatID       | bigint(21)   |      |     | 0       |                |
| Url         | varchar(255) |      | UNI |         |                |
| LinkName    | varchar(64)  |      |     |         |                |
| Description | text         |      |     |         |                |
| Approved    | tinyint(8)   | YES  |     | 0       |                |
| SubmitName  | varchar(64)  |      |     |         |                |
| SubmitEmail | varchar(64)  |      |     |         |                |
| SubmitDate  | bigint(21)   |      |     | 0       |                |
9 rows in set (0.00 sec)

Our Links table can now track whether a link has been approved, and gives us the added information about who submitted the link, and when.

On line 271 of the MySQL class, we start our SQL query string and continue it over the next 3 lines. (271 through 274). Notice that all string values in our SQL query are surrounded by 'single quotes'. Line 275 calls the insert() method using this query and immediately returns the results (As either false to indicate failure, or a positive integer value for success).

Back to the phpHoo source code. Lines 136 through 143 calls the suggest() method and then, based on the results, output a "Success" or "Failure notice. Since we want the visitor to go back to the Top of the phpHoo hierarchy after a submission, we send an empty "junk" variable to the start_page() function, and no arguments at all are given to the start_browse() function.

Savvy PHP programmers are now howling about MySQL data needing to be "escaped" before being entered in the database, so let's address this issue now.


Meta Characters and MySQL

Meta Characters are characters that have special meaning to both PHP and MySQL. Meta Characters include the following:

    Dollar Sign  : $
    Back Slash   : \
    Single Quote : '
    Double Quote : "
    Asterisk     : *
    Question Mark: ?
    Period       : .
    Caret        : ^
    Plus Sign    : +
    Brackets     : [ 
  • Post a new comment


    Comments allowed for friends only

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded