mwop.net :: Tag: personal

When array_key_exists just doesn't work

I've been playing with parameter testing in my various Cgiapp classes, and one test that seemed pretty slick was the following:

if (!array_key_exists('some_string', $_REQUEST)) {
    // some error
}

Seems pretty straightforward: $_REQUEST is an associative array, and I want to test for the existence of a key in it. Sure, I could use isset(), but it seemed… ugly, and verbose, and a waste of keystrokes, particularly when I'm using the param() method:

if (!isset($_REQUEST[$this->param('some_param')])) {
    // some error
}

However, I ran into a pitfall: when it comes to array_key_exists(), $_REQUEST isn't exactly an array. I think what's going on is that $_REQUEST is actually a superset of several other arrays — $_POST, $_GET, and $_COOKIE — and isset() has some logic to descend amongst the various keys, while array_key_exists() can only work on a single level.

Whatever the explanation, I ended up reverting a bunch of code. :-(
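
For comparison, here is a minimal sketch of how the two checks behave on a plain array. It does not try to reproduce the $_REQUEST behaviour described above; $params and its keys are made up for illustration. On an ordinary array, the two only disagree when a key exists but holds null.

```php
<?php
// Hypothetical request-like data; a plain array stands in for $_REQUEST.
$params = [
    'name'  => 'Matthew',
    'email' => null, // key is present, but the value is null
];

// Record what each check reports for a set key, a null key, and a missing key.
$checks = [];
foreach (['name', 'email', 'missing'] as $key) {
    $checks[$key] = [
        'array_key_exists' => array_key_exists($key, $params),
        'isset'            => isset($params[$key]),
    ];
}
```

Here 'email' passes array_key_exists() but fails isset(), while 'name' passes both and 'missing' fails both.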

MySQL Miscellanea

Inspired by a Slashdot book review of High Performance MySQL.

I've often suspected that I'm not a SQL guru… little things like being self-taught and having virtually no resources for learning it. This has been confirmed to a large degree at work, where our DBA has taught me many tricks about databases: indexing, when to use DISTINCT, how and when to do JOINs, and the magic of TEMPORARY TABLEs. I now feel fairly competent, though far from being an expert; I certainly don't know much about how to tune a server for MySQL, or how to tune MySQL for performance.

Last year around this time, we needed to replace our MySQL server, and I got handed the job of getting the data from the old one onto the new. At the time, I looked into replication, and from there learned about binary copies of a data store. I started using this as a way to back up data, instead of periodic mysqldumps.

One thing I've often wondered since: would replication be a good way to do backups? It seems like it would, but I haven't investigated. One post on the aforementioned Slashdot article addressed this, with the following summary:

Set up replication
Do a locked table backup on the slave

Concise and to the point. I only wish I had a spare server on which to implement it!

PHP_SELF versus SCRIPT_NAME

I've standardized my PHP programming to use the environment variable SCRIPT_NAME when I want my script to refer to itself in links and form actions. I've known that PHP_SELF has the same information, but I was more familiar with the name SCRIPT_NAME from using it in perl, and liked the feel of it more as it seems to describe the resource better (PHP_SELF could stand for the path to the PHP executable if I were to go by the name only).

However, I just noticed a post on the php.general newsgroup where somebody asked what the difference was between them. Semantically, there isn't any; they should contain the same information. However, historically and technically speaking, there is. SCRIPT_NAME is defined in the CGI 1.1 specification, and is thus a standard. However, not all web servers actually implement it, and thus it isn't necessarily portable. PHP_SELF, on the other hand, is implemented directly by PHP, and as long as you're programming in PHP, will always be present.
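
One way to hedge between the two is a tiny helper that uses SCRIPT_NAME when the server provides it and falls back to PHP_SELF otherwise. This is only a sketch: self_url() is a made-up name, and it takes the server array as an argument so it can be tried without a web server.

```php
<?php
// Hypothetical helper: prefer SCRIPT_NAME, fall back to PHP_SELF.
function self_url(array $server): string
{
    if (!empty($server['SCRIPT_NAME'])) {
        return $server['SCRIPT_NAME'];
    }
    return $server['PHP_SELF'] ?? '';
}

// Escape the value before writing it into a link or form action.
$action = htmlspecialchars(self_url($_SERVER), ENT_QUOTES, 'UTF-8');
```

Passing $_SERVER in rather than reading it inside the function is a small design choice that keeps the helper easy to test.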

Guess I have some grep and sed in my future as I change a bunch of scripts…

PHP: Continue processing after script aborts

Occasionally, I've needed to process a lot of information from a script, but I don't want to worry about PHP timing out or the user aborting the script (by clicking on another link or closing the window). Initially, I investigated register_shutdown_function() for this; it will fire off a process once the page finishes loading. Unfortunately, the process is still a part of the current connection, so it can be aborted in the same way as any other script (i.e., by hitting stop, closing the browser, going to a new link, etc.).

However, there's another setting, set via a function, that can override this behaviour, i.e., let the script continue running after the abort: ignore_user_abort(). By setting this to true, your script will keep running even after the user aborts the request.

This sort of thing would be especially good for bulk uploads where the upload needs to be processed — say, for instance, a group of images or email addresses.
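
A minimal sketch of that kind of bulk job might look like the following. The set_time_limit(0) call is my addition to cover the timeout side, and process_addresses() and its sample data are hypothetical stand-ins for whatever processing the upload needs.

```php
<?php
// Keep running even if the user hits stop or closes the browser.
ignore_user_abort(true);

// Assumption: also lift the execution time limit so a long job isn't cut off.
set_time_limit(0);

// Hypothetical bulk job: count the entries that look like valid email addresses.
function process_addresses(array $addresses): int
{
    $processed = 0;
    foreach ($addresses as $address) {
        if (filter_var($address, FILTER_VALIDATE_EMAIL) !== false) {
            $processed++;
        }
    }
    return $processed;
}

$count = process_addresses(['a@example.com', 'not-an-address', 'b@example.org']);
```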

Practical PHP Programming

In the past two days, I've seen two references to Practical PHP Programming, an online book that serves both as an introduction to programming with PHP5 and MySQL and as a good advanced reference with many useful tips.

This evening, I was browsing through the Performance chapter (chapter 18), and found a number of cool things, both for PHP and MySQL. Many were common sense things that I've been doing for a while, but which I've also seen and shaken my head at in code from others (calculating loop invariants at every iteration, not using variables passed to a function, not returning a value from a function, not using a return value from a function). Others were new and gave me pause for thought (string concatenation with the '.' operator is expensive, especially when done more than once in an operation; echo can take a comma-separated list).
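
Two of those tips are easy to show in a few lines. This is only an illustrative sketch (sum_values() is a made-up helper, and I haven't benchmarked it): the count() call is hoisted out of the loop, and echo is given a comma-separated list instead of a concatenated string.

```php
<?php
// Hypothetical helper for illustration.
function sum_values(array $values): int
{
    // Loop invariant: compute the count once, not on every iteration.
    $count = count($values);
    $total = 0;
    for ($i = 0; $i < $count; $i++) {
        $total += $values[$i];
    }
    return $total;
}

$total = sum_values([1, 2, 3, 4]);

// echo accepts a comma-separated list, so no '.' concatenation is needed.
echo 'Total: ', $total, "\n";
```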

Some PHP myths were also dispelled, some of which I've been wondering about for a while. For instance, the amount of comments and whitespace in PHP is not a factor in performance (and PHP caching systems will often strip them out anyways); double quotes are not more expensive than single quotes unless variable interpolation occurs.

It also has some good advice for SQL optimization, and, more importantly, MySQL server optimization. For instance, the author suggests running OPTIMIZE TABLE table; on any table that has seen a large number of inserts, updates, or deletes since creation; this will defragment the table and give it better performance. Prefer CHAR() to VARCHAR(): VARCHAR() saves on space, but MySQL has to calculate how much space was used each time it queries in order to determine where the next field or record starts. However, if a table already has any variable-length fields, you may as well use as many as you need, or split off the variable-length fields (such as a TEXT field) into a different table in order to speed up searching. When performing JOINs, compare on numeric fields instead of character fields, and always JOIN on columns that are indexed.

I haven't read the entire book, but glancing through the TOC, there are some potential shortcomings in its content:

It doesn't cover PhpDoc.
It doesn't appear to cover unit testing.
Limited coverage of templating solutions (though they are mentioned).
Limited usage of PEAR. The author does mention PEAR a number of times, and often indicates that use of certain PEAR modules is preferable to using the corresponding low-level PHP calls (e.g., Mail and Mail_MIME, DB), but in the examples rarely uses them.
PHP-HTML-PHP… The examples I browsed all created self-contained scripts that did all HTML output. While I can appreciate this to a degree, I'd still like to see a book that shows OOP development in PHP and which creates re-usable web components in doing so. For instance, instead of creating a message board script, create a message board class that can be called from anywhere with metadata specifying the database and templates to use.

All told, there's plenty of meat in this book — I wish it were in dead tree format already so I could browse through it at my leisure, instead of in front of the computer.

Get Firefox!

Those who know me know that I love linux and open source. One particular program that firmly committed me to open source software is the Mozilla project — a project that took the Netscape browser's codebase and ran with it to places I know I never anticipated when I first heard of the project.

What do I like about Mozilla? Well, for starters, and most importantly, tabbed browsing changed the way I work. What is tabbed browsing? It's the ability to have multiple tabs in a browser window, allowing you to switch between web pages without needing to switch windows.

Mozilla came out with a standalone browser a number of months back, called first Phoenix, then Firebird, and now Firefox. This standalone browser has a conservative number of basic features, which allows for a lean download; and yet these basic features, which include tabbed browsing and disabling popups, far surpass Internet Explorer's features. And there are many extensions that you can download and integrate into the browser.

One such extension is a tabbed browsing extension that makes tabbed browsing even more useful. With it, I can choose to have any links leaving a site go to a new tab; or have bookmarks automatically load in a new tab; or group tabs and save them as bookmark folders; or drag a tab to a different location in the tabs (allowing easy grouping).

Frankly, there are few things I can find that Firefox can't do.

And, on top of that, it's not integrated into the operating system. So, if you're on Windows and use Firefox, you're less likely to end up with spyware and adware, which are often downloaded and installed by special IE components just by visiting sites, ruining your internet experience.

So, spread the word: Firefox is a speedy, featureful, SECURE alternative to Internet Explorer!


Cgiapp Roadmap

I've had a few people contact me indicating interest in Cgiapp, and I've noticed a number of subscribers to the freshmeat project I've set up. In addition, we're using the library extensively at the National Gardening Association in developing our new site (the current site is using a mixture of ASP and Tango, with several newer applications using PHP). I've also been monitoring the CGI::Application mailing list. As a result of all this activity, I've decided I need to develop a roadmap for Cgiapp.

Currently, planned changes include:

Version 1.x series:
- Adding a Smarty registration for stripslashes (the Smarty "function" call will be sslashes).
- param() bugfix: currently, calling param() with no arguments simply gives you a list of parameters registered with the method, but not their values; this will be fixed.
- error_mode() method. The CGI::Application ML brought up and implemented the idea of an error_mode() method to register an error_mode with the object (similar to run_modes()). While non-essential, it would offer a standard, built-in hook for error handling.
- $PATH_INFO traversing. Again, on the CGI::App ML, a request was brought up for built-in support for using $PATH_INFO to determine the run mode. Basically, you would pass a parameter indicating which location in the $PATH_INFO string holds the run mode.
- DocBook tutorials. I feel that too much information is given in the class-level documentation, and that usage tutorials need to be written. Since I'm documenting with PhpDoc and targeting PEAR, moving tutorials into DocBook is a logical step.
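
The $PATH_INFO item in the list above could be sketched roughly as follows. This is not the planned Cgiapp API: run_mode_from_path_info(), its parameters, and the 'start' default are all hypothetical, and only show the idea of picking the run mode from a given position in the path.

```php
<?php
// Hypothetical helper: pick the run mode from the Nth segment of PATH_INFO.
function run_mode_from_path_info(string $pathInfo, int $index, string $default = 'start'): string
{
    // Split on '/', drop the empty segments, and reindex from zero.
    $segments = array_values(array_filter(explode('/', $pathInfo), 'strlen'));

    // Fall back to the default run mode when that position is absent.
    return $segments[$index] ?? $default;
}

// With PATH_INFO of /blog/view/42 and position 1, the run mode is 'view'.
$mode = run_mode_from_path_info('/blog/view/42', 1);
```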
Version 2.x series:

Yes, a Cgiapp2 is in the future. There are a few changes that require either (a) PHP5 or (b) API changes. In keeping with PEAR guidelines, I'll rename the module Cgiapp2 so as not to break applications designed for Cgiapp.

Changes expected include:
- Inherit from PEAR. This will allow for some built in error handling, among other things. I suspect that this will tie in with the error_mode(), and may also deprecate croak() and carp().
- Changes to tmpl_path() and load_tmpl(). In the perl version, you would instantiate a template using load_tmpl(), assign your variables to it, and then do your fetch() on it. So, this:
```
$this->tmpl_assign('var1', 'val1');
$body = $this->load_tmpl('template.html');
```
  Becomes this:
```
$tmpl = $this->load_tmpl();
$tmpl->assign('var1', 'val1');
$body = $tmpl->fetch('template.html');
```
  OR
```
$tmpl = $this->load_tmpl('template.html');
$tmpl->assign('var1', 'val1');
$body = $tmpl->fetch();
```
  (Both examples assume use of Smarty.) I want to revert to this behaviour for several reasons:
  - Portability with perl. This is one area in which the PHP and perl versions differ greatly; going to the perl way makes porting classes between the two languages simpler.
  - Decoupling. The current set of template methods create an object as a parameter of the application object — which is fine, unless the template object instantiator returns an object of a different kind.
    
    Cons:
    - Smarty can use the same object to fill multiple templates, and the current methods make use of this. By assigning the template object locally to each method, this could be lost. HOWEVER… an easy work-around would be for load_tmpl() to create the object and store it in a parameter; subsequent calls would return the same object reference. The difficulty then would be if load_tmpl() assumed a template name would be passed. However, even in CGI::App, you decide on a template engine and design for that engine; there is never an assumption that template engines should be swappable.
    - Existing Cgiapp1 applications would need to be rewritten.
- Plugin Architecture: The CGI::App ML has produced a ::Plugin namespace that utilizes a common plugin architecture. The way it is done in perl is through some magic of namespaces and export routines… both of which are, notably, missing from PHP.
  
  However, I think I may know a workaround for this, if I use PHP5: the magic __call() overloader method.
  
  My idea is to have plugin classes register the methods that should be accessible by a Cgiapp-based class under a special key in the $GLOBALS array. Then, the __call() method would check the key for registered methods; if one is found matching the method requested, that method is called (using call_user_func()), with the Cgiapp-based object reference as the first argument. Voilà! Instant plugins!
  
  Why do this? A library of 'standard' plugins could then be created, such as:
  - A form validation plugin
  - Alternate template engines as plugins (instead of overriding the tmpl_* methods)
  - An authorization plugin
  Since the 'exported' methods would have access to the Cgiapp object, they could even register objects or parameters with it.
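
The plugin idea above could be sketched like this. None of these names exist in Cgiapp: the CGIAPP_PLUGINS key, cgiapp_register_plugin(), the PluggableApp class, and the validate_required plugin are all hypothetical, and the sketch only shows __call() dispatching to registered callbacks.

```php
<?php
// Hypothetical registry key; the idea only calls for "a special key".
$GLOBALS['CGIAPP_PLUGINS'] = [];

// Hypothetical helper a plugin would call to export a method.
function cgiapp_register_plugin(string $method, callable $callback): void
{
    $GLOBALS['CGIAPP_PLUGINS'][$method] = $callback;
}

// Stand-in for a Cgiapp-based class.
class PluggableApp
{
    public $params = [];

    // Invoked for any method that is not defined on the class.
    public function __call($method, $args)
    {
        $plugins = $GLOBALS['CGIAPP_PLUGINS'] ?? [];
        if (!isset($plugins[$method])) {
            throw new BadMethodCallException("No plugin provides $method()");
        }
        // Pass the application object as the first argument.
        return call_user_func($plugins[$method], $this, ...$args);
    }
}

// A made-up form validation plugin: report which required fields are missing.
cgiapp_register_plugin('validate_required', function ($app, array $fields) {
    $missing = [];
    foreach ($fields as $field) {
        if (!isset($app->params[$field])) {
            $missing[] = $field;
        }
    }
    return $missing;
});

$app = new PluggableApp();
$app->params = ['name' => 'Matthew'];
$missing = $app->validate_required(['name', 'email']);
```

Because the callback receives the application object, it can read (or register) parameters on it, as described above.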

If you have any requests or comments on the roadmap, please feel free to contact me.

New site is up!

The new weierophinney.net/matthew/ site is now up and running!

The site has been many months in planning, and about a month or so in actual coding. I have written the site in PHP instead of flat files, so as to:

Allow easier updating (it includes its own content management system)
Include a blog for my web development and IT interests
Allow site searching (everything is an article or download)

I've written it using a strict MVC model, which means that I have libraries for accessing and manipulating the database; all displays are template driven (meaning I can create them with plain-old HTML); and I can create customizable applications out of various controller libraries. I've called this concoction Dragonfly.

There will be more developments coming — sitewide search comes to mind, as well as RSS feeds for the blog and downloads.

Stay Tuned!

What's keeping that device in use?

Ever wonder what's keeping that device in use so you can't unmount it? It's happened to me a few times. The tool to discover this information? lsof.

Basically, you type something like: lsof /mnt/cdrom and it gives you ps-style output listing the PID and name of each process that is using the cdrom. You can then go and stop those programs manually, or kill them yourself.

PHP and Template Engines

On PhpPatterns, I recently read an article on Template Engines in PHP. It got my ire up, as it said (my interpretation):

"template engines are a bad idea"
"templating using PHP natively can be a good idea"
"template engines… are not worth the text their written in"

Okay, so those are actually direct quotes from the article. I took issue with it immediately: I use Smarty for everything I do, and the decision to do so was not made lightly. I have in fact been advocating the use of template engines in one language or another for several years with the various positions in which I've been employed; I think they are an essential tool for projects larger than a few pages. Why?

Mixing of languages causes inefficiency. When I'm programming, it's incredibly inefficient to be writing in up to four different languages: PHP or Perl, X/HTML, CSS, and Javascript. Switching between them while in the same file is cumbersome and confusing, and trying to find HTML entities buried within quoting can be a nightmare, even when done in heredocs. Separating the languages into different files seems not only natural, but essential.
Views contain their own logic. In an MVC pattern, the final web page View may be dependent on data passed to it via the Controller; however, this doesn't mean that I want the full functionality of a language like PHP or Perl to do that. I should only be doing simple logic or looping constructs — and a full scripting language is overkill. (I do recognize, however, that template engines such as Smarty are written using PHP, so the language is being invoked regardless. What I speak of here is the language used to compose the template.)
Abstraction and Security. The fewer internals that are divulged on the template page, the better. For security purposes, I may not want clients able to know how data got to the page, only what data is available to them. In addition, if this data is abstracted enough, any number of backends could be connected to the page to produce output.

So, reading the aforementioned article really got my hackles up. However, it got me thinking, as well. One issue raised is that PHP can be used as your templating language. While I can understand why this might be desirable — everything from load issues to flexibility — I also feel that this doesn't give enough abstraction.

Using PHP seems to me to be inefficient on two fundamental levels, based on my understanding of The Pragmatic Programmer:

Domain Language. The Pragmatic Programmer suggests that subsets of a language should be used, or wholly new mini-languages developed, that speak to the domain at hand. As an example, you might want to use a sharp tool to open a can; an axe would be overkill, but a knife might work nicely. Using PHP to describe a template is like using an axe to open a can; it'll do the job, but it may also make a mess of it all, simply because it's too much sharp edge for the job.
Metadata. Metadata is data about data; to my thinking, templates describe the data they are communicating; the compiled template actually contains the data. In this regard, again, putting PHP into the script is overkill as doing so gives more than just some hints as to what the data is.

The author of the article also makes a case for teaching web designers PHP: that the language is sufficiently easy to pick up that they typically will be able to learn it as easily as, if not more easily than, a template language. I agree to a degree… But my experience has shown that web designers typically struggle with HTML, let alone PHP. (Note: my experience in this regard is not huge, and I'm sure that this is an exaggeration.) I find that it's typically easiest for me to give an example template, explain what the funny, non-HTML stuff can do, and let them go from there. Using this approach, they do not need to learn anything new; they simply work with placeholders.

Still, I think the author raises some fine points. I wish he'd bothered to do more research into why people choose template engines and the benefits that arise from using them before simply outright slamming them. Of course, the article is also a bit dated; it was written over two years ago, and much has changed in the world of PHP and many of its template engines. I'm curious as to whether he would feel the same way today.

Me? My mind is made up — the benefits, in my circumstances, far outweigh any costs associated. I'll be using template engines, and Smarty in particular, for years to come.