mwop.net :: Blog Posts

robots.txt

One thing I've wondered about is the syntax of the robots.txt file, where it's placed, and how it's used. I've known that it is used to block spiders from accessing your site, but that's about it. I've had to look into it recently because we're offering free memberships at work, and we don't want them indexed by search engines. I've also wondered how we can exclude certain areas, such as where we collate our site statistics, from these engines.

As it turns out, it's dead simple. Create a robots.txt file in your document root (htmlroot), and use the following syntax:

User-agent: *
Disallow: /path/
Disallow: /path/to/file

The User-agent line can name a specific agent or use the wildcard; there are so many spiders out there that it's probably safest to simply disallow all of them. Each Disallow line should have only one path or file name, which matches any URL that begins with it, but you can have multiple Disallow lines, so you can exclude any number of paths or files.
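As a minimal sketch of the format above, the following shell function prints a robots.txt that blocks every spider from a list of paths. The function name `make_robots` and the example paths `/stats/` and `/members/` are hypothetical, chosen only for illustration.

```shell
# make_robots: hypothetical helper that prints a robots.txt body.
# Each argument becomes its own Disallow line under the wildcard agent.
make_robots() {
    printf 'User-agent: *\n'
    for path in "$@"; do
        printf 'Disallow: %s\n' "$path"
    done
}

# Example (paths are assumptions): write the file into the document root.
# make_robots /stats/ /members/ > /path/to/docroot/robots.txt
```

Keeping one path per argument mirrors the one-path-per-Disallow rule, so adding another excluded area is just one more argument.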

More SSH tips: Tunnelling

I wrote up a short tutorial today on the IT wiki about SSH tunneling. What I didn't know is that you can start a tunnel after you've already ssh'd to another machine. Basically, you:

Press Enter
Type ~C

and you're at an ssh> prompt. From there, you can issue the tunnel command of your choice: -R7111:localhost:22, for instance.
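For comparison, the same forwarding can be requested when the connection is first opened. The sketch below only assembles and prints the command so it can be inspected before use; the function name `reverse_tunnel_cmd` and the destination `user@remote.example.com` are assumptions for illustration.

```shell
# reverse_tunnel_cmd: hypothetical helper that prints an ssh command line
# requesting the same remote forward as typing -R7111:localhost:22 at the
# ssh> prompt.
# Arguments: remote port, target host, target port, ssh destination.
reverse_tunnel_cmd() {
    printf 'ssh -R %s:%s:%s %s\n' "$1" "$2" "$3" "$4"
}

# Prints the command rather than running it:
reverse_tunnel_cmd 7111 localhost 22 user@remote.example.com
```

With this forward in place, connections to port 7111 on the remote machine are passed back to port 22 on the machine you started from.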

IT hiring principles

I was just reading an article about the Dean campaign's IT infrastructure, and there's an interesting quote from their IT manager, Harish Rao:

"I believe in three principles", he said. "First I always make sure I hire people I can trust 100%. Second, I always try to hire people who are smarter than I am. Third, I give them the independence to do as they see fit as long as they communicate about it to their other team members. We've had a lot of growing pains, a lot of issues; but we've been able to deal with them because we have a high level of trust, skill and communication."

I know for myself that when I (1) don't feel trusted, and/or (2) am not given independence to do what I see as necessary to do my job, I don't communicate with my superiors about my actions, and I also get lazy about my job because I don't feel my work is valued.

Fortunately, I feel that in my current work situation, my employers followed the same principles as Rao, and I've felt more productive and appreciated than I've felt in any previous job.

PHP standards ruminations

I've been thinking about trying to standardize the PHP code we write at work. Rob and I follow similar styles, but there are some definite differences. It would make delving into each other's code much easier if we both followed some basic, agreed-upon guidelines.

One thing I've been thinking about is function declarations. I find that I'm often retooling a function to make it more general, and in doing so need to either decrease or increase the number of arguments to it. This, of course, breaks compatibility.

So I propose that we have all functions take two arguments: $data and $db. $data is a hash (an associative array) that the function can unpack with PHP's extract(). To change the arguments a function accepts, you simply set defaults for the new keys or return meaningful errors for missing ones; the function signature itself never changes.

Another thought going through my mind deals with the fact that we reuse many of our applications across our various sites, and also export some of them. I think we should try and code the applications as functional libraries or classes, and then place them somewhere in PHP's include path. We can then have a "demo" area that shows how to use the libraries/classes (i.e., example scripts), and to utilize a given application, we need simply include it like: include 'apps/eventCalendar/calendar.inc';. This gives us maximum portability, and also forces us to code concisely and document vigorously.

I was also reading on php.general tonight, and noticed some questions about PHP standards. Several people contend that PEAR is becoming the de facto standard, as it's the de facto extension library. In addition, because it is becoming a standard, there's also a standard for documenting projects, and this is phpDocumentor. The relevant links are:

Making RCS a little easier...

One thing I noticed today when using RCS is that it isn't terribly user friendly — you need to checkout a file to make edits. Often, I make edits, and then want to commit my changes.

So I wrote a wrapper script called revise. It makes a temporary copy of the file you've been editing, checks it out of RCS with locking, makes it writeable, moves the temporary copy to the permanent name, checks it in and unlocks it (which prompts for a log message), and then makes the file writeable for the user and group again. The script is outlined here:

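#!/bin/bash
# revise: check the edited copy of a file into RCS in one step
FILE=$1
cp "$FILE" "$FILE.new"    # set the edits aside
co -l "$FILE"             # check out the last revision with a lock
chmod u+w "$FILE"
mv "$FILE.new" "$FILE"    # put the edits back in place
ci -u "$FILE"             # check in and unlock; prompts for a log message
chmod ug+w "$FILE"        # make the file writable for user and group again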

Being the ROX-Filer centric person I am, I also wrote a quick perl script called rox-revise that I can then put in my SendTo menu. It parses the file's path, changes to that directory, and then calls the revise script on the filename, from within a terminal. This script follows:

#!/usr/bin/perl -w
use strict;

use vars qw/$path $file $TERMCMD $REVISE $ZENITY/;

### Configurable variables
$TERMCMD = "myTerm";    # What terminal command to use; must be xterm compliant
$REVISE  = "revise";    # What command to use to revise (i.e. rcs ci) the file
$ZENITY  = "zenity";    # The zenity or dialog or xdialog command to use

### Grab the filename from the command line
$path = shift;
$file = $path;

### If no file given, raise a dialog and quit
if (!$path || ($path eq '')) {
    system(
        $ZENITY, 
        '--title=Error', 
        '--warning', 
        "--text=No path given to $0; rox-revise quit!"
    );
    exit 0;
}

### Get the path to the file and switch to that directory
if ($path =~ m#/#) {
    $path =~ s#^(.*)/.*?$#$1#;
    if ($path !~ m#^/#) { $path = "./$path"; }
    chdir $path or die "$path not found!\n";
} else {
### Or else assume we're in the current directory
    $path = './';
}

### Get the filename
$file =~ s#^.*/(.*?)$#$1#;

### Execute the revise statement
my $failure = system($TERMCMD, '-e', $REVISE, $file);
if ($failure) {
    # on failure, raise a dialog
    system(
        $ZENITY, 
        '--title=Error', 
        '--warning', 
        "--text=Unable to revise $file"
    );
}

1;

Now I just need to check out Subversion, and I can have some robust versioning!

SSH tips and tricks

In trying to implement some of the hacks in Linux Server Hacks, I had to go to the ssh manpage, where I discovered a number of cool tricks.

To get key-based (i.e., passwordless) authentication working, the $HOME/.ssh directory must be mode 0700, and the files in it, private keys in particular, should be mode 0600. Once that's set up properly, key-based authentication works perfectly.
You can have a file called config in your $HOME/.ssh directory that specifies user-specific settings for using SSH, as well as a number of host-specific settings:

Compression yes: turns on compression
ForwardX11 yes: turns on X11 forwarding by default
ForwardAgent yes: turns on ssh-agent forwarding by default
Host-based settings go from one Host keyword to the next, so place them at the end of the file. Do it in the following order:

```apacheconf
Host nickname
HostName actual.host.name
User username_on_that_host
Port PortToUse
```

This means, for instance, that I can ssh back and forth between home and work using the same key-based authentication and the same ssh-to script ([more below](#ssh-to)) I use for work servers -- because I don't have to specify the port or the username.
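The permissions and config layout described above can be sketched as a small setup function. This is only an illustration: the function name `init_ssh_dir`, the nickname `home`, the host name `home.example.com`, the user `someuser`, and port 2222 are all assumptions, not values from my own setup.

```shell
# init_ssh_dir: hypothetical helper that creates an SSH directory with a
# sample config file and the restrictive modes described above.
# Argument: the directory to set up (normally "$HOME/.ssh").
init_ssh_dir() {
    dir=$1
    mkdir -p "$dir" || return 1
    chmod 700 "$dir"
    # Only write the sample config if none exists yet.
    if [ ! -e "$dir/config" ]; then
        cat > "$dir/config" <<'EOF'
Compression yes
ForwardX11 yes
ForwardAgent yes

Host home
HostName home.example.com
User someuser
Port 2222
EOF
    fi
    chmod 600 "$dir/config"
}

# Example: init_ssh_dir "$HOME/.ssh"
```

The global settings come first and the Host block last, since everything after a Host keyword applies only to that host.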

I mentioned a script called ssh-to earlier. This is a neat little hack from the server hacks book as well. Basically, you have the following script in your path somewhere:

#!/bin/bash
# Connect to the host named after whatever this script was invoked as
ssh -C "$(basename "$0")" "$@"

Then, elsewhere in your path, you do a bunch of ln -s /path/to/ssh-to /path/to/$HOSTNAME, where $HOSTNAME is the name of a host to which you ssh regularly; this is where specifying a host nickname in your $HOME/.ssh/config file can come in handy. Then, to ssh to any such server, you simply type $HOSTNAME at the command line, and you're there!
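Creating the links by hand gets tedious with more than a few hosts, so here is a minimal sketch that makes one symlink per host name. The function name `link_hosts` and the host names in the example are hypothetical.

```shell
# link_hosts: hypothetical helper that creates one symlink per host name,
# each pointing at the ssh-to script.
# Arguments: path to ssh-to, directory for the links, then the host names.
link_hosts() {
    script=$1
    bindir=$2
    shift 2
    for host in "$@"; do
        ln -sf "$script" "$bindir/$host"
    done
}

# Example (paths and host names are assumptions):
# link_hosts "$HOME/bin/ssh-to" "$HOME/bin" home web1 db1
```

Because ssh-to looks only at the name it was invoked under, each link behaves like a separate command for its host.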

RCS quickstart

Gleaned from Linux Server Hacks

Create an RCS directory
Execute a ci -i filename
Execute a co -l filename and edit as you wish.
Execute a ci -u filename to check in changes.

The first time you check out the file, it will be locked, and this can cause problems if someone else wishes to edit it; you should probably edit it once and put the revision keyword in a comment somewhere at the top or bottom:

$Revision$

and then check it back in with the -u flag to unlock it.

Linux Server Hacks

I stopped at Borders in downtown Burlington on New Year's Eve day, and found a book called Linux Server Hacks. I loved it immediately, but I wasn't quite willing to shell out $25 for such a slim volume, even if it did have many tidbits I could immediately use.

When I told my co-worker, Rob, about it, it turned out he already had the book, and brought it in to work for me to borrow the next day.

My nose has barely been out of it since. I've done such things as:

Create personal firewalls for my home and office machines. I've always used scripts for this, but the hacks for iptables showed the basics of how they work, and I've now got nice robust firewalls that are very simple scripts. To make them even more user-friendly, I borrowed some syntax from the various /etc/init.d scripts so that I can start, stop, and reload the firewall at will.
I don't use perl at the command line much, even though I've long known the -e switch; it just seems too cumbersome. However, combine it with the -p and/or -i switch, and you can use perl as a filter on globbed files!
I know much more about SSH now, and am using ssh-agent effectively at work to bounce around servers and transfer groups of files between servers (often by piping tar commands through ssh).
A script called movein.sh turned my life around when it came to working on the servers. I now have a .skel directory on my work machine that contains links to oft-used configuration files and directories, as well as to my ~/bin directory; this allows me to then type movein.sh server and have all these files uploaded to the server. I can now use vim, screen, and other programs on any system we have in exactly the manner I expect to.
I've started thinking about versioning more, and plan to put a Subversion repository in place to store server configs, database schemas, and development projects so we won't make as many mistakes in the future, or at least not ones we can't roll back from.
I rewrote a shell script in perl that was originally intended for IP takeover, and have been utilizing it to determine if and/or when a server we've reinstalled goes down.
A bunch of Apache and MySQL tips are included, including mod_rewrite hacks, how to make your directory indexes show full file names, and more; as well as how to monitor your mysql processes and, if necessary, kill them. I'm also very interested in how to use MySQL as an authentication backend for an FTP daemon — it could give us very fine-grained control of our webserver for editors.

And that's just the tip of the iceberg. All in all, I highly recommend the book — though most likely as a book to check out from the library for a few weeks, digest, put into practice, and return. The hacks are so damn useful, I've found that after using one, I don't need to refer to that one ever again. But that's the point.

Perl Cookbook, 2nd Edition

Tonight was Papa night, which meant that I got to look after Maeve while Jen worked late doing a group at work. Last week, Maeve and I established that Papa Night would always include going to the bookstore, which means Barnes & Noble in South Burlington.

Last week, Maeve was perfectly content to look at books by herself, and didn't want me interfering, so I decided this week to grab a book for myself to peruse while she was busy. It didn't work as I intended — Maeve saw that I wasn't paying full attention to her, and then demanded my attention — but I was able to look through some of the new items in the second edition of The Perl Cookbook.

Among them were:

Setting up both an XML-RPC server and client, using SOAP::Lite
Setting up both a SOAP-RPC server and client, using SOAP::Lite and other modules; I could have used this in ROX::Filer to communicate with ROX instead of using the filer's RPC call.

Better coverage of DBI (it actually covered it!):

When you expect only a single row, this is a nice way to grab it:
```
$row = $dbi->selectrow_hashref($statement);   # or selectrow_arrayref for an array reference
```

This is a great way to grab a bunch of columns from a large resultset:

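# $key_field names a column with unique values (e.g. the primary key);
# selectall_hashref needs it to key the returned hash
$results = $dbi->selectall_hashref($sql, $key_field);
foreach $record (keys(%{$results})) {
    print $results->{$record}{fieldname};
}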

This one is nice for a large resultset from which you only want one column:

$results = $dbi->selectcol_arrayref($sql);
foreach $result (@{$results}) {
    print $result;
}

If you need to quote values before inserting them, try:

$quoted = $dbi->quote($unquoted);
$sql = "UPDATE table SET textfield = $quoted";

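If you need to check for errors, don't check with each DBI call; instead, turn on RaiseError for the handle and wrap all of the calls in an eval statement:

$dbi->{RaiseError} = 1;    # make DBI die on errors so eval can catch them
eval {
    $sth = $dbi->prepare($sql);
    $sth->execute;
    while ($row = $sth->fetchrow_hashref) {
        ...
    }
};
if ($@) {
    print $DBI::errstr;
}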

Coverage of templating, including Text::Template (very interesting!)
Whole new chapters on mod_perl and XML (including DOM!) which I didn't really even get to peruse.
autouse pragma: if you use:
```
use autouse 'Module::Name' => qw(function_name);
```
perl will load the module at run time instead of compile time; basically, it only loads it if it actually needs it (i.e., the first time one of the listed functions is called). It's a good way to cut down on the bloat; I should use this with librox-perl, and possibly with CGI::App.