I've been working with SimpleXML a fair amount lately, and have run into an issue a number of times with character encodings. Basically, if a string has a mixture of UTF-8 and non-UTF-8 characters, SimpleXML barfs, claiming the "String could not be parsed as XML."
I tried a number of solutions, hoping actually to automate it via mbstring INI settings; these schemes all failed. iconv didn't work properly. The only thing that did work was to convert the encoding to latin1 -- but this wreaked havoc with actual UTF-8 characters.
Then, through a series of trial-and-error, all-or-nothing shots, I stumbled on a simple solution. Basically, I needed to take two steps:
which is accomplished with:
$enc = mb_detect_encoding($xml); $xml = mb_convert_encoding($xml, 'UTF-8', $enc);
The conversion is performed even if the detected encoding is UTF-8; the conversion ensures that all characters in the string are properly encoded when done.
It's a non-intuitive solution, but it works! QED.
I've been working on Cgiapp in the past few months, in particular to introduce one possibility for a Front Controller class. To test out ideas, I've decided to port areas of my personal site to Cgiapp2 using the Front Controller. Being the programmer I am, I quickly ran into some areas where I needed some reusable code -- principally for authentication and input handling.
So what did I choose? To reinvent the wheel, of course!
To that end, I've opened a new PEAR channel that I'm calling PHLY, the PHp LibrarY, named after my blog. The name implies soaring, freedom, and perhaps a little silliness.
It is designed with the following intentions:
Please feel free to use this code however you will. Comments, feedback, and submissions are always welcome.
I released Cgiapp 1.9.0 into the wild last night. The main difference between 1.8.0 and 1.9.0 is that I completely removed the plugin system. I hadn't had any users reporting that they were using it, and, in point of fact, the overloading mechanism I was using was causing some obscure issues, particularly in the behaviour of cgiapp_postrun().
As usual, you can find more information and links to downloads at the Cgiapp site.
I got into trouble on the PEAR
list when I tried to propose it for inclusion in that project, when I made
the mistake of describing it as a framework. (This was before frameworks
became all the rage on the PHP scene; PEAR developers, evidently, will not
review anything that could possibly be construed or interpreted as a
framework, even if it isn't.) I mistakenly called Cgiapp a
framework once when considering proposing it to PEAR. But if it's not a
framework, what is Cgiapp? Stated simply:
Cgiapp is the Controller of a Model-View-Controller (MVC) pattern. It can be either a front controller or an application controller, though it's typically used as the latter.
I generally try to stay out of politics on this blog, but this time something has to be said, as it affects anyone who uses the internet, at least in the US.
Basically, a number of telcos and cable providers are talking about charging internet content providers -- the places you browse to on the internet, places like Google, Yahoo!, Amazon, etc. -- fees to ensure bandwidth to their sites. Their argument is that these content providers are getting a 'free ride' on their lines, and generating a lot of traffic themselves, and should thus be paying for the cost of bandwidth.
This is patently ridiculous. Content providers already have to pay for their bandwidth -- they, too, have ISPs or agreements with telcos in place, either explicitly or via their hosting providers. Sure, some of them, particularly search engines, send out robots in order to index or find content, but, again, they're paying for the bandwidth those robots generate. Additionally, people using the internet are typically paying for bandwidth as well, through their relationship with their ISP. What this amounts to is the telcos getting paid not just by each person to whom they provide internet access, but every end point on the internet, at least those within the US.
What this is really about is telcos wanting more money, and wanting to push their own content. As an example, let's say your ISP is AOL. AOL is part of Time Warner, and thus has ties to those media sources. Now, those media sources may put pressure on AOL to reduce bandwidth to sites operated by ABC, CBS, NBC, FOX, Disney, PBS, etc. This might mean that your kid can no longer visit the Sesame Street website reliably, because AOL has reduced the amount of bandwidth allowed to that service -- but any media site in the TWC would get optimal access, so they could get to Cartoon Network. Not to slam Cartoon Network (I love it), but would you rather have your kid visiting cartoonnetwork.com or pbskids.org? Basically, content providers would not need to compete based on the value of their content, but on who they can get to subscribe to their service.
Here's another idea: your ISP is MSN. You want to use Google... but MSN has limited the bandwidth to Google because it's a competitor, and won't accept any amount of money to increase that bandwidth. They do the same with Yahoo! So, now you're limited to MSN search, because that's the only one that responds reliably -- regardless of whether or not you like their search results. By doing so, they've just artificially inflated the value of their search engine -- without needing to compete based on merit.
Additionally, let's say Barnes and Noble has paid MSN to ensure good bandwidth, but part of that agreement is a non-compete clause. Now you find your connections to Amazon timing out, meaning that you can't even see which book provider has the better price on the book you want; you're stuck looking and buying from B&N.
Now, let's look at something a little more close to home for those of us developing web applications. There have been a number of success stories the last few years: MySpace, Digg, and Flickr all come to mind. Would these endeavors have been as successful had they needed to pay multiple times for bandwidth, once to their ISP and once each to each telco charging for content providers? Indeed, some of these are still free services -- how would they ever have been able to pay the extra amounts to the telcos in the first place?
So, basically, the only winners here are the telcos.
Considering how ludicrous this scheme is, one must be thinking, isn't the US Government going to step in and regulate against such behaviour? The answer, sadly, is no. The GOP doesn't like regulation, and so they want market forces to decide. Sadly, what this will likely do is force a number of content providers to offshore their internet operations -- which is likely to have some pretty negative effects on the economy.
The decision isn't final -- efforts can still be made to prevent it (the above link references a Senate committee meeting; there's been no vote on it). Call your representatives today and give them an earful. Tell them it's not just about regulation of the industry, but about fair competition in the market. Allowing the telcos to extort money from content providers will only reduce the US' economic chances in the world, and stifle innovation and choice.
I don't blog much any more. Much of what I work on any more is for my employer, Zend, and I don't feel at liberty to talk about it (and some of it is indeed confidential). However, I can say that I've been programming heavily on PHP5 the past few months, and had a chance to do some pretty fun stuff. Among the new things I've been able to play with are SPL and PHPUnit -- and, recently, together.
On perlmonks today, a user was needing to maintain a PHP app, and wanted to know what the PHP equivalent of "perl -wc script.pl" was -- specifically, they wanted to know how to run a PHP script from the commandline and have it display any warnings (ala perl's strict and warnings pragmas).
Unfortunately, there's not as simple a way to do this in PHP as in perl. Basically, you need to do the following:
Alternatively, you can create a file with the lines:
<?php error_reporting(E_ALL | E_STRICT); ini_set('display_errors', true);
and then set the php.ini setting 'auto_prepend_file' to the path to that file.
NOTE: do not do any of the above on a production system! PHP's error messages often reveal a lot about your applications, including file layout and potential vectors of attack. Turn display_errors off on production machines, set your error_reporting somewhat lower, and log_errors to a file so you can keep track of what's going on on your production system.
The second part of the question was how to run a PHP script on the command line. This is incredibly simple: php myscript.php. No different than any other scripting language.
You can get some good information by using some of the switches, though. '-l' turns the PHP interpreter into a linter, and can let you know if your code is well-formed (which doesn't necessarily preclude runtime or parse errors). '-f' will run the script through the parser, which can give you even more information. I typically bind these actions to keys in vim so I can check my work as I go.
If you plan on running your code solely on the commandline, add a shebang to the first line of your script: #!/path/to/php. Then make the script executable, and you're good to go. This is handy for cronjobs, or batch processing scripts.
All of this information is readily available in the PHP manual, and the commandline options are always available by passing the --help switch to the PHP executable. So, start testing your scripts already!
Today, I have released two versions of Cgiapp into the wild, Cgiapp 1.8.0 and Cgiapp2 2.0.0rc1.
Cgiapp 1.8.0 is a performance release. I did a complete code audit of the class, and did a number of changes to improve performance and fix some previously erratic behaviours. Additionally, I tested under both PHP4 and PHP5 to make sure that behaviour is the same in both environments.
However, Cgiapp 1.8.0 markes the last feature release of Cgiapp. I am deprecating the branch in favor of Cgiapp2.
Cgiapp2 is a PHP5-only version of Cgiapp. Some of the changes:
Keep reading for more information on the evolution of Cgiapp2.
I wrote earlier of my experiences using Windows XP, a move I've considered somewhat unfortunate but necessary. I've added a couple more tools to my toolbox since that have made the environment even better.
I was able to roll a long-needed (and by some, long awaited) bugfix release of Cgiapp this morning. Cgiapp 1.7.1 corrects the following issues:
Update: The link on my site for downloading Cgiapp has been broken; I've now fixed it.