Gijs van Tulder

PhD student in machine learning   |   computer scientist   |   the Netherlands

A simple XML publishing system

How to use PHP, Sablotron, XML and XSL to publish a small web site.

Requirements

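To run this script, your home computer or server needs the command-line version of PHP, with support for:

  • the Sablotron XSLT extension (the xslt_* functions)
  • FTP (the ftp_* functions)
  • the XML parser (the xml_* functions)

If you want to use Tidy to clean up the generated HTML, you should install that as well.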
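The script relies on three PHP extensions: FTP, the XML parser and the Sablotron XSLT extension. A quick way to see which of them your command-line PHP lacks is a check like the one below. This is a minimal sketch; missing_requirements() is a hypothetical helper, not part of the script. Note that xslt_create() only exists in PHP 4 builds with Sablotron, so on PHP 5 and later the last check will always report it as missing.

```php
<?php
// Hypothetical helper: list the PHP features the publishing script
// needs that are missing from this PHP build.
function missing_requirements(): array
{
    $missing = [];
    // ftp_connect(), ftp_put(), ftp_mkdir() come from the FTP extension
    if (!extension_loaded('ftp')) {
        $missing[] = 'ftp';
    }
    // xml_parser_create() and the handler functions come from the XML extension
    if (!extension_loaded('xml')) {
        $missing[] = 'xml';
    }
    // xslt_create() and xslt_process() come from the Sablotron extension
    if (!function_exists('xslt_create')) {
        $missing[] = 'xslt (Sablotron)';
    }
    return $missing;
}

$missing = missing_requirements();
echo $missing ? 'Missing: ' . implode(', ', $missing) . "\n"
              : "All requirements found.\n";
```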

The XML source files

On your local computer, you'll build an XML version of your site. The script converts each XML document to an HTML page using the XSL template you provide, and passes the template navigation data built from the nav.xml files.
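The Sablotron functions used by the script (xslt_create(), xslt_process()) are not available in PHP 5 and later, where the xsl extension, based on libxslt, took their place. The sketch below shows the same XML-to-HTML step with XSLTProcessor. It is a swapped-in technique, not the script's own code; the function name transform() and the inline page and template are made-up examples. Passing the navigation XML, which the script hands to Sablotron as the '/_navigation' argument, is left out.

```php
<?php
// Sketch: transform one XML document with an XSL template using PHP's
// xsl extension (XSLTProcessor) instead of the Sablotron functions.
function transform(string $xml, string $xsl): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $style = new DOMDocument();
    $style->loadXML($xsl);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($style);

    $html = $proc->transformToXml($doc);
    if ($html === false || $html === null) {
        throw new RuntimeException('XSL transformation failed');
    }
    return $html;
}

// hypothetical source page and template, for illustration only
$xml = '<?xml version="1.0"?><page><title>foo</title></page>';
$xsl = '<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/page">
    <html><body><h1><xsl:value-of select="title"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>';

// only run when the xsl extension is installed
if (class_exists('XSLTProcessor')) {
    echo transform($xml, $xsl);
}
```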

Let's take a look at a sample source file structure:

  • nav.xml
  • index.xml
  • articles.xml
  • articles/nav.xml
  • articles/foo.xml
  • articles/bar.xml

If you run the script, this will result in the following structure on the server:

  • index.html
  • articles/index.html
  • articles/foo/index.html
  • articles/bar/index.html

What happened?

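  1. The nav.xml files are special: they are only used to build the navigation and are not processed by the template.
  2. Every other .xml file is converted to an index.html file in a directory of its own, named after the file. The exception is the top-level index.xml, which becomes the site's main index.html.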
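The mapping from source files to output files can be written down as a small function. output_path() is a hypothetical helper, not part of the script; it takes paths relative to the xml directory and follows the rules above: nav.xml is skipped, the top-level index.xml becomes index.html, and every other page gets its own directory.

```php
<?php
// Hypothetical helper: map a source path (relative to xml/) to the HTML
// path (relative to html/ and the server directory), or null for nav.xml.
function output_path(string $source): ?string
{
    // nav.xml files only feed the navigation; they are not converted
    if (basename($source) === 'nav.xml') {
        return null;
    }
    // the top-level index.xml becomes the site's index.html
    if ($source === 'index.xml') {
        return 'index.html';
    }
    // every other page gets its own directory: foo.xml -> foo/index.html
    return preg_replace('/\.xml$/', '', $source) . '/index.html';
}

foreach (['nav.xml', 'index.xml', 'articles.xml', 'articles/nav.xml',
          'articles/foo.xml', 'articles/bar.xml'] as $source) {
    echo $source, ' -> ', output_path($source) ?? '(navigation only)', "\n";
}
```

Run on the sample tree, this prints the same four HTML files as the listing above.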

The nav.xml files

Each source directory should have a nav.xml file. This file contains a listing of the files in that directory, which is used to build the navigation.

This is the nav.xml file of the 'notities' (notes) directory of this web site. The name element holds the file name of each page, without the .xml extension:

<?xml version="1.0"?>
<menu>
    <page>
        <title>xml publishing system</title>
        <name>xmlpublishing</name>
    </page>
    <page>
        <title>array to xml function</title>
        <name>arraytoxml</name>
    </page>
    <page>
        <title>easter eggs</title>
        <name>paasei</name>
    </page>
    <page>
        <title>comment system with php and mysql</title>
        <name>reactiesysteem</name>
    </page>
</menu>

Every link at this level of the site's menu comes from this nav.xml file. If you don't want a page to appear in the menu, simply leave it out of nav.xml; the page itself is still converted and uploaded.
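The script reads nav.xml with PHP's event-based XML parser (see navlevel() and the handler functions in the script). For a file this small, SimpleXML, available from PHP 5 on, does the same job in a few lines. This is a swapped-in alternative, not the script's own code, and read_nav() is a hypothetical name.

```php
<?php
// Sketch: read the title/name pairs from a nav.xml document with SimpleXML.
function read_nav(string $xml): array
{
    $menu = new SimpleXMLElement($xml);
    $pages = [];
    // iterate over the <page> children of <menu>
    foreach ($menu->page as $page) {
        $pages[] = [
            'title' => (string) $page->title,
            'name'  => (string) $page->name,
        ];
    }
    return $pages;
}

$nav = <<<XML
<?xml version="1.0"?>
<menu>
    <page>
        <title>xml publishing system</title>
        <name>xmlpublishing</name>
    </page>
    <page>
        <title>array to xml function</title>
        <name>arraytoxml</name>
    </page>
</menu>
XML;

foreach (read_nav($nav) as $page) {
    echo $page['name'], ': ', $page['title'], "\n";
}
```

To read a file instead of a string, simplexml_load_file('xml/nav.xml') returns the same kind of object.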

Where to put things?

On your local computer, make a directory for the script. Inside it, create a directory called 'xml' and a directory called 'html'. Put your XML source files in 'xml'; the script saves the resulting HTML in 'html'.

Save your XSL template as 'style.xsl' in the script's directory. To use Tidy, also copy the 'tidy' binary there, together with the 'tidy.conf' configuration file that the script passes to it.

Remember to make the script executable (chmod 755), or it won't run. Also check where PHP is installed on your system, and change the first line of the script (#!/usr/bin/php -q) to point to the right file.
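Because the script uses relative paths (xml, html, style.xsl, tidy.conf), it has to be started from its own directory. A check like the following can catch a missing piece before the first run. check_layout() is a hypothetical helper; the entries it looks for are taken from the paragraphs above and from the files the script opens, and the Tidy files are reported as optional.

```php
<?php
// Hypothetical helper: report what is missing from the script directory.
function check_layout(string $base): array
{
    $problems = [];
    // source and output directories
    foreach (['xml', 'html'] as $dir) {
        if (!is_dir("$base/$dir")) {
            $problems[] = "missing directory: $dir/";
        }
    }
    // the top-level menu and the XSL template
    if (!is_file("$base/xml/nav.xml")) {
        $problems[] = 'missing file: xml/nav.xml';
    }
    if (!is_file("$base/style.xsl")) {
        $problems[] = 'missing file: style.xsl';
    }
    // only needed if you use Tidy
    foreach (['tidy', 'tidy.conf'] as $optional) {
        if (!file_exists("$base/$optional")) {
            $problems[] = "optional, for Tidy: $optional";
        }
    }
    return $problems;
}

// check the current working directory, as the script itself uses it
foreach (check_layout((string) getcwd()) as $problem) {
    echo $problem, "\n";
}
```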

The FTP configuration

The script automatically uploads the HTML files to an FTP server. Fill in the $ftp_* variables at the beginning of the script with your own login details. The $ftp_target_dir variable holds the path of the remote directory where the files should be stored.
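On the server, the files end up below $ftp_target_dir in the same tree as in the local html directory. An FTP server normally creates only one directory per ftp_mkdir() call, which is why processdir() creates each remote directory as it descends into the source tree. If you ever upload a single page into a fresh tree, its parent directories have to be created first, top down. The sketch below only computes that list; remote_dirs() is a hypothetical helper that the script does not use.

```php
<?php
// Hypothetical helper: list the remote directories, from the top down,
// that must exist before a page at $relative can be stored below $target.
function remote_dirs(string $target, string $relative): array
{
    $dirs = [];
    $path = rtrim($target, '/');
    foreach (explode('/', trim($relative, '/')) as $part) {
        if ($part === '') {
            continue;
        }
        $path .= '/' . $part;
        $dirs[] = $path;
    }
    return $dirs;
}

foreach (remote_dirs('/www/public_html/f2o', 'articles/foo') as $dir) {
    // with a live connection: @ftp_mkdir($ftp_conn, $dir);
    echo $dir, "\n";
}
```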

The script

You can copy and paste the script from this page, or download a tar.gz archive of my own directory, which contains this script.

#!/usr/bin/php -q
<?php

// enter your own ftp data
$ftp_host = 'gvtulder.f2o.org';
$ftp_user = 'gvtulder';
$ftp_pass = 'password';

// the html directory on the ftp server
$ftp_target_dir = '/www/public_html/f2o';



// connect to the ftp server and login
$ftp_conn = ftp_connect($ftp_host);
if (!$ftp_conn) {
    die("Sorry, couldn't connect to the ftp server.");
}
if (!ftp_login($ftp_conn, $ftp_user, $ftp_pass)) {
    die('Login failed.');
}


// first, remove the old html files
exec('rm -r html/*');

// start processing the xml files
processdir();


// index.html is saved in the 'index' directory
// that's wrong, it should be in the base directory
rename('html/index/index.html', 'html/index.html');

// remove the html/index/ directory
rmdir('html/index');

// upload index.html to the server
ftp_chdir($ftp_conn, $ftp_target_dir);
ftp_put($ftp_conn, 'index.html', './html/index.html', FTP_ASCII);
echo 'ftp_put: '.$ftp_target_dir." ./html/index.html\n";


// close the connection
ftp_close($ftp_conn);






// this function walks through the directory structure
// and converts and uploads every xml page in those
// directories. it should start with the 'xml' directory.
function processdir($dir='xml') {
    // the ftp connection
    global $ftp_conn, $ftp_target_dir;

    // open this directory
    $dh = opendir($dir);

    while (($file = readdir($dh)) !== false) {
        // repeat for each file in this directory

        // only .xml files are converted; nav.xml is special,
        // it only feeds the navigation
        if ($file!='nav.xml' and substr($file, -4)=='.xml' and
            is_file($dir.'/'.$file)) {
            // display status information
            echo $file."\n";
            flush();

            // get this file's path without the xml/ prefix
            $current_url = str_replace('xml/','',$dir.'/').$file;

            // create a new sablotron parser
            $xslt_parser = xslt_create();

            // get the navigation data for this page
            $navigation = navigation(str_replace('.xml','',
                                     $current_url));

            // convert to xml and send to sablotron
            $arguments = array('/_navigation'=>toxml($navigation));

            // let sablotron parse the xsl template
            $html = xslt_process($xslt_parser, $dir.'/'.$file,
                                 'style.xsl', NULL, $arguments);

            // if the target directory doesn't exist, create it
            $target_dir = './html/'.str_replace('xml/','',$dir.'/').
                        str_replace('.xml','',$file);
            if (!is_dir($target_dir)) {
                mkdir($target_dir);
            }

            // store the html file
            $fp = fopen($target_dir.'/index.html','w');
            fwrite($fp, $html);
            fclose($fp);

            xslt_free($xslt_parser);

            // run tidy on this html file
            passthru('./tidy -quiet -config tidy.conf '.$target_dir.
                     '/index.html 2>&1');

            // if this is not the special index.xml file, upload it
            if ($file!='index.xml') {
                // if the directory doesn't exist, create it
                $target_dir_ftp = $ftp_target_dir.'/'.
                               str_replace('xml/','',$dir.'/').
                               str_replace('.xml','',$file);
                if (!@ftp_chdir($ftp_conn, $target_dir_ftp)) {
                    ftp_mkdir($ftp_conn, $target_dir_ftp);
                    ftp_chdir($ftp_conn, $target_dir_ftp);
                }

                // upload the file
                ftp_put($ftp_conn,$target_dir_ftp.'/index.html',
                        $target_dir.'/index.html',FTP_ASCII);

                // display status
                echo "ftp_put: $target_dir_ftp\n";
            }
        } elseif ($file!='..' and $file!='.' and
                  is_dir($dir.'/'.$file)) {
            // if $file is a directory, we should also process the
            // files in this directory
            echo "\n::".str_replace('xml/','',$dir.'/').$file."\n";
            flush();

            // create directory if it doesn't exist
            if (!is_dir('./html/'.str_replace('xml/','',
                        $dir.'/').$file)) {
                 mkdir('./html/'.str_replace('xml/','',
                        $dir.'/').$file);
            }
            // same for ftp
            @ftp_mkdir($ftp_conn, $ftp_target_dir.'/'.
                        str_replace('xml/','',
                        $dir.'/').$file);

            // and run this function
            processdir($dir.'/'.$file);
        }
    }

    // all files in this directory done
    closedir($dh);
}

// converts the array to xml
function toxml($array) {
    return '<?xml version="1.0"?>'."\n<menu>".
             array_to_xml(array('page'=>$array)).'</menu>';
}
function array_to_xml($array, $level=1) {
    $xml = '';
    foreach ($array as $key=>$value) {
        $key = strtolower($key);
        if (is_array($value)) {
            $multi_tags = false;
            foreach($value as $key2=>$value2) {
                if (is_int($key2)) {
                    $xml .= str_repeat("\t",$level)."<$key>\n";
                    $xml .= array_to_xml($value2, $level+1);
                    $xml .= str_repeat("\t",$level)."</$key>\n";
                    $multi_tags = true;
                }
            }
            if (!$multi_tags and count($value)>0) {
                $xml .= str_repeat("\t",$level)."<$key>\n";
                $xml .= array_to_xml($value, $level+1);
                $xml .= str_repeat("\t",$level)."</$key>\n";
            }
        } else {
            if (trim($value)!='') {
                if (htmlspecialchars($value)!=$value) {
                    $xml .= str_repeat("\t",$level).
                                  "<$key><![CDATA[$value]]></$key>\n";
                } else {
                    $xml .= str_repeat("\t",$level).
                                  "<$key>$value</$key>\n";
                }
            }
        }
    }
    return $xml;
}


// gets the navigation menu for this page
function navigation($current_url) {
    // explode the url to see which nav.xml files
    // we should use in the menu
    $parts = explode('/','xml/'.$current_url);

    // generate the menu xml
    return navlevel($parts);
}

// a recursive function that opens all nav.xml files
// for the given url.
// for example: when parsing xml/articles/foo.xml
// the function opens:  xml/nav.xml
//                and:  xml/articles/nav.xml
function navlevel($parts_todo, $path_done='') {
    // load the variables used by the xml parser
    global $current_page, $current_element, $pages;

    // start with an empty menu
    $navigation = array();

    // create a new xml parser
    $xml_parser = xml_parser_create();

    // set the xml parser functions
    xml_set_element_handler($xml_parser, 'startElement', 'endElement');
    xml_set_character_data_handler($xml_parser, 'characterData');

    // add the first element of $parts_todo to the
    // $path_done variable. this is used to find the
    // correct nav.xml
    $path_done .= $parts_todo[0].'/';

    // empty the xml parser variables
    $current_page = array();
    $current_element = array();
    $pages = array();

    // if there is no nav.xml for this directory, stop this function
    if (!file_exists($path_done.'nav.xml')) {
        return false;
    }

    // load the nav.xml for this path and parse it
    $this_nav = loadfile($path_done.'nav.xml');
    xml_parse($xml_parser, $this_nav, true);

    // save the result in a non-global variable
    $pages_a = $pages;

    // for each of the pages in the menu ...
    for ($i=0; $i<count($pages_a); $i++) {
        // ... check if this element is 'active' ...
        if (count($parts_todo)==2 and $pages_a[$i]['name']==$parts_todo[1]) {
            $pages_a[$i]['active'] = true;
        }
        // ... and maybe go deeper in the directory structure
        if (count($parts_todo)>1 and $pages_a[$i]['name']==$parts_todo[1]) {
            // there's a submenu for this page
            $parts_todo2 = $parts_todo;
            array_shift($parts_todo2);

            // load the submenu
            $nextnav = navlevel($parts_todo2, $path_done);

            // if there are any pages in this submenu, add it to the
            // current navigation
            if ($nextnav) {
                $pages_a[$i]['menu']['page'] = $nextnav;
            }
        }

        // add information about this page to the navigation array
        $pages_a[$i]['name'] = substr($path_done.$pages_a[$i]['name'],4);
        $navigation[] = $pages_a[$i];
    }
    return $navigation;
}



// returns the contents of a file
function loadfile($file) {
    $fp = fopen($file, 'r');
    $return = '';
    // read the file in 4096-byte blocks and append each one
    while (!feof($fp)) {
        $return .= fread($fp, 4096);
    }
    fclose($fp);
    return $return;
}


// functions used when parsing the nav.xml xml
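function startElement($parser, $tagname, $attribs) {
    global $current_page, $current_element;
    // start collecting the text of this element
    $current_element = '';
    if ($tagname=='PAGE') {
        $current_page = array();
        if (isset($attribs['KOP'])) {
            $current_page['kop'] = true;
        }
    }
}
function characterData($parser, $data) {
    global $current_element;
    // the parser may deliver the text of one element in several
    // chunks (for example around &amp;), so append the chunks
    $current_element .= $data;
}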
function endElement($parser, $tagname) {
    global $current_page, $current_element, $pages;
    switch ($tagname) {
        case 'TITLE':
            $current_page['title'] = $current_element;
            break;
        case 'NAME':
            $current_page['name'] = $current_element;
            break;
        case 'PAGE':
            $pages[] = $current_page;
            break;
    }
}

?>
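The core of the navigation is navlevel(). For xml/articles/foo.xml it reads xml/nav.xml, attaches the submenu from xml/articles/nav.xml to the 'articles' entry, and marks 'foo' as active in that submenu. The sketch below reproduces this recursion on in-memory arrays instead of files, so it can be tried without a directory tree. build_nav() and the layout of $navs are hypothetical; the 'active', 'menu' and 'name' keys follow navlevel().

```php
<?php
// Sketch of navlevel()'s recursion, working on in-memory menus instead of
// nav.xml files. $navs maps a directory ('' is the top level) to its menu
// entries; $parts holds the remaining path segments of the current page.
function build_nav(array $navs, array $parts, string $dir = '')
{
    // like a missing nav.xml: no menu for this directory
    if (!isset($navs[$dir])) {
        return false;
    }
    $menu = [];
    foreach ($navs[$dir] as $page) {
        $path = ltrim($dir . '/' . $page['name'], '/');
        // is this entry on the way to the current page?
        $on_branch = isset($parts[0]) && $page['name'] === $parts[0];
        // the entry for the current page itself is marked active
        if ($on_branch && count($parts) === 1) {
            $page['active'] = true;
        }
        // on the current branch, attach the next level's menu, if any
        if ($on_branch) {
            $sub = build_nav($navs, array_slice($parts, 1), $path);
            if ($sub) {
                $page['menu']['page'] = $sub;
            }
        }
        // like navlevel(), replace the bare name by the path from the root
        $page['name'] = $path;
        $menu[] = $page;
    }
    return $menu;
}

// hypothetical nav.xml contents for the sample site
$navs = [
    '' => [
        ['title' => 'home', 'name' => 'index'],
        ['title' => 'articles', 'name' => 'articles'],
    ],
    'articles' => [
        ['title' => 'foo', 'name' => 'foo'],
        ['title' => 'bar', 'name' => 'bar'],
    ],
];

// the navigation for xml/articles/foo.xml
print_r(build_nav($navs, explode('/', 'articles/foo')));
```

For articles.xml itself (path 'articles'), the 'articles' entry is the active one and still carries its submenu. The script then turns such an array into XML with toxml() and hands it to Sablotron as the '/_navigation' argument.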

This is an archived copy of a note that I published on my own site in November 2003.