xyzzy

ftpsynch and dir2html

If you have REXX but no CMS, use a batch script like bingo to update local homepage directories from various sources, dir2html to create simple XHTML directory listings, and finally ftpsynch to upload only new and updated files to one or more Web hosters (optionally including ftp://localhost/ for testing). In 2006 I added a sitemap generator script to the mix; it is called by dir2html.

With a make tool like wmake you can even manage OS/2 EAs like icons and an absolute base URL of XHTML manuals within distributed ZIP archives; see cthema.

In 2007 a system crash forced me to port ftpsynch to Windows NT as ftpsynch.rex, using ooRexx RxFtp. The other REXX scripts dir2html and sitemap worked "as is" after renaming *.cmd to *.rex. In 2010 I added a REXX script dirty to clean up 8+3 file names created by old DOS tools.

ftpsynch

For ftpsynch you need the REXX FTP API available from IBM for OS/2. IBM may also distribute it with Object REXX for other systems, or you may find a clone for Regina. Below you see a "screenshot" of ftpsynch.cmd after updating my homepage:

FtpSys UNIX Type: L8 Version: BSD-199506
FtpPwd "/webspace/dos" is current directory.
FtpPwd "/webspace/eis" is current directory.
FtpPwd "/webspace/home/test" is current directory.
ignore  d:\misc\test\homepage\home\test\.spinner.htm
FtpPwd "/webspace/home" is current directory.
FtpPwd "/webspace/kex" is current directory.
FtpPwd "/webspace/pub" is current directory.
FtpPwd "/webspace/src" is current directory.
FtpPwd "/webspace" is current directory.
update  d:\misc\test\homepage\ftpsynch.htm
home.claranet.de ready:
 473 files (8.6 MB), 0 added, 1 updated, 1 ignored

If run as a PM application, you would only see the last line with the update statistics, preceded by any confirmation requests for the removal of orphans (= files not found locally) or for the creation of new remote directories, and of course by error messages. Please note that the total size of remote files shown in MB in the status message does not include the overhead for directories etc., which depends on the remote file system.

To install ftpsynch.cmd on your system copy it to any directory of your PATH and "configure" the first lines. I use something like the following "configuration" and three program objects for ftpsynch.cmd 1 password etc.:

   HOME = 'd:\misc\test\homepage'
   HOST.1 = 'home.claranet.de'      ;  LOGIN.1 = 'guess'
   ROOT.1 = '/webspace'
   HOST.2 = 'people-ftp.freenet.de' ;  LOGIN.2 = 'guess'
   ROOT.2 = '/'
   HOST.3 = 'localhost'             ;  LOGIN.3 = 'mirror'
   ROOT.3 = 'd:/tmp/tmp/mirror'

The third entry is for tests with a local ftpd, and some parts of ftpsynch.cmd are only needed to parse OS/2 FtpDir results. I have tested the three servers shown; you may have to fix the script for other directory listing formats. Parsing weird timestamps and sizes is great fun... ;-)

"New" files (= not found on the remote server) are ignored, if the name contains non-ASCII or special characters, or if the name starts with a dot. A file like 2&1.htm existing on some FTP servers is handled normally, but ftpsynch doesn't try to add this file to other servers, where it doesn't already exist, e.g. because character & is illegal. The simple rule used by both ftpsynch and dir2html is datatype(translate(FILE,,'-._',0),'A'). In other words, only 0..9, A..Z, a..z, and -._ are considered as safe, and this is the set supported by claranet.de.

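The "safe name" rule above can be sketched in Python. This is only an illustration of the quoted REXX expression, not code taken from ftpsynch or dir2html; the function names is_safe_name and may_add are hypothetical.

```python
import string

# Characters treated as safe by the rule quoted above: 0..9, A..Z, a..z, and -._
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + "-._")


def is_safe_name(name: str) -> bool:
    """Mirror datatype(translate(FILE,,'-._',0),'A'):
    true only for a non-empty name consisting of safe characters."""
    return bool(name) and all(ch in SAFE_CHARS for ch in name)


def may_add(name: str) -> bool:
    """Hypothetical helper: may a file missing on the server be uploaded?
    New files starting with a dot are ignored as well."""
    return is_safe_name(name) and not name.startswith(".")
```

An explicit character set is used instead of str.isalnum(), because the latter would also accept non-ASCII letters.
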
In version 0.3 I added workarounds for two obscure problems: FtpPut cannot handle the character % in local file names (perhaps it is passed to a printf-style function); using %% solves this problem. The local character d2c(255) has to be given twice in all FTP functions, see the Telnet character IAC in RFC 1123. This does not help if the server translates file name characters in ASCII directory listings. My local ftpd listed codepage 850 NBSP d2c(255) as Latin 1 NBSP d2c(160), and it listed ASCII DEL d2c(127) as SUB d2c(26), until I used ftpd -cp none. With this option file names are listed by FtpDir('*') as is, and RxFTP can match local and remote names directly.
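
The two version 0.3 workarounds amount to doubling certain bytes before a name is handed to the FTP functions. A minimal Python sketch, assuming file names are treated as raw bytes; the helper names double_iac and ftpput_local_name are hypothetical and do not appear in ftpsynch:

```python
IAC = b"\xff"  # d2c(255), the Telnet "interpret as command" byte


def double_iac(name: bytes) -> bytes:
    """Give byte 255 twice, as needed for all FTP functions."""
    return name.replace(IAC, IAC + IAC)


def ftpput_local_name(name: bytes) -> bytes:
    """Escape a local file name for an upload: '%' becomes '%%',
    and byte 255 is doubled as for every other FTP call."""
    return double_iac(name.replace(b"%", b"%%"))
```
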

Version 0.5 creates log files %TMP%\ftpsynch.00user in the directory specified by the environment variable TMP. Here user is the number given on the command line to select a host, login, and remote root directory. Only added and updated files are shown in the log file, with detailed local vs. remote timestamps for updates. Example:

Sat 20 Mar 2004 00:56:19 d:\misc\test\homepage
\src\ftpsynch.cmd                    2004-03-20 00:54:16 > 2004-03-20 00:49:00
\src\index.html                      2004-03-20 00:55:34 > 2004-03-20 00:31:00
home.claranet.de ready:
 391 files (5.6 MB), 0 added, 2 updated, 3 ignored

Version 0.6 fixes a minor bug found with the log files of version 0.5. If the server returns only the date as timestamp for old files, then the client should assume 23:59:59 instead of 00:00:00 to avoid useless updates.
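
The idea behind the version 0.6 fix can be sketched in Python as follows. It assumes that a file is updated when its local timestamp is later than the remote one, as the ">" in the log suggests; remote_timestamp and needs_update are hypothetical names.

```python
from datetime import date, datetime, time
from typing import Optional


def remote_timestamp(day: date, clock: Optional[time] = None) -> datetime:
    """Build a remote timestamp; if the listing shows only a date,
    assume the end of that day instead of midnight."""
    if clock is None:
        clock = time(23, 59, 59)
    return datetime.combine(day, clock)


def needs_update(local: datetime, remote: datetime) -> bool:
    """Upload only if the local file is newer than the remote copy."""
    return local > remote
```

With 00:00:00 as the default, any local file modified later on the same day would look newer than its remote copy and be uploaded again.
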

Version 0.7 uses FtpDir('.') instead of FtpLs('-la'); the latter did not work with one tested FTP server. Now FtpDir('*') is only used if FtpSys reports an OS/2 FTP server.

Version 0.8 is a bug fix: If FtpDir returns a timestamp like Dec 29 in January, it refers to the previous year, not the current one. It took me some years to catch this bug, because it only affected modified files with the same size.
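
A Python sketch of the year guess behind this fix. It assumes that a listing without a year never refers to a future date; listing_date is a hypothetical name, and the real ftpsynch parser may work differently.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def listing_date(month_name: str, day: int, today: date) -> date:
    """Resolve a year-less listing date such as 'Dec 29'.
    A date later in the calendar than today belongs to last year."""
    month = MONTHS[month_name]
    year = today.year
    if (month, day) > (today.month, today.day):
        year -= 1
    return date(year, month, day)
```

For example, listing_date("Dec", 29, date(2010, 1, 5)) yields 29 December 2009 rather than a date almost a year in the future.
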

Bugs: If the time zones for timestamps of the local and the remote file system are different, ftpsynch might either upload new files unnecessarily, or skip updated files erroneously. The former problem vanishes once the time zone difference has elapsed, but the latter can be a serious issue. Check the log file to see if it affects you.

ooRexx

Version 0.9 is actually a new ooRexx script based on RxFtp.cls instead of IBM's Rxftp.dll. Additionally a few other details had to be changed, e.g. the Windows version of ooRexx has no SysProcessType() and no SysGetMessage() functions.

Looking into the code again (after some years of using it), there are some ugly backslashes etc., and so ftpsynch.rex won't run as is with other operating systems supporting ooRexx. Porting it should be simple, though.

dir2html

If you are interested in a "screenshot" of dir2html.cmd check out WySiWyG:src/index.html. A sample "configuration" is contained in the script, here are some not so obvious details:

H.0E = 'Ellermann' and H.0F = 'F' are used in two links at the end of each page: the name H.0E is a link to H.. = 'http://purl.net/xyzzy', and the initial H.0F is a link to the page itself. The latter is unusual, but it simplifies the switch from a local page to the published version in my browser, and it may help GoogleBot to find my moved pages while the old pages are still online.

H.0C = H.. || '/mailto/webmaster' should confuse address harvesters; apart from that it is used for the <link rev="made" href="H.0C" /> in the header of all pages. Here webmaster is of course hubris - a webmaster administers real servers and not only vanity hosts - but without this convincing role account my obfuscated mailto: link could be too cryptic for human readers.

H.0D = '/valid.jpg' is the relative URL of an icon for the link to the W3C validator at the end of each page. The W3C allows the use of their GIF or PNG variants, but for various reasons I prefer JPG. Edit the mentioned variables H.. (homepage), H.0C (made URL), H.0D (icon), H.0E (name), and H.0F (initial) as needed for your system.

H.0B = '/pub/homepage.jpg' is an optional banner with a relative link to the root directory DIR.. (local) or H.. (server) added in version 0.5. Now H.0B (banner) and H.0D (icon) also work locally if their URL starts with a slash.

H.0A = '/w3c/xyzzy.css' is an optional style sheet added in version 0.6. Use H.0A = '' if you want no style sheet, same idea as for H.0B.

DIR.. = 'd:\misc\test\homepage' is the root directory of the subdirectories DIR.1 etc. For my system I need six directory listings including DIR.6 = '\home\test'.

Optionally specify some file extensions in variable PLAIN, corresponding links in the directory listing get attribute type="text/plain". If possible configure MIME types in .htaccess files for the Web server: PLAIN is only a hack to realize a similar function on a vanity host for (some) plain text files.

Please note that dir2html.cmd lists files sorted by local timestamps. This is a feature, users visiting a public directory listing are generally only interested in the most recent updates and in the file sizes. For similar reasons dir2html keeps the timestamp of unmodified directory listings with the help of index.bak files - used to ignore the line with the date of the last update near the end of each page. Otherwise ftpsynch would unnecessarily upload "touched" directory listings.

Version 0.7 supports permalinks by rel="bookmark". It also tags CSS files as type="text/css" and adds a dummy title="plain text" to type="text/plain" files, excluding extension TXT, where it would only waste download bandwidth.

Version 0.8 exits by calling the external procedure sitemap.cmd to update sitemaps. Version 0.9 was a minor fix for ooRexx: my XDEL wrapper didn't know that Windows NT also has a SysFileDelete().

Just in case, version 1.0 adds .rdf to the list of hardcoded file extensions getting an explicit MIME type attribute in links. The W3C validator now wants the official XHTML 1.0 system identifier, not a link to the copy in its own SGML catalog. BTW, file extension .dtd is also hardcoded; Web servers often don't know application/xml-dtd.

Version 1.1 recognizes identical directory listings with the same timestamp as identical. The old logic to ignore different timestamps erroneously treated listings with identical timestamps as different. The optional banner image now uses H.. as alt text. The list of hardcoded plain text file extensions now includes .cls for ooRexx classes. I've removed cruft like .asm; my .asm files are archived and not shown in public directory listings.
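
The comparison of a new directory listing with its index.bak copy can be sketched in Python. The marker text "Last update:" and the name same_listing are assumptions for this illustration; the actual dir2html logic is not shown here.

```python
def same_listing(new_page: str, old_page: str,
                 marker: str = "Last update:") -> bool:
    """Compare two listings while ignoring the line with the update date."""
    def body(page: str) -> list:
        return [line for line in page.splitlines() if marker not in line]
    return body(new_page) == body(old_page)
```

If the two pages match, the old file and its timestamp can be kept, so an upload tool comparing timestamps sees nothing to do. Pages whose date lines happen to be identical are also reported as equal, which is the case version 1.1 had to get right.
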

sitemap

Some search engines support sitemap files in a rather simple XML format. With sitemap.cmd I get three files: A sitemap index file with the URLs of all (here two) sitemaps, a siteold sitemap with the URLs of old files (here about 2.5 years), and a sitenew sitemap with the URLs of recently added or updated files. Overkill for less than 500 files, but I don't want to upload the siteold stuff when it is unnecessary. The old files rarely change, and Web crawlers can leave them alone.
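
The old/new split and the three files can be illustrated with a short Python sketch based on the sitemaps.org 0.9 format. The function names, the cutoff parameter, and the example.com URLs are assumptions; sitemap.cmd itself is a REXX script and may differ in detail.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_by_age(entries, today: date, cutoff_days: int):
    """Split (url, lastmod) pairs into old and recent entries."""
    limit = today - timedelta(days=cutoff_days)
    old = [entry for entry in entries if entry[1] < limit]
    new = [entry for entry in entries if entry[1] >= limit]
    return old, new


def urlset(entries) -> str:
    """Serialize (url, lastmod) pairs as one sitemap."""
    root = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    return ET.tostring(root, encoding="unicode")


def sitemap_index(locations) -> str:
    """Serialize the sitemap index listing the individual sitemaps."""
    root = ET.Element("sitemapindex", xmlns=NS)
    for loc in locations:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = loc
    return ET.tostring(root, encoding="unicode")
```

Only the sitemap with recent entries changes on most runs, so the larger file with old URLs rarely needs to be uploaded again.
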

For version 0.2 sitemap.cmd is called at the end of dir2html instead of bingo. Temporary versions of the sitemap files are created as dot-files .sitemap.xml etc., so this requires a filesystem supporting dot-files.

For version 0.3 the relevant schemas/sitemap/0.9 definitions are referenced in the root element of the schema instances. This gibberish means that you can now validate the generated sitemaps.

Optionally you can add a line like...
    Sitemap: http://www.example.com/sitemap.xml
...to your robots.txt to inform Web crawlers about the sitemap (index) URL.

dirty

dirty.rex fixes all upper case 8+3 file names created by old DOS tools in and below a specified directory. Used without argument dirty.rex cleans up files and subdirectories in the dir2html root directory.
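
A Python sketch of how such names could be recognized. It assumes that "fixing" means converting the name to lower case, which the text above does not state explicitly; looks_like_dos_name and cleaned_name are hypothetical names.

```python
def looks_like_dos_name(name: str) -> bool:
    """True for an all upper case 8+3 name such as INDEX.HTM."""
    base, _, ext = name.partition(".")
    return (name == name.upper()       # no lower case letters
            and name != name.lower()   # at least one letter
            and 1 <= len(base) <= 8
            and len(ext) <= 3
            and "." not in ext)        # at most one dot


def cleaned_name(name: str) -> str:
    """Assumed fix: lower-case DOS style names, keep all others."""
    return name.lower() if looks_like_dos_name(name) else name
```

Names in mixed case or longer than 8+3 are left alone, since they were presumably not created by old DOS tools.
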

The dirty header is a short installation and maintenance guide for the four REXX content management system scripts; check it out.

bingo

You don't need bingo.cmd; it is shown here only as an example for hardcore OS/2 batch fans, illustrating my usage of dir2html:

@echo off
if not .%1 == . goto CALL
rem --------------------------------------------------
echo adding new F:\BIN\*.* to C:\REXX\BIN
for %%F in (F:\BIN\*.*) do replace %%F C:\REXX\BIN /A >\dev\nul 2>&1
rem --------------------------------------------------
echo update new F:\BIN\*.* in C:\REXX\BIN
for %%F in (F:\BIN\*.*) do replace %%F C:\REXX\bin /U >\dev\nul
rem --------------------------------------------------
echo adding new C:\REXX\KEX\*.* to E:\BINW\KEDITW\USER
for %%F in (C:\REXX\KEX\*.*) do replace %%F E:\BINW\KEDITW\USER /A >\dev\nul
rem --------------------------------------------------
echo update new C:\REXX\KEX\*.* in E:\BINW\KEDITW\USER
for %%F in (C:\REXX\KEX\*.*) do replace %%F E:\BINW\KEDITW\USER /U >\dev\nul
rem --------------------------------------------------
setlocal
set HOME=D:\MISC\TEST\HOMEPAGE
rem --------------------------------------------------
echo update new F:\BIN\*.* in %HOME%\SRC
for %%F in (F:\BIN\*.*) do replace %%F %HOME%\SRC /U /P >\dev\nul
rem --------------------------------------------------
echo update new D:\APPS\BIN\*.* in %HOME%\SRC
for %%F in (D:\APPS\BIN\*.*) do replace %%F %HOME%\SRC /U /P >\dev\nul
rem --------------------------------------------------
echo update new F:\BIN\EIS\AWK\*.* in %HOME%\EIS
for %%F in (F:\BIN\EIS\AWK\*.*) do replace %%F %HOME%\EIS /U /P >\dev\nul
rem --------------------------------------------------
echo remove EAs from %HOME%\SRC\*.*
for %%F in (%HOME%\SRC\*.*) do eautil %%F NUL /S >\dev\nul
rem --------------------------------------------------
echo update new C:\REXX\KEX\*.* in %HOME%\KEX
for %%F in (C:\REXX\KEX\*.*) do replace %%F %HOME%\KEX /U /P >\dev\nul
rem --------------------------------------------------
echo remove  ^^Z from %HOME%\KEX\*.K??
set COPY=%HOME%\KEX\$$COPY$$.TMP
if exist %COPY% del %COPY%
TOUCH.exe -r . %COPY% 2>\dev\nul
if errorlevel 1 echo skipped: TOUCH.exe error
if errorlevel 1 goto SKIP
if not exist %COPY% echo skipped: TOUCH.exe failure
if not exist %COPY% goto SKIP
for %%F in (%HOME%\KEX\*.K??) do call %0 %%F
if exist %COPY% del %COPY%
goto SKIP
rem --------------------------------------------------
:CALL
type %1 > %COPY%
if errorlevel 1 goto EXIT
TOUCH.exe -r %1 %COPY%
if errorlevel 1 goto EXIT
copy %COPY% %1 >\dev\nul
goto EXIT
rem --------------------------------------------------
:SKIP
endlocal
for %%F in (F:\BIN\NEMO\*.wmk) do wmake -h -f %%F
call dir2html.cmd
:EXIT

Major parts of bingo.cmd copy new files from one directory to another directory on a different partition. Actually this is a kind of backup, because the partitions are on different physical disks, still using FAT for native DOS access without an HPFS driver. The trick to get rid of ^Z is more interesting: bingo.cmd calls itself, using type in conjunction with GNU touch.exe -r to preserve the timestamps. Some problems with Watcom's wtouch -f forced me to use touch, and bingo.cmd first verifies that touch works with a dummy file. Some HPFS file names won't work with bingo; originally I used it only as a "quick FAT backup", and added the update of homepage directories later.
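
For readers without OS/2, the ^Z trick can be sketched in Python. It assumes that type stops at the first ^Z, so the file is truncated there; strip_ctrl_z is a hypothetical name, and unlike bingo.cmd the sketch leaves files without ^Z untouched.

```python
import os

CTRL_Z = b"\x1a"  # DOS end-of-file marker


def strip_ctrl_z(path: str) -> bool:
    """Truncate a file at its first ^Z and restore its timestamps.
    Returns True if the file was rewritten."""
    stat = os.stat(path)  # remember the timestamps before touching the file
    with open(path, "rb") as handle:
        data = handle.read()
    cut = data.find(CTRL_Z)
    if cut < 0:
        return False
    with open(path, "wb") as handle:
        handle.write(data[:cut])
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns))
    return True
```

Restoring the modification time plays the role of touch -r in the batch script: the cleaned file does not look updated to tools that compare timestamps.
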

cthema

The automatic update of ZIP archives with Watcom's wmake (or any other make tool) is an obvious idea. The more complex *.wmk files use some magic to copy an updated manual like rxdyndns.htm into the corresponding ZIP archive.

The extra magic preserves the *.htm timestamp in a temporary *.html manual, adds an absolute href= URL attribute to the existing <base /> tag, and finally moves the *.html into the archive. With this magic, relative links in the archived manuals work on any system. Below you see a simple *.wmk file without extra magic, but the methods to handle icon EAs and standard <URL:>-constructs might still be interesting:

# update homepage\src\cthema.zip

HOME    = f:\bin
PAGE    = d:\misc\test\homepage
.SILENT

# files

SRCICON = d:\Icons\Icons for Warp\cthema.ico

ABSTEMP = $(PAGE)\src\cthema.cmd
ARCHIVE = $(PAGE)\src\cthema.zip
COMMENT = <URL:http://purl.net/xyzzy/src/cthema.zip>

# targets

cthema : $(ARCHIVE) .SYMBOLIC
        @if not exist $(ABSTEMP) echo $(ARCHIVE) is upto date
        @if     exist $(ABSTEMP) echo $(ARCHIVE) updated
        @if     exist $(ABSTEMP) del  $(ABSTEMP)

$(ARCHIVE) :      $(HOME)\cthema.cmd
        @copy     $(HOME)\cthema.cmd  $(ABSTEMP) > \dev\nul
        @eautil   $(ABSTEMP) \dev\nul /S
        @rexxtry  exit 1- SysSetIcon('$(ABSTEMP)', '$(SRCICON)')
        @if exist $@ del $@
        @zip -joq $@ $(ABSTEMP)      "$(SRCICON)"
        %create      $(ABSTEMP)
        %write       $(ABSTEMP) $(COMMENT)
        @zip -zTq $@          < $(ABSTEMP)

See cthema.zip (5 KB) for the resulting ZIP comment and icon EA, or check out other REXX scripts.


Last update: 18 Apr 2010 22:00 by F.Ellermann