README(1) | "Html2Wml Documentation" | README(1) |
Html2Wml -- Program that can convert HTML pages to WML pages
Html2Wml can be used as either a shell command:

    $ html2wml file.html

or as a CGI:

    /cgi-bin/html2wml.cgi?url=/index.html

In both cases, the file can be either a local file or a URL.
Html2Wml converts HTML pages to WML decks, suitable for being viewed on a Wap device. The program can be launched from a shell to statically convert a set of pages, or as a CGI to convert a particular (potentially dynamic) HTML resource.
Although the result is not guaranteed to be valid WML, it should be for most pages. Well-written HTML pages will most probably produce valid WML decks. To check and correct your pages, you can use W3C's software: the HTML Validator, available online at http://validator.w3.org, and HTML Tidy, written by Dave Raggett.
Html2Wml provides the following features:
Please note that most of these options are also available when
calling Html2Wml as a CGI. In this case, boolean options are given the value
"1" or "0", and other options simply receive the value
they expect. For example, `--ascii' becomes
`?ascii=1' or `?a=1'. See
the file t/form.html for an example of how to call Html2Wml as a CGI.
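
As a rough illustration of the mapping described above, here is a small Python sketch that builds a CGI URL from a set of options. It is not part of Html2Wml: the helper name `cgi_url` is hypothetical, and the only names taken from this document are the script path, the `url` parameter and the `ascii` option.

```python
from urllib.parse import urlencode

def cgi_url(script, page, options):
    """Build a CGI URL; booleans become 1 or 0, other values pass through."""
    params = {"url": page}
    for name, value in options.items():
        params[name] = int(value) if isinstance(value, bool) else value
    # safe="/" keeps the slashes of the page path readable in the query string
    return script + "?" + urlencode(params, safe="/")

print(cgi_url("/cgi-bin/html2wml.cgi", "/index.html", {"ascii": True}))
# prints /cgi-bin/html2wml.cgi?url=/index.html&ascii=1
```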
Conversion Options
If this really bothers you, you can deactivate this behaviour
with the --nocollapse option.
Second, as they can't be nested, and as typical HTML pages rely heavily on nested tables to create their layout, it's impossible to decide which one should be kept. So the best thing is to keep none of them.
[Note] Although you can deactivate this behaviour, and
although there is internal support for tables, the unlinearized mode has
not been heavily tested with nested tables, and it may produce
unexpected results.
Links Reconstruction Options
Splitting Options
HTTP Authentication
Proxy Support
Output Options
Take a look in wml_compilation/ for more information on
how to use a WML compiler with Html2Wml.
Debugging Options
Deck slicing is a feature that Html2Wml provides in order to
accommodate the limited memory of most Wap devices. Many can't handle
cards larger than 2,000 bytes, so the cards must be small enough to be
viewed by all Wap devices. To achieve this, you should compile your WML
deck, which reduces the size of the deck by 50%, but even then your cards
may be too big. This is where the deck slicing feature comes in: it allows
you to limit the size of the cards, currently only before the compilation
stage.
Slice by cards or by decks
On some Wap phones, slicing the deck is not sufficient: the WML browser still tries to download the whole deck instead of just picking one card at a time. A solution is to slice the WML document by decks. See the figure below.
     _____________          _____________
    ⎪    deck     ⎪        ⎪   deck #1   ⎪
    ⎪  _________  ⎪        ⎪  _________  ⎪
    ⎪ ⎪ card #1 ⎪ ⎪        ⎪ ⎪  card   ⎪ ⎪
    ⎪ ⎪_________⎪ ⎪        ⎪ ⎪_________⎪ ⎪
    ⎪  _________  ⎪        ⎪_____________⎪
    ⎪ ⎪ card #2 ⎪ ⎪
    ⎪ ⎪_________⎪ ⎪             . . .
    ⎪  _________  ⎪
    ⎪ ⎪   ...   ⎪ ⎪         _____________
    ⎪ ⎪_________⎪ ⎪        ⎪   deck #n   ⎪
    ⎪  _________  ⎪        ⎪  _________  ⎪
    ⎪ ⎪ card #n ⎪ ⎪        ⎪ ⎪  card   ⎪ ⎪
    ⎪ ⎪_________⎪ ⎪        ⎪ ⎪_________⎪ ⎪
    ⎪_____________⎪        ⎪_____________⎪

     WML document           WML document
    sliced by cards        sliced by decks

What this means is that Html2Wml generates several WML documents. In CGI mode, only the appropriate deck is sent, selected by the id given as a parameter. If no id was given, the first deck is sent.
Note on size calculation
Currently, Html2Wml estimates the size of the card on the fly, by summing the lengths of the strings that compose the WML output, texts and tags. I say "estimates" and not "calculates" because computing the exact size would require many more calculations than the way it is done now. One may object that there are only additions, which is correct, but knowing the exact size is not necessary. Indeed, if you compile the WML, most of the strings of the tags will be removed, but not all.
For example, take an image tag: `<img src="images/dog.jpg" alt="Photo of a dog">'. When compiled, the string `"img"' will be replaced by a one-byte value. The same goes for the strings `"src"' and `"alt"', and the spaces, double quotes and equal signs will be stripped. Only the text between double quotes will be preserved... but not in every case. Indeed, in order to go a step further, the compiler can also encode parts of the arguments as binary. For example, the string `"http://www."' can be encoded as a single byte (`8F' in this case). Or, if the attribute is `href', the string `href="http://' can become the byte `4B'.
As you can see, there is no need to know the exact size of the textual form of the WML, as it will always be far larger than the size of the compiled form. That's why I don't count all the characters that may actually be written.
Also, it's because I'm quite lazy ;-)
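
To make the idea of an on-the-fly estimate concrete, here is a minimal Python sketch. It is not the code used by Html2Wml: the function name `slice_into_cards` is hypothetical, and the greedy rule for starting a new card is an assumption made for the illustration. The only idea taken from the text above is that the size is estimated by summing the lengths of the output strings.

```python
def slice_into_cards(fragments, max_size):
    """Group output fragments into cards whose estimated size stays
    under max_size bytes (a single oversized fragment gets its own card)."""
    cards, current, size = [], [], 0
    for fragment in fragments:
        fragment_size = len(fragment.encode("utf-8"))
        # Start a new card when adding this fragment would exceed the limit
        if current and size + fragment_size > max_size:
            cards.append(current)
            current, size = [], 0
        current.append(fragment)
        size += fragment_size
    if current:
        cards.append(current)
    return cards

print(slice_into_cards(["aaaa", "bbbb", "cc"], 8))
# prints [['aaaa', 'bbbb'], ['cc']]
```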
Why compile the WML deck?
If you intend to create real WML pages, you should really consider always compiling them. If you're not convinced, here is an illustration.
Take the following WML code snippet:

    <a href='http://www.yahoo.com/'>Yahoo!</a>

It's the basic and classical way to code a hyperlink. It takes 42 bytes to code this, because it is presented in a human-readable form.
The WAP Forum has defined a compact binary representation of WML in its specification, which is called "compiled WML". It's a binary format, therefore you, a mere human, can't read that, but your computer can. And it's much faster for it to read a binary format than to read a textual format.
The previous example would be, once compiled (and printed here as hexadecimal):

    1C 4A 8F 03 y a h o o 00 85 01 03 Y a h o o ! 00 01

This only takes 21 bytes, half the size of the human-readable form. For a Wap device, this means both less to download and an easier format to read. The document can therefore be processed faster than the textual version of the same document.
There is one last argument, and not the least important: many Wap devices only read binary WML.
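
The two byte counts quoted in this section can be checked with a few lines of Python. This only counts bytes; it does not compile anything. Each space-separated token of the hexadecimal dump stands for one byte.

```python
# Textual form of the hyperlink, as given in this section
textual = "<a href='http://www.yahoo.com/'>Yahoo!</a>"

# Compiled form, as printed in this section: one token per byte
compiled = "1C 4A 8F 03 y a h o o 00 85 01 03 Y a h o o ! 00 01".split()

print(len(textual.encode("ascii")))  # prints 42
print(len(compiled))                 # prints 21
```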
Actions are a feature similar to (but with far fewer
capabilities than!) the SSI (Server Side Includes) available on good servers
like Apache. In order not to interfere with real SSI, while keeping the
syntax easy to learn, it differs in only a few points.
Syntax
Basically, the syntax to execute an action is:

    <!-- [action param1="value" param2='value'] -->

Note that the square brackets are part of the syntax. Except for that point, Actions syntax is very similar to SSI syntax.
Available actions
Only a few actions are currently available, but more can be
implemented on request.
`file=path' -- The file is read from the local disk.
Generic parameters
The following parameters can be used for any action.
Examples
If you want to share a navigation bar between several WML pages, you can `include' it this way:

    <!-- [include virtual="nav.wml"] -->

Of course, you have to write this navigation bar first :-)
If you want to use your current HTML pages to create your WML pages, but they contain complex tables, unnecessary navigation tables, etc., you can simply `skip' the complex parts and keep the rest.

    <body>
    <!--[skip for="wml"]-->
    unnecessary parts for the WML pages
    <!--[end_skip]-->
    useful parts for the WML pages
    </body>

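
To show the effect of the `skip' action on a page, here is a small Python sketch that removes everything between the two markers. It is only an illustration and not the way Html2Wml processes actions: the function name `strip_skipped` and the regular expression are assumptions, and only the `for="wml"` form shown above is handled.

```python
import re

# Matches from a [skip for="wml"] comment up to the next [end_skip] comment
SKIP_BLOCK = re.compile(
    r'<!--\s*\[skip\s+for=(["\'])wml\1\]\s*-->.*?<!--\s*\[end_skip\]\s*-->',
    re.DOTALL,
)

def strip_skipped(html):
    """Return html with every skip ... end_skip region removed."""
    return SKIP_BLOCK.sub("", html)

page = """<body>
<!--[skip for="wml"]-->
unnecessary parts for the WML pages
<!--[end_skip]-->
useful parts for the WML pages
</body>"""

print(strip_skipped(page))
```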
The links reconstruction engine is IMHO the most important part of Html2Wml, because it's this engine that allows you to reconstruct the links of the HTML document being converted. It has two modes, depending upon whether Html2Wml was launched from the shell or as a CGI.
When used as a CGI, this engine reconstructs the links of the HTML document so that all the URLs are passed to Html2Wml in order to convert the files they point to (pages or images). This is completely automatic and can't be customized for now (but I don't think it would be really useful).
When used from the shell, this engine reconstructs the links with
the given templates. Note that absolute URLs will be left untouched. The
templates can be customized using the following syntax.
Templates
Syntax
The template is a string that contains the new URL. More precisely, it's a Text::Template template. Parameters can be interpolated as a constant or as a variable. The template is enclosed in curly brackets, and can contain any valid Perl code.
The simplest form of a template is `{PARAM}' which just returns the value of PARAM. If you want to do something more complex, you can use the corresponding variable; for example `{"foo $PARAM bar"}', or `{join "_", split " ", PARAM}'.
You may read the Text::Template manpage for more information on what is possible within a template.
If the original URL contained a query part or a fragment part,
then they will be appended to the result of the template.
Available parameters
This can be summarized this way:

    URL = http://www.server.net/path/to/my/page.html
                               ------------^^^^ ----
                                    ⎪       ⎪    \
                                    ⎪       ⎪     \
                                FILEPATH FILENAME FILETYPE

Note that `FILETYPE' contains all the extensions of the file, so if its name is index.html.fr for example, `FILETYPE' contains "`.html.fr'".
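
The decomposition pictured above can be sketched in Python as follows. This is an illustration derived from the diagram and the note on `FILETYPE', not the code Html2Wml uses; the function name `split_url` is hypothetical.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Split the path of a URL into FILEPATH, FILENAME and FILETYPE."""
    path = urlsplit(url).path
    directory, _, basename = path.rpartition("/")
    # Everything from the first dot onwards counts as the file type
    name, dot, extensions = basename.partition(".")
    return {
        "FILEPATH": directory + "/",
        "FILENAME": name,
        "FILETYPE": dot + extensions,
    }

print(split_url("http://www.server.net/path/to/my/page.html"))
# prints {'FILEPATH': '/path/to/my/', 'FILENAME': 'page', 'FILETYPE': '.html'}
```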
Examples
To add a path option:

    {URL}$wap

Using Apache, you can then add a Rewrite directive so that URLs ending with `$wap' will be redirected to Html2Wml:

    RewriteRule ^(/.*)\$wap$ /cgi-bin/html2wml.cgi?url=$1

To change the extension of an image:

{FILEPATH}{FILENAME}.wbmp
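
To see what the rewrite rule given earlier in these examples does to a request path, here is a Python sketch that applies the same pattern and substitution. It only mimics the rule for illustration (Apache does the real work), and the function name `rewrite` is hypothetical.

```python
import re

# Same pattern as the RewriteRule: a path that ends with a literal "$wap"
WAP_SUFFIX = re.compile(r"^(/.*)\$wap$")

def rewrite(path):
    """Return the CGI URL for a path ending in $wap, else the path unchanged."""
    match = WAP_SUFFIX.match(path)
    if match is None:
        return path
    return "/cgi-bin/html2wml.cgi?url=" + match.group(1)

print(rewrite("/index.html$wap"))  # prints /cgi-bin/html2wml.cgi?url=/index.html
print(rewrite("/index.html"))      # prints /index.html
```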
Html2Wml uses LWP's built-in proxy support. It is activated by default, and loads the proxy settings from the environment variables, using the same variables as many other programs. Each protocol (http, ftp, etc.) can be mapped to use a proxy server by setting a variable of the form `PROTOCOL_proxy'. Example: use `http_proxy' to define the proxy for http access, `ftp_proxy' for ftp access. In the shell, this is only a matter of defining the variable.
For Bourne shell:

    $ export http_proxy="http://proxy.domain.com:8080/"

For C-shell:

    % setenv http_proxy "http://proxy.domain.com:8080/"

Under Apache, you can add this directive to your configuration file:

    SetEnv http_proxy "http://proxy.domain.com:8080"

but this has the drawback that another CGI, or another program, can use it to access external resources. A better way is to edit Html2Wml and fill the option `proxy-server' with the appropriate value.
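
The `PROTOCOL_proxy' naming convention can be illustrated with a short Python sketch that collects the proxy of each protocol from a set of environment variables. This is not LWP's code: the function name `proxies_from_env` is hypothetical, and skipping `no_proxy' is an assumption made for the illustration.

```python
def proxies_from_env(environ):
    """Map each protocol to its proxy, based on PROTOCOL_proxy variables."""
    suffix = "_proxy"
    proxies = {}
    for name, value in environ.items():
        lowered = name.lower()
        # no_proxy lists exceptions rather than a proxy, so it is skipped
        if lowered.endswith(suffix) and lowered != "no_proxy" and value:
            proxies[lowered[:-len(suffix)]] = value
    return proxies

env = {"http_proxy": "http://proxy.domain.com:8080/", "PATH": "/usr/bin"}
print(proxies_from_env(env))
# prints {'http': 'http://proxy.domain.com:8080/'}
```

To try it on the real environment, pass `os.environ` instead of the sample dictionary.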
Html2Wml tries to make correct WML documents, but the well-formedness and the validity of the document are not guaranteed.
Inverted tags (like "<b>bold <i>italic</b></i>") may produce unexpected results. But only bad software does bad stuff like this.
Download
[ http://www.html2wml.org/ ]
[ http://www.maddingue.org/softwares/ ]
Resources
[ http://www.wapforum.org/ ]
[ http://www.wap.com/ ]
[ http://www.w3.org/ ]
[ http://www.tuxmobil.org/ ]
Programmers utilities
[ http://www.w3.org/People/Raggett/tidy ]
[ http://www.kannel.org/ ]
[ http://pwot.co.uk/wml/ ]
WML browsers and Wap emulators
[ http://www.opera.com/ ]
[ http://fsinfo.cs.uni-sb.de/~abe/wApua/ ]
[ http://tofoa.free-system.com/ ]
[ http://www.ezos.com/ ]
[ http://www.pyweb.com/tools/ ]
[ http://www.apachesoftware.com/ ]
[ http://www.winwap.org/ ]
[ http://www.edgematrix.com/edge/control/MainContentBean?page=downloads ]
[ http://www.yourwap.com/ ]
[ http://mobilizer.sourceforge.net/ ]
[ http://www.wmlbrowser.org/ ]
[ http://alphaworks.ibm.com/aw.nsf/techmain/wapsody ]
[ http://wapreview.sourceforge.net ]
[ http://membres.lycos.fr/picowap/ ]
Werner Heuser, for his numerous ideas, his advice and his help with debugging
Igor Khristophorov, for his numerous suggestions and patches
And all the people that send me bug reports: Daniele Frijia, Axel Jerabek, Ouyang
Sebastien Aperghis-Tramoni <sebastien@aperghis.net>
Copyright (C)2000, 2001, 2002 Sebastien Aperghis-Tramoni
This program is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License, version 2 or later.
0.4.11 | 3rd Berkeley Distribution |