

4 Preparing HTML texts and tables for entry

4.1 Preparing HTML texts for entry

The texts entered into the system should be in HTML format and should contain only the necessary tags, so that the amount of text saved in the database stays as small as possible. Removing unnecessary tags also shortens page loading times, because less HTML has to be transferred.

Converting texts to HTML with software such as FrontPage, Netscape Composer or Word generates many tags that are not needed. After conversion with any of these applications, the HTML should be cleaned of unnecessary tags.

Paragraph tags (<p>, </p>), bold tags (<b>, </b>) and italics tags (<i>, </i>) should usually be left as they are. Tags such as <html>, <head>, <font> and <meta> should be deleted, because the system does not need them. Tags whose purpose the user does not know should not be erased. After cleaning, check the result in a browser (Internet Explorer, Netscape) or an editor (FrontPage, Netscape Composer).
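A cleaning pass that follows this rule can be sketched with the html.parser module from Python's standard library. This is a minimal sketch, assuming that the tags to delete are exactly the page-level tags seen in the examples of this section (html, head, meta, title, body and font). The contents of head and title are dropped, the text inside the other deleted tags is kept, and every other tag, including tags the sketch does not recognize, is copied through unchanged, as the guideline asks. The names DROP_TAGS, TagCleaner and clean_html are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical blacklist, taken from the examples in this section.
DROP_TAGS = {"html", "head", "meta", "title", "body", "font"}
# Tags whose contents are dropped as well (never shown on the page).
DROP_WITH_CONTENT = {"head", "title"}


class TagCleaner(HTMLParser):
    """Copy HTML through, leaving out blacklisted tags.

    Unknown tags are kept. Comments and <!DOCTYPE> declarations are
    not copied, because HTMLParser's default handlers ignore them.
    """

    def __init__(self):
        # Keep entities such as &eacute; and &nbsp; exactly as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.hidden = 0  # > 0 while inside <head> or <title>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.hidden += 1
        if not self.hidden and tag not in DROP_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br />: copy once, no end tag.
        if not self.hidden and tag not in DROP_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.hidden = max(0, self.hidden - 1)
        elif not self.hidden and tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.hidden:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.hidden:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.hidden:
            self.out.append(f"&#{name};")


def clean_html(source: str) -> str:
    parser = TagCleaner()
    parser.feed(source)
    parser.close()
    return "".join(parser.out).strip()
```

Applied to the long HTML of the Zambian honey example, clean_html leaves a single <p> element without the <font> tag. The result should still be checked in a browser or editor as described above.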

Table 2. HTML text examples and guidelines for shortening texts. For each original text, the short HTML is the target; the long HTML (including unnecessary and confusing tags) is the output of an HTML editor, here FrontPage.

Original text

Les produits forestiers non ligneux (PFNL) les plus importants du Tchad sont la gomme d'Acacia senegal, les fruits de Ziziphus spina-christi, Ziziphus mauritiana (Jujube), Tamarindus indica et de Vitellaria paradoxa (Karité) ainsi que les noix du dernier. De plus, il y a les plantes fourragères (Acacia senegal, Khaya senegalensis) et les graines de Parkia biglobosa (Néré). Parmi les autres PFNL importants se trouvent les plantes médicinales (e.g. Salvadora persica, Diospyros mespiliformis) et le savonnier (Balanites aegyptiaca).

Short HTML

<p>Les produits forestiers non ligneux (PFNL) les plus importants
du Tchad sont la gomme d'<i>Acacia senegal</i>, les fruits de <i>Ziziphus
spina-christi, Ziziphus mauritiana</i> (Jujube), <i>Tamarindus indica</i> et de
<i>Vitellaria paradoxa</i> (Karit&eacute;) ainsi que les noix du dernier. De plus, il y a
les plantes fourrag&egrave;res (<i>Acacia senegal, Khaya senegalensis</i>) et les
graines de <i>Parkia biglobosa</i> (N&eacute;r&eacute;). Parmi les autres PFNL
importants se trouvent les plantes m&eacute;dicinales (e.g. <i>Salvadora persica,
Diospyros mespiliformis</i>) et le savonnier (<i>Balanites aegyptiaca</i>).</p>

Long HTML (including unnecessary and confusing tags)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>New Page 1</title>
</head>
<body>
<p><font face="Arial">Les produits forestiers non ligneux (PFNL) les plus
importants du Tchad sont la gomme d'<i>Acacia senegal, </i>les fruits de <i>Ziziphus
spina-christi, Ziziphus mauritiana </i>(Jujube), <i>Tamarindus indica </i>et<i>
de Vitellaria paradoxa</i> (Karité) ainsi que les noix du dernier. De plus, il
y a<i> </i>les plantes fourragères (<i>Acacia senegal, Khaya senegalensis) </i>et
les <i>g</i>raines de <i>Parkia biglobosa</i> (Néré).</font></p>
<p>&nbsp;</p>
<p><font face="Arial">Parmi les autres PFNL importants se trouvent les plantes médicinales
(e.g. <i>Salvadora persica, Diospyros mespiliformis) </i>et le savonnier (<i>Balanites
aegyptiaca</I>).</font></p>
</body>
</html>

Original text

Honey and beeswax are among the most important Zambian NWFP. In 1992, a production of 90 000 kg of honey (worth US$ 172 000) and 29 000 kg of beeswax (worth US$ 74 000) was recorded in official statistics (Ministry of Environment and Natural Resources, 1997c; Njovu, 1993). Honey production varies considerably from year to year.

Short HTML

<p>Honey and beeswax are among the most important
Zambian NWFP. In 1992, a production of 90&nbsp;000 kg of honey (worth US$
172&nbsp;000) and 29&nbsp;000 kg of beeswax (worth US$ 74&nbsp;000) was
recorded in official statistics (Ministry of Environment and Natural Resources,
1997c; Njovu, 1993). Honey production varies considerably from year to year.</p>

Long HTML (including unnecessary and confusing tags)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>New Page 1</title>
</head>
<body>
<p><font FACE="Arial" SIZE="3">Honey and beeswax are among the most important
Zambian NWFP. In 1992, a production of 90&nbsp;000 kg of honey (worth US$
172&nbsp;000) and 29&nbsp;000 kg of beeswax (worth US$ 74&nbsp;000) was
recorded in official statistics (Ministry of Environment and Natural Resources,
1997c; Njovu, 1993). Honey production varies considerably from year to year.</font></p>
</body>
</html>

The font and font size should be left at their defaults, and the extra tags at the beginning and end of the text (<html>, <head> with its <meta> and <title> contents, <body> and <font>) should be erased.

Netscape is less forgiving of badly ordered tags than Internet Explorer, so the tags must be nested in a logical order. For example, if a paragraph tag is opened before a bold or italic tag, the paragraph must be closed after the bold or italic tag has been closed. In general, tags should be closed in the reverse order in which they were opened.

Table 3. Correct order of opening and closing tags.

Example paragraph: Tectona grandis is common.

Wrong HTML: <i><p>Tectona grandis </i>is common. </p>

Correct HTML: <p><i>Tectona grandis</i> is common. </p>
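The reverse-order rule can be checked mechanically with a stack: each opening tag is pushed, and each closing tag must match the tag on top of the stack. A minimal Python sketch using the standard html.parser module; the names NestingChecker and check_nesting and the short list of void elements are assumptions for illustration. The check is deliberately stricter than browsers, which silently repair many of these errors.

```python
from html.parser import HTMLParser

# A few common void elements (never closed); the full list is in the HTML standard.
VOID_TAGS = {"br", "hr", "img", "meta", "input", "link"}


class NestingChecker(HTMLParser):
    """Report closing tags that do not match the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of open tags, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tag such as <br />: nothing to close later

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()  # closed in the right order
            return
        if self.open_tags:
            self.errors.append(f"</{tag}> found, expected </{self.open_tags[-1]}>")
        else:
            self.errors.append(f"</{tag}> without an opening tag")
        if tag in self.open_tags:
            # Recover: treat everything opened after <tag> as closed too.
            while self.open_tags.pop() != tag:
                pass


def check_nesting(html: str) -> list:
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"<{tag}> is never closed" for tag in reversed(checker.open_tags)]
    return checker.errors + unclosed
```

Run on the wrong example in Table 3, it reports that </i> arrives while <p> is still open; the correct example produces no errors.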

Heading tags <h1> and <h2> cannot be used in the HTML code, because the system already uses these heading levels. Users can use headings of levels three to six (<h3> to <h6>).
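This rule is easy to check before a text is entered. A minimal sketch, assuming the text is available as a Python string; HeadingFinder and reserved_headings are hypothetical names. It only reports the reserved headings and leaves the choice of replacement level to the user.

```python
from html.parser import HTMLParser

# Heading levels the system reserves for its own page headings.
RESERVED_HEADINGS = {"h1", "h2"}


class HeadingFinder(HTMLParser):
    """Collect reserved heading tags; <h3> to <h6> are allowed."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H1> is caught as well.
        if tag in RESERVED_HEADINGS:
            self.found.append(tag)


def reserved_headings(html: str) -> list:
    finder = HeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```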

4.2 Preparing HTML tables for entry

Special care is needed when tables are converted to HTML. For example, converting the following table to HTML creates many more tags than are needed.

Table 4. A simple table.

Local name        Scientific name
Bambi castanha    Cephalophus leucochilus

Describing every detail of a table to the web browser takes a lot of space and many tags. The simplest way is to use a plain <table> tag without attributes, so that the table properties are not fixed and the default formatting is used.

In Table 5, the short HTML is the target and the long HTML is the starting point that needs editing.

The long HTML is at least three times longer than the short HTML, which slows down page loading.
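The size difference can be measured directly, since the amount stored and transferred is simply the length of the HTML in bytes. A minimal Python sketch comparing the first data cell of Table 5 in its short and long versions, with line breaks removed; size_ratio is a hypothetical helper name.

```python
# The first data cell of Table 5, short and long versions, line breaks removed.
short_row = "<tr><td><p>Bambi castanha</p></td></tr>"
long_row = (
    '<tr><td WIDTH="43%" VALIGN="TOP"><font FACE="Arial" SIZE="1">'
    '<p ALIGN="JUSTIFY">Bambi castanha</p><p ALIGN="JUSTIFY"></font></td></tr>'
)


def size_ratio(long_html: str, short_html: str) -> float:
    """How many times more bytes the long version takes to store and transfer."""
    return len(long_html.encode("utf-8")) / len(short_html.encode("utf-8"))


print(f"{size_ratio(long_row, short_row):.1f}")  # prints 3.4
```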

When FrontPage is used to edit a table, the following changes should be made.

Leave both the alignment and the size of the table at their defaults, and likewise the font and font size. Do not define cell sizes. Mark header cells with <th> instead of <td> so that they are formatted correctly on the pages. The detailed table description (<table BORDER="1" CELLSPACING="1" CELLPADDING="7" WIDTH="529">) can also be replaced with a plain table tag (<table>).

<tr> stands for table row

<td> stands for table data 

<th> stands for table heading
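The table changes listed above (a plain <table> tag, no cell sizes or alignment, <th> for header cells) can be automated. A minimal Python sketch using the standard html.parser module, assuming that the first row of every table is its header row and that tables are not nested; it also strips attributes such as ALIGN from <p> tags. TableSimplifier and simplify_table are hypothetical names, and removing <font> and the page-level tags is a separate step that is not shown here.

```python
from html.parser import HTMLParser

# Tags whose attributes (BORDER, WIDTH, VALIGN, ALIGN, ...) are removed so
# that the default formatting applies.
PLAIN_TAGS = {"table", "tr", "td", "th", "p"}


class TableSimplifier(HTMLParser):
    """Strip layout attributes and turn first-row <td> cells into <th>.

    Assumptions: the first row of each table is its header row, and
    tables are not nested inside other tables.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.out = []
        self.row = 0  # number of the current row within the current table

    def _name(self, tag):
        return "th" if tag == "td" and self.row == 1 else tag

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.row = 0
        elif tag == "tr":
            self.row += 1
        if tag in PLAIN_TAGS:
            self.out.append(f"<{self._name(tag)}>")  # attributes dropped
        else:
            self.out.append(self.get_starttag_text())  # copied unchanged

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{self._name(tag)}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def simplify_table(html: str) -> str:
    parser = TableSimplifier()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```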

Table 5. HTML for producing a simple table, before and after cleaning.

Short HTML

<table>
<tr>
<th><p>Local name</p></th>
<th><p>Scientific name</p></th>
</tr>
<tr>
<td><p>Bambi castanha</p></td>
<td><p><i>Cephalophus leucochilus</i></p></td>
</tr>
</table>

Long HTML (including unnecessary and confusing tags)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>New Page 1</title>
</head>
<body>
<font FACE="Arial" SIZE="3"></font>
<table BORDER="1" CELLSPACING="1" CELLPADDING="7" WIDTH="529">
<tr>
<td WIDTH="43%" VALIGN="TOP"><b><font FACE="Arial" SIZE="1">
<p ALIGN="CENTER">&nbsp;</p>
<p ALIGN="CENTER">Local name</p>
<p ALIGN="CENTER"></font></b></td>
<td WIDTH="57%" VALIGN="TOP"><b><font FACE="Arial" SIZE="1">
<p ALIGN="CENTER">&nbsp;</p>
<p ALIGN="CENTER">Scientific name</font></b></td>
</tr>
<tr>
<td WIDTH="43%" VALIGN="TOP"><font FACE="Arial" SIZE="1">
<p ALIGN="JUSTIFY">Bambi castanha</p>
<p ALIGN="JUSTIFY"></font></td>
<td WIDTH="57%" VALIGN="TOP"><i><font FACE="Arial" SIZE="1">
<p ALIGN="JUSTIFY">Cephalophus leucochilus</font></i></td>
</tr>
</table>
</body>
</html>

The following should be cleaned from table code: the <html>, <head>, <meta>, <center> and <font> tags, and the ALIGN and WIDTH attributes. Non-breaking spaces (&nbsp;) should also be erased unless there is a special need for them (see Annex 1).
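Spacer paragraphs that contain nothing but &nbsp;, such as the <p ALIGN="CENTER">&nbsp;</p> lines in Table 5, are the most common case with no special need. A minimal sketch of their removal with a Python regular expression; it assumes such paragraphs are properly closed, and it leaves &nbsp; inside real text (for example the thousands separator in 172&nbsp;000) alone. The name remove_spacer_paragraphs is hypothetical. Unclosed empty tags such as <p ALIGN="JUSTIFY"></font> are not matched and still need manual attention.

```python
import re

# A <p> element (with any attributes) containing nothing but whitespace
# and &nbsp; entities, such as <p ALIGN="CENTER">&nbsp;</p>.
SPACER_PARAGRAPH = re.compile(r"<p\b[^>]*>(?:\s|&nbsp;)*</p>", re.IGNORECASE)


def remove_spacer_paragraphs(html: str) -> str:
    """Delete empty or &nbsp;-only paragraphs; &nbsp; inside real text is kept."""
    return SPACER_PARAGRAPH.sub("", html)
```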

4.3 How to edit HTML files?

If the original file is a Word document, it can be converted either with the Save as HTML command or by copying it into an HTML editor such as FrontPage. After Word has converted the file to HTML, the file can be opened in Notepad and the cleaning of the HTML can begin. If FrontPage is used as the editor, paste the text into it, click the "HTML" tab to make the HTML code visible, and edit the source there.

HTML tables can be formatted with, for example, FrontPage or Netscape Composer. HTML texts can be cleaned with a basic text editor such as Notepad, although FrontPage is much more convenient. Once all the unnecessary tags have been deleted and the file is in compact form, it is ready to be inserted into the database.

Plain HTML text can be easily managed by Notepad or WordPad, but complicated HTML tables should be handled with more sophisticated editing software.

Figure 6. Both FrontPage and Notepad can be used for editing.

An automatic tool that does all the cleaning for users is still needed. Some basic development work has been done towards such a tool. If it is developed further, it must be properly checked and tested (see Annex 2).
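One property such a tool can be tested against is that cleaning changes only the markup, never the words a reader sees. A minimal Python sketch of that check, using the standard html.parser module; it assumes that text inside <head>, <title>, <script> and <style> is not displayed, and the names VisibleText, visible_text and same_visible_text are hypothetical. It is one possible check and does not replace the testing described in Annex 2.

```python
from html.parser import HTMLParser

HIDDEN_TAGS = {"head", "title", "script", "style"}


class VisibleText(HTMLParser):
    """Collect the text a browser would display, ignoring all markup.

    Adjacent blocks with no whitespace between them (e.g. </td><td>)
    are joined without a space; both versions are treated the same way.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &eacute; arrives as é
        self.parts = []
        self.hidden = 0  # > 0 inside <head>, <title>, <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden:
            self.hidden -= 1

    def handle_data(self, data):
        if not self.hidden:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # str.split() also splits on non-breaking spaces, so &nbsp; and a
    # plain space compare equal after this normalization.
    return " ".join("".join(parser.parts).split())


def same_visible_text(original: str, cleaned: str) -> bool:
    """True if cleaning changed only the markup, not the displayed words."""
    return visible_text(original) == visible_text(cleaned)
```

It catches, for example, an apostrophe that has been turned into the &acute; entity, which displays as an accent instead.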

