TJL73 - Tutorial for digitally signing HTML sources

(or, how to sign an HTML source in a way that is "transparent" to the parser)

Page also available in: Italian (by TJL73), English (by Carlo Luciano Bianco), Slovak, French (by Rinux).

Disclaimer

The aim of this tutorial is to describe a possible procedure to attach an OpenPGP digital signature to a web page, in such a way that it does not affect the browser parsing.

FOREWORD: In this document the name "OpenPGP" is used loosely, as if it were the name of a program. It is not: here it stands for any software compatible with the OpenPGP standard, such as GnuPG or PGP.

The following procedure is valid only for static HTML code (including JavaScript, CSS and anything else interpreted "client side").
[I would like to emphasize that the term "DHTML" means a "page with dynamic contents" whose HTML source is nevertheless static. What follows can therefore be applied to this kind of page as well.]
Dynamic sources generated by "server side" languages (e.g. ASP, CGI, Perl, PHP) are not compatible with the technique described here, because the web server generates them anew on every request. The author cannot sign them personally: by definition, the author can only sign static code (i.e. code saved as-is on the web server). It is possible to have the server sign the dynamically generated code automatically, by installing an appropriate script on the server itself. This, however, means that the key used for signing must reside in an untrusted place, so such a signature cannot be trusted as much as one made personally by the author.

This method is in no way a replacement for SSL/TLS: for example, it does not provide encrypted connections, and it works one way only, "server -> client". However, since nothing needs to be implemented "server side", it is very useful whenever the web server is not under the direct control of the web designer and SSL/TLS cannot be used (the most common example is someone who wants to sign a personal page hosted on a free hosting service, like the present one, freely provided by Altervista.org).
In many situations, this method is the only way to authenticate an HTML web page.

Creation of the signed HTML code

To reproduce what is done in this document, put a closing comment marker at the beginning and an opening comment marker at the end of the HTML code to be signed, as follows:

 -->
<HTML>
[Text or HTML code to be signed]
</HTML>
<!--

Note that there is a blank space before the closing comment marker ("-->"):

 -->
<HTML>
[Text or HTML code to be signed]
</HTML>
<!--

This prevents OpenPGP, while creating the digital signature, from adding a "- " (hyphen, space) at the beginning of that line (the dash-escaping required by the OpenPGP specification), which would yield "- -->".
The presence of that "- " would have no side effect on the parsing, but its absence results in cleaner HTML code.

NOTE: If, for any reason, a line in your pages has a hyphen "-" as its first character, you will need to add an empty comment in the first column, before the hyphen. In this way OpenPGP will not add a "hyphen space" at the beginning of the line when applying the signature.
The resulting line will then begin with something like "<!-- -->-". This has no side effect on the parsing of the code, but it avoids a modification of the code by OpenPGP.
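
The effect of dash-escaping can be illustrated with a minimal Python sketch. It assumes only the rule stated above (a line starting with a hyphen gets "- " prepended); the function name `dash_escape` is hypothetical, and real OpenPGP software is allowed to escape other lines as well (for instance, lines starting with "From ").

```python
def dash_escape(text: str) -> str:
    """Prefix "- " to every line that starts with a hyphen."""
    escaped = []
    for line in text.split("\n"):
        escaped.append("- " + line if line.startswith("-") else line)
    return "\n".join(escaped)


# Without the leading blank, the comment marker gets modified:
print(dash_escape("-->\n<HTML>"))
# With the leading blank, the line is left untouched:
print(dash_escape(" -->\n<HTML>"))
```

The first call shows the "- -->" line that the leading blank space is meant to avoid.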

Now, you can select all the HTML code of the page (including the "line-end" at the end of the document):

 -->
<HTML>
[Text or HTML code to be signed]
</HTML>
<!--

and have OpenPGP apply a "Clear Text Signature" to it.

NOTE: If you use tools like PGP or GPGshell, you have to disable the "word wrap" feature, because it may alter your HTML code and make it unreadable for your browser.
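
For those who prefer a script to the clipboard, the preparation and signing steps could be sketched in Python as follows. This is only a sketch under stated assumptions: GnuPG is assumed to be installed as `gpg` with a usable secret key, the function names and the `key_id` parameter are hypothetical, and the page is assumed to be UTF-8 text.

```python
import subprocess


def prepare_for_signing(html: str) -> str:
    """Put ' -->' before and '<!--' after the HTML code, ending with a line-end."""
    return " -->\n" + html.strip("\n") + "\n<!--\n"


def clearsign(text: str, key_id: str) -> str:
    """Ask GnuPG for a clear text signature of `text`.

    Assumption: gpg may ask for the passphrase through its own agent.
    """
    result = subprocess.run(
        ["gpg", "--local-user", key_id, "--clearsign"],
        input=text.encode("utf-8"),
        capture_output=True,
        check=True,  # raise an error if gpg fails
    )
    return result.stdout.decode("utf-8")
```

A call such as `clearsign(prepare_for_signing(page), "KEYID")` would then return the clearsigned text, where "KEYID" stands for your own key identifier.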

After signing, paste the clearsigned HTML code back into your preferred HTML editor and add two comment markers: an opening one ("<!--") at the very beginning of the source code and a closing one ("-->") at the very end.

The HTML code is now structured as follows:

<!--
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

 -->
<HTML>
[Signed text or HTML code]
</HTML>
<!--
-----BEGIN PGP SIGNATURE-----

[Signature data]
-----END PGP SIGNATURE-----
-->

The green-emphasized text (the OpenPGP header and signature blocks) is treated as a comment by your browser and is therefore ignored during the parsing. The yellow-emphasized part, i.e. the original HTML code, is interpreted by the browser exactly as if the OpenPGP signature were not there at all.
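
The wrapping step, and the reason why the signature stays invisible, can be sketched in Python. The sample text carries a placeholder instead of a real signature, the function names are hypothetical, and the regular expression is only a rough model of how a parser skips comments.

```python
import re


def wrap_clearsigned(clearsigned: str) -> str:
    """Add '<!--' before and '-->' after the clearsigned text."""
    return "<!--\n" + clearsigned.rstrip("\n") + "\n-->\n"


def visible_part(page: str) -> str:
    """Drop everything between '<!--' and the next '-->'."""
    return re.sub(r"<!--.*?-->", "", page, flags=re.DOTALL).strip()


sample = (
    "-----BEGIN PGP SIGNED MESSAGE-----\n"
    "Hash: SHA1\n"
    "\n"
    " -->\n"
    "<HTML>\n"
    "[Signed text or HTML code]\n"
    "</HTML>\n"
    "<!--\n"
    "-----BEGIN PGP SIGNATURE-----\n"
    "\n"
    "[placeholder, not a real signature]\n"
    "-----END PGP SIGNATURE-----\n"
)
print(visible_part(wrap_clearsigned(sample)))
```

What is left after dropping the two comments is just the original HTML code.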

On this page you can see the result of that HTML code (of course, this is just a trivial example, so its signature cannot be verified; for a real example of a page signed with this technique, see my X-Face page or the page with my OpenPGP keys).

NOTE: The code must be signed again after any update of the page, or the signature will no longer be verified.

Since a digital signature is transparent to the browser, it is appropriate to add a note at the bottom of the page revealing its presence, with a link to a tutorial explaining what it is about.
I have created a logo which can be freely used to mark the presence of an OpenPGP digital signature.
The logo is available in two different graphic formats and should be inserted at the bottom of each page that carries an OpenPGP digital signature.
If you decide to use this authentication technique in your HTML pages, you are kindly requested to insert this logo, following these simple rules. I would certainly consider it an honour.

Verification of a signed HTML code

Let's now proceed to the verification of a page signed using OpenPGP.

The procedure to verify the HTML code is very simple.
Just view the HTML source of the page (every browser allows this, usually through a menu item shown when right-clicking on the page) and verify it using OpenPGP.
If the digital signature verifies, the page has not been modified. If it does NOT verify, the page has been tampered with (or, in any case, no longer matches the original that was signed).
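
The verification can also be scripted on a locally saved copy of the source. The following Python sketch assumes that GnuPG is installed as `gpg` and that the signer's public key is already in the keyring; the function names are hypothetical. It works on raw bytes, because any re-encoding of the page would change the signed data.

```python
import subprocess

BEGIN = b"-----BEGIN PGP SIGNED MESSAGE-----"
END = b"-----END PGP SIGNATURE-----"


def extract_clearsigned(source: bytes) -> bytes:
    """Cut the clearsigned block out of the page, dropping the outer comment markers."""
    start = source.index(BEGIN)
    end = source.index(END, start) + len(END)
    return source[start:end] + b"\n"


def verify_file(path: str) -> bool:
    """Return True only if gpg reports a good signature for the saved page."""
    with open(path, "rb") as handle:
        block = extract_clearsigned(handle.read())
    result = subprocess.run(["gpg", "--verify"], input=block, capture_output=True)
    return result.returncode == 0
```

Note that `verify_file` also returns False when the public key is missing, so a failure does not by itself prove that the page was modified.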

Note for Gecko users

Mozilla v1.7.8 and Firefox v2.0.0.14 (and, I think, all browsers which use the Gecko parser) do not correctly show sources signed with this technique: when the code is displayed with the browser's "View source" command and copied to the clipboard, some blank lines are randomly added. These modifications do not harm the HTML parsing in any way, but they invalidate any OpenPGP signature of the page. Other browsers, like Opera and Internet Explorer, copy the original HTML source code to the clipboard correctly, so the digital signature can be verified.
There is already a Bugzilla entry about this bug. You can follow the evolution of the bug report there and, maybe, vote for the bug to raise its "priority".

As a possible workaround, Gecko users can save the page to a local file, either from the "View source" window or directly from the browser, using the Ctrl+S keyboard shortcut or the appropriate item in the "File" menu.
Saving the code from the "View source" window should not cause any problem. If you save it directly from the browser instead, be sure to select "HTML only" in the "Save as..." dialog box, to avoid modifications to the links to external objects (e.g. images) that would otherwise be saved locally.
Once the HTML source has been saved locally, it can be opened with any editor and its OpenPGP digital signature verified.
I do not know how browsers other than the ones I tested (Opera, Mozilla, Firefox and Internet Explorer) behave.
Any report about this is, of course, much appreciated.

Creation of a signed part of the document

Of course, the whole procedure described above can also be applied to single parts of a page.

For instance, a page containing important information to be verified may have to be translated into different languages. If that information does not need to be translated (e.g. the checksums of some distros), it is best to sign only those parts, leaving open the possibility of translating the rest of the page without breaking the signature attached by the original author.

The following source code illustrates this concept:

<HTML>
<HEAD>
<TITLE>Example</TITLE>
</HEAD>
<BODY>
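[Text or HTML code to be translated or, in any case, which can be modified]
<!--
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

 -->
<TABLE>
[Signed text or HTML code]
</TABLE>
<!--
-----BEGIN PGP SIGNATURE-----

[Signature data]
-----END PGP SIGNATURE-----
-->
[Text or HTML code to be translated or, in any case, which can be modified]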
</BODY>
</HTML>

In this example, the red-emphasized parts (the text outside the table) are the ones which can be translated or modified without breaking the signature.
The green-emphasized parts are the ones which contain the digital signature (and which are ignored by the browser). They must therefore not be modified.
The yellow-emphasized part (the table) is the one digitally signed by the original author. Like the green ones, it must not be modified, or the digital signature will become invalid.
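
When a page carries one or more signed parts, each clearsigned block can be located separately before it is verified. A minimal Python sketch, assuming "\n" line ends and blocks that are not nested inside each other (the names are hypothetical):

```python
import re

# One block runs from its BEGIN line to the first following END line.
BLOCK_RE = re.compile(
    r"^-----BEGIN PGP SIGNED MESSAGE-----$.*?^-----END PGP SIGNATURE-----$",
    re.DOTALL | re.MULTILINE,
)


def find_signed_blocks(source: str) -> list:
    """Return every clearsigned block found in the page source, in order."""
    return BLOCK_RE.findall(source)
```

Each returned block can then be passed to OpenPGP on its own, while the text outside the blocks remains free to change.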

A real-life example of this application is available on this page. If the signatures attached to that page had been made by Werner Koch, and not by me, you could be sure, beyond any doubt, of the validity of those checksums, without having to look around the network for further information (as suggested, instead, on the original page).

WARNING: These two pages, signed by me and hosted on my website, have NO validity whatsoever: they are merely illustrative.

Of course, a document can contain different portions signed by different authors, e.g. with the original authors' signatures on some portions of text or code and the translator's signature on the whole page source. This allows signatures to be nested while all of them remain valid, each at its own level, as in the following example:

,---------[ Signature by translator "T" ]---------.
|                                                 |
|  Text portion translated by "T"                 |
|                                                 |
|                                                 |
|  ,--------[ Signature by author "A" ]--------.  |
|  |                                           |  |
|  |   Text written and signed by author "A"   |  |
|  |                                           |  |
|  `-------------------------------------------'  |
|                                                 |
|  Other portion of text translated by "T"        |
|                                                 |
|                                                 |
|  ,--------[ Signature by author "B" ]--------.  |
|  |                                           |  |
|  |   Text written and signed by author "B"   |  |
|  |                                           |  |
|  `-------------------------------------------'  |
|                                                 |
|  ,--------[ Signature by author "C" ]--------.  |
|  |                                           |  |
|  |   Text written and signed by author "C"   |  |
|  |                                           |  |
|  `-------------------------------------------'  |
|                                                 |
|  ,-----[ Other signature by author "B" ]-----.  |
|  |                                           |  |
|  |   Text written and signed by author "B"   |  |
|  |                                           |  |
|  `-------------------------------------------'  |
|                                                 |
|  Other portion of text translated by "T"        |
|                                                 |
|                                                 |
|  ,-----[ Other signature by author "A" ]-----.  |
|  |                                           |  |
|  |   Text written and signed by author "A"   |  |
|  |                                           |  |
|  `-------------------------------------------'  |
|                                                 |
|  Other portion of text translated by "T"        |
|                                                 |
`-------------------------------------------------'

The scheme above shows that modifying one of the portions contained in the small boxes invalidates both the signature of the translator and that of the corresponding author (unless, of course, the modification was made before the translator attached his signature).
Modifying one of the portions outside the small boxes, instead, invalidates only the translator's signature, while the signatures of the various authors remain valid.
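
One practical consequence of nesting deserves a small illustration. Assuming the outer signer's software applies the dash-escaping mentioned earlier, the inner author's "-----BEGIN PGP ..." lines appear in the final source prefixed with "- ", so the outer signed text has to be un-escaped before an inner signature can be checked (OpenPGP software can normally output the verified text in its original form). The Python sketch below shows that un-escaping; the function names are hypothetical, line ends are assumed to be "\n", the signatures are placeholders and the HTML comment markers are left out for brevity.

```python
def dash_unescape(line: str) -> str:
    """Remove the "- " prefix that dash-escaping adds to a line."""
    return line[2:] if line.startswith("- ") else line


def signed_payload(block: str) -> str:
    """Return the original text covered by a block that starts at its BEGIN line."""
    lines = block.split("\n")
    start = lines.index("") + 1  # the first blank line ends the headers
    # Escaped inner markers start with "- ", so this finds the outer one.
    end = lines.index("-----BEGIN PGP SIGNATURE-----")
    return "\n".join(dash_unescape(line) for line in lines[start:end])


outer = "\n".join([
    "-----BEGIN PGP SIGNED MESSAGE-----",
    "Hash: SHA1",
    "",
    "Text portion translated by T",
    "- -----BEGIN PGP SIGNED MESSAGE-----",
    "Hash: SHA1",
    "",
    "Text written and signed by author A",
    "- -----BEGIN PGP SIGNATURE-----",
    "",
    "[placeholder signature of A]",
    "- -----END PGP SIGNATURE-----",
    "-----BEGIN PGP SIGNATURE-----",
    "",
    "[placeholder signature of T]",
    "-----END PGP SIGNATURE-----",
])

text_signed_by_t = signed_payload(outer)
inner_block = text_signed_by_t[text_signed_by_t.index("-----BEGIN PGP SIGNED MESSAGE-----"):]
print(signed_payload(inner_block))
```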

If, instead, the original author wants to certify also the translation, he can sign the page after the translator, like in the following scheme:

,----------[ Second signature by author "A" ]-----------.
|                                                       |
|  ,--------[ Signature by translator "T" ]----------.  |
|  |                                                 |  |
|  |  Portion of text translated by "T"              |  |
|  |                                                 |  |
|  |                                                 |  |
|  |  ,--------[ Signature by author "A" ]--------.  |  |
|  |  |                                           |  |  |
|  |  |   Text written and signed by author "A"   |  |  |
|  |  |                                           |  |  |
|  |  `-------------------------------------------'  |  |
|  |                                                 |  |
|  |  Other portion of text translated by "T"        |  |
|  |                                                 |  |
|  `-------------------------------------------------'  |
|                                                       |
`-------------------------------------------------------'

Here, the author "A" created and signed a text (or some HTML code).
Then the translator "T" translated the text portions into his own language, signing them with his own key.
Finally, the original author "A" signed the whole page again with his own key, thereby also certifying the translation made by "T" (without making any further modification to the HTML code).

Certifying also the external objects (in case of a real paranoia)

You may also need to certify the objects embedded in the HTML page: mainly, the images.
In this case, you can compute a checksum of each object (MD5 or SHA1) and write these checksums in the signed part of the comments.
Compared with a separate "Detached ASCII-armored Signature" for each object, this reduces the space needed while keeping the same strength of certification. After all, a digital signature is simply the certification of a hash of the object, and this is exactly what we are doing.
Of course, not every image needs to be certified: only the ones that really matter for the verification of the page. This means that, e.g., decorations, animations and backgrounds (unless they really matter for the content of the page) do not need to be certified with their checksums.

Let's see a practical example of such a technique.

This is the HTML code to be signed:

 -->
<HTML>
<HEAD>
<TITLE>Example of images authentication</TITLE>
</HEAD>
<BODY BACKGROUND="Background.gif">
<DIV>
[Signed text or HTML code]

<IMG SRC="Image1.jpg">
<IMG SRC="Image2.gif">

<IMG SRC="Small_Box.gif">

<IMG SRC="Image3.jpg">
<IMG SRC="Image4.png">

[Signed text or HTML code]
</DIV>

</BODY>
</HTML>

<!--
Image checksums:
File Name  | Size       | MD5 Checksum
-----------+------------+------------------------------------
Image1.jpg | 46338 byte | E2EE9822 BFF5407D 696B573E FB2989C0
Image2.gif | 17379 byte | F1AD8852 6150B054 4DF46226 2B7D6BD4
Image3.jpg | 40762 byte | 8A4F262F E336F854 077714E7 ABF3C911
Image4.png | 18065 byte | F06A61F2 6808B01A B9EEF949 F1577264

In this example I marked the various parts of the HTML code with different colours:
in blue the tags,
in green the comments correctly delimited,
in black the text shown by the parser,
in orange the files to be certified,
in violet the ones we do not care about,
in grey what would be a comment but has not been correctly delimited yet.
This makes it possible to see at a glance the various parts of the code we are analysing.
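
The rows of such a checksum table can be produced by a short script. The following Python sketch mimics the layout used in the example (name, size in bytes, MD5 digest in groups of eight hexadecimal digits); the function name is hypothetical and the column padding is left out.

```python
import hashlib


def checksum_row(name: str, data: bytes) -> str:
    """Build one table row: file name, size in bytes, grouped MD5 digest."""
    digest = hashlib.md5(data).hexdigest().upper()
    groups = " ".join(digest[i:i + 8] for i in range(0, len(digest), 8))
    return f"{name} | {len(data)} byte | {groups}"


# Example with an empty object, whose MD5 digest is well known:
print(checksum_row("empty.bin", b""))
```

For a real image, `data` would be the content of the file read in binary mode. A different algorithm from `hashlib`, such as `hashlib.sha256`, can be swapped in, provided the table states which one was used.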

What follows, instead, is the HTML code after the attachment of the digital signature:

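<!--
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

 -->
<HTML>
<HEAD>
<TITLE>Example of images authentication</TITLE>
</HEAD>
<BODY BACKGROUND="Background.gif">
<DIV>
[Signed text or HTML code]

<IMG SRC="Image1.jpg">
<IMG SRC="Image2.gif">

<IMG SRC="Small_Box.gif">

<IMG SRC="Image3.jpg">
<IMG SRC="Image4.png">

[Signed text or HTML code]
</DIV>

</BODY>
</HTML>

<!--
Image checksums:
[Checksum table as above]

-----BEGIN PGP SIGNATURE-----

[Signature data]
-----END PGP SIGNATURE-----
-->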

The structure of the table (or of the list of certified files, if you prefer not to use a table) is merely an example but, to be useful, it should contain at least the name of the object (with path and extension, to identify the file format), its size in bytes and its checksum (specifying the hashing algorithm used to compute it).
Since the table is part of the signed text, modifying even a single value of type, size or checksum (in an attempt to justify the replacement or modification of an image) would invalidate the signature of the page.
Therefore, if the signed portion of the HTML source proves authentic when the signature is verified, we can be sure that the checksums listed for the images (or other objects) are valid.
It is then enough to compute the checksum of the image we want to verify, cross-checking it with size and type, to be sure that the image has not been modified.
All of the above, which speaks of "objects", applies not only to images but also to CSS and other linked files such as Flash animations, text files and much more.
It is important, however, that such objects are under the direct control of the signing author: otherwise, a modification by their actual owner would invalidate the checksums.
For other kinds of files, like compressed archives or executables which can be downloaded from the web page, it is better to publish the checksum in the traditional way, so that downloaders can verify their integrity with the classic technique.
Another good method is to add a "Detached Signature" to the files.
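
The check of a single object against its table row, as described above, could be sketched in Python like this. The sketch assumes rows in the "name | size byte | MD5 digest" layout of the example table; the function names are hypothetical.

```python
import hashlib


def parse_row(row: str):
    """Split a table row into (name, size in bytes, lowercase hex digest)."""
    name, size, digest = (field.strip() for field in row.split("|"))
    return name, int(size.split()[0]), "".join(digest.split()).lower()


def matches_row(data: bytes, row: str) -> bool:
    """Check both the size and the MD5 digest of an object against its row."""
    _name, size, digest = parse_row(row)
    return len(data) == size and hashlib.md5(data).hexdigest() == digest
```

The row must of course come from a page whose signature has already been verified; otherwise the comparison proves nothing.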

Alternative method

An alternative possible technique is to put the checksum directly in the link to the object. In this way, every checksum will be directly linked to the corresponding object.

An example of this technique follows:

In the first line of the example, for the sake of clarity, the following parts are emphasized: in yellow the tag, in red the object size in bytes, in green the hashing algorithm used, in blue the object checksum computed with that algorithm.

Of course, this technique introduces a tag ("CHKSUM") which is not valid for the parser. This means that no browser will interpret the tag, which will therefore remain invisible. On the other hand, such an HTML source could be considered invalid by the "W3C Validator".

However, this makes it possible for the browser itself (natively or with a plug-in, like e.g. Enigmail) to directly verify the validity of an object by computing its checksum.

Then, if the page is validated and these tags are verified, we will have achieved our goal.
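
How a tool might collect such checksums can be sketched in Python. Everything about the syntax here is an assumption made for illustration only: the checksum is supposed to be written as an attribute named CHKSUM on the tag that links the object, with a value in the "Algo;Size;Digest" layout that this tutorial uses later for JavaScript comments. The class name is hypothetical.

```python
from html.parser import HTMLParser


class ChksumCollector(HTMLParser):
    """Collect (source, algorithm, size, digest) from tags carrying CHKSUM."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute names arrive in lowercase
        if "chksum" in attributes and "src" in attributes:
            algo, size, digest = attributes["chksum"].split(";")
            self.found.append((attributes["src"], algo, int(size), digest))


collector = ChksumCollector()
collector.feed('<IMG SRC="Image1.jpg" CHKSUM="MD5;46338;E2EE9822BFF5407D696B573EFB2989C0">')
print(collector.found)
```

Each collected entry could then be compared with the size and digest of the object actually downloaded.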

Signing portions of JavaScript code

Since JavaScript, like HTML, is code executed client side, it is also possible to sign portions of JavaScript code, whether "embedded" in the HTML code or "included" from an external .js file.
The HTML page already contains all the "embedded" JavaScript parts, so they are signed together with the whole page.

The JavaScript code contained in an external .js file, on the other hand, must be signed explicitly.
To sign such external code, it is necessary to create some "commented out" parts in every .js file to hold the signature, as we already did with the HTML code.

In the following example we see how to sign a portion of JavaScript code:

*/

var List = new Array (
"Element 1","Element 2","Element 3","Element 4","Element 5"
)

/*

As you may have already noticed, there is an "end comment" (" */ ") at the beginning of the JavaScript code and a "start comment" (" /* ") at the end. All you need to do is select the content of the .js file to be signed, sign it and add the two comment delimiters before and after the signed part, as in the following:

/*

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

*/

var List = new Array (
"Element 1","Element 2","Element 3","Element 4","Element 5"
)

/*

-----BEGIN PGP SIGNATURE-----
Version: OpenPGP
Comment: Clearsigned JavaScript Source Code

xX0XXXXXxXxxXxxxXxX0xXXXXx0XXX0XXxxXxxXxxx0XXXXXxX+X0XxX0XXxXxxx
X0XXXxXxxxXXXxx0XXxxxxX=
=xXxX
-----END PGP SIGNATURE-----

*/

The green lines (the comments), as in the case of the HTML code, will be ignored by the JavaScript interpreter, while the yellow part (the code between them) will be executed.

Also in this case we can add the checksums of the objects related to the JavaScript code, as in the following example:

/*

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

*/

var List = new Array ( // Checksum: "Algo;Size;Digest"
"Element 1",           // Checksum: "SHA1;1234;E1EC203D81C37D201D18BB8D9AEE50E37DB6E21A"
"Element 2",           // Checksum: "SHA1;5678;BB8D9AEE50E37DBE1EC203D8E21A01D181C37D26"
"Element 3",           // Checksum: "SHA1;9012;C203D8137D26BB8D9AEE50E37DBCE1EE21A01D18"
"Element 4",           // Checksum: "SHA1;3456;6E21A01D18BB8EE50E37DB1EC203D81C37D2D9AE"
"Element 5"            // Checksum: "SHA1;7890;1A01D18BB8E1ECD26E2D9AEE50E37DB203D81C37"
)

/*

-----BEGIN PGP SIGNATURE-----
Version: OpenPGP
Comment: Clearsigned JavaScript Source Code

xX0XXXXXxXxxXxxxXxX0xXXXXx0XXX0XXxxXxxXxxx0XXXXXxX+X0XxX0XXxXxxx
X0XXXxXxxxXXXxx0XXxxxxX=
=xXxX
-----END PGP SIGNATURE-----

*/

In fact, everything that appears after the "line comment" symbol (" // ") will be ignored by the JavaScript interpreter and will be used only to certify the related objects (as already happens in the case of the HTML code).

To verify the signed JavaScript code, therefore, it is enough to open the .js file and verify the signature.
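
Once the signature of the .js file has been verified, the checksum comments can be read back by a script. A minimal Python sketch, assuming the `// Checksum: "Algo;Size;Digest"` layout shown above (the names are hypothetical):

```python
import re

# Size must be numeric, so the legend line "Algo;Size;Digest" is skipped.
CHECKSUM_RE = re.compile(r'//\s*Checksum:\s*"(\w+);(\d+);([0-9A-Fa-f]+)"')


def checksum_comments(js_source: str) -> list:
    """Return (algorithm, size, digest) for every checksum comment found."""
    return [(algo, int(size), digest)
            for algo, size, digest in CHECKSUM_RE.findall(js_source)]


sample = '''var List = new Array ( // Checksum: "Algo;Size;Digest"
"Element 1",           // Checksum: "SHA1;1234;E1EC203D81C37D201D18BB8D9AEE50E37DB6E21A"
"Element 2"            // Checksum: "SHA1;5678;BB8D9AEE50E37DBE1EC203D8E21A01D181C37D26"
)'''
print(checksum_comments(sample))
```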

The content of this page may be freely used and distributed,
as described in the Creative Commons Public License v3.0, but it remains the exclusive property of TJL73.

If you want to use this tutorial (or even only a part of it), you must cite the source in your documents.
I hope that many other localized versions of this page will follow soon.
Comments, suggestions and bug reports are of course appreciated.

Thanks to Theoretical for having strongly requested and obtained such a page and to Carlo Luciano Bianco,
Rinux, Lapo Luchini, Stefano Winzozz, Martin Lukáč and Vincenzo Reale for having quietly tested this madness.

The translators are solely responsible for the content of the translated texts.
I, TJL73, am just hosting their translations, without assuming any responsibility
for the content of such non-Italian versions of the page.