Transcription in HTML

From emcawiki

Transcripts in HTML

Anyone working with Jefferson-based transcription symbols has probably had the experience of not being able to produce some of these symbols easily with a word processor. The high dot indicating inbreath and the up and down arrows indicating intonation shifts are cases in point.

Exploring HTML (= HyperText Markup Language), I encountered similar difficulties and therefore wondered whether anyone had found solutions to such problems. Below is an overview of common problems in this area, with some solutions, based partly on my own experience and partly on suggestions from colleagues who responded to a previous version of this note.

Note: when viewing these explanations in a browser, you will see the effects, not the methods used. To see the methods, either use your browser's ‘view source’ facility, or save this page in HTML format and open it in a text editor.


In HTML you can use most ordinary keyboard symbols without difficulty, so signs like [ ( = ? : can be typed as usual.

Italic, to indicate stress, requires tags around the letters to be so marked: for example, <i>word</i>. Alternatively, you can use <em> and </em> (for emphasis), which my browser also displays as italic. Underlining should be produced with <u> and </u>, but this often does not work. To my eyes, however, italic is not very prominent in a transcript. Therefore, I propose using bold instead, with <b> and </b>, which can also be produced with <strong> and </strong>.

Left and right angle brackets (< and >) are used in HTML to mark tags and therefore cannot be typed directly within the transcript. They can be produced with special escape sequences: ‘greater than’ with “&gt;” and ‘less than’ with “&lt;” (without the quotes).
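When a transcript is converted to HTML by a program rather than edited by hand, this substitution can be automated. The following is a minimal sketch in Python, assuming each transcript line is a plain string. It relies on the standard library's html.escape, which also turns & into &amp; (necessary, since & begins every escape sequence). The function name escape_transcript is only an illustrative choice.

```python
import html


def escape_transcript(line: str) -> str:
    """Escape characters that a browser would otherwise read as markup.

    html.escape turns & into &amp;, < into &lt; and > into &gt;.
    quote=False leaves quotation marks alone, since they are harmless
    in running text (outside attribute values).
    """
    return html.escape(line, quote=False)


print(escape_transcript("A: >that's< what I <said>"))
# prints: A: &gt;that's&lt; what I &lt;said&gt;
```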

The degree sign, used to mark softer parts of utterances, also requires a special escape sequence: “&deg;”, as in °soft°.

The dot-prefaced h, indicating inbreath, can be simulated with either a low point, as in “.hhh”, or a single quotation mark, as in “’hhh”; but you can also use “&middot;”, which produces a centered dot, as in “·hhh”.
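A conversion script can likewise substitute these escape sequences for the symbols themselves. This is a sketch, assuming the transcript was typed with the actual ° and · characters; the mapping JEFFERSON_ENTITIES and the function encode_symbols are hypothetical names, and the symbol set is meant to be extended as needed.

```python
import html

# Named HTML entities for transcript symbols (illustrative, extendable).
JEFFERSON_ENTITIES = {
    "\u00b0": "&deg;",     # degree sign: softer talk, as in °soft°
    "\u00b7": "&middot;",  # centered dot: inbreath, as in ·hhh
}


def encode_symbols(line: str) -> str:
    """Escape markup characters, then replace transcript symbols with entities."""
    # Escape &, < and > first, so that the & at the start of each
    # entity added below is not escaped a second time.
    out = html.escape(line, quote=False)
    for char, entity in JEFFERSON_ENTITIES.items():
        out = out.replace(char, entity)
    return out


print(encode_symbols("B: \u00b7hhh \u00b0yeah\u00b0"))
# prints: B: &middot;hhh &deg;yeah&deg;
```

The order of the two steps matters: running html.escape after the replacement would turn &deg; into &amp;deg;, which a browser displays literally.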

As far as I know, HTML offers no way to overstrike one character with another, as in the combination of a question mark and a comma (?+,) used to indicate an intermediate rise in intonation. One could use the inverted question mark “¿” instead. Alternatively, one could put the <strike> and </strike> tags around a question mark, which displays it struck through: <strike>?</strike>.

In a first version of this note, the only symbols I was unable to produce or simulate were the up and down arrows used to indicate intonation shifts. Therefore, I proposed the caret ^ (Shift+6) for an upward arrow and the inverted exclamation mark “¡” for a downward one.

In the meantime, however, Celso Alvarez-Caccamo, Depto. de Linguistica Geral e Teoria da Literatura, Universidade da Corunha, 15071 A Corunha, Galiza (Espanha) (see his Web page!), helped me out by creating graphic image files that can be used for some of these symbols. These are transparent GIF files, which are very small and can be used with any background color:

Note: you can download these small files for your own use by shift-clicking on the links in the list above.

He suggests the following procedure:

Use a special, single character for each arrow or triangle (for instance, $, %, @ and \). After composing the text, use global Replace commands to insert the HTML tags. Activate the Verify or Confirm options, just in case. Each special symbol should be replaced with the <img src="GRAPHICFILE"> tag. For instance, “in$credible” should yield “in<img src="arup.gif">credible”.
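The same global replacement can also be scripted. This minimal sketch assumes the placeholder convention just described; only arup.gif comes from the example above, while the second file name (ardown.gif) and the helper insert_images are hypothetical.

```python
# Placeholder characters mapped to GIF files. "arup.gif" is taken from
# the example above; "ardown.gif" is a hypothetical file name.
PLACEHOLDERS = {
    "$": "arup.gif",
    "%": "ardown.gif",
}


def insert_images(line: str, placeholders: dict[str, str] = PLACEHOLDERS) -> str:
    """Replace each placeholder character with an <img> tag for its graphic."""
    out = []
    # Work character by character, so each placeholder is replaced exactly
    # once and text inside an already inserted tag is never re-scanned.
    for ch in line:
        if ch in placeholders:
            out.append(f'<img src="{placeholders[ch]}">')
        else:
            out.append(ch)
    return "".join(out)


print(insert_images("in$credible"))
# prints: in<img src="arup.gif">credible
```

Unlike an interactive Replace with Confirm, the script replaces every occurrence without asking, so the placeholder characters must not be used for anything else in the transcript. If the text is also escaped with html.escape, do that before inserting the <img> tags, or the tags themselves will be escaped.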

You can see what the arrows look like in a document he has created (he uses them for final junctures).

If your transcript is in a language that uses special characters not available in standard ASCII, for example letters with an umlaut or an accent, you also need escape sequences for these, such as “&uuml;” for ü or “&eacute;” for é.
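Any such character can also be written as a numeric character reference (for example &#252; for ü), which browsers display just like the named escapes. Python's built-in "xmlcharrefreplace" error handler produces these references automatically; the function name ascii_safe is only an illustrative choice.

```python
def ascii_safe(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # Encoding to ASCII with 'xmlcharrefreplace' writes each character that
    # ASCII cannot represent as &#NNN;, using its Unicode code point.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(ascii_safe("Grüß dich, café"))
# prints: Gr&#252;&#223; dich, caf&#233;
```

Note that this also converts transcript symbols such as ° and · (to &#176; and &#183;), which is harmless: the browser shows the same characters.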


A special difficulty arises from the fact that browsers collapse multiple spaces into one, which makes it hard to show overlaps and similar alignments clearly. To obtain precise alignment in transcripts, always put <pre> and </pre> tags around the extracts, and prepare your text in a non-proportional, monospaced font such as Courier.
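A minimal sketch of wrapping an extract in <pre> tags, assuming the extract is available as a list of lines already aligned in a monospaced editor. Browsers show <pre> content in a monospaced font by default and keep its spaces, so the two overlap brackets below stay in the same column. The function name to_pre_block is hypothetical.

```python
import html


def to_pre_block(lines: list[str]) -> str:
    """Wrap transcript lines in <pre> so that spacing and alignment are kept."""
    # Escape markup characters but leave the spaces untouched:
    # inside <pre>, the browser shows them exactly as typed.
    body = "\n".join(html.escape(line, quote=False) for line in lines)
    return "<pre>\n" + body + "\n</pre>"


extract = [
    "A: I was going to [say",
    "B:                [yes",
]
print(to_pre_block(extract))
```

Both brackets sit at column 18 in the source lines, and they remain aligned on screen because nothing inside the <pre> block is collapsed.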


I have prepared some example HTML transcription extracts, using these solutions and conventions, in which you can see how they work. You can also take a look at Celso’s document, linked above.