Custom Translation

webtrees is written in English, and uses English internally. The GEDCOM file format also uses English keywords and identifiers. But since human beings speak a variety of languages, we use translation files to convert text into each language. The translation files are stored in the language/ directory.

Note that we also have translations into the regional variants of English. Hence we have separate translations for "British English" and "American English".

You can customise these translations by adding a local translation file, and this page explains how to do this. But before starting, you should consider why you are making the change. If it is to improve or correct a translation, then it is better to suggest it for inclusion in the official translation file, so that everyone can benefit. Translations are currently stored at github.com, although a web interface is planned.

Where to store the local language files

You should create a subdirectory "language" in the webtrees data directory, i.e. data/language/. If you have moved the data directory outside the webserver's document root, then the subdirectory should be placed in your new, private data directory.

File names and formats

The filenames consist of the language code, plus an extension to indicate the format. Examples of language codes are:

  • it (Italian)
  • fr (French)
  • de (German)
  • sr-Latn (Serbian in Latin script)
  • en-AU (Australian English)

Three formats can be used, and these are described below. The translations are always from US English to the target language.

Note that translations are always case-sensitive, so that "Hello" is translated differently to "hello".

CSV format

The simplest format is CSV. Although CSV stands for "comma separated values", we actually use semicolons instead of commas, and all text must be surrounded by double quotes. Here is an example of a CSV file for British English, which would be called data/language/en-GB.csv.

"yes";"certainly"
"no";"perhaps not"

This would cause the word "yes" to be replaced with "certainly", throughout the program. To include a double quote in the text, precede it with a backslash "\".
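
If you prefer to generate the file with a script rather than by hand, the rules above can be sketched as follows. This is a minimal illustration using Python's standard csv module; the function name write_translation_csv and the third sample row are assumptions made for the example and are not part of webtrees.

```python
import csv
import io


def write_translation_csv(pairs, stream):
    """Write (source, translation) pairs as semicolon-separated, fully quoted rows."""
    writer = csv.writer(
        stream,
        delimiter=";",          # semicolons instead of commas
        quotechar='"',
        quoting=csv.QUOTE_ALL,  # every field is wrapped in double quotes
        doublequote=False,      # do not double embedded quotes ...
        escapechar="\\",        # ... escape them with a backslash instead
        lineterminator="\n",
    )
    writer.writerows(pairs)


# The first two rows mirror the example above; the third is hypothetical
# and only demonstrates the escaping of an embedded double quote.
pairs = [
    ("yes", "certainly"),
    ("no", "perhaps not"),
    ("OK", 'say "fine"'),
]

buffer = io.StringIO()
write_translation_csv(pairs, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass a file opened with open("data/language/en-GB.csv", "w", encoding="utf-8", newline="") instead of the in-memory buffer.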

The advantages of this format are:

  • Simplicity. You can create the files using a simple text editor or a spreadsheet.

The disadvantages of this format are:

  • It cannot be used for complex translations, such as plurals (different languages have different ways of expressing plurals, depending on the number of objects), or multiple contexts (where the same word has many translations, depending on the context).
  • Depending on your server configuration, it may be possible for a visitor to your site to view the contents of the file, and you may have included sensitive information in it.

PHP format

The second format uses a simple PHP script to create an array of translations. As well as the translations, the file contains a simple check, to prevent visitors to your site from accessing the data. Here is an example of a PHP file for American English, which would be called data/language/en-US.php. To include a single quote in the text, precede it with a backslash "\".

<?php
// Each key is the US English source text; each value is its replacement.
return array(
  'yes'=>'for sure!',
  'no'=>'no way!'
);

It is important that there are no blank lines or spaces before the initial <?php. Also, the closing ?> tag is optional, so we omit it.
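
For a long list of translations, the file can be generated rather than typed. The following is a minimal sketch in Python, assuming the translations are held in a dictionary; the helper names php_single_quoted and render_php_translations, and the second sample translation, are hypothetical. It escapes backslashes and single quotes, and makes sure that <?php is the very first text in the output.

```python
def php_single_quoted(text):
    """Quote text as a PHP single-quoted string literal."""
    # Escape backslashes first, so the backslashes added for quotes are kept.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def render_php_translations(translations):
    """Render a dict as a PHP file that returns an array of translations."""
    lines = ["<?php", "return array("]  # nothing may precede "<?php"
    for source, target in translations.items():
        lines.append(
            "  " + php_single_quoted(source) + "=>" + php_single_quoted(target) + ","
        )
    lines.append(");")  # no closing ?> tag, as recommended above
    return "\n".join(lines) + "\n"


# Hypothetical sample data; the second value contains a single quote.
php_text = render_php_translations({"yes": "for sure!", "no": "don't think so"})
print(php_text)
```

The result would then be saved as, for example, data/language/en-US.php, in UTF-8 and without a byte order mark.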

The advantages of this format are:

  • Relatively simple. You can create the files using a simple text editor.

The disadvantages of this format are:

  • It cannot be used for complex translations, such as plurals (different languages have different ways of expressing plurals, depending on the number of objects), or multiple contexts (where the same word has many translations, depending on the context).

Gettext format

The third format is called gettext. Gettext is an open-source, industry-standard library for translations. Each language requires two files. The PO file contains human-readable text. Although you can edit PO files with a text editor, it is common to use a dedicated "PO editor". POEdit is a common tool for this, although it has a number of limitations, and we recommend Better PO Editor, which was written by one of the webtrees developers, Michele Locati. The PO file must then be converted into an MO file, which is a machine-readable, binary version of the same content. If you want to start from the existing translations, you may need to take the binary MO file in the release and convert it back to a PO file before you can edit it with Better PO Editor. Of these two editors, only POEdit includes the msgunfmt executable that performs this reverse conversion, so you may need both tools.
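
Both conversions can also be run with the GNU gettext command-line tools, msgfmt (PO to MO) and msgunfmt (MO to PO), where they are installed. The following Python sketch assumes those tools are on the PATH; the helper names and the file paths are illustrative only.

```python
import shutil
import subprocess


def po_to_mo_command(po_path, mo_path):
    """Command that compiles a human-readable PO file into a binary MO file."""
    return ["msgfmt", "-o", mo_path, po_path]


def mo_to_po_command(mo_path, po_path):
    """Command that converts a binary MO file back into an editable PO file."""
    return ["msgunfmt", "-o", po_path, mo_path]


def run_gettext_tool(command):
    """Run one of the commands above, failing clearly if the tool is missing."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(command[0] + " not found; is GNU gettext installed?")
    subprocess.run(command, check=True)


# Build (but do not run) the command for compiling a custom translation.
compile_cmd = po_to_mo_command("en-US.po", "data/language/en-US.mo")
print(" ".join(compile_cmd))
```

Calling run_gettext_tool(compile_cmd) would perform the conversion; the sketch only prints the command so that it can be checked first.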

Here is an example of a PO file for American English, which converts certain Jewish terms to Ashkenazi spellings. The file would be called en-US.po and you would need to convert it to a .mo file, which would be called data/language/en-US.mo. Note that it includes several context-sensitive translations, where we can have different translations for the same English text.

msgid ""
msgstr ""
"Project-Id-Version: webtrees\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"X-Poedit-Language: en\n"
"X-Poedit-Country: US\n"
"X-Poedit-SourceCharset: utf-8\n"

msgctxt "NOMINATIVE"
msgid "Heshvan"
msgstr "Cheshvan"

msgctxt "NOMINATIVE"
msgid "Tevet"
msgstr "Teves"

msgctxt "GENITIVE"
msgid "Heshvan"
msgstr "Cheshvan"

msgctxt "GENITIVE"
msgid "Tevet"
msgstr "Teves"

msgctxt "LOCATIVE"
msgid "Heshvan"
msgstr "Cheshvan"

msgctxt "LOCATIVE"
msgid "Tevet"
msgstr "Teves"

msgctxt "INSTRUMENTAL"
msgid "Heshvan"
msgstr "Cheshvan"

msgctxt "INSTRUMENTAL"
msgid "Tevet"
msgstr "Teves"

msgid "Heshvan"
msgstr "Cheshvan"

msgid "Tevet"
msgstr "Teves"

msgid "Bat mitzvah"
msgstr "Bas mitzvah"

msgid "Date of bat mitzvah"
msgstr "Date of bas mitzvah"

msgid "Place of bat mitzvah"
msgstr "Place of bas mitzvah"

msgid "Source for bat mitzvah"
msgstr "Source for bas mitzvah"
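
To illustrate how msgctxt lets the same English text carry several translations, the sketch below reads simple PO entries into a dictionary keyed by the (context, msgid) pair. It is deliberately simplified and is not a full PO parser: continuation lines, plural forms, and escape sequences are ignored, and the function name read_simple_po is hypothetical.

```python
import re

# Matches single-line 'keyword "value"' entries only.
ENTRY_LINE = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"\s*$')


def read_simple_po(text):
    """Return {(context, msgid): msgstr} for single-line PO entries."""
    entries = {}
    context, msgid = None, None
    for raw in text.splitlines():
        match = ENTRY_LINE.match(raw.strip())
        if not match:
            continue  # blank lines, comments, and continuation strings
        keyword, value = match.groups()
        if keyword == "msgctxt":
            context = value
        elif keyword == "msgid":
            msgid = value
        else:  # msgstr closes the current entry
            if msgid:  # skip the header entry, whose msgid is empty
                entries[(context, msgid)] = value
            context, msgid = None, None
    return entries


sample = r'''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgctxt "NOMINATIVE"
msgid "Tevet"
msgstr "Teves"

msgid "Tevet"
msgstr "Teves"
'''

entries = read_simple_po(sample)
for key, translation in entries.items():
    print(key, "->", translation)
```

The entry with a context and the entry without one are stored under different keys, which is why each grammatical case in the example above needs its own entry.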


The advantages of this format are:

  • It can be used for all translations, including the complex plurals and contexts. It is the format used by webtrees for its own translation files.
  • For files containing a large number of translations, it is faster and uses less memory than the other options, as it does not need to load all translations into memory.

The disadvantages of this format are:

  • It requires additional software to create the files.

Character encoding

Files should be saved in UTF-8 encoding, rather than ISO-8859, Latin-1, etc. You should be especially careful with some text editors, as they often start the file with an invisible character (called a byte order mark, or BOM). If, after installing your file, you get an error of the form "Headers already sent", you have most likely included a BOM character.
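
A file can be checked for both problems before it is installed. The following is a minimal sketch in Python that works on the raw bytes of a file; the function names and the sample bytes are illustrative assumptions.

```python
import codecs


def has_utf8_bom(data):
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 byte order mark."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def is_valid_utf8(data):
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical file contents that an editor has saved with a BOM.
sample = codecs.BOM_UTF8 + b"<?php\nreturn array();\n"
cleaned = strip_utf8_bom(sample)
print(has_utf8_bom(sample), has_utf8_bom(cleaned), is_valid_utf8(cleaned))
```

For a real file, read the bytes with open(path, "rb").read(), and write the cleaned bytes back with open(path, "wb").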

General notes

  • Whatever language you are translating into, your custom language file must always translate from the text in the core code (I18N::translate('xxxxx')), not from any of the standard language files. If you want to see what these texts are on-screen, ensure you are viewing in en-US, rather than any other language.
  • The translations are cached for performance. After adding your custom file, either clear the cache files (Control panel -> Website -> Clean-up data folder) or wait at least one hour.

This page has not been reviewed since webtrees 1.7.4. It may be out of date.