Font2targa UTF-8 encoding

**dolorosa** · 26.02.2021 23:11

Hi guys, as part of this worldwide community, I would like to share some of my experience with all of you.
I recently set out to create a font for Gothic 2 Returning. Everything worked, but not the way I wanted. I asked someone to add UTF-8 support to a tool called font2targa. He did, BUT the reason I wanted it was to have my own Romanian letters with diacritics (Ă ă Î î â Â Ș ș Ț ț). The last four are the comma-below letters, not the cedilla ones. The cedilla ones look similar, but they aren't the same thing (they are supported by the Windows-1250 encoding). All I want are the comma-below ones. UTF-8 is the only encoding I know of that supports them.
So after I created the font with the new encoding (UTF-8) and entered the game, I noticed that the letters with a comma weren't shown, and the ones with circumflex and breve were replaced by other letters. I will leave the link to the tool; maybe someone with more knowledge is willing to help with this. If you want to test it, run it from the Font2Targa/bin/Debug folder.

https://drive.google.com/file/d/1HCJ...ew?usp=sharing

**Lehona** · 26.02.2021 23:21

I don't think a UTF-8 aware font will help you if the game itself does not use UTF-8.

**dolorosa** · 26.02.2021 23:31

Quote from Lehona

I don't think a UTF-8 aware font will help you if the game itself does not use UTF-8.

But then how come the text file of my translation database, which I import with Easy Gothic Mod Translator, is UTF-8, and it works?

**TopLayer** · 27.02.2021 03:58

Imagine the Russian text "Блин".
According to the Unicode table, it is the sequence of numbers "1041 1083 1080 1085".
Any compiler must have a way to store this sequence of numbers as a sequence of bytes.
The Russian version of the Daedalus compiler uses the cp1251 encoding to convert the sequence of numbers into the sequence of bytes "193 235 232 237" (Unicode numbers that cannot be represented in this encoding are removed or replaced). This sequence is stored in the DAT file.
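
As a quick check of the numbers above, here is a minimal Python sketch. It assumes that Python's built-in `cp1251` codec behaves like the conversion in the Russian Daedalus compiler, which the thread does not confirm in detail.

```python
# Sketch: reproduce the code points and cp1251 bytes quoted above.
# Assumption: Python's "cp1251" codec matches the compiler's conversion.
text = "Блин"

code_points = [ord(ch) for ch in text]
cp1251_bytes = list(text.encode("cp1251"))

print(code_points)   # [1041, 1083, 1080, 1085]
print(cp1251_bytes)  # [193, 235, 232, 237]

# A character that cp1251 cannot represent is replaced (here by "?")
# when the encoder is told to substitute instead of failing.
lossy = "\u021b".encode("cp1251", errors="replace")  # U+021B: t with comma below
print(lossy)         # b'?'
```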

Gothic's text printing system has no information about the language or encoding in which strings are stored. It just takes the numeric value of each byte of the string, selects the cell for that NUMERIC value in the font texture you provided, and draws that piece of texture on the screen.
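
The lookup described above can be modelled in a few lines of Python. This is only an illustration: the 256-entry list and the functions `make_font_cells` and `render` are hypothetical stand-ins for a font texture and the engine's text output, not actual Gothic code.

```python
def make_font_cells(codepage):
    """Model a font texture: cell i holds the glyph drawn for byte value i.

    Assumption: the texture was drawn following the given Windows code page.
    """
    return [bytes([i]).decode(codepage, errors="replace") for i in range(256)]


def render(raw, cells):
    """Pick one cell per byte; no language or encoding information is used."""
    return "".join(cells[b] for b in raw)


russian_font = make_font_cells("cp1251")
print(render(bytes([193, 235, 232, 237]), russian_font))  # Блин
```

The same four bytes would come out as different letters with a font whose cells were drawn for another code page, which is the whole point of the paragraph above.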

Imagine you want to translate the mod into Romanian. You open Easy Gothic Mod Translator and explicitly select the language you want to translate from! You choose from RU to RO. The tool reads the byte sequence "193 235 232 237" from the DAT file and, because you chose "from RU", assumes that these bytes are the cp1251 representation of the string, decodes them as cp1251, and gets the number sequence "1041 1083 1080 1085". Then it encodes that sequence as UTF-8 and gets the bytes "208 145 208 187 208 184 208 189", which are sent to the Google service. Google translates it and sends back the string "Clătită" in its UTF-8 encoded representation "67 108 196 131 116 105 116 196 131". EGMT collects these UTF-8 encoded bytes of the Russian and Romanian strings, adds other info, and stores everything in a text file that you can edit with any modern text editor.
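
The read step above can be reproduced roughly like this in Python. It is a sketch of the byte conversions only. How EGMT really implements them is not shown in this thread, and the Google request is left out entirely.

```python
# Bytes as stored in the Russian DAT file.
dat_bytes = bytes([193, 235, 232, 237])

# "from RU" means: interpret the bytes as cp1251.
source_text = dat_bytes.decode("cp1251")
print([ord(ch) for ch in source_text])   # [1041, 1083, 1080, 1085]

# The same text as UTF-8, the form sent to the translation service.
utf8_bytes = list(source_text.encode("utf-8"))
print(utf8_bytes)                        # [208, 145, 208, 187, 208, 184, 208, 189]

# The UTF-8 reply quoted above decodes to the Romanian string.
reply = bytes([67, 108, 196, 131, 116, 105, 116, 196, 131]).decode("utf-8")
print(reply)                             # Clătită
```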

Then you ask the tool to inject the translation into the .DAT file. Because you chose "to RO", the UTF-8 bytes are converted to cp1250 bytes: "67 108 227 116 105 116 227". As a result, the sequence of bytes
"193 235 232 237"
is replaced in the .DAT file with
"67 108 227 116 105 116 227"
If you start the game with Russian font textures, you will see the string "Clгtitг" (because the Russian letter 'г' is drawn in cell 227). So you should make sure that all Russian fonts in use are actually replaced by Romanian fonts.
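
A short Python sketch of the inject step and its effect. Decoding the stored bytes with a code page stands in for "drawing them with a font made for that code page", which is an approximation for illustration and not how the engine works internally.

```python
# UTF-8 bytes of the translated string, as kept in the EGMT text file.
translated_utf8 = bytes([67, 108, 196, 131, 116, 105, 116, 196, 131])

# "to RO" means: store the text as cp1250 in the DAT file.
dat_bytes = translated_utf8.decode("utf-8").encode("cp1250")
print(list(dat_bytes))             # [67, 108, 227, 116, 105, 116, 227]

# Same bytes, two different fonts:
print(dat_bytes.decode("cp1251"))  # Clгtitг  (Russian font still installed)
print(dat_bytes.decode("cp1250"))  # Clătită  (font drawn for cp1250)
```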

**dolorosa** · 27.02.2021 09:22

One thing: Windows-1250 doesn't support ț Ț ș Ș (the letters with comma below), but it does support the ones with cedilla. They look similar, but orthographically speaking the cedilla ones aren't correct. So my question is: will UTF-8 encoded fonts work?
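
The cp1250 limitation can be checked with Python's built-in codec. This is a small sketch; `can_encode` is a helper name made up for this example, and the characters are written as escapes so the cedilla and comma-below variants cannot be confused.

```python
def can_encode(ch, codepage="cp1250"):
    """Return True if the character has a slot in the given code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


cedilla = "\u015e\u015f\u0162\u0163"      # S/s/T/t with cedilla
comma_below = "\u0218\u0219\u021a\u021b"  # S/s/T/t with comma below

for ch in "\u0102\u0103\u00ce\u00ee\u00c2\u00e2" + cedilla + comma_below:
    print(f"U+{ord(ch):04X} {ch} in cp1250: {can_encode(ch)}")
```

The breve and circumflex letters and the cedilla variants report `True`; the four comma-below letters report `False`.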

**TopLayer** · 27.02.2021 12:19

Quote from dolorosa

One thing: Windows-1250 doesn't support ț Ț ș Ș (the letters with comma below), but it does support the ones with cedilla. They look similar, but orthographically speaking the cedilla ones aren't correct. So my question is: will UTF-8 encoded fonts work?

No. But you can repurpose characters that are not actually used otherwise.

The cp1250 encoding has the following symbol:

Code:

172: '¬'

But nobody actually uses it in Gothic texts. So, in Sargon's .csv file you can replace 'ț' with '¬'.
Then you redraw symbol 172 in the font texture so that it looks like 'ț'.
The same trick can be done for the other symbols.
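
A sketch of that substitution in Python. Only the ț → ¬ pair comes from the post above; the other three slots (¦, ¤, ¶) are arbitrary picks for illustration, and you would have to check that the mod's texts really never use them. `SUBSTITUTES` and `to_game_bytes` are hypothetical names.

```python
# Hypothetical substitution table: comma-below letter -> unused cp1250 symbol.
SUBSTITUTES = str.maketrans({
    "\u021b": "\u00ac",  # ț -> ¬ (byte 172, from the post above)
    "\u021a": "\u00a6",  # Ț -> ¦ (byte 166, arbitrary pick)
    "\u0219": "\u00a4",  # ș -> ¤ (byte 164, arbitrary pick)
    "\u0218": "\u00b6",  # Ș -> ¶ (byte 182, arbitrary pick)
})


def to_game_bytes(text):
    """Swap in the placeholder symbols, then encode as cp1250."""
    return text.translate(SUBSTITUTES).encode("cp1250")


print(list(to_game_bytes("\u021bar\u0103")))  # "țară" -> [172, 97, 114, 227]
```

With cell 172 of the font texture redrawn as 'ț', the game would then show the intended letter even though the stored byte officially means '¬'.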

**dolorosa** · 27.02.2021 13:22

Where can I find the whole list of numbers and the characters they correspond to? Do you have a list? One more thing came to my mind. If I do as you say, what am I going to do with the .fnt files? The font is made of two types of files, .FNT and .TEX, and they both go together in the same .vdf file.

**Lehona** · 27.02.2021 13:29

Wikipedia has a list.

**dolorosa** · 27.02.2021 13:43

Quote from Lehona

Wikipedia has a list.

One more thing came to my mind. What am I going to do with the .fnt files? The font is made of two types of files, .FNT and .TEX, and they both go together in the same .vdf file.

**TopLayer** · 27.02.2021 14:56

Quote from dolorosa

Where can I find the whole list of numbers and the characters they correspond to? Do you have a list?

For example: https://ideone.com/JW0KRD
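
Such a list can also be generated locally. The following Python sketch is not the code behind the link above; it simply asks Python's `cp1250` codec which character sits at each byte value, and `codepage_table` is a name chosen for this example.

```python
def codepage_table(codepage, start=128, stop=256):
    """Return (byte value, character) pairs; None marks an undefined slot."""
    rows = []
    for value in range(start, stop):
        try:
            ch = bytes([value]).decode(codepage)
        except UnicodeDecodeError:
            ch = None  # this byte value is not assigned in the code page
        rows.append((value, ch))
    return rows


for value, ch in codepage_table("cp1250"):
    print(f"{value}: {ch!r}")
```

Values below 128 are left out by default because they are plain ASCII in cp1250.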

Quote from dolorosa

The font is made of two types of files, .FNT and .TEX, and they both go together in the same .vdf file.

AFAIK they can be created by the Gothic engine from tga files (on first use).
For example, I created the file \_WORK\Data\Textures\FONTS\NOMIP\Font_10_book_RO.tga.
Then, using the Union_MarvinHelper plugin, I typed this in the Marvin console:

Code:

call printscreen "SomeText" -1 -1 "Font_10_book_RO.tga" 10

After the command was executed, I could find the corresponding .fnt and .tex files in the _WORK\Data\Textures\_Compiled\ folder.
After that I have to rename these files (delete RO from the names) so they can replace the original fonts.
Finally, I need to pack them into a vdf file. If the timestamp of the new vdf file is newer than the timestamps of the other vdf files containing fonts with the same name, the new font is used for text output.

**MadFaTal** · 27.02.2021 15:09

Quote from dolorosa

One more thing came to my mind. What am I going to do with the .fnt files? The font is made of two types of files, .FNT and .TEX, and they both go together in the same .vdf file.

As far as I know, Gothic can handle a maximum of 256 characters per font, and it does not know the code page of your installed operating system.
As far as I know, Gothic can't handle code pages or things like UTF.
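
A small Python sketch of why UTF-8 text does not fit a 256-cell font. It assumes the game draws one cell per byte, as described earlier in this thread, and uses a cp1250-style font purely as an example; `cells_drawn` is a made-up helper.

```python
def cells_drawn(text, font_codepage="cp1250"):
    """Return (byte value, glyph in that font cell) for UTF-8 encoded text."""
    return [
        (b, bytes([b]).decode(font_codepage, errors="replace"))
        for b in text.encode("utf-8")
    ]


print(cells_drawn("\u0103"))  # ă becomes two bytes: 196 and 131
print(cells_drawn("\u021b"))  # ț becomes two bytes: 200 and 155
```

Each Romanian letter turns into two unrelated cells, which would be consistent with letters being replaced or missing in game.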

The .fnt file holds the information about which characters are available and where they are located in the texture (.tex file), as UV coordinates.
The texture file holds the visual information of the characters.

My thread is in German, but maybe you will find some help there: here
