Cannot export TM from txt into tmx because of strange characters in TM
Thread poster: ni-cole
ni-cole
ni-cole  Identity Verified
Switzerland
Local time: 16:11
German to French
+ ...
Mar 3, 2012

I am using Windows 7, Office 2010 and Wordfast Classic 6.01 g

I sometimes need to convert a txt-TM into tmx and usually do that with the Special Filters in Wordfast or with Olifant (export).

In the last few days, I have had a lot of problems. In Wordfast, the conversion seemed to work, but when I tried to open the tmx in Olifant, an error message appeared and only 124 of the roughly 1000 segments were in the TM. When I tried to convert in Olifant, it didn't work and the same error message appeared.

Here is the message (Screenshot from Olifant):
http://www.screencast.com/t/FW6gVZos6DTD

After looking at the txt in Olifant and in Notepad, I finally noticed that there were some strange characters in some segments: in the middle of a word, the letters "US" appear on a black background (see image). They are visible only in Notepad, not in Olifant. When I try to copy one in order to search for it with Ctrl + H, it doesn't appear. It also doesn't appear when I copy the whole word (as in the picture).

Here are the strange characters (screenshot from Notepad, trying to copy one into the "Find + Replace" window):
http://www.screencast.com/t/OK1oCCnt

I deleted all these strange characters one by one, and then it was possible to export the TM to tmx with Olifant.

I saw that these characters also appear in other TMs, and I really don't want to delete them all manually as it takes a lot of time. I also don't know where they came from or whether they will reappear later...
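
A "US" label is how some editors display the control character U+001F (Unit Separator), which XML 1.0 does not allow in a document. Assuming that is what the screenshot shows, a short Python sketch like the one below could list which disallowed characters a TM contains before anything is deleted. The function names and the sample string are illustrative only; nothing here is part of Wordfast or Olifant.

```python
from collections import Counter


def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 "Char" production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def count_invalid_chars(text: str) -> Counter:
    """Count every character in text that XML 1.0 does not allow."""
    return Counter(f"U+{ord(ch):04X}" for ch in text if not is_xml10_char(ch))


if __name__ == "__main__":
    # Made-up sample with two unit separators hidden inside words.
    sample = "Wasser\x1fversorgung\tAb\x1fwasser\n"
    for code_point, count in count_invalid_chars(sample).items():
        print(code_point, count)
```

To run it on a real TM, the file content would first have to be read with the encoding the TM was saved in, which has to be checked per file.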

Does anyone know this problem? Does anyone have an idea how to resolve it?

Many thanks in advance!


 
Dominique Pivard
Dominique Pivard  Identity Verified
Local time: 17:11
Finnish to French
Use the 'Check for invalid XML characters' option in Olifant Mar 3, 2012

ni-cole wrote:
I deleted all these strange characters one by one, and then it was possible to export the TM to tmx with Olifant.

When importing a TMX in Olifant, you can use the 'Check for invalid XML characters' option:



It should take care of your "strange characters" without you having to delete them one by one.


 
Christel Zipfel
Christel Zipfel  Identity Verified
Local time: 16:11
Member (2004)
Italian to German
+ ...
I had the same problem some time ago Mar 3, 2012

I noticed there were lots of &'84 and the like (or &' with different numbers) in the TM, and many target TUs were within quotation marks whereas the source text had none.

I spent a lot of time trying to export the TM to tmx and also tried with Olifant. It was a very big TM, so deleting all those characters manually was not even conceivable. Finally it worked with Wf's special filter "Mark suspicious TUs"; I then deleted the marked TUs. I guess I lost quite a few entries this way, but unfortunately I didn't have another choice.

I, too, wonder very much what these characters and the quotation marks mean.


 
ni-cole
ni-cole  Identity Verified
Switzerland
Local time: 16:11
German to French
+ ...
TOPIC STARTER
'Check for invalid XML characters' doesn't help Mar 4, 2012

Thank you for helping me!

I tried the option 'Check for invalid XML characters', but the same error message appeared, and again the new tmx-TM had only 110 segments instead of about 15000.

The log said:

Importing file...
Error: Unerwartetes Token '#'. Erwartet wurde das Token ';'. Zeile 1004, Position 11.
bei System.Xml.XmlTextReaderImpl.Throw(Exception e)
bei System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, Int32& charRefEndPos)
bei System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
bei System.Xml.XmlTextReaderImpl.ParseText()
bei System.Xml.XmlTextReaderImpl.ParseElementContent()
bei System.Xml.XmlReader.WriteNode(XmlTextWriter xtw, Boolean defattr)
bei System.Xml.XmlReader.ReadInnerXml()
bei Olifant.TMXReader.ProcessSeg()
bei Olifant.TMXReader.ReadItem()
Error: The last entry read was the entry number 110.
Number of entries read = 110 (Added = 110)
Duration = 00:00:08.8920156
--- 04-Mrz-2012 16:43:10 ---------- End Task
Error count = 2
=================================== End Process
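
The stack trace fails inside HandleEntityReference, which suggests the parser was reading something that starts with "&" at the reported line and position. A minimal Python sketch for looking at that spot is shown below. It assumes the line and position in the message are 1-based and that lines end with CR, LF or CRLF; the function name and the `width` parameter are made up for illustration.

```python
import re


def show_context(xml_text: str, line: int, position: int, width: int = 30) -> str:
    """Return the text around a 1-based line/position.

    Non-printable characters are shown as <U+XXXX> so they become visible.
    """
    # Split only on CR, LF and CRLF; str.splitlines() would also split on
    # some control characters and shift the line numbers.
    lines = re.split(r"\r\n|\r|\n", xml_text)
    if not 1 <= line <= len(lines):
        raise ValueError(f"line {line} is outside the text (1..{len(lines)})")
    row = lines[line - 1]
    start = max(0, position - 1 - width)
    end = position - 1 + width
    return "".join(
        ch if ch.isprintable() else f"<U+{ord(ch):04X}>" for ch in row[start:end]
    )
```

For the message above, the call would be `show_context(tmx_text, 1004, 11)`, where `tmx_text` is the content of the TMX file read with its actual encoding.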


Actually, I think I have to find out why these strange characters appear in the txt-TM and how to get them out of it. I tried to import the txt-TM into Olifant, but the option doesn't appear in that case, so I couldn't solve it that way; the imported file needs to be a tmx-TM.

I have already cleaned 4 TMs, but there are more with the same problem. Not only do I not have the time to clean everything manually, I am also afraid that the strange characters will come back later and that I will have to do this all the time.
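
Since the Olifant check is only offered for TMX import, one possible workaround is to clean the txt-TM itself before converting it. The Python sketch below removes every character that XML 1.0 does not allow and writes the result to a new file, leaving the original untouched. The function names are hypothetical, and the encoding is an assumption that has to match how each TM was actually saved.

```python
import re

# Everything outside the XML 1.0 "Char" production: most control characters
# below U+0020 (tab, LF and CR are kept), surrogates, U+FFFE and U+FFFF.
INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_text(text):
    """Return (cleaned_text, number_of_removed_characters)."""
    return INVALID_XML_CHARS.subn("", text)


def clean_tm_file(src, dst, encoding):
    """Write a cleaned copy of src to dst and return how many characters were removed."""
    # newline="" keeps the original line endings unchanged.
    with open(src, "r", encoding=encoding, newline="") as f:
        text = f.read()
    cleaned, removed = clean_text(text)
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(cleaned)
    return removed
```

A call could look like `clean_tm_file("my_tm.txt", "my_tm_clean.txt", encoding="utf-16")`, with both file names and the encoding chosen here only as examples. Tabs are kept because the TM fields depend on them.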

If there is no other solution, I may try Christel's: better to lose some entries than 90% of them!

But I am still hoping there is a good solution out there!


 
ni-cole
ni-cole  Identity Verified
Switzerland
Local time: 16:11
German to French
+ ...
TOPIC STARTER
I may have found the origin of one sort of strange character! Mar 5, 2012

I opened the TM I was using today and saw some new strange characters in it. As they were in my current source text, I had a look at it.

It seems to be a kind of hyphen, the one you can insert with Ctrl + hyphen to indicate where to hyphenate the word in case of a line break (I don't know if I explained it correctly).

So maybe my question should be: how can I take them out of my source texts as quickly and easily as possible?
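
If the source text can be handled as plain text, removing these hyphens could be sketched in Python as below. It assumes the optional hyphen reaches the TM either as the control character U+001F (the "US" seen earlier) or as the Unicode soft hyphen U+00AD; both are assumptions about this particular workflow, and the function name is made up.

```python
# Characters treated as optional hyphens (an assumption, see above):
# U+001F as seen in the TM, and U+00AD SOFT HYPHEN.
OPTIONAL_HYPHENS = "\x1f\u00ad"

# Translation table that deletes those characters and nothing else.
_DELETE_TABLE = str.maketrans("", "", OPTIONAL_HYPHENS)


def remove_optional_hyphens(text: str) -> str:
    """Return text without optional hyphens; normal hyphens are kept."""
    return text.translate(_DELETE_TABLE)


if __name__ == "__main__":
    print(remove_optional_hyphens("Wasser\x1fver\u00adsorgung"))
```

Only these two characters are deleted, so a visible hyphen such as the one in "E-Mail" stays in place.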

(I am afraid that this resolves only part of my problem, because these were the strange characters I showed in my first post. I have since discovered some other strange characters, so I am still hoping someone has a solution for all kinds of them!)

Many thanks in advance!


 

