Cannot export TM from txt into tmx because of strange characters in TM Thread poster: ni-cole
| ni-cole Switzerland Local time: 16:11 German to French + ...
I am using Windows 7, Office 2010 and Wordfast Classic 6.01g. I sometimes need to convert a txt TM into TMX and usually do that with the Special Filters in Wordfast or with Olifant (export). In the last few days I have had a lot of problems. In Wordfast the conversion seemed to work, but when I tried to open the TMX in Olifant, an error message appeared and only 124 of the roughly 1000 segments were in the TM. When I tried to convert in Olifant instead, it did not work and the same error message appeared. Here is the message (screenshot from Olifant): http://www.screencast.com/t/FW6gVZos6DTD After looking at the txt file in Olifant and in Notepad, I finally noticed that there were some strange characters in some segments: in the middle of a word, "US" is written on a black background (see image), visible only in Notepad and not in Olifant. When I try to copy one to search for it with Ctrl + H, it does not appear. It also does not appear when I copy the whole word (as in the picture). Here are the strange characters (screenshot from Notepad, trying to copy them into the "Find + Replace" window): http://www.screencast.com/t/OK1oCCnt I deleted all these strange characters one by one, and then it was possible to export the TM to TMX with Olifant. I saw that these characters also appear in other TMs, and I really don't want to delete them all manually, as it takes a lot of time. I also don't know where they came from or whether they will reappear later... Does anyone know this problem? Does anyone have an idea how to resolve it?
Many thanks in advance! | | | Use the 'Check for invalid XML characters' option in Olifant | Mar 3, 2012 |
ni-cole wrote: I deleted all these strange characters one by one, and then it was possible to export the TM to TMX with Olifant. When importing a TMX in Olifant, you can use the 'Check for invalid XML characters' option. It should take care of your "strange characters" without you having to delete them one by one. | | | I had the same problem some time ago | Mar 3, 2012 |
I noticed there were lots of &'84 and the like (or &' with different numbers) in the TM, and many target TUs within quotation marks where the source text had none. I spent a lot of time trying to export the TM to TMX and also tried with Olifant. It was a very big TM, so deleting all those characters manually was not even conceivable. Finally it worked with Wordfast's special filter "Mark suspicious TUs": I deleted the marked TUs, and I guess I lost quite a few entries this way, but unfortunately I had no other choice. I, too, very much wonder what these characters and the quotation marks mean. | | | ni-cole Switzerland Local time: 16:11 German to French + ... THREAD STARTER 'Check for invalid XML characters' doesn't help | Mar 4, 2012 |
Thank you for helping me! I tried the 'Check for invalid XML characters' option, but the same error message appeared and again the new TMX had only 110 segments instead of about 15000. The log said:
Importing file...
Error: '#' is an unexpected token. The expected token is ';'. Line 1004, position 11.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, Int32& charRefEndPos)
at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
at System.Xml.XmlTextReaderImpl.ParseText()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlReader.WriteNode(XmlTextWriter xtw, Boolean defattr)
at System.Xml.XmlReader.ReadInnerXml()
at Olifant.TMXReader.ProcessSeg()
at Olifant.TMXReader.ReadItem()
Error: The last entry read was the entry number 110.
Number of entries read = 110 (Added = 110)
Duration = 00:00:08.8920156
--- 04-Mar-2012 16:43:10 ---------- End Task
Error count = 2
=================================== End Process
Actually, I think I have to find out why these strange characters appear in the txt TM and how to get them out of it. I tried to import the txt TM into Olifant, but then this option does not appear, so I could not solve it that way; the option is only offered when a TMX is imported. I have already cleaned 4 TMs, but there are more with the same problem.
I not only lack the time to clean everything manually, I am also afraid that the strange characters will come back later and that I will have to do this all the time. If there is no other solution, I may try Christel's: better to lose some TMs than 90% of them! But I am still hoping there is a good solution out there! | |
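For anyone facing the same cleanup, here is a minimal Python sketch of one way to strip the characters that XML 1.0 does not allow from a txt TM before converting it. It is not a Wordfast or Olifant feature; the function name `strip_invalid_xml_chars` and the file names and encoding in the usage comment are assumptions. It only removes raw control characters. It does not repair a malformed entity reference like the one reported by `HandleEntityReference` in the log above, nor the `&'84`-style codes mentioned earlier in the thread.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
# The negated character class below matches everything else.
_INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Return (cleaned_text, number_of_characters_removed)."""
    return _INVALID_XML_CHARS.subn("", text)


# Hypothetical usage: work on a copy of the TM, and adjust the file
# names and the encoding (assumed here, a txt TM may also be ANSI):
#
# with open("my_tm.txt", encoding="utf-16") as f:
#     cleaned, removed = strip_invalid_xml_chars(f.read())
# with open("my_tm_clean.txt", "w", encoding="utf-16") as f:
#     f.write(cleaned)
# print(removed, "characters removed")
```

Tabs and line breaks are valid XML characters, so the tab-delimited layout of the txt TM is left untouched; only the characters inside the segments change.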
ni-cole Switzerland Local time: 16:11 German to French + ... THREAD STARTER I may have found the origin of one sort of strange character! | Mar 5, 2012 |
I opened the TM I was using today and saw some new strange characters in it. As they were in my current source text, I had a look at it. It seems to be a kind of hyphen, the one you can insert with Ctrl + hyphen to indicate where to hyphenate a word in case of a line break (I don't know if I have explained it correctly). So maybe my question should be: how can I remove them from my source texts as quickly and easily as possible? (I am afraid this resolves only part of my problem, because these were the strange characters I showed in my first post. I have since discovered several other strange characters, so I am still hoping someone has a solution for all kinds of them!) Many thanks in advance!
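To see which "strange characters" a TM or source text actually contains before removing anything, a small Python sketch can list them by code point. "US" is the usual abbreviation for the control character U+001F (unit separator). That the optional hyphen inserted with Ctrl + hyphen ends up as this character in the exported text is an assumption consistent with the description above, not something confirmed in this thread. Both function names are hypothetical.

```python
import unicodedata
from collections import Counter

# Whitespace controls that a tab-delimited txt TM legitimately needs.
KEEP = {"\t", "\n", "\r"}


def inventory_control_chars(text):
    """Count control (Cc) and format (Cf) characters, keyed by code point."""
    counts = Counter(
        ch for ch in text
        if ch not in KEEP and unicodedata.category(ch) in ("Cc", "Cf")
    )
    return {f"U+{ord(ch):04X}": n for ch, n in counts.items()}


def remove_chars(text, codepoints):
    """Delete every occurrence of the given code points from text."""
    return text.translate({cp: None for cp in codepoints})


# Hypothetical usage on a source text that was already read into `text`:
#
# print(inventory_control_chars(text))
# text = remove_chars(text, [0x1F, 0xAD])  # unit separator, soft hyphen
```

Listing the characters first makes it possible to decide case by case what to delete, instead of dropping whole TUs.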