Sometimes your customer only has a PDF, and you have to get your hands dirty, no matter what. Can your translation tool (CAT) handle it?

The first thing to try is opening the file straight in your CAT. I like how Jost Zetzsche described it in his 176th ToolKit newsletter. I guess some other CATs can also open some PDFs.

As far as I know, SDL Studio, Alchemy Publisher, MateCat and Wordfast Pro use a third-party PDF-to-DOC conversion tool that they have integrated as a filter. You have no control over what these filters do, but when they work, you have a no-brainer solution. You do need to check, though, that all text has been found and filtered. If you want to understand how tricky it can be to extract text from a PDF, try it yourself. In many cases you will end up with a plain-text document where lines end with a hard return in the middle of a paragraph, and where words at the end of a line are split with a hard hyphen… Messy is the best word to describe that kind of output. Only if the PDF was generated through XSL-FO (that is, from an XML source) does text extraction go smoothly. In all other cases it depends on the virtual printer that was used to generate the PDF.

Even though I can open most PDF files directly, I always enjoy extracting the text the stupid way: it shows me how many problems there are in the document, and I know what I need to check later on. If your CAT can’t open the PDF, you should convert the PDF to DOCX yourself.

Abusing Google Docs just to convert files also works quite well with PDF. I explained how to do this in a previous post. You could also use CloudConvert, a tool I use for basically all file conversions. Since I have MS Word 2013 (and now 2016), I prefer Word for this. Some customers forbid me to use online services, and MS Word is an offline tool as long as you don’t use the OneDrive cloud.
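If a filter or a plain copy-paste leaves you with that messy plain-text output, the two worst problems (hard returns inside paragraphs and words split with a hard hyphen) can also be repaired with a small script instead of a long series of search & replace actions. The Python sketch below is only an illustration, not a feature of any CAT or of Word. It assumes that a blank line marks a real paragraph break, and that a line ending in a letter plus a hyphen, followed by a line starting with a lowercase letter, is a word split by the layout. The function name `repair_extracted_text` is made up for this example.

```python
import re


def repair_extracted_text(raw: str) -> str:
    """Rejoin paragraphs broken by hard returns and undo end-of-line hyphenation.

    Assumption: blank lines separate real paragraphs; every other line break
    is treated as a wrap introduced by the PDF layout.
    """
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    repaired = []
    for para in paragraphs:
        # Drop empty lines and stray spaces or carriage returns around each line.
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if not text:
                text = line
            elif re.search(r"[^\W\d_]-$", text) and line[:1].islower():
                # Letter + hard hyphen at the end, lowercase start on the next
                # line: treat it as one split word, e.g. "dir-" + "ty" -> "dirty".
                text = text[:-1] + line
            else:
                # Any other hard return inside a paragraph becomes a space.
                text = text + " " + line
        repaired.append(text)
    return "\n\n".join(repaired)


raw = (
    "Sometimes your customer only has a PDF, and you\n"
    "have to get your hands dir-\n"
    "ty, no matter what.\n"
    "\n"
    "Second para-\n"
    "graph here."
)
print(repair_extracted_text(raw))
```

Check the result before you translate: a genuine compound such as “well-known” that happens to break at its own hyphen will be glued into “wellknown”, so a quick proofread of the joined words is still needed.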
If the PDF contains text as images, you have to convert that yourself (see a previous post), but when the PDF contains real text, MS Word does a really good job for many languages: it is capable of fixing line endings, something many other tools cannot do. If there are a lot of hard hyphens, a search & replace all will solve that issue. If Word does not do a good job, ABBYY FineReader and Nuance OmniPage are definitely worth trying as well. They are no longer as expensive as they used to be, and they support many languages. Their language variants/plugins know more fonts and character sets, and they come with a dictionary that helps the OCR process itself.

The risk of converting a PDF yourself is that you’ll waste a lot of time, so make sure you quote for this work and get paid for it. What you do for free is often not appreciated as it should be.

One of the problems of using CATs on OCR’d text and on PDFs converted to Word is the code clutter you may end up with. You definitely need to remove the clutter before you apply your TMs to the job: as long as the clutter is in, you will only see low fuzzy matches. You can use Translator Tools Document Cleaner or CodeZapper, or you can do it manually using this guideline.

When you receive a password-protected file, you need to remove the password first; otherwise none of the tools above will do a good job. There is an extra reason why removing a password comes in handy: sometimes PDFs have been “protected” so that you cannot search for words in them, even when they contain plain text. I’m using VeryPDF PDF Password Remover for this. I like this one because it can remove both user passwords (which encrypt the file and prevent unauthorised opening) and owner passwords (which restrict printing, copying, extracting… even when the document is decrypted).
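Why does code clutter hurt your matches so much? OCR and PDF conversion often chop one sentence into many small formatting runs, and the CAT shows each run boundary as an inline tag. The Python toy model below simply counts the tag markup as extra characters and uses `difflib.SequenceMatcher` as a stand-in for a fuzzy score. Real CATs calculate matches and tag penalties in their own way, and the `<1>Press</1>` notation is a hypothetical placeholder syntax, so this only shows the direction of the effect: the same sentence scores lower with clutter than without it.

```python
import re
from difflib import SequenceMatcher

# Hypothetical inline-tag notation: <1>, </1> and <1/> stand for formatting codes.
TAG = re.compile(r"</?\d+/?>")


def fuzzy_score(a: str, b: str) -> int:
    """Character-based similarity in percent; a crude stand-in for a CAT fuzzy match."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def strip_clutter(segment: str) -> str:
    """Drop the tags and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", TAG.sub("", segment)).strip()


tm_source = "Press the red button to start the machine."
ocr_segment = "<1>Press</1> <2>the red</2> <3>but</3><4>ton</4> to start <5>the</5> machine."

print(fuzzy_score(tm_source, ocr_segment))                 # well below 100
print(fuzzy_score(tm_source, strip_clutter(ocr_segment)))  # 100
```

In this toy example the words themselves were a 100% match all along; only the clutter was in the way, which is why cleaning the file before running your TMs pays off.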
I never use free or online tools for this: if a file is protected with a user password, the owner of that document did not want everybody to have access to it, so sending it to an online service may be bad for your business. When a customer asks me to remove a user password, I always ask for a written and signed instruction to do so, because I find it strange that the customer does not have the password.