
Automatic translation of blog posts into many languages with AltaVista’s Babelfish engine

I think everybody will agree that a blog translated into many languages can attract more visitors and traffic. The idea of such translation isn’t new: I’ve seen it somewhere on the Internet (sorry, I don’t remember where), and I’m sure it makes sense. In this post I’d like to share my observations on the popular web page translation engine Babelfish.

Many posts in my blog aren’t translated yet, and translating them manually would take a lot of time (go to Babelfish’s site, copy and paste the post’s contents, choose the proper translation direction, etc.). That’s why I prefer automatic page translation, which runs on its own while I have a cup of coffee, have sex or whatever, instead of sitting in front of the blue screen of the monitor copying and pasting text into Babelfish’s text areas.

All we need is a Linux or Unix distribution (Ubuntu rocks for me) and the ‘wget’ utility, which usually comes with any Linux or Unix distribution. The idea is to download the links to the posts from the blog’s RSS feed. I’ve chosen Google’s Blogger service, so I use the RSS feed that Blogger generates for my blog; I’m sure other blog engines provide feeds too.
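
A minimal sketch of that link-extraction idea on its own, assuming the feed puts each post’s permalink in a `<link>...</link>` element. It uses the same sed/awk approach as the script further down, so GNU sed is assumed (it accepts `\n` in the replacement). The function name `extract_post_links` is hypothetical.

```bash
#!/bin/bash
# extract_post_links FILE [FILTER]
# Hypothetical helper: prints the text of every <link>...</link> element in an
# RSS file, one per line, keeping only lines that contain FILTER (if given).
extract_post_links() {
    local feed="$1" filter="${2:-}"
    # Put adjacent tags on separate lines (GNU sed turns \n into a newline),
    # then cut out the text between <link> and </link>.
    sed 's/></>\n</g' "$feed" \
        | grep "<link>" \
        | awk -F '<link>' '{print $2}' \
        | awk -F '</link>' '{print $1}' \
        | grep -F -- "$filter"   # an empty filter matches every line
}
```

For example, `extract_post_links /tmp/rss.xml 2007` prints the permalinks of the 2007 posts, which is exactly the list the translation loop consumes.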

So, create a shell script somewhere and make it executable with these commands:

```bash
echo '#!/bin/bash' > /tmp/   # single quotes stop interactive bash from expanding "!"
chmod +x /tmp/
```

My script looks as follows:


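```bash
#!/bin/bash
FEED_URL="..."       # set this: URL of your blog's RSS feed
BABELFISH_URL="..."  # set this: Babelfish page-translation URL, up to the target-language code
rm -rf /tmp/tr_output
mkdir -p /tmp/tr_output
wget -O /tmp/rss.xml "$FEED_URL"
# One tag per line, keep the <link> values, translate only posts from 2007
sed 's/></>\n</g' /tmp/rss.xml | grep "<link>" \
    | awk -F '<link>' '{print $2}' | awk -F '</link>' '{print $1}' \
    | grep "2007" | while read -r link; do
    # Post name from the permalink: .../2007/05/post-title.html -> post-title
    save=$(echo "$link" | awk -F '.html' '{print $1}' | awk -F '/' '{print $6}')
    echo "Translating $save..."
    mkdir -p "/tmp/tr_output/$save"
    for lang in nl fr de el it ja ko pt ru es; do
        wget "${BABELFISH_URL}${lang}&url=${link}" -U "Mozilla" --wait=10 \
            -O "/tmp/tr_output/$save/$save.$lang.html"
        sleep 600  # wait 10 minutes so Babelfish doesn't take the script for a bot
    done
done
rm -f /tmp/rss.xml
```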
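
The way the script turns a permalink into a directory and file name can be checked in isolation. A minimal sketch, assuming Blogger-style permalinks of the form `http://host/YYYY/MM/post-title.html`, where the sixth `/`-separated field is the post name; `post_slug` and `output_path` are hypothetical names.

```bash
#!/bin/bash
# post_slug URL -> "post-title" for http://host/YYYY/MM/post-title.html
# Same two awk steps as the script: drop ".html" and everything after it,
# then take field 6 of ("http:", "", host, YYYY, MM, post-title).
post_slug() {
    echo "$1" | awk -F '.html' '{print $1}' | awk -F '/' '{print $6}'
}

# output_path URL LANG -> the file the script writes that translation to
output_path() {
    local slug
    slug=$(post_slug "$1")
    echo "/tmp/tr_output/$slug/$slug.$2.html"
}
```

If your blog engine uses a different permalink layout, the field number 6 is the part to adjust.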

The script downloads the rss.xml file (don’t forget to change the URL to your own RSS feed), parses it, keeps the links of posts from 2007 and sends every post to Babelfish. Each translation is saved under the /tmp/tr_output directory; the script then waits 10 minutes and proceeds with the next language. Every post is translated into 10 languages: Dutch, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian and Spanish. The 10-minute waiting period is needed because Babelfish may take your script for a bot and ban you.
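
The pause dominates the run time: with 10 languages and 600 seconds after each one, a single post takes at least 10 × 600 s = 6000 s, i.e. 100 minutes. A tiny sketch for estimating a whole run; `estimate_minutes` is a hypothetical helper, and it ignores the time wget itself spends downloading.

```bash
#!/bin/bash
# estimate_minutes POSTS [LANGS] [PAUSE_SECONDS]
# Rough lower bound on run time: one pause per (post, language) pair.
# Defaults match the script: 10 languages, 600-second pause.
estimate_minutes() {
    local posts="$1" langs="${2:-10}" pause="${3:-600}"
    echo $(( posts * langs * pause / 60 ))
}

estimate_minutes 1     # prints 100
estimate_minutes 30    # prints 3000 (50 hours)
```

Shortening the pause speeds things up linearly, at the cost of the ban risk described above.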

After some time you’ll find a pretty large amount of data in /tmp/tr_output, which you can easily copy and paste into your blog. I recommend not publishing these posts on the main page and keeping them only for Googlebot 🙂
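
Before copying pages into the blog, it may be worth checking for translations that didn’t come through. A minimal sketch, assuming the /tmp/tr_output layout produced by the script and relying on the fact that `wget -O` generally leaves an empty output file behind when a download fails, so empty `.html` files are treated as failures. `list_failed_translations` is a hypothetical name.

```bash
#!/bin/bash
# list_failed_translations [DIR]
# Prints every empty *.html file under DIR (default /tmp/tr_output);
# with "wget -O", a failed download usually leaves such a 0-byte file.
list_failed_translations() {
    local dir="${1:-/tmp/tr_output}"
    find "$dir" -type f -name '*.html' -empty | sort
}
```

Any file it lists can simply be skipped when copying, or fetched again later by rerunning wget for that one post and language.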

Good luck!

P.S. If anybody knows how to perform automatic post translation with Google Translate instead of Babelfish, I’d really appreciate it if you left a comment about it. Thanks!