May 16, 2012

The Game of Thrones problem

And there I was, excited to watch the new Game of Thrones (GoT) episode, preparing the popcorn and everything else. As my native language is Portuguese, I downloaded a subtitles file.

But the file was wrong: all the subtitles were late! The actor said "You can't avenge him!", and the subtitle only appeared a few seconds later.
I thought: "I could search for another subtitle file on the web, but why bother if I can just solve it with a small program?"

So there was my challenge: Make the subtitles fit, because I really wanted to watch that episode.

As this blog's spirit is to dive into programming adventures, I decided to use a language that I'm not that familiar with: Python.

The first thing I did was understand my problem. I needed to process a .srt file, and then modify it to fit my needs (in other words, make the subtitles a little earlier, or, more easily, N seconds earlier).

The "anatomy" of a .srt file is:

id
HH:MM:ss,mmm --> HH:MM:ss,mmm
Subtitle goes here. In our case, an example could be:
Você nunca se vingará!


where id = Integer value to identify the subtitle; HH = Hours; MM = Minutes; ss = Seconds and mmm = Milliseconds (three digits).

Basically, it shows, for each subtitle, its identification, starting time, ending time and content.
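
To make the format concrete, here is a minimal sketch of how a time line could be recognized and split into its numbers. The names TIME_LINE and parse_time_line are just my own picks for this illustration, and the pattern assumes the usual two-digit fields with three-digit milliseconds:

```python
import re

# Matches "HH:MM:ss,mmm --> HH:MM:ss,mmm".
# TIME_LINE is a hypothetical name chosen for this sketch.
TIME_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def parse_time_line(line):
    """Return (start, end) as (h, m, s, ms) tuples, or None if not a time line."""
    match = TIME_LINE.match(line.strip())
    if match is None:
        return None
    numbers = [int(group) for group in match.groups()]
    return tuple(numbers[:4]), tuple(numbers[4:])
```

Id lines and subtitle text simply give None, so only the time lines would be touched.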

What did I need to do? Create an algorithm that finds all the time lines and modifies them to be N seconds earlier.

There were two things to figure out: how to do I/O operations in Python and how to handle datetime objects. That wasn't so difficult; I found the answers in Python's documentation: http://docs.python.org/tutorial/inputoutput.html.
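
The datetime part can be tried on its own. This is only a small sketch, assuming timestamps in the "HH:MM:ss,mmm" form; the function name earlier is something I made up for the example:

```python
from datetime import datetime, timedelta

def earlier(timestamp, seconds):
    """Move an "HH:MM:ss,mmm" timestamp `seconds` earlier."""
    parsed = datetime.strptime(timestamp, "%H:%M:%S,%f")
    moved = parsed - timedelta(seconds=seconds)
    # strftime's %f yields six digits (microseconds); drop the last three
    # to get back to milliseconds.
    return moved.strftime("%H:%M:%S,%f")[:-3]

print(earlier("00:01:02,500", 2))
```

Subtracting a timedelta from a datetime gives a new datetime, which is all the shifting really needs.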

The pseudo-code:
       
void Shift_Subtitles (file, seconds):
     timedelta shift <-- timedelta(seconds)
     content <-- file.read()
     Foreach line in content:
           If line is timeline:
                 get start_time and end_time
                 transform to datetime object
                 new_start <-- start_time - shift
                 new_end <-- end_time - shift
                 replace line with new_start and new_end


     file.write(content)

And that's it! A solution for the subtitle problem using a simple algorithm. If you want to see the Python implementation, here it is:

 import sys
 from datetime import datetime, timedelta

 TIME_FORMAT = '%H:%M:%S,%f'

 def shiftTime(text, shift):
   # Parse 'HH:MM:ss,mmm', move it earlier by `shift` and format it back.
   moved = datetime.strptime(text.strip(), TIME_FORMAT) - shift
   # %f prints microseconds (6 digits); keep only the 3 millisecond digits.
   return moved.strftime(TIME_FORMAT)[:-3]

 def shiftSubtitles(srt_file, sec):
   shift = timedelta(seconds = float(sec))
   # Read the whole file once, as a list of lines.
   with open(srt_file, 'r') as f:
     lines = f.readlines()
   for i, line in enumerate(lines):
     # Only the time lines contain '-->'.
     if '-->' in line:
       start, end = line.split('-->')
       lines[i] = shiftTime(start, shift) + ' --> ' + shiftTime(end, shift) + '\n'
   # Overwrite the original file with the shifted time lines.
   with open(srt_file, 'w') as f:
     f.writelines(lines)

 def main(args):
   shiftSubtitles(args[0], args[1])

 if __name__ == '__main__':
   main(sys.argv[1:])

The way to use it is very simple too:

$ python shift_sub.py [filename].srt [seconds]
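
One case worth keeping in mind: a subtitle that starts less than N seconds into the video would end up before 00:00:00 after the shift. A possible way around it, as a rough sketch only (shift_clamped is a hypothetical helper, and stopping at zero is my own assumption about the desired behavior), is to do the math with timedelta and never go below zero:

```python
from datetime import timedelta

def shift_clamped(hours, minutes, seconds, millis, shift_seconds):
    """Shift a time earlier, never going below 00:00:00,000."""
    position = timedelta(hours=hours, minutes=minutes,
                         seconds=seconds, milliseconds=millis)
    # Clamp at zero instead of wrapping around to a negative time.
    moved = max(position - timedelta(seconds=shift_seconds), timedelta(0))
    total_ms = moved // timedelta(milliseconds=1)
    h, rest = divmod(total_ms, 3600000)
    m, rest = divmod(rest, 60000)
    s, ms = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)
```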

Well, I'm not a Python programmer, so I don't know if this is the best way to do it. If you know how to make this code better, please leave a comment!

January 22, 2012

Starting

Often you find yourself in a situation where you think "There could be a program to help me with that". So why don't we just try it, in different ways? Develop solutions for our problems in different languages, just to learn, practice and use them!

This blog's goal is to show how the programming world can become an adventure, and a good one!

Subscribe, because every Thursday there'll be a new post :)

Bye!