May 16, 2012

The Game of Thrones problem

And there I was, excited to watch the new Game of Thrones (GoT) episode, preparing the popcorn and all the rest. Since my native language is Portuguese, I had downloaded a Portuguese subtitle file.

But the file was out of sync: all the subtitles were late! An actor would say "You can't revenge him!", and the subtitle only appeared a few seconds later.
I thought: "I could search the web for another subtitle file, but why bother if I can just fix it with a small program?"

So there was my challenge: make the subtitles line up with the dialogue, because I really wanted to watch that episode.

Since the spirit of this blog is to dive into programming adventures, I decided to use a language I'm not very familiar with: Python.

The first thing I did was understand the problem. I needed to read a .srt file and modify it to fit my needs: in other words, make every subtitle appear a little earlier or, more precisely, N seconds earlier.

The "anatomy" of each entry in a .srt file is:

id
HH:MM:SS,mmm --> HH:MM:SS,mmm
Subtitle text goes here

where id is an integer that identifies the subtitle, HH = hours, MM = minutes, SS = seconds and mmm = milliseconds (always three digits). Each entry ends with a blank line. In our case, the subtitle text could be "Você nunca se vingará!" ("You will never get your revenge!").

In short, each entry gives the subtitle's identifier, start time, end time and text.
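To make that structure concrete, here is a minimal sketch that splits a single entry into its four parts. The function name `parse_srt_entry` is hypothetical, it assumes the entry has already been isolated from the rest of the file, and the timestamps in the sample are made up for illustration:

```python
def parse_srt_entry(entry):
    """Split one SRT entry into (id, start, end, text).

    Assumes `entry` holds exactly one entry, without the blank line
    that separates it from the next one.
    """
    lines = entry.strip().splitlines()
    entry_id = int(lines[0])
    start, end = (part.strip() for part in lines[1].split('-->'))
    # Everything after the timeline is subtitle text (it may span several lines)
    text = '\n'.join(lines[2:])
    return entry_id, start, end, text


# Hypothetical entry; the timestamps are made up for illustration
entry = '1\n00:00:05,200 --> 00:00:07,900\nVocê nunca se vingará!\n'
entry_id, start, end, text = parse_srt_entry(entry)
print(entry_id, start, end)   # 1 00:00:05,200 00:00:07,900
```

Only the second line of each entry (the timeline) ever needs to change; the id and the text can be copied as they are.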

What did I need to do? Write an algorithm that finds every timeline and shifts it N seconds earlier.
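A minimal sketch of that step, assuming the `HH:MM:SS,mmm` layout above. It parses each timestamp into a `timedelta` (a swapped-in choice; the implementation later in this post uses `datetime` objects instead), which avoids needing a dummy date and makes it easy to keep the milliseconds and to stop at zero. The names `parse_stamp`, `format_stamp` and `shift_lines` are hypothetical:

```python
from datetime import timedelta


def parse_stamp(stamp):
    """'HH:MM:SS,mmm' -> timedelta (no dummy date needed)."""
    hms, millis = stamp.strip().split(',')
    hours, minutes, seconds = (int(part) for part in hms.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds,
                     milliseconds=int(millis))


def format_stamp(delta):
    """timedelta -> 'HH:MM:SS,mmm'."""
    total_ms = delta // timedelta(milliseconds=1)
    hours, rest = divmod(total_ms, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    seconds, millis = divmod(rest, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, millis)


def shift_lines(lines, seconds):
    """Yield lines, moving every timeline `seconds` earlier (never below zero)."""
    shift = timedelta(seconds=seconds)
    for line in lines:
        if '-->' in line:  # a timeline such as '00:01:02,500 --> 00:01:04,000'
            start, end = (parse_stamp(part) for part in line.split('-->'))
            start = max(start - shift, timedelta(0))
            end = max(end - shift, timedelta(0))
            line = format_stamp(start) + ' --> ' + format_stamp(end) + '\n'
        yield line


entry = ['1\n', '00:01:02,500 --> 00:01:04,000\n', 'Hello\n', '\n']
print(''.join(shift_lines(entry, 3)), end='')
```

With a 3-second shift, the sample timeline becomes `00:00:59,500 --> 00:01:01,000`; the id, text and blank line pass through unchanged.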

There were two things to figure out: how to do file I/O in Python and how to handle datetime objects. That wasn't very difficult; I found the answers in Python's documentation: http://docs.python.org/tutorial/inputoutput.html.
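Here is a small sketch for getting a feel for `datetime` and `timedelta`. The date 1999-01-01 is arbitrary, since `datetime` needs some date even though only the time of day matters here. It shows two behaviors worth knowing before using them for subtitles: `isoformat` prints a fractional part only when the microseconds are non-zero, and subtracting past midnight silently rolls over to the previous day instead of stopping at zero:

```python
from datetime import datetime, timedelta

# datetime needs a date, so pin the subtitle time to an arbitrary dummy day
start = datetime(1999, 1, 1, 0, 1, 2, 500 * 1000)   # 00:01:02,500
shift = timedelta(seconds=3)

shifted = start - shift
print(shifted.time())               # 00:00:59.500000
print(shifted.isoformat(' '))       # 1999-01-01 00:00:59.500000

# isoformat() omits the fractional part entirely when microsecond == 0
print(datetime(1999, 1, 1, 0, 0, 5).isoformat(' '))   # 1999-01-01 00:00:05

# Shifting past midnight wraps around to the previous day instead of stopping at zero
early = datetime(1999, 1, 1, 0, 0, 1) - shift
print(early)                        # 1998-12-31 23:59:58
```

Note also that Python writes the decimal separator as a dot, while .srt files use a comma, so it has to be converted when writing the file back.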

The pseudo-code:
       
void Shift_Subtitles(file, seconds):
     timedelta shift <-- timedelta(seconds)
     content <-- file.read()
     Foreach line in content:
           If line is timeline:
                 get start_time and end_time
                 transform them to datetime objects
                 new_start <-- start_time - shift
                 new_end <-- end_time - shift
                 replace line with new_start and new_end
     file.write(content)

And that's it! A simple algorithm solves the subtitle problem. Here is the Python implementation:

 import sys
 from datetime import datetime, timedelta

 # Arbitrary fixed date: datetime needs one, but only the time of day matters
 BASE = datetime(1999, 1, 1)

 def parse_time(text):
     # "HH:MM:SS,mmm" -> datetime on the BASE date (milliseconds included)
     hours, minutes, rest = text.strip().split(':')
     seconds, millis = rest.split(',')
     return BASE.replace(hour=int(hours), minute=int(minutes),
                         second=int(seconds), microsecond=int(millis) * 1000)

 def format_time(moment):
     # datetime -> "HH:MM:SS,mmm"
     return '%02d:%02d:%02d,%03d' % (moment.hour, moment.minute,
                                     moment.second, moment.microsecond // 1000)

 def shiftSubtitles(str_file, sec):
     shift = timedelta(seconds=float(sec))
     # latin-1 maps every byte to one character, so accented text is written
     # back byte-for-byte whatever the file's real encoding is
     with open(str_file, 'r', encoding='latin-1') as f:
         lines = f.readlines()
     for i, line in enumerate(lines):
         if '-->' in line:
             start, end = line.split('-->')
             # Clamp at BASE so early subtitles don't wrap to the previous day
             new_start = max(parse_time(start) - shift, BASE)
             new_end = max(parse_time(end) - shift, BASE)
             lines[i] = format_time(new_start) + ' --> ' + format_time(new_end) + '\n'
     with open(str_file, 'w', encoding='latin-1') as f:
         f.writelines(lines)

 def main(args):
     shiftSubtitles(args[0], args[1])

 if __name__ == '__main__':
     main(sys.argv[1:])

Using it is just as simple (note that it overwrites the .srt file in place, and a positive number of seconds moves the subtitles earlier):

$ python shift_sub.py [filename].srt [seconds]

Well, I'm not a Python programmer, so I don't know if this is the best way to do it. If you know how to improve this code, please leave a comment!
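One possible variation, sketched under the assumption that every timeline uses the `HH:MM:SS,mmm --> HH:MM:SS,mmm` layout: let a regular expression find the timelines and rewrite them all in a single `re.sub` pass, using integer milliseconds instead of `datetime`. The names `shift_stamp`, `shift_srt_text` and `shift_srt_file` are hypothetical, and unlike the script above this version writes to a separate file so the original stays untouched:

```python
import re

# A full timeline, "HH:MM:SS,mmm --> HH:MM:SS,mmm", at the start of a line
TIMELINE = re.compile(
    r'^(\d{2}):(\d{2}):(\d{2}),(\d{3})[ \t]*-->[ \t]*(\d{2}):(\d{2}):(\d{2}),(\d{3})',
    re.MULTILINE)


def shift_stamp(delta_ms, h, m, s, ms):
    """Move one timestamp delta_ms earlier, clamped at 00:00:00,000."""
    total = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)
    total = max(total - delta_ms, 0)
    hours, rest = divmod(total, 3600 * 1000)
    minutes, rest = divmod(rest, 60 * 1000)
    seconds, millis = divmod(rest, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, millis)


def shift_srt_text(text, seconds):
    """Return text with every timeline moved `seconds` earlier."""
    delta_ms = round(float(seconds) * 1000)

    def replace(match):
        g = match.groups()
        return shift_stamp(delta_ms, *g[:4]) + ' --> ' + shift_stamp(delta_ms, *g[4:])

    return TIMELINE.sub(replace, text)


def shift_srt_file(src, dst, seconds):
    # latin-1 maps each byte to one character and newline='' keeps line
    # endings, so everything except the timelines is copied byte-for-byte
    with open(src, encoding='latin-1', newline='') as f:
        shifted = shift_srt_text(f.read(), seconds)
    with open(dst, 'w', encoding='latin-1', newline='') as f:
        f.write(shifted)
```

For example, `shift_srt_file('episode.srt', 'episode_fixed.srt', 3)` would move every subtitle 3 seconds earlier (the file names are placeholders). Because only lines that start with a full timeline are touched, a `-->` inside the subtitle text is left alone.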