Feineigle.com - Khan Academy Video Sorter

Published: September 23, 2015
Tags:  Python · Software

Khan Academy is great, but it requires you to be online and limits you to watching the videos in a browser. KA-lite is software designed to be deployed in unconnected locations with the video files pre-downloaded. This is nice because, aside from the software itself, all the videos are in .mp4 format. I prefer playing these mp4 files directly so I can increase the playback speed. The downside is that, at least in my version, all the videos are named like q01wqwsb66E.mp4, which makes it annoyingly difficult to find what you are looking for.

This short script takes a path to a section on the local KA-lite server and copies and renames all the associated videos into a directory structure mimicking that of the local KA-lite site. The original files are kept intact so that the local server can still serve them, but the new copies are very easy to play back directly.
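
As a rough illustration of that mapping, the sketch below derives a destination directory from a section path by splitting on "/", much as the script does. It is written for Python 3 (the script itself is Python 2), and the function name `dest_dir`, the root `/videos/` and the example section path are assumptions made up for illustration, not values taken from KA-lite.

```python
def dest_dir(orig_vid_path, page, folder_num):
    """Build the destination directory for one section page.

    page is assumed to end with a slash, e.g. "math/algebra/intro/".
    """
    parts = page.split("/")
    page_folder = parts[-2]            # last folder in the page path
    page_root = "/".join(parts[:-2])   # everything above that folder
    # zero-pad the folder number to two digits, e.g. 3 -> "03"
    return "{}sorted/{}/{:02d}_{}/".format(
        orig_vid_path, page_root, folder_num, page_folder)


# hypothetical root and section path, for illustration only
print(dest_dir("/videos/", "math/algebra/intro/", 3))
```

The folder number stands in for the position of the section within its parent, so the sorted copies keep the same order as the site.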

This is pretty raw: you have to change the PAGE variable inside the script before each execution, and it has to be run on every “lesson” section. These individual runs let you extract videos as desired from the hash-named ones without having to keep two complete sets of videos around; simply rip what you need on the fly.
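
One possible way to avoid editing the script for every run, which the script does not currently do, would be to read the section path from the command line. The sketch below uses Python 3 and the standard `argparse` module; the helper name `parse_args` and the example path are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Read the section page path from the command line."""
    parser = argparse.ArgumentParser(
        description="Extract and rename KA-lite videos for one section page.")
    parser.add_argument(
        "page",
        help="path of the section page on the local KA-lite server")
    return parser.parse_args(argv)


# hypothetical section path passed explicitly; a real run would call
# parse_args() with no arguments so that sys.argv is used instead
args = parse_args(["math/algebra/intro/"])
PAGE = args.page
print(PAGE)
```

With something like this in place, each lesson section becomes one command invocation rather than one edit to the file.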


#!/usr/bin/env python
# extract and rename KA videos
# Mark Feineigle 2015/06/28, 2015/12/15, 2015/12/21
# set PAGE
from urllib import urlopen
from re import DOTALL, findall
from os import makedirs, walk
from shutil import copy
#SET PAGE to scan: path of the section page, ending in "/"
PAGE = ""
pagefolder = PAGE.split("/")[-2] #last folder in page
pageroot = "/".join(PAGE.split("/")[:-2]) #the path to folder above the page
#Original video path
origVidPath = "/home/egg/archive/video/khan_academy/ka-lite-videos/"
try: #how many folders are in the destination directory (for numbering)
  folderNum = len([i for i in walk(origVidPath+"sorted/"+pageroot)][0][1])+1
except: folderNum = 1
if len(str(folderNum)) < 2: #add a leading 0 if needed
  folderNum = "0"+str(folderNum)
#destination path
destPath = origVidPath+"sorted/"+pageroot+"/"+str(folderNum)+"_"+pagefolder+"/"
try: #make the destination directory (full path) for a new category
  makedirs(destPath)
except Exception, e: print str(e)
#Extract video and title
html = urlopen(""+PAGE).read() #open starting PAGE
#find video title [list]  and  #find local video id [list]
videotitles = findall('video-available">\n(.*?)</a>', html, DOTALL)
videoids = findall('data-video-id="(.*?)"></span>', html)
videotitles = [i.strip() for i in videotitles] #cleanup whitespace
#copy and print results
x = 0 #counter
while x*2 < len(videotitles):
  print videotitles[x*2]+"\t:\t"+videoids[x]
  #copies, numbers new videos, x+1 b/c x starts at 0
  #x*2 b/c videotitles repeat, process every other one
  if x < 9: # leading 0 in filename, test with 10+ videos
    copy(origVidPath+videoids[x]+".mp4", destPath+"0"+str(x+1)+"_"
         +videotitles[x*2]+".mp4")
  else:
    copy(origVidPath+videoids[x]+".mp4", destPath+str(x+1)+"_"
         +videotitles[x*2]+".mp4")
  x+=1 #increment counter
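
The extraction and numbering steps can also be tried on their own, without a server or any video files. The sketch below is Python 3 and reuses the two regular expressions from the script. The HTML fragment is made up to fit those patterns and is not real KA-lite output, the second video id is invented, and naming each copy after its title is an assumption. The helper name `numbered_names` is hypothetical.

```python
import re

TITLE_RE = re.compile(r'video-available">\n(.*?)</a>', re.DOTALL)
ID_RE = re.compile(r'data-video-id="(.*?)"></span>')

# Made-up fragment shaped to match the patterns above; each title
# appears twice, as the script's comments say it does on the page.
SAMPLE_HTML = (
    '<a class="video-available">\n  Intro to algebra\n</a>'
    '<a class="video-available">\n  Intro to algebra\n</a>'
    '<span data-video-id="q01wqwsb66E"></span>'
    '<a class="video-available">\n  Linear equations\n</a>'
    '<a class="video-available">\n  Linear equations\n</a>'
    '<span data-video-id="abc123XYZ_0"></span>'
)


def numbered_names(html):
    """Return (source file, numbered destination file) pairs."""
    titles = [t.strip() for t in TITLE_RE.findall(html)]
    ids = ID_RE.findall(html)
    titles = titles[::2]  # titles repeat, so keep every other one
    return [
        (vid + ".mp4", "{:02d}_{}.mp4".format(n, title))
        for n, (title, vid) in enumerate(zip(titles, ids), start=1)
    ]


for src, dst in numbered_names(SAMPLE_HTML):
    print(src + "\t:\t" + dst)
```

Formatting the counter with `{:02d}` covers both the single-digit and the 10+ case, so the separate leading-zero branch is not needed in this version.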