Khan Academy is great, but it requires you to be online and limits you to watching the videos in a browser. KA-lite is software designed to be deployed in unconnected locations with the video files pre-downloaded. This is nice because, apart from the software itself, you end up with all the videos as plain .mp4 files. I prefer playing these .mp4 files directly so I can increase the playback speed. The downside is that, at least in my version, all the videos have names like "_01wqwsb66E.mp4", which makes it annoyingly difficult to find what you are looking for.
This short script takes the path to a section on the local KA-lite server and copies the associated videos, renamed by title, into a directory structure mimicking that of the local KA-lite site. The original files are kept intact so that the local server can still serve them, while the new copies are very easy to play back directly.
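To make the mapping concrete, here is a minimal sketch of how a section path could turn into a destination folder. The helper name `dest_dir` and the example root `/videos/` are made up for illustration; the `sorted/` subfolder and the `NN_name` numbering follow what the script does.

```python
import posixpath

def dest_dir(page, video_root, folder_num):
    """Return the destination folder for one KA-lite section path.

    Hypothetical helper, not part of ka-sorter.py.
    page       -- section path, e.g. "/math/trigonometry/some_section/"
    video_root -- folder holding the original .mp4 files
    folder_num -- position of this section among its siblings (1-based)
    """
    parts = [p for p in page.split("/") if p]  # drop empty pieces around the slashes
    section = parts[-1]                        # last folder in the page path
    parents = parts[:-1]                       # folders above it
    numbered = "{:02d}_{}".format(folder_num, section)
    return posixpath.join(video_root, "sorted", *parents, numbered)

print(dest_dir("/math/trigonometry/polynomial_and_rational/polynomial_tutorial/",
               "/videos/", 1))
```

With the made-up root above, this prints `/videos/sorted/math/trigonometry/polynomial_and_rational/01_polynomial_tutorial`.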


This is pretty raw: you have to change the PAGE variable inside the script before each execution, and the script has to be run on every "lesson" section. The upside of these individual runs is that you can extract videos as desired from the hash-named ones without keeping two complete sets of videos around; simply rip what you need on the fly.
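One way to avoid editing the script for every run would be to pass the section path on the command line. The sketch below is only an assumption about how that could look; `parse_args` and its single `page` argument are hypothetical and not part of the script as written.

```python
import argparse

def parse_args(argv=None):
    """Read the section path from the command line (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Copy and rename KA-lite videos for one section.")
    parser.add_argument(
        "page",
        help="section path on the local KA-lite server, e.g. /math/trigonometry/")
    args = parser.parse_args(argv)
    # Normalize so the path starts and ends with "/", like PAGE in the script
    args.page = "/" + args.page.strip("/") + "/"
    return args

# Example with an explicit argument list instead of sys.argv
print(parse_args(["math/trigonometry"]).page)
```

The result could then stand in for the hard-coded PAGE value, so each lesson section becomes one command-line call.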
ka-sorter.py
#!/usr/bin/env python3
# extract and rename KA videos
# Mark Feineigle 2015/06/28, 2015/12/15, 2015/12/21
#USAGE
# set PAGE
from urllib.request import urlopen
from re import DOTALL, findall
from os import makedirs, walk
from shutil import copy

#SET PAGE to scan
PAGE="/math/trigonometry/polynomial_and_rational/polynomial_tutorial/"
pagefolder = PAGE.split("/")[-2] #last folder in page
pageroot = "/".join(PAGE.split("/")[:-2]) #the path to folder above the page
#Original video path
origVidPath = "/home/egg/archive/video/khan_academy/ka-lite-videos/"

try: #how many folders are in the destination directory (for numbering)
    folderNum = len([i for i in walk(origVidPath+"sorted/"+pageroot)][0][1])+1
except IndexError: #walk() yields nothing if the directory does not exist yet
    folderNum = 1
if len(str(folderNum)) < 2: #add a leading 0 if needed
    folderNum = "0"+str(folderNum)
try: #make the destination directory (full path) for a new category
    makedirs(origVidPath+"sorted/"+pageroot+"/"+str(folderNum)+"_"+pagefolder)
except OSError as e:
    print(e)
#destination path
destPath = origVidPath+"sorted/"+pageroot+"/"+str(folderNum)+"_"+pagefolder+"/"

#Extract video and title
#open starting PAGE (PAGE already starts with "/"), decode bytes to text
html = urlopen("http://127.0.0.1:8008"+PAGE).read().decode("utf-8")
#find video title [list] and
#find local video id [list]
videotitles = findall('video-available">\n(.*?)</a>', html, DOTALL)
videoids = findall('data-video-id="(.*?)"></span>', html)
videotitles = [i.strip() for i in videotitles] #cleanup whitespace

#copy and print results
x = 0 #counter
while x*2 < len(videotitles):
    print(videotitles[x*2]+"\t:\t"+videoids[x])
    #copies, numbers new videos, x+1 b/c x starts at 0
    #x*2 b/c videotitles repeat, process every other one
    if x < 9: # leading 0 in filename, test with 10+ videos
        copy(origVidPath+videoids[x]+".mp4", destPath+"0"+str(x+1)+"_"
             +videotitles[x*2]+".mp4")
    else:
        copy(origVidPath+videoids[x]+".mp4", destPath+str(x+1)+"_"
             +videotitles[x*2]+".mp4")
    x+=1 #increment counter
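The two `copy` branches differ only in the leading zero, so the file name could be built in one place. Below is a minimal sketch under that assumption; `numbered_name` is a hypothetical helper, and replacing "/" in titles is an extra precaution that the script itself does not take.

```python
def numbered_name(index, title, ext=".mp4"):
    """Build a file name such as "01_Some title.mp4" (hypothetical helper).

    index is zero-based, matching the counter x in the script.
    """
    # A "/" in a title would be treated as a directory separator by copy()
    safe_title = title.replace("/", "-")
    # {:02d} pads single-digit numbers with a leading zero
    return "{:02d}_{}{}".format(index + 1, safe_title, ext)

titles = ["Intro", "Adding/subtracting polynomials"]  # made-up titles
for i, t in enumerate(titles):
    print(numbered_name(i, t))
```

Zero-padding keeps the copies in lesson order when a file manager sorts them by name, which is the reason the script special-cases the first nine videos.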