Extract the following information from IMDb web pages with Python

Asked: Dec 7th, 2015

Question description

I was wondering if you could help me with this Python problem:

Write a Python program to extract the following information from the IMDb web pages of several movies:

1. Director

2. Producer

3. Genres (comma-separated)

4. Main 5 actors (comma-separated). They are listed in order of importance (billing) on the web page, so just take the first 5.

5. Plot keywords (comma-separated)

All of these fields should go on one line per movie, separated by a special character, e.g. '|'.
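A minimal sketch of building that output line, assuming each movie's data has already been extracted into a dict with the hypothetical keys `director`, `producer`, `genres`, `actors` and `keywords` (each a list of strings). List values are joined with ", ", the five fields are joined with "|", and only the first 5 actors are kept because they appear in billing order. The sample data is made up for illustration.

```python
from typing import Dict, List

# Hypothetical field names, in the order the assignment lists them
FIELDS = ["director", "producer", "genres", "actors", "keywords"]


def format_movie_line(movie: Dict[str, List[str]], sep: str = "|") -> str:
    """Turn one movie's extracted data into a single sep-delimited line."""
    parts = []
    for field in FIELDS:
        values = movie.get(field, [])  # a missing field becomes an empty column
        if field == "actors":
            # Actors are listed in billing order, so the first 5 are the main ones
            values = values[:5]
        parts.append(", ".join(values))
    return sep.join(parts)


# Usage with made-up sample data (not scraped from IMDb)
movie = {
    "director": ["Jane Doe"],
    "producer": ["John Roe"],
    "genres": ["Drama", "Thriller"],
    "actors": ["A", "B", "C", "D", "E", "F"],
    "keywords": ["heist", "betrayal"],
}
print(format_movie_line(movie))
# -> Jane Doe|John Roe|Drama, Thriller|A, B, C, D, E|heist, betrayal
```

For several movies, call `format_movie_line` once per movie and print or write each result on its own line. If a value could itself contain '|', the standard `csv` module with `delimiter='|'` is a safer choice, since it quotes such fields automatically.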

I only have the code below and I don't understand how it works. Could you give me solution code that does this for one movie? Thanks a lot.

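import re
import urllib.request  # Python 3 name for the old urllib2 module

# Download the movie page and decode the raw bytes into a text string
response = urllib.request.urlopen('http://www.imdb.com/title/tt1951266/?ref_=nv_sr_1')
htmlsource = response.read().decode('utf-8')

# Step 1: cut out the HTML from the director marker up to the next </div>;
# re.DOTALL lets '.' match newlines, replacing the slow (.|\n) idiom
p = re.compile(r'itemprop="director".*?</div>', re.DOTALL)
match = p.search(htmlsource)
dir_part = match.group(0) if match else ''  # avoid a crash if the marker is missing

# Step 2: capture every name written as itemprop="name">Name< inside that chunk
p1 = re.compile(r'itemprop="name">([\w\-. \']+?)<')
directors = p1.findall(dir_part)
print(directors)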
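The snippet above works in two steps: first it cuts out the block of HTML that follows the `itemprop="director"` marker, then it pulls every `itemprop="name">...<` value out of that block. The same two-step idea can be wrapped in a reusable function, sketched below. It assumes the page marks people up the way the original regex expects (an `itemprop="..."` attribute followed, before the next `</div>`, by `itemprop="name">Name<`). The `extract_names` helper, the `actors` marker and the sample HTML are illustrative assumptions, not IMDb's confirmed markup, which may differ or change over time, so check the page source first. For anything beyond a quick exercise, an HTML parser is generally more robust than regular expressions.

```python
import re
from typing import List

# Same name pattern as the original snippet; the capture group is the name text
NAME_RE = re.compile(r'itemprop="name">([\w\-. \']+?)<')


def extract_names(html: str, prop: str) -> List[str]:
    """Return the names listed in the block that starts at itemprop="<prop>".

    Step 1: cut from the marker up to the first closing </div> (non-greedy).
    Step 2: find every itemprop="name">...< value inside that block.
    """
    block_re = re.compile(r'itemprop="%s".*?</div>' % re.escape(prop), re.DOTALL)
    match = block_re.search(html)
    if match is None:
        # Marker not found (e.g. the page layout changed): return nothing
        return []
    return NAME_RE.findall(match.group(0))


# Made-up HTML in the shape the regexes expect (not real IMDb markup)
sample = """
<div itemprop="director">
  <a href="/name/nm1"><span itemprop="name">Jane Doe</span></a>
</div>
<div itemprop="actors">
  <span itemprop="name">Ann Lee</span>, <span itemprop="name">Bo O'Neil</span>
</div>
"""

print(extract_names(sample, "director"))    # ['Jane Doe']
print(extract_names(sample, "actors")[:5])  # ['Ann Lee', "Bo O'Neil"]
```

Slicing with `[:5]` keeps only the first five names, which matches "grab the first 5" when the names appear in billing order.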
