Use python to compute the term-frequency matrix for a set of documents.

timer Asked: Nov 4th, 2015

Question description A term frequency matrix is a table, where rows represent documents and columns represent the terms/words. The value in cell (i,j) is the number of times that word j occurs in document i.

To do this, your python program first needs to go through the files in the input folder, where each file is a separate document (thus, the number of documents in the number of files), and build a set of all unique terms across all the documents.

Let's call this list of terms T, which contains n terms.

Then you'll need to go through each file/document, and compute the number of times that each of the n words occurs in that document.  Doing this, you will produce the term-document matrix.

The program should save this matrix in a file, where each row of the matrix appears on a separate line, and all terms occurrence frequencies are separated by commas.

The folder with the documents, representing movie reviews, is included in the assignment.  

This is NOT a group project.  I will fail automatically any submission which looks like another's.

Tutor Answer

(Top Tutor) Studypool Tutor
School: UT Austin
Studypool has helped 1,244,100 students
flag Report DMCA
Similar Questions
Hot Questions
Related Tags
Study Guides

Brown University

1271 Tutors

California Institute of Technology

2131 Tutors

Carnegie Mellon University

982 Tutors

Columbia University

1256 Tutors

Dartmouth University

2113 Tutors

Emory University

2279 Tutors

Harvard University

599 Tutors

Massachusetts Institute of Technology

2319 Tutors

New York University

1645 Tutors

Notre Dam University

1911 Tutors

Oklahoma University

2122 Tutors

Pennsylvania State University

932 Tutors

Princeton University

1211 Tutors

Stanford University

983 Tutors

University of California

1282 Tutors

Oxford University

123 Tutors

Yale University

2325 Tutors