| Members | Description |
|---|---|
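| numberDocuments | Number of documents in the space. |
| numberTerms | Number of terms in the space. |
| numberFactors | Number of factors (dimensions) in each vector. |
| matrixFile | The matrix file (matrix.txt). |
| termsFile | The terms file (terms.dat). |
| matrixReader | Reader for matrixFile. |
| termsReader | Reader for termsFile. |
| indexedTermsList | List of IndexedTerms built by LoadTerms. |
| indexedVectorsList | List of IndexedVectors built from the term region of matrix.txt. |
| singularValues | The singular values read from matrix.txt. |
| termHash | Hash mapping each word to its {vector, scale}. |
| RewriteMatrix | |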
| Populate | Loads the terms into one list and the term vectors into another, cross-indexes the two lists, and creates a term hash mapping each word to {vector, scale}. |
| LoadTerms | Reads terms.dat. Each line is a tab-separated triple `word\tindex\tweight`; each triple is stored as an IndexedTerm in a list. |
| LoadMatrix | Reads matrix.txt, which contains three regions: term vectors, document vectors, and singular values. Document vectors are not needed. Term vectors are stored as IndexedVectors in a list. |
| Skip | A silly method that skips past the document vectors we don't care about. |
| ProcessTerms | Turns the vectors in the term region of the file into IndexedVectors and stores them in a list. |
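| GetNextVector | Reads the next vector from the file. A line holds at most six floats, sometimes fewer depending on the number of dimensions, so a single vector may span several lines and its last line may be short. We're therefore cagey about reading lines. |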
| CreateTermHash | Cross-indexes the lists of terms and vectors and combines these two redundant data structures into a single, non-redundant term hash. |
| CreateLSASpace | Populates termHash with terms and vectors, then builds the space. The space also tracks its author, creation date, and the documents used to build it. |
| SerializeLSASpace | Serializes the space to a file whose name is generated automatically from the name and date of the space. |
| GetSourceFiles | Given a directory name, returns a single string listing every file in that directory, including file extensions, with file names separated by spaces. Convenient when calling CreateLSASpace with many source files. |
| ImplementCustomILSACalculator | Unrolls loops and builds a custom LSACalculator class specialized for this particular space, which should be faster than the generic loops. Be careful when renaming namespaces or classes in this assembly: the code generated here may depend on those names and break. |
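
The terms.dat format described for LoadTerms can be parsed roughly as follows. This is a minimal Python sketch, not the original implementation: `IndexedTerm` is modeled as a dataclass with hypothetical fields `word`, `index`, and `weight`, and blank lines are assumed to be safe to ignore.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IndexedTerm:
    # Hypothetical field names for the word\tindex\tweight triple.
    word: str
    index: int
    weight: float


def load_terms(lines: Iterable[str]) -> List[IndexedTerm]:
    """Parse tab-separated word/index/weight lines into IndexedTerm records."""
    terms: List[IndexedTerm] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines carry no term
        word, index, weight = line.split("\t")
        terms.append(IndexedTerm(word, int(index), float(weight)))
    return terms
```

Typical use would be `with open("terms.dat", encoding="utf-8") as f: terms = load_terms(f)`; the encoding is an assumption.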
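
The six-floats-per-line rule in GetNextVector comes down to collecting floats until one full vector is assembled. Below is a Python sketch that assumes each vector starts on a fresh line, so reading more than `dims` values signals a malformed file or a wrong dimension count. The name `get_next_vector` and the exceptions it raises are illustrative choices, not the original API.

```python
from typing import Iterator, List

MAX_FLOATS_PER_LINE = 6  # from the description: at most six floats per line


def get_next_vector(lines: Iterator[str], dims: int) -> List[float]:
    """Collect exactly `dims` floats, reading as many lines as needed."""
    values: List[float] = []
    while len(values) < dims:
        try:
            line = next(lines)
        except StopIteration:
            raise EOFError(f"file ended after {len(values)} of {dims} floats") from None
        tokens = line.split()
        if len(tokens) > MAX_FLOATS_PER_LINE:
            raise ValueError(f"line has {len(tokens)} floats, expected at most {MAX_FLOATS_PER_LINE}")
        values.extend(float(tok) for tok in tokens)
    if len(values) != dims:
        # Assumption: vectors never share a line, so overshooting is an error.
        raise ValueError(f"expected {dims} floats, got {len(values)}")
    return values
```

With eight dimensions, a vector occupies one full line of six floats followed by a short line of two.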
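
LoadMatrix, Skip, and ProcessTerms together walk the three regions of matrix.txt. The Python sketch below rests on hypothetical layout assumptions: the file has no header, the regions appear in the order listed (term vectors, document vectors, singular values), every vector has numberFactors values that may wrap across lines, and there is one singular value per factor. An IndexedVector is modeled here as a `(term index, vector)` tuple.

```python
from typing import Iterable, List, Tuple

IndexedVector = Tuple[int, List[float]]  # hypothetical: (term index, vector)


def load_matrix(lines: Iterable[str], n_terms: int, n_docs: int,
                n_factors: int) -> Tuple[List[IndexedVector], List[float]]:
    it = iter(lines)

    def read_floats(count: int) -> List[float]:
        # Gather `count` floats, which may wrap across several lines.
        values: List[float] = []
        while len(values) < count:
            line = next(it, None)
            if line is None:
                raise EOFError("matrix file ended early")
            values.extend(float(tok) for tok in line.split())
        return values

    # Region 1 (ProcessTerms): keep one vector per term, indexed by position.
    term_vectors = [(i, read_floats(n_factors)) for i in range(n_terms)]
    # Region 2 (Skip): document vectors are read and discarded.
    for _ in range(n_docs):
        read_floats(n_factors)
    # Region 3: the singular values, one per factor.
    singular_values = read_floats(n_factors)
    return term_vectors, singular_values
```

Reading and discarding the document region, rather than seeking past it, is needed because the wrapped lines have no fixed byte length.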
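
CreateTermHash joins the two indexed lists on the term index and keeps one entry per word. The Python sketch below assumes that the `scale` in {vector, scale} is the weight read from terms.dat. The source does not say so, and this is a labeled assumption, as are the `IndexedTerm` and `TermEntry` shapes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IndexedTerm:
    word: str
    index: int
    weight: float


@dataclass
class TermEntry:
    vector: List[float]
    scale: float


def create_term_hash(terms: List[IndexedTerm],
                     indexed_vectors: List[Tuple[int, List[float]]]) -> Dict[str, TermEntry]:
    # Index vectors by term index so each term's lookup is O(1).
    vectors_by_index = {index: vector for index, vector in indexed_vectors}
    term_hash: Dict[str, TermEntry] = {}
    for term in terms:
        vector = vectors_by_index.get(term.index)
        if vector is None:
            raise KeyError(f"no vector for term {term.word!r} (index {term.index})")
        # Assumption: scale is the terms.dat weight.
        term_hash[term.word] = TermEntry(vector, term.weight)
    return term_hash
```

Once the hash is built, the two lists are redundant and can be dropped. That matches the "non-redundant" goal in the description.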
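
The behavior described for GetSourceFiles can be approximated in Python as below. The sketch returns bare file names rather than full paths, skips subdirectories, and sorts the names for a stable order. All three are assumptions, because the description does not specify them.

```python
import os


def get_source_files(directory: str) -> str:
    """Return the names (with extensions) of all files in `directory`, space-separated."""
    with os.scandir(directory) as entries:
        # Keep regular files only; subdirectories are skipped.
        names = sorted(entry.name for entry in entries if entry.is_file())
    return " ".join(names)
```

A space-separated list is ambiguous when a file name itself contains a space, so this format only works for directories whose file names have none.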
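
ImplementCustomILSACalculator generates code specialized to one space. The same loop-unrolling idea is shown below in Python for a dot product, the core of vector comparison in LSA. Because the dimension count is fixed per space, the loop can be replaced by one multiply-add term per dimension. The sketch illustrates the technique only; it is not the generated LSACalculator class, and `make_unrolled_dot` is a hypothetical name.

```python
from typing import Callable, Dict, Sequence


def make_unrolled_dot(dims: int) -> Callable[[Sequence[float], Sequence[float]], float]:
    """Generate a dot-product function with its loop over `dims` fully unrolled."""
    if dims < 1:
        raise ValueError("dims must be at least 1")
    # One multiply-add term per dimension instead of a loop.
    body = " + ".join(f"a[{i}] * b[{i}]" for i in range(dims))
    source = f"def dot(a, b):\n    return {body}\n"
    namespace: Dict[str, object] = {}
    # Compile the generated source; only exec code you generated yourself.
    exec(compile(source, "<unrolled_dot>", "exec"), namespace)
    return namespace["dot"]  # type: ignore[return-value]
```

For `dims=3` the generated body is `return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]`. Any speedup depends on the runtime and should be measured. As the description warns, generated code that refers to names elsewhere breaks silently when those names change.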