Based on this idea, in this paper we propose a cluster based approach and presents two new codes for inverted file index compression: mixed gamma/flat binary code and mixed delta/flat binary code.
An inverted file index on a document collection maps each unique term to an inverted list of all the documents containing the term.
Based on this idea, in this paper we propose a cluster based approach and present two new mixed codes for inverted file index compression: mixed k-base gamma/k-flat binary code and mixed k-base delta/k-flat binary code.
We first built inverted file index using gamma code, and then built new indices using mixed codes based on existing index.
Even in a blocked signature file significantly more decompression time will be required than by the corresponding inverted file index.
In an inverted file index records of widely differing lengths pose no special problems, since long problems merely result in index entries in a relatively large number of the inverted lists.
Thus, for any bitslice that can be selected to AND with the list of candidate records in a signature file index, an inverted list that is at least as sparse can be selected in the corresponding inverted file index (and if the signature file bitslice is for a word that has already been processed in the inverted file case, then no action is required).
Hence, the only situation when the number of disk accesses in the inverted file index can exceed the number of disk accesses with a signature file index is when the inverted file vocabulary cannot be held in memory.
In this latter table the number of matches is shown in square brackets against each entry, and so the number of false matches can be inferred by subtracting the number of (true) matches recorded for the inverted file index.
Specifically, online commercial services and searchable database archives on publisher Web sites should continue to maintain the inverted file index terms and tags that identify material barred from full-text delivery by the Tasini decision.
When users search, they only use the inverted file index until they create a set of search results.
In this proposal, the underlying inverted file index upon which the searching process rests would continue to retain all the index terms generated in the past by processing the full-text articles.