There are several developments that make processing of raw text
and unstructured data in real time more compelling and affordable.
For Sinclair (2004), the unannotated corpus or raw text
is the 'pure' corpus.
Is it possible to learn parsing knowledge from raw text
databases, perhaps supplemented with a bit of annotated data?
Penguin these days has to watch its own back: the ultimate cherry-pickers, cheap-reprint lines such as the new firm Wordsworth, can offer raw texts
of really famous books at a knockdown price (such as a pound), because these books have so long been edited that basic data on them, at least, is widely available.
But you need technology that can analyze raw text
for hidden signals and sentiments, handle enormous amounts of data and perform predictive analyses.
SAS point-and-click interface guides users through developing the initial taxonomy and defining taxonomy rules from raw text
inputs, which streamlines text model building for data analysts.
Then, we index both the raw text
and its tags so users can search for positive comments within the category of "customer support."
Each reading was accompanied by careful note-taking in ATLAS/ti that allowed a tandem viewing of the raw text
alongside the researcher's comments.
Database inversion is a potential limiting factor, but a recently developed algorithm can invert an estimated 5Gb of raw text
in 12 hours with only 40Mb of main memory.
Support for more file formats - PDF, Pages, Raw Text
Processing the raw text
data is necessary for Sybase IQ to carry out its analyses.