Saturday, March 5, 2011

Predicting the fluency of text with shallow structural features


The paper described below is "Predicting the fluency of text with shallow structural features" by Jieun Chae and Ani Nenkova.


Sentence fluency is an important component of overall text readability, but few studies in natural language processing have sought to understand the factors that define it. Numerous natural language applications involve the task of producing fluent text. Considerations of sentence fluency are also key in sentence simplification, sentence compression, text regeneration and headline generation. Despite its importance, much more attention has been devoted to discourse-level constraints on adjacent sentences, which are indicative of coherence and good text flow.

Perceived sentence fluency is influenced by many factors. The way the sentence fits in the context of surrounding sentences is one obvious factor. Another well-known factor is vocabulary use: the presence of uncommon, difficult words is known to pose problems to readers and to render text less readable. But these discourse-level and vocabulary-level features measure properties at granularities different from the sentence level. Hence several surface-level syntactic features were considered.

Charniak's parser was used to parse each sentence, and several surface-level syntactic features were calculated from the parse. They are given below:

1. Sentence length: In general, one would expect shorter sentences to be easier to read and thus to be perceived as more fluent.

2. Parse tree depth: Generally, longer sentences are syntactically more complex, but for sentences of similar length a deeper parse tree can indicate added complexity, which can slow processing and lead to lower perceived fluency of the sentence.

3. Number of fragment tags in the sentence parse, indicating the presence of ungrammaticality in the sentence. Fragments also occur in headlines (e.g. “Cheney willing to hold bilateral talks if Arafat observes U.S. cease-fire arrangement”).

4. Phrase type proportion was computed for prepositional phrases (PP), noun phrases (NP) and verb phrases (VP). The length in number of words of each phrase type was counted, then divided by sentence length.

Example: The longer the noun phrases, the less fluent the sentence is. Long noun phrases take longer to interpret and reduce sentence fluency/readability.

• [The dog] jumped over the fence and fetched the ball.

• [The big dog in the corner] fetched the ball.

Similarly, the length of verb phrases signals potential fluency problems.

- Most of the US allies in Europe publicly [object to invading Iraq]VP .

- But this [is dealing against some recent remarks of Japanese financial minister, Masajuro Shiokawa]VP.

VP distance (the average number of words separating two verb phrases) is also negatively correlated with sentence fluency.

Consider the following two sentences:

• In his state of the Union address, Putin also talked about the national development plan for this fiscal year and the domestic and foreign policies.

• Inside the courtyard of the television station, a reception team of 25 people was formed to attend to those who came to make donations in person.

5. Average phrase length is the number of words comprising a given type of phrase, divided by the number of phrases of this type.

6. Phrase type rate was also computed for PPs, VPs and NPs and is equal to the number of phrases of the given type that appeared in the sentence, divided by the sentence length.

7. Phrase length is the number of words in a PP, NP or VP, without any normalization; it is computed only for the largest phrases.

8. Length of NPs/PPs contained in a VP is the average number of words that constitute an NP or PP within a verb phrase, divided by the length of the verb phrase.
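As a rough illustration of how some of these features can be read off a parse, here is a minimal Python sketch. It assumes the parser output is a Penn Treebank-style bracketed string and that fragments are tagged `FRAG`. The helper names (`parse_tree`, `fluency_features`) and the feature keys are made up for this example and do not come from the paper. It also assumes that embedded phrases are counted along with the phrases that contain them, which may not match the original setup exactly.

```python
def parse_tree(s):
    """Parse a bracketed string into (label, children) tuples; leaves are word strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def words(node):
    """All words (leaves) under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def depth(node):
    """Number of labelled nodes on the longest path from this node to a word."""
    if isinstance(node, str):
        return 0
    return 1 + max((depth(c) for c in node[1]), default=0)


def phrases(node, label):
    """All subtrees carrying the given label, embedded ones included (an assumption)."""
    if isinstance(node, str):
        return []
    found = [node] if node[0] == label else []
    for child in node[1]:
        found.extend(phrases(child, label))
    return found


def fluency_features(tree):
    """Hypothetical feature dictionary for one non-empty parsed sentence."""
    n = len(words(tree))
    feats = {
        "length": n,
        "depth": depth(tree),
        "fragments": len(phrases(tree, "FRAG")),
    }
    for label in ("NP", "VP", "PP"):
        spans = [len(words(p)) for p in phrases(tree, label)]
        feats[label + "_proportion"] = sum(spans) / n   # words in phrases / sentence length
        feats[label + "_rate"] = len(spans) / n         # number of phrases / sentence length
        feats[label + "_avg_len"] = sum(spans) / len(spans) if spans else 0.0
    return feats


tree = parse_tree(
    "(S1 (S (NP (DT The) (JJ big) (NN dog)) (VP (VBD fetched) (NP (DT the) (NN ball)))))"
)
for name, value in sorted(fluency_features(tree).items()):
    print(name, value)
```

Punctuation tokens, if present in the parse, would be counted as words here. Whether to strip them first is a design choice left open in this sketch.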


For all experiments, they used four of the classifiers in Weka: decision tree (J48), logistic regression, support vector machines (SMO), and multilayer perceptron. The classifiers were evaluated quantitatively, by measuring accuracy.

Overall, the best classifier was the multilayer perceptron. Using all available data of machine and human translations, it distinguished machine from human translations with a classification accuracy of 86.99%. Hence the surface structural statistics can distinguish very well between fluent and non-fluent sentences when the examples come from human and machine-produced text respectively.

In pairwise comparison of sentences with different fluency, the accuracy of predicting which of the two is better is 90% for the multilayer perceptron classifier.
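One common way to set up such a pairwise task is to represent each pair of sentences by the difference of their feature values and to label the pair by which sentence was judged more fluent. The post does not spell out the setup, so the Python sketch below is an assumption, and the function name `make_pairs`, the feature keys and the fluency scores are all invented for illustration.

```python
def make_pairs(scored):
    """Build pairwise training instances.

    scored: list of (feature_dict, fluency_score) tuples.
    Returns a list of (difference_vector, label), where label 1 means the
    first sentence of the pair has the higher fluency score. Pairs with
    equal scores are skipped.
    """
    # Fix one feature order so every vector has the same layout.
    keys = sorted({k for feats, _ in scored for k in feats})
    data = []
    for feats_a, score_a in scored:
        for feats_b, score_b in scored:
            if score_a == score_b:
                continue
            vec = [feats_a.get(k, 0.0) - feats_b.get(k, 0.0) for k in keys]
            data.append((vec, 1 if score_a > score_b else 0))
    return data


# Hypothetical sentences: (features, human fluency score)
scored = [
    ({"length": 10, "depth": 5}, 4.0),
    ({"length": 25, "depth": 9}, 2.0),
    ({"length": 12, "depth": 6}, 4.0),
]
for vec, label in make_pairs(scored):
    print(vec, label)
```

The resulting vectors and labels could then be handed to any binary classifier, such as the ones listed above.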

But the features correlated with fluency levels in machine-produced text (for example, when comparing the worst and best machine translations) are not the same as those that distinguish between human and machine translations. Such results call for caution when using assessments of machine-produced text to build a general model of fluency.

The discourse aspects (inferences, references, recall of prior knowledge) and language model features (vocabulary) proved to be much more important than text fluency in predicting the overall text quality.

For future research it will be beneficial to build a dedicated corpus in which human-produced sentences are assessed for fluency.
