A Right to Access Implies A Right to Know: An Open Online Platform for Research on the Readability of Law

Michael Curtotti, Eric McCreath


The widespread availability of legal materials online has opened the law to a new and greatly expanded readership. These new readers need the law to be readable when they encounter it. However, the available empirical research supports the conclusion that legislation is difficult to read, if not incomprehensible, for most citizens. We review approaches that have been used to measure the readability of text, including readability metrics, cloze testing and the application of machine learning. We report the creation and testing of an open online platform for readability research. This platform is made available to researchers interested in undertaking research on the readability of legal materials. To demonstrate the capabilities of the platform, we report its initial application to a corpus of legislation. Linguistic characteristics are extracted using the platform and then used as input features for machine learning using the Weka package. Wide divergences are found between sentences in a corpus of legislation and those in a corpus of graded reading material or in the Brown corpus (a balanced corpus of English written genres). Readability metrics are found to be of little value in classifying sentences by grade reading level (noting that such metrics were not designed to be used with isolated sentences).
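As an illustration of the cloze testing mentioned in the abstract, the following is a minimal sketch of one common form of the procedure: every n-th word is deleted and a reader's attempts to restore the deleted words are scored. It is not taken from the platform described here. The function names `make_cloze` and `score_cloze`, the default gap interval of five words, the exact-match scoring rule and the sample sentence are all assumptions made for the example.

```python
import re


def make_cloze(text, n=5, blank="_____"):
    """Blank out every n-th word; return the gapped text and the deleted words.

    Assumption: words are whitespace-separated tokens. A token that does not
    start with a word character (for example "(a)") is left in place and is
    not turned into a blank.
    """
    answers = []
    gapped = []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            # Split the token into the word itself and any trailing punctuation,
            # so that punctuation survives in the gapped text.
            match = re.match(r"^(\w[\w'-]*)(\W*)$", word)
            if match:
                answers.append(match.group(1))
                gapped.append(blank + match.group(2))
                continue
        gapped.append(word)
    return " ".join(gapped), answers


def score_cloze(answers, responses):
    """Fraction of blanks restored with exactly the deleted word (case-insensitive).

    Missing responses count as incorrect because the denominator is the
    number of blanks, not the number of responses.
    """
    if not answers:
        return 0.0
    correct = sum(a.lower() == r.strip().lower() for a, r in zip(answers, responses))
    return correct / len(answers)


if __name__ == "__main__":
    # Hypothetical sentence written in a legislative style, for illustration only.
    sentence = "A person must not drive a vehicle on a road unless the person holds a licence."
    gapped_text, deleted = make_cloze(sentence)
    print(gapped_text)
    print(deleted)
    print(score_cloze(deleted, ["drive", "street", "a"]))
```

Exact-match scoring is the strictest option; a study could instead accept synonyms, which would change the resulting scores.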


readability; legislation; legal informatics; corpus linguistics; machine learning




This work is licensed under a Creative Commons Attribution 3.0 License.