Citizen Science for Citizen Access to Law


  • Michael Curtotti, Research School of Computer Science, Australian National University
  • Wayne Weibel, Legal Information Institute, Cornell University Law School
  • Eric McCreath, Research School of Computer Science, Australian National University
  • Nicolas Ceynowa, Legal Information Institute, Cornell University Law School
  • Sara Frug, Legal Information Institute, Cornell University Law School
  • Tom R Bruce, Legal Information Institute, Cornell University Law School


readability, legislation, legal informatics, corpus linguistics, machine learning, natural language processing, readability metrics, cloze testing, crowdsourcing, citizen science


This paper sits at the intersection of citizen access to law, legal informatics and plain language. It reports the results of a joint project of the Cornell University Legal Information Institute (LII) and the Australian National University, which collected thousands of crowdsourced assessments of the readability of law through the Cornell LII site. The aim of the project is to improve the accuracy with which the readability of legal sentences can be predicted. The study asked readers of legislative pages on the LII site to rate passages from the United States Code, the Code of Federal Regulations and other texts for readability and other characteristics. The research provides insight into who uses legal rules and how they do so. It also enables conclusions to be drawn about the current readability of law and the spread of readability among legal rules. The research is intended to enable the creation of a dataset of legal rules labelled by human judges as to readability. Such a dataset, in combination with machine learning, will assist in identifying factors in legal language which impede readability and access for citizens. As far as we are aware, this research is the largest study to date of the readability and usability of legal language, and the first to apply crowdsourcing to such an investigation. The research illustrates how access to law can be enhanced by engaging end users in the online legal publishing environment and by collaboration between legal publishers and researchers.

Author Biographies

Michael Curtotti, Research School of Computer Science, Australian National University

PhD Researcher

Wayne Weibel, Legal Information Institute, Cornell University Law School

Interface Developer

Eric McCreath, Research School of Computer Science, Australian National University

Senior Lecturer

Nicolas Ceynowa, Legal Information Institute, Cornell University Law School

System Administrator

Sara Frug, Legal Information Institute, Cornell University Law School

Associate Director for Technology

Tom R Bruce, Legal Information Institute, Cornell University Law School









Data organization and legal informatics