Citizen Science for Citizen Access to Law

Authors

  • Michael Curtotti, Research School of Computer Science, Australian National University
  • Wayne Weibel, Legal Information Institute, Cornell University Law School
  • Eric McCreath, Research School of Computer Science, Australian National University
  • Nicolas Ceynowa, Legal Information Institute, Cornell University Law School
  • Sara Frug, Legal Information Institute, Cornell University Law School
  • Tom R Bruce, Legal Information Institute, Cornell University Law School

Keywords:

readability, legislation, legal informatics, corpus linguistics, machine learning, natural language processing, readability metrics, cloze testing, crowdsourcing, citizen science

Abstract

This paper sits at the intersection of citizen access to law, legal informatics and plain language. It reports the results of a joint project of the Cornell University Legal Information Institute (LII) and the Australian National University, which collected thousands of crowdsourced assessments of the readability of law through the Cornell LII site. The aim of the project is to improve the accuracy with which the readability of legal sentences can be predicted. The study asked readers of legislative pages on the LII site to rate passages from the United States Code, the Code of Federal Regulations and other texts for readability and other characteristics. The research provides insight into who uses legal rules and how they do so. It also enables conclusions to be drawn about the current readability of law and the spread of readability among legal rules. The research is intended to enable the creation of a dataset of legal rules labelled by human judges as to readability. Such a dataset, in combination with machine learning, will assist in identifying factors in legal language which impede readability and access for citizens. As far as we are aware, this research is the largest study to date of the readability and usability of legal language, and the first to apply crowdsourcing to such an investigation. The research is an example of the possibilities for enhancing access to law through the engagement of end users in the online legal publishing environment and through collaboration between legal publishers and researchers.

Author Biographies

  • Michael Curtotti, Research School of Computer Science, Australian National University
    PhD Researcher
  • Wayne Weibel, Legal Information Institute, Cornell University Law School
    Interface Developer
  • Eric McCreath, Research School of Computer Science, Australian National University
    Senior Lecturer
  • Nicolas Ceynowa, Legal Information Institute, Cornell University Law School
    System Administrator
  • Sara Frug, Legal Information Institute, Cornell University Law School
    Associate Director for Technology
  • Tom R Bruce, Legal Information Institute, Cornell University Law School
    Director

Published

2015-03-23

Section

Data organization and legal informatics