In a 2019 article about learning analytics in The Hechinger Report, Kyle Jones pointed out that universities use Big Data technologies for good reasons: They want to “develop new retention models and figure out where they could potentially improve services to increase retention.” Though the intentions are good, Jones cautions that these efforts come “at the expense of creating a pretty significant surveillance system.”
As a former librarian and now an LIS educator focused on information ethics and policy, Kyle has published widely on the ethical dimensions of using student data in higher education. Currently, he and his research collaborators are conducting a federally funded, three-year research project at eight institutions to explore how learning analytics (LA) initiatives are understood by students and how libraries might develop and use LA in ways that respect student privacy.
We caught up with Kyle in October 2019 to find out more about his work and why he thinks it is important to balance the value of privacy against the benefits of predictive analytics. (Interview posted: October 14, 2019)
PIL: What are learning analytics, and why should we care, especially now?
Kyle: Learning analytics involve the application of data mining and analytic practices in educational contexts to directly and indirectly improve learning outcomes and the educational experience. I focus on higher education in my research, but learning analytics have been and continue to be applied at primary and secondary levels as well. Some people distinguish between learning analytics and other data-informed practices, arguing that not all analytics focus on learning. For instance, developing retention models may not be considered a form of learning analytics. I argue differently. Retention analytics may help institutions develop new policies, inform pedagogical changes, and create social services for students—all of which contribute to a student’s ability to learn and achieve educational goals.
Why we should care about learning analytics is an important question. First, it’s important to note that learning analytics focus on quantifying an experience that is hard to quantify: learning. Despite empirical research showing that learning analytics are only somewhat effective, there is significant political motivation to quantify learning.
That leads us to a second reason to care, which is that learning analytics are an expression of power. Those who advocate for learning analytics have an educational policy agenda in mind. What they choose to quantify and analyze in part signals what is important to them. But what is important or valuable for those who have the power to pursue analytics may not be the same for those who become the subjects and targets of learning analytics.
Finally, we should care because learning analytics can be highly invasive. Where students are concerned, higher education institutions capture granular details about students’ lives, interests, and behaviors because students often live on a college campus for four years or more. An institution’s systems, systems on which students rely for information and services, capture a great deal of revealing data.
PIL: Some argue the whole purpose of learning analytics is to help students succeed by anticipating problems and steering students who may be in trouble toward resources for support. Why does privacy matter for students, and are there ways for libraries and universities to use student data without invading their privacy?
Kyle: There’s nothing inherently wrong in wanting to provide students the right resources at the right time. In fact, given all that students are up against when pursuing their degrees (the financial burdens, the mental stress, the social conflicts, a highly bureaucratic institution), we should applaud learning analytics advocates for their efforts. That said, the problem is that many of these analytic systems use a combination of limited predictive models, potentially biased algorithms, and paternalistic nudging strategies to turn student behaviors toward outcomes the institution believes worthwhile; it doesn’t follow that students share their institution’s views and goals.
Among other things, privacy is an instrumental value, which is to say that it helps us accomplish tasks and goals in our lives. It gives us protection against influence and the ability to pursue intellectual interests unencumbered. But when institutions have access to such fine-grained data and use those data to effectively direct student lives, the practice becomes worthy of critique.
This is especially so because the power dynamic between students and their institution is far from equal. Students are in a disadvantaged social system where they often feel their futures are at the whim of their professors, and only their advisors really know how to navigate Byzantine curricula and resource systems. All of this is exacerbated by the fact that institutions do not practice informed consent when it comes to their data practices. The research literature and the media often highlight how poorly informed consent is done with social media and in apps, but at least it is attempted.
Are there ways for universities and their libraries to ethically access and analyze student data without invading student privacy? Yes, but my colleagues in higher education need to first change their attitude toward student privacy. I hear from students, in conversations with colleagues, and through secondhand accounts the common refrain that “students don’t care about their privacy.” Unsurprisingly, this refrain tends to originate with those who have a stake in the success of learning analytics and for whom student privacy is perceived as a barrier to their work.
This view is wrong. Students do care about their privacy, and my empirical research and the empirical research of several others say as much. Part of the problem with learning analytics is that institutions assume a paternalistic role where they make privacy decisions on behalf of their students. Any ethical learning analytics project must first begin by allowing students to speak for themselves. Additionally, every institution pursuing learning analytics must be transparent about its practices and make a legitimate attempt to educate its students about the aims of the analytic projects, the potential benefits, and the possible short- and long-term harms.
PIL: In phase one of your current project, your team has interviewed over 100 students about their awareness of the ways universities use their data and how they feel about privacy. While you are still analyzing that data, what would you say are some of the most striking findings so far? What are your next steps?
Kyle: Some findings were fairly unsurprising, while others were completely unexpected.
With regard to the former, it was clear that students had a difficult time explaining how the institution accessed and used data and information about them; their views were either inaccurate or rudimentary. However, the interviews proved educational for many students, allowing them to ask privacy-related questions and begin to articulate their perspectives and expectations.
We found that their privacy questions reflected the kinds of advanced concerns about information flows that many policymakers raise. The educational dialogue between students and researchers demonstrated that students are willing to engage in privacy conversations and hold particularized views. Moreover, students are not anti-data or anti-analytics; they see value in using data for particular purposes, but they argue that these things must be done with some privacy protections in place.
The unexpected findings concern students’ privacy bright lines and whom they trust. It was abundantly clear upon analyzing the data that students have no patience with institutions that share data for non-educational purposes. Furthermore, any selling of student data is completely off limits. In part, this was surprising to the research team because there are no widely known instances of direct sales of student data. (That said, there are some lesser-known and murkier instances involving Piazza and Turnitin.)
Finally, there were distinct levels of trust expressed by students. They typically don’t trust social media and eCommerce sites (e.g., Amazon) to use their data in ways that respect their privacy preferences. But when questions of trust turn to their institution, their perspective changes. They see colleges and universities as moral institutions that will use data to serve students’ interests and with care. They expressed nearly absolute trust in their institution’s library, fully expected it to be aggregating and analyzing data (including database searches and materials use data), and had little to no concern about library data practices. Maintaining this trust will be key for libraries that want to pursue learning analytics.
Completing the interviews marks the end of the first of three phases in this project, and our paper for this phase is currently under review at a journal. Next, we’ll conduct a multi-institution survey of undergraduate students. The survey will allow us to analyze demographic data in relation to students’ answers to privacy questions, which we were unable to do with the interviews because we did not collect student demographics.
Additionally, the survey builds on some of the findings from the interviews. In fall 2020, we will start the final phase, which involves scenario-based focus groups. We will make most of our research instruments and related documentation available via a toolkit so others can attempt to replicate or build on our research. I’m grateful to my collaborators for their work on this project, and we are collectively thankful to the Institute of Museum and Library Services (IMLS) for supporting our efforts.
PIL: In previous work, you’ve surveyed research libraries about their LA practices and you’ve explored how academic advisors feel about the use of analytics for advising students. What are some takeaways from your earlier research?
Kyle: For the ARL SPEC Kit, we surveyed the Association of Research Libraries (ARL) membership in 2018 to investigate whether these libraries were participating in learning analytics and to explore related practices, policies, and ethical issues. Fifty-three of the 125 members responded to our call for participation. Of those respondents, 83% indicated they were participating in learning analytics projects; of those who indicated participation, 75% allocated staff to work on learning analytics. We interpreted these numbers as healthy participation in and dedication to learning analytics.
Respondents reported a wide variety of data protection practices, but we noted how few addressed securing data in transit, deleting data, or limiting data retention. Perhaps more concerning was the fact that some respondents qualitatively described practices they believed would sufficiently anonymize data but that, if tested, would prove inadequate.
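To make the anonymization point concrete, here is a minimal sketch of the kind of test that often exposes a “de-identified” release; it is not from the SPEC Kit, and the file and column names are invented for illustration. A release is k-anonymous only if every combination of quasi-identifiers (attributes that are individually innocuous but jointly identifying) appears at least k times.

```python
# Minimal k-anonymity check on a hypothetical "de-identified" release.
# File and column names are invented for illustration.
import pandas as pd

released = pd.read_csv("library_usage_deidentified.csv")  # names already dropped
quasi_identifiers = ["major", "class_year", "zip_code"]

# Count how many rows share each quasi-identifier combination.
group_sizes = released.groupby(quasi_identifiers).size()
k = int(group_sizes.min())
unique_rows = int((group_sizes == 1).sum())

print(f"k-anonymity of this release: k = {k}")
print(f"{unique_rows} quasi-identifier combinations match exactly one student")
```

If k comes out at 1 for even a handful of rows, those students can be re-identified by anyone who knows their major, class year, and ZIP code, which is why simply dropping names rarely survives testing.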
Where privacy practices were concerned, there was very little consistency. Policies were updated infrequently or only as required. Only two ARL members reported having a data management plan for their learning analytics projects. Eighteen libraries informed students of their learning analytics work, but only seven of these libraries allowed for some form of opt-out or non-inclusion in the analytics.
Unsurprisingly, many libraries did not pursue ethical consultation with their institutional review boards (IRBs) because they considered their learning analytics projects to be for internal purposes only (i.e., not to be disseminated in presentations or publications) and, therefore, exempt from review.
More can be said about the findings, but this brief summary reflects, I think, the fact that there are a variety of practices, policies, and ethical interpretations around learning analytics. To me, that signals a lack of awareness regarding the moral sensitivity of data analytic practices among some respondents. If privacy-related practices were protective, if policies were more up to date, and if ethical checks were more strategic and purposeful, my view would change: I would conclude that these libraries were carefully considering the harms their analytic practices may create and pursuing systematic efforts to fully protect their students. Unfortunately, I don’t see that reflected in the data.
My interview-based research with academic advisors was illuminating. All of my participants were forced by their administrators to use a vendor-built predictive advising system. This system combined historical student data from the institution with a student’s profile to predict the courses and programs of study in which the student was likely to find success; it also assigned risk scores indicating whether a student was likely to fail to graduate. Every single participant refused to use the predictive scores for advising, arguing that the technology was burdensome, inefficient, and did not reflect their professional values. When the advisors’ administrator analyzed the usage logs and discovered that his employees weren’t using the system, he scolded them. As a result, the advisors started to log into this costly tool simply to mask their non-use.
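For readers who want a sense of how such a system produces its scores, here is a minimal sketch under stated assumptions: the vendor’s actual model is proprietary, so the file names, feature columns, and choice of logistic regression below are all hypothetical.

```python
# Hypothetical sketch of a predictive advising "risk score"; the vendor
# system described above is proprietary, so everything here is invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Historical records with a binary outcome: did the student graduate?
history = pd.read_csv("historical_students.csv")
features = ["hs_gpa", "first_term_gpa", "credits_completed", "major_changes"]

model = LogisticRegression(max_iter=1000)
model.fit(history[features], history["graduated"])

# Score current students: predicted probability of NOT graduating.
current = pd.read_csv("current_students.csv")
current["risk_score"] = 1 - model.predict_proba(current[features])[:, 1]
```

A model like this can only extrapolate from historical records, which is how the limited predictive models and potentially biased algorithms Kyle describes earlier find their way onto advising screens.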
This project was less focused on the privacy implications of analytics in higher education and more geared toward how analytic goals and technologies shape and direct professional work. It’s important not to lose sight of the fact that analytics can be redirected quickly to focus on and contour the labor and values of faculty, advisors, librarians, and others, much in the same way that they attempt to direct student behaviors.
PIL: Broadening our focus, why do you say we all have “data doubles,” and what does this mean for society more broadly? How should higher education help students understand this issue?
Kyle: The concept of the “data double” was developed by Kevin Haggerty and Richard Ericson (2000) in “The surveillant assemblage.” They rightly theorized very early on that the interconnection of data systems enabled a wide variety of businesses, institutions, and other organizations to aggregate and analyze wide swaths of highly descriptive data about individuals.
Effectively, this combination of data can create individual data-based profiles that reflect our human bodies and lives in a datafied form. We’re past the tipping point of being able to protect against the creation of a data double, and in some ways we benefit from them. But these doubles are consequential for our lives because, with analytics, they can predetermine our opportunities (financial, social, or otherwise) and put barriers in our way. We need to make sure these doubles are accurate and that those who use them for their analytic projects do so in ways that are just and respect our human rights.
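To make the idea concrete in the campus context discussed earlier, here is a minimal illustrative sketch of how separate institutional systems, each fairly benign on its own, can be joined on a shared identifier into a single descriptive profile. All file and column names are hypothetical.

```python
# Illustrative only: assembling a "data double" by linking hypothetical
# campus data sources on a shared student identifier.
import pandas as pd

lms = pd.read_csv("lms_activity.csv")             # logins, page views
swipes = pd.read_csv("card_swipes.csv")           # building access events
checkouts = pd.read_csv("library_checkouts.csv")  # materials use

# No single table is especially revealing, but the join yields a
# fine-grained, datafied portrait of one person's daily life.
data_double = (
    lms.merge(swipes, on="student_id", how="inner")
       .merge(checkouts, on="student_id", how="inner")
)
```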
It’s incumbent on higher education institutions to follow through on their educational responsibilities and prepare students for more than just a job. They have a duty to provide students with experiences and intellectual tools to navigate a highly complex and morally suspect world, and that includes participating in and contributing to a society driven by and reliant on data.
Each institution should be asking whether it has in place a core, required course that focuses on digital citizenship and addresses these large questions. Further, colleges and universities need to investigate the respective curricula of their computer science, data science, statistics, and engineering and technology programs to determine whether they are taking ethics seriously, because these programs often cast ethics aside or treat it as something that only humanists do. If anything, these programs have a higher responsibility to engage ethical questions because the technological products their students create play such an important role in all our lives.
After earning an undergraduate degree in English literature and certification in secondary education from Elmhurst College (Elmhurst, IL) and a master’s degree in Library and Information Science from Dominican University (River Forest, IL), Kyle M.L. Jones worked as an information technology specialist for the A.C. Buehler Library at Elmhurst College. Afterward, he worked as a learning services librarian at the award-winning Darien (CT) Library. He earned his PhD from the iSchool at UW-Madison in 2015.
Currently, Kyle is an assistant professor in the Department of Library and Information Science at the School of Informatics and Computing, Indiana University-Purdue University Indianapolis (IUPUI). He is also a faculty associate of IUPUI’s interdisciplinary Data Lab. Kyle’s research focuses on information policy and ethics issues related to big data practices and technologies in educational contexts. His website is The Corkboard and his Twitter handle is @thecorkboard.
Smart Talks are informal conversations with leading thinkers about new media, information-seeking behavior, and the use of technology for teaching and learning in the digital age. The interviews are an occasional series produced by Project Information Literacy (PIL). This interview with Kyle Jones was made possible with generous support from the Knight Foundation. PIL is an ongoing national research study of how students find, use, and create information for academic courses, for solving information problems in their everyday lives, and as lifelong learners. Smart Talk interviews are open access and licensed by Creative Commons.
Suggested citation format: “Kyle M.L. Jones: The Datafied Student and the Ethics of Learning Analytics” (email interview) by Barbara Fister, Project Information Literacy, Smart Talk Interview, no. 32 (14 October 2019). This interview is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.