Four CHI ’26 papers I wish I wrote

In the very first paper session at CHI 2026, I found myself thinking, “I wish I wrote this paper!” And then I found myself thinking that a few times more. Since I’m on the hunt for new research questions, it seems worth digging into why I had these reactions. I’ll do so here.

For each paper, I’ll address the following questions:

  • What is the paper about?
  • Why do I wish I wrote it?
  • Could I, in fact, have written it?
  • What next steps does the paper inspire?

Monday, April 13, 11:51 am

Liu, Alicia T. H., Mina Lee, and Xuechunzi Bai. 2026. “Writing with AI Can Reduce Gender Bias in Hiring Evaluations.” Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (New York, NY, USA), CHI ’26, April 13, 1–30. https://doi.org/10.1145/3772318.3791136.

What is the paper about? The authors contribute a large-N, between-subjects experiment in which participants evaluate résumés from “male” and “female” job applicants (“John” and “Jennifer”). Participants compose their evaluations using a writing tool with LLM-based autocomplete suggestions. For “John” these suggestions are neutral with respect to gender stereotypes; for “Jennifer,” these suggestions may be neutral, or they may reinforce or counter gender stereotypes. (As an aside: I question whether these AI suggestions are or can be truly neutral.)

The authors report that counter-stereotypical suggestions increased Jennifer’s perceived competence and likelihood of being selected as a trusted leader. Such suggestions also brought Jennifer’s salary offers up to parity with John’s. However, counter-stereotypical suggestions activated gender backlash, with Jennifer seen as less likable than in the other two conditions. John remained the preferred candidate across all experimental conditions.

Participants were largely unaware of the intent to manipulate gender stereotype activation, with open-ended feedback focused mainly on the utility of the autocompletion-based writing support.

Why do I wish I wrote it? It’s a natural post-LLM follow-up to the currently abandoned project Reading for Gender Bias, to which I contributed in summer 2020. In a nutshell, Reading for Gender Bias provides academic recommendation letter writers with advice for reducing gender bias in their letters. The project predates the current LLM craze, but when I asked for advice in 2022 and 2023 about how to move it forward, everyone recommended integrating LLMs as a source of revisions and advice.

I always had a feeling that those reading recommendation letters as part of making hiring decisions might be a more valuable audience than those who were writing them. I often thought about impact beyond academia – and not only because it’s so hard to obtain a corpus of academic recommendation letters. And even before joining Reading for Gender Bias in 2020 – even before joining Whitman in 2015 – I was curious about the potential for persuasive technology focused on language use to affect people’s attitudes and biases. I appreciate how this paper uses suggestions to integrate debiasing into task performance rather than operating at the reflective or metacognitive level.

I’m further attracted to this study in part because its potential applications are ethically murky – the kind of problem that brought me to persuasive technology in 2006 when I was on the cusp of completing my PhD and embarking on a new research program. I find myself wondering if it would be more ethical and more effective to use tools that obscure candidates’ gender, as in the gender-blind orchestra auditions pioneered in the 1970s and 80s. Hmm, I wonder if that experiment has already been done.

Finally, it received a CHI Best Paper Award. With its social implications, I feel like it deserves some visibility beyond CHI.

Could I, in fact, have written it? Probably not. I’ve hesitated to conduct formal experiments as part of my research. I’ve even tried, without success, to seek collaborators with expertise in experimental methods. At the same time, this kind of experiment seems within my reach – particularly with an experimental collaborator and with student support for software development. It’s probably no accident this paper was written by two psychologists in collaboration with a computer scientist.

I still wish I’d thought of it first.

What next steps does the paper inspire? Regarding the gender-blinded résumé experiment, I’ve thought before about developing “gender-neutralizing” text manipulation tools in the context of Degender the Web and in the context of developing training data for Reading for Gender Bias. I might have a potential collaborator in Cambridge behavioral economist Konstantinos Ioannidis (who is the spouse of a Cybercrime Centre PhD student). But first, I should probably find out if such an experiment has already been done.

Another direction would be to return to debiasing academic recommendation letters, à la Reading for Gender Bias. I don’t think that’s useless – a number of colleagues have told me they use Thomas Forth’s Gender Bias Calculator to get feedback on potential bias in their recommendation letters.

How would I approach that problem differently after reading this paper? First, I’m curious about using LLMs to generate autocomplete suggestions, rather than the hand-coded keyword-search approach taken in that earlier work. Or perhaps suggestions could be integrated with the spellchecker-style feedback approach taken by the current prototype. Second, I think my focus would be on building a system good enough to evaluate experimentally, before building a system good enough to deploy.

How would I approach the recommendation letter problem differently than the candidate evaluation problem? First, I think the tool and study would need to be transparent about the intention to manipulate gender stereotypes. Second, given the goal of the manipulation, it seems like there should be some kind of external assessment of gender bias in the resulting letters. And finally, study participants would need richer scenarios to write from, going beyond the fictional applicant’s résumé.

Tuesday, April 14, 9:36 am

Chanenson, Jake, Tara Matthews, Sunny Consolvo, et al. 2026. “‘It Didn’t Feel Right but I Needed a Job so Desperately’: Understanding People’s Emotions and Help Needs During Scams.” Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (New York, NY, USA), CHI ’26, April 13, 1–22. https://doi.org/10.1145/3772318.3790556.

What is the paper about? The authors examine 405 Reddit posts “seeking help for a range of known and emerging scams.” The authors aim to understand why people engage with scammers, how they feel at different stages of the scam, and what kinds of help they seek. They look for the tactics scammers use to elicit emotional responses along with the contextual factors that increase susceptibility. The end goal is to inform interventions that reduce potential targets’ vulnerability to online financial scams.

Across 12 different scam types (Beals et al., 2015), the authors found five fundamental emotional motivations for target engagement: fear, hope, trust, guilt, and belonging. Across the stages of the User States Framework (Matthews et al., 2025), the authors identified four types of help needs: sensemaking, guidance, emotional support, and action. Factors that elevate risk include financial, legal, and employment precarity, as well as neurodiversity and mental health conditions.

Proposed interventions focus on “just-in-time” messaging (Intille, 2004) rather than the preventative education being explored elsewhere (e.g., Deng et al., 2026, also on my reading list from CHI). This approach is more likely to be effective in high-stakes events, but also more difficult to get right.

Why do I wish I wrote it? As a member of the Cambridge Cybercrime Centre, I felt obliged to come to this session on scams. This paper showed me a potential connection between cybercrime and persuasion, or more specifically, emotional manipulation. A question I have is whether scammers use classic influence strategies à la Robert Cialdini to manipulate their targets.

The work also points the way towards persuasive technologies that might help potential victims steer clear of scams. This is a kind of connection I might have hoped to make myself.

I also found the paper personally relevant. One of the strangest phone calls of my life came on a summer night from a former student who was having a tough time on the job market. As in the title of the paper, they had received a job offer that “didn’t feel right.” They wanted my help figuring out if the job offer was legitimate or a scam – the Diagnostic substate in the expanded User States Framework developed in this paper. Like the scam targets whose help requests are analyzed in this paper, my student was made vulnerable by their hope. They called me seeking help with making sense of the offer and guidance about whether to accept it, as well as emotional support.

Finally, I’ve always admired the work of Sunny Consolvo, who I met at UW in Seattle and had long conversations with at PERSUASIVE 2009 in Claremont, CA (my college stomping grounds). Although she wasn’t the presenter, I see her fingerprints on the paper. I was sitting behind her at the presentation but didn’t get a chance to say hello.

Could I, in fact, have written it? Probably not. First, I don’t have access to a large team of dedicated researchers. Second, my interest in cybercrime is far too new.

On the other hand, the data is publicly available. The analysis methods strike me as similar to those used in the Cybercrime Centre to learn about cybercrime methods and motivations – albeit with a focus on criminals rather than their targets.

What next steps does the paper inspire? I’m not sure. For myself, are there intervention tactics I could contribute to developing and evaluating? For the Cybercrime Centre, is there value in a complementary paper that focuses on conversations amongst scammers rather than victims? In particular, could such research contribute to automatic scam detection?

The last author, Amelia Hassoun, is a Junior Research Fellow at Cambridge. I believe the work was presented by the first author, Jake Chanenson, a PhD student at the University of Chicago. Perhaps there’s a mutually beneficial connection to be made with the Cybercrime Centre.

Certainly, I can propose this paper for the Cybercrime Centre’s weekly reading group. I’m curious what the group will think.

Tuesday, April 14, 11:51 am

Hoefer, Michael J., Raegan Rychecky, and Stephen Voida. 2026. “How Does My Time Use Align With My Values? Personal Informatics for Connecting Abstract Values to Everyday Life.” Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (New York, NY, USA), CHI ’26, April 13, 1–24. https://doi.org/10.1145/3772318.3791113.

What is the paper about? The authors present the design of a time tracking system with support for annotating time entries with connections to personal values, along with a visualization mapping activities to values. The system was deployed in a multi-week in-situ study with 15 participants, who found that the act of annotation led them to reflect on their values and how their daily activities did (or did not) support those values. Some participants made behavioral changes, even though this was not an explicit system goal.

The authors propose a reorientation of (some) behavior change support systems towards supporting life studies, brief but intensive projects of self-tracking and reflection on a periodic or episodic basis, rather than a universal norm of continuous use. I find myself thinking of Marie Kondo’s method of tidying up: it’s a significant project to examine and interrogate everything.

Why do I wish I wrote it? I keep coming back to my roots in Value Sensitive Design. As for many others, the COVID-19 pandemic got me reflecting on my personal values. I started thinking about how technology could support such reflection. This work does that. Although it’s not intended as a behavior change support system (a type of persuasive technology intended to support changing one’s own behavior), the authors nonetheless found that several participants made behavioral changes as a result of their reflections. This connects values to my interest in persuasive technology, and to a uniquely ethical kind of persuasive technology, supporting users in enacting their own authentic values.

Could I, in fact, have written it? If I had kept thinking along these lines, I might have arrived at a similar system and study design, which I probably could have executed with a team of undergraduates. I might not have done as well situating the work in design theory, or developing theory about dimensions of reflection on values and activities.

In 2020 it would not have occurred to me to develop an LLM-based assistant for implementing the Day Reconstruction Method, but I’m not sure how essential that is to the success of the system design.

What next steps does the paper inspire? The authors admit that a limitation of the study is the convenience sampling of participants and the resulting homogeneity. They wonder about studies in which value sets are predefined rather than constructed by the participants themselves, which would enable longitudinal study of change (as they suggest) or cross-cultural comparisons, e.g., based on Schwartz’s Theory of Basic Human Values (one of my thoughts). It’s also hard not to wonder about adapting the study (and perhaps the design) to groups of participants that have some intentional commonality.

Last author Stephen Voida presented the paper on behalf of first author Michael Hoefer. I’ve met Steve before, and I chatted with him at the end of the session. I understand that Michael may have a study underway in his new position at the University of St. Thomas in which seminary students would use the tool to reflect on their enactment of religious values. I find myself thinking, too, about how values reflection might play a part in pre-marital counseling.

In reading about the participants, I noticed only one was a parent. I think there is some potential for new contributions here. For example, having a first child is a major life transition, and often doesn’t go quite as new parents expect. Could a values “life study” be a worthwhile project for parents-to-be to revisit when their first child is a year or so old? Thinking about my own recent move, could a values “life study” be a worthwhile individual or family project before and after a household move? I find myself wondering how (or if) the available value sets or system design would be adapted to a parent/child context. Of course, participant recruiting could also center on other major life events.

The discussion section considers suggestions from participants, such as ranking values or explicitly supporting goal-directed behavior change. I am curious if Michael Hoefer has pursued further design work or if he would be interested in a collaboration. An obvious next step is to reach out to him.

Friday, April 17, 10:00 am

Seki, Kaoru, Manisha Vijay, and Yasmine Kotturi. 2026. “Participatory, Not Punitive: Student-Driven AI Policy Recommendations in a Design Classroom.” Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (New York, NY, USA), CHI ’26, April 13, 1–29. https://doi.org/10.1145/3772318.3790691.

What is the paper about? In a student-led participatory design workshop series, students developed GenAI use policies for a design course they had recently taken together. They published their policy recommendations in a zine which was distributed on campus (and at CHI) as well as online.

Why do I wish I wrote it? I’ve been avoiding GenAI research because I feel so conflicted about it, but I don’t think I’ll be able to avoid it forever. As an educator, I’m still working out my own thinking on the role of GenAI in work by students and faculty. I love that this work is participatory, generative, student-centered, and focused on education. It was supported at UMBC by an internal Pedagogical Innovation Award.

Could I, in fact, have written it? Perhaps if I had taught HCD 101 again. Perhaps if I had engaged more with institutional conversations around GenAI. Perhaps if I had had more time and space to think expansively about my work and teaching. Perhaps if I had stayed in Walla Walla and planned a project like this for my sabbatical (but then I would have gotten scooped!).

What next steps does the paper inspire? After the presentation, I realized I was sitting right behind the presenters. I chatted with them at the end of the session. I asked if they had thought about publishing their work in an education-oriented venue – perhaps SIGCSE or perhaps a more design-oriented venue. FDHE comes to mind, and I just shared that with them.

I also expressed my interest in developing their work as a practical, replicable process for student-informed GenAI policy development – similar to the CS curriculum design workbook project I’ve been engaged with over the last several years – and also an opportunity for cross-institutional or even cross-cultural comparisons. I learned that the first author, Kaoru Seki, is starting a PhD with Katie Seaborn at Cambridge this fall. I just sent Katie an email.

Concluding thoughts

However long this blog post took you to read, it took me even longer to write. I thought about splitting it up into four separate posts. I’m finding myself grateful that there were not five papers I wish I wrote.

I’m not sure if I have any more insight on what direction I want to take my research. These four papers are very different, and would take me in very different research directions. And along with the ideas written here, there are even more totally different directions I could take based on interactions with Cambridge CST colleagues and collaborations with the SIGCSE Liberal Arts Committee.

Where do I go from here? Possibly proceeding with more of the “next steps” discussed above, to put conversations about potential projects into motion. Possibly some slower and more private reading and journaling to gain further insight into what truly excites me.

I’m reminded of my former advisor Tom Anderson’s advice circa 2001: “The best way to start doing research is to start doing research.” That sounds snarky and tautological, but it was memorable. I think his point was that you need momentum: once you start working on something, better ideas will come. But right now it’s Friday afternoon and I’m tired.

Dear reader, I welcome your experiences and advice!

Leave a comment