Sentiment analysis of report card comments for high school students

mrcopeland · December 6, 2021, 4:14am

I’m a teacher and a big part of my job this year is analyzing the data we collect from our term report cards.

Our teachers have written a 100-word comment for each student, and I want to use sentiment analysis to determine how positive or negative we are being in our feedback. The comments are currently stored in a CSV file, but for obvious reasons I’m unable to share them.
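Since the comments already live in a CSV file, a minimal sketch of loading them with Python's standard `csv` module might look like this. The column names (`student_id`, `department`, `comment`) and the sample rows are made up for illustration; the real export will almost certainly differ.

```python
import csv
import io

def load_comments(file_obj, text_column="comment"):
    """Read report-card rows from a CSV file object.

    The column name "comment" is a hypothetical placeholder; change it to
    match the real export. Rows with an empty comment are skipped.
    """
    rows = []
    for row in csv.DictReader(file_obj):
        text = (row.get(text_column) or "").strip()
        if not text:
            continue  # nothing to analyse for this student
        row[text_column] = text
        row["word_count"] = len(text.split())
        rows.append(row)
    return rows

# Made-up sample data standing in for the real (private) CSV file.
sample = io.StringIO(
    "student_id,department,comment\n"
    "S001,Science,A conscientious and diligent student.\n"
    "S002,PE,\n"
    "S003,English,Needs to complete homework on time.\n"
)
rows = load_comments(sample)
print(len(rows), rows[0]["word_count"])
```

With the real file you would pass something like `open("comments.csv", newline="", encoding="utf-8")` (hypothetical filename) instead of the `StringIO` sample.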

One problem I’ve had so far is that a lot of sentiment analysis tools aren’t equipped to understand ‘teacher speak’: words and phrases like conscientious, hard-working, diligent, etc.
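One common workaround for domain vocabulary is a lexicon-based scorer whose word list you extend yourself. The sketch below is a stdlib-only illustration of that general idea, not any particular library's method; the words, the weights and the two-word negation window are all hypothetical and would need to be agreed on (and checked against real comments) by staff.

```python
import re

# Hypothetical "teacher speak" lexicon: weights from -1 (negative) to +1 (positive).
# These words and weights are illustrative assumptions, not a validated resource.
TEACHER_LEXICON = {
    "conscientious": 0.8,
    "diligent": 0.8,
    "hard-working": 0.7,
    "focused": 0.5,
    "improved": 0.6,
    "distracted": -0.6,
    "disruptive": -0.8,
    "incomplete": -0.5,
    "struggles": -0.5,
}
NEGATORS = {"not", "never", "rarely", "seldom"}

def tokenize(text):
    # Lowercase and keep hyphenated words such as "hard-working" together.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def lexicon_score(text, window=2):
    """Average weight of the lexicon words found in the text (0.0 if none)."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        weight = TEACHER_LEXICON.get(tok)
        if weight is None:
            continue
        # Flip the sign if a negator appears in the preceding `window` tokens.
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            weight = -weight
        hits.append(weight)
    return sum(hits) / len(hits) if hits else 0.0

print(lexicon_score("A conscientious, hard-working student."))
print(lexicon_score("Is not always focused and can be disruptive."))
```

Averaging only over matched words keeps long and short comments comparable, but it also means a comment with a single lexicon hit can swing to an extreme, which is worth keeping in mind when reading the scores.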

Any advice on the best place to start to successfully analyze this type of data would be greatly appreciated.

mtf · December 6, 2021, 4:44am

It seems that, no matter the tool, it will have to be taught, or given sufficient data to learn on its own. Somehow your real data would need to be anonymized and used as training data. There’s no escaping that if you want this technology to learn. Sooner or later you will need to team up with universities and industry specialists to make this happen; only they will have the computing resources to do it in a suitable time frame.
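On the anonymization point, here is a minimal pseudonymization sketch, assuming hypothetical columns `student_id`, `first_name`, `last_name` and `comment`: the ID is replaced with a salted SHA-256 digest, the name columns are dropped, and the student's names are masked inside the comment text. It only catches exact, whole-word name matches; nicknames, siblings' names and other identifying details would still need a manual check, so treat it as a first pass rather than true anonymization.

```python
import hashlib
import re

# Hypothetical column names; adjust to match the real export.
IDENTIFYING_COLUMNS = {"student_id", "first_name", "last_name"}

def pseudonymize(row, salt):
    """Return a copy of a report-card row with direct identifiers removed."""
    # Mask each of the student's names wherever it appears as a whole word.
    comment = row["comment"]
    for name in (row["first_name"], row["last_name"]):
        if name:
            pattern = r"\b" + re.escape(name) + r"\b"
            comment = re.sub(pattern, "[STUDENT]", comment, flags=re.IGNORECASE)
    # Salted hash: the same salt gives the same pseudonym across files.
    # Keep the salt private; short IDs are easy to brute-force without one.
    digest = hashlib.sha256((salt + row["student_id"]).encode("utf-8")).hexdigest()
    safe = {k: v for k, v in row.items() if k not in IDENTIFYING_COLUMNS}
    safe["pid"] = digest[:12]
    safe["comment"] = comment
    return safe

row = {"student_id": "S001", "first_name": "Maya", "last_name": "Chen",
       "department": "Science",
       "comment": "Maya is a diligent student. Well done, Maya!"}
safe_row = pseudonymize(row, salt="keep-this-secret")
print(safe_row)
```

Because the pseudonym is stable for a given salt, the same student can still be linked across terms or departments without their real ID leaving the school.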

Looks like you have some spearheading to do.

mrcopeland · December 6, 2021, 5:01am

My hope was that a library had already been built using this type of data. Almost every commercial example I can find focuses on product reviews/feedback, with the odd academic example centered on tweets or, at best, student feedback on higher ed courses (examples here and here). I can’t be the first educator to want to analyse teacher feedback. I’ll keep looking for a head start before considering what it would mean to train my own tool!

mtf · December 6, 2021, 5:04am

Training your own tool alone would yield very little. This can only be done on a large scale. Start digging up your old peers, advisors and profs, and rattling their cages. If something you need is truly not there, then someone needs to make it happen. Don’t go it alone.

You don’t need to know my story, but at some point I was criticized for writing above a grade-eight level of comprehension. Apparently this is an important marker when writing for public consumption. I was also criticized for being too empathetic, which leaned toward subjectivity, and cited for using too many negative words. So there, you now have three criteria on which to judge the teacher comments: reading level, empathy and negativity. I’m pretty sure we can add some more.
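The reading-level criterion is easy to approximate with the Flesch–Kincaid grade level: 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. The syllable counter below is a rough vowel-group heuristic (an assumption, not a dictionary lookup), so treat the scores as approximate; the grade-8 threshold is the one mentioned above.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, drop a silent final 'e', minimum 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep "-le" endings such as "table" -> 2.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid grade level of a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

simple = "She works hard. She is kind."
complex_text = ("Demonstrates exceptional conscientiousness and consistently "
                "articulates sophisticated interpretations.")
for text in (simple, complex_text):
    grade = flesch_kincaid_grade(text)
    print(round(grade, 1), "above grade 8" if grade > 8 else "ok")
```

Very short texts give odd values (the simple example scores below zero), so the measure is more meaningful over a full 100-word comment than over single sentences.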

Personally, I see a real danger in parents reading a report card that is entirely generated by a computer, teacher comments included. Once we go down that self-learning path into AI, there’s not much left to preserve a teacher’s job. Alan Turing didn’t see that coming.

What would his views on sentiment be? Times change. Computers don’t.

lisalisaj · December 6, 2021, 12:00pm

There are machine learning and NLP sections in the Data Science path on CC. That might be a good place to start if you want to analyze data and build models yourself, especially since you want a model trained to understand “teacher speak”. Yes, there are Python libraries such as NLTK for text preprocessing and scikit-learn for ML and statistical modeling, but you need an understanding of Python before you can build a model (but you already know this).
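To make the “train your own model” route concrete, here is a small pure-Python multinomial Naive Bayes classifier with add-one (Laplace) smoothing, trained on a handful of made-up, hand-labelled comments. In practice scikit-learn's `MultinomialNB` combined with a `CountVectorizer` does the same job; this version avoids dependencies so the mechanics are visible. Four training examples are nowhere near enough for real use.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_posterior(self, text, label):
        # log P(label) + sum of log P(word | label) over the comment's words.
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))

# Tiny made-up training set; a real one would need many hand-labelled comments.
train_texts = [
    "A diligent and conscientious student who works hard",
    "Consistently hard-working and focused in class",
    "Often distracted and homework is frequently incomplete",
    "Needs to be less disruptive and more organised",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("A conscientious and focused student"))
print(model.predict("Homework is often incomplete"))
```

The appeal of this approach for “teacher speak” is that the vocabulary comes from your own labelled comments rather than from product reviews or tweets.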

This is perhaps off topic, but I’m genuinely curious, because my dad was a high school teacher for many years: why would the school or district want to reduce a thoughtfully written review of a student’s status, progress, behavior, etc. (even if it is only 100 words) to a binary variable, positive/negative? There is definitely room for error when building models. I’d also recommend (if you haven’t already) watching “The Social Dilemma”, which is about the perils of AI.

vincentb.e · December 6, 2021, 1:27pm

As @mtf pointed out, it is not simple to build these systems, especially if you want them to work as intended. Machine learning and AI bring a lot of good things to the table, but should be used with caution and implemented only when needed.
Before starting on a project using these technologies, it is worth asking yourself:

What is it you wish to gain from the analysis?
What type of data are you analysing? (Numbers, averages, etc. or actual sentences)
Is the data subjective or objective? (Everything is inherently subjective)
What is the intention of the analysis?

I would encourage you to learn more about machine learning and data analysis, if you haven’t already.
These concepts are incredibly complex, and the ethical questions they raise are even more harrowing.

Hope this helps.

mrcopeland · December 7, 2021, 7:21am

Hey Lisa! (You already helped me a short while ago on the Discord community whilst I was trying to figure out how best to introduce my students to dataframes for a computational science unit I was teaching.) I was hoping there would be a pre-trained tool for generating this kind of data so that I could just focus on the analysis, but if there isn’t, what you’ve described is exactly how I’ll likely move ahead. My understanding of Python and intermediate data science is fine, but I’ve yet to work on an NLP project. I guess this is the perfect chance to start!

I should have been clearer in my first post: this is entirely a personal interest, and entirely academic. This won’t change how we write reports at all. What it might do (and this is what I’m wondering about) is show some correlation between sentiment and progress score / attainment score / department / age / gender / attendance or any other data we collect. It might produce garbage, or it might lead to a human verifying a pattern and noticing that our PE department are indeed overly positive to boys and negative to girls. Or that students make less progress when their reports are overly positive. Or that students with low attendance generally get more neutral comments. Of course, I know that qualitative data is king when we’re trying to understand the complexities of a student, but with an overwhelming amount of qualitative information it can be helpful to first identify the slices worth closer inspection by quantifying it somehow. I mean, that’s exactly what we do with assessments.
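Once each comment has a sentiment score, the comparisons described above come down to group means and correlations. A minimal sketch with made-up records (the field names and values are hypothetical): mean sentiment by department and gender, plus a Pearson correlation between attendance and sentiment.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_by_group(records, keys, value):
    """Average `value` over records sharing the same combination of `keys`."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r[value])
    return {g: sum(v) / len(v) for g, v in groups.items()}

# Made-up records; "sentiment" would come from whichever scorer you settle on.
records = [
    {"department": "PE", "gender": "M", "attendance": 0.95, "sentiment": 0.70},
    {"department": "PE", "gender": "F", "attendance": 0.92, "sentiment": 0.20},
    {"department": "PE", "gender": "M", "attendance": 0.80, "sentiment": 0.50},
    {"department": "PE", "gender": "F", "attendance": 0.85, "sentiment": 0.10},
    {"department": "Science", "gender": "F", "attendance": 0.98, "sentiment": 0.60},
    {"department": "Science", "gender": "M", "attendance": 0.75, "sentiment": 0.30},
]
means = mean_by_group(records, ["department", "gender"], "sentiment")
print(means)
r = pearson([x["attendance"] for x in records], [x["sentiment"] for x in records])
print(round(r, 2))
```

With pandas, `df.groupby(["department", "gender"])["sentiment"].mean()` and `df["attendance"].corr(df["sentiment"])` give the same numbers. Six rows prove nothing, and a correlation on its own would only flag a slice for the kind of human follow-up described above.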

PS: The Social Dilemma is fantastic.

mrcopeland · December 7, 2021, 7:44am

I think the reading level, ‘empathy score’ and ‘negativity score’ would all be useful metrics. Honestly, I’m amazed that it’s a day later and I still haven’t found something to do this for me. I’m approaching the December holidays now, so I’ll spend a little time seeing what I can do.

Just to clarify, though: in no way do I want a machine to write comments. I only want to be able to quantify certain characteristics of the comments so that I can then dig into trends in the way that our teachers write comments.

I really have to address your perceived danger too. A big part of my role is coordinating our tech, which puts me in regular contact with the AIED providers we use or that want us to use them (mostly adaptive learning systems with smart learning analytics dashboards, but sometimes other tools too). I was also part of a foundation-wide analysis of practical applications of AI in Hong Kong, which I spoke about at the GESS conference in Dubai last month. Alongside all of that, I’ve seen the impact of remote teaching and learning throughout the pandemic. Through all of this, not once, and especially not recently, have I felt any concern about teachers losing their jobs because of AI.

mrcopeland · December 7, 2021, 7:45am

Hey Vincent - thank you for your reply. I think I’ve addressed most of what you’ve asked in my reply to Lisa. It sounds like my best next step is to play around with NLP models and see what comes of it. I’ll likely be hitting the forums with some far more specific questions then!

lisalisaj · December 7, 2021, 10:58am

Ah, okay. So you want to investigate potential (inherent) teacher bias when writing assessments. (?)
It’s very interesting! (My background is sociology so both qualitative & quantitative data are of interest to me).
It sounds like a good off-platform project for you! Good luck!

vincentb.e · December 7, 2021, 11:33am

Reading back on my comment, I realise that I may have come off as a bit condescending, which was not my intention.
Your project sounds very interesting and I’m glad to see that you are taking the value of qualitative data seriously.
I will be looking forward to your future posts!
Good luck!