Human-AI Collaboration Assessment in Theoretical Mechanics: From Competitions to Final Examinations

程彬; 李俊峰; 邱信明; 李润道; 宋家隆; 周懿

doi:10.6052/1000-0879-26-170

HUMAN-AI COLLABORATION ASSESSMENT IN THEORETICAL MECHANICS: FROM COMPETITIONS TO FINAL EXAMINATIONS

Abstract: Large language models (LLMs) are rapidly being integrated into higher education. However, their quantitative impact on high-stakes, high-complexity STEM assessments, and the mechanisms of human-AI collaboration in such settings, remain poorly understood, leaving a gap in artificial intelligence (AI)-driven reform of educational evaluation. This study ran two complementary controlled experiments in the theoretical mechanics course at Tsinghua University: an “AI Challenge” for top undergraduates, with problems harder than competition level, and an “AI-Assisted Final Exam Pilot” at standard difficulty for students retaking the course. The data show a marked synergistic gain: in the final exam pilot, the AI-assisted group averaged 60.2 points, significantly higher than the unassisted control group (41.2) and well above the standalone AI baseline (39.0), indicating that effective human-AI collaboration can exceed the capability ceiling of either agent alone. Further analysis identifies strategic AI use as the key determinant of collaborative efficacy. A quantitative analysis of interaction logs and problem-solving strategies shows that the variance in a student's reliance on AI across five standardized problem-solving stages, termed “AI Selectivity”, correlates significantly and positively with collaborative gain. High-gain students tended to use AI only at certain stages while remaining cognitively independent overall, whereas indiscriminate AI use tended to produce inefficient collaboration and poorer outcomes. These findings suggest that STEM education needs to shift its emphasis from assessing proficiency in routine algorithms toward cultivating independent critical judgment and the ability to supervise AI output. This work offers empirical evidence to inform teaching in the era of AI.
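As a rough illustration of the “AI Selectivity” measure described in the abstract (the variance of a student's AI dependence across the five standardized stages), the sketch below computes it for two hypothetical students. The dependence scale (0 to 1 per stage), the use of population rather than sample variance, the function name `ai_selectivity`, and the example values are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import pvariance

N_STAGES = 5  # the abstract refers to five standardized problem-solving stages


def ai_selectivity(stage_dependence):
    """Return the variance of AI dependence across the stages.

    stage_dependence: one value per stage, here assumed to lie in [0, 1]
    (0 = no AI use at that stage, 1 = full reliance). The scale and the
    choice of population variance are assumptions, not the paper's spec.
    """
    if len(stage_dependence) != N_STAGES:
        raise ValueError(f"expected {N_STAGES} stage values")
    return pvariance(stage_dependence)


# Hypothetical students (values invented for illustration only).
selective_user = [0.0, 0.0, 1.0, 0.0, 1.0]        # AI used at two stages only
indiscriminate_user = [0.8, 0.8, 0.8, 0.8, 0.8]   # AI used uniformly everywhere

print(ai_selectivity(selective_user))
print(ai_selectivity(indiscriminate_user))
```

Under these assumptions, a student who relies on AI at only some stages gets a higher selectivity value than one who relies on it uniformly, which is the contrast the abstract draws between high-gain and low-gain collaboration.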
