HUMAN-AI COLLABORATION ASSESSMENT IN THEORETICAL MECHANICS: FROM COMPETITIONS TO FINAL EXAMINATIONS

Abstract: Large language models (LLMs) are rapidly being integrated into higher education, yet their quantitative impact on high-stakes, high-complexity STEM assessments, and the mechanisms of human-AI collaboration in such settings, remain poorly understood. This leaves a gap in artificial intelligence (AI)-driven reform of educational evaluation. This study conducted a set of complementary controlled experiments in the theoretical mechanics course at Tsinghua University: an "AI Challenge" for top undergraduates, with problems harder than competition level, and an "AI-Assisted Final Exam Pilot" of standard difficulty for students retaking the course. The data show a significant synergistic gain: in the final exam pilot, the AI-assisted group averaged 60.2 points, substantially higher than both the unassisted control group (41.2) and the standalone AI baseline (39.0). This indicates that effective human-AI collaboration can exceed the capability ceiling of either agent working alone. Further analysis identifies strategic AI use as the key determinant of collaborative efficacy. Quantitative analysis of interaction logs and problem-solving strategies shows that the variance of a student's AI dependence across five standardized problem-solving stages, termed "AI selectivity", correlates significantly and positively with collaborative gain. High-gain students tended to use AI only at certain stages while maintaining overall cognitive independence, whereas indiscriminate AI use tended to produce inefficient collaboration and poorer outcomes. These findings suggest that the focus of STEM education needs to shift from assessing routine algorithmic proficiency toward cultivating independent critical judgment and the ability to supervise AI output. This work offers empirical evidence to inform teaching and learning in the era of AI.
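The "AI selectivity" measure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how AI dependence is scored or which variance estimator is used, so the following are assumptions made purely for illustration: dependence is scored on a 0 to 1 scale per stage, population variance is used, and the two students and their scores are hypothetical rather than data from the study.

```python
from math import sqrt
from statistics import pvariance


def ai_selectivity(stage_dependence):
    """Variance of one student's AI dependence across the solution stages.

    Assumption: population variance over per-stage scores in [0, 1].
    """
    return pvariance(stage_dependence)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical students, five stages each (not data from the study).
uniform_user = [0.8, 0.8, 0.8, 0.8, 0.8]    # leans on AI equally everywhere
selective_user = [0.0, 1.0, 0.0, 1.0, 0.0]  # uses AI only at chosen stages

print(ai_selectivity(uniform_user))    # no variation across stages
print(ai_selectivity(selective_user))  # larger variance, i.e. more selective
```

Under these assumptions, a student who relies on AI equally at every stage has zero selectivity regardless of how heavily they use it, while a student who switches between full reliance and none scores high. Given one selectivity value and one collaborative gain per student, `pearson` would then give the kind of correlation the abstract reports.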

     
