Enhancing professional communication training in higher education through artificial intelligence(AI)-integrated exercises: study protocol for a randomised controlled trial | BMC Medical Education
Study setting
The study is conducted within a Bachelor of Science Psychology program at a medium sized university in a medium sized German city. Most data will be collected using an online survey tool, EFS Survey developed by Tivian, and the online learning platform “Stud.IP” which is routinely used at the university.
Study design
The study is a cluster-randomised controlled trial, with psychology students at a university enrolled in communication skill seminars. The rationale to adopt a cluster-randomised controlled trial design, increasingly used for educational research [18], is as follows: In the context of seminar-based communication training, students within a single class are likely to share resources, study materials, and informal peer discussions. Randomising at the individual level could lead to contamination, where participants assigned to different interventions (AI-based vs. conventional) might inadvertently share experiences. By implementing a cluster-level randomisation (i.e., randomising entire classes), we preserve ecological validity – each class follows a uniform teaching approach – and reduce risk of cross-condition interference [19]. This design choice is also practical for scheduling, teaching logistics, and ethical considerations, ensuring that all students in a class receive the same structured training. Based on practical constraints of the field study (e.g., differing semester schedules, teacher availability, or institutional requirements), classes will be assigned using a parallel group design (see Fig. 1).

Study design. Notes: Classes follow a parallel group design (randomised to Clusters Type A vs. B: AI condition vs. TAU only). Cluster Type C represents optional, non-randomised classes that can serve as observational comparators and are not part of main analyses. Abbreviations: AI, artificial intelligence; CSS, communication skills seminar; TAU, taught as usual
In the parallel group design, classes are randomised to the AI condition (Cluster Type A, with Condition 1 only: AI-enhanced exercises alongside teaching-as-usual, TAU, that includes classical exercises) or the control condition (Cluster Type B, Condition 2: TAU only). Additional non-randomised comparison classes (Cluster Type C, Condition 3) will comprise students with TAU only but not be included in the main analyses.
Details regarding the Conditions are as follows:
In Condition 1 (Intervention with AI-enhanced exercises alongside TAU), Participants will use AI-enhanced communication skill exercises provided via the HAWKI platform alongside TAU, in the context of the standard communication skills seminar, which includes not-AI-enhanced communication skills exercises. HAWKI is a data protection-compliant platform that serves as a wrapper for OpenAI’s ChatGPT. HAWKI is web-based and grants access to state-of-the-art language models (currently GPT-4o), made available through university accounts to protect personal log-in data. It thereby allows users to interact with ChatGPT without requiring personal account creation with OpenAI, thereby enhancing user privacy and aligning with General Data Protection Regulation (GDPR) standards. Students receive access to HAWKI and instructions for its use through the online learning platform ‘Stud.IP’. The exercises will involve AI-facilitated interactions designed to simulate real communication scenarios. Students will follow structured system prompts tailored to improve their communication skills. This condition will be implemented for the full duration of the seminar. The content of the intervention is described in more detail below.
During times where Condition 2 (first comparator, TAU only) is active, participants will attend the standard communication skills seminar, which includes not-AI-enhanced communication skills exercises. This condition will be implemented for the full duration of the seminar within the semester lecture period, with participants getting no specific access to AI tools or related prompts outlined in Condition 1.
Participants undergoing Condition 3 (second, non-randomised comparator, TAU only) will attend communication skills seminar with instructions and exercises comparable to those in Condition 2. This Condition will serve as an observational, non-randomised control to compare against condition 1, but not be part of the main analyses. It is added due to procedural reasons in cases where randomisation of a seminar to Conditions 1 or 2 is not feasible. This may occur for example if certain classes already began instruction before randomisation could take place or if they are run by instructors who cannot accommodate the required arrangement. Such classes serve as an additional, observational comparator to explore potential baseline or contextual differences. We will address these differences by adjusting for key covariates (e.g., baseline skill levels) in the analyses.
Description of AI-based communication skills exercise
The intervention component of this trial involves AI-enhanced communication skill exercises specifically designed to foster basic communication abilities using principles from motivational interviewing and other communication frameworks. The exercises utilise a set of system prompts developed to tailor the generative AI to interact with students as intended. Each exercise session begins with the AI mentioning or outlining the motivational interviewing technique, followed by a short vignette or task that students shall respond to by applying the respective or appropriate technique. After students submit their responses, the AI provides feedback based on predetermined criteria that reflect best practices in the respective technique. Students are also encouraged to tailor their interactions with the AI to simulate conversations with specific personas or subjects possessing distinct characteristics or communication needs. This includes the option for language practice beyond German, accommodating a wider range of communication scenarios. Participants are instructed not to provide any personal information within the interactions with the AI.
To facilitate a structured learning experience, students can request example answers from the system. This feature is intended to provide students with model solutions to compare against their responses. Students interact with these prompts through the HAWKI interface, which connects them to advanced language models from OpenAI’s GPT framework. After each set of interactions, students are required to document the exchange by copying the transcript of their conversation with the AI into a designated field on the Stud.IP platform, ensuring that their responses are recorded and assessable.
The curriculum for the intervention includes a series of distinct AI-enhanced communication skills exercises. Students are encouraged to complete each exercise to reinforce their learning and ensure proficiency in the communication skills being trained. During the designated training periods, students may engage in an unlimited number of exercises, allowing for flexible, self-directed learning tailored to their individual needs and schedules. The exercises shall allow repeated practice and receiving iterative feedback from the AI, enhancing the learning process. The students will receive regular notifications to inform them about the next set of exercises that is consecutively made available during the phase at which condition 1 is implemented. This approach is designed to maximise the accessibility and effectiveness of the training, adapting to the diverse learning preferences and requirements of the students.
Currently, HAWKI supports text-based interactions only, and no voice functionality is available within the university’s data-protected wrapper. Therefore, all AI exercises in this study will be text-based for consistency. If future HAWKI updates enable secure audio or multimodal exchanges, we may explore those in a separate follow-up investigation; they will not be part of the main RCT described here.
Pilot study phase
Before initiating the main trial, we conducted a single-semester pilot study phase (during winter-term 2024/2025) to determine the technical feasibility of the AI-supported exercises, finalise the study instruments, and rehearse all organisational procedures. Voluntary participants from two communication-skills seminars were cluster-randomised to a crossover design: one class followed the sequence AI-enhanced training first, teaching-as-usual (TAU) second, the other TAU first, AI-enhanced training second. Each treatment period lasted 3–4 weeks, with assessments at baseline, mid-seminar, and end-of-seminar. Further, the pilot study phase included students from a non-randomised seminar not receiving the AI-supported exercises. Participation was voluntary and all students provided written informed consent under the same eligibility criteria that apply to the main study, despite the semester at which the seminar took place.
During the pilot we stress-tested the HAWKI platform under routine classroom conditions. We actively sampled transcripts, logged response latencies and flagged any instance in which the AI (i) failed to adopt the instructed patient role, (ii) produced technically or ethically inappropriate feedback, or (iii) deviated from the task instructions. Prompts that did not consistently elicit the intended interactions were iteratively revised or discarded. Only prompts that demonstrated stable performance across all monitored sessions were retained, and we froze this final prompt set for the main trial (changes during the main trial are limited to critical bug-fixes).
The pilot data also served psychometric purposes. We piloted the self-developed communication-skills competence test and refined it where required. Because the pilot’s primary function was developmental, none of its participant-level data will enter the analyses of the main study. Where appropriate, aggregated pilot findings will be reported separately to document feasibility outcomes and instrument refinement.
AI implementation and monitoring
AI version. The HAWKI platform employs current state-of-the-art AI models, currently incorporating the GPT-4o model, version gpt-4o-2024–08-06, as provided by OpenAI as of October 2024. We anticipate changing to new versions throughout the study, following respective releases.
Prompt management and procedure for acquiring and selecting the input data for the AI intervention. The initial input data for the AI intervention, preparing the AI for interaction with the trainees, comprise the pretested system prompts designed to guide the AI in facilitating user interactions. These prompts are specifically developed to support exercises aimed at enhancing communication skills, as outlined in the communication skill seminar.
AI performance surveillance. While AI outputs were actively monitored and analysed through targeted sampling during a pilot phase during winter-term 2024/2025 (see above), the finalised prompts remain unchanged during the main study, except for critical fixes (e.g., to address system outages). This ensures consistent AI behaviour and feedback for all participants.
Inclusion and exclusion criteria at input data level. System prompts are included in the exercise protocol only if during the pilot phase, they consistently elicited the intended user interactions.
Quality handling procedures. System prompts that during the pilot phase resulted in interactions of suboptimal quality – for example, the AI failing to assume a patient role or to provide correct task instructions, or where model answers or feedback are inappropriate – were flagged for review or removed from the protocol. This ensures the reliability and efficacy of the training exercises and maintains the quality of outputs from the AI system and shall also improve adherence to the AI training procedure.
Output of the AI Intervention. The output of the AI intervention consists of detailed transcripts of interactions between the students and the AI system. These transcripts capture the entirety of the communicative exchanges, providing a comprehensive record of the dialogue, responses, and the feedback generated by the AI.
Contribution of Output to Clinical Practice and Decision-Making. The output of the AI intervention is primarily educational and does not directly contribute to clinical decision-making. However, the intervention is designed to enhance the communication skills of psychology students, which is crucial in their professional training and future clinical interactions. By improving these skills, the AI intervention indirectly supports students in making informed decisions and effectively communicating in various future professional settings.
User access. Integration of the AI intervention necessitates setting up access to the HAWKI platform, either via devices provided by the university, or from offsite, where participants require a reliable internet connection to access the platform remotely.
Human-AI Interaction and Required User Expertise. Human-AI interaction is facilitated primarily through the setup and oversight of the AI operations. The system is designed to be user-friendly, requiring no specific technical expertise beyond proficiency with the online learning platform Stud.IP, which is routinely used in classes at the university. This approach ensures that all users, irrespective of their technical background, can efficiently engage with the AI without extensive training.
Study outcomes
The primary and secondary outcomes of this study, along with their assessment instruments, time points, and specific variables, are described below and detailed in Table 1.
The primary outcome is the change in communication skills from baseline, measured using a set of questions designed to reflect the communication skills emphasised in the training. This outcome will be assessed at two time points: t0 (baseline, before the start of training phase) and t1 (end of seminar, after training phase). The primary outcome will be analysed as a z-standardised change score to evaluate the improvement from baseline to subsequent time points. We use data collected during a study pilot phase during winter-term 2024/25, to pilot the set of questions and adapt the assessment if necessary.
Secondary outcomes include measures related to the participants of the course, the use of AI and the perceived quality of the course, such as attitudes towards AI, self-concept and self-efficacy, motivation, user experience with the AI tool, as well as students’ evaluations and feedback. These outcomes are assessed at various time points throughout the course using a combination of validated scales and self-developed measures, as detailed in Table 1.
Each student’s interaction with the AI-system is recorded and can be analysed using criteria derived from motivational interviewing best practices (Motivational Interviewing Treatment Integrity, MITI, 4.2) [32] by human raters and AI. These assessments focus on the accuracy and appropriateness of the techniques used.
Data management, confidentiality, and monitoring
Data collection for this study will utilise both online tools and direct assessments. Most data will be collected via the online data collection tool EFS Survey tool (Tivian). Further, content and volume of communication skills exercise will be retrieved from the “Stud.IP” platform. Participants will be asked to provide a personal code that they individually derive and to provide it within EFS Survey and Stud.IP, allowing for later merging of data collected via different platforms. A list linking names and email addresses of participants with the personal codes will be kept separate from all other data in an access-controlled file.
Data confidentiality will be maintained throughout the study. Data from surveys and assessments will be stored digitally using encrypted and password-protected servers. Data will be pseudonymised using unique participant identifiers to ensure that personal information cannot be directly linked to the collected data. Only authorised research team members will have access to the raw data. Data retention will comply with institutional requirements. Any data shared for analysis or publication purposes will be stripped of identifying information to maintain participant privacy. Data validation procedures will include range checks and consistency checks to identify any irregularities.
This study will not have an independent data monitoring committee (DMC) due to its educational research context and relatively low-risk nature. Data oversight will be managed internally by the principal investigator and designated research team members at our institution. In the event of unexpected issues or potential adverse events related to the intervention, a protocol will be in place for prompt reporting and review by the research team.
Participants, recruitment, randomisation procedures, and sample size estimates
Recruitment for this study will target students enrolled in the communication skill seminar (Seminar Gesprächsführung) as part of the Bachelor of Science in Psychology program at a German university.
Inclusion criteria for study participants are as follows: 1) Being enrolled as student in the communication skill seminar (Seminar Gesprächsführung, A3), which is part of the Bachelor of Science in Psychology program; 2) Participating in one of the seminars from summer-term 2025 up to winter-term 2025/2026; 3) At least 18 years of age; 4) Sufficient proficiency in the German language to participate (as the seminar is conducted in German, this is implicitly given, as German language proficiency is required for course participation). There are no additional exclusion criteria for this study.
Recruitment for the main study begins summer-term 2025 up to winter-term 2025/2026. Participants will be informed about the study by the seminar teachers. Students will be encouraged to participate voluntarily, with informed consent required for enrolment. Reminder messages will be sent out to maintain interest and promote participation throughout the study duration.
The timeline for participant enrolment, interventions, and assessments is as follows: Enrolment begins early during each semester/course, where students provide informed consent. We provide the current informed consent forms (for classes randomised to Conditions 1 and 2 and one for classes in Condition 3) as additional file 1. Allocation occurs at the start of the course (t0), followed by the baseline assessments. The end of the course (t1) includes final assessments. Data close-out for each semester occurs after t1 with optional group discussions. Final data close-out will occur following the last semester during which new data are collected, and any follow-up assessments are completed.
With regard to target sample size, our goal is to enrol at least 50 participants per condition to ensure robust statistical power for detecting medium effect sizes, allow stratified analyses, and to enhance the generalisability of our findings. During the two semesters contributing to the main study, we anticipate recruiting participants from at least 10 seminars.
Randomisation for the study will be conducted at the seminar level, with each seminar class being randomised according to the scheme outlined in Fig. 1. The randomisation process will be conducted by an independent party to ensure unbiased allocation, using coin-flip or computer-generated randomisation procedures, to determine to which condition (Condition 1 or Condition 2; Cluster Types A and B, respectively) each class is assigned, while a rule is implemented so that, if a given instructor teaches multiple classes in one semester, the difference in the number of classes assigned to Cluster Type A vs. Cluster Type B does not exceed one. Our main comparison focuses on the randomised arms (Cluster Types A and B). Additional non-randomised comparison classes (Condition 3, Cluster Type C) will be included only if a given seminar cannot be randomised due to scheduling or instructor constraints. These observational data serve as ancillary information but will not be part of the main randomisation-based analysis.
Allocation concealment/blinding
The blinding strategy for this study includes rater blinding to maintain the integrity of the data and minimise bias. Given the nature of the study, it is not feasible to blind participants to the type of intervention. Raters responsible for evaluating communication skills exercises and correctness of free-text responses will be blinded to the condition to which the participants were assigned. In circumstances where unblinding becomes necessary (e.g., unforeseen technical issues or procedural clarifications), a strict protocol will be followed to disclose the intervention assignment in a controlled manner.
Data analyses
Data collection for the main study phase will be used for the main data analyses, which will be conducted once main data collection has terminated.
Descriptive statistics will be used to summarise the baseline characteristics of the study sample, including measures of central tendency (e.g., mean, median) and dispersion (e.g., standard deviation, interquartile range), as appropriate given the distribution of each variable. These summaries will provide an overview of participant demographics and initial conditions across intervention groups.
For the main effect analyses, mixed-effects models will be employed. Between-class comparisons will be made, accounting for clustering at the class level. If indicated, models will include fixed effects for intervention and period, as well as random effects for semester, classes and participants, as appropriate.
Both intent-to-train (ITT) and per-protocol analyses will be conducted to ensure robustness and comprehensive interpretation of the results. The ITT analysis will include all participants as randomised, regardless of adherence to the intervention, while the per-protocol analysis will focus on data related to periods in which the participant completed the exercises as intended.
Appropriate covariates, such as baseline scores and relevant demographic factors, will be included in the models to adjust for potential confounding. Statistical significance will be set at the alpha level 0.05, and results will be reported with 95% confidence intervals to convey the precision of the estimates.
To minimise missing data, we are employing online data collection, which facilitates real-time entry and immediate correction of inconsistencies, enhancing data completeness. Additionally, we are implementing mixed models in our statistical analyses, which effectively handle missing data by using all available data points to estimate model parameters.
Methods for additional analyses
Additional analyses will include the prediction of exercise intensity and the investigation of associations between exercise intensity and outcomes. These analyses aim to explore how variations in engagement with the AI-enhanced exercises relate to the measured communication skill improvements and other secondary outcomes. Mixed models and regression analyses will be used to examine the relationships between exercise intensity (e.g., frequency and duration of AI-based and non-AI-based exercises) and outcome variables.
Additional subgroup or sensitivity analyses may be performed as needed to investigate specific hypotheses or account for potential confounders. These analyses will provide deeper insights into the mechanisms underlying the observed effects and contribute to refining future training implementations.
Dissemination
All data collected will be handled in compliance with data protection regulations to ensure participant confidentiality.
The findings of the study will be disseminated through various channels, including peer-reviewed journal publications and conference presentations. Efforts will be made to share results with educational institutions and psychology training programs to contribute to best practices in communication skill training using AI-enhanced tools. We do not intend to use professional writers for scientific publications.
Participants will be informed that their data will contribute to publications and presentations, but individual identities will not be disclosed in any disseminated material.
Harms
Given the nature of this educational intervention study, no significant physical or psychological risks are anticipated for participants. The potential harms are minimal, primarily related to possible discomfort or stress associated with the exercises or evaluations. To mitigate these risks, participants will be informed that their participation is voluntary and that they may withdraw at any time without any repercussions. Non-participation does not affect potential access to the AI-enhanced skill training; the latter only depends on the class the student is in.
If participants experience significant distress during the intervention or assessments, they will have the option to pause or discontinue their involvement in the study and no further information will be collected. The research team will remain available to address participant concerns and provide support or referrals to appropriate university resources if needed.
All relevant incidents and adverse events will be documented and reviewed by the research team, with a summary provided to the IRB as necessary.
Ancillary and post-trial care
No specific provisions for ancillary or post-study care are required due to the non-clinical nature of the study. Like all students at the participating university, participants have access to university support services if needed.
Consent or assent
Informed consent will be obtained from all participants before enrolment in the study. The consent process will include an explanation of the study’s purpose, procedures, potential risks and benefits, and the rights of participants, including the right to withdraw at any time without consequences. This information will be provided in written form, and participants will have the opportunity to ask questions before signing the consent form.
The consent forms will be distributed and collected either in person during the initial seminar sessions or electronically through the EFS Survey tool (Tivian), ensuring accessibility for all participants. The form includes details on data privacy, indicating how data will be stored, used, and protected.
Auditing
There are no formal external auditing processes planned for this study due to its educational nature and low-risk profile. Internal audits will be conducted periodically by the principal investigator and the research team at the university to ensure compliance with the study protocol, data management procedures, and ethical guidelines. Any relevant deviations from the protocol identified during internal audits will be documented and addressed promptly.
Trial registration
This trial has been registered in the Open Science Framework (OSF) under identifier ‘th6f4’ on 29–11–2024, last update on 11–04–2025 (accessible via The registration includes all relevant details as required by the OSF platform.
Protocol amendments
Any amendments to the study protocol will be documented before implementation. Protocol changes that affect participant safety, study procedures, or the validity of the data will be communicated promptly to all relevant stakeholders, including participants, research staff, and oversight bodies.
Revised versions of the protocol will be assigned a new version number and date, and updates will be disseminated through the university’s internal communication channels and any applicable registries, where the trial is registered. Participants will be informed of relevant changes that may impact their involvement in the study and will be asked to re-consent if necessary.
Documentation of amendments and the reasons for the changes will be kept in the study records to ensure transparency and adherence to best practices in research management.
Protocol version
This version represents the protocol prepared for the randomised controlled trial involving AI-enhanced communication skill training for psychology students; first version dated from 29–11–2024, with last update on 11–04–2025 (second version).
Names, affiliations, and roles of protocol contributors
Gunther Meinlschmidt: Principal Investigator (PI) and key protocol contributor. Affiliation: Trier University, Department of Psychology, Clinical Psychology and Psychotherapy – Methods and Approaches; Michael Schneider: Key protocol contributor. Affiliation: Trier University, Department of Psychology, Educational Psychology.
Sponsor contact information
Sponsor: Prof. Dr. Gunther Meinlschmidt; Contact Information: Trier University, Department of Psychology, Clinical Psychology and Psychotherapy – Methods and Approaches; Email: [email protected]; Phone: + + 49 651 201 1999.
Composition, roles, and responsibilities of committees
There is no steering committee or data monitoring committee overseeing the trial at this stage.
link
