The Client needed 10,000 videos capturing a wide range of dialects and accents in real-world environments to train its next-gen smart virtual video assistant.
Client specifications included complicated scenarios, diverse demographics, PII redaction and exacting quality control (QC) requirements.
Qualitest tapped our robust database for 500 participants and replicated in-home environments at Q Studios, Kirkland.
Experienced data collection moderators followed stringent guidelines from Client engineers for capture quality.
10,000 scripted videos collected over 6 months; 100% participant target and 98% demographics target met.
30% cost savings through data labeling and quality control at Qualitest’s Madagascar lab.
Our Client, a world-leading developer and manufacturer of smartphones, personal computers, tablets and wearables, is also a pioneer in AR/VR/MR technology and devices.
The company is committed to keeping ahead of the curve with groundbreaking new products that are also inclusive, offering the same excellent quality to every individual in every environment.
The Client was developing a next-gen smart virtual video assistant with more comprehensive and inclusive word recognition capabilities than current models. This assistant would understand words and instructions delivered with an accent or in a dialect, by speakers of various ages and from a wide range of ethnic backgrounds. It would also understand words in real-world contexts, voiced in complex scenarios in different parts of a home.
To train the AI/ML model, the Client required 10,000 scripted videos produced with exacting technical and quality requirements. They hired Qualitest for our known ability to collect a large volume of ground truth data, our robust and diverse participant pool, and our capacity to replicate in-home environments in our Q-Test Lab.
“The Client received 100% of their required capture target of 10,000 videos across 500 participants.”
Our highly experienced data collection moderators and project management team got busy.
· Participants: With firm deadlines and a goal of finding 500 suitable participants, we tapped our diverse database for candidates matching the Client’s desired distribution across demographic bins, avoiding overcollection in any group. We implemented a robust confirmation process and overscheduled appointments to ensure there were always backup participants.
· Physical environments: Because of the way sound bounces off surfaces—for instance, a tiled kitchen compared to a thickly carpeted living room—words can sound different in different rooms. They can also have different meanings or implications. The Client wanted the assistant to be trained in a variety of settings, including dining rooms, living rooms, kitchens, studios and bedrooms. We lined up participant homes for some collections and replicated rooms at the Q-Test Lab for others.
· Scenarios: The Client’s scripting and staging requirements for the scenarios were often intricate, and a number involved multiple participants. Privacy concerns added another challenge: There were to be no visible indicators of personally identifiable information (PII). That meant we had to remove all photos of family and friends, diplomas, letters, notes, license plates, etc., from view before collecting in participant homes. Otherwise, we risked costly retakes or time-consuming redaction during editing later.
· Quality control: We followed precise rules regarding minimum resolution, video length and the exact words to be spoken. Our experienced data collection moderators followed the technical guidelines and instructions provided by Client engineers, thereby improving capture quality and reducing quality control failures.
In all, collecting the required 10,000 videos took about 6 months. Following the Client’s file naming and upload instructions, our experienced team of data annotators labeled and quality-controlled the videos at our Madagascar lab.