Amazon Lex is happy to announce Test Workbench, a new bot testing solution that provides tools to simplify and automate the bot testing process. During bot development, testing is the phase where developers check whether a bot meets the specific requirements, needs, and expectations by identifying errors, defects, or bugs in the system before scaling. Testing helps validate bot performance on several fronts such as conversational flow (understanding user queries and responding accurately), intent overlap handling, and consistency across modalities. However, testing is often manual, error-prone, and non-standardized. Test Workbench standardizes automated test management by allowing chatbot development teams to generate, maintain, and execute test sets with a consistent methodology and avoid custom scripting and ad hoc integrations. In this post, you'll learn how Test Workbench streamlines automated testing of a bot's voice and text modalities and provides accuracy and performance measures for parameters such as audio transcription, intent recognition, and slot resolution for both single utterance inputs and multi-turn conversations. This allows you to quickly identify bot improvement areas and maintain a consistent baseline to measure accuracy over time and track any accuracy regression due to bot updates.
Amazon Lex is a fully managed service for building conversational voice and text interfaces. Amazon Lex helps you build and deploy chatbots and virtual assistants on websites, contact center services, and messaging channels. Amazon Lex bots help increase interactive voice response (IVR) productivity, automate simple tasks, and drive operational efficiencies across the organization. Test Workbench for Amazon Lex standardizes and simplifies the bot testing lifecycle, which is critical to improving bot design.
Features of Test Workbench
Test Workbench for Amazon Lex includes the following features:
- Generate test datasets automatically from a bot's conversation logs
- Upload manually built test set baselines
- Perform end-to-end testing of single input or multi-turn conversations
- Test both audio and text modalities of a bot
- Review aggregated and drill-down metrics for bot dimensions:
  - Speech transcription
  - Intent recognition
  - Slot resolution (including multi-valued slots or composite slots)
  - Context tags
  - Session attributes
  - Request attributes
  - Runtime hints
  - Time delay in seconds
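The dimensions above can be pictured as fields on a single test-line record that gets graded against the bot's actual response. The following is a minimal, hypothetical sketch (not the actual Test Workbench data model; all names are illustrative) of marking intent recognition and slot resolution as pass or fail for one line:

```python
from dataclasses import dataclass, field


@dataclass
class TestLine:
    """One test input with its annotated ground truth (hypothetical model)."""
    input_text: str
    expected_intent: str
    expected_slots: dict = field(default_factory=dict)


def grade_line(line: TestLine, actual_intent: str, actual_slots: dict) -> dict:
    """Compare the bot's actual output against the line's ground truth."""
    intent_pass = actual_intent == line.expected_intent
    # Every annotated slot must resolve to the expected value.
    slot_pass = all(actual_slots.get(name) == value
                    for name, value in line.expected_slots.items())
    return {"intent": intent_pass, "slots": slot_pass,
            "end_to_end": intent_pass and slot_pass}


line = TestLine("I want to book a room for two nights",
                expected_intent="BookRoom",
                expected_slots={"Nights": "2"})
result = grade_line(line, "BookRoom", {"Nights": "2", "RoomType": "king"})
print(result)  # {'intent': True, 'slots': True, 'end_to_end': True}
```

Aggregating these per-line results is what produces the pass/fail metrics described later in this post.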
To test this feature, you should have the following:
In addition, you should have knowledge and understanding of the following services and features:
Create a test set
To create your test set, complete the following steps:
- On the Amazon Lex console, under Test workbench in the navigation pane, choose Test sets.
You can review a list of existing test sets, including basic information such as name, description, number of test inputs, modality, and status. In the following steps, you can choose between generating a test set from the conversation logs associated with the bot or uploading an existing manually built test set in CSV file format.
- Choose Create test set.
- Generating test sets from conversation logs allows you to do the following:
- Include real multi-turn conversations from the bot's logs in CloudWatch
- Include audio logs and conduct tests that account for real speech nuances, background noise, and accents
- Speed up the creation of test sets
- Uploading a manually built test set allows you to do the following:
- Test new bots for which there is no production data
- Perform regression tests on existing bots for any new or modified intents, slots, and conversation flows
- Test carefully crafted and detailed scenarios that specify session attributes and request attributes
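A manually built test set is just a CSV file. As a rough illustration, the following sketch assembles one in memory; the column names here are assumptions for illustration only, so download the CSV Template from the Test Workbench console for the authoritative layout:

```python
import csv
import io

# Hypothetical column layout -- use the CSV Template link in the console
# for the real format expected by Test Workbench.
COLUMNS = ["Line #", "Conversation #", "Source", "Input",
           "Expected Output Intent", "Expected Output Slot 1"]

rows = [
    # A single-input test line (no conversation number).
    ["1", "", "User", "I want to order a pizza", "OrderPizza", "Size = large"],
    # Two turns of one multi-turn conversation share a conversation number.
    ["2", "1", "User", "Book me a room", "BookRoom", ""],
    ["3", "1", "User", "Two nights please", "BookRoom", "Nights = 2"],
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
header = buffer.getvalue().splitlines()[0]
print(header)
```

In practice you would write this to a `.csv` file and upload it from your computer or from an S3 bucket, as described in the steps below.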
To generate a test set, complete the following steps. To upload a manually built test set, skip to step 7.
- Choose Generate a baseline test set.
- Choose your options for Bot name, Bot alias, and Language.
- For Time range, set a time range for the logs.
- For Existing IAM role, choose a role.
Make sure that the IAM role is able to grant you access to retrieve information from the conversation logs. Refer to Creating IAM roles to create an IAM role with the appropriate policy.
- If you prefer to use a manually created test set, select Upload a file to this test set.
- For Upload a file to this test set, choose from the following options:
- Select Upload from S3 bucket to upload a CSV file from an Amazon Simple Storage Service (Amazon S3) bucket.
- Select Upload a file to this test set to upload a CSV file from your computer.
You can use the sample test set provided in this post. For more information about templates, choose the CSV Template link on the page.
- For Modality, select the modality of your test set, either Text or Audio.
Test Workbench provides testing support for audio and text input formats.
- For S3 location, enter the S3 bucket location where the results will be stored.
- Optionally, choose an AWS Key Management Service (AWS KMS) key to encrypt output transcripts.
- Choose Create.
Your newly created test set will be listed on the Test sets page with one of the following statuses:
- Ready for annotation – For test sets generated from Amazon Lex bot conversation logs, the annotation step serves as a manual gating mechanism to ensure quality test inputs. By annotating values for expected intents and expected slots for each test line item, you indicate the "ground truth" for that line. The test results from the bot run are collected and compared against the ground truth to mark test results as pass or fail. This line-level comparison then allows for creating aggregated measures.
- Ready for testing – This indicates that the test set is ready to be executed against an Amazon Lex bot.
- Validation error – Uploaded test files are checked for errors such as exceeding the maximum supported length, invalid characters in intent names, or invalid Amazon S3 links containing audio files. If the test set is in the Validation error state, download the file showing the validation details to see test input issues or errors on a line-by-line basis. Once they are addressed, you can manually upload the corrected test set CSV into the test set.
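You can also catch some of these problems locally before uploading. The sketch below runs illustrative line-by-line checks; the actual limits and character rules enforced by Test Workbench are not reproduced here, so treat the constants as placeholders:

```python
import re

MAX_INPUT_LENGTH = 1024                        # assumed limit, not the documented one
INTENT_NAME = re.compile(r"^[A-Za-z0-9_]+$")   # assumed allowed characters


def validate_rows(rows):
    """Return (line_number, message) pairs for rows that would likely fail upload.

    This only illustrates the kind of line-by-line report the console
    produces for a test set in the Validation error state."""
    errors = []
    for n, (text, intent) in enumerate(rows, start=1):
        if len(text) > MAX_INPUT_LENGTH:
            errors.append((n, "input exceeds maximum supported length"))
        if not INTENT_NAME.match(intent):
            errors.append((n, f"invalid characters in intent name {intent!r}"))
    return errors


rows = [("I want to order a pizza", "OrderPizza"),
        ("Book a room", "Book Room!")]  # space and '!' are rejected by this check
print(validate_rows(rows))
```

Fix the reported lines in the CSV, then re-upload it to move the test set out of the Validation error state.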
Executing a test set
A test set is decoupled from a bot. The same test set can be executed against a different bot or bot alias in the future as your business use case evolves. To report performance metrics of a bot against the baseline test data, complete the following steps:
- Import the sample bot definition and build the bot (refer to Importing a bot for guidance).
- On the Amazon Lex console, choose Test sets in the navigation pane.
- Choose your validated test set.
Here you can review basic information about the test set and the imported test data.
- Choose Execute test.
- Choose the appropriate options for Bot name, Bot alias, and Language.
- For Test type, select Audio or Text.
- For Endpoint selection, select either Streaming or Non-streaming.
- Choose Validate discrepancy to validate your test dataset.
Before executing a test set, you can validate test coverage, including identifying intents and slots present in the test set but not in the bot. This early warning serves to set tester expectations for unexpected test failures. If discrepancies between your test dataset and your bot are detected, the Execute test page will update with the View details button.
Intents and slots found in the test dataset but not in the bot alias are listed as shown in the following screenshots.
- After you validate the discrepancies, choose Execute to run the test.
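The same execution can be started programmatically through the Amazon Lex Model Building V2 API. The sketch below only builds the request body for the `StartTestExecution` operation; the field names reflect my understanding of that API and should be verified against the current boto3 documentation before use:

```python
def build_test_execution_request(test_set_id, bot_id, bot_alias_id,
                                 locale_id, modality="Text",
                                 streaming=False):
    """Assemble keyword arguments for lexv2-models StartTestExecution.

    Field names are based on my reading of the Lex Models V2 API and are
    not guaranteed -- check the boto3 reference before relying on them."""
    return {
        "testSetId": test_set_id,
        "target": {"botAliasTarget": {"botId": bot_id,
                                      "botAliasId": bot_alias_id,
                                      "localeId": locale_id}},
        "apiMode": "Streaming" if streaming else "NonStreaming",
        "testExecutionModality": modality,
    }


request = build_test_execution_request("TESTSET1", "BOT123", "ALIAS1", "en_US",
                                       modality="Audio", streaming=True)
print(request["apiMode"])  # Streaming
# To actually run it (assumes configured AWS credentials):
# boto3.client("lexv2-models").start_test_execution(**request)
```

The Test type and Endpoint selection options in the console map to the modality and streaming choices here.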
The performance measures generated after executing a test set help you identify areas of bot design that need improvement and are useful for expediting bot development and delivery to support your customers. Test Workbench provides insights on intent classification and slot resolution at the end-to-end conversation and single-line input level. The completed test runs are stored with timestamps in your S3 bucket, and can be used for future comparative reviews.
- On the Amazon Lex console, choose Test results in the navigation pane.
- Choose the test result ID for the results you want to review.
On the next page, the test results will include a breakdown of results organized in four main tabs: Overall results, Conversation results, Intent and slot results, and Detailed results.
The Overall results tab contains three main sections:
- Test set input breakdown – A chart showing the total number of end-to-end conversations and single input utterances in the test set.
- Single input breakdown – A chart showing the number of passed or failed single inputs.
- Conversation breakdown – A chart showing the number of passed or failed multi-turn inputs.
For test sets run in audio modality, speech transcription charts are provided to show the number of passed or failed speech transcriptions for both single input and conversation types. In audio modality, a single input or multi-turn conversation could pass the speech transcription test, yet fail the overall end-to-end test. This can be caused, for instance, by a slot resolution or an intent recognition issue.
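Conceptually, these charts are aggregations over line-level pass/fail records. The following sketch uses assumed field names (not the Test Workbench export schema) to show how the breakdowns, and the "passed transcription but failed end to end" case, can be derived:

```python
from collections import Counter

# Hypothetical line-level records; "conversation" marks multi-turn lines.
results = [
    {"type": "single", "transcription_pass": True, "end_to_end_pass": True},
    {"type": "single", "transcription_pass": True, "end_to_end_pass": False},
    {"type": "conversation", "transcription_pass": False, "end_to_end_pass": False},
]


def breakdown(records, kind):
    """Count passed/failed end-to-end results for one input type."""
    counts = Counter("passed" if r["end_to_end_pass"] else "failed"
                     for r in records if r["type"] == kind)
    return dict(counts)


print(breakdown(results, "single"))  # {'passed': 1, 'failed': 1}
# A line can pass transcription yet fail end to end (second record above):
mixed = [r for r in results if r["transcription_pass"] and not r["end_to_end_pass"]]
print(len(mixed))  # 1
```

Lines like the `mixed` one point at intent recognition or slot resolution issues rather than speech transcription quality.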
Test Workbench helps you drill down into conversation failures that can be attributed to specific intents or slots. The Conversation results tab is organized into three main areas, covering all intents and slots used in the test set:
- Conversation pass rates – A table used to visualize which intents and slots are responsible for possible conversation failures.
- Conversation intent failure metrics – A bar graph showing the top five worst performing intents in the test set, if any.
- Conversation slot failure metrics – A bar graph showing the top five worst performing slots in the test set, if any.
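The "top five worst performing" ranking amounts to counting failures per intent (or slot) and sorting. A minimal sketch over hypothetical per-line results:

```python
from collections import defaultdict

# Hypothetical per-line results attributing each line to an intent.
lines = [("BookRoom", False), ("BookRoom", True), ("OrderPizza", False),
         ("OrderPizza", False), ("CheckBalance", True)]

failures = defaultdict(int)
for intent, passed in lines:
    if not passed:
        failures[intent] += 1

# Top five worst performing intents by failure count, as in the bar graph.
top5 = sorted(failures.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top5)  # [('OrderPizza', 2), ('BookRoom', 1)]
```

The same counting applies per slot for the slot failure graph.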
Intent and slot results
The Intent and slot results tab provides drill-down metrics for bot dimensions such as intent recognition and slot resolution.
- Intent recognition metrics – A table showing the intent recognition success rate.
- Slot resolution metrics – A table showing the slot resolution success rate, by
You can access a detailed report of the executed test run on the Detailed results tab. A table is displayed to show the actual transcription, output intent, and slot values in a test set. The report can be downloaded as a CSV for further analysis.
The line-level output provides insights to help enhance the bot design and improve accuracy. For instance, misrecognized or missed speech inputs such as branded words can be added to the custom vocabulary of an intent or as utterances under an intent.
To further improve conversation design, you can refer to this post, which outlines best practices on using ML to create a bot that will delight your customers by accurately understanding them.
In this post, we introduced Test Workbench for Amazon Lex, a native capability that standardizes the chatbot automated testing process and allows developers and conversation designers to streamline and iterate quickly through bot design and development.
We look forward to hearing how you use this new functionality of Amazon Lex and welcome feedback! For any questions, bugs, or feature requests, please reach out to us via AWS re:Post for Amazon Lex or your AWS Support contacts.
About the authors
Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.
Grazia Russo Lassner is a Senior Consultant with the AWS Professional Services Natural Language AI team. She specializes in designing and developing conversational AI solutions using AWS technologies for customers in various industries. Outside of work, she enjoys beach weekends, reading the latest fiction books, and family.