
Content Classification Training Framework for Intelligent Automation
Overview
Features
The purpose of this asset is to address one of the first challenges posed by unstructured content: how to separate and cleanse data sets into a usable form so that they can be easily fed into machine learning classification models.
Specifically targeted for use within the Tungsten Intelligent Automation Platform, this framework provides the components required to ingest, segment and categorise individual paragraphs of text from unstructured content and produce the text files required by Tungsten Transformation to build and train the classification models.
The content provided here can be used as an end-to-end framework for POC exercises or solution implementations, enabling rapid creation of training data through an intuitive user interface. This interface can be presented to the solution designer or, more importantly, to the knowledge workers themselves, who have a much deeper understanding of the inbound content and associated categories.
This framework also serves as a good enablement example, illustrating some of the key capabilities within both Tungsten TotalAgility and RPA that can be applied to many other use cases, demos and solutions. It includes:
• Automatic paragraph detection using the capabilities introduced in TotalAgility 7.8
• Rich validation UI allowing users to easily navigate through blocks of unstructured content
• Use of actions and custom .NET components within IA validation forms
• Ability to introduce document search functionality into IA validation forms
• Use of RPA to perform data export through TotalAgility Web Services
Benefits
By some estimates, up to 80% of corporate data is unstructured, in the form of natural language content contained in reports, contracts, images, insurance policies and claims, medical records, emails, chat transcripts, annual reports and so on. The potential value and insights contained in these types of unstructured content remain largely untapped, representing a huge opportunity for business.
Traditionally, working with these types of content or documents has required human brain power and specialist skills or training. Examples include employing financial analysts to extract and normalise market data from company financial statements, or legal secretaries to assemble case notes and historical legal rulings. A large share of the time, and therefore the cost, of these highly skilled individuals is spent on repetitive manual tasks: reading through documents to find relevant information and organising it into a usable form for the business.
This asset simplifies and speeds up the process of generating training data for these types of exercises and provides an intuitive user interface allowing non-technical business users to play the lead role in content classification training.
Technical Details
Inputs
Representative document samples for the required classifications
Outputs
Labelled text-based training files for every paragraph found within the sample content
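To make the input-to-output flow concrete, the following is a minimal Python sketch of the general idea: splitting sample text into paragraphs and writing one labelled text file per paragraph. It is not the framework's implementation. Splitting on blank lines is only a stand-in for TotalAgility's automatic paragraph detection, and the file layout (one folder per category, one .txt file per paragraph) is an assumption for illustration, not the documented format required by Tungsten Transformation. The function names split_paragraphs and write_training_files, and the sample labels, are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def split_paragraphs(text):
    """Split raw text into paragraphs separated by one or more blank lines.

    Assumption: blank-line splitting stands in for real paragraph detection.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse internal line breaks so each paragraph becomes a single line.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def write_training_files(labelled_paragraphs, out_dir):
    """Write each (label, paragraph) pair to out_dir/<label>/<label>_NNNN.txt.

    Assumption: this folder-per-category layout is illustrative only.
    Labels are used as folder names, so they must be valid path components.
    """
    out_dir = Path(out_dir)
    counters = {}
    written = []
    for label, paragraph in labelled_paragraphs:
        counters[label] = counters.get(label, 0) + 1
        folder = out_dir / label
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{label}_{counters[label]:04d}.txt"
        path.write_text(paragraph + "\n", encoding="utf-8")
        written.append(path)
    return written


# Example usage with made-up sample content and hypothetical category labels.
sample = "The policy covers water damage.\n\nClaims must be filed\nwithin 30 days."
paragraphs = split_paragraphs(sample)
labels = ["Coverage", "ClaimsProcess"]  # in practice chosen by a knowledge worker

with tempfile.TemporaryDirectory() as tmp:
    files = write_training_files(zip(labels, paragraphs), tmp)
    for f in files:
        print(f.relative_to(tmp))
```

In the framework itself the labelling step is performed by a user in the validation UI; here it is reduced to a hard-coded list purely to keep the sketch self-contained.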
Geographic Availability
Additional Information
PLEASE NOTE: Tungsten Labs is independent of Tungsten Automation and this listing is not officially supported or maintained.





