Identify relevant report information from unstructured data
The Big Picture
A leading bank ran projects on socioeconomic issues that generated a large volume of unstructured data spread across many files. Given a database of documents and a set of concepts as use cases, the challenge was to analyze the documents and identify insights related to ‘road safety’.
To solve the company’s challenge, the solution used an analysis and assessment framework built on text mining to improve productivity. Deep-learning-based techniques identified semantically similar keywords, which expanded the scope of the search. The approach also cross-tabulated documents and search terms by frequency or correlation to generate cross-document heat maps. Finally, the solution computed the distribution of search term frequencies, project counts, and other statistics across countries and report start years.
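The cross-tabulation of documents and search terms by frequency can be illustrated with a minimal sketch. It uses only the Python standard library and raw counts; the document names, texts, and search terms are hypothetical, and it reproduces neither the deep-learning keyword expansion nor the correlation variant described above. The resulting table is the kind of matrix a heat map would be drawn from.

```python
import re

def term_frequencies(text, terms):
    """Count whole-word occurrences of each search term (word or phrase) in text."""
    lowered = text.lower()
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts

def cross_tab(documents, terms):
    """Build a documents-by-terms frequency table as a nested dict."""
    return {name: term_frequencies(text, terms) for name, text in documents.items()}

# Hypothetical documents and search terms, for illustration only.
documents = {
    "report_a": "Road safety audits reduced crashes. Road safety training followed.",
    "report_b": "The project funded rural schools and teacher training.",
    "report_c": "Speed limits and helmet laws improved road safety outcomes.",
}
terms = ["road safety", "crashes", "helmet"]

table = cross_tab(documents, terms)
for name, row in table.items():
    print(name, row)
```

Each row of `table` is one document and each column one search term, so a row of zeros (here `report_b`) marks a document with no hits for any term.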
As a result, the company was able to use co-occurrence statistics to identify other terms related to its search keywords. The analysis found 60% of the documents to be relevant, and the solution enabled the company to categorize the relevant documents based on ranking.
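One simple way to derive related terms from co-occurrence statistics is to count the words that appear in the same sentence as a search keyword. The sketch below assumes sentence-level co-occurrence and a small hand-picked stopword list. Both are illustrative choices rather than details of the solution described here, and the sample texts are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the case study).
STOPWORDS = {"the", "and", "of", "in", "a", "to", "for", "on", "with"}

def cooccurring_terms(texts, keyword, top_n=5):
    """Return the words that most often share a sentence with keyword."""
    keyword = keyword.lower()
    counts = Counter()
    for text in texts:
        for sentence in re.split(r"[.!?]+", text.lower()):
            if keyword in sentence:
                # Drop the keyword itself so only its neighbours are counted.
                remainder = sentence.replace(keyword, " ")
                words = re.findall(r"[a-z]+", remainder)
                counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# Hypothetical texts, for illustration only.
texts = [
    "Road safety depends on helmet use. Helmet laws support road safety.",
    "Budget planning for schools.",
    "Road safety improves with speed limits and helmet enforcement.",
]

related = cooccurring_terms(texts, "road safety")
print(related)
```

Words from sentences that never mention the keyword contribute nothing, so frequent neighbours such as "helmet" rise to the top and can be fed back in as additional search terms.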