Understanding Document Analysis: A Handy Guide to a Business Analytics Technique

Business analysts play an important role in any product-centric organization. While planning, designing, executing, and monitoring a project from scratch, a business analyst team creates a large volume of content using information from various independent and related sources. Tracing the exact source of every piece of information used in project management by hand would take years. Luckily, much of this content assimilation and analysis can now be automated. The technique used to analyze documents and their content is referred to as “document analysis”, an important part of business analysis that pieces together vital information before outcomes are elicited from business analytics tools. If you are pursuing certification from the best business analyst course, you will gain first-hand experience of how document analysis works and why you should focus on this critical aspect of data-driven project management.

What is Document Analysis?

In business analytics, two types of analysis techniques dominate the course curriculum. These are:

Qualitative analysis
Quantitative analysis

Qualitative analysis has many branches, and document analysis is one of them. It involves systematically collecting and scanning content published in various documents to answer specific qualitative questions. The underlying data comes from distinct focus groups such as website users, e-commerce shoppers, sales prospects, form and survey respondents, and interviewees, and even from random observations segmented into distinct cohorts. In any data analysis project, data-based document analysis serves as the primary reference tool for qualitative research in business intelligence domains.

How to identify documents / content to be used for document analysis in BA?

Today, we are flooded with different types of content, and given the democratized nature of the internet and the World Wide Web, it is fairly easy to quote and cite the source of information used in creating a document for research. Therein lies the challenge for every business analyst: which documents should be used for qualitative research, and how can you ensure they fit the project’s requirements?

Let’s identify the different groups of documents that can be used for analysis in a BI project.

#1 Records available in the public domain (Open source)

90% of business analysts refer to public documents for their projects. These include a company’s financial records, stock prices, recent mergers and acquisitions records, employee handbooks, marketing / advertising efforts, and social media outreach.

Many companies and institutions also publish e-books, whitepapers, case studies, and other informational assets on their websites or third-party content management channels. If these are free to download, they fall under the category of public documents.

Pros:

Easy to access and verify
Clear citations are available
An organic analysis is possible using this information

Cons:

Information is often dated, with little effort made to update it regularly
Different authors may publish different types of content with contradictory inferences and conclusions
The methodology of public document studies differs from company to company

#2 Personal Records

These are documents authored by individuals, such as researchers and journalists, based on their own project requirements. They serve as secondary sources of reference in BA. They can also include blogs, video reviews, product testimonials, social media posts (LinkedIn / Reddit / Quora comments), news interviews, incident logs, and guest posts published on verified and reputable non-partisan publication platforms.

#3 Enterprise data

This is the costliest type of information and calls for a whole different level of data analysis. In the BA domain, qualitative research depends on the relevance and accuracy of data. For example, in lead generation, you would come across technographic data or account-based marketing, where business analytics is performed on specific information pertaining to an individual, such as job title, location, experience, and sales quota handled. Enterprise data also includes information related to investments in specific tools and technologies for marketing, sales, IT, security, finance, and so on. Companies spend a lot to safeguard this information from leaks, because a leak carries a huge opportunity cost given the investments made to acquire, analyze, and store the data. The cost of enterprise data goes up when customers and partners are involved. If your BA project is aiming for the highest form of business intelligence, you should try to gather information that is a mix of public documents, enterprise data, and personal records.

Pros:

Ranks highest in accuracy and verifiability
Doesn’t deteriorate as quickly as other forms of data
Gives the best results in AI and business automation, as it can help to identify and reduce algorithmic biases

Cons:

Costly in every way
Compliance, governance, and security frameworks are tight
Using this data illegally or without prior approval can lead to legal proceedings against the BA teams and the organizations they belong to

Challenges related to document analysis

It has been observed that novice BA teams prefer to take the path of “least resistance” in their analysis. Document analysis philosophies that have emerged in recent years rely so heavily on automation and data-driven techniques that it becomes difficult to link unlabelled data with qualitative approaches. Without AI and machine learning algorithms, such document analysis approaches risk falling prey to “biases” and anomalies.

To address these challenges, trainers are adopting newer approaches to simplify business analytics based on document analysis. We have listed the top approaches here, which you can practice in some of the best business analyst courses available online.

RTM Assessment

A requirements traceability matrix (RTM) can be traced in forward and backward directions, based on ease of traceability and the roles assigned to software engineers, managers, and customers, which means you would have to work with:

Business document
Technical document

In SaaS and cloud businesses, the RTM is a hugely popular document analysis tool. It is a traceability matrix, created in tabular form, that expresses the relationship between two or more documents used in the BA project. In the SDLC, the RTM assessment would invariably include documents provided by the client, the project head, and the software testing team. (Smaller teams would get software testing documents approved by the project head, so in the end there are two documents in the RTM.)
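
To make the idea of a traceability matrix concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the requirement IDs (BR-*), the artifact IDs (TC-*), the links between them, and the helper names build_traces, untraced, and render_matrix are all hypothetical and do not come from any particular RTM tool. It shows forward tracing (business requirement to technical artifact), backward tracing (artifact to requirement), and a simple tabular view.

```python
from collections import defaultdict

# Hypothetical IDs for illustration: BR-* are business requirements,
# TC-* are technical artifacts such as test cases.
REQUIREMENTS = ["BR-1", "BR-2", "BR-3"]
ARTIFACTS = ["TC-1", "TC-2", "TC-3"]
LINKS = [("BR-1", "TC-1"), ("BR-1", "TC-2"), ("BR-2", "TC-2")]


def build_traces(links):
    """Build forward (requirement -> artifacts) and backward
    (artifact -> requirements) lookups from the link pairs."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for requirement, artifact in links:
        forward[requirement].add(artifact)
        backward[artifact].add(requirement)
    return forward, backward


def untraced(items, trace):
    """Return the items that have no link in the given direction."""
    return [item for item in items if not trace.get(item)]


def render_matrix(requirements, artifacts, forward):
    """Render the matrix as text: 'X' marks a traced pair, '.' marks none."""
    lines = ["      " + " ".join(artifacts)]
    for requirement in requirements:
        linked = forward.get(requirement, set())
        marks = ["X" if artifact in linked else "." for artifact in artifacts]
        lines.append(requirement.ljust(6) + " ".join(mark.ljust(4) for mark in marks))
    return "\n".join(lines)


forward, backward = build_traces(LINKS)
print(render_matrix(REQUIREMENTS, ARTIFACTS, forward))
print("Requirements with no artifact:", untraced(REQUIREMENTS, forward))
print("Artifacts with no requirement:", untraced(ARTIFACTS, backward))
```

With this sample data, BR-3 shows up as a requirement that nothing covers, and TC-3 as an artifact that traces back to no requirement. Gaps like these are what an RTM assessment is meant to surface.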

CATA

Computer-aided text analysis (CATA) is a software-based document analysis approach that covers broad categories of documents such as system functioning reports, system defunct reports, log analyses, system feature descriptions, UI/UX reports, custom networking and IT security audits, performance reviews, and so on.
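
One of the simplest forms of computer-aided text analysis is dictionary-based coding, where software counts how often keywords from predefined categories appear in each document. The Python sketch below illustrates that one idea only. The codebook, the document names, and the sample text are made up for illustration, and real CATA tools typically go well beyond keyword counting.

```python
import re
from collections import Counter

# Hypothetical codebook: each category maps to keywords that signal it.
CODEBOOK = {
    "security": {"audit", "breach", "firewall"},
    "performance": {"latency", "slow", "timeout"},
}

# Made-up document snippets, for illustration only.
DOCUMENTS = {
    "it_audit.txt": "The audit found one firewall rule left open.",
    "ux_report.txt": "Users report slow pages and a timeout at login.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def code_document(text, codebook):
    """Count how many tokens in the text fall under each codebook category."""
    counts = Counter()
    for token in tokenize(text):
        for category, keywords in codebook.items():
            if token in keywords:
                counts[category] += 1
    return counts


for name, text in DOCUMENTS.items():
    print(name, dict(code_document(text, CODEBOOK)))
```

Keeping the codebook as plain data means an analyst can refine the categories and keywords without touching the counting logic.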

In a business analytics course, you should try to gather as much information as possible through CATA, which allows BA teams to further refine content in the form of text, images, videos, and audio files. CATA tools have given rise to a new family of BA tools called video analytics and intelligence, which are gaining massive popularity in the healthcare, media, broadcasting, civilian security, IoT, telecom, automotive, and real estate industries.