Assigning taxonomy codes in the Controls databases
Who is this article for?
Researchers and analysts working with the Internal Controls (SOX 404) and Disclosure Controls (SOX 302) databases.
Internal Controls or Disclosure Controls subscription required.
Ideagen Audit Analytics analysts assign taxonomy codes to controls records through a text-driven, human-reviewed process that categorises weakness types disclosed in SOX 302 and SOX 404 filings.
This article explains the assignment process in more detail.
Overview
Analysts assign taxonomy codes by reviewing disclosure text from each filing. A machine learning model recommends tags to assist the process, but analysts review and confirm 100% of those recommendations - the final determination always rests with an analyst.
A single record can carry multiple codes across different categories, reflecting multiple types of issues disclosed in a single filing. The Internal Controls (SOX 404) and Disclosure Controls (SOX 302) databases use distinct taxonomy structures, with some code groups shared between them and others specific to each.
Note
This article assumes familiarity with the Internal Controls (SOX 404) and Disclosure Controls (SOX 302) databases. For background on how those databases relate to each other, see Understanding SOX 302 vs. SOX 404 databases.
Understanding taxonomy
The taxonomy fields in the controls databases are among the most analytically rich - and most commonly misunderstood. Researchers analysing patterns in weakness types, or comparing taxonomy distributions across firms, industries, or time periods, need to understand what the taxonomy categories represent, how they are assigned, and why the number of taxonomy codes on a record is not the same as the number of distinct weaknesses.
Taxonomy structure
The taxonomy is organised into code groups, each addressing a different dimension of the disclosure. The Internal Controls and Disclosure Controls databases share some code groups and have distinct ones for the issues specific to each.
Acc - Accounting Rule Application Failures (both databases)
Acc codes capture accounting-specific failures: situations where a company's controls were insufficient to ensure correct application of GAAP to particular accounting areas. There are 28 Acc codes covering areas including revenue recognition, tax accounting, inventory, depreciation, consolidation, derivatives, lease accounting, and acquisition and merger accounting, among others.
These codes reflect what went wrong from an accounting perspective - the subject matter area where the failure occurred. The same Acc codes appear in both the Internal Controls and Disclosure Controls databases.
IC - Internal Control Weakness (IC database only)
IC codes capture structural and process failures in a company's internal control environment - issues that speak to the underlying control infrastructure rather than the accounting error itself. Examples include inadequate documentation of accounting policies and procedures, insufficient internal audit function, segregation of duties failures, IT access and security issues, management competency concerns, untimely account reconciliations, and evidence of regulatory investigations.
These codes reflect how the control environment failed - the structural or organisational conditions that allowed issues to occur or go undetected.
DC - Disclosure Control Weakness (DC database only)
DC codes capture issues specific to the disclosure controls environment as assessed under SOX 302 - failures in how a company processes, accumulates, and reports information required for external disclosure. Examples include financial close process and timeliness issues, insufficient management review, board and audit committee governance concerns, personnel inadequacies, IT access and security issues in the disclosure context, internal investigations, and evidence of restatements or 404 adverse reports.
While DC codes address some of the same broad themes as IC codes - personnel, IT, governance - they are distinct codes applied specifically in the context of the SOX 302 assessment. You should not assume that similarly named IC and DC codes are equivalent across databases.
Other (DC database only)
Two catch-all codes cover situations that do not fit cleanly into the Acc or DC categories: entity-level control issues and registration or debt issuance issues. These are used infrequently and typically reflect unusual circumstances in the disclosure.
Ex - Exemptions and Special Situations (both databases)
Ex codes apply to records involving special filing circumstances rather than substantive weaknesses - for example, acquisitions during the past year that limit the scope of the IC assessment, management-only opinions where no auditor attestation is required, or multiple intra-year assessments for the same company. These codes do not indicate a weakness; they provide context about the scope or completeness of the assessment.
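If you work with taxonomy codes programmatically, it can help to bucket each record's codes by group so that Ex codes, which describe filing circumstances rather than weaknesses, stay separate from Acc, IC, and DC codes. The sketch below is illustrative only. It assumes each code is stored as a string beginning with its group prefix (for example "Acc - Revenue recognition issues"), and the helper names `code_group` and `group_codes` are hypothetical. Check the Taxonomy of Issues tab in the data dictionary for how codes are actually represented in your download or feed.

```python
from __future__ import annotations

from collections import defaultdict

# Assumption (hypothetical): each code is a string of the form
# "<group> - <description>", e.g. "Acc - Revenue recognition issues".
# Verify the real representation against your data dictionary.
KNOWN_GROUPS = {"Acc", "IC", "DC", "Ex"}


def code_group(code: str) -> str:
    """Return the group prefix of a code, or "Other" if it is not recognised."""
    prefix = code.split(" - ", 1)[0].strip()
    return prefix if prefix in KNOWN_GROUPS else "Other"


def group_codes(codes: list[str]) -> dict[str, list[str]]:
    """Bucket one record's taxonomy codes by group."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for code in codes:
        grouped[code_group(code)].append(code)
    return dict(grouped)


# Placeholder labels for illustration, not the exact data dictionary wording.
record_codes = [
    "IC - Accounting documentation, policy and procedures",
    "Acc - Revenue recognition issues",
    "Ex - Acquisition excluded from assessment",
]
print(group_codes(record_codes))
```

Mapping unrecognised prefixes to "Other" is a simplifying choice for the sketch; the two catch-all codes in the Disclosure Controls database may be labelled differently in practice.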
Assignment process
Taxonomy assignment draws on both a machine learning model and analyst judgement. The model recommends tags based on the disclosure text, but analysts review and confirm every recommendation - no code is applied without human review.
The analyst reads the full disclosure text for the record and evaluates the model's suggestions against the specific language used. Because disclosure language is qualitative and varies significantly across companies and over time, the review requires strong accounting knowledge and reading comprehension. Companies describe the same underlying issues in different ways, use varying levels of specificity, and sometimes disclose issues that span multiple taxonomy categories.
Multiple codes are common on a single record. A company might disclose a material weakness involving both a failure to document accounting policies (IC - Accounting documentation, policy and procedures) and incorrect application of GAAP to revenue transactions (Acc - Revenue recognition issues) - resulting in codes from both groups on the same record.
Records where controls are effective receive no taxonomy codes, with the exception of Ex codes for special filing circumstances. A record with no taxonomy codes does not indicate a data gap - it means no weaknesses or deficiencies were identified.
Significant deficiencies alone do not result in a finding of ineffective controls under SOX 404 and generally do not trigger taxonomy assignment. They may be mentioned in the IC disclosure text, but in most cases they do not produce taxonomy codes.
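Because an empty taxonomy field is meaningful, it is worth distinguishing three situations before filtering or counting records: records with at least one non-Ex code, records carrying only Ex codes, and records with no codes at all. A minimal sketch follows, again assuming a hypothetical "<group> - <description>" code format; the function name `record_status` and its status labels are illustrative and are not fields in the databases.

```python
from __future__ import annotations


def record_status(codes: list[str]) -> str:
    """Classify a record by its taxonomy codes (hypothetical labels).

    Assumes codes look like "<group> - <description>", with Ex codes
    marking special filing circumstances rather than weaknesses.
    """
    groups = {code.split(" - ", 1)[0].strip() for code in codes}
    if groups - {"Ex"}:
        # At least one Acc, IC, DC, or other non-Ex code is present.
        return "weakness coded"
    if groups:
        # Only Ex codes: context about the assessment, not a weakness.
        return "no weakness coded; special circumstances (Ex only)"
    # An empty list is not missing data. Records whose text mentions only
    # significant deficiencies generally end up here too.
    return "no weakness coded"


print(record_status([]))
print(record_status(["Ex - Management-only opinion"]))
print(record_status(["Acc - Revenue recognition issues"]))
```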
Interpreting the Number of Weaknesses field
Researchers often expect the Number of Weaknesses field to correspond to the number of taxonomy codes on a record. It does not.
Number of Weaknesses reflects the analyst's count of distinct material weaknesses described in the disclosure text - not the number of codes assigned. A single weakness can result in multiple taxonomy codes if it involves failures across more than one area. Conversely, related issues described together in the text may be treated as a single weakness even if they carry multiple codes.
The relationship between taxonomy code counts and Number of Weaknesses is not one-to-one. Research relying on either field should account for this distinction to avoid double-counting or miscounting.
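To see how the two fields diverge, the sketch below encodes the earlier example of a single material weakness that generates both an IC code and an Acc code. The field names `number_of_weaknesses` and `taxonomy_codes` are hypothetical stand-ins for the actual columns, and the records are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ControlsRecord:
    # Hypothetical field names; map them to the columns in your data dictionary.
    number_of_weaknesses: int
    taxonomy_codes: list[str] = field(default_factory=list)


def total_weaknesses(records: list[ControlsRecord]) -> int:
    """Sum the analyst-assigned weakness counts."""
    return sum(r.number_of_weaknesses for r in records)


def total_codes(records: list[ControlsRecord]) -> int:
    """Sum the number of taxonomy codes. This is NOT a weakness count."""
    return sum(len(r.taxonomy_codes) for r in records)


# Made-up records for illustration only.
records = [
    # One material weakness spanning documentation and revenue recognition.
    ControlsRecord(1, ["IC - Accounting documentation, policy and procedures",
                       "Acc - Revenue recognition issues"]),
    # An effective record: no weaknesses and no codes.
    ControlsRecord(0, []),
]
print(total_weaknesses(records), total_codes(records))  # 1 2
```

Counting codes here would report two weaknesses where the analyst counted one, which is exactly the double-counting the Number of Weaknesses field avoids.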
Accounting for taxonomy evolution
The taxonomy has been refined over time as disclosure language, accounting standards, and internal control practices have evolved. Researchers conducting long-horizon studies should be aware that specific codes may have been added, consolidated, or modified during their study period. The data dictionary reflects the current taxonomy; for questions about historical taxonomy changes that may affect a specific analysis, contact support@auditanalytics.com.
Understanding taxonomy codes in your data
To review taxonomy code definitions:
- Go to the Taxonomy of Issues tab in the data dictionary for each database.
- Review the definitions for every code.
Note
Separate data dictionaries exist for the platform download and data feeds.
To view the full disclosure text:
- Access the platform.
- Navigate to the search results page or company profile.
- Select the link to view the disclosure text.
Note
The full disclosure text is not available in the CSV export. For programmatic access, the text is available in Feed 11 (SOX_404_INTERNAL_CONTROLS) for IC records and Feed 10 (SOX_302_DISCLOSURE_CONTROLS) for DC records, as well as via WRDS.
Reviewing this text can help you understand the context behind a taxonomy assignment for a specific record.
Conducting aggregate analysis using taxonomy fields
When analysing taxonomy data, keep these considerations in mind:
- Code counts per record are not equivalent to weakness counts
- A single weakness can generate codes from more than one category
- Taxonomy language has evolved over time; when comparing across long time horizons, check whether specific codes were added or modified during your study period
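One way to respect the first two points is to measure taxonomy patterns at the record level: for each code group, the share of records carrying at least one code from that group, counting each record once per group no matter how many codes it has. The sketch below assumes the same hypothetical "<group> - <description>" code format and a hypothetical helper name, `group_incidence`. It keeps records with no codes in the denominator, since an empty field means no weaknesses were identified, and it excludes Ex codes by default because they describe filing circumstances rather than weaknesses.

```python
from __future__ import annotations

from collections import Counter


def group_incidence(
    records: list[list[str]], exclude: frozenset[str] = frozenset({"Ex"})
) -> dict[str, float]:
    """Share of records with at least one code from each group.

    Each inner list holds one record's taxonomy codes, assumed to look like
    "<group> - <description>" (hypothetical format). Records with no codes
    stay in the denominator.
    """
    if not records:
        return {}
    counts: Counter[str] = Counter()
    for codes in records:
        # A set, so a record with several Acc codes still counts once for Acc.
        groups = {code.split(" - ", 1)[0].strip() for code in codes}
        counts.update(groups - exclude)
    return {group: n / len(records) for group, n in counts.items()}


# Placeholder code labels for illustration only.
sample = [
    ["Acc - Revenue recognition issues", "Acc - Tax accounting",
     "IC - Segregation of duties"],
    ["IC - Segregation of duties"],
    ["Ex - Management-only opinion"],
    [],
]
print(group_incidence(sample))
```

For this sample the result gives Acc 0.25 and IC 0.5: Acc appears in one of the four records and IC in two, even though the first record carries three codes. For long-horizon studies, the same calculation can be run separately for each period to spot codes that appear or disappear, though any apparent change should be confirmed against the current taxonomy and with support.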