Combining data across databases on the Web platform
Who is this article for?
Users who want to combine datasets.
Web subscription is required.
The Ideagen Audit Analytics platform organises data by database, each with its own search page and filters. Combining data across databases is a multi-step process that happens in your spreadsheet after you download your results.
This article covers the most common approaches, from using company profiles for quick lookups to building matched datasets across multiple downloads.
1. Company profile lookups
If you're researching a single company or audit firm, the Company Profile is the fastest way to see data across databases without downloading anything.
To access a company profile:
- Go to the dashboard.
- Search for the entity using Search SEC Companies (by name, ticker, or CIK) or Search All Entities (for non-SEC entities including auditors, European companies, funds, and more).
The company profile brings together information from multiple databases in one view, including audit history, financial data, governance, and legal activity. For audit firms, the profile includes client summaries, inspection reports, office locations, and affiliate information.
Note
Company profiles are useful for quick research and one-off lookups. For systematic analysis across a population of entities, you'll need to download and combine data in a spreadsheet.
2. Analysing the same population across multiple databases
This is the most common scenario: you've built a population in one database and want to pull related data from another.
To combine data from multiple databases:
- Start with whichever database is closest to your primary research question.
- Apply your filters (industry, market cap, date range, etc.).
- Download the CSV.
This gives you your base population with identifiers you can reuse.
To carry your population to the next database:
- Copy the identifiers from your first download (typically CIK codes or tickers for SEC data).
- Paste them into the search field on the next database's search page.
The identifier you use depends on the entity type and market:
- SEC registrants: CIK or ticker (available on most SEC database search pages via the Search by Ticker or CIK Code field)
- Canadian companies: SEDAR Issuer Number
- Funds: Series Number
- European companies, non-profits, broker-dealers, and other entity types: Check the search page for the available identifier field (it varies by database)
To complete the second database search:
- Apply any additional filters relevant to the second database (date range, specific content filters, etc.).
- Download that CSV as well.
3. Matching downloads
With both CSVs open, you need to match rows across the two files. How you do this depends on what you're combining.
To identify the shared fields:
- Locate the company identifier (CIK, ticker, etc.) that appears in both downloads.
- Identify additional fields needed to match at the right level of detail.
Most databases have multiple rows per company (one per year, one per event, one per filing, etc.). To match at the right level of detail, you'll typically need to combine the company identifier with one or more additional fields:
- Company identifier + fiscal year or date: for most combinations (e.g., matching Audit Fees to Audit Opinions for the same company and fiscal year)
- Company identifier + auditor key: when combining auditor-level data (e.g., matching Auditor Changes to Audit Fees where you need to align on which auditor relationship you're looking at)
- Company identifier + date range: when one database uses point-in-time dates and the other covers a range of periods (e.g., matching an internal control opinion to restatements that overlap with that fiscal year)
To build a composite key:
- Create a new column in your spreadsheet.
- Concatenate the shared fields into a single value (e.g., combining CIK and fiscal year into something like 12345_2024).
- Create the same composite key in both downloads using the same fields and format.
- Use a VLOOKUP or pivot table to match rows across the two files.
If your analysis spans more than two databases, repeat the process for each additional dataset, building your composite key consistently across all downloads.
4. Building a population from scratch
Sometimes you don't start with a specific population. Instead, you're looking for entities that meet conditions across multiple databases (e.g., companies that changed auditors and had a restatement in the same period).
To build a cross-database population:
- Start with whichever database has the more restrictive criteria (this gives you a smaller, more targeted starting population).
- Download the results.
- Extract the identifiers.
- Run those identifiers against the second database with its own filters.
- Match the downloads using a composite key as described in the previous section.
Your matched results are the population that meets both criteria.
Tip
Starting with the more restrictive search reduces the number of records you need to process in subsequent steps. If one condition applies to hundreds of companies and the other applies to dozens, start with the dozens.
5. Using peer groups across databases
If you're benchmarking a company against its peers across multiple databases, the Company Profile peer download gives you a ready-made population.
To use peer groups:
- Download the peer list from one company's profile.
- Use those identifiers to search across whatever databases you need.
This saves you from manually building a comparable company list.
6. Tips for successful data combination
- Check that both downloads include the fields you need for matching: Not every CSV includes the same set of identifiers or date fields. Before building your composite key, confirm that the fields you plan to use are present in both downloads. If a field isn't available in the CSV, it may be available on the search page as a filter, or you may need to adjust your approach
- Be consistent with your composite key: Use the same fields, in the same order, with the same formatting in both spreadsheets. Small differences (like date formats or leading zeros in CIK codes) will cause matches to fail
- Start broad, then narrow: If you're not sure how many results to expect, run your searches with fewer filters first to see the total count, then refine. It's easier to filter down a larger download than to realise you were too restrictive after matching
- Be mindful of download limits: Some databases have row limits on CSV downloads. If your results exceed the limit, use filters to narrow your search before downloading
- Save your searches: If you plan to repeat this analysis over time, save each database search so you can rerun and re-download updated data without rebuilding your filters
- Check the search guide: Different databases have different filters and different fields in the CSV download. The search guide documents both