# State of Competition in New Zealand -- 2026 Dataset

## Overview

The file `soc_report_2026_data.csv` contains industry-level competition indicators derived from New Zealand's Longitudinal Business Database (LBD). The data was produced for the New Zealand Commerce Commission's State of Competition report 2026. It combines 15 thematic datasets into a single long-format CSV covering concentration, business dynamism, performance, and related measures for New Zealand industries between 2000 and 2024 (some datasets cover shorter periods).

**These results are not official statistics.** They have been created for research purposes from the Longitudinal Business Database (LBD) which is carefully managed by Stats NZ. For more information about the LBD please visit [https://www.stats.govt.nz/integrated-data/](https://www.stats.govt.nz/integrated-data/). The results are based in part on tax data supplied by Inland Revenue to Stats NZ under the Tax Administration Act 1994 for statistical purposes. Any discussion of data limitations or weaknesses is in the context of using the IDI for statistical purposes, and is not related to the data's ability to support Inland Revenue's core operational requirements. 

## Key concepts

### Firms (PENTs)

The unit of analysis is the "permanent enterprise" (PENT) from the Fabling-Maré productivity datasets derived from the LBD.[^1] A PENT is a longitudinally consistent representation of a firm, constructed to track individual firms over time even as their administrative identifiers change due to restructuring or ownership changes.

All analysis was based on private for-profit enterprises only. Public sector and other non-profit organisations are not included. 

### Industry classification (ANZSIC06)

Industries are classified using the Australian and New Zealand Standard Industrial Classification 2006 (ANZSIC06). The dataset uses two levels of this hierarchy:

- **Division** (1-digit): Up to 19 broad industry groups identified by a single letter (A through S). For example, `C` = Manufacturing.
- **Class** (4-digit): Up to 488 detailed industry classes identified by a 7-character code consisting of the division letter followed by 6 digits. For example, `C241100` = Iron Smelting and Steel Manufacturing.

**All analysis was done at the class level.** Division results are aggregated from the results for their classes: averages and indices are output-weighted averages across the relevant classes, while counts of firms are totals. The division-level results reflect all classes in each division, including classes that could not be reported for confidentiality reasons (see below).

Because of data limitations, including low numbers of for-profit enterprises, some measures are not available for the following divisions or for the classes within them:

- Division O: Public Administration and Safety
- Division P: Education and Training
- Division Q: Health Care and Social Assistance

### Output vs revenue

Several datasets offer results calculated using two alternative measures of firm size:

- **Output**: Gross nominal output from the PENT productivity dataset, based on the Annual Enterprise Survey and company tax data. 
  - Output data better reflects firm ownership structures but has narrower coverage (16 divisions, ~450 classes). 
  - Some firms are excluded from the productivity dataset due to data quality issues. Population weights are provided to adjust for these missing firms (see below), but the weights cannot correct for missing firms in some competition measures (e.g. concentration measures).
- **Revenue**: Nominal GST sales less zero-rated sales from the PENT GST dataset. 
  - Revenue data has broader coverage (19 divisions, ~480 classes) but reflects GST reporting structures rather than ownership. For example, retail chains may report GST sales for each individual store, which may overstate the intensity of competition. 
  - GST data may not accurately reflect firm size in industries that supply GST-exempt products (e.g. financial services).

When both output-based and revenue-based measures are available, the `method` column distinguishes between them. The analysis in the State of Competition report uses only the *output-based* measures, for reasons discussed in the report.

### Suppression

Some values are suppressed for confidentiality. In these rows, `value` is empty and `suppressed` is `S`. Suppression is based on rules set by Stats NZ to protect the identity of individual firms, typically in industry-years with small numbers of firms. 

- All results based on fewer than 3 firms are suppressed. 
- Results are suppressed where they would enable firms in an industry to learn information about other individual competitors.
- See the [Stats NZ Microdata output guide](https://www.stats.govt.nz/assets/Methods/Microdata-Output-Guide-2020-v5-Sept22update.pdf) for a complete description of the confidentiality rules applied.

### Rounding

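All counts of firms are randomly rounded to base 3. Each count is rounded up or down to an adjacent multiple of 3, so a published count can differ from the true count by up to 2. This can noticeably affect results derived from small counts.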
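
The sketch below shows the size of this error. It implements base-3 random rounding as the convention is commonly described: a count that is already a multiple of 3 is unchanged, and any other count moves to the nearer multiple of 3 with probability 2/3 and to the further one with probability 1/3. This is an illustration under that assumption, not Stats NZ's production code. The function name `rr3` is hypothetical.

```python
import random


def rr3(count: int, rng: random.Random) -> int:
    """Randomly round a non-negative count to base 3 (illustrative sketch).

    Multiples of 3 are unchanged. Otherwise the count moves to the nearer
    multiple of 3 with probability 2/3 and to the further one with
    probability 1/3, so the result is unbiased and within 2 of the true count.
    """
    remainder = count % 3
    if remainder == 0:
        return count
    lower = count - remainder
    # remainder 1: the lower multiple is nearer, so round down with probability 2/3.
    # remainder 2: the upper multiple is nearer, so round down with probability 1/3.
    p_down = (3 - remainder) / 3
    return lower if rng.random() < p_down else lower + 3


rng = random.Random(2026)
print([rr3(4, rng) for _ in range(10)])  # each draw is 3 or 6
```

With a true count of 4, the published value is either 3 or 6. A ratio of two small rounded counts can therefore be well away from the true ratio.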

### Population weights

Some datasets apply population weights to adjust for firms that are present in administrative data but absent from the productivity dataset due to data quality issues. This makes aggregates more representative of the full firm population. Where relevant, this is noted in the dataset descriptions below.

### Financial years

All results are reported for March-ending financial years. Data for firms with a different financial year end is assigned to the nearest March-ending year.

## Dataset columns

| Column | Type | Description |
|---|---|---|
| `dataset` | character | Identifies which of the 15 thematic datasets the row belongs to. |
| `anzsic_level` | character | `"division"` (1-digit ANZSIC06) or `"class"` (4-digit ANZSIC06). |
| `industry_code` | character | ANZSIC06 code. Single letter for divisions (e.g. `"C"`), 7-character string for classes (e.g. `"C241100"`). |
| `industry_description` | character | Human-readable industry name from the ANZSIC06 classification. |
| `year` | integer | Financial year ending March. For example, `2023` represents the year ended March 2023. |
| `method` | character | Measurement basis where applicable. Values: `"output"`, `"revenue"`, `"ols"`, `"fe"`, or blank. See individual dataset descriptions. |
| `group` | character | Sub-grouping where applicable. Values depend on the dataset (e.g. `"entrants"`, `"fte >= 50"`, an entry cohort year). Blank where not applicable. See individual dataset descriptions. |
| `metric` | character | The name of the measure being reported. See individual dataset descriptions for definitions. |
| `value` | numeric | The numeric value of the measure. Empty when suppressed. |
| `suppressed` | character | `"S"` if the value was suppressed for confidentiality, blank otherwise. |
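
The following is a minimal sketch of reading the file with Python's standard `csv` module. It treats suppressed values as missing and extracts one series. The helper names `load_rows` and `series` are hypothetical. The sample rows are made-up placeholders that follow the documented column layout and are not real results.

```python
import csv
import io


def load_rows(f):
    """Read the long-format CSV, converting `value` to float (None when suppressed)."""
    rows = []
    for row in csv.DictReader(f):
        raw = row["value"].strip()
        row["value"] = None if row["suppressed"] == "S" or raw == "" else float(raw)
        row["year"] = int(row["year"])
        rows.append(row)
    return rows


def series(rows, dataset, metric, industry_code, method="", group=""):
    """Return {year: value} for one series; "" matches a blank method or group."""
    return {
        r["year"]: r["value"]
        for r in rows
        if r["dataset"] == dataset
        and r["metric"] == metric
        and r["industry_code"] == industry_code
        and r["method"] == method
        and r["group"] == group
    }


# Placeholder rows only (not real values), using the documented column layout.
sample = io.StringIO(
    "dataset,anzsic_level,industry_code,industry_description,year,method,group,metric,value,suppressed\n"
    "cr5,division,C,Manufacturing,2023,output,,cr5,0.123,\n"
    "cr5,division,C,Manufacturing,2023,revenue,,cr5,0.111,\n"
    "cr5,class,C241100,Iron Smelting and Steel Manufacturing,2023,output,,cr5,,S\n"
)
rows = load_rows(sample)
print(series(rows, "cr5", "cr5", "C", method="output"))
```

For the real file, pass `open("soc_report_2026_data.csv", newline="", encoding="utf-8")` to `load_rows`. The UTF-8 encoding is an assumption. Filter on `method` and `group` as well as `metric`, because many datasets report several variants for the same industry and year.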

## Thematic datasets

### cr5

Five-firm concentration ratio (CR5) and associated industry totals.

CR5 is the combined share of the five largest firms in an industry, expressed as a proportion. A CR5 of 0.6 means the five largest firms account for 60% of total industry output or revenue.
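
As an illustration, here is a short sketch of the CR5 calculation from a list of firm sizes (output or revenue). The firm sizes are hypothetical.

```python
def cr5(sizes):
    """Combined share of the five largest firms in total industry size (a proportion)."""
    sizes = list(sizes)
    total = sum(sizes)
    if total <= 0:
        raise ValueError("industry total must be positive")
    return sum(sorted(sizes, reverse=True)[:5]) / total


# Hypothetical industry: five large firms plus ten firms of size 4 (e.g. NZD millions).
sizes = [20, 15, 10, 10, 5] + [4] * 10
print(round(cr5(sizes), 3))  # 0.6
```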

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division and class |
| **Financial years** | 2001--2024 |
| **method** | `output` or `revenue` |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `cr5` | Five-firm concentration ratio (proportion, rounded to 3 d.p.) |
| `n_pent` | Number of active firms included in the calculation |
| `industry_total_output_millions` | Total gross nominal output of active firms (NZD millions, 1 d.p.). Present when method = `output`. |
| `industry_total_revenue_millions` | Total GST revenue of active firms (NZD millions, 1 d.p.). Present when method = `revenue`. |

### entrant_industry_shares

Industry shares of annual cohorts of entrant firms, tracked over the years after entry. For each entry cohort, the dataset reports the cohort's combined industry share in each subsequent year, allowing analysis of how quickly new entrants grow. A firm appears in at most one entry cohort.

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division (calculated at class level, aggregated to division) |
| **Financial years** | 2002--2023 |
| **method** | `output` or `revenue` |
| **group** | Entry cohort year (e.g. `"2010"` = firms that first entered in the year ended March 2010) |

#### Metrics

| Metric | Description |
|---|---|
| `entrant_share` | Combined share of industry output or revenue of firms in the entry cohort, in the reported year |

### entrant_survival

Survival rates for annual cohorts of entrant firms. Each cohort consists of the firms first observed entering an industry in a given year, and a firm appears in at most one cohort. A firm counts as surviving in a given year after entry if it reports any GST revenue or employment in that year.

The survival rate can be calculated as `n_entrants_active / n_entrants_in_cohort`. Because both counts are randomly rounded to base 3, this ratio may be inaccurate for small cohorts.
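
The sketch below computes the survival rate together with worst-case bounds. It assumes each published count is within 2 of the true count, which holds when counts are rounded to an adjacent multiple of 3. The function name and the example counts are hypothetical.

```python
def survival_rate_bounds(n_active, n_cohort, max_error=2):
    """Survival rate from two rounded counts, with worst-case bounds.

    Assumes each published count is within `max_error` of the true count.
    """
    if n_cohort <= 0:
        raise ValueError("cohort count must be positive")
    point = n_active / n_cohort
    lower = max(0, n_active - max_error) / (n_cohort + max_error)
    upper = min(1.0, (n_active + max_error) / max(1, n_cohort - max_error))
    return point, lower, upper


# Hypothetical published counts for a small and a large cohort.
for active, cohort in [(6, 9), (300, 600)]:
    point, lower, upper = survival_rate_bounds(active, cohort)
    print(f"{active}/{cohort}: {point:.3f} (true rate between {lower:.3f} and {upper:.3f})")
```

For a cohort of 9, a published rate of 0.667 is consistent with a true rate anywhere from about 0.36 to 1. For a cohort of 600, the range narrows to roughly 0.495 to 0.505.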

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division (calculated at class level, aggregated to division) |
| **Financial years** | 2002--2023 |
| **method** | Not used |
| **group** | Entry cohort year (e.g. `"2010"`) |

#### Metrics

| Metric | Description |
|---|---|
| `n_entrants_active` | Number of firms in the entry cohort still active in the reported year |
| `n_entrants_in_cohort` | Total number of firms in the entry cohort |

### firm_age_distribution

Percentiles of the distribution of ages of active firms within each industry and year. Firm age is measured based on the birth dates recorded in the LBD. Activity is based on a firm having any reported GST revenue or employment in a year.

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division and class |
| **Financial years** | 2000--2023 |
| **method** | Not used |
| **group** | Not used |

#### Metrics 

| Metric | Description |
|---|---|
| `q10_age_years` | 10th percentile of firm age (years) |
| `q25_age_years` | 25th percentile of firm age (years) |
| `median_age_years` | Median firm age (years) |
| `q75_age_years` | 75th percentile of firm age (years) |
| `q90_age_years` | 90th percentile of firm age (years) |

### hhi

Herfindahl-Hirschman Index (HHI) and associated industry totals.

HHI is calculated as 10,000 times the sum of squared market shares, with shares expressed as proportions. Values range from near 0 (many small firms) to 10,000 (monopoly).
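
The following is a minimal sketch of this formula applied to a list of firm sizes. The example industries are hypothetical.

```python
def hhi(sizes):
    """Herfindahl-Hirschman Index: 10,000 x sum of squared shares (shares as proportions)."""
    sizes = list(sizes)
    total = sum(sizes)
    if total <= 0:
        raise ValueError("industry total must be positive")
    return 10_000 * sum((s / total) ** 2 for s in sizes)


print(round(hhi([25, 25, 25, 25]), 1))  # 2500.0: four equal firms
print(round(hhi([80, 10, 10]), 1))      # 6600.0: one dominant firm
print(round(hhi([100]), 1))             # 10000.0: monopoly
```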

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division and class |
| **Financial years** | 2001--2024 |
| **method** | `output` or `revenue` |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `hhi` | Herfindahl-Hirschman Index (rounded to 1 d.p.) |
| `n_pent` | Number of active firms included in the calculation |
| `industry_total_output_millions` | Total gross nominal output of active firms (NZD millions, 1 d.p.). Present when method = `output`. |
| `industry_total_revenue_millions` | Total GST revenue of active firms (NZD millions, 1 d.p.). Present when method = `revenue`. |

### hhi_change_decomposition

Melitz-Polanec decomposition of annual changes in HHI at the division level. Because division-level HHI is an output-weighted average of the HHIs of its classes, each year's change in division HHI can be split into four additive components that reflect what happened to the division's class-level industries.

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division only |
| **Financial years** | 2002--2023 |
| **method** | `output` or `revenue` |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `unwt_mean_effect` | Change in the unweighted mean HHI across continuing classes (those present in both years) within the division |
| `cov_effect` | Change in the covariance between HHI and industry size among continuing classes (reallocation effect) |
| `entry_effect` | Contribution of newly appearing class industries within the division |
| `exit_effect` | Contribution of disappearing class industries within the division |

The four components sum to the total annual change in HHI for the division.
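
The sketch below illustrates the standard Melitz-Polanec decomposition, a dynamic version of the Olley-Pakes decomposition. Classes are the units, class shares of division output are the weights, and class HHI is the variable being averaged. The documentation does not cover details such as how classes with no reportable firms are treated, so this is an illustrative sketch rather than the report's code. All names and numbers are hypothetical.

```python
import math


def group_stats(items):
    """items: (share, hhi) pairs. Returns total share, share-weighted mean,
    unweighted mean and Olley-Pakes covariance term (within-group shares)."""
    total = sum(s for s, _ in items)
    n = len(items)
    weighted = sum(s * h for s, h in items) / total
    unweighted = sum(h for _, h in items) / n
    cov = sum((s / total - 1 / n) * (h - unweighted) for s, h in items)
    return total, weighted, unweighted, cov


def division_hhi(year):
    """Output-weighted mean of class HHIs; `year` maps class code -> (output, hhi)."""
    total = sum(o for o, _ in year.values())
    return sum(o / total * h for o, h in year.values())


def melitz_polanec(year1, year2):
    """Split division_hhi(year2) - division_hhi(year1) into four components.

    Classes in both years are continuing; classes only in year2 are entrants
    and classes only in year1 are exiters. Assumes at least one continuing class.
    """
    tot1 = sum(o for o, _ in year1.values())
    tot2 = sum(o for o, _ in year2.values())
    continuing = year1.keys() & year2.keys()
    cont1 = [(year1[c][0] / tot1, year1[c][1]) for c in continuing]
    cont2 = [(year2[c][0] / tot2, year2[c][1]) for c in continuing]
    exiters = [(year1[c][0] / tot1, year1[c][1]) for c in year1.keys() - continuing]
    entrants = [(year2[c][0] / tot2, year2[c][1]) for c in year2.keys() - continuing]

    _, phi_c1, mean1, cov1 = group_stats(cont1)
    _, phi_c2, mean2, cov2 = group_stats(cont2)
    entry_effect = exit_effect = 0.0
    if entrants:
        share_e, phi_e, _, _ = group_stats(entrants)
        entry_effect = share_e * (phi_e - phi_c2)
    if exiters:
        share_x, phi_x, _, _ = group_stats(exiters)
        exit_effect = share_x * (phi_c1 - phi_x)
    return {
        "unwt_mean_effect": mean2 - mean1,
        "cov_effect": cov2 - cov1,
        "entry_effect": entry_effect,
        "exit_effect": exit_effect,
    }


# Hypothetical division: class code -> (output, class HHI) in each year.
year1 = {"C1": (100.0, 1000.0), "C2": (300.0, 3000.0), "C3": (100.0, 500.0)}
year2 = {"C1": (150.0, 1200.0), "C2": (300.0, 2800.0), "C4": (50.0, 4000.0)}
parts = melitz_polanec(year1, year2)
print({k: round(v, 1) for k, v in parts.items()})
print(math.isclose(sum(parts.values()), division_hhi(year2) - division_hhi(year1)))  # True
```

In this example, division HHI rises from 2100 to 2440. The components are 0, about -233.3, about 173.3 and 400, and they sum to that 340-point change.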

### hhi_ownership_aggregated

HHI calculated after aggregating firm output or revenue to the group-level enterprise (common ownership groups recorded in the LBD) before computing market shares. These values can be compared with those in the `hhi` dataset, which does not apply this aggregation. However, it is not clear whether the LBD captures the full extent of ownership and control relationships that exist in practice.
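
The sketch below shows the aggregation step: sizes are summed within each ownership group before shares are computed. The group identifiers are hypothetical. A firm without a recorded group is assumed to form a group of its own, keyed by its own ID.

```python
from collections import defaultdict


def hhi(sizes):
    """10,000 x sum of squared shares (shares as proportions)."""
    sizes = list(sizes)
    total = sum(sizes)
    return 10_000 * sum((s / total) ** 2 for s in sizes)


def hhi_ownership_aggregated(firms):
    """firms: iterable of (owner_group_id, size). Sums size within each
    ownership group, then computes HHI over the group totals."""
    group_totals = defaultdict(float)
    for owner, size in firms:
        group_totals[owner] += size
    return hhi(group_totals.values())


# Hypothetical industry: four equal firms, two of which share an owner.
firms = [("G1", 25.0), ("G1", 25.0), ("G2", 25.0), ("G3", 25.0)]
print(hhi(size for _, size in firms))   # 2500.0 without aggregation
print(hhi_ownership_aggregated(firms))  # 3750.0 after aggregation
```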

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division and class |
| **Financial years** | 2001--2024 |
| **method** | `output` or `revenue` |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `hhi_aggregated` | HHI after ownership aggregation (rounded to 1 d.p.) |

### industry_entry_exit_counts

Counts of active, entering, and exiting firms and employment totals by division and year. "Young" firms are those less than 5 years old. Activity is based on a firm reporting any GST revenue or employment in a year.

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division only |
| **Financial years** | 2000--2023 |
| **method** | Not used |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `n_active` | Count of active firms |
| `n_young` | Count of active firms less than 5 years old |
| `n_entries` | Count of firms entering the industry in the year |
| `n_exits` | Count of firms exiting the industry in the year |
| `n_exits_young` | Count of young firms (< 5 years old) exiting |
| `young_l` | Total employment (labour input) of young active firms |
| `total_l` | Total employment (labour input) of all active firms |

### industry_entry_exit_shares

Combined industry output or revenue shares of entrant and exiting firms. Entrant shares are measured in the year after entry (first full year of activity). Exiter shares are measured in the year before exit (last full year of activity).

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division (calculated at class level, aggregated to division) |
| **Financial years** | 2001--2023 |
| **method** | `output` or `revenue` |
| **group** | `entrants` or `exiters` |

#### Metrics

| Metric | Description |
|---|---|
| `industry_share` | Combined share of the group (entrants or exiters) in the division for the year |

### industry_total_output

Population weight-adjusted firm count and gross nominal output at the class level. These figures are adjusted using population weights to account for firms present in administrative data but absent from the productivity dataset, making them more representative of the full economy. This data is used as weights when aggregating other metrics (e.g. profit elasticity, PCM, HHI) from class to division level.
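
The following is a minimal sketch of output-weighted aggregation from class to division. It assumes the weights are `gross_nominal_output_millions` for the same year. Published division figures also include classes whose values are suppressed, so a recomputation from published class values will generally differ from the published division figure. The function name and the numbers are hypothetical.

```python
def division_weighted_mean(class_values, class_output):
    """Output-weighted average of class-level values.

    class_values: {class_code: value or None}; class_output: {class_code: output}.
    Classes with no value (e.g. suppressed) or no output weight are skipped.
    """
    pairs = [
        (class_output[c], v)
        for c, v in class_values.items()
        if v is not None and c in class_output
    ]
    total = sum(w for w, _ in pairs)
    if total <= 0:
        raise ValueError("no usable classes")
    return sum(w * v for w, v in pairs) / total


# Hypothetical class HHIs (one suppressed) and class outputs in NZD millions.
values = {"C1": 1000.0, "C2": 3000.0, "C3": None}
output = {"C1": 100.0, "C2": 300.0, "C3": 50.0}
print(division_weighted_mean(values, output))  # 2500.0
```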

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Class only |
| **Financial years** | 2001--2023 |
| **method** | Not used |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `n_pent` | Population weight-adjusted firm count |
| `gross_nominal_output_millions` | Population weight-adjusted gross nominal output (NZD millions) |

### large_firm_age

Mean age of large firms (defined by FTE employment thresholds) by division and year. Some divisions are excluded at the higher FTE threshold because base-3 random rounding of small firm counts makes their results too inaccurate.

Mean firm age = `total_age_years / n_pent`. A pre-calculated `mean_age_years` metric is provided for convenience.

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division only |
| **Financial years** | 2000--2023 |
| **method** | Not used |
| **group** | `fte >= 50` or `fte >= 100` (the FTE threshold used to define "large") |

#### Metrics 

| Metric | Description |
|---|---|
| `n_pent` | Number of large firms |
| `total_age_years` | Sum of ages of large firms in years |
| `mean_age_years` | Mean age of large firms in years (derived: `total_age_years / n_pent`) |

### pcm

Price-cost margins (PCM) by industry and year. PCM measures the proportion of output that represents profit: *PCM = (output - total variable cost) / output*. Results are reported both unweighted and weighted by firm output (all results are also population-weighted to adjust for missing firms).

- **Unweighted** (`_unwt`): Population-weighted mean of firm-level PCMs, giving equal influence to each firm. Firm-level PCMs are bounded below at -1.
- **Output-weighted** (`_wt`): Output-weighted mean of firm-level PCMs, giving more influence to larger firms.

Industry-years with fewer than 6 firms are excluded.
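
The sketch below illustrates the two averages, under assumptions that the documentation does not confirm. First, the output-weighted mean is assumed to use population weight × output as the weight. Second, the lower bound of -1 is assumed to apply to firm PCMs in both means, although the documentation states it only for the unweighted mean. Names and figures are hypothetical.

```python
def firm_pcm(output, variable_cost, floor=-1.0):
    """Firm-level PCM = (output - total variable cost) / output, bounded below at `floor`."""
    return max(floor, (output - variable_cost) / output)


def industry_pcm(firms, min_firms=6):
    """firms: list of (output, total_variable_cost, population_weight), output > 0.

    Returns (unweighted, output-weighted) mean PCM, or None if there are too few
    firms. Both means use population weights; the second also weights by output.
    """
    if len(firms) < min_firms:
        return None
    pcms = [firm_pcm(out, cost) for out, cost, _ in firms]
    w_unwt = [pop for _, _, pop in firms]
    w_wt = [pop * out for out, _, pop in firms]
    unwt = sum(w * p for w, p in zip(w_unwt, pcms)) / sum(w_unwt)
    wt = sum(w * p for w, p in zip(w_wt, pcms)) / sum(w_wt)
    return unwt, wt


# Hypothetical industry: one large firm with a 30% margin, five small firms with 10%.
firms = [(1000.0, 700.0, 1.0)] + [(100.0, 90.0, 2.0)] * 5
unwt, wt = industry_pcm(firms)
print(f"unweighted {unwt:.3f}, output-weighted {wt:.3f}")
```

Here the large firm pulls the output-weighted mean (0.200) well above the unweighted mean (0.118).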

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division and class |
| **Financial years** | 2001--2023 |
| **method** | Not used |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `pcm_mean_unwt` | Unweighted mean PCM |
| `se_unwt` | Standard error of the unweighted mean |
| `pcm_mean_wt` | Output-weighted mean PCM |
| `se_wt` | Standard error of the output-weighted mean |
| `n_pent_unwt` | Underlying firm count (unweighted sample) |
| `n_pent_wt` | Population weight-adjusted firm count |

### profit_elasticity

Profit elasticity estimates measuring how sensitive firm profits are to changes in costs. A more negative profit elasticity indicates that firms are less able to pass cost increases on to customers, which can be indicative of more competitive markets.

Estimates are produced using two regression methods: ordinary least squares (OLS) and firm-specific fixed effects (FE). At class level, estimates are from industry-year specific regressions. At division level, values are output-weighted means and medians of the class-level estimates.

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division and class |
| **Financial years** | 2001--2023 |
| **method** | `ols` or `fe` for estimation-method-specific metrics; blank for firm counts |
| **group** | Not used |

#### Metrics

| Metric | Level | Description |
|---|---|---|
| `wt_mean_profit_elasticity` | Both | Output-weighted mean profit elasticity. At class level this is the regression coefficient; at division level it is the weighted mean of class-level coefficients. |
| `wt_mean_profit_elasticity_se` | Both | Standard error of the profit elasticity estimate. At class level this is the regression standard error; at division level it is a delta-method standard error of the weighted mean. |
| `median_profit_elasticity` | Division only | Median of class-level profit elasticity estimates within the division. |
| `n_obs` | Class only | Number of firm-year observations in the regression. |
| `n_pent_unwt` | Both | Number of unique firms observed. Not method-specific. |
| `n_pent_pop_wt` | Both | Population weight-adjusted firm count. Not method-specific. |
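
The sketch below shows division-level aggregation of class estimates. For a weighted mean with fixed weights and independent class estimates, the delta-method standard error reduces to the square root of the sum of (w_i / W)² × se_i². Fixed weights and independence are assumptions, and the report's exact calculation may differ. The weighted median is not covered here. Names and numbers are hypothetical.

```python
import math


def division_elasticity(estimates):
    """estimates: list of (output_weight, coefficient, standard_error) per class.

    Returns the output-weighted mean coefficient and its standard error,
    treating weights as fixed and class estimates as independent.
    """
    total = sum(w for w, _, _ in estimates)
    mean = sum(w * b for w, b, _ in estimates) / total
    se = math.sqrt(sum((w / total) ** 2 * s ** 2 for w, _, s in estimates))
    return mean, se


# Hypothetical class-level results for one division.
classes = [(500.0, -2.0, 0.3), (500.0, -4.0, 0.4)]
mean, se = division_elasticity(classes)
print(f"{mean:.2f} (se {se:.2f})")  # -3.00 (se 0.25)
```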

### rank_persistence

Persistence of large firms in top-10 rankings within class industries, aggregated to divisions. This measures how stable the composition of the largest firms is over time.

Rank persistence = `n_top10_3years / n_total_top10`. A higher ratio means the same firms tend to remain among the largest over time.
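
The sketch below shows how the two counts and the ratio relate. It assumes the three consecutive years are the reported year and the two years before it, and that `n_total_top10` counts top-10 positions in the reported year. Published counts are also randomly rounded to base 3. Names and firm IDs are hypothetical.

```python
def rank_persistence(top10_by_class):
    """top10_by_class: {class_code: (top10_t_minus_2, top10_t_minus_1, top10_t)},
    each a set of firm IDs. Returns (n_top10_3years, n_total_top10, ratio) for year t."""
    persistent = positions = 0
    for two_back, one_back, current in top10_by_class.values():
        positions += len(current)
        persistent += len(current & one_back & two_back)
    return persistent, positions, persistent / positions


# Hypothetical division with two classes.
stable = set(range(1, 11))  # the same ten firms in all three years
top10 = {
    "A": (stable, stable, stable),
    "B": (
        set(range(11, 14)) | set(range(30, 37)),  # year t-2
        set(range(11, 16)) | set(range(21, 26)),  # year t-1
        set(range(11, 21)),                       # year t
    ),
}
print(rank_persistence(top10))  # (13, 20, 0.65)
```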

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division only |
| **Financial years** | 2001--2024 |
| **method** | `output` or `revenue` |
| **group** | Not used |

#### Metrics

| Metric | Description |
|---|---|
| `n_top10_3years` | Number of firms ranked in the top 10 for three consecutive years across class industries in the division |
| `n_total_top10` | Total number of top-10 positions across class industries in the division |

### young_firm_shares

Industry shares of young firms (less than 5 years old based on birth dates recorded in the LBD). Shares are reported for two populations: all firms, and only the top-10 ranked firms in each class industry.

#### Dataset scope

| | |
|---|---|
| **ANZSIC06 levels** | Division (calculated at class level, aggregated to division) |
| **Financial years** | 2001--2023 |
| **method** | `output` or `revenue` |
| **group** | `all` (share among all firms) or `top10` (share among top-10 ranked firms) |

#### Metrics

| Metric | Description |
|---|---|
| `young_share` | Combined industry share of young firms |

[^1]: Richard Fabling and David C. Maré (2019). "Improved productivity measurement in New Zealand's Longitudinal Business Database", *Motu Working Paper 19-03*.