# Data: To Keep or Not to Keep, That Is the Question

## The privacy logic

Privacy law treats identifiable data as something a business should fear and control. This position has excellent public policy underpinnings. Every record retained is a record that can be breached, subpoenaed, repurposed, or leaked, and the cheapest way to shrink that surface is to hold less data for less time. A great deal of the harm in the data economy traces to organisations keeping things they had no continuing reason to keep, and the discipline of deletion is a genuine protection rather than a bureaucratic reflex.

A corporate data governance strategy that focuses solely on privacy, data protection and data minimisation has a blind spot. In an AI-driven economy, data is a core part of a business's value, and on that logic it should be collected and curated rather than deleted and destroyed.

## How big tech kept its data value while others purged for privacy

To date, the most instructive examples of how to treat data as an asset to be preserved, rather than a liability to be purged, sit with the well-known global technology firms whose entire economics rest on holding data. They have had the strongest incentive to find a way for both traditions to be satisfied at once, and the resources to engineer it.

Their consumer-facing surfaces often look like privacy orthodoxy. Auto-delete is offered as a default, retention is capped at a stated number of months, and a user can ask for their history to be removed. Underneath, the architecture frequently answers the destroy-or-keep question in favour of transform. A deletion request can be honoured by replacing the identifier with a random value and moving the now unlinked record into a separate store, so the connection to the person is broken while the data itself persists. The request has been satisfied as a de-identification operation rather than an erasure.
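The pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular firm's implementation: the `Stores` class, the `honour_deletion_request` function, and the record fields are all hypothetical names invented for the example.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Stores:
    """Two hypothetical stores: one keyed by user, one holding unlinked records."""
    identified: dict[str, list[dict]] = field(default_factory=dict)
    unlinked: list[dict] = field(default_factory=list)


def honour_deletion_request(stores: Stores, user_id: str) -> int:
    """Handle a deletion request by de-identifying rather than erasing.

    Each record is re-keyed with a fresh random value and moved to the
    unlinked store. No mapping from the random value back to user_id is
    kept. Returns the number of records moved.
    """
    records = stores.identified.pop(user_id, [])
    for record in records:
        unlinked = dict(record)  # copy so the caller's object is not mutated
        unlinked["subject"] = secrets.token_hex(16)  # random, not derived from user_id
        stores.unlinked.append(unlinked)
    return len(records)


stores = Stores()
stores.identified["alice@example.com"] = [
    {"subject": "alice@example.com", "query": "flights to Lisbon", "postcode": "2000"},
    {"subject": "alice@example.com", "query": "hotel near airport", "postcode": "2000"},
]
moved = honour_deletion_request(stores, "alice@example.com")
```

After the call, the user no longer appears in the identified store, yet both queries and the postcode survive in the unlinked one. Whether those surviving fields can lead back to the person is exactly the question the rest of this article turns on.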

The same instinct runs down the stack. Identifiers are masked, tokenised, or encrypted so the field survives without the name. Noise is added to aggregates so that no individual contribution can be recovered from a model trained on the crowd. Patterns are extracted on the device so the raw identifiable record never has to be centralised at all. In each case the move is the same. Keep the signal, drop the identifiability, retain the asset, shed the liability. The dichotomy between deleting data and keeping it is treated as a false one, dissolved by engineering, on the claim that what is retained is no longer personal in the sense the law cares about.
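Two of these moves are easy to illustrate. The sketch below assumes keyed hashing (HMAC-SHA256) as the tokenisation method and Laplace noise as the way of perturbing an aggregate; the article does not say which techniques any given firm uses, and `SECRET_KEY`, `tokenise`, and `noisy_count` are hypothetical names.

```python
import hashlib
import hmac
import random

# Hypothetical key for illustration; a real key would live in a managed key store.
SECRET_KEY = b"replace-with-a-managed-secret"


def tokenise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input always yields the same token, so joins still work,
    but anyone holding the key can recompute the token for a known identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon to a count (sensitivity assumed to be 1).

    The difference of two independent exponential draws with rate epsilon
    is Laplace-distributed with scale 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


token = tokenise("alice@example.com")
rng = random.Random(0)  # seeded only so the example is repeatable
released = noisy_count(1200, epsilon=0.5, rng=rng)
```

The tokenised field keeps its analytic usefulness because it is stable, and that stability is also its weakness: the token is only as unlinkable as the key is secret. A smaller `epsilon` means more noise and stronger protection at the cost of a less accurate count.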

## The load-bearing claim

For the organisations that solve the asset-versus-risk problem well, the whole reconciliation rests on one load-bearing claim: that de-identification is real and durable, that the link between the data and the person, once cut, stays cut. If the claim holds, the two traditions were never truly in conflict, and de-identification is the bridge that lets an organisation keep its asset and keep faith with the regulator in the same motion. If it does not hold, de-identified retention is simply retention wearing a reassuring label, and the privacy risk has been relabelled rather than removed.

Unfortunately, the claim that de-identification solves everything is contested, and increasingly so. Re-identification of supposedly anonymous datasets is a recurring result rather than an exotic one, and regulators have moved away from treating anonymisation as a binary state toward treating it as a spectrum, asking whether a person can still be singled out, linked across records, or inferred from what is left. "De-identified" turns out to name a range of conditions, some robust, some apt to collapse under a determined join against another dataset.
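A toy example shows how singling out and linkage work in practice. Everything here is invented for illustration: the records, the names, and the choice of `postcode`, `birth_year`, and `sex` as quasi-identifiers are assumptions, and the functions are a sketch rather than a complete risk assessment.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")  # assumed choice of columns


def smallest_group_size(records: list[dict], keys: tuple[str, ...] = QUASI_IDENTIFIERS) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one record is unique and can be singled out.
    """
    groups = Counter(tuple(r[col] for col in keys) for r in records)
    return min(groups.values())


def link(deidentified: list[dict], public: list[dict],
         keys: tuple[str, ...] = QUASI_IDENTIFIERS) -> list[tuple[str, dict]]:
    """Join a de-identified dataset against a named one on quasi-identifiers.

    Returns (name, record) pairs where exactly one named person shares
    the record's quasi-identifier values.
    """
    index: dict[tuple, list[str]] = {}
    for person in public:
        index.setdefault(tuple(person[col] for col in keys), []).append(person["name"])
    matches = []
    for record in deidentified:
        names = index.get(tuple(record[col] for col in keys), [])
        if len(names) == 1:
            matches.append((names[0], record))
    return matches


deidentified = [
    {"postcode": "2000", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_year": 1984, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "2913", "birth_year": 1951, "sex": "M", "diagnosis": "hypertension"},
]
public = [
    {"name": "A. Example", "postcode": "2913", "birth_year": 1951, "sex": "M"},
    {"name": "B. Example", "postcode": "2000", "birth_year": 1984, "sex": "F"},
    {"name": "C. Example", "postcode": "2000", "birth_year": 1984, "sex": "F"},
]
k = smallest_group_size(deidentified)
reidentified = link(deidentified, public)
```

The third record carries no name, yet its combination of postcode, birth year, and sex is unique in both datasets, so the join puts a name back on the diagnosis. The first two records are protected only because they happen to share their quasi-identifiers with each other and with more than one named person.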

This is what gets lost when the matter is filed under retention policy and left to the engineers. The choice between deletion and de-identified retention is not really a question about engineering. It is a question about what the default destiny of data ought to be, destruction or preservation, and the engineering is being asked to carry a normative weight it cannot always bear. The privacy tradition answered one way, before the asset was visible. The data economy answered the other way, before the privacy cost was understood. A good deal of contemporary practice is built to make the answer appear not to matter. Whether it genuinely does not matter turns entirely on an assertion about irreversibility, and that assertion deserves far more scrutiny than it usually receives, precisely because so much now rests on its being true.

## Consent and permissions baked into the data

De-identification, then, can only ever be part of the answer. It is a control on identifiability, and a valuable one, but it does no work at all on the question of authority. Stripping the person out of a record does not establish that anyone was entitled to use the record in the first place, and it stakes the whole compliance position on a claim about irreversibility that may not survive contact with a determined adversary or a sceptical regulator. An organisation that rests everything on de-identification has loaded all of its weight onto the single plank most likely to give way.

The more durable basis is consent. If a data asset carries within itself a clear and unequivocal record of what it may be used for and by whom, granted by the person the data concerns, then its right to be used no longer depends on the hope that de-identification holds. The authority to use the data sits in the asset, attached to it and travelling with it, rather than in a separate policy filed somewhere else or a technical assertion made once and never revisited. When the consent that grants those terms changes, the terms change with it.

This is the principle Nooriam is built on. Each data asset travels with its own legal consents and permissions, expressed so that a machine can read them and act on them, which means lawful use is established at the level of the asset rather than reconstructed after the fact. De-identification keeps its place in this picture as one control among several, valued for what it does and not asked to do what it cannot. Consent is king. An asset that carries its own permissions can be put to work with confidence, retained for the value it carries, and defended before the regulator to whom the privacy logic answers, without the whole arrangement resting on a single claim about whether a link, once cut, stays cut.
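As a purely illustrative sketch of what machine-readable consent attached to an asset might look like, consider the following. It does not describe Nooriam's actual format or API; the `Consent` and `DataAsset` classes, their fields, and the grantee and purpose strings are all hypothetical.

```python
from dataclasses import dataclass, field, replace
from datetime import date


@dataclass(frozen=True)
class Consent:
    """One grant: who may use the asset, for what purpose, until when."""
    grantee: str
    purpose: str
    expires: date
    revoked: bool = False


@dataclass
class DataAsset:
    """A payload bundled with the consents that authorise its use."""
    payload: dict
    consents: list[Consent] = field(default_factory=list)

    def permits(self, grantee: str, purpose: str, on: date) -> bool:
        """True if some unrevoked, unexpired consent covers this grantee and purpose."""
        return any(
            c.grantee == grantee and c.purpose == purpose
            and not c.revoked and on <= c.expires
            for c in self.consents
        )

    def revoke(self, grantee: str, purpose: str) -> None:
        """Mark matching consents as revoked; later permits() calls reflect the change."""
        self.consents = [
            replace(c, revoked=True)
            if c.grantee == grantee and c.purpose == purpose else c
            for c in self.consents
        ]


asset = DataAsset(
    payload={"steps_per_day": 8200},
    consents=[Consent("research-team", "model-training", date(2030, 1, 1))],
)
today = date(2026, 1, 1)
allowed_before = asset.permits("research-team", "model-training", today)
allowed_other_purpose = asset.permits("research-team", "marketing", today)
asset.revoke("research-team", "model-training")
allowed_after = asset.permits("research-team", "model-training", today)
```

The check is made against the asset itself, so a consumer does not need to consult a separate policy document, and a withdrawal of consent takes effect on the next call to `permits`.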