Always soft delete

I’ve recently been working on two tasks that involved cleaning up some data: duplicate records, or entries that shouldn’t be there anymore (for one reason or another).

How would I know I deleted the right data? That’s easy to check, as the data to be removed was already identified. It’s not there anymore? Good! Now, what about data that shouldn’t have been removed? How do I know I haven’t deleted anything else? Well, we can just compare the delta: (data prior to the deletion) – (data to be deleted) = (data after the deletion). If that equation doesn’t hold, you’ve deleted more data than you should have. Now what?
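The delta check above can be sketched with plain set arithmetic. This is only an illustration: it assumes every record has a unique ID that you can snapshot before and after the cleanup, and `check_deletion` is a hypothetical helper name, not something from a library.

```python
def check_deletion(before, to_delete, after):
    """Compare record IDs before and after a cleanup.

    Returns a tuple (not_deleted, wrongly_deleted):
      not_deleted     - IDs that should be gone but are still present
      wrongly_deleted - IDs that disappeared but were never marked for deletion
    """
    before, to_delete, after = set(before), set(to_delete), set(after)
    # (data prior to the deletion) - (data to be deleted)
    expected_after = before - to_delete
    not_deleted = after & to_delete
    wrongly_deleted = expected_after - after
    return not_deleted, wrongly_deleted


before = {1, 2, 3, 4, 5}
to_delete = {2, 4}
after = {1, 5}  # record 3 vanished by mistake

not_deleted, wrongly_deleted = check_deletion(before, to_delete, after)
print(not_deleted)      # set()
print(wrongly_deleted)  # {3}
```

If both sets come back empty, the equation holds. If `wrongly_deleted` is not empty and the delete was a hard one, the check tells you what you lost but gives you no way of getting it back.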

The rare times you’d actually want to delete data:

  • It’s volatile data. You don’t even care if it’s stored at some point
  • Data retention policies (usually these give sufficient time to work on soft-deleted data)
  • You’re out of space
    • In which case you’d probably rather add space than delete data; otherwise, shouldn’t that data have been removed already?

I’d go even further: let’s not talk about deletion, but about removal. When you remove, you move the thing from one place into another; when you delete, you make the thing cease to exist. That other place can be a backup table, a different storage pool, or a tag (soft delete) that moves data out of a filter (where soft delete = false).
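As a rough sketch of the tag approach, here is a soft delete using Python’s built-in `sqlite3` module. The `customers` table, the `deleted_at` column, the `active_customers` view and the helper functions are all made up for this example; a nullable timestamp is just one common way of storing the flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_at TEXT  -- NULL means the row is live
    );
    -- The filter: everyday queries read from this view, not from the table
    CREATE VIEW active_customers AS
        SELECT id, name FROM customers WHERE deleted_at IS NULL;
""")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)", [("Ana",), ("Bo",), ("Cy",)]
)


def soft_delete(conn, customer_id):
    """Tag the row as removed; nothing is physically deleted."""
    conn.execute(
        "UPDATE customers SET deleted_at = datetime('now') "
        "WHERE id = ? AND deleted_at IS NULL",
        (customer_id,),
    )


def restore(conn, customer_id):
    """Undo a soft delete by clearing the tag."""
    conn.execute(
        "UPDATE customers SET deleted_at = NULL WHERE id = ?", (customer_id,)
    )


def active_names(conn):
    rows = conn.execute("SELECT name FROM active_customers ORDER BY id")
    return [row[0] for row in rows]


soft_delete(conn, 2)
print(active_names(conn))  # ['Ana', 'Cy']
restore(conn, 2)
print(active_names(conn))  # ['Ana', 'Bo', 'Cy']
```

Reading through a view keeps the `deleted_at IS NULL` condition in one place, so an everyday query can’t forget it.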

“Are you sure you want to delete (X)?” is no replacement at all for soft delete. Yes, the user was sure. Now the user is sure that they want their records back.

Backup is not soft delete. Having a backup is no substitute for soft deletion. You can recover data from a backup, but that should be an edge case, not the standard procedure for retrieving deleted stuff. If it is a common procedure, your backups are no more than outdated versions of the data you want to back up (and real-time data replication, for example, would replicate the removal as well). Backups should account for loss of data, not for removal.

Always soft delete. You can always hard delete later, but you can’t undo a hard delete. You’ll be appreciated for keeping data that was thought to be gone. The exception is when legal matters are involved; even then, depending on what’s possible, you might want to consider scrubbing, where you keep model relationships without keeping any of the actual data.
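Scrubbing could look something like the sketch below, again with `sqlite3`. The `users` and `orders` tables, the `scrubbed` flag and `scrub_user` are assumptions made for this example, and which columns count as “actual data” depends entirely on your own model and on whatever rules apply to you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT,
        scrubbed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO users (id, name, email) VALUES (1, 'Ana', 'ana@example.com');
    INSERT INTO orders (user_id, total_cents) VALUES (1, 1250), (1, 800);
""")


def scrub_user(conn, user_id):
    """Blank the personal fields but keep the row, so references still resolve."""
    conn.execute(
        "UPDATE users SET name = NULL, email = NULL, scrubbed = 1 WHERE id = ?",
        (user_id,),
    )


scrub_user(conn, 1)

row = conn.execute(
    "SELECT name, email, scrubbed FROM users WHERE id = 1"
).fetchone()
print(row)  # (None, None, 1)

# The relationship survives: both orders still join to the scrubbed user row
order_count = conn.execute(
    "SELECT COUNT(*) FROM orders o JOIN users u ON u.id = o.user_id "
    "WHERE u.id = 1"
).fetchone()[0]
print(order_count)  # 2
```

Unlike a soft delete, this one is not reversible: the personal fields are really gone, and only the shape of the data stays behind.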

It’s the “measure twice, cut once” motto applied to manipulating data. Remember, you can always delete later.
