Soft deletion explained in PostgreSQL - and how to implement it
In the world of database management, much of your data is most likely never actually deleted. The concept of deletion has evolved over time.
Enter soft deletion - a well-known technique that's changed how we think about data lifecycle management. This post explores the ins and outs of soft deletion, its benefits, and how to implement it effectively.
What is Soft Deletion?
Imagine you're managing a bustling digital library. Instead of permanently removing books from your shelves (hard deletion), soft deletion is like moving them to a special reserved section. They're out of the main circulation, but still accessible if needed.
In technical terms, soft deletion involves marking records as deleted in the database, rather than permanently removing them. This is typically achieved by adding a boolean flag or better, a timestamp to indicate when a record was "deleted."
The Advantages of Going Soft (on Deletion)
- Data Recovery: Accidentally deleted important information? With soft deletion, recovery is just a database query away.
- Audit Trails: Maintain a complete history of data changes, crucial for compliance and forensic analysis.
- Performance Optimization: Avoid the potential performance hit of cascading deletes in large, complex datasets.
- Trend Analysis: Analyze historical trends including "deleted" data, providing a more complete picture of your data over time.
- Enhanced User Experience: Allow users to recover their own deleted data, adding a layer of forgiveness to your application.
How Soft Deletion Works
- Deletion Flag: Add a boolean column (e.g.,
is_deleted
) or a timestamp column (e.g.,deleted_at
) to your tables. - Modified Queries: Instead of using DELETE statements, update the deletion flag to mark records as deleted.
- Filtered Results: Adjust SELECT queries to exclude "deleted" records by default.
- Recovery Mechanism: Implement a process to "undelete" by resetting the deletion flag.
Handling Related Data: Cascading Soft Deletes
When dealing with related tables, it's important to ensure that soft deletes cascade properly. There are two main approaches:
- Database Triggers: Create triggers that automatically propagate the soft delete to related records.
- Application Logic: Handle cascading deletes in your application code, offering more flexibility but requiring more development effort.
For some tables, cascading may not even be required - but that again, depends on your application. If you find that your data is fine without cascades, then that's fine.
Implementing Soft Deletion in PostgreSQL
Let's look at a practical example of implementing soft deletion in PostgreSQL:
Create tables with a soft delete column:
Use soft deletion in your queries:
For most applications, these changed to your tables is probably enough. You can of course make it more complex and make related data automatically cascade (soft delete) as well.
Below you can see how that can be done, using triggers and functions!
Create a function for soft deletion:
This little piece of code is the heart of our soft deletion system. When you create it, you're essentially telling PostgreSQL, "Hey, I've got a special way of handling deletes." Instead of actually removing a row, this function simply marks it as deleted by setting its deleted_at
field to the current time. It's like putting a timestamp on a piece of paper to show when it was "thrown away", but you're really just putting it in a special drawer instead of the trash can.
Set up triggers for soft deletion:
Now, onto the soft delete triggers. These are like vigilant gatekeepers for your tables. You set them up on your users
and posts
tables, and they stand guard, watching for any attempts to update the deleted_at
field. When they see an update that's trying to change deleted_at
from null to a timestamp, they spring into action. They call our soft delete function, which does the actual work of setting the timestamp. It's like having a librarian who, instead of letting you remove a book from the library, simply marks it as "checked out indefinitely" whenever you try to take it.
Implement cascading soft deletes:
Finally, we have the cascading soft delete function and trigger. This is where things get a bit more sophisticated. Imagine you have a user who has written several posts. If you delete the user, you probably want to delete their posts too, ...right? Actually.. keep it if it's a public forum please!
That's what this function does. When a user is soft deleted, this function goes through all the posts associated with that user and marks them as deleted too. The trigger is what kicks off this process. It's like telling the librarian, "Hey, when you mark this author as inactive, also mark all their books as unavailable."
Restoring data, one query at a time
Let's say you messed up and soft deleted a whole table. Or maybe you want to restore some rows using a point in time. Fortunately, your rows have the timestamps already marked!
You can use your SQL knowledge to make sufficient selects and restore the deleted rows.
Best Practices for Effective Soft Deletion
Indexing: Add an index to the deleted_at
column to maintain query performance.
Indexing: Add an index to the `deleted_at` column
Partial Indexes: Utilize partial indexes to exclude soft-deleted records from your indexes.
Partial Indexes: Exclude soft-deleted records from indexes
Periodic Hard Deletes: Implement a process to permanently delete old soft-deleted records to manage database size. For example, to delete rows that have been deleted years ago.
Schedule this function to run periodically (this would be done outside of PostgreSQL, e.g., using cron)
- Unique Constraints: Be mindful of unique constraints when using soft deletes, especially if you plan to reuse unique values.
Unique Constraints: Use partial unique indexes for active records
- API Design: Design your API to handle soft deletes transparently, maintaining a consistent experience for consumers of your data.
API Design: Create views for active records
Conclusion
Soft deletion is a powerful technique that can significantly enhance your data management strategy. It provides flexibility in data recovery, improves auditing capabilities, and can boost performance in large, interconnected datasets. While it does add some complexity to your database design and queries, the benefits often outweigh the costs for many applications.
Hope you learned something new today. Thanks, and please do share! ;)

