At Pascal Metrics, my team collects large data sets that include ePHI (electronic protected health information). That makes us subject to HIPAA, so federal law dictates how we handle that data. Before I can even consider moving a database export out of our hardened production environment, I need to cleanse the export of any personally identifiable information, including but not limited to ePHI.
There are a number of ways to achieve that aim, but once our database crossed into "large" territory, cleansing the data became a real chore. So I decided to roll my own mysqldump binary and add a parameter called "nullify-field", modeled on the "ignore-table" parameter from the official release.
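The official --ignore-table option takes a <database>.<table> value, and the usage example in this post shows nullify-field taking <database>.<table>.<field>. The Python sketch below shows one way such values could be parsed and grouped per table. It assumes exact, case-sensitive name matching and identifiers that contain no dots. The names parse_nullify_field and build_nullify_map are hypothetical and do not come from the mysqldump source.

```python
from collections import defaultdict


def parse_nullify_field(value: str) -> tuple[str, str, str]:
    """Split a --nullify-field value of the form "db.table.field".

    Assumption: no identifier contains a dot, so quoted identifiers
    such as `my.db` are not handled by this sketch.
    """
    parts = value.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <database>.<table>.<field>, got {value!r}")
    db, table, field = parts
    return db, table, field


def build_nullify_map(values: list[str]) -> dict[str, set[str]]:
    """Group requested fields under a "db.table" key, the same form --ignore-table uses."""
    fields_by_table: dict[str, set[str]] = defaultdict(set)
    for value in values:
        db, table, field = parse_nullify_field(value)
        fields_by_table[f"{db}.{table}"].add(field)
    return dict(fields_by_table)


if __name__ == "__main__":
    specs = [
        "my_db.my_sensitive_table.name",
        "my_db.my_sensitive_table.email",
    ]
    # Both fields end up under the single key "my_db.my_sensitive_table".
    print(build_nullify_map(specs))
```

Keying by "db.table" means the dump loop only needs one lookup per table to find which of its columns to blank out, rather than re-checking every option value for every column.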
I've posted my modified source to my GitHub account for anyone who is interested.
You'll have to compile your own copy, but that's relatively easy:
- Grab the latest MySQL 5.0.x source
- Copy my two modified source files over the originals in your source tarball
- Run "./configure --without-server" to build only the MySQL client programs
- Run "make" to build your custom mysqldump under ./client/
Example Usage:
./client/mysqldump -u backupuser -p --nullify-field=my_db.my_sensitive_table.name --nullify-field=my_db.my_sensitive_table.email my_db
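Judging by the option's name, the command above should dump my_db with the name and email columns of my_sensitive_table written as NULL, while every other column passes through unchanged. The Python sketch below illustrates that transformation on a single row. The helpers quote_value and render_values_tuple are hypothetical, and their quoting is far simpler than mysqldump's own escaping.

```python
def quote_value(value) -> str:
    """Render a Python value as a simplified SQL literal (illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then single quotes, MySQL-style.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render_values_tuple(columns, row, nullify) -> str:
    """Render one row as "(v1,v2,...)", writing NULL for each column in `nullify`.

    Assumption: column names are matched exactly (case-sensitive).
    """
    rendered = []
    for column, value in zip(columns, row):
        if column in nullify:
            rendered.append("NULL")  # sensitive value never reaches the output
        else:
            rendered.append(quote_value(value))
    return "(" + ",".join(rendered) + ")"


if __name__ == "__main__":
    columns = ["id", "name", "email", "created"]
    row = [42, "Jane Doe", "jane@example.com", "2010-01-01"]
    print(render_values_tuple(columns, row, {"name", "email"}))
    # prints: (42,NULL,NULL,'2010-01-01')
```

Nulling values while the dump is being written, rather than scrubbing the finished file, means the sensitive data never lands in the export at all.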
I love open source software.