When you move a file to the trash and empty it, that file is not actually deleted. It is just marked as free space by the operating system, but can still be recovered through digital forensics techniques. There are ways to actually delete (shred) files, but they differ based on which type of storage device you are using.
In the past I needed to securely wipe a hard drive, but at the time I found a lot of confusing information on how to do it properly. This post aims to provide a clear overview of the issue, as well as some practical advice on how to proceed. I have also tried to provide a list of vetted references for anyone who wants to understand the topic better.
In this blog post I will consider only magnetic hard disk drives (HDDs) [1]. In a future blog post I will also cover solid-state drives (SSDs) [2].
How to shred?
Securely wiping hard disks is quite straightforward, regardless of whether you want to shred single files or wipe the entire disk. You just need to overwrite the target file(s) with random data, so that the original content is not retrievable anymore. This can be done by tools like shred [3] or BleachBit [4] (which can also shred free space).
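To make the idea of "overwrite the target file with random data" concrete, here is a minimal Python sketch. It is not how shred or BleachBit are implemented; the function names `overwrite_file` and `shred_file` are made up for this illustration, and the sketch assumes a file system that writes in place (which is not guaranteed on journaling or copy-on-write file systems).

```python
import os


def overwrite_file(path: str, passes: int = 1, chunk_size: int = 1024 * 1024) -> None:
    """Overwrite the contents of `path` in place with random bytes."""
    size = os.path.getsize(path)
    # "r+b" opens for in-place update without truncating the file first.
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            # Push the data out of Python's and the OS's buffers.
            f.flush()
            os.fsync(f.fileno())


def shred_file(path: str, passes: int = 1) -> None:
    """Overwrite `path` (hypothetical helper), then remove its directory entry."""
    overwrite_file(path, passes)
    os.remove(path)
```

In practice a dedicated tool such as shred is the better choice. The sketch only shows the order of operations: overwrite first, sync to disk, and only then unlink the file.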
Moreover, BleachBit presets allow you to remove metadata such as browsing history, recently opened files, etc. This EFF guide [5] is a useful reference on how to use BleachBit.
It is important to note that full disk shredding is usually the best choice for very sensitive data. This is because even if a single file is perfectly shredded, that is not a 100% guarantee that the content of that file is lost forever. The same data could have been saved by other programmes as a temporary copy (e.g. text editors) or by a Journaling file system [15]. Copies can be easily found if they are saved in plain text (e.g. through grepping for part of the content), but what if they are saved encoded or scrambled?
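As a rough illustration of the "grepping" approach, the following Python sketch walks a directory tree and reports the files that still contain a given plain-text fragment. The function name `find_fragment` and its parameters are assumptions made for this example. It only finds unencoded copies inside regular files, so it says nothing about encoded, scrambled, or already-unlinked copies.

```python
import os


def find_fragment(root: str, fragment: bytes, chunk_size: int = 1024 * 1024) -> list:
    """Return paths of regular files under `root` that contain `fragment`."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    hits = []
    overlap = len(fragment) - 1
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip pipes, sockets and other special files.
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    tail = b""
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        window = tail + chunk
                        if fragment in window:
                            hits.append(path)
                            break
                        # Keep the last bytes so a match spanning two chunks is found.
                        tail = window[-overlap:] if overlap else b""
            except OSError:
                # Unreadable files (e.g. missing permissions) are skipped.
                continue
    return hits
```

Running it over a home directory or a temporary-files directory with a distinctive sentence from the shredded document is a quick sanity check, not a proof that no copy exists.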
The Great Wiping Controversy
A topic that has generated lengthy debates over time is the number of overwrites needed to make the content of a file completely unrecoverable. In 1996 Peter Gutmann published "Secure Deletion of Data from Magnetic and Solid-State Memory" [6], where he claims that if a file is overwritten once, the old content might still be recovered using techniques such as Magnetic Force Microscopy (MFM). It is important to note that this paper is now out of date, and its content no longer applies to modern-day hardware.
Still, I already find it weird that a deleted file can be recovered. Now even an overwritten file can be recovered? How is that even possible?!
The answer lies in the physical imperfections of (1996) hard drives. Quoting the paper itself [6]:
Faced with techniques such as MFM, truly deleting data from magnetic media is very difficult. The problem lies in the fact that when data is written to the medium, the write head sets the polarity of most, but not all, of the magnetic domains. This is partially due to the inability of the writing device to write in exactly the same location each time, and partially due to the variations in media sensitivity and field strength over time and among devices.
In conventional terms, when a one is written to disk the media records a one, and when a zero is written the media records a zero. However the actual effect is closer to obtaining a 0.95 when a zero is overwritten with a one, and a 1.05 when a one is overwritten with a one. Normal disk circuitry is set up so that both these values are read as ones, but using specialised circuitry it is possible to work out what previous "layers" contained.
The author then goes on to describe how these imperfections in the way hard drives work can be exploited to read 1 or 2 past layers of data using a high-quality oscilloscope. With the use of MFM one can go even further and recover even older layers (indeed, one of the "standard" uses of MFM is evaluating the effectiveness of disk drive servo-positioning mechanisms).
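The 0.95 / 1.05 example from the quote can be turned into a toy model. The Python sketch below is purely illustrative: the 0.05 residual comes from the example figures in the quote, while applying the same offset to written zeros, the noise-free signal, and the function names are all assumptions made here. Real drives are nothing like this clean.

```python
# Toy model only: residual taken from the quote's example values (0.95 and 1.05).
# Assumption: the same offset applies when a zero is written.
RESIDUAL = 0.05


def write_bit(old_bit: int, new_bit: int) -> float:
    """Analog level left on the medium after overwriting old_bit with new_bit."""
    return new_bit + RESIDUAL * (1 if old_bit else -1)


def normal_read(level: float) -> int:
    """What ordinary drive circuitry reports: the nearest digital value."""
    return 1 if level >= 0.5 else 0


def forensic_read(level: float) -> tuple:
    """Recover the current bit and guess the previous one from the leftover offset."""
    current = normal_read(level)
    previous = 1 if level - current > 0 else 0
    return current, previous


if __name__ == "__main__":
    for old in (0, 1):
        for new in (0, 1):
            level = write_bit(old, new)
            print(old, new, round(level, 2), normal_read(level), forensic_read(level))
```

In this idealised model the previous layer is always recoverable. The rest of the debate is essentially about how far real hardware, with its noise and ever smaller features, is from this picture.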
The solution proposed by Gutmann consists of multiple (up to 35) overwrites with random data and other specific patterns. This technique was meant to be a "universal" approach that would guarantee secure deletion on all HDDs existing at the time, regardless of the encoding used.
Gutmann's claims were then contested by the 2008 paper "Overwriting Hard Drive Data: The Great Wiping Controversy" [7], which argued that the findings in [6] are valid (if at all) only for very outdated technology. The paper claims that a single random pass should be fine for today's HDDs, as they have a higher density and use different encoding algorithms (PRML/EPRML).
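Indeed, the following image shows the probability of recovering different information payloads from a hard drive that has been overwritten with a single pass of random data:
[Image: probability of recovering different information payloads after a single random-data overwrite]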
Gutmann replied to these claims by adding the following note to his paper, which in my opinion settles the debate and clears up the remaining doubts:
In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques. As a result, they advocate applying the voodoo to PRML and EPRML drives even though it will have no more effect than a simple scrubbing with random data. In fact performing the full 35-pass overwrite is pointless for any drive since it targets a blend of scenarios involving all types of (normally-used) encoding technology, which covers everything back to 30+-year-old MFM methods (if you don't understand that statement, re-read the paper). If you're using a drive which uses encoding technology X, you only need to perform the passes specific to X, and you never need to perform all 35 passes. For any modern PRML/EPRML drive, a few passes of random scrubbing is the best you can do. As the paper says, "A good scrubbing with random data will do about as well as can be expected". This was true in 1996, and is still true now.
Looking at this from the other point of view, with the ever-increasing data density on disk platters and a corresponding reduction in feature size and use of exotic techniques to record data on the medium, it's unlikely that anything can be recovered from any recent drive except perhaps a single level via basic error-cancelling techniques. In particular the drives in use at the time that this paper was originally written are long since extinct, so the methods that applied specifically to the older, lower-density technology don't apply any more. Conversely, with modern high-density drives, even if you've got 10KB of sensitive data on a drive and can't erase it with 100% certainty, the chances of an adversary being able to find the erased traces of that 10KB in 200GB of other erased traces are close to zero.
Another point that a number of readers seem to have missed is that this paper doesn't present a data-recovery solution but a data-deletion solution. In other words it points out in its problem statement that there is a potential risk, and then the body of the paper explores the means of mitigating that risk.
Here are a couple of additional resources that I found useful to better understand the topic: [8], [9].
NOTE: Even if you perfectly shred the entire disk, fragments of your files might still be stored in unreachable parts of the hard drive, such as damaged sectors. The OS can no longer read or write them, but the bits might still be there and might be accessible using specialized equipment.
Additionally, hard drives contain hidden/private areas [10] that are used by the manufacturer for multiple purposes (drive management, diagnostics, etc.). Theoretically data can be saved here, but I did not find any evidence of this happening unless explicitly attempted by the user (e.g. to hide files). Therefore I think it is safe to assume that those areas do not contain any user data (if I am mistaken about this, please contact me. I would love to know more).
Enough theory, let's get practical
We went over a bit of theory about how data gets stored on hard drives and how it can be safely deleted. Now let's suppose that you have a hard drive that you want to shred. Practically, what is the best way to go about it?
Well, it depends. Let's go through a couple of scenarios.
Scenario 1: you are a spy in enemy territory
And you are in possession of top-secret data that needs to be destroyed once read. Your data falling into the wrong hands could get you killed. Or worse.
Suggested procedure:
- Overwrite the hard drive 76 times with random data.
- Degauss it, which means screwing up the magnetic field of the hard drive by using a powerful magnet. This does not work on SSDs, since they do not rely on magnetism to store data.
- Shoot it with your silenced gun and smash it with a sledgehammer.
- Shred it to pieces using an industrial hard drive shredder, then burn the pieces using an incinerator.
- Spread the ashes over 5 different rural locations, far away from cities, possibly in 2 different continents.
- Ask a couple of questions to yourself, such as: why are you even reading this blog?
Alternatively, you can follow these (serious) suggestions [11] on how to physically destroy a hard drive at home. Please do it in an environmentally safe way.
Scenario 2: you are a normal guy
Who is selling a hard drive and does not want the next owner to be able to retrieve any of your data.
Just do a full-disk overwrite with random data and it will be fine. You can follow the instructions from this Arch Linux Wiki page [14] (which is also a great source of information on the subject).
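For illustration only, here is what a single-pass overwrite of a whole target could look like in Python. This is a sketch, not the procedure from the Arch Linux Wiki: the function name `wipe_target` is made up, and in real use the path would be a block device (something like `/dev/sdX`, a placeholder here) opened with root privileges. Double-check the device name, because the operation is irreversible.

```python
import os


def wipe_target(path: str, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Overwrite every byte of `path` with random data; return the byte count."""
    with open(path, "r+b") as target:
        # Seeking to the end also yields the size of a block device,
        # where os.path.getsize() typically reports 0 on Linux.
        size = target.seek(0, os.SEEK_END)
        target.seek(0)
        written = 0
        while written < size:
            n = min(chunk_size, size - written)
            target.write(os.urandom(n))
            written += n
        # Make sure the data actually leaves the buffers before returning.
        target.flush()
        os.fsync(target.fileno())
    return written
```

The sketch can be tried safely on a disk image file before going anywhere near a real device. It does not touch damaged sectors that the drive has already remapped.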
A very useful preventive measure is to use Full-Disk Encryption [12]. This makes it practically impossible to access your data without the decryption key. Indeed, a technique used to shred data in the cloud is the so-called crypto-shredding [13], where data is stored encrypted and only the decryption key needs to be overwritten.
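The principle behind crypto-shredding can be shown with a toy Python example. To stay within the standard library it uses a one-time pad (a random key as long as the data) instead of a real disk cipher such as AES. That substitution, and the function names, are assumptions made for this sketch. Zeroing a bytearray in Python is also no guarantee that no other copy of the key exists in memory.

```python
import secrets


def encrypt(plaintext: bytes) -> tuple:
    """Encrypt with a fresh random key as long as the data (one-time pad)."""
    key = bytearray(secrets.token_bytes(len(plaintext)))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return ciphertext, key


def decrypt(ciphertext: bytes, key: bytearray) -> bytes:
    """XOR the ciphertext with the key to get the plaintext back."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


def crypto_shred(key: bytearray) -> None:
    """Overwrite the key in place; the ciphertext itself is left untouched."""
    for i in range(len(key)):
        key[i] = 0


if __name__ == "__main__":
    ciphertext, key = encrypt(b"quarterly report")
    print(decrypt(ciphertext, key))  # readable while the key exists
    crypto_shred(key)
    print(decrypt(ciphertext, key))  # only ciphertext bytes remain
```

The point is that only the (small) key has to be destroyed. The bulk of the encrypted data can stay where it is and remains unreadable.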
References
- [1] https://en.wikipedia.org/wiki/Hard_disk_drive
- [2] https://en.wikipedia.org/wiki/Solid-state_drive
- [3] https://linux.die.net/man/1/shred
- [4] https://www.bleachbit.org/
- [5] https://ssd.eff.org/module/how-delete-your-data-securely-linux
- [6] https://www.cs.auckland.ac.nz/%7Epgut001/pubs/secure_del.html
- [7] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=0ceeb87cb3b7df0f3659f7692c43954784d730fb
- [8] https://security.stackexchange.com/questions/26132/is-data-remanence-a-myth
- [9] https://security.stackexchange.com/questions/10464/why-is-writing-zeros-or-random-data-over-a-hard-drive-multiple-times-better-th
- [11] https://security.stackexchange.com/questions/11313/how-do-you-destroy-an-old-hard-drive
- [12] https://ubuntu.com/core/docs/uc20/full-disk-encryption
- [13] https://en.wikipedia.org/wiki/Crypto-shredding
- [14] https://wiki.archlinux.org/title/Securely_wipe_disk
- [15] https://en.wikipedia.org/wiki/Journaling_file_system