Interesting deletion idea

* Interesting deletion idea
@ 2004-10-08  5:55 John Richard Moser
  2004-10-08 22:14 ` Valdis.Kletnieks
  0 siblings, 1 reply; 6+ messages in thread
From: John Richard Moser @ 2004-10-08  5:55 UTC
  To: reiserfs-list

I'm not subscribed; please CC me on replies.

It would be interesting to add an erasure hook to ReiserFS or Reiser4,
and to add erasure algorithms to the kernel that can be applied to
memory-mapped regions of the disk.  For example:

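{
    void *p;
    p = mmap_disk_segment(start, length);  /* map the now-invalid disk range */
    kernel_erase(&alg, p, length);         /* run the chosen erasure algorithm over it */
    munmap_segment(p);                     /* drop the mapping again */
}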

Yeah, really cheap block of pseudocode >:P

Anyway, the idea is that to erase things off the disk (deleted files,
moved data, journal transactions, etc.), the following logic could be
applied:

 - Segment of disk becomes invalid
 - mmap() segment of disk into memory
 - Pass segment and length to erasure function
 - Unmap segment
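A minimal userspace sketch of those four steps, assuming an open regular file stands in for the disk segment and an "erasure algorithm" is just a list of byte patterns, one per pass. struct erase_alg and erase_file_range() are hypothetical names, not kernel interfaces; a real version would live in the filesystem and block layer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical erasure algorithm: one byte pattern per overwrite pass. */
struct erase_alg {
    const unsigned char *patterns;
    size_t npasses;
};

/* Map [start, start + len) of an open file, overwrite it once per pass,
 * then unmap it. Returns 0 on success, -1 on error. */
static int erase_file_range(int fd, off_t start, size_t len,
                            const struct erase_alg *alg)
{
    assert(len > 0);                  /* mmap() rejects zero-length maps */

    /* mmap() offsets must be page-aligned: map from the enclosing page
     * boundary and only touch the bytes that were actually requested. */
    long page = sysconf(_SC_PAGESIZE);
    off_t base = start - (start % page);
    size_t delta = (size_t)(start - base);
    size_t maplen = len + delta;

    unsigned char *map = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, base);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    int rc = 0;
    for (size_t i = 0; i < alg->npasses && rc == 0; i++) {
        memset(map + delta, alg->patterns[i], len);
        /* Ask the kernel to write this pass back before starting the next. */
        rc = msync(map, maplen, MS_SYNC);
        if (rc != 0)
            perror("msync");
    }

    munmap(map, maplen);
    return rc;
}
```

msync(MS_SYNC) only gets each pass as far as the kernel's writeback path; as the replies in this thread point out, the drive's own write cache is a separate problem.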

It'd be fun to be able to mount -o remount,erase=gutmann / and have the
Gutmann algorithm erase everything.  It may also be interesting to get the
journal to work around parts of the journal being erased, and to do
other things that allow heavy erasure algorithms (Gutmann is a 34-pass
algorithm IIRC) to run without visibly slowing operations down.

The most important part of this would be adding hooks both inside and
outside of ReiserFS.  The kernel should supply the erasure mechanisms so
that all filesystems can take advantage of them.  Because erasure is . . .
well, erasure . . . it would not be filesystem-dependent.

The erasure should probably only apply to relevant parts of the disk.
Erasing inode information, for example, would be pointless; journal
transactions, file data, and directory entries, on the other hand, may
all hold sensitive information; even a filename (stored in a directory
entry) may be sensitive.

The downside to such erasure is that it would put areas containing
metadata at risk.  For example, erasing directory entries puts the
directory at risk, as junk may be left in that area if the system
goes down mid-erase.

To avoid damage, transactions storing metadata should erase the target
area just before being flushed.  This way, if the system goes down, it
will come back up, let the transaction erase the area again and then
flush, with no risk of leaving an inconsistent state.  The flush MUST be
done when the system comes back up to give a 100% guarantee that the
data was properly destroyed.

Transactions are marked as "finished" when completed.  After marking
them as finished, they should be erased with the erasure hooks.  If
there is a failure at this point, then the journal replay will see a
"finished" transaction and repeat the erasure, again to ensure that the
data is actually destroyed.
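The crash-safety argument in the last two paragraphs can be sketched as a small table of what journal replay must redo for each transaction state. This assumes every step (erase target, flush, erase journal copy) is idempotent; the state and action names are hypothetical, not ReiserFS or Reiser4 structures.

```c
#include <assert.h>

/* Hypothetical life cycle of one journal transaction under this scheme. */
enum tx_state {
    TX_COMMITTED,   /* in the journal; target area not yet (fully) rewritten */
    TX_FINISHED,    /* flushed to its target; the journal copy still exists */
    TX_RETIRED      /* journal copy erased; nothing left to do */
};

/* Actions replay must perform, as a bit mask, applied in this order. */
enum {
    ERASE_TARGET  = 1u << 0,  /* erase the old contents of the target area */
    FLUSH         = 1u << 1,  /* write the transaction's data to the target */
    ERASE_JOURNAL = 1u << 2   /* erase the transaction's copy in the journal */
};

static unsigned replay_actions(enum tx_state s)
{
    switch (s) {
    case TX_COMMITTED:
        /* We may have crashed mid-erase or mid-flush: redo both, then
         * retire the journal copy. Erasing again is harmless because the
         * flush always follows it. */
        return ERASE_TARGET | FLUSH | ERASE_JOURNAL;
    case TX_FINISHED:
        /* Data is safely in place; only the journal copy may survive. */
        return ERASE_JOURNAL;
    case TX_RETIRED:
        return 0;
    }
    assert(0 && "unknown transaction state");
    return 0;
}
```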

Buffering multiple overwrites of the same area and applying them in a
sane and orderly manner may allow you to catch rapid, repeated overwrites
of disk areas and wait until several have gone by before actually
applying them.  This would allow you to avoid some of the overhead of
attempting to destroy overwritten data.
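A minimal sketch of that buffering, assuming a small fixed-size table of pending block writes where a newer write to a block that is already queued simply replaces the older one (last writer wins), so only the final contents need the expensive erase-then-write treatment. All names and sizes here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16    /* tiny, for illustration only */
#define MAX_PENDING 8

struct pending_write {
    long block;                      /* disk block number */
    unsigned char data[BLOCK_SIZE];  /* newest contents queued for it */
    int superseded;                  /* earlier writes this one replaced */
};

struct write_buffer {
    struct pending_write w[MAX_PENDING];
    size_t n;
};

/* Queue a block write. A newer write to an already-pending block replaces
 * it in place, so only the final contents reach the disk. Returns 0, or
 * -1 if the buffer is full and must be flushed first. */
static int queue_write(struct write_buffer *b, long block,
                       const unsigned char *data)
{
    assert(b != NULL && data != NULL);
    for (size_t i = 0; i < b->n; i++) {
        if (b->w[i].block == block) {
            memcpy(b->w[i].data, data, BLOCK_SIZE);
            b->w[i].superseded++;
            return 0;
        }
    }
    if (b->n == MAX_PENDING)
        return -1;
    struct pending_write *w = &b->w[b->n++];
    w->block = block;
    memcpy(w->data, data, BLOCK_SIZE);
    w->superseded = 0;
    return 0;
}
```

Deciding *when* a burst of overwrites is over and it is safe to flush is a separate problem, which the thread takes up further on.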

It should also be considered that when files are deleted, the file data
should be erased.  The transaction should note *all* disk areas
containing that file's data so that they can be appropriately erased.
The transaction should not be marked as finished until the system has
completed an erasure pass on the file data.  Multiple transactions may
be used to incrementally destroy pieces of large files upon deletion.
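One way to picture the incremental approach, assuming a deleted file is described by a list of (start, length) extents in blocks: split them into pieces of at most max_blocks, one erase transaction per piece, each of which could be marked finished on its own. struct extent and plan_erase_txs() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* A run of blocks belonging to the deleted file. */
struct extent {
    unsigned long start;   /* first block */
    unsigned long len;     /* number of blocks */
};

/* Split extents into erase transactions of at most max_blocks blocks each.
 * Fills up to max_out entries of out[] and returns the number of
 * transactions needed (which may exceed max_out). */
static size_t plan_erase_txs(const struct extent *ext, size_t n,
                             unsigned long max_blocks,
                             struct extent *out, size_t max_out)
{
    assert(max_blocks > 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long start = ext[i].start;
        unsigned long left = ext[i].len;
        while (left > 0) {
            unsigned long piece = left < max_blocks ? left : max_blocks;
            if (count < max_out)
                out[count] = (struct extent){ start, piece };
            count++;
            start += piece;
            left -= piece;
        }
    }
    return count;
}
```

Each piece would only go back on the free list once its own transaction is marked finished, as described above.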

That's probably not everything, but that's all I can think of.  All
design problems and architectural issues are beyond me.

--
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

* Re: Interesting deletion idea
  2004-10-08  5:55 Interesting deletion idea John Richard Moser
@ 2004-10-08 22:14 ` Valdis.Kletnieks
  2004-10-08 23:52   ` John Richard Moser
  0 siblings, 1 reply; 6+ messages in thread
From: Valdis.Kletnieks @ 2004-10-08 22:14 UTC
  To: John Richard Moser; +Cc: reiserfs-list

On Fri, 08 Oct 2004 01:55:19 EDT, John Richard Moser said:

> It'd be fun to be able to mount -o remount,erase=gutman / and have the
> gutman algorithm erase everything.  It may be interesting to get the
> journal to work around parts of the journal being erased, and to do
> other things in an attempt to allow heavy erasure algorithms (Gutman is
> a 34 pass alg IIRC) to function without slowing operations down visibly.

Anybody seriously proposing Gutmann's 35 passes needs to be taken out back
and shot - or at least worked over with a rubber hose.  The *only* reason
there are 35 passes is so that at least 3 or 4 of them will tickle a
corner case of some particular on-media encoding scheme (for instance, if
you don't have any MFM drives left, you can toss about half the entries).

Current thinking from the spooks who should know:

Canadian RCMP TSSIT OPS-II says: "Must first be checked for correct functioning
and then have all storage areas overwritten once with the binary digit ONE,
once with the binary digit ZERO and once with a single numeric, alphabetic or
special character, " (http://jya.com/rcmp2.htm)

American DoD 5220.22-M says: Overwriting all addressable locations with a
character, its complement, then a random character and verify.

DoD 5220.22-M applies to civilian contractors, and is approved for material
rated up to SECRET.  TOP SECRET or higher still calls for physical destruction
of media or mass degaussing.

In other words, our spooks think that if 3 passes isn't enough, you need to
totally destroy it.

(Two notes - (1) that read-back verify *is* required to make sure you did it
right, and (2) neither one worries about the information leakage from bad blocks
that have been remapped by the drive)
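Both procedures boil down to a short list of byte patterns plus a read-back check. A minimal sketch of the RCMP variant, assuming byte-granular patterns (all ONE bits is 0xFF, all ZERO bits is 0x00); the function names are hypothetical. Getting honest read-back data from the medium rather than from a cache (for example by reading with O_DIRECT on Linux) is left to the caller, and per note (2), nothing here can reach blocks the drive has already remapped.

```c
#include <assert.h>
#include <stddef.h>

/* Patterns for the three RCMP TSSIT OPS-II passes quoted above. */
static void rcmp_passes(unsigned char ch, unsigned char out[3])
{
    out[0] = 0xFF;  /* binary ONE in every bit */
    out[1] = 0x00;  /* binary ZERO in every bit */
    out[2] = ch;    /* a single numeric, alphabetic or special character */
}

/* Read-back verify: returns the index of the first byte that does not
 * match the pattern of the last pass, or len if the whole area matches. */
static size_t verify_pass(const unsigned char *readback, size_t len,
                          unsigned char pattern)
{
    assert(readback != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (readback[i] != pattern)
            return i;
    return len;
}
```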

> The erasure should probably only apply to relevant parts of disk.  Inode
> information, for example, would be pointless; journal transactions, file
> data, and directory entries, on the other hand, are all possible
> sensitive information; the filename may be sensitive data (directory entry).

Careful analysis of the inodes themselves leaks a *lot* more information
than you might expect - if the filesystem uses *ANY* sort of predictable order
for inode allocation, you can look at the free inodes and trace back the
order they were freed in (very easy if the filesystem has a free inode list,
a bit more of a challenge if it allocates on the fly like reiser3).  Once
you know that, you know what uid/gid each file belonged to, its size, and
its ctime/mtime/atime.
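A toy model of that leak, assuming (hypothetically) a filesystem that keeps freed inodes on a LIFO free list and, on free, clears only the in-use flag: reading the raw list back yields the free order, and each entry still carries the old owner, size and timestamps. This is an illustration only, not how any particular filesystem lays out its inodes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy on-disk inode: freeing it clears only the in-use flag. */
struct toy_inode {
    int in_use;
    unsigned uid, gid;
    unsigned long size;
    long ctime, mtime, atime;
};

#define NINODES 16

struct toy_fs {
    struct toy_inode inode[NINODES];
    unsigned free_list[NINODES];   /* LIFO: last entry is the newest free */
    size_t nfree;
};

static void toy_free(struct toy_fs *fs, unsigned ino)
{
    assert(ino < NINODES && fs->nfree < NINODES);
    fs->inode[ino].in_use = 0;     /* uid, size and times are left behind */
    fs->free_list[fs->nfree++] = ino;
}

/* What someone reading the raw disk learns: the freed inodes, most
 * recently freed first, each still carrying its stale metadata. */
static size_t toy_freed_order(const struct toy_fs *fs, unsigned *out)
{
    for (size_t i = 0; i < fs->nfree; i++)
        out[i] = fs->free_list[fs->nfree - 1 - i];
    return fs->nfree;
}
```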

That's a *LOT* of info that can be used to reconstruct what was going on.

> Buffering multiple overwrites of the same area and applying them in a
> sane and orderly manner may allow you to catch rapid, repeated overwrites
> of disk areas and wait until several have gone by before actually
> applying them.  This would allow you to avoid some of the overhead of
> attempting to destroy overwritten data.

Actually, that's the *last* thing you want to do - you really need to send
3 overwrites down the pipe to the disk *and make sure you have a write barrier
between them*.  The *last* thing you want is to send 3 writes to the disk,
and have the disk's write cache bugger^Wbuffer "optimize" it so only the
last written block actually goes to disk....
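From userspace, the closest approximation is to force each pass out before issuing the next. A minimal sketch, assuming a POSIX system: pwrite() one pass, fdatasync(), repeat. overwrite_passes() is a hypothetical helper, and the caveat above still applies: whether the passes survive the drive's own write cache depends on the kernel issuing cache flushes and on the drive honouring them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite [off, off + len) once per pattern, forcing each pass out with
 * fdatasync() before the next one is issued, so the kernel cannot merge
 * the passes into a single write. */
static int overwrite_passes(int fd, off_t off, size_t len,
                            const unsigned char *patterns, size_t npasses)
{
    unsigned char buf[4096];
    assert(len <= sizeof buf);          /* keep the sketch to one buffer */

    for (size_t p = 0; p < npasses; p++) {
        memset(buf, patterns[p], len);
        if (pwrite(fd, buf, len, off) != (ssize_t)len) {
            perror("pwrite");
            return -1;
        }
        if (fdatasync(fd) != 0) {       /* the per-pass "barrier" */
            perror("fdatasync");
            return -1;
        }
    }
    return 0;
}
```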

* Re: Interesting deletion idea
  2004-10-08 22:14 ` Valdis.Kletnieks
@ 2004-10-08 23:52   ` John Richard Moser
  2004-10-09  0:22     ` Valdis.Kletnieks
  0 siblings, 1 reply; 6+ messages in thread
From: John Richard Moser @ 2004-10-08 23:52 UTC
  To: Valdis.Kletnieks; +Cc: reiserfs-list

Valdis.Kletnieks@vt.edu wrote:
| On Fri, 08 Oct 2004 01:55:19 EDT, John Richard Moser said:
|

[...]

Eh.  The Gutmann algorithm is kind of a toy, but *shrug*

I thought the DOD algorithm was 7 pass?

|
|>Buffering multiple overwrites of the same area and applying them in a
|>sane and orderly manner may allow you to catch rapid, repeated overwrites
|>of disk areas and wait until several have gone by before actually
|>applying them.  This would allow you to avoid some of the overhead of
|>attempting to destroy overwritten data.
|
|
| Actually, that's the *last* thing you want to do - you really need to send
| 3 overwrites down the pipe to the disk *and make sure you have a write
| barrier between them*.  The *last* thing you want is to send 3 writes to the disk,
| and have the disk's write cache bugger^Wbuffer "optimize" it so only the
| last written block actually goes to disk....

No, no, I mean like this:

FILE *a = fopen("/some/file.txt", "r+");
fseek(a, 0, SEEK_SET);
fputc('N', a);
fseek(a, 0, SEEK_SET);
fputc('D', a);
fseek(a, 0, SEEK_SET);
fputc('X', a);
....

If some program overwrites a part of a file a bunch of times, you don't
want:

fseek(a, 0, SEEK_SET);
overwrite_40_times();   /* hypothetical: 40 erasure passes over the area */
fputc('N', a);
fseek(a, 0, SEEK_SET);
overwrite_40_times();
fputc('D', a);
fseek(a, 0, SEEK_SET);
overwrite_40_times();
fputc('X', a);
......

but you will probably want

fseek(a, 0, SEEK_SET);
fputc('N', a);
fseek(a, 0, SEEK_SET);
fputc('D', a);
fseek(a, 0, SEEK_SET);
overwrite_40_times();   /* hypothetical: 40 erasure passes over the area */
fputc('X', a);
......
overwrite_40_times();
fputc('P', a);
fclose(a);

If this is going on rapidly, there's no point in trying to completely
destroy the disk area for *every* logical operation; buffering the
operations, applying only the most recent one, and destroying the area
just before writing that one would be OK.  The idea is that rapid
overwrites from userspace get collapsed into a single overwrite, and
then the kernel overwrites the area a bunch of times before flushing that
data to disk, to securely erase what was there.
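A rough count of what the collapsing buys, assuming every block write that reaches the disk is preceded by a fixed number of erasure passes: with 40 passes and the four logical writes 'N', 'D', 'X', 'P' from the example above, erasing before every write costs 164 block writes, while collapsing first costs 41. block_writes() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Physical block writes needed for a sequence of logical overwrites of one
 * block (one character per write), when each write that reaches the disk
 * is preceded by 'passes' erasure overwrites. */
static unsigned long block_writes(const char *writes, unsigned long passes,
                                  int coalesce)
{
    assert(writes != NULL);
    unsigned long n = strlen(writes);
    if (n == 0)
        return 0;
    /* Coalescing: only the last logical write ever reaches the disk. */
    unsigned long reaching_disk = coalesce ? 1 : n;
    return reaching_disk * (passes + 1);
}
```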

--
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

* Re: Interesting deletion idea
  2004-10-08 23:52   ` John Richard Moser
@ 2004-10-09  0:22     ` Valdis.Kletnieks
  2004-10-09  0:34       ` John Richard Moser
  2004-10-09  6:43       ` Emil Larsson
  0 siblings, 2 replies; 6+ messages in thread
From: Valdis.Kletnieks @ 2004-10-09  0:22 UTC
  To: John Richard Moser; +Cc: reiserfs-list

On Fri, 08 Oct 2004 19:52:14 EDT, John Richard Moser said:

> I thought the DOD algorithm was 7 pass?

Citation please?  If you have a better reference than DoD 5220.22-M,
feel free to share it.

> If this is going on rapidly, there's no point in trying to completely
> destroy the disk for *every* logical operation; but buffering the
> operations and then only doing the most recent one, and destroying the
> area before that one exactly, would be OK.  The idea is that rapid
> overwrites from userspace get collapsed into a single overwrite; and
> then the kernel overwrites a bunch of times before flushing that data to
> disk to securely erase it.

The point is that you have no really good way to know beforehand that
the flurry of writes is over, and it's time to collapse the writes into
a single write.  

To demonstrate using your example:

FILE *a = fopen("/some/file.txt", "r+");
fseek(a, 0, SEEK_SET);
fputc('N', a);
fseek(a, 0, SEEK_SET);
fputc('D', a);
fseek(a, 0, SEEK_SET);
fputc('X', a);

At what point do you do the overwrite?  You place it just before the
fputc 'X' - but you can't really defer it until then, rather than doing it
at the 'N' or 'D', unless you *know* that the 'X' one will happen 'Soon Enough'.
There's also the point that fputc() is stdio and buffered by default,
unless you've called fflush() or setlinebuf() or similar.  Even if you
look at the read()/write() syscall level, the Linux kernel will almost
certainly do most of the needed collapsing automatically in the buffer
cache code (look at fs/buffer.c for the gory details) - in fact, most
of the time you need to use fsync() or similar to *force* the data to
actually get to the disk (often, the data doesn't go out until long after
the process has exited - and then there are the different ways the
various I/O elevators schedule things, just to add another layer of
unpredictability).  The end result is that it's a lot harder than it
looks to get this right...
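The layers described above show up when a single byte is pushed all the way down explicitly. A minimal sketch, assuming a POSIX system: fputc() only fills the stdio buffer, fflush() hands it to the kernel's cache, and fsync() asks the kernel to write it to the device. put_byte_durably() is a hypothetical helper; even fsync() says nothing about the drive's own write cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Overwrite the first byte of f and push it through every software layer. */
static int put_byte_durably(FILE *f, int c)
{
    assert(f != NULL);
    if (fseek(f, 0, SEEK_SET) != 0)   /* like seek(a, 0, 0) in the example */
        return -1;
    if (fputc(c, f) == EOF)           /* lands in the stdio buffer only */
        return -1;
    if (fflush(f) != 0)               /* stdio buffer -> kernel cache */
        return -1;
    if (fsync(fileno(f)) != 0)        /* kernel cache -> device */
        return -1;
    return 0;
}
```

Calling this for every overwrite is exactly the expensive path: each call pays a full fsync().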

In addition, doing the overwrite at *THAT* point is *the wrong point* - as
you're about to overwrite the block at least once *anyhow*.  You *really* need to
be doing the erasing in the handling for the unlink() and (f)truncate() syscalls,
because *that* is the point where you free the disk blocks - and the point of
erasing is to prevent scavenging of old data off the disk.  This has the added
benefit of being something you *can* do basically at the filesystem's leisure,
subject to the requirement that you return blocks to the free list fast enough
to prevent disk space exhaustion (which is trickier than it looks - under heavy
file create/write/read/unlink loads, you need to be doing it as fast as possible
at exactly the time you have the least idle bandwidth - in the worst case, a 3-pass
erase of all blocks will limit you to 25% of the effective write bandwidth in a
steady-state high-load situation).
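The 25% figure follows from counting writes: if every block written with data is eventually freed and erased with k extra overwrites, only 1 of every k + 1 block writes carries new data. A minimal sketch, assuming that steady state and ignoring reads, seeks and metadata; the function names are hypothetical. With k = 3 this gives the 25% above; with Gutmann's 35 passes it drops below 3%.

```c
#include <assert.h>

/* Steady state under heavy create/write/unlink load: every block written
 * with data is later freed and erased with 'passes' extra overwrites, so
 * only 1 of every (passes + 1) block writes carries new data. */
static double useful_write_fraction(unsigned passes)
{
    return 1.0 / ((double)passes + 1.0);
}

/* Effective data bandwidth for a given raw write bandwidth (any unit). */
static double effective_write_bandwidth(double raw, unsigned passes)
{
    assert(raw >= 0.0);
    return raw * useful_write_fraction(passes);
}
```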

Also, you *really* need to be *very* careful regarding write barriers and the
like - look at the linux-kernel archives from the last few months, where there
was a *long* series of threads about the problems on IDE.

Basically, if the drive has a write cache on it, you have to either disable
it or jump through some *real* hoops in order to get strictly correct write
barrier semantics (and on some drives, the situation is totally impossible).

* Re: Interesting deletion idea
  2004-10-09  0:22     ` Valdis.Kletnieks
@ 2004-10-09  0:34       ` John Richard Moser
  2004-10-09  6:43       ` Emil Larsson
  1 sibling, 0 replies; 6+ messages in thread
From: John Richard Moser @ 2004-10-09  0:34 UTC
  To: Valdis.Kletnieks; +Cc: reiserfs-list

Valdis.Kletnieks@vt.edu wrote:
| On Fri, 08 Oct 2004 19:52:14 EDT, John Richard Moser said:
|
[...]

blahblahblahblahblah.

Lots of interesting points, but I'm really not interested enough to
comment on them TBH.

Why don't you guys figure out if you want to have a go at it, and if so
go ahead with whatever.  Arguing with me is useless, as I'm most likely
not going to hand you any awesome design insight or any code.

--
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

* Re: Interesting deletion idea
  2004-10-09  0:22     ` Valdis.Kletnieks
  2004-10-09  0:34       ` John Richard Moser
@ 2004-10-09  6:43       ` Emil Larsson
  1 sibling, 0 replies; 6+ messages in thread
From: Emil Larsson @ 2004-10-09  6:43 UTC
  To: Valdis.Kletnieks; +Cc: John Richard Moser, reiserfs-list

Valdis.Kletnieks@vt.edu wrote:

>On Fri, 08 Oct 2004 19:52:14 EDT, John Richard Moser said:
>
>>I thought the DOD algorithm was 7 pass?
>
>Citation please?  If you have a better reference than DOD 5220-22.M,
>feel free to share it.
>
To the best of my knowledge, "DOD 7-pass" or similar expressions refer 
to the sequential use of the "e", "c" and "e" overwrite methods as 
described in DoD 5220.22-M / NISPOM 8-306. There is no basis for this in 
any official regulations or guidelines that I have been made privy to - 
it's just typical "more-must-be-better" thinking.
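The arithmetic behind the "7-pass" label, as it is commonly described: method "e" is three passes (a character, its complement, then a random character) and method "c" is one pass (a single character), so e, c, e gives 3 + 1 + 3 = 7. A minimal sketch that builds that pass list; the characters used by each method are left as parameters because the text does not specify them, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One overwrite pass: a fixed byte, or a random one chosen at run time. */
struct pass {
    enum { PASS_FIXED, PASS_RANDOM } kind;
    unsigned char value;               /* only meaningful for PASS_FIXED */
};

/* Method "c": a single character. */
static size_t method_c(struct pass *out, size_t n, unsigned char ch)
{
    out[n++] = (struct pass){ PASS_FIXED, ch };
    return n;
}

/* Method "e": a character, its complement, then a random character. */
static size_t method_e(struct pass *out, size_t n, unsigned char ch)
{
    out[n++] = (struct pass){ PASS_FIXED, ch };
    out[n++] = (struct pass){ PASS_FIXED, (unsigned char)~ch };
    out[n++] = (struct pass){ PASS_RANDOM, 0 };
    return n;
}

/* "e", "c", "e" in sequence: 3 + 1 + 3 = 7 passes. */
static size_t dod_ece(struct pass out[7], unsigned char e1,
                      unsigned char c, unsigned char e2)
{
    size_t n = 0;
    n = method_e(out, n, e1);
    n = method_c(out, n, c);
    n = method_e(out, n, e2);
    assert(n == 7);
    return n;
}
```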

/Emil

