* Plugin for corruption resistance?
@ 2005-02-11 18:58 Gregory Maxwell
2005-02-11 20:39 ` Jake Maciejewski
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Gregory Maxwell @ 2005-02-11 18:58 UTC (permalink / raw)
To: reiserfs-list
Anyone ever given a thought to adding support to reiserfs to store a
cryptographic checksum along with a file?
The idea is that files get a hidden attribute that contains their SHA1 hash.
If the file is modified, the hash is marked as 'unclean'. A trusted
cleaner comes by eventually and hashes the file, OR the file is hashed
right away if someone tried to read the attribute while the file is
unclean.
Fsck could be optionally told to go check the hash on every file.
Files could also be tested via a background process that randomly
tests some files every night.
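A minimal sketch of the unclean-flag scheme described above, using an in-memory stand-in for a file. The class name `HashedFile` and its methods are hypothetical illustrations, not a reiserfs or reiser4 interface: writes only mark the hash unclean, and the hash is recomputed by a cleaner or on the first read of the attribute.

```python
import hashlib


class HashedFile:
    """In-memory stand-in for a file carrying a hidden SHA-1 attribute (hypothetical)."""

    def __init__(self, data: bytes = b""):
        self._data = bytearray(data)
        self._sha1 = None       # the stored hash attribute
        self._unclean = True    # True whenever the stored hash may be stale
        self.hash_runs = 0      # counts full rehashes, for illustration only

    def write(self, offset: int, payload: bytes) -> None:
        """Modify the contents; only mark the hash unclean, do not rehash."""
        end = offset + len(payload)
        if end > len(self._data):
            self._data.extend(b"\0" * (end - len(self._data)))
        self._data[offset:end] = payload
        self._unclean = True

    def clean(self) -> None:
        """What the trusted cleaner would do: rehash only if unclean."""
        if self._unclean:
            self._sha1 = hashlib.sha1(bytes(self._data)).hexdigest()
            self._unclean = False
            self.hash_runs += 1

    def read_hash(self) -> str:
        """Reading the attribute forces a rehash if the file is unclean."""
        self.clean()
        return self._sha1

    def verify(self) -> bool:
        """fsck-style check; an unclean file has nothing to compare against."""
        if self._unclean:
            return True
        return hashlib.sha1(bytes(self._data)).hexdigest() == self._sha1
```

Any number of writes between two reads of the attribute costs a single rehash, and a change that bypasses `write()` (simulated corruption) makes `verify()` fail.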
Why would this be useful?
1. Lots of applications today (such as P2P sharing systems) need the
hashes of files.. it's inefficient to keep recomputing them. The file
system always knows when a file changes, so it can be set up to always
return the correct hash.
2. Random disk corruption can go undetected (even if the drive's ECC is
sufficient to prevent corruption, there could be memory, bus, or kernel
issues that corrupt data; a hash will help it be detected).
3. Although there are encrypted block devices available in Linux, none
of them can provide authentication.. So it's possible for an attacker
(with access to your disk) to replace hunks of files with random (and
potentially chosen depending on the chaining mode) crud without
detection.
4. It could greatly speed up casual verification of files for changes
(if you don't trust the kernel to report the true hash, then you
couldn't trust it to return the real file to some userspace file
verifier either).... it could also be used to help locate duplicates
in a very efficient manner..
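As a rough illustration of the duplicate-locating use in point 4: if every file already carried a trusted hash, finding duplicates would reduce to grouping paths by that hash without reading file contents. The mapping `stored_hashes` below is a hypothetical stand-in for the per-file attribute; the thread does not define an interface for reading it.

```python
import hashlib
from collections import defaultdict


def find_duplicates(stored_hashes):
    """Group paths by stored hash; return only groups with more than one path.

    `stored_hashes` maps path -> hex digest and stands in for the per-file
    hash attribute (an assumption made for this sketch).
    """
    by_hash = defaultdict(list)
    for path, digest in stored_hashes.items():
        by_hash[digest].append(path)
    return sorted(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
```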
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-11 18:58 Plugin for corruption resistance? Gregory Maxwell
@ 2005-02-11 20:39 ` Jake Maciejewski
2005-02-11 20:53 ` Tom Vier
` (3 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Jake Maciejewski @ 2005-02-11 20:39 UTC (permalink / raw)
To: reiserfs-list

I think this is a great idea. Solaris ZFS is supposed to have a similar
feature, but reiser4 metas would allow application-level access.

The purpose of checksumming in ZFS is more like Gregory's 2nd point,
except Solaris takes it one step further. If you have RAID, ZFS will fix
the corruption automatically. Even if we didn't have automatic
correction, which would probably be impossible without an integrated
volume manager like ZFS, it would still be nice to know if your hard
drive is flipping bits behind your back.

On Fri, 2005-02-11 at 13:58 -0500, Gregory Maxwell wrote:
> Anyone ever given a thought to adding support to reiserfs to store a
> cryptographic checksum along with a file?
>
>
> The idea is that files get a hidden attribute that contains their SHA1 hash.
> If the file is modified, the hash is marked as 'unclean'. A trusted
> cleaner comes by eventually and hashes the file, OR the file is hashed
> right away if someone tried to read the attribute while the file is
> unclean.
>
> Fsck could be optionally told to go check the hash on every file.
> Files could also be tested via a background process that randomly
> tests some files every night.
>
> Why would this be useful?
>
> 1. Lots of applications today (such as P2P sharing systems) need the
> hashes of files.. it's inefficient to keep recomputing them. The file
> system always knows when a file changes, so it can be set up to always
> return the correct hash.
>
> 2. Random disk corruption can go undetected (even if the drive's ECC is
> sufficient to prevent corruption, there could be memory, bus, or kernel
> issues that corrupt data; a hash will help it be detected).
>
> 3. Although there are encrypted block devices available in Linux, none
> of them can provide authentication.. So it's possible for an attacker
> (with access to your disk) to replace hunks of files with random (and
> potentially chosen depending on the chaining mode) crud without
> detection.
>
> 4. It could greatly speed up casual verification of files for changes
> (if you don't trust the kernel to report the true hash, then you
> couldn't trust it to return the real file to some userspace file
> verifier either).... it could also be used to help locate duplicates
> in a very efficient manner..

--
Jake Maciejewski <maciejej@msoe.edu>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-11 18:58 Plugin for corruption resistance? Gregory Maxwell
2005-02-11 20:39 ` Jake Maciejewski
@ 2005-02-11 20:53 ` Tom Vier
2005-02-12 5:19 ` David Masover
2005-02-13 3:48 ` Esben Stien
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Tom Vier @ 2005-02-11 20:53 UTC (permalink / raw)
To: Gregory Maxwell; +Cc: reiserfs-list

On Fri, Feb 11, 2005 at 01:58:59PM -0500, Gregory Maxwell wrote:
> 1. Lots of applications today (such as P2P sharing systems) need the
> hashes of files.. it's inefficient to keep recomputing them. The file
> system always knows when a file changes, so it can be set up to always
> return the correct hash.

That should be done in userland, imho. Especially since different apps use
lots of different hashes.

I was thinking about this kind of stuff (ECC plugin for r4) not too long
ago. Hashing the whole file is too slow; if you update a single block, the
whole file has to be read in to recalculate. Adding, say, one sector of crc
for each block would be a lot more feasible.

I think the best way to do this, though, would be to write a virtual blk
driver that works like loopback (i.e., uses a backing file/dev), and shortens
the overall size by one sector * number of blocks. Actually, you could
probably copy the raid5 md code and rewrite it to only use one device. I'd
try that first.

--
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE
^ permalink raw reply [flat|nested] 16+ messages in thread
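The per-block checksum idea in Tom Vier's message above can be sketched roughly as follows. This is an illustration under assumptions, not his proposed block driver: it keeps one 32-bit CRC per block (via Python's `zlib.crc32`) rather than a full sector of CRC, and the 4096-byte block size and function names are made up for the example. The point it shows is that updating one block only requires recomputing that block's checksum.

```python
import zlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """One CRC-32 per block; the last block may be short."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def update_block(data: bytearray, sums: List[int], index: int,
                 new_block: bytes, block_size: int = BLOCK_SIZE) -> None:
    """Overwrite one full block and recompute only that block's checksum."""
    if len(new_block) != block_size:
        raise ValueError("this sketch only rewrites whole blocks")
    start = index * block_size
    data[start:start + block_size] = new_block
    sums[index] = zlib.crc32(new_block)


def find_bad_blocks(data: bytes, sums: List[int],
                    block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indices of blocks whose contents no longer match their CRC."""
    current = block_checksums(data, block_size)
    return [i for i, s in enumerate(current) if s != sums[i]]
```

A mismatch also localizes the damage to one block, which a single whole-file hash cannot do.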
* Re: Plugin for corruption resistance?
2005-02-11 20:53 ` Tom Vier
@ 2005-02-12 5:19 ` David Masover
0 siblings, 0 replies; 16+ messages in thread
From: David Masover @ 2005-02-12 5:19 UTC (permalink / raw)
To: Tom Vier; +Cc: Gregory Maxwell, reiserfs-list

First, let me second the original idea. So long as the hash isn't
updated until that attribute is read, it should be fine.

Tom Vier wrote:
| On Fri, Feb 11, 2005 at 01:58:59PM -0500, Gregory Maxwell wrote:
|
|>1. Lots of applications today (such as P2P sharing systems) need the
|>hashes of files.. it's inefficient to keep recomputing them. The file
|>system always knows when a file changes, so it can be set up to always
|>return the correct hash.
|
|
| That should be done in userland, imho. Especially since different apps use
| lots of different hashes.

For no particularly good reason, imho. md4/md5 are reasonably fast,
sha1 is reasonably secure. What else do you need?

| I was thinking about this kind of stuff (ECC plugin for r4) not too long
| ago. Hashing the whole file is too slow; if you update a single block, the
| whole file has to be read in to recalculate. Adding, say, one sector of crc
| for each block would be a lot more feasible.

That would make sense if it was actually to work like ECC. But this
doesn't sound like it would be checked at every read, but rather when
fsck is run, or when some app needs it. If it is supposed to be checked
at every read, your suggestion below is better.

| I think the best way to do this though, would be to write a virtual blk
| driver that works like loop back (ie, uses a backing file/dev), and shortens
| the overall size by one sector * number of blocks. Actually, you could
| probably copy the raid5 md code and rewrite it to only use one device. I'd
| try that first.

Been done. Think it's part of dev-mapper or evms or some disk magic
like that. But it's also useful for individual files, for application
reasons.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-11 18:58 Plugin for corruption resistance? Gregory Maxwell
2005-02-11 20:39 ` Jake Maciejewski
2005-02-11 20:53 ` Tom Vier
@ 2005-02-13 3:48 ` Esben Stien
2005-02-14 2:01 ` Reiser 4 Apple Michael James
2005-02-14 17:45 ` Plugin for corruption resistance? Hans Reiser
4 siblings, 0 replies; 16+ messages in thread
From: Esben Stien @ 2005-02-13 3:48 UTC (permalink / raw)
To: reiserfs-list

Gregory Maxwell <gmaxwell@gmail.com> writes:

> Anyone ever given a thought to adding support to reiserfs to store a
> cryptographic checksum along with a file?

Yes, I thought about putting it there as an extended attribute.

--
Esben Stien is b0ef@esben-stien.name
http://www.esben-stien.name
irc://irc.esben-stien.name/%23contact
[sip|iax]:b0ef@esben-stien.name
^ permalink raw reply [flat|nested] 16+ messages in thread
* Reiser 4 Apple
2005-02-11 18:58 Plugin for corruption resistance? Gregory Maxwell
` (2 preceding siblings ...)
2005-02-13 3:48 ` Esben Stien
@ 2005-02-14 2:01 ` Michael James
2005-02-14 18:49 ` Hans Reiser
2005-02-14 17:45 ` Plugin for corruption resistance? Hans Reiser
4 siblings, 1 reply; 16+ messages in thread
From: Michael James @ 2005-02-14 2:01 UTC (permalink / raw)
To: reiserfs-list

Dear Hans,

Have you ever thought of porting reiser4 to BSD?

Apple have:
  Bags of money
  A current filesystem that totally sucks
  An OS that cries out for plugins to satisfy its quirks

Could be a good marriage,
michaelj

--
Michael James                   michael.james@csiro.au
System Administrator            voice: 02 6246 5040
CSIRO Bioinformatics Facility   fax: 02 6246 5166
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Reiser 4 Apple
2005-02-14 2:01 ` Reiser 4 Apple Michael James
@ 2005-02-14 18:49 ` Hans Reiser
0 siblings, 0 replies; 16+ messages in thread
From: Hans Reiser @ 2005-02-14 18:49 UTC (permalink / raw)
To: Michael James; +Cc: reiserfs-list

Michael James wrote:

>Dear Hans,
>
>Have you ever thought of porting reiser4 to BSD?
>
>Apple have:
> Bags of money
> A current filesystem that totally sucks
> An OS that cries out for plugins to satisfy its quirks
>
>Could be a good marriage,
>michaelj
>
>
>
Please convince them to pay for it and I will....
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-11 18:58 Plugin for corruption resistance? Gregory Maxwell
` (3 preceding siblings ...)
2005-02-14 2:01 ` Reiser 4 Apple Michael James
@ 2005-02-14 17:45 ` Hans Reiser
2005-02-15 20:42 ` Adam
4 siblings, 1 reply; 16+ messages in thread
From: Hans Reiser @ 2005-02-14 17:45 UTC (permalink / raw)
To: Gregory Maxwell; +Cc: reiserfs-list

It's on the legitimate wish list; if someone wants to code it, let me know.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-14 17:45 ` Plugin for corruption resistance? Hans Reiser
@ 2005-02-15 20:42 ` Adam
2005-02-17 4:10 ` David Masover
0 siblings, 1 reply; 16+ messages in thread
From: Adam @ 2005-02-15 20:42 UTC (permalink / raw)
To: reiserfs-list

Hans Reiser <reiser <at> namesys.com> writes:

>
> It's on the legitimate wish list; if someone wants to code it, let me know.
>
>

Hans, does this mean that you think that this type of functionality
should be implemented as a Reiser4 plugin and therefore in kernelspace?
Why wouldn't this be better implemented in userspace via a daemon that
is notified of file modification via dnotify/inotify?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-15 20:42 ` Adam
@ 2005-02-17 4:10 ` David Masover
2005-02-17 10:53 ` Christian Iversen
0 siblings, 1 reply; 16+ messages in thread
From: David Masover @ 2005-02-17 4:10 UTC (permalink / raw)
To: Adam; +Cc: reiserfs-list

Adam wrote:
| Hans Reiser <reiser <at> namesys.com> writes:
|
|
|>It's on the legitimate wish list; if someone wants to code it, let me know.
|>
|>
|
|
| Hans, does this mean that you think that this type of functionality should be
| implemented as a Reiser4 plugin and therefore in kernelspace? Why wouldn't this
| be better implemented in userspace via a daemon that is notified of file
| modification via dnotify/inotify?

Because dnotify/inotify don't scale. I don't think they lock on event,
either, whereas a plugin could guarantee that the hash was up-to-date
(no race conditions).

We've been over this before. There's a reason reiser4 and its plugins
are in kernel space, and not in something like Fuse.

===============================
Disclaimer: I don't work here.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-17 4:10 ` David Masover
@ 2005-02-17 10:53 ` Christian Iversen
2005-02-18 3:43 ` David Masover
0 siblings, 1 reply; 16+ messages in thread
From: Christian Iversen @ 2005-02-17 10:53 UTC (permalink / raw)
To: reiserfs-list

On Thursday 17 February 2005 05:10, David Masover wrote:
> Adam wrote:
> | Hans Reiser <reiser <at> namesys.com> writes:
> |>It's on the legitimate wish list; if someone wants to code it, let me
> |> know.
> |
> | Hans, does this mean that you think that this type of functionality
> | should be implemented as a Reiser4 plugin and therefore in kernelspace?
> | Why wouldn't this be better implemented in userspace via a daemon that
> | is notified of file modification via dnotify/inotify?
>
> Because dnotify/inotify don't scale. I don't think they lock on event,
> either, whereas a plugin could guarantee that the hash was up-to-date
> (no race conditions).
>
> We've been over this before. There's a reason reiser4 and its plugins
> are in kernel space, and not in something like Fuse.

And, surely, updating a hash value when 1 byte changes in a gigabyte file
would be much faster from kernel space, where you can actually see the
changeset?

--
Regards,
Christian Iversen
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-17 10:53 ` Christian Iversen
@ 2005-02-18 3:43 ` David Masover
2005-02-18 4:28 ` Valdis.Kletnieks
0 siblings, 1 reply; 16+ messages in thread
From: David Masover @ 2005-02-18 3:43 UTC (permalink / raw)
To: Christian Iversen; +Cc: reiserfs-list

Christian Iversen wrote:
| On Thursday 17 February 2005 05:10, David Masover wrote:
|
|>Adam wrote:
|>| Hans Reiser <reiser <at> namesys.com> writes:
|>|>It's on the legitimate wish list; if someone wants to code it, let me
|>|> know.
|>|
|>| Hans, does this mean that you think that this type of functionality
|>| should be implemented as a Reiser4 plugin and therefore in kernelspace?
|>| Why wouldn't this be better implemented in userspace via a daemon that
|>| is notified of file modification via dnotify/inotify?
|>
|>Because dnotify/inotify don't scale. I don't think they lock on event,
|>either, whereas a plugin could guarantee that the hash was up-to-date
|>(no race conditions).
|>
|>We've been over this before. There's a reason reiser4 and its plugins
|>are in kernel space, and not in something like Fuse.
|
|
| And, surely, updating a hash value when 1 byte changes in a gigabyte file,
| would be much faster from kernel space where you can actually see the
| changeset?

Wouldn't it be sane to just export the changeset to userland?

This way is easier, though. But I was thinking about accessing the
file. I don't know of any hashes that can be easily updated from part
of the file, unless you're hashing only pieces of the file in the first
place, but it'd be nice to not bother hashing at all until the hash is
needed, especially if we are hashing the whole file.

Plus, there's the race condition thing. Definitely a lot of reasons to
put more stuff in the kernel.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-18 3:43 ` David Masover
@ 2005-02-18 4:28 ` Valdis.Kletnieks
2005-02-18 13:36 ` Gregory Maxwell
0 siblings, 1 reply; 16+ messages in thread
From: Valdis.Kletnieks @ 2005-02-18 4:28 UTC (permalink / raw)
To: David Masover; +Cc: Christian Iversen, reiserfs-list

On Thu, 17 Feb 2005 21:43:08 CST, David Masover said:

> This way is easier, though. But I was thinking about accessing the
> file. I don't know of any hashes that can be easily updated from part
> of the file, unless you're hashing only pieces of the file in the first
> place, but it'd be nice to not bother hashing at all until the hash is
> needed, especially if we are hashing the whole file.

There's plenty of CRC functions that are quite easily set up for an
incremental update (see RFCs 1141 and 1624 on how to do it for the CRC
function used for Internet IP packets). You'd of course not want to use
that CRC-16, but the same basic principle applies to other CRC functions.

The problem is that most CRC functions aren't very much good at detecting
multi-bit errors, and when you're talking about hundreds of gigabytes of
disk on a modern RAID, the CRC functions are hardly bulletproof.

On the flip side, hash functions like MD5 or the SHA family are fairly
bulletproof, but are essentially impossible to develop an incremental
update for (if there existed a fast incremental update for the hash
function, that would imply a very low preimage resistance, rendering it
useless as a cryptographic hash).

Also, there's another issue - unlike standard ECC codes that can actually
*fix* the problem (for at least small number of bit errors), it's unclear
what you should do if you find a mismatch between the hash of a block and
the block contents, as you don't know whether it's the actual data or the
hash that's corrupted....
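For reference, the incremental update that RFC 1624 describes can be sketched as below. One caveat on terminology: the checksum those RFCs cover is the 16-bit ones' complement Internet checksum rather than a polynomial CRC, though the point about updating without rereading everything is the same. The function names are invented for this sketch; the update rule is RFC 1624's equation 3, HC' = ~(~HC + ~m + m'), where m is the old 16-bit word and m' the new one.

```python
def ones_complement_sum(words):
    """16-bit ones' complement sum with end-around carry."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """Full computation: complement of the ones' complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def incremental_update(old_checksum, old_word, new_word):
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), touching only the changed word."""
    total = ones_complement_sum(
        [~old_checksum & 0xFFFF, ~old_word & 0xFFFF, new_word]
    )
    return ~total & 0xFFFF
```

The incremental result matches a full recomputation whenever the updated data is not entirely zero words; in that degenerate case the two differ only in which ones' complement representation of zero they produce.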
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-18 4:28 ` Valdis.Kletnieks
@ 2005-02-18 13:36 ` Gregory Maxwell
2005-02-18 22:09 ` Valdis.Kletnieks
0 siblings, 1 reply; 16+ messages in thread
From: Gregory Maxwell @ 2005-02-18 13:36 UTC (permalink / raw)
To: Valdis.Kletnieks@vt.edu; +Cc: reiserfs-list

On Thu, 17 Feb 2005 23:28:09 -0500, Valdis.Kletnieks@vt.edu
<Valdis.Kletnieks@vt.edu> wrote:
> On the flip side, hash functions like MD5 or the SHA family are fairly bulletproof,
> but are essentially impossible to develop an incremental update for (if there
> existed a fast incremental update for the hash function, that would imply a
> very low preimage resistance, rendering it useless as a cryptographic hash).

Tree hashes.

Divide the file into blocks of N bytes. Compute size/N hashes.
Group hashes into pairs. Compute N/2 N' hashes, this is fast because
hashes are small. Group N' hashes into pairs, compute N'/2 N'' hashes,
etc.. Reduce to a single hash.

A number of useful tradeoffs are possible: By enlarging N you improve
the strength along various cryptographic dimensions. By changing the
fanout, and deciding how many N you store, which N you store, which N'
you store, etc., you decide how easy it is to update the hash and you
decide what the smallest increment you can test is... you trade off
storage (and a little computation) for this ease.

> Also, there's another issue - unlike standard ECC codes that can actually *fix*
> the problem (for at least small number of bit errors), it's unclear what you should
> do if you find a mismatch between the hash of a block and the block contents, as
> you don't know whether it's the actual data or the hash that's corrupted....

In my initial suggestion I offered that hashes could be verified by a
userspace daemon, or by fsck (since it's an expensive operation)...
Such policy could be controlled in the daemon.

In most cases I'd like it to make the file inaccessible until I go and
fix it by hand.
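The tree-hash construction described earlier in this message might look roughly like the sketch below. It is an illustration under assumptions, not a proposed on-disk format: SHA-1 is used because the thread mentions it, the 1024-byte block size and the function names are made up, every level of the tree is kept in memory, and an unpaired last node is simply hashed on its own. With size/N leaf hashes, each level above has about half as many nodes, and changing one block rehashes only the nodes on its path to the root.

```python
import hashlib
from typing import List

BLOCK_SIZE = 1024  # "N" in the message; the value is an assumption


def _h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def build_tree(data: bytes, block_size: int = BLOCK_SIZE) -> List[List[bytes]]:
    """Return every level: levels[0] holds leaf hashes, levels[-1] == [root]."""
    leaves = [_h(data[i:i + block_size])
              for i in range(0, len(data), block_size)] or [_h(b"")]
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Hash adjacent pairs; an unpaired last node is hashed alone.
        levels.append([_h(b"".join(prev[i:i + 2]))
                       for i in range(0, len(prev), 2)])
    return levels


def update_block(levels: List[List[bytes]], index: int,
                 new_block: bytes) -> bytes:
    """Rehash one leaf and only the nodes on its path to the root."""
    levels[0][index] = _h(new_block)
    for depth in range(1, len(levels)):
        index //= 2
        prev = levels[depth - 1]
        levels[depth][index] = _h(b"".join(prev[2 * index:2 * index + 2]))
    return levels[-1][0]
```

Storing every level is the most storage-hungry choice among the tradeoffs mentioned above; keeping fewer levels saves space at the cost of more rehashing per update. A hardened design would also tag leaf and interior hashes differently, which this sketch omits.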

It would also be useful to have the checker daemon watch the logs (or
receive notifications through some kernel interface)... and have any
block level errors (or smartd errors) backprojected up (through raid and
lvm remappings) to the file system level ... After identifying the
potentially corrupted file, it could then test the file. If the file has
been corrupted, the configured action is taken.

If this policy is in userspace, the level of action sophistication could
be very high: for example, if I was on a distribution with package
management, and the file was outside of /home, and the package flags
didn't indicate it was a config file.. then go fetch the package, and
replace the file and send me an email so I don't forget how wonderful my
OS is. :)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-18 13:36 ` Gregory Maxwell
@ 2005-02-18 22:09 ` Valdis.Kletnieks
2005-02-19 3:28 ` Gregory Maxwell
0 siblings, 1 reply; 16+ messages in thread
From: Valdis.Kletnieks @ 2005-02-18 22:09 UTC (permalink / raw)
To: Gregory Maxwell; +Cc: reiserfs-list

On Fri, 18 Feb 2005 08:36:51 EST, Gregory Maxwell said:

> Tree hashes.
> Divide the file into blocks of N bytes. Compute size/N hashes.
> Group hashes into pairs. Compute N/2 N' hashes, this is fast because
> hashes are small. Group N' hashes into pairs compute N'/2 N'' hashes
> etc.. Reduce to a single hash.

You get massively I/O bound real fast this way. You may want to
re-evaluate whether this *really* buys you anything, especially if
you're not using some sort of guarantee that you know what's actually
b0rked...

> In my initial suggestion I offered that hashes could be verified by a
> userspace daemon, or by fsck (since it's an expensive operation)...
> Such policy could be controlled in the daemon.
> In most cases I'd like it to make the file inaccessible until I go and
> fix it by hand.

You're still missing the point that in general, you don't have a way to
tell whether the block the file lived in went bad, or the block the hash
lived in went bad.

Sure, if the file *happens* to be ascii text, you can use Wetware 1.5 to
scan the file and tell which one went bad. However, you'll need Wetware
2.0 to do the same for your multi-gigabyte Oracle database... :)

(And yes, I *have* seen cases where Tripwire went completely and totally
bananas and claimed zillions of files were corrupted, when the *real*
problem was that the Tripwire database itself had gotten stomped on - so
it's *not* a purely theoretical issue....)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Plugin for corruption resistance?
2005-02-18 22:09 ` Valdis.Kletnieks
@ 2005-02-19 3:28 ` Gregory Maxwell
0 siblings, 0 replies; 16+ messages in thread
From: Gregory Maxwell @ 2005-02-19 3:28 UTC (permalink / raw)
To: Valdis.Kletnieks@vt.edu; +Cc: reiserfs-list

On Fri, 18 Feb 2005 17:09:00 -0500, Valdis.Kletnieks@vt.edu
<Valdis.Kletnieks@vt.edu> wrote:
> On Fri, 18 Feb 2005 08:36:51 EST, Gregory Maxwell said:
>
> > Tree hashes.
> > Divide the file into blocks of N bytes. Compute size/N hashes.
> > Group hashes into pairs. Compute N/2 N' hashes, this is fast because
> > hashes are small. Group N' hashes into pairs compute N'/2 N'' hashes
> > etc.. Reduce to a single hash.
>
> You get massively I/O bound real fast this way. You may want to re-evaluate
> whether this *really* buys you anything, especially if you're not using some
> sort of guarantee that you know what's actually b0rked...

I brought up tree hashes because someone pointed out there was no way
to incrementally update a normal hash. Tree hashes can easily be
incrementally updated if you keep all the sub parts. I don't think that
would suddenly make it useful for frequently updated files.

> > In my initial suggestion I offered that hashes could be verified by a
> > userspace daemon, or by fsck (since it's an expensive operation)...
> > Such policy could be controlled in the daemon.
> > In most cases I'd like it to make the file inaccessible until I go and
> > fix it by hand.
>
> You're still missing the point that in general, you don't have a way to tell whether
> the block the file lived in went bad, or the block the hash lived in went bad.

I'm not missing the point. Compare the number of disk blocks a file
takes vs the hash. Compare the ease of atomically updating the file
data vs atomically updating the hash. If they don't match, it is far
more likely that the file has been silently corrupted than that the
hash has been..

In either case, something seriously wrong has happened (i.e. that *any*
data has been corrupted without triggering alarms elsewhere). Wetware
will be required to figure out what is going on. Perhaps correct a
serious problem before it eats the whole file system...

Automagic correction of stuff that is automagically correctable is
useful in that it might prevent something worse from happening... For
example, if the corrupted file was /sbin/init.. regardless of the cause
of the problem I'd be glad if the system took some action while the
wetware was in an uninterruptible sleep. ;)

> Sure, if the file *happens* to be ascii text, you can use Wetware 1.5 to scan
> the file and tell which one went bad. However, you'll need Wetware 2.0 to
> do the same for your multi-gigabyte Oracle database... :)

Such a proposed system would likely not be all that useful on a live
database.. the overhead of computing hashes would likely be too great..
Rather, it would be useful if the database system used its knowledge
of how data was stored to do this efficiently.

If the database system were written with reiserfs in mind, and rather
than using a couple of big opaque files it stored its data in tens of
thousands of files... then perhaps such a hashing scheme might actually
work out okay.

> (And yes, I *have* seen cases where Tripwire went completely and totally bananas
> and claimed zillions of files were corrupted, when the *real* problem was that
> the Tripwire database itself had gotten stomped on - so it's *not* a purely
> theoretical issue....

The discussion is to store the hash in the file metadata. ... If that
is getting stomped on, it's a *good* thing if the system goes totally
bananas.

In a great many situations I'd rather lose a file completely than have
some random bytes in it silently corrupted. (And of course, attaching
hashes doesn't mean you lose the file... it means it gets brought to
your attention.)

As things stand today, there are hundreds of ways a system could end up
with files getting silently corrupted. Many of them would be fairly
difficult to detect until it's far too late (to recover cleanly or even
detect the root cause).

Right now most distros have a package management system that can detect
changes in some system files, which is useful against a small subset of
these problems, but not most, since it will only detect problems in
files that almost never change.

The proposed system of attaching hashes in metadata would protect all
files that are not constantly updated (so that counts out databases and
single-file mailboxes), but could protect most everything else. .. And
the things that can't be protected could be, with changes to their
operation that would be useful to make for reiserfs for other reasons
(there is no performance reason in reiserfs to make a mailbox a single
file, for example).

Furthermore, attached hashes could greatly speed up applications using
hashes in a way that no userspace solution can: Userspace solutions
can't maintain a cache of the file's hashes because they have no way to
be *sure* that the file wasn't monkeyed with while they weren't
watching... so caches are useless for p2p apps or for security
checking.. (and useless for verifying that the system isn't silently
corrupting data, except for completely static files).

If the integrity of the hash is ensured by the file system, then your
trust of the hash should be equal to your trust of the kernel, which is
the same level of trust you have in read(); thus you should be able to
use the stored hash in any place where you'd read the file and compute
the hash itself.

I agree that there are applications for additional realtime block level
protection which can't be provided by hashes-as-metadata. These would be
better addressed via device-mapper... We don't see them because it's
hard to keep them from becoming useless due to an overlap with the
disk's underlying protection. (Because all modern disks have ECC, we
tend to lose entire physical blocks at a time. Since we can't access the
underlying correction data in a useful way we can't use it in
correction... we might be duping it entirely, and worse, since a block
level ecc or CRC scheme would change the size of a disk block, we'd end
up with all blocks taking multiple disk blocks... Even ignoring the
potential performance and atomicity issues, this would greatly increase
the impact of block level corruption: you'd always lose two blocks!)

Raid and disk ECC address low level corruption. *Some* applications do
testing to catch higher level corruption, but the vast majority don't,
simply because it's not the application's primary duty to make sure its
host isn't broken.
^ permalink raw reply [flat|nested] 16+ messages in thread