* Data Integrity Check on EXT Family of Filesystems
From: Andrew Martin @ 2013-09-23 21:08 UTC
To: linux-fsdevel

Hello,

I am considering writing a tool to perform data integrity checking on filesystems
which do not support it internally (e.g. ext4). When storing long-term backups,
I would like to be able to detect bit rot or other corruption to ensure that I
have intact backups. The method I am considering is to recreate the directory
structure of the backup directory in a "shadow" directory tree, and then hash
each of the files in the backup directory and store the hash in the same filename
in the shadow directory. Then, months later, I can traverse the backup directory,
taking a hash of each file again and comparing it with the hash stored in the
shadow directory tree. If the hashes match, then the file's integrity has been
verified (or at least has not degraded since the shadow directory was created).

Does this seem like a reasonable approach for checking data integrity, or is there
an existing tool or different method which would be better?

Thanks,

Andrew Martin
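For illustration, the shadow-tree scheme described in the message above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tool anyone in the thread wrote: SHA-256 is an arbitrary choice of hash, and the names `file_sha256`, `build_shadow` and `verify_shadow` are hypothetical.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_shadow(backup_root, shadow_root):
    """Mirror backup_root into shadow_root, storing one digest per file."""
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        rel_dir = os.path.relpath(dirpath, backup_root)
        target_dir = os.path.normpath(os.path.join(shadow_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # The shadow file has the same name and holds only the digest.
            with open(os.path.join(target_dir, name), "w") as out:
                out.write(file_sha256(os.path.join(dirpath, name)) + "\n")


def verify_shadow(backup_root, shadow_root):
    """Return relative paths whose digest changed or was never recorded."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        rel_dir = os.path.relpath(dirpath, backup_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                with open(os.path.join(shadow_root, rel_path)) as stored:
                    expected = stored.read().strip()
            except FileNotFoundError:
                failures.append(rel_path)
                continue
            if file_sha256(os.path.join(dirpath, name)) != expected:
                failures.append(rel_path)
    return failures
```

The sketch assumes the shadow tree lives outside the backup tree. Because verification walks the backup directory, as the message describes, a file that has disappeared from the backup entirely would not be reported; walking the shadow tree instead would catch that case.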
* Re: Data Integrity Check on EXT Family of Filesystems
From: Mike Fleetwood @ 2013-09-23 22:00 UTC
To: Andrew Martin; +Cc: linux-fsdevel

On 23 September 2013 22:08, Andrew Martin <amartin@xes-inc.com> wrote:
>
> Hello,
>
> I am considering writing a tool to perform data integrity checking on filesystems
> which do not support it internally (e.g. ext4). When storing long-term backups,
> I would like to be able to detect bit rot or other corruption to ensure that I
> have intact backups. The method I am considering is to recreate the directory
> structure of the backup directory in a "shadow" directory tree, and then hash
> each of the files in the backup directory and store the hash in the same filename
> in the shadow directory. Then, months later, I can traverse the backup directory,
> taking a hash of each file again and comparing it with the hash stored in the
> shadow directory tree. If the hashes match, then the file's integrity has been
> verified (or at least has not degraded since the shadow directory was created).
>
> Does this seem like a reasonable approach for checking data integrity, or is there
> an existing tool or different method which would be better?
>
> Thanks,
>
> Andrew Martin

Here's a couple of integrity checking tools to consider:
tripwire - http://sourceforge.net/projects/tripwire/
aide - http://aide.sourceforge.net/

I don't use them myself; I'm just providing options.

Thanks,
Mike
* Re: Data Integrity Check on EXT Family of Filesystems
From: Theodore Ts'o @ 2013-09-24 14:57 UTC
To: Mike Fleetwood; +Cc: Andrew Martin, linux-fsdevel

On Mon, Sep 23, 2013 at 11:00:09PM +0100, Mike Fleetwood wrote:
>
> Here's a couple of integrity checking tools to consider:
> tripwire - http://sourceforge.net/projects/tripwire/
> aide - http://aide.sourceforge.net/

I use cfv myself to create checksum files. Note that using a CRC
checksum is plenty if you are just worried about random corruption
caused by hardware errors (e.g., memory bitflips, hard drive hiccups).

Using a cryptographic checksum ala tripwire is useful if you are
worried about malicious attackers trying to modify files without your
noticing. Cryptographic checksums do take more CPU time, though.

Cheers,

- Ted
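To make the distinction in the message above concrete, here is a small Python sketch comparing a CRC with a cryptographic hash on a single flipped bit. It is illustrative only: CRC-32 (via `zlib.crc32`) and SHA-256 are assumed stand-ins for "a CRC checksum" and "a cryptographic checksum", and the helper names are hypothetical.

```python
import hashlib
import zlib


def crc32_hex(data):
    """CRC-32 of data as 8 hex digits (a non-cryptographic checksum)."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def sha256_hex(data):
    """SHA-256 of data as 64 hex digits (a cryptographic hash)."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data, bit_index):
    """Return a copy of data with a single bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


original = b"backup payload " * 1024
damaged = flip_bit(original, 12345)  # simulate one bit of random corruption

# Both checks notice the single flipped bit.
print(crc32_hex(original) != crc32_hex(damaged))    # True
print(sha256_hex(original) != sha256_hex(damaged))  # True
```

Both values change when one bit is flipped, which is the random-corruption case. The difference only matters against a deliberate attacker: a CRC can be made to match after a chosen modification, whereas doing the same for a cryptographic hash is believed to be infeasible.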
* Re: Data Integrity Check on EXT Family of Filesystems
From: Zach Brown @ 2013-09-24 17:31 UTC
To: Theodore Ts'o; +Cc: Mike Fleetwood, Andrew Martin, linux-fsdevel

On Tue, Sep 24, 2013 at 10:57:24AM -0400, Theodore Ts'o wrote:
> On Mon, Sep 23, 2013 at 11:00:09PM +0100, Mike Fleetwood wrote:
> >
> > Here's a couple of integrity checking tools to consider:
> > tripwire - http://sourceforge.net/projects/tripwire/
> > aide - http://aide.sourceforge.net/
>
> I use cfv myself to create checksum files. Note that using a CRC
> checksum is plenty if you are just worried about random corruption
> caused by hardware errors (e.g., memory bitflips, hard drive hiccups).

And if one truly wanted hashes instead of checksums, it might not take
more than some light scripting around find and shasum -c.

KISS, and all that.

- z
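As a rough Python equivalent of the "light scripting" idea in the message above, the sketch below records one digest per file in a single manifest and later re-checks it. The function names are hypothetical, SHA-256 is an assumed choice, and the line layout (digest, two spaces, relative path) is only meant to resemble the text-mode output of the sha256sum/shasum family; compatibility with `shasum -c` is not tested here.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write '<digest>  <relative path>' lines for every file under root."""
    with open(manifest_path, "w") as manifest:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                manifest.write(f"{sha256_of(full)}  {rel}\n")


def check_manifest(root, manifest_path):
    """Return (path, status) pairs; status is 'FAILED' or 'MISSING'."""
    problems = []
    with open(manifest_path) as manifest:
        for line in manifest:
            expected, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.isfile(full):
                problems.append((rel, "MISSING"))
            elif sha256_of(full) != expected:
                problems.append((rel, "FAILED"))
    return problems
```

The manifest is assumed to be stored outside `root`, so that it is not hashed while it is being written. File names containing newlines are not handled in this sketch.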
* Re: Data Integrity Check on EXT Family of Filesystems
From: Andrew Martin @ 2013-09-26 19:22 UTC
To: Theodore Ts'o; +Cc: linux-fsdevel, Mike Fleetwood

----- Original Message -----
> From: "Theodore Ts'o" <tytso@mit.edu>
> To: "Mike Fleetwood" <mike.fleetwood@googlemail.com>
> Cc: "Andrew Martin" <amartin@xes-inc.com>, linux-fsdevel@vger.kernel.org
> Sent: Tuesday, September 24, 2013 9:57:24 AM
> Subject: Re: Data Integrity Check on EXT Family of Filesystems
>
> On Mon, Sep 23, 2013 at 11:00:09PM +0100, Mike Fleetwood wrote:
> >
> > Here's a couple of integrity checking tools to consider:
> > tripwire - http://sourceforge.net/projects/tripwire/
> > aide - http://aide.sourceforge.net/
>
> I use cfv myself to create checksum files. Note that using a CRC
> checksum is plenty if you are just worried about random corruption
> caused by hardware errors (e.g., memory bitflips, hard drive
> hiccups).
>
> Using a cryptographic checksum ala tripwire is useful if you are
> worried about malicious attackers trying to modify files without your
> noticing. Cryptographic checksums do take more CPU time, though.
>
> Cheers,
>
> - Ted
>

Ted,

Thanks for the suggestion about cfv - it looks like it should work well for
efficiently checking large trees and detecting random corruption. I like that
the -r option stores the checksums recursively in each directory; that way you
can also choose to only check a particular subtree.

Mike, thanks for the other suggestions as well. I've used tripwire before but
had not considered using it in this context due to its focus on security and
intrusion detection.

Andrew
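The per-directory layout mentioned in the message above, where each directory carries its own checksum file so that a single subtree can be checked on its own, can be sketched in Python as follows. This is not cfv and does not claim to reproduce its file format or options; the file name `checksums.crc`, the line layout and the function names are all hypothetical, and CRC-32 is assumed as the checksum.

```python
import os
import zlib

CHECKSUM_FILE = "checksums.crc"  # hypothetical name, not one used by cfv


def crc32_of(path):
    """Return the CRC-32 of a file as 8 hex digits, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")


def write_checksums(root):
    """Write a checksum file in every directory under root.

    Each file covers only the regular files directly in that directory.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        names = sorted(n for n in filenames if n != CHECKSUM_FILE)
        with open(os.path.join(dirpath, CHECKSUM_FILE), "w") as out:
            for name in names:
                out.write(f"{crc32_of(os.path.join(dirpath, name))} {name}\n")


def check_subtree(subtree):
    """Verify only directories at or below subtree; return bad paths."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(subtree):
        if CHECKSUM_FILE not in filenames:
            continue
        with open(os.path.join(dirpath, CHECKSUM_FILE)) as listing:
            for line in listing:
                expected, name = line.rstrip("\n").split(" ", 1)
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path) or crc32_of(path) != expected:
                    bad.append(path)
    return bad
```

Calling `check_subtree` on a subdirectory reads only the checksum files at or below it, which is the property the message likes. Unlike a separate shadow tree, this layout writes into the backup directories themselves.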