linux-fsdevel.vger.kernel.org archive mirror
* Data Integrity Check on EXT Family of Filesystems
       [not found] <5856b37a-3b66-4404-a6f7-3c120b14ae95@zimbra>
@ 2013-09-23 21:08 ` Andrew Martin
  2013-09-23 22:00   ` Mike Fleetwood
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Martin @ 2013-09-23 21:08 UTC
  To: linux-fsdevel

Hello,

I am considering writing a tool to perform data integrity checking on filesystems
which do not support it internally (e.g. ext4). When storing long-term backups, 
I would like to be able to detect bit rot or other corruption to ensure that I 
have intact backups. The method I am considering is to recreate the directory 
structure of the backup directory in a "shadow" directory tree, and then hash 
each of the files in the backup directory and store the hash in the same filename
in the shadow directory. Then, months later, I can traverse the backup directory,
taking a hash of each file again and comparing it with the hash stored in the 
shadow directory tree. If the hashes match, then the file's integrity has been
verified (or at least has not degraded since the shadow directory was created).
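A minimal sketch of the shadow-tree scheme described above, in Python (the thread does not name a language). It assumes SHA-256 as the hash; the function names `file_digest`, `build_shadow`, and `verify_shadow` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_shadow(backup, shadow):
    """Mirror the backup tree, storing each file's digest under the same relative name."""
    backup, shadow = Path(backup), Path(shadow)
    for src in backup.rglob("*"):
        if src.is_file():
            dst = shadow / src.relative_to(backup)
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_text(file_digest(src) + "\n")


def verify_shadow(backup, shadow):
    """Return relative paths that are missing or whose digest no longer matches."""
    backup, shadow = Path(backup), Path(shadow)
    bad = []
    for ref in shadow.rglob("*"):
        if ref.is_file():
            rel = ref.relative_to(shadow)
            src = backup / rel
            if not src.is_file() or file_digest(src) != ref.read_text().strip():
                bad.append(rel.as_posix())
    return sorted(bad)
```

Note that the shadow tree should live outside the backup directory, and that a mismatch only shows that the file and its stored hash disagree; the shadow tree itself can also suffer corruption.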

Does this seem like a reasonable approach for checking data integrity, or is there
an existing tool or different method which would be better?

Thanks,

Andrew Martin


* Re: Data Integrity Check on EXT Family of Filesystems
  2013-09-23 21:08 ` Data Integrity Check on EXT Family of Filesystems Andrew Martin
@ 2013-09-23 22:00   ` Mike Fleetwood
  2013-09-24 14:57     ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Fleetwood @ 2013-09-23 22:00 UTC
  To: Andrew Martin; +Cc: linux-fsdevel

On 23 September 2013 22:08, Andrew Martin <amartin@xes-inc.com> wrote:
>
> Hello,
>
> I am considering writing a tool to perform data integrity checking on filesystems
> which do not support it internally (e.g. ext4). When storing long-term backups,
> I would like to be able to detect bit rot or other corruption to ensure that I
> have intact backups. The method I am considering is to recreate the directory
> structure of the backup directory in a "shadow" directory tree, and then hash
> each of the files in the backup directory and store the hash in the same filename
> in the shadow directory. Then, months later, I can traverse the backup directory,
> taking a hash of each file again and comparing it with the hash stored in the
> shadow directory tree. If the hashes match, then the file's integrity has been
> verified (or at least has not degraded since the shadow directory was created).
>
> Does this seem like a reasonable approach for checking data integrity, or is there
> an existing tool or different method which would be better?
>
> Thanks,
>
> Andrew Martin

Here's a couple of integrity checking tools to consider:
tripwire - http://sourceforge.net/projects/tripwire/
aide - http://aide.sourceforge.net/

I don't use them myself; just providing options.

Thanks,
Mike





* Re: Data Integrity Check on EXT Family of Filesystems
  2013-09-23 22:00   ` Mike Fleetwood
@ 2013-09-24 14:57     ` Theodore Ts'o
  2013-09-24 17:31       ` Zach Brown
  2013-09-26 19:22       ` Andrew Martin
  0 siblings, 2 replies; 5+ messages in thread
From: Theodore Ts'o @ 2013-09-24 14:57 UTC
  To: Mike Fleetwood; +Cc: Andrew Martin, linux-fsdevel

On Mon, Sep 23, 2013 at 11:00:09PM +0100, Mike Fleetwood wrote:
> 
> Here's a couple of integrity checking tools to consider:
> tripwire - http://sourceforge.net/projects/tripwire/
> aide - http://aide.sourceforge.net/

I use cfv myself to create checksum files.  Note that using a CRC
checksum is plenty if you are just worried about random corruption
caused by hardware errors (e.g., memory bitflips, hard drive hiccups).

Using a cryptographic checksum à la tripwire is useful if you are
worried about malicious attackers trying to modify files without your
noticing.  Cryptographic checksums do take more CPU time, though.
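To make the distinction concrete, here is a small Python sketch that computes both kinds of checksum for a file. CRC-32 (via `zlib`) and SHA-256 (via `hashlib`) are stand-ins chosen for illustration; the message above does not say which algorithms are meant, and the function names are hypothetical.

```python
import hashlib
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """CRC-32 of a file as 8 hex digits: cheap, aimed at accidental corruption."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file: costlier, but also resists deliberate tampering."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Both values change when a single bit of the file flips. The difference is that an attacker can deliberately construct a modified file with the same CRC-32, which is not practical with SHA-256.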

Cheers,

					- Ted


* Re: Data Integrity Check on EXT Family of Filesystems
  2013-09-24 14:57     ` Theodore Ts'o
@ 2013-09-24 17:31       ` Zach Brown
  2013-09-26 19:22       ` Andrew Martin
  1 sibling, 0 replies; 5+ messages in thread
From: Zach Brown @ 2013-09-24 17:31 UTC
  To: Theodore Ts'o; +Cc: Mike Fleetwood, Andrew Martin, linux-fsdevel

On Tue, Sep 24, 2013 at 10:57:24AM -0400, Theodore Ts'o wrote:
> On Mon, Sep 23, 2013 at 11:00:09PM +0100, Mike Fleetwood wrote:
> > 
> > Here's a couple of integrity checking tools to consider:
> > tripwire - http://sourceforge.net/projects/tripwire/
> > aide - http://aide.sourceforge.net/
> 
> I use cfv myself to create checksum files.  Note that using a CRC
> checksum is plenty if you are just worried about random corruption
> caused by hardware errors (e.g., memory bitflips, hard drive hiccups).

And if one truly wanted hashes instead of checksums, it might not take
more than some light scripting around find and shasum -c.  KISS, and all
that.
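As a rough illustration of how little code that takes, here is a Python sketch of the same idea (Python rather than a shell pipeline, purely as an assumption for this example). It walks a tree and writes one "digest, two spaces, relative path" line per file, which is the line layout the sha256sum/shasum tools use, then re-checks it later. SHA-256 and the names `write_manifest` and `check_manifest` are illustrative choices, not something specified above.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest_path):
    """Walk root and record one 'digest  relative/path' line per regular file."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                rel = os.path.relpath(full, root)
                lines.append(f"{sha256_file(full)}  {rel}\n")
    with open(manifest_path, "w") as f:
        f.writelines(lines)


def check_manifest(root, manifest_path):
    """Re-hash every listed file; return paths that are missing or changed."""
    failed = []
    with open(manifest_path) as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.isfile(full) or sha256_file(full) != digest:
                failed.append(rel)
    return failed
```

The manifest is best kept outside the tree being hashed. This sketch does not handle file names containing newlines, which the command-line tools escape specially.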

- z


* Re: Data Integrity Check on EXT Family of Filesystems
  2013-09-24 14:57     ` Theodore Ts'o
  2013-09-24 17:31       ` Zach Brown
@ 2013-09-26 19:22       ` Andrew Martin
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Martin @ 2013-09-26 19:22 UTC
  To: Theodore Ts'o; +Cc: linux-fsdevel, Mike Fleetwood

----- Original Message -----
> From: "Theodore Ts'o" <tytso@mit.edu>
> To: "Mike Fleetwood" <mike.fleetwood@googlemail.com>
> Cc: "Andrew Martin" <amartin@xes-inc.com>, linux-fsdevel@vger.kernel.org
> Sent: Tuesday, September 24, 2013 9:57:24 AM
> Subject: Re: Data Integrity Check on EXT Family of Filesystems
> 
> On Mon, Sep 23, 2013 at 11:00:09PM +0100, Mike Fleetwood wrote:
> > 
> > Here's a couple of integrity checking tools to consider:
> > tripwire - http://sourceforge.net/projects/tripwire/
> > aide - http://aide.sourceforge.net/
> 
> I use cfv myself to create checksum files.  Note that using a CRC
> checksum is plenty if you are just worried about random corruption
> caused by hardware errors (e.g., memory bitflips, hard drive
> hiccups).
> 
> Using a cryptographic checksum à la tripwire is useful if you are
> worried about malicious attackers trying to modify files without your
> noticing.  Cryptographic checksums do take more CPU time, though.
> 
> Cheers,
> 
> 					- Ted
> 
Ted,

Thanks for the suggestion about cfv - it looks like it should work well
for efficiently checking large trees and detecting random corruption.
I like that the -r option stores the checksums recursively in each
directory; that way you can also choose to check only a particular subtree.
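The per-directory layout is what makes subtree checks possible. The Python sketch below imitates that layout only; it is not cfv and does not use cfv's file format. The checksum file name `checksums.crc32`, its "crc name" line layout, and the function names are all hypothetical.

```python
import os
import zlib

CHECKSUM_NAME = "checksums.crc32"  # hypothetical name, not a cfv convention


def crc32_file(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file as 8 hex digits."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")


def write_per_directory(root):
    """In every directory under root, record checksums of the files directly in it."""
    for dirpath, _dirnames, filenames in os.walk(root):
        entries = sorted(n for n in filenames if n != CHECKSUM_NAME)
        if not entries:
            continue
        with open(os.path.join(dirpath, CHECKSUM_NAME), "w") as f:
            for name in entries:
                f.write(f"{crc32_file(os.path.join(dirpath, name))} {name}\n")


def verify_subtree(start):
    """Check only the directories at or below start; return paths that fail."""
    failed = []
    for dirpath, _dirnames, filenames in os.walk(start):
        if CHECKSUM_NAME not in filenames:
            continue
        with open(os.path.join(dirpath, CHECKSUM_NAME)) as f:
            for line in f:
                crc, name = line.rstrip("\n").split(" ", 1)
                full = os.path.join(dirpath, name)
                if not os.path.isfile(full) or crc32_file(full) != crc:
                    failed.append(full)
    return failed
```

Because each directory carries its own checksum file, `verify_subtree` can be pointed at any directory in the tree without reading anything above it.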

Mike, thanks for the other suggestions as well. I've used tripwire 
before but had not considered using it in this context due to its 
focus on security and intrusion detection. 

Andrew

