From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>, xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] docs: record the metadump file format
Date: Wed, 26 Jul 2017 09:25:36 +1000
Message-ID: <20170725232536.GJ17762@dastard>
In-Reply-To: <20170725163612.GD4369@magnolia>

On Tue, Jul 25, 2017 at 09:36:12AM -0700, Darrick J. Wong wrote:
> [cc xfs]
>
> On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> > On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > > Document the metadump file format.
> >
> > Thanks for all this! I have started wondering all this and was
> > curious if there are perhaps more docs about the format or more
> > practical docs which can help one go read the dumps and help
> > analyze through examples.
> >
> > > --- /dev/null
> > > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > > +== Dump Obfuscation
> > > +
> > > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > > +space and naming information to avoid leaking sensitive information into
> > > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > > +
> > > +The obfuscation policy is as follows:
> > > +
> > > +* File and extended attribute names are both considered "names".
> > > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
> >
> > Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?
>
> > > +* Names shorter than 5 characters are not obscured at all.
> >
> > This does not seem like a good idea, do we have a record of why this was done
> > historically?

It was done because of mathematics. This is all "IIRC" off the top
of my head....

The hash we calculate is 4 bytes long, so we can't calculate a hash
collision from fewer than 5 bytes of input. i.e. 4 bytes + 1 byte
to cause the collision is the shortest name we can obfuscate, but one
byte of "name overwrite" isn't enough to find a collision if all 4 of
the first 4 bytes are randomly chosen.

Hence for names 5-8 bytes in length we are limited to 1 byte of
correction for each of the first 4 bytes that is randomised. It's
not until filenames are longer than 8 bytes that we can generate a
truly random filename that causes a hash collision.

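To make the arithmetic above concrete, here is a minimal sketch in
Python. It is not the metadump code. toy_hash() is an assumed stand-in
(a 32-bit rotate-left-by-7 and xor per byte, which is roughly the shape
of the directory name hash as far as I recall) and collide() is a
hypothetical helper. With a 7-bit rotation, four trailing bytes only
reach bits 0-28 of the hash, so a fifth byte is needed to cover all 32
bits. Everything in front of those five bytes can then be random.

```python
import os

MASK = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def toy_hash(name: bytes) -> int:
    """Assumed stand-in hash: rotate by 7, xor in the next byte."""
    h = 0
    for b in name:
        h = b ^ rol32(h, 7)
    return h


def collide(original: bytes) -> bytes:
    """Return a same-length name with the same toy_hash() value.

    Names shorter than 5 bytes are returned unchanged, because four
    bytes cannot steer all 32 bits of this hash.
    """
    if len(original) < 5:
        return original
    target = toy_hash(original)
    # Randomise everything except the last five bytes.
    prefix = os.urandom(len(original) - 5)
    # The prefix hash is rotated 5 * 7 = 35 bits by the time the
    # last byte has been mixed in; xor it out to get the bits the
    # five correction bytes must supply.
    d = target ^ rol32(toy_hash(prefix), 35)
    # Split d into non-overlapping fields: 4 + 7 + 7 + 7 + 7 = 32 bits.
    tail = bytes([
        (d >> 28) & 0x0F,
        (d >> 21) & 0x7F,
        (d >> 14) & 0x7F,
        (d >> 7) & 0x7F,
        d & 0x7F,
    ])
    return prefix + tail


if __name__ == "__main__":
    old = b"secret_filename.txt"
    new = collide(old)
    print(new, hex(toy_hash(old)), hex(toy_hash(new)))
```

The sketch ignores the fact that a real name must not contain NUL or
'/' bytes, so treat it as an illustration of the bit counting only.
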
> > > +* Names that cross a block boundary are not obscured at all.
> >
> > Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)

Discontiguous multi-block directories should be handled
transparently for xfs_db via libxfs buffers now. Maybe metadump
doesn't use these for dumping directory blocks? Also, we shouldn't
be splitting names and values across da block boundaries - we leave
free space in the block and allocate a new one if the entry doesn't
fit....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com