* [PATCH] docs: record the metadump file format
@ 2017-06-15 22:00 Darrick J. Wong
[not found] ` <20170725162054.GG18884@wotan.suse.de>
0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2017-06-15 22:00 UTC (permalink / raw)
To: xfs
Document the metadump file format.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
design/XFS_Filesystem_Structure/docinfo.xml | 14 +++++
design/XFS_Filesystem_Structure/magic.asciidoc | 1
design/XFS_Filesystem_Structure/metadump.asciidoc | 62 ++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 9 +++
4 files changed, 86 insertions(+)
create mode 100644 design/XFS_Filesystem_Structure/metadump.asciidoc
diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 5cdcf6c..e13d705 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -155,4 +155,18 @@
</simplelist>
</revdescription>
</revision>
+ <revision>
+ <revnumber>3.14159</revnumber>
+ <date>June 2017</date>
+ <author>
+ <firstname>Darrick</firstname>
+ <surname>Wong</surname>
+ <email>darrick.wong@oracle.com</email>
+ </author>
+ <revdescription>
+ <simplelist>
+ <member>Add the metadump file format.</member>
+ </simplelist>
+ </revdescription>
+ </revision>
</revhistory>
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 77bed6d..7e62783 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -47,6 +47,7 @@ relevant chapters. Magic numbers tend to have consistent locations:
| +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
| +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only
| +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only
+| +XFS_MD_MAGIC+ | 0x5846534d | XFSM | xref:Metadata_Dumps[Metadata Dumps]
|=====
The magic numbers for log items are at offset zero in each log item, but items
diff --git a/design/XFS_Filesystem_Structure/metadump.asciidoc b/design/XFS_Filesystem_Structure/metadump.asciidoc
new file mode 100644
index 0000000..2bddb77
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
@@ -0,0 +1,62 @@
+[[Metadata_Dumps]]
+= Metadata Dumps
+
+The +xfs_metadump+ and +xfs_mdrestore+ tools are used to create a sparse
+snapshot of a live file system and to restore that snapshot onto a block
+device for debugging purposes. Only the metadata are captured in the
+snapshot, and the metadata blocks may be obscured for privacy reasons.
+
+A metadump file starts with a +xfs_metablock+ that records the addresses of
+the blocks that follow. Following that are the metadata blocks captured
+from the filesystem. The first block following the first +xfs_metablock+
+must be the superblock from AG 0. If the metadump has more blocks than
+can be pointed to by the +xfs_metablock.mb_daddr+ area, the sequence
+of +xfs_metablock+ followed by metadata blocks is repeated.
+
+.Metadata Dump Format
+
+[source, c]
+----
+struct xfs_metablock {
+ __be32 mb_magic;
+ __be16 mb_count;
+ uint8_t mb_blocklog;
+ uint8_t mb_reserved;
+ __be64 mb_daddr[];
+};
+----
+
+*mb_magic*::
+The magic number, ``XFSM'' (0x5846534d).
+
+*mb_count*::
+Number of blocks indexed by this record. This value must not exceed +((1 <<
+mb_blocklog) - sizeof(struct xfs_metablock)) / sizeof(__be64)+.
+
+*mb_blocklog*::
+The base-2 log of the metadump block size. A metadump block is 512
+bytes, so this value should be 9.
+
+*mb_reserved*::
+Reserved. Should be zero.
+
+*mb_daddr*::
+An array of disk addresses. Each of the +mb_count+ blocks (of size +(1
+<< mb_blocklog)+) following the +xfs_metablock+ should be written back to
+the address pointed to by the corresponding +mb_daddr+ entry.
+
+== Dump Obfuscation
+
+Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
+space and naming information to avoid leaking sensitive information into
+the metadump file. +xfs_metadump+ does not copy user data blocks.
+
+The obfuscation policy is as follows:
+
+* File and extended attribute names are both considered "names".
+* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
+* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
+* Names shorter than 5 characters are not obscured at all.
+* Names that cross a block boundary are not obscured at all.
+* Extended attribute values are zeroed.
+* Empty parts of metadata blocks are zeroed.
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 7916fbe..5540fa0 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -96,3 +96,12 @@ include::extended_attributes.asciidoc[]
include::symbolic_links.asciidoc[]
:leveloffset: 0
+
+Auxiliary Data Structures
+=========================
+
+:leveloffset: 1
+
+include::metadump.asciidoc[]
+
+:leveloffset: 0
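
For anyone who wants to read a dump by hand, the record layout described in
the patch can be decoded with a few lines of C. This is only a minimal
sketch, not code from xfsprogs: the names md_header, md_max_indices,
md_parse_header and md_daddr are made up for illustration, the bounds on
mb_blocklog are an arbitrary sanity check, and the capacity formula assumes
that one index block is exactly (1 << mb_blocklog) bytes holding an 8-byte
header followed by 8-byte big-endian addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MD_MAGIC    0x5846534dU /* "XFSM" */
#define MD_HDR_SIZE 8           /* magic(4) + count(2) + blocklog(1) + reserved(1) */

/* Host-endian copy of the fixed part of struct xfs_metablock (hypothetical name). */
struct md_header {
    uint32_t magic;
    uint16_t count;
    uint8_t  blocklog;
    uint8_t  reserved;
};

/* Read an n-byte big-endian integer from p. */
static uint64_t get_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Number of mb_daddr entries that fit in one index block (assumed layout). */
static size_t md_max_indices(uint8_t blocklog)
{
    assert(blocklog >= 3 && blocklog <= 30);
    return (((size_t)1 << blocklog) - MD_HDR_SIZE) / sizeof(uint64_t);
}

/* Decode and validate an index block header; returns 0 if it looks sane, -1 otherwise. */
static int md_parse_header(const uint8_t *buf, struct md_header *hdr)
{
    hdr->magic    = (uint32_t)get_be(buf, 4);
    hdr->count    = (uint16_t)get_be(buf + 4, 2);
    hdr->blocklog = buf[6];
    hdr->reserved = buf[7];

    if (hdr->magic != MD_MAGIC)
        return -1;
    if (hdr->blocklog < 3 || hdr->blocklog > 30)   /* arbitrary sanity bounds */
        return -1;
    if (hdr->count > md_max_indices(hdr->blocklog))
        return -1;
    return 0;
}

/* Disk address recorded for the i-th block that follows this index block. */
static uint64_t md_daddr(const uint8_t *buf, size_t i)
{
    return get_be(buf + MD_HDR_SIZE + i * sizeof(uint64_t), 8);
}
```

With mb_blocklog = 9 this gives (512 - 8) / 8 = 63 addresses per index block.
A reader would parse one index block, copy out mb_count blocks of
(1 << mb_blocklog) bytes each, then expect the next index block.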
* Re: [PATCH] docs: record the metadump file format
[not found] ` <20170725162054.GG18884@wotan.suse.de>
@ 2017-07-25 16:36 ` Darrick J. Wong
2017-07-25 16:47 ` Luis R. Rodriguez
2017-07-25 23:25 ` Dave Chinner
0 siblings, 2 replies; 4+ messages in thread
From: Darrick J. Wong @ 2017-07-25 16:36 UTC (permalink / raw)
To: Luis R. Rodriguez; +Cc: xfs
[cc xfs]
On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > Document the metadump file format.
>
> Thanks for all this! I have started wondering all this and was
> curious if there are perhaps more docs about the format or more
> practical docs which can help one go read the dumps and help
> analyze through examples.
>
> > --- /dev/null
> > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > +== Dump Obfuscation
> > +
> > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > +space and naming information to avoid leaking sensitive information into
> > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > +
> > +The obfuscation policy is as follows:
> > +
> > +* File and extended attribute names are both considered "names".
> > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
>
> Any reason for this?
/me doesn't know. Maybe it's too hard to generate a new name with the
same hash?
> > +* Names shorter than 5 characters are not obscured at all.
>
> This does not seem like a good idea, do we have a record of why this was done
> historically?
>
> > +* Names that cross a block boundary are not obscured at all.
>
> Likewise.
iirc we basically copy things a block at a time, which makes it harder
to deal with multi-fsblock dirblocks (???)
I don't really know, let's see if the list remembers. :)
--D
>
> Luis
* Re: [PATCH] docs: record the metadump file format
2017-07-25 16:36 ` Darrick J. Wong
@ 2017-07-25 16:47 ` Luis R. Rodriguez
2017-07-25 23:25 ` Dave Chinner
1 sibling, 0 replies; 4+ messages in thread
From: Luis R. Rodriguez @ 2017-07-25 16:47 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs
On Tue, Jul 25, 2017 at 9:36 AM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>> > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
>> > +== Dump Obfuscation
>> > +
>> > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
>> > +space and naming information to avoid leaking sensitive information into
>> > +the metadump file. +xfs_metadump+ does not copy user data blocks.
>> > +
>> > +The obfuscation policy is as follows:
>> > +
>> > +* File and extended attribute names are both considered "names".
>> > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
>> > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
>>
>> Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?
I just thought of a good reason: shorter names means less information
to be able to create a unique hash and avoid clashes.
>> > +* Names shorter than 5 characters are not obscured at all.
>>
>> This does not seem like a good idea, do we have a record of why this was done
>> historically?
And likewise for smaller number of characters there could be clashes.
In fact the latest md5 clash was for 64 bytes.
But since we *really* don't want to create a reverse map at all why
not just use random characters? Sure we'd eat up entropy fast on a
system but it would seem to be a fair requirement to ask for good
entropy for this.
>> > +* Names that cross a block boundary are not obscured at all.
>>
>> Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)
>
> I don't really know, let's see if the list remembers. :)
Luis
* Re: [PATCH] docs: record the metadump file format
2017-07-25 16:36 ` Darrick J. Wong
2017-07-25 16:47 ` Luis R. Rodriguez
@ 2017-07-25 23:25 ` Dave Chinner
1 sibling, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2017-07-25 23:25 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Luis R. Rodriguez, xfs
On Tue, Jul 25, 2017 at 09:36:12AM -0700, Darrick J. Wong wrote:
> [cc xfs]
>
> On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> > On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > > Document the metadump file format.
> >
> > Thanks for all this! I have started wondering all this and was
> > curious if there are perhaps more docs about the format or more
> > practical docs which can help one go read the dumps and help
> > analyze through examples.
> >
> > > --- /dev/null
> > > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > > +== Dump Obfuscation
> > > +
> > > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > > +space and naming information to avoid leaking sensitive information into
> > > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > > +
> > > +The obfuscation policy is as follows:
> > > +
> > > +* File and extended attribute names are both considered "names".
> > > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
> >
> > Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?
>
> > > +* Names shorter than 5 characters are not obscured at all.
> >
> > This does not seem like a good idea, do we have a record of why this was done
> > historically?
It was done because of mathematics. This is all "IIRC" off the top
of my head....
The hash we calculate is 4 bytes long, so we can't calculate a hash
collision from less than 5 bytes of input. i.e. 4 bytes + 1 byte
to cause collision is the shortest we can obfuscate, but one byte of
"name overwrite" isn't enough to find collisions if all 4 of the
first 4 bytes are randomly chosen.
Hence for names 5-8 bytes in length we are limited to 1 byte of
correction for each of the first 4 bytes that is randomised. Hence
it's not until filenames are longer than 8 bytes that we can
generate a truly random filename that causes a hash collision.
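
The five-byte floor can be sketched in code. This assumes the directory name
hash is the rotate-left-by-7-and-XOR function from libxfs, written here from
memory in a per-byte form, so treat the details as an assumption rather than
a copy of the real routine. fix_tail is a hypothetical helper, not the
metadump implementation. Each trailing byte supplies 7 fresh bits, so four
bytes reach only 28 bits of the 32-bit hash and a fifth is needed to cover
the rest.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t rol32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* Per-byte rotate-and-XOR name hash (assumed to match the libxfs one). */
static uint32_t name_hash(const uint8_t *name, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = rol32(h, 7) ^ name[i];
    return h;
}

/*
 * Overwrite the last five bytes of name so that the whole name hashes to
 * target. The bytes before the tail may hold anything, e.g. random
 * characters. Returns 0 on success, -1 if the name is too short to fix up.
 */
static int fix_tail(uint8_t *name, size_t len, uint32_t target)
{
    if (len < 5)
        return -1;

    /* Five more rounds rotate the prefix hash by 35 bits, i.e. 3 mod 32. */
    uint32_t want = rol32(name_hash(name, len - 5), 3) ^ target;
    uint8_t *tail = name + len - 5;

    /* Split the 32 wanted bits into 4 + 7 + 7 + 7 + 7 across the tail. */
    for (int i = 0, shift = 28; i < 5; i++, shift -= 7)
        tail[i] = (want >> shift) & 0x7f;
    return 0;
}
```

Note that the computed tail bytes can come out as NUL or '/', which are not
valid in file names; this sketch does not handle that case.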
> > > +* Names that cross a block boundary are not obscured at all.
> >
> > Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)
Discontiguous multi-block directories should be handled
transparently for xfs_db via libxfs buffers now. Maybe metadump
doesn't use these for dumping directory blocks? Also, we shouldn't
be splitting names and values across da block boundaries - we leave
free space in the block and allocate a new one if it doesn't fit....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com