* [PATCH] docs: record the metadump file format
@ 2017-06-15 22:00 Darrick J. Wong
From: Darrick J. Wong @ 2017-06-15 22:00 UTC
To: xfs
Document the metadump file format.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
design/XFS_Filesystem_Structure/docinfo.xml | 14 +++++
design/XFS_Filesystem_Structure/magic.asciidoc | 1
design/XFS_Filesystem_Structure/metadump.asciidoc | 62 ++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 9 +++
4 files changed, 86 insertions(+)
create mode 100644 design/XFS_Filesystem_Structure/metadump.asciidoc
diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 5cdcf6c..e13d705 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -155,4 +155,18 @@
</simplelist>
</revdescription>
</revision>
+ <revision>
+ <revnumber>3.14159</revnumber>
+ <date>June 2017</date>
+ <author>
+ <firstname>Darrick</firstname>
+ <surname>Wong</surname>
+ <email>darrick.wong@oracle.com</email>
+ </author>
+ <revdescription>
+ <simplelist>
+ <member>Add the metadump file format.</member>
+ </simplelist>
+ </revdescription>
+ </revision>
</revhistory>
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 77bed6d..7e62783 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -47,6 +47,7 @@ relevant chapters. Magic numbers tend to have consistent locations:
| +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
| +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only
| +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only
+| +XFS_MD_MAGIC+ | 0x5846534d | XFSM | xref:Metadata_Dumps[Metadata Dumps]
|=====
The magic numbers for log items are at offset zero in each log item, but items
diff --git a/design/XFS_Filesystem_Structure/metadump.asciidoc b/design/XFS_Filesystem_Structure/metadump.asciidoc
new file mode 100644
index 0000000..2bddb77
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
@@ -0,0 +1,62 @@
+[[Metadata_Dumps]]
+= Metadata Dumps
+
+The +xfs_metadump+ and +xfs_mdrestore+ tools are used to create a sparse
+snapshot of a live file system and to restore that snapshot onto a block
+device for debugging purposes. Only the metadata are captured in the
+snapshot, and the metadata blocks may be obscured for privacy reasons.
+
+A metadump file starts with an +xfs_metablock+ that records the addresses of
+the blocks that follow. Following that are the metadata blocks captured
+from the filesystem. The first block following the first +xfs_metablock+
+must be the superblock from AG 0. If the metadump has more blocks than
+can be pointed to by the +xfs_metablock.mb_daddr+ area, the sequence
+of +xfs_metablock+ followed by metadata blocks is repeated.
+
+.Metadata Dump Format
+
+[source, c]
+----
+struct xfs_metablock {
+ __be32 mb_magic;
+ __be16 mb_count;
+ uint8_t mb_blocklog;
+ uint8_t mb_reserved;
+ __be64 mb_daddr[];
+};
+----
+
+*mb_magic*::
+The magic number, ``XFSM'' (0x5846534d).
+
+*mb_count*::
+Number of blocks indexed by this record. This value must not exceed +((1
+<< mb_blocklog) - sizeof(struct xfs_metablock)) / sizeof(__be64)+.
+
+*mb_blocklog*::
+The base-2 logarithm of the metadump block size. A metadump block is 512
+bytes, so this value should be 9.
+
+*mb_reserved*::
+Reserved. Should be zero.
+
+*mb_daddr*::
+An array of disk addresses. Each of the +mb_count+ blocks (of size +(1
+<< mb_blocklog)+) following the +xfs_metablock+ should be written back to
+the address pointed to by the corresponding +mb_daddr+ entry.
+
+== Dump Obfuscation
+
+Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
+space and naming information to avoid leaking sensitive information into
+the metadump file. +xfs_metadump+ does not copy user data blocks.
+
+The obfuscation policy is as follows:
+
+* File and extended attribute names are both considered "names".
+* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
+* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
+* Names shorter than 5 characters are not obscured at all.
+* Names that cross a block boundary are not obscured at all.
+* Extended attribute values are zeroed.
+* Empty parts of metadata blocks are zeroed.
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 7916fbe..5540fa0 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -96,3 +96,12 @@ include::extended_attributes.asciidoc[]
include::symbolic_links.asciidoc[]
:leveloffset: 0
+
+Auxiliary Data Structures
+=========================
+
+:leveloffset: 1
+
+include::metadump.asciidoc[]
+
+:leveloffset: 0
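
For readers who want to poke at a dump by hand, the header layout described
in metadump.asciidoc above can be decoded with something like the sketch
below. It is not taken from xfs_mdrestore. The names md_header,
md_parse_header, md_max_indices and md_daddr are made up for illustration,
the blocklog bounds are arbitrary sanity limits, and the index limit assumes
that the header plus its mb_daddr array occupy exactly one metadump block, so
entries are counted in 8-byte units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_MD_MAGIC 0x5846534dU /* "XFSM" */
#define MD_HDR_BYTES 8           /* mb_magic + mb_count + mb_blocklog + mb_reserved */

/* Host-endian copy of the fixed part of struct xfs_metablock (hypothetical type). */
struct md_header {
	uint32_t magic;
	uint16_t count;
	uint8_t  blocklog;
	uint8_t  reserved;
};

/* Read an nbytes-wide big-endian integer from p. */
static uint64_t get_be(const uint8_t *p, size_t nbytes)
{
	uint64_t v = 0;

	for (size_t i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Assumption: the header and its index array fill one metadump block. */
size_t md_max_indices(uint8_t blocklog)
{
	return (((size_t)1 << blocklog) - MD_HDR_BYTES) / sizeof(uint64_t);
}

/* Decode and validate a header; returns 0 on success, -1 if it looks wrong. */
int md_parse_header(const uint8_t *buf, size_t len, struct md_header *hdr)
{
	assert(buf != NULL && hdr != NULL);
	if (len < MD_HDR_BYTES)
		return -1;

	hdr->magic = (uint32_t)get_be(buf, 4);
	hdr->count = (uint16_t)get_be(buf + 4, 2);
	hdr->blocklog = buf[6];
	hdr->reserved = buf[7];

	if (hdr->magic != XFS_MD_MAGIC)
		return -1;
	/* Sanity bounds picked for this sketch, not taken from the tools. */
	if (hdr->blocklog < 4 || hdr->blocklog > 30)
		return -1;
	if (hdr->count > md_max_indices(hdr->blocklog))
		return -1;
	/* The caller must have supplied every index entry it claims to hold. */
	if (len < MD_HDR_BYTES + (size_t)hdr->count * sizeof(uint64_t))
		return -1;
	return 0;
}

/* Disk address that the i-th following block should be written back to. */
uint64_t md_daddr(const uint8_t *buf, size_t i)
{
	return get_be(buf + MD_HDR_BYTES + i * sizeof(uint64_t), 8);
}
```

With mb_blocklog = 9 this gives (512 - 8) / 8 = 63 index entries per
xfs_metablock, after which the header/blocks sequence would repeat.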
* Re: [PATCH] docs: record the metadump file format
From: Darrick J. Wong @ 2017-07-25 16:36 UTC
To: Luis R. Rodriguez; +Cc: xfs

[cc xfs]

On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > Document the metadump file format.
>
> Thanks for all this! I have started wondering all this and was
> curious if there are perhaps more docs about the format or more
> practical docs which can help one go read the dumps and help
> analyze through examples.
>
> > --- /dev/null
> > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > +== Dump Obfuscation
> > +
> > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > +space and naming information to avoid leaking sensitive information into
> > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > +
> > +The obfuscation policy is as follows:
> > +
> > +* File and extended attribute names are both considered "names".
> > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
>
> Any reason for this?

/me doesn't know. Maybe it's too hard to generate a new name with the
same hash?

> > +* Names shorter than 5 characters are not obscured at all.
>
> This does not seem like a good idea, do we have a record of why this was done
> historically?
>
> > +* Names that cross a block boundary are not obscured at all.
>
> Likewise.

iirc we basically copy things a block at a time, which makes it harder
to deal with multi-fsblock dirblocks (???)

I don't really know, let's see if the list remembers. :)

--D

>
> Luis
* Re: [PATCH] docs: record the metadump file format
From: Luis R. Rodriguez @ 2017-07-25 16:47 UTC
To: Darrick J. Wong; +Cc: xfs

On Tue, Jul 25, 2017 at 9:36 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>> > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
>> > +== Dump Obfuscation
>> > +
>> > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
>> > +space and naming information to avoid leaking sensitive information into
>> > +the metadump file. +xfs_metadump+ does not copy user data blocks.
>> > +
>> > +The obfuscation policy is as follows:
>> > +
>> > +* File and extended attribute names are both considered "names".
>> > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
>> > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
>>
>> Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?

I just thought of a good reason: shorter names means less information to be
able to create a unique hash and avoid clashes.

>> > +* Names shorter than 5 characters are not obscured at all.
>>
>> This does not seem like a good idea, do we have a record of why this was done
>> historically?

And likewise for smaller number of characters there could be clashes. In fact
the latest md5 clash was for 64 bytes. But since we *really* don't want to
create a reverse map at all why not just use random characters? Sure we'd eat
up entropy fast on a system but it would seem to be a fair requirement to ask
for good entropy for this.

>> > +* Names that cross a block boundary are not obscured at all.
>>
>> Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)
>
> I don't really know, let's see if the list remembers. :)

Luis
* Re: [PATCH] docs: record the metadump file format
From: Dave Chinner @ 2017-07-25 23:25 UTC
To: Darrick J. Wong; +Cc: Luis R. Rodriguez, xfs

On Tue, Jul 25, 2017 at 09:36:12AM -0700, Darrick J. Wong wrote:
> [cc xfs]
>
> On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> > On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > > Document the metadump file format.
> >
> > Thanks for all this! I have started wondering all this and was
> > curious if there are perhaps more docs about the format or more
> > practical docs which can help one go read the dumps and help
> > analyze through examples.
> >
> > > --- /dev/null
> > > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > > +== Dump Obfuscation
> > > +
> > > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > > +space and naming information to avoid leaking sensitive information into
> > > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > > +
> > > +The obfuscation policy is as follows:
> > > +
> > > +* File and extended attribute names are both considered "names".
> > > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
> >
> > Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?
>
> > > +* Names shorter than 5 characters are not obscured at all.
> >
> > This does not seem like a good idea, do we have a record of why this was done
> > historically?

It was done because of mathematics. This is all "IIRC" off the top
of my head....

The hash we calculate is 4 bytes long, so we can't calculate a hash
collision from less than 5 bytes of input. i.e. 4 bytes + 1 byte to
cause collision is the shortest we can obfuscate, but one byte of
"name overwrite" isn't enough to find collisions if all 4 of the
first 4 bytes are randomly chosen. Hence for names 5-8 bytes in
length we are limited to 1 byte of correction for each of the first
4 bytes that is randomised. Hence it's not until filenames are
longer than 8 bytes that we can generate a truly random filename
that causes a hash collision.

> > > +* Names that cross a block boundary are not obscured at all.
> >
> > Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)

Discontiguous multi-block directories should be handled
transparently for xfs_db via libxfs buffers now. Maybe metadump
doesn't use these for dumping directory blocks?

Also, We shouldn't be splitting names and values across da block
boundaries - we leave free space in the block and allocate a new one
if it doesn't fit....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
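
To make the arithmetic in the reply above concrete, here is a small sketch of
why roughly five trailing bytes are needed to steer a 32-bit name hash. It
assumes the directory name hash is equivalent to "rotate left by 7, then XOR
in the next byte" for every byte of the name; that per-byte form is an
assumption of this sketch and is not stated anywhere in the thread. The
helpers name_hash and fix_tail are hypothetical names, and the sketch ignores
the fact that a real obfuscator must avoid producing NUL or '/' bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rol32(uint32_t v, unsigned int s)
{
	s &= 31;
	return s ? (v << s) | (v >> (32 - s)) : v;
}

/* Assumed per-byte form of the name hash: rotate left 7, XOR in the byte. */
uint32_t name_hash(const uint8_t *name, size_t len)
{
	uint32_t h = 0;

	assert(name != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		h = rol32(h, 7) ^ name[i];
	return h;
}

/*
 * Overwrite the last five bytes of name so the whole name hashes to target.
 * Each trailing byte contributes 7 bits, except the first of the five, which
 * only needs to supply the top 4 bits: 4 + 7 + 7 + 7 + 7 = 32.
 * Returns -1 when the name is too short to correct.
 */
int fix_tail(uint8_t *name, size_t len, uint32_t target)
{
	uint8_t *tail;
	uint32_t x;

	if (len < 5)
		return -1;

	/* Five more rounds rotate the prefix hash by 35 bits, i.e. by 3. */
	x = rol32(name_hash(name, len - 5), 35) ^ target;

	tail = name + len - 5;
	tail[0] = (x >> 28) & 0x7f; /* bits 28..31 */
	tail[1] = (x >> 21) & 0x7f; /* bits 21..27 */
	tail[2] = (x >> 14) & 0x7f; /* bits 14..20 */
	tail[3] = (x >> 7) & 0x7f;  /* bits 7..13 */
	tail[4] = x & 0x7f;         /* bits 0..6 */
	return 0;
}
```

Under that assumption, a name of nine or more bytes leaves at least four
leading bytes free to be chosen at random while the last five are computed,
which lines up with the "longer than 8" rule in the patch. Shorter names
leave fewer free bytes, and names under five bytes leave none.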