[PATCH] (and bad attr2 bug) - pack xfs_sb

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
@ 2006-11-16 19:00 Eric Sandeen
  2006-11-16 22:10 ` Eric Sandeen
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Eric Sandeen @ 2006-11-16 19:00 UTC (permalink / raw)
  To: xfs

see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201

Bugzilla Bug 212201: Cannot build sysem with XFS file system.

I turned on attr2 in FC6 at nathan's suggestion, for selinux goodness 
with more efficient xattr space usage.

But, many reports that this was totally broken in fc6, on x86_64.

Install went ok, but on reboot the filesystem was found to be corrupt.

The filesystem was also found to be marked w/ attr1, not attr2....

If you do a fresh mkfs.xfs on x86_64, with -i attr=2, and dump out the
superblock (or look at it with xfs_db) you will find that although the
versionnum says that there is a morebits bit, the features2 flag is 0.

if you dd/hexdump the superblock, you will find the attr2 flag, but at
the wrong offset.

This is because the xfs_sb_t struct is padded out to 64 bits on 64-bit
arches, and the xfs_xlatesb() routine and xfs_sb_info[] array take this
padding to mean that the last item is 4 bytes bigger than it is, and
treats sb_features2 as 8 bytes not four.  This then gets endian-flipped out...

I can't quite figure out how this winds up causing problems if you stay
on the x86_64 arch, as I'd expect that if the offset is wrong, it should
at least be consistently wrong.  And in fact if you do mkfs,mount,xfs_info,
it will tell you that you do have attr2. But somewhere along the line thing
go wrong, and post-install, post-reboot, the filesystem thinks it is attr1,
and is therefore corrupt.

I think that maybe some accesses are post-xfs_xlatesb, while others
may access the un-flipped sb directly?  Or maybe this is sb logging
code that has messed things up?  Not sure... needs more investigation.

In any case, dd does not lie, and this patch for the kernel, and a 
corresponding one for userspace, at least make "mkfs.xfs -i attr=2" 
puts the features2 flag in the right place, as shown by inspection via dd.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

Index: linux-2.6.18/fs/xfs/xfs_sb.h
===================================================================
--- linux-2.6.18.orig/fs/xfs/xfs_sb.h
+++ linux-2.6.18/fs/xfs/xfs_sb.h
@@ -149,7 +149,7 @@ typedef struct xfs_sb
 	__uint16_t	sb_logsectsize;	/* sector size for the log, bytes */
 	__uint32_t	sb_logsunit;	/* stripe unit size for the log */
 	__uint32_t	sb_features2;	/* additional feature bits */
-} xfs_sb_t;
+} __attribute__ ((packed)) xfs_sb_t;

 /*
  * Sequence number values for the fields.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 19:00 [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches Eric Sandeen
@ 2006-11-16 22:10 ` Eric Sandeen
  2006-11-20  3:50   ` Eric Sandeen
  2006-11-16 22:45 ` David Chinner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2006-11-16 22:10 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:
> see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201
> 
> Bugzilla Bug 212201: Cannot build sysem with XFS file system.
> 
> I turned on attr2 in FC6 at nathan's suggestion, for selinux goodness 
> with more efficient xattr space usage.
> 
> But, many reports that this was totally broken in fc6, on x86_64.

ugh.  it's broken on x86 too, so it's not just the alignment/padding,
although that should be fixed for cross-arch mounts.

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 19:00 [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches Eric Sandeen
  2006-11-16 22:10 ` Eric Sandeen
@ 2006-11-16 22:45 ` David Chinner
  2006-11-16 22:55   ` Eric Sandeen
  2006-11-17 15:53   ` Russell Cattelan
  2006-11-17  1:08 ` Timothy Shimmin
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 34+ messages in thread
From: David Chinner @ 2006-11-16 22:45 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Thu, Nov 16, 2006 at 01:00:31PM -0600, Eric Sandeen wrote:
> see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201
> 
> Bugzilla Bug 212201: Cannot build sysem with XFS file system.

.....
> The filesystem was also found to be marked w/ attr1, not attr2....
.....
> if you dd/hexdump the superblock, you will find the attr2 flag, but at
> the wrong offset.
> 
> This is because the xfs_sb_t struct is padded out to 64 bits on 64-bit
> arches, and the xfs_xlatesb() routine and xfs_sb_info[] array take this
> padding to mean that the last item is 4 bytes bigger than it is, and
> treats sb_features2 as 8 bytes not four.  This then gets endian-flipped out...

Ok.

> I can't quite figure out how this winds up causing problems if you stay
> on the x86_64 arch, as I'd expect that if the offset is wrong, it should
> at least be consistently wrong.  And in fact if you do mkfs,mount,xfs_info,
> it will tell you that you do have attr2. But somewhere along the line thing
> go wrong, and post-install, post-reboot, the filesystem thinks it is attr1,
> and is therefore corrupt.

Nor would I expect an i386 to have a problem either.

> I think that maybe some accesses are post-xfs_xlatesb, while others
> may access the un-flipped sb directly?  Or maybe this is sb logging
> code that has messed things up?  Not sure... needs more investigation.

More investigation - we shouldn't be operating on an untranslated
superblock - the first thing we do is read and translate it....

> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> 
> Index: linux-2.6.18/fs/xfs/xfs_sb.h
> ===================================================================
> --- linux-2.6.18.orig/fs/xfs/xfs_sb.h
> +++ linux-2.6.18/fs/xfs/xfs_sb.h
> @@ -149,7 +149,7 @@ typedef struct xfs_sb
>  	__uint16_t	sb_logsectsize;	/* sector size for the log, bytes */
>  	__uint32_t	sb_logsunit;	/* stripe unit size for the log */
>  	__uint32_t	sb_features2;	/* additional feature bits */
> -} xfs_sb_t;
> +} __attribute__ ((packed)) xfs_sb_t;

I'd prefer not to pack the structure.

Over time, here's how this changed:

 typedef struct xfs_sb
 {
@@ -135,9 +136,12 @@
        __uint8_t       sb_shared_vn;   /* shared version number */
        xfs_extlen_t    sb_inoalignmt;  /* inode chunk alignment, fsblocks */
        __uint32_t      sb_unit;        /* stripe or raid unit */
-       __uint32_t      sb_width;       /* stripe or raid width */
+       __uint32_t      sb_width;       /* stripe or raid width */
        __uint8_t       sb_dirblklog;   /* log2 of dir block size (fsbs) */
-        __uint8_t       sb_dummy[7];    /* padding */
+       __uint8_t       sb_logsectlog;  /* log2 of the log sector size */
+       __uint16_t      sb_logsectsize; /* sector size for the log, bytes */
+       __uint32_t      sb_logsunit;    /* stripe unit size for the log */
+       __uint32_t      sb_features2;   /* additional feature bits */
 } xfs_sb_t;

So before the sector size > 512 bytes code, there was padding to push the
superblock out to 64bit alignement so that sb_dirblklog was correctly aligned.
The xfs_sb_info structure:

     { offsetof(xfs_sb_t, sb_unit),      0 },
     { offsetof(xfs_sb_t, sb_width),     0 },
     { offsetof(xfs_sb_t, sb_dirblklog),         0 },
-    { offsetof(xfs_sb_t, sb_dummy),     1 },
+    { offsetof(xfs_sb_t, sb_logsectlog), 0 },
+    { offsetof(xfs_sb_t, sb_logsectsize),0 },
     { offsetof(xfs_sb_t, sb_logsunit),  0 },
+    { offsetof(xfs_sb_t, sb_features2),         0 },
     { sizeof(xfs_sb_t),                 0 }
 };

had the sb_dummy field as "no translate" so it effectively ignored it
but it ensured that sb_dirblklog was sized correctly.

The real bug here was whoever removed the dummy field and did not
replace that with a comment ot say that the xfs_sb strucutre needs
to be padded to 64 bits to ensure translation worked properly
on 64 bit systems.

I'd prefer explicit padding (with warning comments) over packing
the structure. Thoughts?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 22:45 ` David Chinner
@ 2006-11-16 22:55   ` Eric Sandeen
  2006-11-17 15:53   ` Russell Cattelan
  1 sibling, 0 replies; 34+ messages in thread
From: Eric Sandeen @ 2006-11-16 22:55 UTC (permalink / raw)
  To: David Chinner; +Cc: xfs

David Chinner wrote:

> The real bug here was whoever removed the dummy field and did not
> replace that with a comment ot say that the xfs_sb strucutre needs
> to be padded to 64 bits to ensure translation worked properly
> on 64 bit systems.
> 
> I'd prefer explicit padding (with warning comments) over packing
> the structure. Thoughts?

yes, I agree that explicit padding, and a comment about why it's there,
would be better.

I was thinking about this over lunch and meant to follow up, but then
figured out that the actual bug isn't because of this (it's broken as
well on x86), and kept trying to chase that :)

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 19:00 [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches Eric Sandeen
  2006-11-16 22:10 ` Eric Sandeen
  2006-11-16 22:45 ` David Chinner
@ 2006-11-17  1:08 ` Timothy Shimmin
  2006-11-17  2:39   ` David Chinner
  2006-11-17 23:49 ` Eric Sandeen
  2007-05-17 14:41 ` Eric Sandeen
  4 siblings, 1 reply; 34+ messages in thread
From: Timothy Shimmin @ 2006-11-17  1:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Hi Eric,

--On 16 November 2006 1:00:31 PM -0600 Eric Sandeen <sandeen@sandeen.net> wrote:

> see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201
>
> Bugzilla Bug 212201: Cannot build sysem with XFS file system.
>
> I turned on attr2 in FC6 at nathan's suggestion, for selinux goodness
> with more efficient xattr space usage.
>
> But, many reports that this was totally broken in fc6, on x86_64.
>
> Install went ok, but on reboot the filesystem was found to be corrupt.
>
> The filesystem was also found to be marked w/ attr1, not attr2....
>
> If you do a fresh mkfs.xfs on x86_64, with -i attr=2, and dump out the
> superblock (or look at it with xfs_db) you will find that although the
> versionnum says that there is a morebits bit, the features2 flag is 0.
>
> if you dd/hexdump the superblock, you will find the attr2 flag, but at
> the wrong offset.
>
> This is because the xfs_sb_t struct is padded out to 64 bits on 64-bit
> arches,

This actually came up when I wrote xfstests/122.
It looks at sizes of various ondisk structures and for some of them it
print's out the offsets of the fields.
I noticed that the xfs_sb_t was a different size on 32bit and 64 bit
and so printed out all the field offsets.
They are all the same (on different word sizes)
and so the only difference is that the last field
will be padded out on a 64 bit platform as you noticed.
I couldn't really see a problem with that.
And discussed it with Nathan at the time.

> and the xfs_xlatesb() routine and xfs_sb_info[] array take this
> padding to mean that the last item is 4 bytes bigger than it is, and
> treats sb_features2 as 8 bytes not four.  This then gets endian-flipped out...
>

Well there is a bug in the sb endian translation code then or its setup.
All the field accesses should be correct, no?

I can't see why it needs to be packed or padded if it's just implicit extra padding
after the end of the last field.
Am I missing something?

Let me look into this a bit more :)

--Tim

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17  1:08 ` Timothy Shimmin
@ 2006-11-17  2:39   ` David Chinner
  2006-11-17  4:11     ` Timothy Shimmin
  0 siblings, 1 reply; 34+ messages in thread
From: David Chinner @ 2006-11-17  2:39 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Eric Sandeen, xfs

On Fri, Nov 17, 2006 at 11:08:40AM +1000, Timothy Shimmin wrote:
> Hi Eric,
> 
> 
> Well there is a bug in the sb endian translation code then or its setup.
> All the field accesses should be correct, no?

The problem is the size of the variable translated in xfs_xlatesb()
is the offset of the next field minus the offset of the current
field.

With the last filed of the structure, it ends up being the
size of the structure minus the offset of the field. On a 32 bit
machine, the structure is 4 bytes larger that the offset of
the feature2 field. On 64 bit machine, the strucutre is
8 bytes larger than the offset of the features2 field,
so it translates is as though it was a 64 bit field,
not a 32 bit field.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17  2:39   ` David Chinner
@ 2006-11-17  4:11     ` Timothy Shimmin
  2006-11-17  5:55       ` David Chinner
  0 siblings, 1 reply; 34+ messages in thread
From: Timothy Shimmin @ 2006-11-17  4:11 UTC (permalink / raw)
  To: David Chinner; +Cc: Eric Sandeen, xfs

Hi Dave,

--On 17 November 2006 1:39:46 PM +1100 David Chinner <dgc@sgi.com> wrote:

> On Fri, Nov 17, 2006 at 11:08:40AM +1000, Timothy Shimmin wrote:
>> Hi Eric,
>>
>>
>> Well there is a bug in the sb endian translation code then or its setup.
>> All the field accesses should be correct, no?
>
> The problem is the size of the variable translated in xfs_xlatesb()
> is the offset of the next field minus the offset of the current
> field.
>
Yep.

> With the last filed of the structure, it ends up being the
> size of the structure minus the offset of the field. On a 32 bit
> machine, the structure is 4 bytes larger that the offset of
> the feature2 field. On 64 bit machine, the strucutre is
> 8 bytes larger than the offset of the features2 field,
Yep.

> so it translates is as though it was a 64 bit field,
> not a 32 bit field.....
>

So why not change xfs_sb_info to give the real offset of where
the next field should go (if there was one), instead of giving the sizeof the
structure which is not where say a 32 bit field would go and
is wrong IMHO.

i.e.

===========================================================================
Index: fs/xfs/xfs_mount.c
===========================================================================

--- a/fs/xfs/xfs_mount.c	2006-11-17 15:02:21.000000000 +1100
+++ b/fs/xfs/xfs_mount.c	2006-11-17 14:48:43.261937705 +1100
@@ -121,7 +121,7 @@ static const struct {
     { offsetof(xfs_sb_t, sb_logsectsize),0 },
     { offsetof(xfs_sb_t, sb_logsunit),	 0 },
     { offsetof(xfs_sb_t, sb_features2),	 0 },
-    { sizeof(xfs_sb_t),			 0 }
+    { offsetof(xfs_sb_t, sb_features2) + sizeof(__uint32_t), 0 }
 };

 /*

--Tim

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17  4:11     ` Timothy Shimmin
@ 2006-11-17  5:55       ` David Chinner
  2006-11-17  6:34         ` sandeen
  0 siblings, 1 reply; 34+ messages in thread
From: David Chinner @ 2006-11-17  5:55 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: David Chinner, Eric Sandeen, xfs

On Fri, Nov 17, 2006 at 02:11:12PM +1000, Timothy Shimmin wrote:
> >so it translates is as though it was a 64 bit field,
> >not a 32 bit field.....
> >
> 
> So why not change xfs_sb_info to give the real offset of where
> the next field should go (if there was one), instead of giving the sizeof 
> the
> structure which is not where say a 32 bit field would go and
> is wrong IMHO.
> 
> i.e.
> 
> ===========================================================================
> Index: fs/xfs/xfs_mount.c
> ===========================================================================
> 
> --- a/fs/xfs/xfs_mount.c	2006-11-17 15:02:21.000000000 +1100
> +++ b/fs/xfs/xfs_mount.c	2006-11-17 14:48:43.261937705 +1100
> @@ -121,7 +121,7 @@ static const struct {
>     { offsetof(xfs_sb_t, sb_logsectsize),0 },
>     { offsetof(xfs_sb_t, sb_logsunit),	 0 },
>     { offsetof(xfs_sb_t, sb_features2),	 0 },
> -    { sizeof(xfs_sb_t),			 0 }
> +    { offsetof(xfs_sb_t, sb_features2) + sizeof(__uint32_t), 0 }
> };

Whenever you add to the table, you now need to modify both the new
entry and the terminator to get it right.

Nor (IMO) is it obvious that it is a terminator or why it is
different to all the other entries in the structure. A field such as
sb_dummy or sb_pad before the terminator is fairly obvious, and it
means that you don't need to modify the table terminator every time
the superblock gets extended.

That way the code stays more consistent over time, diffs are smaller
and neater, and you can see at a simple diff just how the features
have been added over time (like I did this morning).....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17  5:55       ` David Chinner
@ 2006-11-17  6:34         ` sandeen
  2006-11-17  6:52           ` Nathan Scott
  2006-11-17  6:58           ` David Chinner
  0 siblings, 2 replies; 34+ messages in thread
From: sandeen @ 2006-11-17  6:34 UTC (permalink / raw)
  To: David Chinner; +Cc: Timothy Shimmin, Eric Sandeen, xfs

> On Fri, Nov 17, 2006 at 02:11:12PM +1000, Timothy Shimmin wrote:

> Whenever you add to the table, you now need to modify both the new
> entry and the terminator to get it right.
>
> Nor (IMO) is it obvious that it is a terminator or why it is
> different to all the other entries in the structure. A field such as
> sb_dummy or sb_pad before the terminator is fairly obvious, and it
> means that you don't need to modify the table terminator every time
> the superblock gets extended.
>
> That way the code stays more consistent over time, diffs are smaller
> and neater, and you can see at a simple diff just how the features
> have been added over time (like I did this morning).....

nothing in the code is terribly obvious.. please add comments however you
decide to fix it :)

and really, now that this is out in the wild, maybe sb_features3 instead
of padding is appropriate, and check both for the attr2 bit...? :(

i'm trying to figure out what the kernel upgrade path is for fc6 users who
have an extra-padded-flipped features2/attr2 filesystem.  :(

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17  6:34         ` sandeen
@ 2006-11-17  6:52           ` Nathan Scott
  2006-11-17 15:20             ` sandeen
  2006-11-17  6:58           ` David Chinner
  1 sibling, 1 reply; 34+ messages in thread
From: Nathan Scott @ 2006-11-17  6:52 UTC (permalink / raw)
  To: sandeen; +Cc: David Chinner, Timothy Shimmin, xfs

On Fri, 2006-11-17 at 00:34 -0600, sandeen@sandeen.net wrote:
> and really, now that this is out in the wild, maybe sb_features3
> instead of padding is appropriate, and check both for the attr2
> bit...? :(

Thats not going to work, theres three or four other feature2 bits
preceding attr2 as well.

The "take a 32 bit systems fs to a 64 bit system" is relatively
uncommon, so I suppose its just something we live with (as we did
with the log recovery issues in that situation for several years).

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17  6:34         ` sandeen
  2006-11-17  6:52           ` Nathan Scott
@ 2006-11-17  6:58           ` David Chinner
  1 sibling, 0 replies; 34+ messages in thread
From: David Chinner @ 2006-11-17  6:58 UTC (permalink / raw)
  To: sandeen; +Cc: David Chinner, Timothy Shimmin, xfs

On Fri, Nov 17, 2006 at 12:34:45AM -0600, sandeen@sandeen.net wrote:
> > On Fri, Nov 17, 2006 at 02:11:12PM +1000, Timothy Shimmin wrote:
> 
> > Whenever you add to the table, you now need to modify both the new
> > entry and the terminator to get it right.
> >
> > Nor (IMO) is it obvious that it is a terminator or why it is
> > different to all the other entries in the structure. A field such as
> > sb_dummy or sb_pad before the terminator is fairly obvious, and it
> > means that you don't need to modify the table terminator every time
> > the superblock gets extended.
> >
> > That way the code stays more consistent over time, diffs are smaller
> > and neater, and you can see at a simple diff just how the features
> > have been added over time (like I did this morning).....
> 
> nothing in the code is terribly obvious.. please add comments however you
> decide to fix it :)

*nod*

> and really, now that this is out in the wild, maybe sb_features3 instead
> of padding is appropriate, and check both for the attr2 bit...? :(

I'm not sure that this is a good idea, especially as past history of
introducing new feature bits is anything to go by (I think this makes bug #6
that the features2 field has been responsible for). I'd much prefer to
fix the bug, blacklist the bad 4 bytes in the superblock, and then either:

	- modify xfs_admin/repair to detect a busted superblock and have them
	  fix it; or
	- put code in the mount path that detects this and corrects it
	  automatically (which we do for some other superblock fields).

> i'm trying to figure out what the kernel upgrade path is for fc6 users who
> have an extra-padded-flipped features2/attr2 filesystem.  :(

Depends on what we do to fix it, right? Do you have any preferences?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17  6:52           ` Nathan Scott
@ 2006-11-17 15:20             ` sandeen
  2006-11-19 23:11               ` Nathan Scott
  0 siblings, 1 reply; 34+ messages in thread
From: sandeen @ 2006-11-17 15:20 UTC (permalink / raw)
  To: nscott; +Cc: sandeen, David Chinner, Timothy Shimmin, xfs

> On Fri, 2006-11-17 at 00:34 -0600, sandeen@sandeen.net wrote:
>> and really, now that this is out in the wild, maybe sb_features3
>> instead of padding is appropriate, and check both for the attr2
>> bit...? :(
>
> Thats not going to work, theres three or four other feature2 bits
> preceding attr2 as well.
>
> The "take a 32 bit systems fs to a 64 bit system" is relatively
> uncommon, so I suppose its just something we live with (as we did
> with the log recovery issues in that situation for several years).

So you think this should not be fixed, then?  Because if it -is- fixed
then it's not an fs transfer problem; suddenly 64-bit attr2 filesystems
will think they have attr1 if proper padding is added.

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 22:45 ` David Chinner
  2006-11-16 22:55   ` Eric Sandeen
@ 2006-11-17 15:53   ` Russell Cattelan
  1 sibling, 0 replies; 34+ messages in thread
From: Russell Cattelan @ 2006-11-17 15:53 UTC (permalink / raw)
  To: David Chinner; +Cc: Eric Sandeen, xfs

David Chinner wrote:

>
>@@ -135,9 +136,12 @@
>        __uint8_t       sb_shared_vn;   /* shared version number */
>        xfs_extlen_t    sb_inoalignmt;  /* inode chunk alignment, fsblocks */
>        __uint32_t      sb_unit;        /* stripe or raid unit */
>-       __uint32_t      sb_width;       /* stripe or raid width */
>+       __uint32_t      sb_width;       /* stripe or raid width */
>        __uint8_t       sb_dirblklog;   /* log2 of dir block size (fsbs) */
>-        __uint8_t       sb_dummy[7];    /* padding */
>+       __uint8_t       sb_logsectlog;  /* log2 of the log sector size */
>+       __uint16_t      sb_logsectsize; /* sector size for the log, bytes */
>+       __uint32_t      sb_logsunit;    /* stripe unit size for the log */
>+       __uint32_t      sb_features2;   /* additional feature bits */
> } xfs_sb_t;
>
>So before the sector size > 512 bytes code, there was padding to push the
>superblock out to 64bit alignement so that sb_dirblklog was correctly aligned.
>The xfs_sb_info structure:
>
>     { offsetof(xfs_sb_t, sb_unit),      0 },
>     { offsetof(xfs_sb_t, sb_width),     0 },
>     { offsetof(xfs_sb_t, sb_dirblklog),         0 },
>-    { offsetof(xfs_sb_t, sb_dummy),     1 },
>+    { offsetof(xfs_sb_t, sb_logsectlog), 0 },
>+    { offsetof(xfs_sb_t, sb_logsectsize),0 },
>     { offsetof(xfs_sb_t, sb_logsunit),  0 },
>+    { offsetof(xfs_sb_t, sb_features2),         0 },
>     { sizeof(xfs_sb_t),                 0 }
> };
>
>had the sb_dummy field as "no translate" so it effectively ignored it
>but it ensured that sb_dirblklog was sized correctly.
>
>The real bug here was whoever removed the dummy field and did not
>replace that with a comment ot say that the xfs_sb strucutre needs
>to be padded to 64 bits to ensure translation worked properly
>on 64 bit systems.
>
>I'd prefer explicit padding (with warning comments) over packing
>the structure. Thoughts?
>  
>
That seems safer to me and the comments will make the next  person to 
muck with
the structure think about padding.

>Cheers,
>
>Dave.
>  
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 19:00 [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches Eric Sandeen
                   ` (2 preceding siblings ...)
  2006-11-17  1:08 ` Timothy Shimmin
@ 2006-11-17 23:49 ` Eric Sandeen
  2007-05-17 14:41 ` Eric Sandeen
  4 siblings, 0 replies; 34+ messages in thread
From: Eric Sandeen @ 2006-11-17 23:49 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:
> see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201

ok so the padding was a red herring w.r.t. the corruption.

What I have found is that the bmap btree block -is- in the inode, and
the attribute extents -are- in the inode, but they are not properly placed.

If you reset the forkoff to 13, you'll get proper answers for the btree
block.  If you set it to 15, you get proper answers for the attribute
extents.  But not both at the same time.  Something set the forkoff, or
the location of the btree block, or the location of the attr extents, at
the wrong place in the inode.

Getting closer...

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-17 15:20             ` sandeen
@ 2006-11-19 23:11               ` Nathan Scott
  2006-11-20  1:39                 ` Eric Sandeen
  0 siblings, 1 reply; 34+ messages in thread
From: Nathan Scott @ 2006-11-19 23:11 UTC (permalink / raw)
  To: sandeen; +Cc: David Chinner, Timothy Shimmin, xfs

On Fri, 2006-11-17 at 09:20 -0600, sandeen@sandeen.net wrote:
> > On Fri, 2006-11-17 at 00:34 -0600, sandeen@sandeen.net wrote:
> >> and really, now that this is out in the wild, maybe sb_features3
> >> instead of padding is appropriate, and check both for the attr2
> >> bit...? :(
> >
> > Thats not going to work, theres three or four other feature2 bits
> > preceding attr2 as well.
> >
> > The "take a 32 bit systems fs to a 64 bit system" is relatively
> > uncommon, so I suppose its just something we live with (as we did
> > with the log recovery issues in that situation for several years).
> 
> So you think this should not be fixed, then?  Because if it -is- fixed

I didn't say that.  It should be fixed.  Noone will notice though,
as its not actually biting anyone... (the attr2 problem will not
be related to this, its gonna be something else).

> then it's not an fs transfer problem; suddenly 64-bit attr2 filesystems
> will think they have attr1 if proper padding is added.

Now to really fry your noodle, attr2 is actually ondisk compatible
with attr1.  :)  (the SB bit was taken to prevent a repair buglet
from accidentally trashing all inodes using a non-fixed forkoff).

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-19 23:11               ` Nathan Scott
@ 2006-11-20  1:39                 ` Eric Sandeen
  2006-11-20  3:00                   ` Nathan Scott
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2006-11-20  1:39 UTC (permalink / raw)
  To: nscott; +Cc: David Chinner, Timothy Shimmin, xfs

Nathan Scott wrote:
> On Fri, 2006-11-17 at 09:20 -0600, sandeen@sandeen.net wrote:
>>> On Fri, 2006-11-17 at 00:34 -0600, sandeen@sandeen.net wrote:
>>>> and really, now that this is out in the wild, maybe sb_features3
>>>> instead of padding is appropriate, and check both for the attr2
>>>> bit...? :(
>>> Thats not going to work, theres three or four other feature2 bits
>>> preceding attr2 as well.
>>>
>>> The "take a 32 bit systems fs to a 64 bit system" is relatively
>>> uncommon, so I suppose its just something we live with (as we did
>>> with the log recovery issues in that situation for several years).
>> So you think this should not be fixed, then?  Because if it -is- fixed
> 
> I didn't say that.  It should be fixed.  Noone will notice though,
> as its not actually biting anyone... (the attr2 problem will not
> be related to this, its gonna be something else).

but it can't just be properly padded in the kernel and leave it at that, 
can it?  If so won't attr2 filesystems on x86_64 suddenly start 
appearing to be attr2?

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-20  1:39                 ` Eric Sandeen
@ 2006-11-20  3:00                   ` Nathan Scott
  2006-11-20  3:32                     ` Eric Sandeen
  0 siblings, 1 reply; 34+ messages in thread
From: Nathan Scott @ 2006-11-20  3:00 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: David Chinner, Timothy Shimmin, xfs

On Sun, 2006-11-19 at 19:39 -0600, Eric Sandeen wrote:
> ...
> but it can't just be properly padded in the kernel and leave it at that, 
> can it?

I think it can.

>  If so won't attr2 filesystems on x86_64 suddenly start 
> appearing to be attr2?

What problem do you see resulting from that though?

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-20  3:00                   ` Nathan Scott
@ 2006-11-20  3:32                     ` Eric Sandeen
  2006-11-20  3:37                       ` Nathan Scott
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2006-11-20  3:32 UTC (permalink / raw)
  To: nscott; +Cc: David Chinner, Timothy Shimmin, xfs

Nathan Scott wrote:
> On Sun, 2006-11-19 at 19:39 -0600, Eric Sandeen wrote:
>> ...
>> but it can't just be properly padded in the kernel and leave it at that, 
>> can it?
> 
> I think it can.
> 
>>  If so won't attr2 filesystems on x86_64 suddenly start 
>> appearing to be attr2?

ugh typo... "as attr1" I meant...

> What problem do you see resulting from that though?

is an attr2 filesystem mounted as attr1 safe?

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-20  3:32                     ` Eric Sandeen
@ 2006-11-20  3:37                       ` Nathan Scott
  0 siblings, 0 replies; 34+ messages in thread
From: Nathan Scott @ 2006-11-20  3:37 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: David Chinner, Timothy Shimmin, xfs

On Sun, 2006-11-19 at 21:32 -0600, Eric Sandeen wrote:
> Nathan Scott wrote:
> > On Sun, 2006-11-19 at 19:39 -0600, Eric Sandeen wrote:
> >> ...
> >> but it can't just be properly padded in the kernel and leave it at that, 
> >> can it?
> > 
> > I think it can.
> > 
> >>  If so won't attr2 filesystems on x86_64 suddenly start 
> >> appearing to be attr2?
> 
> ugh typo... "as attr1" I meant...
> 
> > What problem do you see resulting from that though?
> 
> is an attr2 filesystem mounted as attr1 safe?
> 

yes.

-- 
Nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 22:10 ` Eric Sandeen
@ 2006-11-20  3:50   ` Eric Sandeen
  2006-11-21  4:02     ` Eric Sandeen
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2006-11-20  3:50 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:
> Eric Sandeen wrote:
>> see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201
>>
>> Bugzilla Bug 212201: Cannot build sysem with XFS file system.
>>
>> I turned on attr2 in FC6 at nathan's suggestion, for selinux goodness 
>> with more efficient xattr space usage.
>>
>> But, many reports that this was totally broken in fc6, on x86_64.
> 
> ugh.  it's broken on x86 too, so it's not just the alignment/padding,
> although that should be fixed for cross-arch mounts.
> 
> -Eric
> 
> 
here's a testcase to corrupt it FWIW.

Russell has a slightly different one derived from this.


#!/bin/sh

remount() {
	umount mnt
	xfs_db -r fsfile2 -c "inode 131" -c "p core.forkoff" -c "p u" -c "p a"
	mount -o loop fsfile2 mnt/
}

umount mnt/
rm -f fsfile2
mkfs.xfs -dfile,name=fsfile2,size=100m -iattr=2
mount -o loop fsfile2 mnt/
mkdir mnt/dir
setfattr -n user.rity.selinux -v user_foo:blah_foo:mnt_what:0 mnt/dir/
for I in `seq 10 20`; do touch mnt/file$I; done
for I in `seq 100 700`; do touch mnt/dir/file$I; done
remount
for I in `seq 1000 1400`; do touch mnt/dir/file$I; done
#remount # works if we do remount here
setfattr -n user.rity.selinux -v user_foo:blah_foo:mnt_what:0 mnt/dir/
echo "unmounting"
umount mnt/
xfs_db -r fsfile2 -c "inode 131" -c "p core.forkoff" -c "p u" -c "p a" 
-c "type text"  -c "p"


Run it and you'll get the bmap root block from the wrong offset; it'll 
be "0"

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-20  3:50   ` Eric Sandeen
@ 2006-11-21  4:02     ` Eric Sandeen
  2006-11-22  1:02       ` Russell Cattelan
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2006-11-21  4:02 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:

> Eric Sandeen wrote:
>   
>> ugh.  it's broken on x86 too, so it's not just the alignment/padding,
>>     
>> although that should be fixed for cross-arch mounts.
>>     
>> -Eric
>>     
>>     
> here's a testcase to corrupt it FWIW.
>   
>   
Ok, with expert collaboration from Russell, Barry, Tim, 
Nathan, David, et al, how about this:

For btree dirs, we need a different calculation for the space
used in di_u, to set the minimum threshold for the fork offset...

This fixes my testcase, but as Tim points out -now- we need to compact
the btree ptrs, if we return (and use) an offset < current forkoff...

whee....

-Eric


Index: linux-2.6.18/fs/xfs.orig/xfs_attr_leaf.c
===================================================================
--- linux-2.6.18.orig/fs/xfs.orig/xfs_attr_leaf.c
+++ linux-2.6.18/fs/xfs.orig/xfs_attr_leaf.c
@@ -116,6 +116,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t 
 	int minforkoff;	/* lower limit on valid forkoff locations */
 	int maxforkoff;	/* upper limit on valid forkoff locations */
 	xfs_mount_t *mp = dp->i_mount;
+	int dsize = 0;
 
 	offset = (XFS_LITINO(mp) - bytes) >> 3; /* rounded down */
 
@@ -134,8 +135,21 @@ xfs_attr_shortform_bytesfit(xfs_inode_t 
 		return 0;
 	}
 
+	switch (dp->i_d.di_format) {
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_EXTENTS:
+		dsize = dp->i_df.if_bytes;
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		dsize = XFS_BMDR_SPACE_CALC(
+			XFS_BMAP_BROOT_NUMRECS(dp->i_df.if_broot));
+		break;
+	default:
+		/* should bail, unknown format, .... */
+	}
+
 	/* data fork btree root can have at least this many key/ptr pairs */
-	minforkoff = MAX(dp->i_df.if_bytes, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
+	minforkoff = MAX(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
 	minforkoff = roundup(minforkoff, 8) >> 3;
 
 	/* attr fork btree root can have at least this many key/ptr pairs */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-21  4:02     ` Eric Sandeen
@ 2006-11-22  1:02       ` Russell Cattelan
  2006-11-22  8:59         ` Timothy Shimmin
  0 siblings, 1 reply; 34+ messages in thread
From: Russell Cattelan @ 2006-11-22  1:02 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 1769 bytes --]

On Mon, 2006-11-20 at 22:02 -0600, Eric Sandeen wrote:
> Eric Sandeen wrote:
> 
> > Eric Sandeen wrote:
> >   
> >> ugh.  it's broken on x86 too, so it's not just the alignment/padding,
> >>     
> >> although that should be fixed for cross-arch mounts.
> >>     
> >> -Eric
> >>     
> >>     
> > here's a testcase to corrupt it FWIW.
> >   
> >   
> Ok, with expert collaboration from Russell, Barry, Tim, 
> Nathan, David, et al, how about this:
> 
> For btree dirs, we need a different calculation for the space
> used in di_u, to set the minimum threshold for the fork offset...
> 
> This fixes my testcase, but as Tim points out -now- we need to compact
> the btree ptrs, if we return (and use) an offset < current forkoff...
> 
> whee....
> 
> -Eric
> 
It turns out this only fixes one of the problems it is still quite easy 
to corrupt indoes with attr2.

The following patch is a short term fix that address the problem of
forkoff
moving without re-factoring the root inode btree root block.

Once the inode has be flipped to BTREE for the data space the forkoff is
fixed
to the that size, currently due to the way attr1 worked (fixed size
forkoff) the code is not handling the size to the root btree node due to
size changes in the attr portion of the inode.

The optimal solution is to adjust the data portion of the inode root
btree block down if space exists.

One easy fix that was resulting all attr add being pushed out of line is
added
the header size to the initial split of the inode, at least the first
attr add
should go inline now. Which should be a win the big attr user right now 
SElinux.

Including the 2 test script that have been used.



-- 
Russell Cattelan <cattelan@thebarn.com>

[-- Attachment #1.2: attr2_patch --]
[-- Type: text/x-patch, Size: 5208 bytes --]

Index: work_gfs/fs/xfs/xfs_attr.c
===================================================================
--- work_gfs.orig/fs/xfs/xfs_attr.c	2006-11-21 18:38:27.572949303 -0600
+++ work_gfs/fs/xfs/xfs_attr.c	2006-11-21 18:44:51.666033422 -0600
@@ -210,8 +210,20 @@ xfs_attr_set_int(xfs_inode_t *dp, const 
 	 * (inode must not be locked when we call this routine)
 	 */
 	if (XFS_IFORK_Q(dp) == 0) {
-		if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
-			return(error);
+		if ((dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) ||
+		    ((dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) &&
+		     (dp->i_d.di_anextents == 0))) {
+			/* xfs_bmap_add_attrfork will set the forkoffset based on
+			 * the size needed, the local attr case needs the size
+			 * attr plus the size of the hdr, if the size of
+			 * header is not accounted for initially the forkoffset
+			 * won't allow enough space, the actually attr add will
+			 * then be forced out out line to extents
+			 */
+			size += sizeof(xfs_attr_sf_hdr_t);
+			if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
+				return(error);
+		}
 	}
 
 	/*
Index: work_gfs/fs/xfs/xfs_attr_leaf.c
===================================================================
--- work_gfs.orig/fs/xfs/xfs_attr_leaf.c	2006-11-21 18:38:27.572949303 -0600
+++ work_gfs/fs/xfs/xfs_attr_leaf.c	2006-11-21 18:45:16.094360919 -0600
@@ -148,40 +148,76 @@ int
 xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes)
 {
 	int offset;
-	int minforkoff;	/* lower limit on valid forkoff locations */
-	int maxforkoff;	/* upper limit on valid forkoff locations */
+	int minforkoff = 0;	/* lower limit on valid forkoff locations */
+	int maxforkoff = 0;	/* upper limit on valid forkoff locations */
 	xfs_mount_t *mp = dp->i_mount;
+	int result = 0;
+	int dsize = 0;
 
 	offset = (XFS_LITINO(mp) - bytes) >> 3; /* rounded down */
 
 	switch (dp->i_d.di_format) {
 	case XFS_DINODE_FMT_DEV:
 		minforkoff = roundup(sizeof(xfs_dev_t), 8) >> 3;
-		return (offset >= minforkoff) ? minforkoff : 0;
+		result = (offset >= minforkoff) ? minforkoff : 0;
+		goto result;
 	case XFS_DINODE_FMT_UUID:
 		minforkoff = roundup(sizeof(uuid_t), 8) >> 3;
-		return (offset >= minforkoff) ? minforkoff : 0;
+		result = (offset >= minforkoff) ? minforkoff : 0;
+		goto result;
 	}
 
 	if (!(mp->m_flags & XFS_MOUNT_ATTR2)) {
-		if (bytes <= XFS_IFORK_ASIZE(dp))
-			return mp->m_attroffset >> 3;
-		return 0;
+		if (bytes <= XFS_IFORK_ASIZE(dp)) {
+			result = mp->m_attroffset >> 3;
+			goto result;
+		}
+		result = 0;
+		goto result;
 	}
 
 	/* data fork btree root can have at least this many key/ptr pairs */
-	minforkoff = MAX(dp->i_df.if_bytes, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
+	switch (dp->i_d.di_format) {
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_EXTENTS:
+		dsize = dp->i_df.if_bytes;
+		break;
+	case XFS_DINODE_FMT_BTREE:
+		if (dp->i_d.di_forkoff)
+			dsize = dp->i_d.di_forkoff << 3;
+		else
+			dsize = XFS_BMDR_SPACE_CALC(
+			  XFS_BMAP_BROOT_NUMRECS(dp->i_df.if_broot));
+		break;
+	default:
+		break;
+	}
+
+	minforkoff = MAX(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
 	minforkoff = roundup(minforkoff, 8) >> 3;
 
 	/* attr fork btree root can have at least this many key/ptr pairs */
 	maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
 	maxforkoff = maxforkoff >> 3;	/* rounded down */
 
-	if (offset >= minforkoff && offset < maxforkoff)
-		return offset;
-	if (offset >= maxforkoff)
-		return maxforkoff;
-	return 0;
+	if (offset >= minforkoff && offset < maxforkoff) {
+		result = offset;
+	}
+
+	if (offset >= maxforkoff) {
+		result = maxforkoff;
+	}
+
+	/* the case of btree we don't want to move the forkoff
+	 * since that would require a rebalance of the btree
+	 * which is currently not implemented for attr2
+	 */
+
+	if (dp->i_d.di_format == XFS_DINODE_FMT_BTREE && result) {
+		result = dsize >> 3;
+	}
+result:
+	return result;
 }
 
 /*
Index: work_gfs/fs/xfs/xfs_bmap.c
===================================================================
--- work_gfs.orig/fs/xfs/xfs_bmap.c	2006-11-21 18:38:27.572949303 -0600
+++ work_gfs/fs/xfs/xfs_bmap.c	2006-11-21 18:45:38.446660445 -0600
@@ -3632,7 +3632,26 @@ xfs_bmap_local_to_extents(
 		flags |= XFS_ILOG_FEXT(whichfork);
 	} else {
 		ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) == 0);
-		xfs_bmap_forkoff_reset(ip->i_mount, ip, whichfork);
+
+		/* Changing the forkoff requires a rebalance
+		   of the data btree.
+		   Once forkoff is set leave it fixed while
+		   data format is stored in a btree.
+
+		   Granted the inode btree block in initially
+		   setup with the max avalible DSIZE space so
+		   might be a slight waste of space to not
+		   rebalance.
+		   But since the attr1 code did not have a
+		   variable forkoff as attr2 now has the
+		   detection points and the reblanace code
+		   it not in place.
+		   Take the easy way out for now
+		*/
+
+		if (ip->i_d.di_format != XFS_DINODE_FMT_BTREE)
+			xfs_bmap_forkoff_reset(ip->i_mount, ip, whichfork);
+
 	}
 	ifp->if_flags &= ~XFS_IFINLINE;
 	ifp->if_flags |= XFS_IFEXTENTS;

[-- Attachment #1.3: r2.sh --]
[-- Type: application/x-shellscript, Size: 569 bytes --]

[-- Attachment #1.4: e2 --]
[-- Type: application/x-shellscript, Size: 713 bytes --]

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-22  1:02       ` Russell Cattelan
@ 2006-11-22  8:59         ` Timothy Shimmin
  2006-11-22 15:44           ` Eric Sandeen
  0 siblings, 1 reply; 34+ messages in thread
From: Timothy Shimmin @ 2006-11-22  8:59 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Eric Sandeen, xfs

Thanks, Russell.

I've been going thru the irc and just started looking at the patch.
I'll get back to you about it tomorrow.

I agree it would be good to have the fixed forkoff for data btree roots
as the first fix. And look into redoing the btree root for a later change.

Cheers,
Tim.

--On 21 November 2006 7:02:16 PM -0600 Russell Cattelan <cattelan@thebarn.com> wrote:

> On Mon, 2006-11-20 at 22:02 -0600, Eric Sandeen wrote:
>> Eric Sandeen wrote:
>>
>> > Eric Sandeen wrote:
>> >
>> >> ugh.  it's broken on x86 too, so it's not just the alignment/padding,
>> >>
>> >> although that should be fixed for cross-arch mounts.
>> >>
>> >> -Eric
>> >>
>> >>
>> > here's a testcase to corrupt it FWIW.
>> >
>> >
>> Ok, with expert collaboration from Russell, Barry, Tim,
>> Nathan, David, et al, how about this:
>>
>> For btree dirs, we need a different calculation for the space
>> used in di_u, to set the minimum threshold for the fork offset...
>>
>> This fixes my testcase, but as Tim points out -now- we need to compact
>> the btree ptrs, if we return (and use) an offset < current forkoff...
>>
>> whee....
>>
>> -Eric
>>
> It turns out this only fixes one of the problems it is still quite easy
> to corrupt indoes with attr2.
>
> The following patch is a short term fix that address the problem of
> forkoff
> moving without re-factoring the root inode btree root block.
>
> Once the inode has be flipped to BTREE for the data space the forkoff is
> fixed
> to the that size, currently due to the way attr1 worked (fixed size
> forkoff) the code is not handling the size to the root btree node due to
> size changes in the attr portion of the inode.
>
> The optimal solution is to adjust the data portion of the inode root
> btree block down if space exists.
>
> One easy fix that was resulting all attr add being pushed out of line is
> added
> the header size to the initial split of the inode, at least the first
> attr add
> should go inline now. Which should be a win the big attr user right now
> SElinux.
>
> Including the 2 test script that have been used.
>
>
>
> --
> Russell Cattelan <cattelan@thebarn.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-22  8:59         ` Timothy Shimmin
@ 2006-11-22 15:44           ` Eric Sandeen
  2006-11-22 16:24             ` Russell Cattelan
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2006-11-22 15:44 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Russell Cattelan, xfs

Timothy Shimmin wrote:
> Thanks, Russell.
> 
> I've been going thru the irc and just started looking at the patch.
> I'll get back to you about it tomorrow.
> 
> I agree it would be good to have the fixed forkoff for data btree roots
> as the first fix. And look into redoing the btree root for a later change.

My only question is, how much does this defeat the purpose of attr2?

-Eric

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-22 15:44           ` Eric Sandeen
@ 2006-11-22 16:24             ` Russell Cattelan
  2006-11-22 16:38               ` Eric Sandeen
  2006-11-23 22:49               ` [PATCH] " David Chinner
  0 siblings, 2 replies; 34+ messages in thread
From: Russell Cattelan @ 2006-11-22 16:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Timothy Shimmin, xfs

[-- Attachment #1: Type: text/plain, Size: 1431 bytes --]

On Wed, 2006-11-22 at 09:44 -0600, Eric Sandeen wrote:
> Timothy Shimmin wrote:
> > Thanks, Russell.
> > 
> > I've been going thru the irc and just started looking at the patch.
> > I'll get back to you about it tomorrow.
> > 
> > I agree it would be good to have the fixed forkoff for data btree roots
> > as the first fix. And look into redoing the btree root for a later change.
> 
> My only question is, how much does this defeat the purpose of attr2?
Well from the standpoint that attr2 currently corrupts inodes anything
to prevent that is good, since  currently attr2 can't be used at all.
When the di_u is extent based the attr2 code works as expected, giving
space to which ever segment gets there first.The attr2 code should still
be a big win for most file/dir inodes since they are probably able to do
their block mapping with local or extent mode.

The number of inodes that get pushed to btree mode should be a small %
of the
total number of inodes, especially on a root file system. So while attr2
is
not as efficient as it could be for that segment of the inodes the rest
of inodes
do benefit from attr2

By fixing the initial size calculation at least things like SElinux
which is adding one attr won't cause the attr segment to flip to extents
immediately.
The second attr will cause the flip but not the first one.


> 
> -Eric
> 
-- 
Russell Cattelan <cattelan@thebarn.com>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-22 16:24             ` Russell Cattelan
@ 2006-11-22 16:38               ` Eric Sandeen
  2006-11-23  7:09                 ` Timothy Shimmin
  2006-11-23 22:49               ` [PATCH] " David Chinner
  1 sibling, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2006-11-22 16:38 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Timothy Shimmin, xfs

Russell Cattelan wrote:
> On Wed, 2006-11-22 at 09:44 -0600, Eric Sandeen wrote:
>> Timothy Shimmin wrote:
>>> Thanks, Russell.
>>>
>>> I've been going thru the irc and just started looking at the patch.
>>> I'll get back to you about it tomorrow.
>>>
>>> I agree it would be good to have the fixed forkoff for data btree roots
>>> as the first fix. And look into redoing the btree root for a later change.
>> My only question is, how much does this defeat the purpose of attr2?
> Well from the standpoint that attr2 currently corrupts inodes anything
> to prevent that is good, since  currently attr2 can't be used at all.
> When the di_u is extent based the attr2 code works as expected, giving
> space to which ever segment gets there first.The attr2 code should still
> be a big win for most file/dir inodes since they are probably able to do
> their block mapping with local or extent mode.

yeah, that;s rpobqably true.

> The number of inodes that get pushed to btree mode should be a small %
> of the
> total number of inodes, especially on a root file system. So while attr2
> is
> not as efficient as it could be for that segment of the inodes the rest
> of inodes
> do benefit from attr2
> 
> By fixing the initial size calculation at least things like SElinux
> which is adding one attr won't cause the attr segment to flip to extents
> immediately.
> The second attr will cause the flip but not the first one.

I'd say this part (fixing up proper space for the initial attr fork 
setup) should probably go in soon if it gets good reviews (with the 
removal of the extra tests, as we discussed on irc last night).  I think 
this proper change stands on its own just fine.

the rest of the patch... I'd rather not confuse the functional changes 
with your rearrangement of return locations (the new gotos etc) but 
that's just me.

I think the bytesfit() fixup is probably good too, with your short-term 
addition of "if forkoff exists with btree data then it cannot move"

-Eric

> 
>> -Eric
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-22 16:38               ` Eric Sandeen
@ 2006-11-23  7:09                 ` Timothy Shimmin
  2006-11-23 17:37                   ` Russell Cattelan
  0 siblings, 1 reply; 34+ messages in thread
From: Timothy Shimmin @ 2006-11-23  7:09 UTC (permalink / raw)
  To: Eric Sandeen, Russell Cattelan; +Cc: xfs

Hi Guys,

So just looking at the first part, which as Eric suggested can be considered
on its own.

Index: work_gfs/fs/xfs/xfs_attr.c
===================================================================
--- work_gfs.orig/fs/xfs/xfs_attr.c	2006-11-21 18:38:27.572949303 -0600
+++ work_gfs/fs/xfs/xfs_attr.c	2006-11-21 18:44:51.666033422 -0600
@@ -210,8 +210,20 @@ xfs_attr_set_int(xfs_inode_t *dp, const
 	 * (inode must not be locked when we call this routine)
 	 */
 	if (XFS_IFORK_Q(dp) == 0) {
-		if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
-			return(error);
+		if ((dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) ||
+		    ((dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) &&
+		     (dp->i_d.di_anextents == 0))) {
+			/* xfs_bmap_add_attrfork will set the forkoffset based on
+			 * the size needed, the local attr case needs the size
+			 * attr plus the size of the hdr, if the size of
+			 * header is not accounted for initially the forkoffset
+			 * won't allow enough space, the actually attr add will
+			 * then be forced out out line to extents
+			 */
+			size += sizeof(xfs_attr_sf_hdr_t);
+			if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
+				return(error);
+		}
 	}

--On 22 November 2006 10:38:16 AM -0600 Eric Sandeen <sandeen@sandeen.net> wrote:
>> By fixing the initial size calculation at least things like SElinux
>> which is adding one attr won't cause the attr segment to flip to extents
>> immediately.
>> The second attr will cause the flip but not the first one.
>
> I'd say this part (fixing up proper space for the initial attr fork setup) should probably go in
> soon if it gets good reviews (with the removal of the extra tests, as we discussed on irc last
> night).  I think this proper change stands on its own just fine.
>

So yeah, as you said in IRC, the brace is in the wrong spot and
the di_aformat tests don't make any sense here.
Basically, we know that fork offset is zero and therefore that the di_aformat should be
set at XFS_DINODE_FMT_EXTENTS and di_anetents will be zero.
As this is the state before we add in an attribute fork.
Why we have this initial state as extents, I'm not too sure and wondered in the
past. Maybe because this state is one which doesn't occupy any space in the literal area.
A shortform EA has a header at least.

My next concern is that the size that is calculated is presumably trying to accomodate
the shortform EA. However, the calculation is for the sf header and the space for a
a xfs_attr_leaf_name_local with given namelen and valuelen.
It would be better to base it on an xfs_attr_sf_entry type.
So I think we need to rework this calculation.

Which leads me on to the next issue.
We don't know what EA form we are going to take,
so we can't really assume that it will be shortform.
If the EA name or value is big then the EA will go into extents and could occupy very
little room in the inode.
With the current & proposed test this could make the bytesfit function return 0
(the offset calculated in bytesfit could also go negative) and
then we would set the forkoff back at the old attr1 default.
So we might have 1 EA extent in the inode taking little space and yet setting the forkoff
in the middle.

Of course the setting of the forkoff is a bit of a guessing game since we can't
predict the future usage but I think the plan is to set it to the minimum to fit
on a first come first served basis.

So I'm thinking that we should set it based on the size of shortform if that
is how it will be stored or to the size taken up by the EA extents -
I was initially thinking that this would be 1 extent but with a remote value
block of up to 64K this could in theory be an extent for each fsb of the value
I guess.
Have to think about this some more.

--Tim

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-23  7:09                 ` Timothy Shimmin
@ 2006-11-23 17:37                   ` Russell Cattelan
  2006-11-24  4:47                     ` Timothy Shimmin
  2006-11-27 12:50                     ` Tim Shimmin
  0 siblings, 2 replies; 34+ messages in thread
From: Russell Cattelan @ 2006-11-23 17:37 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Eric Sandeen, xfs

Timothy Shimmin wrote:

> Hi Guys,
>
> So just looking at the first part, which as Eric suggested can be 
> considered
> on its own.
>
> Index: work_gfs/fs/xfs/xfs_attr.c
> ===================================================================
> --- work_gfs.orig/fs/xfs/xfs_attr.c    2006-11-21 18:38:27.572949303 
> -0600
> +++ work_gfs/fs/xfs/xfs_attr.c    2006-11-21 18:44:51.666033422 -0600
> @@ -210,8 +210,20 @@ xfs_attr_set_int(xfs_inode_t *dp, const
>      * (inode must not be locked when we call this routine)
>      */
>     if (XFS_IFORK_Q(dp) == 0) {
> -        if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
> -            return(error);
> +        if ((dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) ||
> +            ((dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) &&
> +             (dp->i_d.di_anextents == 0))) {
> +            /* xfs_bmap_add_attrfork will set the forkoffset based on
> +             * the size needed, the local attr case needs the size
> +             * attr plus the size of the hdr, if the size of
> +             * header is not accounted for initially the forkoffset
> +             * won't allow enough space, the actually attr add will
> +             * then be forced out out line to extents
> +             */
> +            size += sizeof(xfs_attr_sf_hdr_t);
> +            if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
> +                return(error);
> +        }
>     }
>
> --On 22 November 2006 10:38:16 AM -0600 Eric Sandeen 
> <sandeen@sandeen.net> wrote:
>
>>> By fixing the initial size calculation at least things like SElinux
>>> which is adding one attr won't cause the attr segment to flip to 
>>> extents
>>> immediately.
>>> The second attr will cause the flip but not the first one.
>>
>>
>> I'd say this part (fixing up proper space for the initial attr fork 
>> setup) should probably go in
>> soon if it gets good reviews (with the removal of the extra tests, as 
>> we discussed on irc last
>> night).  I think this proper change stands on its own just fine.
>>
>
> So yeah, as you said in IRC, the brace is in the wrong spot and
> the di_aformat tests don't make any sense here.
> Basically, we know that fork offset is zero and therefore that the 
> di_aformat should be
> set at XFS_DINODE_FMT_EXTENTS and di_anetents will be zero.
> As this is the state before we add in an attribute fork.
> Why we have this initial state as extents, I'm not too sure and 
> wondered in the
> past. Maybe because this state is one which doesn't occupy any space 
> in the literal area.
> A shortform EA has a header at least.
>
> My next concern is that the size that is calculated is presumably 
> trying to accomodate
> the shortform EA. However, the calculation is for the sf header and 
> the space for a
> a xfs_attr_leaf_name_local with given namelen and valuelen.
> It would be better to base it on an xfs_attr_sf_entry type.
> So I think we need to rework this calculation.
>
> Which leads me on to the next issue.
> We don't know what EA form we are going to take,
> so we can't really assume that it will be shortform.
> If the EA name or value is big then the EA will go into extents and 
> could occupy very
> little room in the inode.
> With the current & proposed test this could make the bytesfit function 
> return 0
> (the offset calculated in bytesfit could also go negative) and
> then we would set the forkoff back at the old attr1 default.
> So we might have 1 EA extent in the inode taking little space and yet 
> setting the forkoff
> in the middle.

Yes I agree worst cast scenario is that the inode has reverted to an 
attr1 split and
that space is being wasted in the attr portion. By the time an inode has 
flipped to
btree mode for di_u how much of a performance hit is really going to be 
noticed?
mapping the blocks for that inode is going to take multiple reads.
Attr2 seems most effective at space optimization for the local and extent
versions of di_u and probably not so much for btree.

At least by fixing the size calculation the shortforms that do fit  into 
di_a will now
be added inline.  What is happening now the btree is being re factored
which is probably expensive and the attr is being added as extents since the
original size used for the btree refactoring wasn't enough.

So the change to add in the header size will at least make single case
attrs more efficient since they will now be inline.
If the attr does not fit inline then worst case the forkoff flips to attr1
default  or a half and half split.

Given the cost of refactoring a btree it might be better to have attr1 
behavior?
Since di_a will have extra space additional attr adds won't  cause 
forkoff to
move and thus won't cause a rebalance of di_u.

So in thinking about this more does it make sense to actually not
try to optimize the space needed for di_u when it is a btree?
Maybe the first time an attr is added simply split the inode
space doing the rebalance once?
That would allow for more attrs to be added without rebalancing
the data btree.
The other scheme of space optimzation if the root btree node
is sparse would say sure give more space to di_a but at
the expense of a reblanace.

So ya it's a bit of a guessing game.

>
> Of course the setting of the forkoff is a bit of a guessing game since 
> we can't
> predict the future usage but I think the plan is to set it to the 
> minimum to fit
> on a first come first served basis.
>
> So I'm thinking that we should set it based on the size of shortform 
> if that
> is how it will be stored or to the size taken up by the EA extents -
> I was initially thinking that this would be 1 extent but with a remote 
> value
> block of up to 64K this could in theory be an extent for each fsb of 
> the value
> I guess.
> Have to think about this some more.
>
> --Tim
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-22 16:24             ` Russell Cattelan
  2006-11-22 16:38               ` Eric Sandeen
@ 2006-11-23 22:49               ` David Chinner
  1 sibling, 0 replies; 34+ messages in thread
From: David Chinner @ 2006-11-23 22:49 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Eric Sandeen, Timothy Shimmin, xfs

On Wed, Nov 22, 2006 at 10:24:55AM -0600, Russell Cattelan wrote:
> On Wed, 2006-11-22 at 09:44 -0600, Eric Sandeen wrote:
> > Timothy Shimmin wrote:
> > > Thanks, Russell.
> > > 
> > > I've been going thru the irc and just started looking at the patch.
> > > I'll get back to you about it tomorrow.
> > > 
> > > I agree it would be good to have the fixed forkoff for data btree roots
> > > as the first fix. And look into redoing the btree root for a later change.
> > 
> > My only question is, how much does this defeat the purpose of attr2?
> Well from the standpoint that attr2 currently corrupts inodes anything
> to prevent that is good, since  currently attr2 can't be used at all.
> When the di_u is extent based the attr2 code works as expected, giving
> space to which ever segment gets there first.The attr2 code should still
> be a big win for most file/dir inodes since they are probably able to do
> their block mapping with local or extent mode.

I suggest that dbench -x might be the best way to determine the
perfomrance impact - attr1 on a single disk would get ~5MB/s; attr2
on the same disk for the same test gave about 50MB/s (IIRC). I'd
hope that this fix retains that kind of advantage for attr2....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-23 17:37                   ` Russell Cattelan
@ 2006-11-24  4:47                     ` Timothy Shimmin
  2006-11-27 12:50                     ` Tim Shimmin
  1 sibling, 0 replies; 34+ messages in thread
From: Timothy Shimmin @ 2006-11-24  4:47 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Eric Sandeen, xfs

--On 23 November 2006 11:37:46 AM -0600 Russell Cattelan <cattelan@thebarn.com> wrote:

> Timothy Shimmin wrote:
>
>> Hi Guys,
>>
>> So just looking at the first part, which as Eric suggested can be
>> considered
>> on its own.
>>
>> Index: work_gfs/fs/xfs/xfs_attr.c
>> ===================================================================
>> --- work_gfs.orig/fs/xfs/xfs_attr.c    2006-11-21 18:38:27.572949303
>> -0600
>> +++ work_gfs/fs/xfs/xfs_attr.c    2006-11-21 18:44:51.666033422 -0600
>> @@ -210,8 +210,20 @@ xfs_attr_set_int(xfs_inode_t *dp, const
>>      * (inode must not be locked when we call this routine)
>>      */
>>     if (XFS_IFORK_Q(dp) == 0) {
>> -        if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
>> -            return(error);
>> +        if ((dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) ||
>> +            ((dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) &&
>> +             (dp->i_d.di_anextents == 0))) {
>> +            /* xfs_bmap_add_attrfork will set the forkoffset based on
>> +             * the size needed, the local attr case needs the size
>> +             * attr plus the size of the hdr, if the size of
>> +             * header is not accounted for initially the forkoffset
>> +             * won't allow enough space, the actually attr add will
>> +             * then be forced out out line to extents
>> +             */
>> +            size += sizeof(xfs_attr_sf_hdr_t);
>> +            if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
>> +                return(error);
>> +        }
>>     }
>>
>> --On 22 November 2006 10:38:16 AM -0600 Eric Sandeen
>> <sandeen@sandeen.net> wrote:
>>
>>>> By fixing the initial size calculation at least things like SElinux
>>>> which is adding one attr won't cause the attr segment to flip to
>>>> extents
>>>> immediately.
>>>> The second attr will cause the flip but not the first one.
>>>
>>>
>>> I'd say this part (fixing up proper space for the initial attr fork
>>> setup) should probably go in
>>> soon if it gets good reviews (with the removal of the extra tests, as
>>> we discussed on irc last
>>> night).  I think this proper change stands on its own just fine.
>>>
>>
>> So yeah, as you said in IRC, the brace is in the wrong spot and
>> the di_aformat tests don't make any sense here.
>> Basically, we know that fork offset is zero and therefore that the
>> di_aformat should be
>> set at XFS_DINODE_FMT_EXTENTS and di_anetents will be zero.
>> As this is the state before we add in an attribute fork.
>> Why we have this initial state as extents, I'm not too sure and
>> wondered in the
>> past. Maybe because this state is one which doesn't occupy any space
>> in the literal area.
>> A shortform EA has a header at least.
>>
>> My next concern is that the size that is calculated is presumably
>> trying to accomodate
>> the shortform EA. However, the calculation is for the sf header and
>> the space for a
>> a xfs_attr_leaf_name_local with given namelen and valuelen.
>> It would be better to base it on an xfs_attr_sf_entry type.
>> So I think we need to rework this calculation.
>>
>> Which leads me on to the next issue.
>> We don't know what EA form we are going to take,
>> so we can't really assume that it will be shortform.
>> If the EA name or value is big then the EA will go into extents and
>> could occupy very
>> little room in the inode.
>> With the current & proposed test this could make the bytesfit function
>> return 0
>> (the offset calculated in bytesfit could also go negative) and
>> then we would set the forkoff back at the old attr1 default.
>> So we might have 1 EA extent in the inode taking little space and yet
>> setting the forkoff
>> in the middle.
>
> Yes I agree worst cast scenario is that the inode has reverted to an attr1 split and
> that space is being wasted in the attr portion. By the time an inode has flipped to
> btree mode for di_u how much of a performance hit is really going to be noticed?
> mapping the blocks for that inode is going to take multiple reads.
> Attr2 seems most effective at space optimization for the local and extent
> versions of di_u and probably not so much for btree.
>
> At least by fixing the size calculation the shortforms that do fit  into di_a will now
> be added inline.  What is happening now the btree is being re factored
> which is probably expensive and the attr is being added as extents since the
> original size used for the btree refactoring wasn't enough.
>
> So the change to add in the header size will at least make single case
> attrs more efficient since they will now be inline.

No argument there, I just want to make sure the SF size calculation is correct.
I think we need a new macro here.
Currently we just accumulate the size using the totsize field in the header
and so on an EA add, just add in the space for 1 entry: i.e.
        newsize = XFS_ATTR_SF_TOTSIZE(args->dp);
        newsize += XFS_ATTR_SF_ENTSIZE_BYNAME(args->namelen, args->valuelen);
We need a macro, at least, for the total size for 1 given EA.
xfs_attr_leaf_newentsize is not necessarily the space for a shortform (SF) EA -
the structs do have different field sizes.


> If the attr does not fit inline then worst case the forkoff flips to attr1
> default  or a half and half split.
>
> Given the cost of refactoring a btree it might be better to have attr1 behavior?
> Since di_a will have extra space additional attr adds won't  cause forkoff to
> move and thus won't cause a rebalance of di_u.
>
If the SF EA can't fit in the literal space, and goes to extents then adding
further EAs probably isn't going to move the forkoff at all either, since they'll
probably just go into the EA leaf block.

> So in thinking about this more does it make sense to actually not
> try to optimize the space needed for di_u when it is a btree?
> Maybe the first time an attr is added simply split the inode
> space doing the rebalance once?
> That would allow for more attrs to be added without rebalancing
> the data btree.
> The other scheme of space optimzation if the root btree node
> is sparse would say sure give more space to di_a but at
> the expense of a reblanace.
>
> So ya it's a bit of a guessing game.
>
Yeah I'm inclined not to worry about compressing the btree data root but
just working out where it really finishes in the literal area and not
pushing it.

(All this would be a lot easier to discuss in person with a whiteboard,
 as I think Eric was kind of suggesting in irc :)

Back to the 1st part.
I'm thinking that if I can't fit the SF EA in the available space,
then we might just assume that we should leave EA space for the
MAX(2 extents, min-root-size) i.e. try for 2 extents (bytesfit will make
sure we atleast have the min-root-size). The forkoff is just a good
guess after all - it's a tradeoff.

Okay, need to do some coding...

--Tim

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-23 17:37                   ` Russell Cattelan
  2006-11-24  4:47                     ` Timothy Shimmin
@ 2006-11-27 12:50                     ` Tim Shimmin
  2006-11-29  9:56                       ` [PATCH] attr2 patch for data btrees & attr 2 was: " Timothy Shimmin
  1 sibling, 1 reply; 34+ messages in thread
From: Tim Shimmin @ 2006-11-27 12:50 UTC (permalink / raw)
  To: Russell Cattelan, Eric Sandeen; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 6547 bytes --]

Hi Russell & Eric,

Basing on Russell's patch, I was thinking of something like the
attached patch.
However, I'm wondering if the xfs_attr_set_int() change to use
a req_size for non-fitting shortform EA's is worth it - as it is a bit
of a prediction (trying to codify what I was thinking).
Russell, perhaps I should just send in sf_size like you initially intended.
In fact, the more I think about it, I'm more inclined to just pass
in sf_size.

Haven't tested the patch out yet. Just wanted to discuss it a bit.

Cheers,
Tim.

Russell Cattelan wrote:
> Timothy Shimmin wrote:
> 
>> Hi Guys,
>>
>> So just looking at the first part, which as Eric suggested can be 
>> considered
>> on its own.
>>
>> Index: work_gfs/fs/xfs/xfs_attr.c
>> ===================================================================
>> --- work_gfs.orig/fs/xfs/xfs_attr.c    2006-11-21 18:38:27.572949303 
>> -0600
>> +++ work_gfs/fs/xfs/xfs_attr.c    2006-11-21 18:44:51.666033422 -0600
>> @@ -210,8 +210,20 @@ xfs_attr_set_int(xfs_inode_t *dp, const
>>      * (inode must not be locked when we call this routine)
>>      */
>>     if (XFS_IFORK_Q(dp) == 0) {
>> -        if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
>> -            return(error);
>> +        if ((dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) ||
>> +            ((dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) &&
>> +             (dp->i_d.di_anextents == 0))) {
>> +            /* xfs_bmap_add_attrfork will set the forkoffset based on
>> +             * the size needed, the local attr case needs the size
>> +             * attr plus the size of the hdr, if the size of
>> +             * header is not accounted for initially the forkoffset
>> +             * won't allow enough space, the actually attr add will
>> +             * then be forced out out line to extents
>> +             */
>> +            size += sizeof(xfs_attr_sf_hdr_t);
>> +            if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
>> +                return(error);
>> +        }
>>     }
>>
>> --On 22 November 2006 10:38:16 AM -0600 Eric Sandeen 
>> <sandeen@sandeen.net> wrote:
>>
>>>> By fixing the initial size calculation at least things like SElinux
>>>> which is adding one attr won't cause the attr segment to flip to 
>>>> extents
>>>> immediately.
>>>> The second attr will cause the flip but not the first one.
>>>
>>>
>>> I'd say this part (fixing up proper space for the initial attr fork 
>>> setup) should probably go in
>>> soon if it gets good reviews (with the removal of the extra tests, as 
>>> we discussed on irc last
>>> night).  I think this proper change stands on its own just fine.
>>>
>>
>> So yeah, as you said in IRC, the brace is in the wrong spot and
>> the di_aformat tests don't make any sense here.
>> Basically, we know that fork offset is zero and therefore that the 
>> di_aformat should be
>> set at XFS_DINODE_FMT_EXTENTS and di_anetents will be zero.
>> As this is the state before we add in an attribute fork.
>> Why we have this initial state as extents, I'm not too sure and 
>> wondered in the
>> past. Maybe because this state is one which doesn't occupy any space 
>> in the literal area.
>> A shortform EA has a header at least.
>>
>> My next concern is that the size that is calculated is presumably 
>> trying to accomodate
>> the shortform EA. However, the calculation is for the sf header and 
>> the space for a
>> a xfs_attr_leaf_name_local with given namelen and valuelen.
>> It would be better to base it on an xfs_attr_sf_entry type.
>> So I think we need to rework this calculation.
>>
>> Which leads me on to the next issue.
>> We don't know what EA form we are going to take,
>> so we can't really assume that it will be shortform.
>> If the EA name or value is big then the EA will go into extents and 
>> could occupy very
>> little room in the inode.
>> With the current & proposed test this could make the bytesfit function 
>> return 0
>> (the offset calculated in bytesfit could also go negative) and
>> then we would set the forkoff back at the old attr1 default.
>> So we might have 1 EA extent in the inode taking little space and yet 
>> setting the forkoff
>> in the middle.
> 
> Yes I agree worst cast scenario is that the inode has reverted to an 
> attr1 split and
> that space is being wasted in the attr portion. By the time an inode has 
> flipped to
> btree mode for di_u how much of a performance hit is really going to be 
> noticed?
> mapping the blocks for that inode is going to take multiple reads.
> Attr2 seems most effective at space optimization for the local and extent
> versions of di_u and probably not so much for btree.
> 
> At least by fixing the size calculation the shortforms that do fit  into 
> di_a will now
> be added inline.  What is happening now the btree is being re factored
> which is probably expensive and the attr is being added as extents since 
> the
> original size used for the btree refactoring wasn't enough.
> 
> So the change to add in the header size will at least make single case
> attrs more efficient since they will now be inline.
> If the attr does not fit inline then worst case the forkoff flips to attr1
> default  or a half and half split.
> 
> Given the cost of refactoring a btree it might be better to have attr1 
> behavior?
> Since di_a will have extra space additional attr adds won't  cause 
> forkoff to
> move and thus won't cause a rebalance of di_u.
> 
> So in thinking about this more does it make sense to actually not
> try to optimize the space needed for di_u when it is a btree?
> Maybe the first time an attr is added simply split the inode
> space doing the rebalance once?
> That would allow for more attrs to be added without rebalancing
> the data btree.
> The other scheme of space optimzation if the root btree node
> is sparse would say sure give more space to di_a but at
> the expense of a reblanace.
> 
> So ya it's a bit of a guessing game.
> 
>>
>> Of course the setting of the forkoff is a bit of a guessing game since 
>> we can't
>> predict the future usage but I think the plan is to set it to the 
>> minimum to fit
>> on a first come first served basis.
>>
>> So I'm thinking that we should set it based on the size of shortform 
>> if that
>> is how it will be stored or to the size taken up by the EA extents -
>> I was initially thinking that this would be 1 extent but with a remote 
>> value
>> block of up to 64K this could in theory be an extent for each fsb of 
>> the value
>> I guess.
>> Have to think about this some more.
>>
>> --Tim
>>


[-- Attachment #2: attr2.patch --]
[-- Type: text/plain, Size: 3971 bytes --]

--- .pc/attr2_tes/fs/xfs/xfs_attr.c	2006-10-26 17:45:01.000000000 +1000
+++ fs/xfs/xfs_attr.c	2006-11-27 22:58:49.753629073 +1100
@@ -210,7 +210,19 @@ xfs_attr_set_int(xfs_inode_t *dp, const 
 	 * (inode must not be locked when we call this routine)
 	 */
 	if (XFS_IFORK_Q(dp) == 0) {
-		if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
+		int req_size;
+		int sf_size = sizeof(xfs_attr_sf_hdr_t) + XFS_ATTR_SF_ENTSIZE_BYNAME(namelen, valuelen);
+
+		if (local && (sf_size <= (XFS_LITINO(mp) - xfs_ifork_dsize_used(dp))))
+			req_size = sf_size;
+		else
+			/*
+			 * We can't fit our SF EA inline, so leave space for 2 EA extents
+			 * which should cover most initial EAs and most EAs in general
+			 */
+			req_size = 2 * sizeof(xfs_bmbt_rec_t);
+
+		if ((error = xfs_bmap_add_attrfork(dp, req_size, rsvd)))
 			return(error);
 	}
 
--- .pc/attr2_tes/fs/xfs/xfs_attr_leaf.c	2006-10-26 17:45:01.000000000 +1000
+++ fs/xfs/xfs_attr_leaf.c	2006-11-27 22:59:07.295306537 +1100
@@ -170,18 +170,25 @@ xfs_attr_shortform_bytesfit(xfs_inode_t 
 	}
 
 	/* data fork btree root can have at least this many key/ptr pairs */
-	minforkoff = MAX(dp->i_df.if_bytes, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
+	minforkoff = MAX(xfs_ifork_dsize_used(dp), XFS_BMDR_SPACE_CALC(MINDBTPTRS));
 	minforkoff = roundup(minforkoff, 8) >> 3;
 
 	/* attr fork btree root can have at least this many key/ptr pairs */
 	maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
 	maxforkoff = maxforkoff >> 3;	/* rounded down */
 
-	if (offset >= minforkoff && offset < maxforkoff)
-		return offset;
+	/* we can't fit inline */
+	if (offset < minforkoff)
+		return 0;
+
+	/* don't move the forkoff for data btree */
+	if (dp->i_d.di_format == XFS_DINODE_FMT_BTREE && dp->i_d.di_forkoff)
+		return dp->i_d.di_forkoff << 3;
+
 	if (offset >= maxforkoff)
 		return maxforkoff;
-	return 0;
+	else
+		return offset;
 }
 
 /*
--- .pc/attr2_tes/fs/xfs/xfs_bmap.c	2006-11-17 14:35:46.000000000 +1100
+++ fs/xfs/xfs_bmap.c	2006-11-27 15:54:33.166590715 +1100
@@ -3543,6 +3543,7 @@ xfs_bmap_forkoff_reset(
 	if (whichfork == XFS_ATTR_FORK &&
 	    (ip->i_d.di_format != XFS_DINODE_FMT_DEV) &&
 	    (ip->i_d.di_format != XFS_DINODE_FMT_UUID) &&
+	    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
 	    ((mp->m_attroffset >> 3) > ip->i_d.di_forkoff)) {
 		ip->i_d.di_forkoff = mp->m_attroffset >> 3;
 		ip->i_df.if_ext_max = XFS_IFORK_DSIZE(ip) /
--- .pc/attr2_tes/fs/xfs/xfs_inode.c	2006-11-27 23:20:19.000000000 +1100
+++ fs/xfs/xfs_inode.c	2006-11-27 23:21:29.604958540 +1100
@@ -4747,3 +4747,34 @@ xfs_iext_irec_update_extoffs(
 		ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
 	}
 }
+
+/*
+ * return how much space is used by the inode's data fork
+ */
+int
+xfs_ifork_dsize_used(xfs_inode_t *ip)
+{
+	switch (ip->i_d.di_format) {
+	case XFS_DINODE_FMT_DEV:
+		return sizeof(xfs_dev_t);
+	case XFS_DINODE_FMT_UUID:
+		return sizeof(uuid_t);
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_EXTENTS:
+		return ip->i_df.if_bytes;
+	case XFS_DINODE_FMT_BTREE:
+		if (ip->i_d.di_forkoff)
+			return ip->i_d.di_forkoff << 3;
+		else
+			/*
+			 * For new attr fork, data btree takes all the space,
+			 * so no room for any attrs with the current layout
+			 * but we can know how much space it really needs
+			 * i.e. the ptrs are half way along but we could compress to
+			 * preserve the num of records.
+			 */
+			return XFS_BMDR_SPACE_CALC(XFS_BMAP_BROOT_NUMRECS(ip->i_df.if_broot));
+	default:
+		return 0;
+	}
+}
--- .pc/attr2_tes/fs/xfs/xfs_inode.h	2006-11-17 14:35:46.000000000 +1100
+++ fs/xfs/xfs_inode.h	2006-11-27 23:24:09.376554289 +1100
@@ -535,6 +535,7 @@ void		xfs_iext_irec_compact(xfs_ifork_t 
 void		xfs_iext_irec_compact_pages(xfs_ifork_t *);
 void		xfs_iext_irec_compact_full(xfs_ifork_t *);
 void		xfs_iext_irec_update_extoffs(xfs_ifork_t *, int, int);
+int		xfs_ifork_dsize_used(xfs_inode_t *);
 
 #define xfs_ipincount(ip)	((unsigned int) atomic_read(&ip->i_pincount))
 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] attr2 patch for data btrees & attr 2  was: (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-27 12:50                     ` Tim Shimmin
@ 2006-11-29  9:56                       ` Timothy Shimmin
  0 siblings, 0 replies; 34+ messages in thread
From: Timothy Shimmin @ 2006-11-29  9:56 UTC (permalink / raw)
  To: Russell Cattelan, Eric Sandeen; +Cc: xfs

Hi,

FYI
I've done more testing and still have more testing to do.
Found a bug in my previous patch.
Below is my latest one.

--Tim

--- .pc/attr2_tes/fs/xfs/xfs_attr.c	2006-10-26 17:45:01.000000000 +1000
+++ fs/xfs/xfs_attr.c	2006-11-28 14:45:09.191482676 +1100
@@ -199,18 +199,14 @@ xfs_attr_set_int(xfs_inode_t *dp, const
 		return (error);

 	/*
-	 * Determine space new attribute will use, and if it would be
-	 * "local" or "remote" (note: local != inline).
-	 */
-	size = xfs_attr_leaf_newentsize(namelen, valuelen,
-					mp->m_sb.sb_blocksize, &local);
-
-	/*
 	 * If the inode doesn't have an attribute fork, add one.
 	 * (inode must not be locked when we call this routine)
 	 */
 	if (XFS_IFORK_Q(dp) == 0) {
-		if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
+		int sf_size = sizeof(xfs_attr_sf_hdr_t) +
+			      XFS_ATTR_SF_ENTSIZE_BYNAME(namelen, valuelen);
+
+		if ((error = xfs_bmap_add_attrfork(dp, sf_size, rsvd)))
 			return(error);
 	}

@@ -231,6 +227,13 @@ xfs_attr_set_int(xfs_inode_t *dp, const
 	args.addname = 1;
 	args.oknoent = 1;

+	/*
+	 * Determine space new attribute will use, and if it would be
+	 * "local" or "remote" (note: local != inline).
+	 */
+	size = xfs_attr_leaf_newentsize(namelen, valuelen,
+					mp->m_sb.sb_blocksize, &local);
+
 	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
 	if (local) {
 		if (size > (mp->m_sb.sb_blocksize >> 1)) {
--- .pc/attr2_tes/fs/xfs/xfs_attr_leaf.c	2006-10-26 17:45:01.000000000 +1000
+++ fs/xfs/xfs_attr_leaf.c	2006-11-29 20:25:17.273599367 +1100
@@ -170,18 +170,33 @@ xfs_attr_shortform_bytesfit(xfs_inode_t
 	}

 	/* data fork btree root can have at least this many key/ptr pairs */
-	minforkoff = MAX(dp->i_df.if_bytes, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
+	minforkoff = MAX(xfs_ifork_dsize_used(dp), XFS_BMDR_SPACE_CALC(MINDBTPTRS));
 	minforkoff = roundup(minforkoff, 8) >> 3;

 	/* attr fork btree root can have at least this many key/ptr pairs */
 	maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
 	maxforkoff = maxforkoff >> 3;	/* rounded down */

-	if (offset >= minforkoff && offset < maxforkoff)
-		return offset;
+	/* we can't fit inline */
+	if (offset < minforkoff)
+		return 0;
+
+	/*
+	 * If have data btree then keep forkoff if we have one,
+	 * otherwise we are adding a new attr, so then we set forkoff to where
+	 * the btree root can finish so we have plenty of room for attrs
+	 */
+	if (dp->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+		if (dp->i_d.di_forkoff)
+			return dp->i_d.di_forkoff;
+		else
+			return minforkoff;
+	}
+
 	if (offset >= maxforkoff)
 		return maxforkoff;
-	return 0;
+	else
+		return offset;
 }

 /*
--- .pc/attr2_tes/fs/xfs/xfs_bmap.c	2006-11-17 14:35:46.000000000 +1100
+++ fs/xfs/xfs_bmap.c	2006-11-27 15:54:33.166590715 +1100
@@ -3543,6 +3543,7 @@ xfs_bmap_forkoff_reset(
 	if (whichfork == XFS_ATTR_FORK &&
 	    (ip->i_d.di_format != XFS_DINODE_FMT_DEV) &&
 	    (ip->i_d.di_format != XFS_DINODE_FMT_UUID) &&
+	    (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
 	    ((mp->m_attroffset >> 3) > ip->i_d.di_forkoff)) {
 		ip->i_d.di_forkoff = mp->m_attroffset >> 3;
 		ip->i_df.if_ext_max = XFS_IFORK_DSIZE(ip) /
--- .pc/attr2_tes/fs/xfs/xfs_inode.c	2006-11-27 23:20:19.000000000 +1100
+++ fs/xfs/xfs_inode.c	2006-11-29 20:29:16.994035217 +1100
@@ -4747,3 +4747,34 @@ xfs_iext_irec_update_extoffs(
 		ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
 	}
 }
+
+/*
+ * return how much space is used by the inode's data fork
+ */
+int
+xfs_ifork_dsize_used(xfs_inode_t *ip)
+{
+	switch (ip->i_d.di_format) {
+	case XFS_DINODE_FMT_DEV:
+		return sizeof(xfs_dev_t);
+	case XFS_DINODE_FMT_UUID:
+		return sizeof(uuid_t);
+	case XFS_DINODE_FMT_LOCAL:
+	case XFS_DINODE_FMT_EXTENTS:
+		return ip->i_df.if_bytes;
+	case XFS_DINODE_FMT_BTREE:
+		if (ip->i_d.di_forkoff)
+			return ip->i_d.di_forkoff << 3;
+		else
+			/*
+			 * For new attr fork, data btree takes all the space,
+			 * so no room for any attrs with the current layout
+			 * but we can know how much space it really needs
+			 * i.e. the ptrs are half way along but we could compress to
+			 * preserve the num of records.
+			 */
+			return XFS_BMDR_SPACE_CALC(XFS_BMAP_BROOT_NUMRECS(ip->i_df.if_broot));
+	default:
+		return 0;
+	}
+}
--- .pc/attr2_tes/fs/xfs/xfs_inode.h	2006-11-17 14:35:46.000000000 +1100
+++ fs/xfs/xfs_inode.h	2006-11-27 23:24:09.376554289 +1100
@@ -535,6 +535,7 @@ void		xfs_iext_irec_compact(xfs_ifork_t
 void		xfs_iext_irec_compact_pages(xfs_ifork_t *);
 void		xfs_iext_irec_compact_full(xfs_ifork_t *);
 void		xfs_iext_irec_update_extoffs(xfs_ifork_t *, int, int);
+int		xfs_ifork_dsize_used(xfs_inode_t *);

 #define xfs_ipincount(ip)	((unsigned int) atomic_read(&ip->i_pincount))

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2006-11-16 19:00 [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches Eric Sandeen
                   ` (3 preceding siblings ...)
  2006-11-17 23:49 ` Eric Sandeen
@ 2007-05-17 14:41 ` Eric Sandeen
  2007-05-21  7:42   ` David Chinner
  4 siblings, 1 reply; 34+ messages in thread
From: Eric Sandeen @ 2007-05-17 14:41 UTC (permalink / raw)
  To: xfs

Eric Sandeen wrote:
> see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201
> 
> Bugzilla Bug 212201: Cannot build sysem with XFS file system.
> 
> I turned on attr2 in FC6 at nathan's suggestion, for selinux goodness 
> with more efficient xattr space usage.
> 
> But, many reports that this was totally broken in fc6, on x86_64.

Although it turned out to be a different issue, not the packing issue, 
is the packing/alignment (below) still something that needs to be fixed...?

-Eric

> Install went ok, but on reboot the filesystem was found to be corrupt.
> 
> The filesystem was also found to be marked w/ attr1, not attr2....
> 
> If you do a fresh mkfs.xfs on x86_64, with -i attr=2, and dump out the
> superblock (or look at it with xfs_db) you will find that although the
> versionnum says that there is a morebits bit, the features2 flag is 0.
> 
> if you dd/hexdump the superblock, you will find the attr2 flag, but at
> the wrong offset.
> 
> This is because the xfs_sb_t struct is padded out to 64 bits on 64-bit
> arches, and the xfs_xlatesb() routine and xfs_sb_info[] array take this
> padding to mean that the last item is 4 bytes bigger than it is, and
> treats sb_features2 as 8 bytes not four.  This then gets endian-flipped out...
> 
> I can't quite figure out how this winds up causing problems if you stay
> on the x86_64 arch, as I'd expect that if the offset is wrong, it should
> at least be consistently wrong.  And in fact if you do mkfs,mount,xfs_info,
> it will tell you that you do have attr2. But somewhere along the line thing
> go wrong, and post-install, post-reboot, the filesystem thinks it is attr1,
> and is therefore corrupt.
> 
> I think that maybe some accesses are post-xfs_xlatesb, while others
> may access the un-flipped sb directly?  Or maybe this is sb logging
> code that has messed things up?  Not sure... needs more investigation.
> 
> In any case, dd does not lie, and this patch for the kernel, and a 
> corresponding one for userspace, at least make "mkfs.xfs -i attr=2" 
> puts the features2 flag in the right place, as shown by inspection via dd.
> 
> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
> 
> Index: linux-2.6.18/fs/xfs/xfs_sb.h
> ===================================================================
> --- linux-2.6.18.orig/fs/xfs/xfs_sb.h
> +++ linux-2.6.18/fs/xfs/xfs_sb.h
> @@ -149,7 +149,7 @@ typedef struct xfs_sb
>  	__uint16_t	sb_logsectsize;	/* sector size for the log, bytes */
>  	__uint32_t	sb_logsunit;	/* stripe unit size for the log */
>  	__uint32_t	sb_features2;	/* additional feature bits */
> -} xfs_sb_t;
> +} __attribute__ ((packed)) xfs_sb_t;
>  
>  /*
>   * Sequence number values for the fields.
> 
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
  2007-05-17 14:41 ` Eric Sandeen
@ 2007-05-21  7:42   ` David Chinner
  0 siblings, 0 replies; 34+ messages in thread
From: David Chinner @ 2007-05-21  7:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Thu, May 17, 2007 at 09:41:40AM -0500, Eric Sandeen wrote:
> Eric Sandeen wrote:
> >see also https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212201
> >
> >Bugzilla Bug 212201: Cannot build sysem with XFS file system.
> >
> >I turned on attr2 in FC6 at nathan's suggestion, for selinux goodness 
> >with more efficient xattr space usage.
> >
> >But, many reports that this was totally broken in fc6, on x86_64.
> 
> Although it turned out to be a different issue, not the packing issue, 
> is the packing/alignment (below) still something that needs to be fixed...?

The problem of the screwed up structure should be fixed. however,
packing the structure is not the correct solution IMO. I think we discussed
the correct fx back when this was first brought up - blacklist the bad
bits in the superblock, do on the fly detection and correction of the problem
and make sure in future that we always leave padding in the structure
so that it is always correctly translated....

Patches are welcome ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2007-05-21  7:43 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-16 19:00 [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches Eric Sandeen
2006-11-16 22:10 ` Eric Sandeen
2006-11-20  3:50   ` Eric Sandeen
2006-11-21  4:02     ` Eric Sandeen
2006-11-22  1:02       ` Russell Cattelan
2006-11-22  8:59         ` Timothy Shimmin
2006-11-22 15:44           ` Eric Sandeen
2006-11-22 16:24             ` Russell Cattelan
2006-11-22 16:38               ` Eric Sandeen
2006-11-23  7:09                 ` Timothy Shimmin
2006-11-23 17:37                   ` Russell Cattelan
2006-11-24  4:47                     ` Timothy Shimmin
2006-11-27 12:50                     ` Tim Shimmin
2006-11-29  9:56                       ` [PATCH] attr2 patch for data btrees & attr 2 was: " Timothy Shimmin
2006-11-23 22:49               ` [PATCH] " David Chinner
2006-11-16 22:45 ` David Chinner
2006-11-16 22:55   ` Eric Sandeen
2006-11-17 15:53   ` Russell Cattelan
2006-11-17  1:08 ` Timothy Shimmin
2006-11-17  2:39   ` David Chinner
2006-11-17  4:11     ` Timothy Shimmin
2006-11-17  5:55       ` David Chinner
2006-11-17  6:34         ` sandeen
2006-11-17  6:52           ` Nathan Scott
2006-11-17 15:20             ` sandeen
2006-11-19 23:11               ` Nathan Scott
2006-11-20  1:39                 ` Eric Sandeen
2006-11-20  3:00                   ` Nathan Scott
2006-11-20  3:32                     ` Eric Sandeen
2006-11-20  3:37                       ` Nathan Scott
2006-11-17  6:58           ` David Chinner
2006-11-17 23:49 ` Eric Sandeen
2007-05-17 14:41 ` Eric Sandeen
2007-05-21  7:42   ` David Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox