public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* ext2/4 large inode xattr mismash?
@ 2017-02-27 18:10 Darrick J. Wong
  2017-02-27 22:42 ` Andreas Dilger
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2017-02-27 18:10 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, ngkaho1234

Hi Ted,

Today ngkaho1234 pointed out on IRC that if you do the following:

$ mkfs.ext2 /dev/sda -F
$ mount /dev/sda /mnt -t ext4
$ touch /mnt/x
$ setfattr -n user.name1 -v moo /mnt/x
$ umount /mnt
$ mount /dev/sda /mnt -t ext2
$ setfattr -n user.name1 -v urk /mnt/x
$ umount /mnt

Then you get this broken looking result:
$ mount /dev/sda /mnt -e ext4
$ getfattr /mnt/x 
getfattr: Removing leading '/' from absolute path names
# file: mnt/x
user.name1
user.name1

Looking through debugfs, it seems that ext4 wrote user.name1=moo as an
ibody xattr, then ext2 wrote user.name1=urk into an external xattr block
and set i_file_acl.  ext4 sees the attr it wrote, ext2 sees the attr it
wrote, and neither can see the attr the other wrote.

$ debugfs -R 'features' /dev/sda
Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file

$ debugfs -R 'stat /x' /dev/sda
Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
Generation: 2341792653    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 0
File ACL: 617    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x58b45e2b:926e39a8 -- Mon Feb 27 09:13:15 2017
 atime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
 mtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
crtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
Size of extra inode fields: 32
Extended attributes:
  user.name1 (3) = "moo"
  user.name1 (3) = "urk"
BLOCKS:

This is wrong -- we don't have RO_COMPAT_EXTRA_ISIZE set, so ext4 should
/not/ be setting i_extra_isize=32.  Unfortunately, it does set
i_extra_isize, which enables ext4_xattr_ibody_set to write xattrs into
the inode body.  That's a problem, because ext2 doesn't know about
inline attrs or i_extra_isize.

It occurred to me to check the s_inode_size, which is 256 on this
supposedly ext2 filesystem.  I'd have thought _fill_super would check
this value, but apparently its only criteria are that s_inode_size is at
least 128, a power of 2, and no larger than a block.  But AFAIK ext2 has
no ability to use any inode space beyond the first 128 bytes, so what
good are large inodes?

Oh, and e2fsck apparently doesn't notice if there are inodes with
i_extra_isize set even if ro_compat_extra_isize is not set.

I could see a bunch of fixes to resolve this problem:

1) Teach ext4 not to set i_extra_isize unless the feature bit is set.
2) ext2_iget grows the ability to return -EFSCORRUPTED for inodes that
   have big inodes and i_extra_isize set, to encourage people to run
   e2fsck.
3) e2fsck will move attrs and zap i_extra_isize if i_extra_isize is set
   on a filesystem that doesn't support it.
4) mke2fs probably should stop allocating 256 byte inodes with the ext2
   and ext3 profiles, though it's not clear to me why the ext2 driver
   even allows this -- maybe there's some context here I don't know?

So ... /me isn't sure how to deal with this.  List? :)

--D

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ext2/4 large inode xattr mismash?
  2017-02-27 18:10 ext2/4 large inode xattr mismash? Darrick J. Wong
@ 2017-02-27 22:42 ` Andreas Dilger
  2017-02-28  3:15   ` Kaho Ng
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Andreas Dilger @ 2017-02-27 22:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Theodore Ts'o, linux-ext4, ngkaho1234

[-- Attachment #1: Type: text/plain, Size: 5324 bytes --]

On Feb 27, 2017, at 11:10 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> Hi Ted,
> 
> Today ngkaho1234 pointed out on IRC that if you do the following:
> 
> $ mkfs.ext2 /dev/sda -F
> $ mount /dev/sda /mnt -t ext4
> $ touch /mnt/x
> $ setfattr -n user.name1 -v moo /mnt/x
> $ umount /mnt
> $ mount /dev/sda /mnt -t ext2
> $ setfattr -n user.name1 -v urk /mnt/x
> $ umount /mnt
> 
> Then you get this broken looking result:
> $ mount /dev/sda /mnt -e ext4
> $ getfattr /mnt/x
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/x
> user.name1
> user.name1
> 
> Looking through debugfs, it seems that ext4 wrote user.name1=moo as an
> ibody xattr, then ext2 wrote user.name1=urk into an external xattr block
> and set i_file_acl.  ext4 sees the attr it wrote, ext2 sees the attr it
> wrote, and neither can see the attr the other wrote.
> 
> $ debugfs -R 'features' /dev/sda
> Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file
> 
> $ debugfs -R 'stat /x' /dev/sda
> Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
> Generation: 2341792653    Version: 0x00000000:00000001
> User:     0   Group:     0   Project:     0   Size: 0
> File ACL: 617    Directory ACL: 0
> Links: 1   Blockcount: 8
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x58b45e2b:926e39a8 -- Mon Feb 27 09:13:15 2017
> atime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> mtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> crtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> Size of extra inode fields: 32
> Extended attributes:
>  user.name1 (3) = "moo"
>  user.name1 (3) = "urk"
> BLOCKS:
> 
> This is wrong -- we don't have RO_COMPAT_EXTRA_ISIZE set, so ext4 should
> /not/ be setting i_extra_isize=32.  Unfortunately, it does set
> i_extra_isize, which enables ext4_xattr_ibody_set to write xattrs into
> the inode body.  That's a problem, because ext2 doesn't know about
> inline attrs or i_extra_isize.

I suspect that this isn't a big problem in real life, since most systems
these days are using ext4 to mount ext2 filesystems, instead of using the
separate ext2 module, and it could understand the extra data anyway.

The thing to check/fix in the ext4 code is to not set i_extra_isize if the
EXTRA_ISIZE feature isn't set.  Also, e2fsck should set the feature flag
if it finds valid extra inode data beyond 128 bytes, since this is already
true out in the wild, so we don't want to clobber existing (meta)data in
large inodes.

Unfortunately, our policy is to not enable features in-kernel automatically,
to avoid the problem of potentially making the filesystem unmountable,
otherwise we could do the same in ext4.  That said, e2fsck will prompt the
user to fix this if needed.

> It occurred to me to check the s_inode_size, which is 256 on this
> supposedly ext2 filesystem.  I'd have thought _fill_super would check
> this value, but apparently its only criteria are that s_inode_size is at
> least 128, a power of 2, and no larger than a block.  But AFAIK ext2 has
> no ability to use any inode space beyond the first 128 bytes, so what
> good are large inodes?

I don't think the 256-byte inode size should be considered a problem on
ext2 by itself.  The old code understands the larger inode size, it just
didn't do anything with that larger size until ext3 had fast xattrs and
the added timestamp fields.  The other thought is to just get rid of the
ext2 code completely, and the problem would be gone...  I don't think
there are (m)any cases where ext2 is faster than ext4, if nojournal
is used to level the playing field.

> Oh, and e2fsck apparently doesn't notice if there are inodes with
> i_extra_isize set even if ro_compat_extra_isize is not set.
> 
> I could see a bunch of fixes to resolve this problem:
> 
> 1) Teach ext4 not to set i_extra_isize unless the feature bit is set.
> 2) ext2_iget grows the ability to return -EFSCORRUPTED for inodes that
>   have big inodes and i_extra_isize set, to encourage people to run
>   e2fsck.

No, it shouldn't be accessing that space if the feature flag isn't set,
and should return an error at mount if it is (if not read-only).  That
is the reason the EXTRA_ISIZE was created as a read-only feature, so
that ext2 can't add xattrs or change the extended timestamps that conflict
with values in the large inode.

> 3) e2fsck will move attrs and zap i_extra_isize if i_extra_isize is set
>   on a filesystem that doesn't support it.

Wouldn't it be better to set the feature flag rather than deleting the
xattrs (which may not fit into the xattr block and/or may consume the
free space of the filesystem, and at a minimum will hurt performance)?

> 4) mke2fs probably should stop allocating 256 byte inodes with the ext2
>   and ext3 profiles, though it's not clear to me why the ext2 driver
>   even allows this -- maybe there's some context here I don't know?

Well, this in itself isn't harmful, and allows those filesystems to be
updated to ext4 easily in the future.  I think the root of the problem
is that ext4 is enabling a feature without checking the feature flag.

> So ... /me isn't sure how to deal with this.  List? :)
> 
> --D


Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ext2/4 large inode xattr mismash?
  2017-02-27 22:42 ` Andreas Dilger
@ 2017-02-28  3:15   ` Kaho Ng
  2017-02-28  4:05   ` Darrick J. Wong
  2017-02-28 17:49   ` Kaho Ng
  2 siblings, 0 replies; 6+ messages in thread
From: Kaho Ng @ 2017-02-28  3:15 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-ext4, Theodore Ts'o, Darrick J. Wong

On 02/28/2017 06:42 AM, Andreas Dilger wrote:
> On Feb 27, 2017, at 11:10 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>>
>> Hi Ted,
>>
>> Today ngkaho1234 pointed out on IRC that if you do the following:
>>
>> $ mkfs.ext2 /dev/sda -F
>> $ mount /dev/sda /mnt -t ext4
>> $ touch /mnt/x
>> $ setfattr -n user.name1 -v moo /mnt/x
>> $ umount /mnt
>> $ mount /dev/sda /mnt -t ext2
>> $ setfattr -n user.name1 -v urk /mnt/x
>> $ umount /mnt
>>
>> Then you get this broken looking result:
>> $ mount /dev/sda /mnt -e ext4
>> $ getfattr /mnt/x
>> getfattr: Removing leading '/' from absolute path names
>> # file: mnt/x
>> user.name1
>> user.name1
>>
>> Looking through debugfs, it seems that ext4 wrote user.name1=moo as an
>> ibody xattr, then ext2 wrote user.name1=urk into an external xattr block
>> and set i_file_acl.  ext4 sees the attr it wrote, ext2 sees the attr it
>> wrote, and neither can see the attr the other wrote.
>>
>> $ debugfs -R 'features' /dev/sda
>> Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file
>>
>> $ debugfs -R 'stat /x' /dev/sda
>> Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
>> Generation: 2341792653    Version: 0x00000000:00000001
>> User:     0   Group:     0   Project:     0   Size: 0
>> File ACL: 617    Directory ACL: 0
>> Links: 1   Blockcount: 8
>> Fragment:  Address: 0    Number: 0    Size: 0
>> ctime: 0x58b45e2b:926e39a8 -- Mon Feb 27 09:13:15 2017
>> atime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>> mtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>> crtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>> Size of extra inode fields: 32
>> Extended attributes:
>>  user.name1 (3) = "moo"
>>  user.name1 (3) = "urk"
>> BLOCKS:
>>
>> This is wrong -- we don't have RO_COMPAT_EXTRA_ISIZE set, so ext4 should
>> /not/ be setting i_extra_isize=32.  Unfortunately, it does set
>> i_extra_isize, which enables ext4_xattr_ibody_set to write xattrs into
>> the inode body.  That's a problem, because ext2 doesn't know about
>> inline attrs or i_extra_isize.
>
> I suspect that this isn't a big problem in real life, since most systems
> these days are using ext4 to mount ext2 filesystems, instead of using the
> separate ext2 module, and it could understand the extra data anyway.
>
> The thing to check/fix in the ext4 code is to not set i_extra_isize if the
> EXTRA_ISIZE feature isn't set.  Also, e2fsck should set the feature flag
> if it finds valid extra inode data beyond 128 bytes, since this is already
> true out in the wild, so we don't want to clobber existing (meta)data in
> large inodes.
>
> Unfortunately, our policy is to not enable features in-kernel automatically,
> to avoid the problem of potentially making the filesystem unmountable,
> otherwise we could do the same in ext4.  That said, e2fsck will prompt the
> user to fix this if needed.
>
>> It occurred to me to check the s_inode_size, which is 256 on this
>> supposedly ext2 filesystem.  I'd have thought _fill_super would check
>> this value, but apparently its only criteria are that s_inode_size is at
>> least 128, a power of 2, and no larger than a block.  But AFAIK ext2 has
>> no ability to use any inode space beyond the first 128 bytes, so what
>> good are large inodes?
>
> I don't think the 256-byte inode size should be considered a problem on
> ext2 by itself.  The old code understands the larger inode size, it just
> didn't do anything with that larger size until ext3 had fast xattrs and
> the added timestamp fields.  The other thought is to just get rid of the
> ext2 code completely, and the problem would be gone...  I don't think
> there are (m)any cases where ext2 is faster than ext4, if nojournal
> is used to level the playing field.
>
>> Oh, and e2fsck apparently doesn't notice if there are inodes with
>> i_extra_isize set even if ro_compat_extra_isize is not set.
>>
>> I could see a bunch of fixes to resolve this problem:
>>
>> 1) Teach ext4 not to set i_extra_isize unless the feature bit is set.
>> 2) ext2_iget grows the ability to return -EFSCORRUPTED for inodes that
>>   have big inodes and i_extra_isize set, to encourage people to run
>>   e2fsck.
>
> No, it shouldn't be accessing that space if the feature flag isn't set,
> and should return an error at mount if it is (if not read-only).  That
> is the reason the EXTRA_ISIZE was created as a read-only feature, so
> that ext2 can't add xattrs or change the extended timestamps that conflict
> with values in the large inode.
>
>> 3) e2fsck will move attrs and zap i_extra_isize if i_extra_isize is set
>>   on a filesystem that doesn't support it.
>
> Wouldn't it be better to set the feature flag rather than deleting the
> xattrs (which may not fit into the xattr block and/or may consume the
> free space of the filesystem, and at a minimum will hurt performance)?
>
>> 4) mke2fs probably should stop allocating 256 byte inodes with the ext2
>>   and ext3 profiles, though it's not clear to me why the ext2 driver
>>   even allows this -- maybe there's some context here I don't know?
>
> Well, this in itself isn't harmful, and allows those filesystems to be
> updated to ext4 easily in the future.  I think the root of the problem
> is that ext4 is enabling a feature without checking the feature flag.
>
>> So ... /me isn't sure how to deal with this.  List? :)
>>
>> --D
>
>
> Cheers, Andreas
>
>
>
>
>

My thought on the issue is that ext4 fs should be taught to deal with 
the case that RO_COMPAT_EXTRA_ISIZE is not set. If that flag is not set 
in superblock, ext4 fs should set EXT4_I(inode)->i_extra_isize to zero, 
thus not allowing the module itself from putting things starting at 
offset 128 in an inode. As such ext4 fs will not make incompatible 
changes on a filesystem created by mkfs.ext2.

Cheers,
Kaho Ng

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ext2/4 large inode xattr mismash?
  2017-02-27 22:42 ` Andreas Dilger
  2017-02-28  3:15   ` Kaho Ng
@ 2017-02-28  4:05   ` Darrick J. Wong
  2017-02-28 17:49   ` Kaho Ng
  2 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2017-02-28  4:05 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Theodore Ts'o, linux-ext4, ngkaho1234

On Mon, Feb 27, 2017 at 03:42:11PM -0700, Andreas Dilger wrote:
> On Feb 27, 2017, at 11:10 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > Hi Ted,
> > 
> > Today ngkaho1234 pointed out on IRC that if you do the following:
> > 
> > $ mkfs.ext2 /dev/sda -F
> > $ mount /dev/sda /mnt -t ext4
> > $ touch /mnt/x
> > $ setfattr -n user.name1 -v moo /mnt/x
> > $ umount /mnt
> > $ mount /dev/sda /mnt -t ext2
> > $ setfattr -n user.name1 -v urk /mnt/x
> > $ umount /mnt
> > 
> > Then you get this broken looking result:
> > $ mount /dev/sda /mnt -e ext4
> > $ getfattr /mnt/x
> > getfattr: Removing leading '/' from absolute path names
> > # file: mnt/x
> > user.name1
> > user.name1
> > 
> > Looking through debugfs, it seems that ext4 wrote user.name1=moo as an
> > ibody xattr, then ext2 wrote user.name1=urk into an external xattr block
> > and set i_file_acl.  ext4 sees the attr it wrote, ext2 sees the attr it
> > wrote, and neither can see the attr the other wrote.
> > 
> > $ debugfs -R 'features' /dev/sda
> > Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file
> > 
> > $ debugfs -R 'stat /x' /dev/sda
> > Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
> > Generation: 2341792653    Version: 0x00000000:00000001
> > User:     0   Group:     0   Project:     0   Size: 0
> > File ACL: 617    Directory ACL: 0
> > Links: 1   Blockcount: 8
> > Fragment:  Address: 0    Number: 0    Size: 0
> > ctime: 0x58b45e2b:926e39a8 -- Mon Feb 27 09:13:15 2017
> > atime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> > mtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> > crtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
> > Size of extra inode fields: 32
> > Extended attributes:
> >  user.name1 (3) = "moo"
> >  user.name1 (3) = "urk"
> > BLOCKS:
> > 
> > This is wrong -- we don't have RO_COMPAT_EXTRA_ISIZE set, so ext4 should
> > /not/ be setting i_extra_isize=32.  Unfortunately, it does set
> > i_extra_isize, which enables ext4_xattr_ibody_set to write xattrs into
> > the inode body.  That's a problem, because ext2 doesn't know about
> > inline attrs or i_extra_isize.
> 
> I suspect that this isn't a big problem in real life, since most systems
> these days are using ext4 to mount ext2 filesystems, instead of using the
> separate ext2 module, and it could understand the extra data anyway.
> 
> The thing to check/fix in the ext4 code is to not set i_extra_isize if the
> EXTRA_ISIZE feature isn't set.  Also, e2fsck should set the feature flag
> if it finds valid extra inode data beyond 128 bytes, since this is already
> true out in the wild, so we don't want to clobber existing (meta)data in
> large inodes.
> 
> Unfortunately, our policy is to not enable features in-kernel automatically,
> to avoid the problem of potentially making the filesystem unmountable,
> otherwise we could do the same in ext4.  That said, e2fsck will prompt the
> user to fix this if needed.
> 
> > It occurred to me to check the s_inode_size, which is 256 on this
> > supposedly ext2 filesystem.  I'd have thought _fill_super would check
> > this value, but apparently its only criteria are that s_inode_size is at
> > least 128, a power of 2, and no larger than a block.  But AFAIK ext2 has
> > no ability to use any inode space beyond the first 128 bytes, so what
> > good are large inodes?
> 
> I don't think the 256-byte inode size should be considered a problem on
> ext2 by itself.  The old code understands the larger inode size, it just
> didn't do anything with that larger size until ext3 had fast xattrs and
> the added timestamp fields.  The other thought is to just get rid of the
> ext2 code completely, and the problem would be gone...  I don't think
> there are (m)any cases where ext2 is faster than ext4, if nojournal
> is used to level the playing field.
> 
> > Oh, and e2fsck apparently doesn't notice if there are inodes with
> > i_extra_isize set even if ro_compat_extra_isize is not set.
> > 
> > I could see a bunch of fixes to resolve this problem:
> > 
> > 1) Teach ext4 not to set i_extra_isize unless the feature bit is set.
> > 2) ext2_iget grows the ability to return -EFSCORRUPTED for inodes that
> >   have big inodes and i_extra_isize set, to encourage people to run
> >   e2fsck.
> 
> No, it shouldn't be accessing that space if the feature flag isn't set,
> and should return an error at mount if it is (if not read-only).  That
> is the reason the EXTRA_ISIZE was created as a read-only feature, so
> that ext2 can't add xattrs or change the extended timestamps that conflict
> with values in the large inode.

Yes, in a strict sense ext2 shouldn't be touching /anything/ after the
first 128 bytes.  However, there's no other way for ext2 to detect this
situation that ext4 has created for it (where i_extra_isize is set but
RO_COMPAT_EXTRA_ISIZE is /not/ set), which means that ext2 will exhibit
inconsistent behavior.  I'm arguing that it's better to throw errors
back to userspace than it is to allow this weird condition to continue
(non-unique xattr names).

Complicating this is the fact that ext4 has been setting i_extra_isize
without checking the feature bit since the beginning of ext4 in 2006.
Granted we've only had ext2-on-ext4 support for a few years now, but
this means that there could be "ext2" filesystems out there with the
inode field set and the feature bit cleared.

> > 3) e2fsck will move attrs and zap i_extra_isize if i_extra_isize is set
> >   on a filesystem that doesn't support it.
> 
> Wouldn't it be better to set the feature flag rather than deleting the
> xattrs (which may not fit into the xattr block and/or may consume the
> free space of the filesystem, and at a minimum will hurt performance)?

That depends -- some people aren't going to appreciate ext2 refusing to
mount their ext2 fs because e2fsck flipped on a feature bit that ext2
doesn't know about, after the supposedly-compatible ext4 stomped on it.

It'd certainly be easier to implement... though we do at least have
libext2fs apis to read and write xattrs now.

Oh, I see the libext2fs xattr code also sets i_extra_isize without
checking the feature bit. <groan>

> > 4) mke2fs probably should stop allocating 256 byte inodes with the ext2
> >   and ext3 profiles, though it's not clear to me why the ext2 driver
> >   even allows this -- maybe there's some context here I don't know?
> 
> Well, this in itself isn't harmful, and allows those filesystems to be
> updated to ext4 easily in the future.  I think the root of the problem

Ok, I'm convinced that > 128b inodes on ext2 is a real thing. :)

> is that ext4 is enabling a feature without checking the feature flag.

Yes, indeed.  For certain, we must prohibit ext4 from setting
i_extra_isize if the feature flag isn't set.

--D

> > So ... /me isn't sure how to deal with this.  List? :)
> > 
> > --D
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ext2/4 large inode xattr mismash?
  2017-02-27 22:42 ` Andreas Dilger
  2017-02-28  3:15   ` Kaho Ng
  2017-02-28  4:05   ` Darrick J. Wong
@ 2017-02-28 17:49   ` Kaho Ng
  2017-03-01  0:53     ` Andreas Dilger
  2 siblings, 1 reply; 6+ messages in thread
From: Kaho Ng @ 2017-02-28 17:49 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-ext4, Darrick J. Wong, Theodore Ts'o

On 02/28/2017 06:42 AM, Andreas Dilger wrote:
> On Feb 27, 2017, at 11:10 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>>
>> Hi Ted,
>>
>> Today ngkaho1234 pointed out on IRC that if you do the following:
>>
>> $ mkfs.ext2 /dev/sda -F
>> $ mount /dev/sda /mnt -t ext4
>> $ touch /mnt/x
>> $ setfattr -n user.name1 -v moo /mnt/x
>> $ umount /mnt
>> $ mount /dev/sda /mnt -t ext2
>> $ setfattr -n user.name1 -v urk /mnt/x
>> $ umount /mnt
>>
>> Then you get this broken looking result:
>> $ mount /dev/sda /mnt -e ext4
>> $ getfattr /mnt/x
>> getfattr: Removing leading '/' from absolute path names
>> # file: mnt/x
>> user.name1
>> user.name1
>>
>> Looking through debugfs, it seems that ext4 wrote user.name1=moo as an
>> ibody xattr, then ext2 wrote user.name1=urk into an external xattr block
>> and set i_file_acl.  ext4 sees the attr it wrote, ext2 sees the attr it
>> wrote, and neither can see the attr the other wrote.
>>
>> $ debugfs -R 'features' /dev/sda
>> Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file
>>
>> $ debugfs -R 'stat /x' /dev/sda
>> Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
>> Generation: 2341792653    Version: 0x00000000:00000001
>> User:     0   Group:     0   Project:     0   Size: 0
>> File ACL: 617    Directory ACL: 0
>> Links: 1   Blockcount: 8
>> Fragment:  Address: 0    Number: 0    Size: 0
>> ctime: 0x58b45e2b:926e39a8 -- Mon Feb 27 09:13:15 2017
>> atime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>> mtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>> crtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>> Size of extra inode fields: 32
>> Extended attributes:
>>  user.name1 (3) = "moo"
>>  user.name1 (3) = "urk"
>> BLOCKS:
>>
>> This is wrong -- we don't have RO_COMPAT_EXTRA_ISIZE set, so ext4 should
>> /not/ be setting i_extra_isize=32.  Unfortunately, it does set
>> i_extra_isize, which enables ext4_xattr_ibody_set to write xattrs into
>> the inode body.  That's a problem, because ext2 doesn't know about
>> inline attrs or i_extra_isize.
>
> I suspect that this isn't a big problem in real life, since most systems
> these days are using ext4 to mount ext2 filesystems, instead of using the
> separate ext2 module, and it could understand the extra data anyway.
>
> The thing to check/fix in the ext4 code is to not set i_extra_isize if the
> EXTRA_ISIZE feature isn't set.  Also, e2fsck should set the feature flag
> if it finds valid extra inode data beyond 128 bytes, since this is already
> true out in the wild, so we don't want to clobber existing (meta)data in
> large inodes.
>
> Unfortunately, our policy is to not enable features in-kernel automatically,
> to avoid the problem of potentially making the filesystem unmountable,
> otherwise we could do the same in ext4.  That said, e2fsck will prompt the
> user to fix this if needed.
>
>> It occurred to me to check the s_inode_size, which is 256 on this
>> supposedly ext2 filesystem.  I'd have thought _fill_super would check
>> this value, but apparently its only criteria are that s_inode_size is at
>> least 128, a power of 2, and no larger than a block.  But AFAIK ext2 has
>> no ability to use any inode space beyond the first 128 bytes, so what
>> good are large inodes?
>
> I don't think the 256-byte inode size should be considered a problem on
> ext2 by itself.  The old code understands the larger inode size, it just
> didn't do anything with that larger size until ext3 had fast xattrs and
> the added timestamp fields.  The other thought is to just get rid of the
> ext2 code completely, and the problem would be gone...  I don't think
> there are (m)any cases where ext2 is faster than ext4, if nojournal
> is used to level the playing field.
>
>> Oh, and e2fsck apparently doesn't notice if there are inodes with
>> i_extra_isize set even if ro_compat_extra_isize is not set.
>>
>> I could see a bunch of fixes to resolve this problem:
>>
>> 1) Teach ext4 not to set i_extra_isize unless the feature bit is set.
>> 2) ext2_iget grows the ability to return -EFSCORRUPTED for inodes that
>>   have big inodes and i_extra_isize set, to encourage people to run
>>   e2fsck.
>
> No, it shouldn't be accessing that space if the feature flag isn't set,
> and should return an error at mount if it is (if not read-only).  That
> is the reason the EXTRA_ISIZE was created as a read-only feature, so
> that ext2 can't add xattrs or change the extended timestamps that conflict
> with values in the large inode.
>
>> 3) e2fsck will move attrs and zap i_extra_isize if i_extra_isize is set
>>   on a filesystem that doesn't support it.
>
> Wouldn't it be better to set the feature flag rather than deleting the
> xattrs (which may not fit into the xattr block and/or may consume the
> free space of the filesystem, and at a minimum will hurt performance)?
>
>> 4) mke2fs probably should stop allocating 256 byte inodes with the ext2
>>   and ext3 profiles, though it's not clear to me why the ext2 driver
>>   even allows this -- maybe there's some context here I don't know?
>
> Well, this in itself isn't harmful, and allows those filesystems to be
> updated to ext4 easily in the future.  I think the root of the problem
> is that ext4 is enabling a feature without checking the feature flag.
>
>> So ... /me isn't sure how to deal with this.  List? :)
>>
>> --D
>
>
> Cheers, Andreas
>
>
>
>
>

I think making use of ext2_feature_set_ok() may be another solution. 
Since ext3.ko doesn't make use of RO_COMPAT_EXTRA_ISIZE either, but 
still will try to access beyond the first 128 bytes of inode to support 
fast EA if possible, ext2_feature_set_ok() may sounds like a good 
candidate to help differentiate ext2 fs from ext3/ext4 fs.

Cheers,
Kaho

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ext2/4 large inode xattr mismash?
  2017-02-28 17:49   ` Kaho Ng
@ 2017-03-01  0:53     ` Andreas Dilger
  0 siblings, 0 replies; 6+ messages in thread
From: Andreas Dilger @ 2017-03-01  0:53 UTC (permalink / raw)
  To: Kaho Ng; +Cc: linux-ext4, Darrick J. Wong, Theodore Ts'o

[-- Attachment #1: Type: text/plain, Size: 7254 bytes --]

On Feb 28, 2017, at 10:49 AM, Kaho Ng <ngkaho1234@gmail.com> wrote:
> 
> On 02/28/2017 06:42 AM, Andreas Dilger wrote:
>> On Feb 27, 2017, at 11:10 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>>> 
>>> Hi Ted,
>>> 
>>> Today ngkaho1234 pointed out on IRC that if you do the following:
>>> 
>>> $ mkfs.ext2 /dev/sda -F
>>> $ mount /dev/sda /mnt -t ext4
>>> $ touch /mnt/x
>>> $ setfattr -n user.name1 -v moo /mnt/x
>>> $ umount /mnt
>>> $ mount /dev/sda /mnt -t ext2
>>> $ setfattr -n user.name1 -v urk /mnt/x
>>> $ umount /mnt
>>> 
>>> Then you get this broken looking result:
>>> $ mount /dev/sda /mnt -e ext4
>>> $ getfattr /mnt/x
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: mnt/x
>>> user.name1
>>> user.name1
>>> 
>>> Looking through debugfs, it seems that ext4 wrote user.name1=moo as an
>>> ibody xattr, then ext2 wrote user.name1=urk into an external xattr block
>>> and set i_file_acl.  ext4 sees the attr it wrote, ext2 sees the attr it
>>> wrote, and neither can see the attr the other wrote.
>>> 
>>> $ debugfs -R 'features' /dev/sda
>>> Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file
>>> 
>>> $ debugfs -R 'stat /x' /dev/sda
>>> Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
>>> Generation: 2341792653    Version: 0x00000000:00000001
>>> User:     0   Group:     0   Project:     0   Size: 0
>>> File ACL: 617    Directory ACL: 0
>>> Links: 1   Blockcount: 8
>>> Fragment:  Address: 0    Number: 0    Size: 0
>>> ctime: 0x58b45e2b:926e39a8 -- Mon Feb 27 09:13:15 2017
>>> atime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>>> mtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>>> crtime: 0x58b45e21:c4f666c0 -- Mon Feb 27 09:13:05 2017
>>> Size of extra inode fields: 32
>>> Extended attributes:
>>> user.name1 (3) = "moo"
>>> user.name1 (3) = "urk"
>>> BLOCKS:
>>> 
>>> This is wrong -- we don't have RO_COMPAT_EXTRA_ISIZE set, so ext4 should
>>> /not/ be setting i_extra_isize=32.  Unfortunately, it does set
>>> i_extra_isize, which enables ext4_xattr_ibody_set to write xattrs into
>>> the inode body.  That's a problem, because ext2 doesn't know about
>>> inline attrs or i_extra_isize.
>> 
>> I suspect that this isn't a big problem in real life, since most systems
>> these days are using ext4 to mount ext2 filesystems, instead of using the
>> separate ext2 module, and it could understand the extra data anyway.
>> 
>> The thing to check/fix in the ext4 code is to not set i_extra_isize if the
>> EXTRA_ISIZE feature isn't set.  Also, e2fsck should set the feature flag
>> if it finds valid extra inode data beyond 128 bytes, since this is already
>> true out in the wild, so we don't want to clobber existing (meta)data in
>> large inodes.
>> 
>> Unfortunately, our policy is to not enable features in-kernel automatically,
>> to avoid the problem of potentially making the filesystem unmountable,
>> otherwise we could do the same in ext4.  That said, e2fsck will prompt the
>> user to fix this if needed.
>> 
>>> It occurred to me to check the s_inode_size, which is 256 on this
>>> supposedly ext2 filesystem.  I'd have thought _fill_super would check
>>> this value, but apparently its only criteria are that s_inode_size is at
>>> least 128, a power of 2, and no larger than a block.  But AFAIK ext2 has
>>> no ability to use any inode space beyond the first 128 bytes, so what
>>> good are large inodes?
>> 
>> I don't think the 256-byte inode size should be considered a problem on
>> ext2 by itself.  The old code understands the larger inode size, it just
>> didn't do anything with that larger size until ext3 had fast xattrs and
>> the added timestamp fields.  The other thought is to just get rid of the
>> ext2 code completely, and the problem would be gone...  I don't think
>> there are (m)any cases where ext2 is faster than ext4, if nojournal
>> is used to level the playing field.
>> 
>>> Oh, and e2fsck apparently doesn't notice if there are inodes with
>>> i_extra_isize set even if ro_compat_extra_isize is not set.
>>> 
>>> I could see a bunch of fixes to resolve this problem:
>>> 
>>> 1) Teach ext4 not to set i_extra_isize unless the feature bit is set.
>>> 2) ext2_iget grows the ability to return -EFSCORRUPTED for inodes that
>>>  have big inodes and i_extra_isize set, to encourage people to run
>>>  e2fsck.
>> 
>> No, it shouldn't be accessing that space if the feature flag isn't set,
>> and should return an error at mount if it is (if not read-only).  That
>> is the reason the EXTRA_ISIZE was created as a read-only feature, so
>> that ext2 can't add xattrs or change the extended timestamps that conflict
>> with values in the large inode.
>> 
>>> 3) e2fsck will move attrs and zap i_extra_isize if i_extra_isize is set
>>>  on a filesystem that doesn't support it.
>> 
>> Wouldn't it be better to set the feature flag rather than deleting the
>> xattrs (which may not fit into the xattr block and/or may consume the
>> free space of the filesystem, and at a minimum will hurt performance)?
>> 
>>> 4) mke2fs probably should stop allocating 256 byte inodes with the ext2
>>>  and ext3 profiles, though it's not clear to me why the ext2 driver
>>>  even allows this -- maybe there's some context here I don't know?
>> 
>> Well, this in itself isn't harmful, and allows those filesystems to be
>> updated to ext4 easily in the future.  I think the root of the problem
>> is that ext4 is enabling a feature without checking the feature flag.
>> 
>>> So ... /me isn't sure how to deal with this.  List? :)
>>> 
>>> --D
>> 
>> 
>> Cheers, Andreas
>> 
>> 
>> 
>> 
>> 
> 
> I think making use of ext2_feature_set_ok() may be another solution. Since ext3.ko doesn't make use of RO_COMPAT_EXTRA_ISIZE either, but still will try to access beyond the first 128 bytes of inode to support fast EA if possible, ext2_feature_set_ok() may sounds like a good candidate to help differentiate ext2 fs from ext3/ext4 fs.

I guess the first question is how this problem became visible in the first
place?  Was the filesystem formatted with mkfs.ext2 and then mounted with
ext4, and then mounted on an embedded system with the ext2 driver (presumably
to reduce kernel memory usage, at the expense of performance from using ext4)?

Alternately, it could have been an upgrade/downgrade problem?  What kernels
and distros are involved?  All recent RHEL/SLES kernels are using the ext4
driver to load ext2 filesystems, so in that case the fix would be relatively
straight forward.  If the "real" ext2 driver is in use this becomes tricky.

If the ext4 driver is in use for ext2, all that is needed is to set the
EXTRA_ISIZE flag (a fixed e2fsck or manually) and the next mount will just use
ext4 since this is not in the EXT[23]_FEATURE_RO_COMPAT_SUPP mask.  I agree
if the inode is large and has fast xattrs (i_extra_isize=4) then it is an ext3
filesystem instead of ext2, regardless of whether the EXTRA_ISIZE flag is set.

Separately, e2fsck needs to be fixed to check for duplicate xattr names in
the in-inode xattr space and the external block.

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-03-01  0:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-27 18:10 ext2/4 large inode xattr mismash? Darrick J. Wong
2017-02-27 22:42 ` Andreas Dilger
2017-02-28  3:15   ` Kaho Ng
2017-02-28  4:05   ` Darrick J. Wong
2017-02-28 17:49   ` Kaho Ng
2017-03-01  0:53     ` Andreas Dilger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox