Re: [PATCH v1] ext4: Remove deprecated noacl/nouser

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options
       [not found] <166431556706.3511882.843791619431401636.b4-ty@mit.edu>
@ 2022-12-16  3:41 ` Jun Nie
  2022-12-16  5:47   ` Theodore Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Jun Nie @ 2022-12-16  3:41 UTC (permalink / raw)
  To: tytso, stable
  Cc: djwong, jack, jlayton, lczerner, linux-ext4, xuyang2018.jy,
	Jun Nie

This patch[1] is needed on linux-5.15.y because the panic[2] is also found on
linux-5.15.y when debugging bug[3]. Back ported patch[4] is confirmed to fix
the bug on linux-5.15.y in the latest test of page[3]. Maybe back port on more
branches is needed per patch comments.

[1]
2d544ec923dbe5 ("ext4: remove deprecated noacl/nouser_xattr options")

[2]
https://syzkaller.appspot.com/text?tag=CrashLog&x=171b1077880000

[3]
https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08

[4]
https://syzkaller.appspot.com/text?tag=Patch&x=1766eb13880000

Regards,
Jun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options
  2022-12-16  3:41 ` [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options Jun Nie
@ 2022-12-16  5:47   ` Theodore Ts'o
  2022-12-16 13:10     ` Theodore Ts'o
  2022-12-19  9:23     ` Jun Nie
  0 siblings, 2 replies; 7+ messages in thread
From: Theodore Ts'o @ 2022-12-16  5:47 UTC (permalink / raw)
  To: Jun Nie; +Cc: stable, djwong, jack, jlayton, lczerner, linux-ext4,
	xuyang2018.jy

On Fri, Dec 16, 2022 at 11:41:16AM +0800, Jun Nie wrote:
> This patch[1] is needed on linux-5.15.y because the panic[2] is also found on
> linux-5.15.y when debugging bug[3]. Back ported patch[4] is confirmed to fix
> the bug on linux-5.15.y in the latest test of page[3]. Maybe back port on more
> branches is needed per patch comments.

This is not a proper fix for the syzkaller report being reported here:

https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08

It's true that the reproducer will no longer trigger, but that's just
because the reproducer is just exiting early because it is passing in
a mount option which is no longer being accepted.  In fact, that mount
option is completely unneeded and it's a failing of syzkaller that it
doesn't adequately minimize the reproducer by trying to remove various
random mount options that are not actually needed.  For example,
running the reproducer will trigger warnings like this:

	EXT4-fs: Ignoring removed nobh option

If we modify the kernel to simply ignore nouser_xattr, then the
reproducer will still trigger.  So this is not the right patch to
backport.

It's important that people who are trying to fix syzkaller bugs
understand what is fundamentally going on, instead of using blunt
force patches that simple paper over the issue.  Please remember that
syzkaller is supposed to help us improve the kernel, and it's not just
about trying to reduce the count of open syzkaller reports for its own
sake.  (This is really much more of a quality of implementation issue,
since this is not something that would really ever trigger in real
life, nor is it really a security issue --- despite some people
thinking that all syzkaller reports are actually security issues, and
we must run around like chickens with their heads cut off and until
they are all fixed.)

The real root cause of the problem is that the file system is getting
mounted with these mount options:

nouser_xattr,acl,debug_want_extra_isize=0x0000000000000080,lazytime,nobh,quota

Of which nouser_attr, acl, nobh, and quota are completely pointless.
It's also **super** unfortunate that the reproducer isn't written in
C, but this horrible psuedo-assimply language:

  memcpy(
      (void*)0x20000000,
      "\x6e\x6f\x75\x73\x65\x72\x5f\x78\x61\x74\x74\x72\x2c\x61\x63\x6c\x2c\x64"
      "\x65\x62\x75\x67\x5f\x77\x61\x6e\x74\x5f\x65\x78\x74\x72\x61\x5f\x69\x73"
      "\x69\x7a\x65\x3d\x30\x78\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30"
      "\x30\x30\x38\x30\x2c\x6c\x61\x7a\x79\x74\x69\x6d\x65\x2c\x6e\x6f\x62\x68"
      "\x2c\x71\x75\x6f\x74\x61\x2c\x00\x3d\x93\x09\x61\x36\x5d\x73\x58\x9c",
      89);

      ...
  syz_mount_image(0x20000440, 0x20000480, 0x1e, 0x20000000, 2, 0x427,
  			      		  	^^^^^^^^^^
                  0x200004c0);

(And again, this is stuff that I've complained to the syzkaller team
for years and years and years as being fundamentally developer hostile
and disrespects the time of upstream maintainers.  ARGH!!!!)

Anyway.....  So now let's look at the stack trace:

 ext4_xattr_block_set+0x8f8/0x3820 fs/ext4/xattr.c:1971
 ext4_xattr_move_to_block fs/ext4/xattr.c:2603 [inline]
 ext4_xattr_make_inode_space fs/ext4/xattr.c:2672 [inline]
 ext4_expand_extra_isize_ea+0x1591/0x1f30 fs/ext4/xattr.c:2764
 __ext4_expand_extra_isize+0x29e/0x3d0 fs/ext4/inode.c:5826
 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5869 [inline]
 __ext4_mark_inode_dirty+0x4bf/0x7a0 fs/ext4/inode.c:5947
 ext4_dirty_inode+0xbc/0x100 fs/ext4/inode.c:5979
 __mark_inode_dirty+0x1f9/0x9d0 fs/fs-writeback.c:2431
 mark_inode_dirty_sync include/linux/fs.h:2429 [inline]
 iput+0x155/0x7d0 fs/inode.c:1686
 dentry_unlink_inode+0x349/0x430 fs/dcache.c:376
 __dentry_kill+0x3e2/0x5d0 fs/dcache.c:582
 shrink_dentry_list+0x379/0x4d0 fs/dcache.c:1176
 shrink_dcache_parent+0xcd/0x350
 do_one_tree fs/dcache.c:1657 [inline]
 shrink_dcache_for_umount+0x7c/0x1a0 fs/dcache.c:1674
 generic_shutdown_super+0x69/0x2d0 fs/super.c:447
 kill_block_super+0x80/0xe0 fs/super.c:1395

Because lazytime is enabled, after running the reproducer under
strace, what happens is that inode #12 gets touched so its access time
is modified, but because lazytime is enabled, we don't actually update
the on-disk until we actually unmount the superblock.  That's why
generic_shutdown_super() is in the stack trace.

At that point, when we shrink the dentry cache, when we eject the
inode from memory, iput() needs to update the on-disk inode with the
updated atime.  So far, so good.  But then we call ext4_dirty_inode(),
and then that interacts with the "debug_want_extra_isize-=128" mount
option.  So at this point, we try to expand inode's extra isize space,
and in order to do that we have to move some extended attributes.

Unfortunately, how ext4 currently does this is a bit stupid, and it
reads the contents of the ea_inode into memory, deletes the ea_inode
and then creates a new ea_inode.  That works, but it's horribly
inefficient, and **that's*** what we should actually fix.

Unfortunately, because we try to create a new ea_inode, when
ext4_xattr_block_set() calls the static function (which gets inlined)
ext4_xattr_inode_create(), and at that point, the call to
ext4_new_inode trips over the fact that the file system is being
unmoutned, and sb->s_root has already been set to NULL.

So this is what actually goes *boom*:

	ea_inode = ext4_new_inode(handle, inode->i_sb->s_root->d_inode,
					  ^^^^^^^^^^^^^^^^^^^ NULL ptr, oops!
				  S_IFREG | 0600, NULL, inode->i_ino + 1, owner,
				  EXT4_EA_INODE_FL);

We can prove this is the issue by using the following debugging patch,
which prevents the reproducer from triggering after prining the "fs
being unmouinted" message:

--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2697,6 +2697,13 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
 	int s_min_extra_isize = le16_to_cpu(sbi->s_es->s_min_extra_isize);
 	int isize_diff;	/* How much do we need to grow i_extra_isize */

+	pr_err("ext4_expand_extra_isize_ea ino %lu new_extra_isize %d curr %d\n",
+	       inode->i_ino, new_extra_isize, EXT4_I(inode)->i_extra_isize);
+	if (inode->i_sb->s_root == NULL) {
+		pr_err("ext4_expand_extra_isize_ea: fs being unmounted\n");
+		return -EINVAL;
+	}
+
 retry:
 	isize_diff = new_extra_isize - EXT4_I(inode)->i_extra_isize;
 	if (EXT4_I(inode)->i_extra_isize >= new_extra_isize)

Fixing this the clean and proper way, which is by making
ext4_xattr_move_to_block() more intelligent/efficient, is left as an
exercise to the reader.

Cheers,

						- Ted

P.S.  Note that this fix is actually needed for the current upstream
kernel; the reproducer will trigger in 6.1, although we need to either
modify the reproducer to drop the completely pointless nouser_xattr
mount option (which is a bit painful since the !@?! mount options is
obfuscated by virtue of being in hex for no particular good reason) or
by hacking the kernel to ignore that mount options, via a patch like
this:

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1658,6 +1658,7 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag	("oldalloc",		Opt_removed),
 	fsparam_flag	("orlov",		Opt_removed),
 	fsparam_flag	("user_xattr",		Opt_user_xattr),
+	fsparam_flag	("nouser_xattr",	Opt_removed),
 	fsparam_flag	("acl",			Opt_acl),
 	fsparam_flag	("norecovery",		Opt_noload),
 	fsparam_flag	("noload",		Opt_noload),

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options
  2022-12-16  5:47   ` Theodore Ts'o
@ 2022-12-16 13:10     ` Theodore Ts'o
  2022-12-19  9:23     ` Jun Nie
  1 sibling, 0 replies; 7+ messages in thread
From: Theodore Ts'o @ 2022-12-16 13:10 UTC (permalink / raw)
  To: Jun Nie; +Cc: stable, djwong, jack, jlayton, lczerner, linux-ext4,
	xuyang2018.jy

Here is a proper, minmized reproducer which reproduces on upstream, for someone
who wants to try to work this bug.

On Fri, Dec 16, 2022 at 12:47:16AM -0500, Theodore Ts'o wrote:
> Fixing this the clean and proper way, which is by making
> ext4_xattr_move_to_block() more intelligent/efficient, is left as an
> exercise to the reader.

For someone who wants to work the bug, here is a cleaner, properly
minimzed, easier-for-humans-to-understand reproducer:

#!/bin/bash -vx
#
# This reproduces an ext4 bug caused by an unfortunate interaction
# between lazytime updates happening when a file system is being
# unmounted and expand_extra_isize
#
# Initially discovered via syzkaller:
# https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08
#

img=/tmp/foo.img
dir=/mnt
file=$dir/file0

rm -f $img
mke2fs -Fq -t ext4 -I 256 -O ea_inode -b 1024 $img 200k
mount $img $dir
v=$(dd if=/dev/zero bs=2000 count=1 2>/dev/null | tr '\0' =)
touch $file
attr -q -s test -V $v $file
umount $dir
mount -o debug_want_extra_isize=128,lazytime /tmp/foo.img $dir
cat $file
umount $dir

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options
  2022-12-16  5:47   ` Theodore Ts'o
  2022-12-16 13:10     ` Theodore Ts'o
@ 2022-12-19  9:23     ` Jun Nie
  2022-12-19 11:46       ` Jan Kara
  2022-12-19 16:41       ` Theodore Ts'o
  1 sibling, 2 replies; 7+ messages in thread
From: Jun Nie @ 2022-12-19  9:23 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: stable, djwong, jack, jlayton, lczerner, linux-ext4,
	xuyang2018.jy

Theodore Ts'o <tytso@mit.edu> 于2022年12月16日周五 13:47写道：
>
> On Fri, Dec 16, 2022 at 11:41:16AM +0800, Jun Nie wrote:
> > This patch[1] is needed on linux-5.15.y because the panic[2] is also found on
> > linux-5.15.y when debugging bug[3]. Back ported patch[4] is confirmed to fix
> > the bug on linux-5.15.y in the latest test of page[3]. Maybe back port on more
> > branches is needed per patch comments.
>
> This is not a proper fix for the syzkaller report being reported here:
>
> https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08
>
> It's true that the reproducer will no longer trigger, but that's just
> because the reproducer is just exiting early because it is passing in
> a mount option which is no longer being accepted.  In fact, that mount
> option is completely unneeded and it's a failing of syzkaller that it
> doesn't adequately minimize the reproducer by trying to remove various
> random mount options that are not actually needed.  For example,
> running the reproducer will trigger warnings like this:
>
>         EXT4-fs: Ignoring removed nobh option
>
> If we modify the kernel to simply ignore nouser_xattr, then the
> reproducer will still trigger.  So this is not the right patch to
> backport.
>
> It's important that people who are trying to fix syzkaller bugs
> understand what is fundamentally going on, instead of using blunt
> force patches that simple paper over the issue.  Please remember that
> syzkaller is supposed to help us improve the kernel, and it's not just
> about trying to reduce the count of open syzkaller reports for its own
> sake.  (This is really much more of a quality of implementation issue,
> since this is not something that would really ever trigger in real
> life, nor is it really a security issue --- despite some people
> thinking that all syzkaller reports are actually security issues, and
> we must run around like chickens with their heads cut off and until
> they are all fixed.)
>
>
> The real root cause of the problem is that the file system is getting
> mounted with these mount options:
>
> nouser_xattr,acl,debug_want_extra_isize=0x0000000000000080,lazytime,nobh,quota
>
> Of which nouser_attr, acl, nobh, and quota are completely pointless.
> It's also **super** unfortunate that the reproducer isn't written in
> C, but this horrible psuedo-assimply language:
>
>   memcpy(
>       (void*)0x20000000,
>       "\x6e\x6f\x75\x73\x65\x72\x5f\x78\x61\x74\x74\x72\x2c\x61\x63\x6c\x2c\x64"
>       "\x65\x62\x75\x67\x5f\x77\x61\x6e\x74\x5f\x65\x78\x74\x72\x61\x5f\x69\x73"
>       "\x69\x7a\x65\x3d\x30\x78\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30"
>       "\x30\x30\x38\x30\x2c\x6c\x61\x7a\x79\x74\x69\x6d\x65\x2c\x6e\x6f\x62\x68"
>       "\x2c\x71\x75\x6f\x74\x61\x2c\x00\x3d\x93\x09\x61\x36\x5d\x73\x58\x9c",
>       89);
>
>       ...
>   syz_mount_image(0x20000440, 0x20000480, 0x1e, 0x20000000, 2, 0x427,
>                                                 ^^^^^^^^^^
>                   0x200004c0);
>
> (And again, this is stuff that I've complained to the syzkaller team
> for years and years and years as being fundamentally developer hostile
> and disrespects the time of upstream maintainers.  ARGH!!!!)
>
> Anyway.....  So now let's look at the stack trace:
>
>  ext4_xattr_block_set+0x8f8/0x3820 fs/ext4/xattr.c:1971
>  ext4_xattr_move_to_block fs/ext4/xattr.c:2603 [inline]
>  ext4_xattr_make_inode_space fs/ext4/xattr.c:2672 [inline]
>  ext4_expand_extra_isize_ea+0x1591/0x1f30 fs/ext4/xattr.c:2764
>  __ext4_expand_extra_isize+0x29e/0x3d0 fs/ext4/inode.c:5826
>  ext4_try_to_expand_extra_isize fs/ext4/inode.c:5869 [inline]
>  __ext4_mark_inode_dirty+0x4bf/0x7a0 fs/ext4/inode.c:5947
>  ext4_dirty_inode+0xbc/0x100 fs/ext4/inode.c:5979
>  __mark_inode_dirty+0x1f9/0x9d0 fs/fs-writeback.c:2431
>  mark_inode_dirty_sync include/linux/fs.h:2429 [inline]
>  iput+0x155/0x7d0 fs/inode.c:1686
>  dentry_unlink_inode+0x349/0x430 fs/dcache.c:376
>  __dentry_kill+0x3e2/0x5d0 fs/dcache.c:582
>  shrink_dentry_list+0x379/0x4d0 fs/dcache.c:1176
>  shrink_dcache_parent+0xcd/0x350
>  do_one_tree fs/dcache.c:1657 [inline]
>  shrink_dcache_for_umount+0x7c/0x1a0 fs/dcache.c:1674
>  generic_shutdown_super+0x69/0x2d0 fs/super.c:447
>  kill_block_super+0x80/0xe0 fs/super.c:1395
>
> Because lazytime is enabled, after running the reproducer under
> strace, what happens is that inode #12 gets touched so its access time
> is modified, but because lazytime is enabled, we don't actually update
> the on-disk until we actually unmount the superblock.  That's why
> generic_shutdown_super() is in the stack trace.
>
> At that point, when we shrink the dentry cache, when we eject the
> inode from memory, iput() needs to update the on-disk inode with the
> updated atime.  So far, so good.  But then we call ext4_dirty_inode(),
> and then that interacts with the "debug_want_extra_isize-=128" mount
> option.  So at this point, we try to expand inode's extra isize space,
> and in order to do that we have to move some extended attributes.
>
> Unfortunately, how ext4 currently does this is a bit stupid, and it
> reads the contents of the ea_inode into memory, deletes the ea_inode
> and then creates a new ea_inode.  That works, but it's horribly
> inefficient, and **that's*** what we should actually fix.

Hi Theodore,

Thanks for the detailed explanation here! I am new in syzkaller task force
 and not familiar with ext4 subsystem. So my reply here may be stupid
as a new comer in ext4 side.

Do you mean we have a chance to expand ea_inode in place for some
cases? If so, a new ea_inode with larger space should be created
to hold expanded ea_inode data, thus data have to be copied and written
out through memory in my mind. Or anything other than CPU/memory can
utilized for this to avoid memory usage, such as DMA?
>
> Unfortunately, because we try to create a new ea_inode, when
> ext4_xattr_block_set() calls the static function (which gets inlined)
> ext4_xattr_inode_create(), and at that point, the call to
> ext4_new_inode trips over the fact that the file system is being
> unmoutned, and sb->s_root has already been set to NULL.
>
per general understanding of a subsystem uninitialization, a flag shall be
marked to reject further operation on the sub-system and flush the pending
operation, then free the resource. In such a general method, current
handling to create a new ea_inode should not crash even it is stupid.
sb->s_root seems to be a key global resource in ext4 subsystem per my
understanding, and should not be set as NULL until the last step of unmount
 operation.

Please help teach me what is wrong in my understanding.


> So this is what actually goes *boom*:
>
>         ea_inode = ext4_new_inode(handle, inode->i_sb->s_root->d_inode,
>                                           ^^^^^^^^^^^^^^^^^^^ NULL ptr, oops!
>                                   S_IFREG | 0600, NULL, inode->i_ino + 1, owner,
>                                   EXT4_EA_INODE_FL);
>
>
> We can prove this is the issue by using the following debugging patch,
> which prevents the reproducer from triggering after prining the "fs
> being unmouinted" message:
>
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -2697,6 +2697,13 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
>         int s_min_extra_isize = le16_to_cpu(sbi->s_es->s_min_extra_isize);
>         int isize_diff; /* How much do we need to grow i_extra_isize */
>
> +       pr_err("ext4_expand_extra_isize_ea ino %lu new_extra_isize %d curr %d\n",
> +              inode->i_ino, new_extra_isize, EXT4_I(inode)->i_extra_isize);
> +       if (inode->i_sb->s_root == NULL) {
> +               pr_err("ext4_expand_extra_isize_ea: fs being unmounted\n");
> +               return -EINVAL;
> +       }
> +
>  retry:
>         isize_diff = new_extra_isize - EXT4_I(inode)->i_extra_isize;
>         if (EXT4_I(inode)->i_extra_isize >= new_extra_isize)
>
> Fixing this the clean and proper way, which is by making
> ext4_xattr_move_to_block() more intelligent/efficient, is left as an
> exercise to the reader.
>
> Cheers,
>
>                                                 - Ted
>
> P.S.  Note that this fix is actually needed for the current upstream
> kernel; the reproducer will trigger in 6.1, although we need to either
> modify the reproducer to drop the completely pointless nouser_xattr
> mount option (which is a bit painful since the !@?! mount options is
> obfuscated by virtue of being in hex for no particular good reason) or
> by hacking the kernel to ignore that mount options, via a patch like
> this:
>
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1658,6 +1658,7 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
>         fsparam_flag    ("oldalloc",            Opt_removed),
>         fsparam_flag    ("orlov",               Opt_removed),
>         fsparam_flag    ("user_xattr",          Opt_user_xattr),
> +       fsparam_flag    ("nouser_xattr",        Opt_removed),
>         fsparam_flag    ("acl",                 Opt_acl),
>         fsparam_flag    ("norecovery",          Opt_noload),
>         fsparam_flag    ("noload",              Opt_noload),
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options
  2022-12-19  9:23     ` Jun Nie
@ 2022-12-19 11:46       ` Jan Kara
  2022-12-19 16:41       ` Theodore Ts'o
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Kara @ 2022-12-19 11:46 UTC (permalink / raw)
  To: Jun Nie
  Cc: Theodore Ts'o, stable, djwong, jack, jlayton, lczerner,
	linux-ext4, xuyang2018.jy

On Mon 19-12-22 17:23:18, Jun Nie wrote:
> Theodore Ts'o <tytso@mit.edu> 于2022年12月16日周五 13:47写道：
> >
> > On Fri, Dec 16, 2022 at 11:41:16AM +0800, Jun Nie wrote:
> > > This patch[1] is needed on linux-5.15.y because the panic[2] is also found on
> > > linux-5.15.y when debugging bug[3]. Back ported patch[4] is confirmed to fix
> > > the bug on linux-5.15.y in the latest test of page[3]. Maybe back port on more
> > > branches is needed per patch comments.
> >
> > This is not a proper fix for the syzkaller report being reported here:
> >
> > https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08
> >
> > It's true that the reproducer will no longer trigger, but that's just
> > because the reproducer is just exiting early because it is passing in
> > a mount option which is no longer being accepted.  In fact, that mount
> > option is completely unneeded and it's a failing of syzkaller that it
> > doesn't adequately minimize the reproducer by trying to remove various
> > random mount options that are not actually needed.  For example,
> > running the reproducer will trigger warnings like this:
> >
> >         EXT4-fs: Ignoring removed nobh option
> >
> > If we modify the kernel to simply ignore nouser_xattr, then the
> > reproducer will still trigger.  So this is not the right patch to
> > backport.
> >
> > It's important that people who are trying to fix syzkaller bugs
> > understand what is fundamentally going on, instead of using blunt
> > force patches that simple paper over the issue.  Please remember that
> > syzkaller is supposed to help us improve the kernel, and it's not just
> > about trying to reduce the count of open syzkaller reports for its own
> > sake.  (This is really much more of a quality of implementation issue,
> > since this is not something that would really ever trigger in real
> > life, nor is it really a security issue --- despite some people
> > thinking that all syzkaller reports are actually security issues, and
> > we must run around like chickens with their heads cut off and until
> > they are all fixed.)
> >
> >
> > The real root cause of the problem is that the file system is getting
> > mounted with these mount options:
> >
> > nouser_xattr,acl,debug_want_extra_isize=0x0000000000000080,lazytime,nobh,quota
> >
> > Of which nouser_attr, acl, nobh, and quota are completely pointless.
> > It's also **super** unfortunate that the reproducer isn't written in
> > C, but this horrible psuedo-assimply language:
> >
> >   memcpy(
> >       (void*)0x20000000,
> >       "\x6e\x6f\x75\x73\x65\x72\x5f\x78\x61\x74\x74\x72\x2c\x61\x63\x6c\x2c\x64"
> >       "\x65\x62\x75\x67\x5f\x77\x61\x6e\x74\x5f\x65\x78\x74\x72\x61\x5f\x69\x73"
> >       "\x69\x7a\x65\x3d\x30\x78\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30\x30"
> >       "\x30\x30\x38\x30\x2c\x6c\x61\x7a\x79\x74\x69\x6d\x65\x2c\x6e\x6f\x62\x68"
> >       "\x2c\x71\x75\x6f\x74\x61\x2c\x00\x3d\x93\x09\x61\x36\x5d\x73\x58\x9c",
> >       89);
> >
> >       ...
> >   syz_mount_image(0x20000440, 0x20000480, 0x1e, 0x20000000, 2, 0x427,
> >                                                 ^^^^^^^^^^
> >                   0x200004c0);
> >
> > (And again, this is stuff that I've complained to the syzkaller team
> > for years and years and years as being fundamentally developer hostile
> > and disrespects the time of upstream maintainers.  ARGH!!!!)
> >
> > Anyway.....  So now let's look at the stack trace:
> >
> >  ext4_xattr_block_set+0x8f8/0x3820 fs/ext4/xattr.c:1971
> >  ext4_xattr_move_to_block fs/ext4/xattr.c:2603 [inline]
> >  ext4_xattr_make_inode_space fs/ext4/xattr.c:2672 [inline]
> >  ext4_expand_extra_isize_ea+0x1591/0x1f30 fs/ext4/xattr.c:2764
> >  __ext4_expand_extra_isize+0x29e/0x3d0 fs/ext4/inode.c:5826
> >  ext4_try_to_expand_extra_isize fs/ext4/inode.c:5869 [inline]
> >  __ext4_mark_inode_dirty+0x4bf/0x7a0 fs/ext4/inode.c:5947
> >  ext4_dirty_inode+0xbc/0x100 fs/ext4/inode.c:5979
> >  __mark_inode_dirty+0x1f9/0x9d0 fs/fs-writeback.c:2431
> >  mark_inode_dirty_sync include/linux/fs.h:2429 [inline]
> >  iput+0x155/0x7d0 fs/inode.c:1686
> >  dentry_unlink_inode+0x349/0x430 fs/dcache.c:376
> >  __dentry_kill+0x3e2/0x5d0 fs/dcache.c:582
> >  shrink_dentry_list+0x379/0x4d0 fs/dcache.c:1176
> >  shrink_dcache_parent+0xcd/0x350
> >  do_one_tree fs/dcache.c:1657 [inline]
> >  shrink_dcache_for_umount+0x7c/0x1a0 fs/dcache.c:1674
> >  generic_shutdown_super+0x69/0x2d0 fs/super.c:447
> >  kill_block_super+0x80/0xe0 fs/super.c:1395
> >
> > Because lazytime is enabled, after running the reproducer under
> > strace, what happens is that inode #12 gets touched so its access time
> > is modified, but because lazytime is enabled, we don't actually update
> > the on-disk until we actually unmount the superblock.  That's why
> > generic_shutdown_super() is in the stack trace.
> >
> > At that point, when we shrink the dentry cache, when we eject the
> > inode from memory, iput() needs to update the on-disk inode with the
> > updated atime.  So far, so good.  But then we call ext4_dirty_inode(),
> > and then that interacts with the "debug_want_extra_isize-=128" mount
> > option.  So at this point, we try to expand inode's extra isize space,
> > and in order to do that we have to move some extended attributes.
> >
> > Unfortunately, how ext4 currently does this is a bit stupid, and it
> > reads the contents of the ea_inode into memory, deletes the ea_inode
> > and then creates a new ea_inode.  That works, but it's horribly
> > inefficient, and **that's*** what we should actually fix.
> 
> Hi Theodore,
> 
> Thanks for the detailed explanation here! I am new in syzkaller task force
>  and not familiar with ext4 subsystem. So my reply here may be stupid
> as a new comer in ext4 side.
> 
> Do you mean we have a chance to expand ea_inode in place for some
> cases? If so, a new ea_inode with larger space should be created
> to hold expanded ea_inode data, thus data have to be copied and written
> out through memory in my mind. Or anything other than CPU/memory can
> utilized for this to avoid memory usage, such as DMA?

There are two things you seem to be mixing together so let me explain in a
bit more detail:

There is normal inode which has some extented attributes. In the filesystem
image created by syzbot, *headers* of these extended attributes are stored
in the inode. The actual extended attribute content is stored in other
inodes - so called ea_inodes. The mount option "debug_want_extra_isize" asks
the kernel to make more space in the inode itself (say for higher precision
time stamps) so we have now less space for extended attribute headers and
we need to move them to an external block. How we currently do it is that
we read the whole extended attribute in memory, delete the extended
attribute, and then pretend the user called setxattr(2) to store the
extended atribute header in the freshly allocated block and the attribute
data itself in the newly allocated ea_inode. This works but it is
inefficient as we could have just moved the extended attribute header from
the inode into the newly allocated block, keep the ea_inode and be done
with it...

> > Unfortunately, because we try to create a new ea_inode, when
> > ext4_xattr_block_set() calls the static function (which gets inlined)
> > ext4_xattr_inode_create(), and at that point, the call to
> > ext4_new_inode trips over the fact that the file system is being
> > unmoutned, and sb->s_root has already been set to NULL.
> >
> per general understanding of a subsystem uninitialization, a flag shall be
> marked to reject further operation on the sub-system and flush the pending
> operation, then free the resource. In such a general method, current
> handling to create a new ea_inode should not crash even it is stupid.
> sb->s_root seems to be a key global resource in ext4 subsystem per my
> understanding, and should not be set as NULL until the last step of unmount
>  operation.
> 
> Please help teach me what is wrong in my understanding.

Well, sb->s_root is important but actually not that much because it just
speaks about how the filesystem is attached to the overall directory
hierarchy. In particular while sb->s_root is set, it pins corresponding
inode in memory so we cannot fully cleanup the filesystem. So I think
clearing sb->s_root during unmount process is fine where it is. It is the
ea_inode handling that should be tweaked to avoid issues.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options
  2022-12-19  9:23     ` Jun Nie
  2022-12-19 11:46       ` Jan Kara
@ 2022-12-19 16:41       ` Theodore Ts'o
  2022-12-20  8:39         ` Jun Nie
  1 sibling, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2022-12-19 16:41 UTC (permalink / raw)
  To: Jun Nie; +Cc: stable, djwong, jack, jlayton, lczerner, linux-ext4,
	xuyang2018.jy

On Mon, Dec 19, 2022 at 05:23:18PM +0800, Jun Nie wrote:
> 
> Do you mean we have a chance to expand ea_inode in place for some
> cases? If so, a new ea_inode with larger space should be created
> to hold expanded ea_inode data, thus data have to be copied and written
> out through memory in my mind. Or anything other than CPU/memory can
> utilized for this to avoid memory usage, such as DMA?

There are two inodes in question here.  The first is the base inode,
which in this case is /file0.  The second is the ea_inode which stores
the value of one of the extended attributes.  In the syzkaller fuzzed
file system, there is an ea_inode field which is already created; it
contains a value which is too large to fit in the inode or the
extended attribute block; but that's OK, because we can put it in a
ea_inode.  Unfortunately, we are unnecessarily created and deleting
the ea_inode (which contains the xattr *value*) when we move the xattr
from in-inode storage to the external xattr block.

Extended attributes can be stored either in the on-disk inode, or in
an extended attribute block.  The storage in the on-disk inode is
limited, but extended attributes stored don't require a random access
4k read as in the case of the extended attribute block.  So we try to
store extended attributes in the inode if possible --- especially the
ones which might be accessed frequently, such as a POSIX ACL or a
SELinux security id.

The ext4 inode is composed of two portions.  The base 128 byte inode,
which is always present, and which is what was originally used in
ext2.  And the "extra inode fields", which are these fields as
currently defined at the end of struct ext4_inode:

	__le16	i_extra_isize;
	__le16	i_checksum_hi;	/* crc32c(uuid+inum+inode) BE */
	__le32  i_ctime_extra;  /* extra Change time      (nsec << 2 | epoch) */
	__le32  i_mtime_extra;  /* extra Modification time(nsec << 2 | epoch) */
	__le32  i_atime_extra;  /* extra Access time      (nsec << 2 | epoch) */
	__le32  i_crtime;       /* File Creation time */
	__le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
	__le32  i_version_hi;	/* high 32 bits for 64-bit version */
	__le32	i_projid;	/* Project ID */

The i_extra_isize is the field that is always present for inodes
larger than 128 bytes and for which the ext4 file system feature
EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE is enabled.  The i_extra_isize
field tells us how many of the fields beyond the first 128 are
present.  These fields are necessary for various "advanced" (newer
than ext2) ext4 features, including metadata checksums, support for
dates beyond 2038 and sub-second timestamp granularity, file creation
time, 64-bit i_version, and the project id for project quotas.
Everything beyond i_extra_isize is used for in-inode extended
attributes.

Now, what if we need to add extra space for new ext4 features?  Well,
ext4 has a way of expanding the extra inode fields, and one of the
ways to trigger this is via the debugging mount option,
debug_want_extra_isize.  In this particular syzbot reproducer, the
mount option, "debug_want_extra_isize=128" sets the i_extra_isize
field to maximum allowable size for a 256 byte inode size, and this
means that all extended attributes should be ejected out from in-inode
storage to the external extended attribute block.  We do this on a
best efforts bases, when a modified inode is written back to the disk.

The lazytime mount option delays inode updates until the very last
minute.  The reason for this is to avoid multiple writes to the inode
table blocks.  This improves performance by reducing random 4k writes,
and for flash based storage, reducing flash wearout for flash-based
storage.  For hard drives (HDD's), it reducing random 4k writes
reduces the need to perform Adjacent Track Interference (ATI)
mitigations.  ATI mitigations can significantly increase the 99.9
percentile tail latency on file system operations, and decreasing tail
latency can be worth $$$ for some use cases[1].

[1] https://research.google/pubs/pub44830

The downside of using lazytime updates is that on a crash, the inode
timestamps might not get updated --- but very often, this is not a big
deal.  And normally, when some other inode in the same inode table
block is updated, we take that opportunity to update all of the
timestamps that were deferred.  Or, in the worst case, this will get
delayed until the file system is unounted.

Now, back to the extra space expansion.  Eexpanding to allow extra
inode fields to be used in the future is a "nice to have" sort of
thing.  It can fail for a number of reasons, including there not being
enough space in the extended attribute block to evict the extended
attributes in the inode; or if the file system is full, we might not
be able to allocate an external block for the extended attribute block
in the first place.

So it's OK for us to simply pass on making space for the extra inode
fields if it turns out we happen to be in the process of unmounting
file system.  However, that doesn't fix the performance problem of
unnecessarily deleting and creating the ea_inode when moving the xattr
from the inode to the exernal xattr block.  So fixing that performance
issue is the ideal solution.  Simply passing on the extra_isize
expansion is the second best issue.  Backporting an unrelated fix[2]
which papers over the problem by disallowing the mount option
nouser_xattr is the worst option, since it doesn't actually fix the
underlying file system bug.

[2] commit 2d544ec923dbe5 ("ext4: remove deprecated noacl/nouser_xattr
options")

Backporting [2] will shut up the syzbot reproducer, yes.  But that's
because the syzbot reproducer was inadequately minimized.  *This*
reproducer, which is a easier for a human to understand and which is
appropriately minimized will trigger exact same issue, with or without

#!/bin/bash -vx
#
# This reproduces an ext4 bug caused by an unfortunate interaction
# between lazytime updates happening when a file system is being
# unmounted and expand_extra_isize
#
# Initially discovered via syzkaller:
# https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08
#
img=/tmp/foo.img
dir=/mnt 
file=$dir/file0

rm -f $img
mke2fs -Fq -t ext4 -I 256 -O ea_inode -b 1024 $img 200k
mount $img $dir
v=$(dd if=/dev/zero bs=2000 count=1 2>/dev/null | tr '\0' =)
touch $file
attr -q -s test -V $v $file
umount $dir
mount -o debug_want_extra_isize=128,lazytime /tmp/foo.img $dir
cat $file
umount $dir

This is why your proposal to backport commit 2d544ec923dbe5 is not the
right answer.

> per general understanding of a subsystem uninitialization, a flag
> shall be marked to reject further operation on the sub-system and
> flush the pending operation, then free the resource. In such a
> general method, current handling to create a new ea_inode should not
> crash even it is stupid.  sb->s_root seems to be a key global
> resource in ext4 subsystem per my understanding, and should not be
> set as NULL until the last step of unmount operation.

That's true in general.  And yes, simply bypassing the extra_isize
expansion when the file system is being unmounted is certainly better
that backporting the unrelated commit[2].  But the true correct fix is
to optimize how we migrate the xattr from the in-inode storage to the
external xattr block.

This is also at *best* P2 bug, since (a) it's not real security issue;
just a null pointer derference, and there is no way this could be
leveraged into any kind of denial of service or privilege escalation
attack, and (b) it requires root access, and use of a debugging option
to enable a code path which is in practice never used in production.
It is a syzkaller report, and unfortunately, there seems to be this
assumption that all syzkaller issues are P0 or P1 issues that must be
remediated right away.  Which is not the case in this instance.  It's
a real bug, and so it should be fixed; but it's not a high priority
bug.

That being said, if you'd like ot become more experienced in a portion
of ext4 internals, I'd certainly invite you to try to understand how
ext4 extended attributes are managed, and try your hand at fixing this
bug.

Best regards,

						- Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options
  2022-12-19 16:41       ` Theodore Ts'o
@ 2022-12-20  8:39         ` Jun Nie
  0 siblings, 0 replies; 7+ messages in thread
From: Jun Nie @ 2022-12-20  8:39 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: stable, djwong, jack, jlayton, lczerner, linux-ext4,
	xuyang2018.jy

On Mon, Dec 19, 2022 at 11:41:11AM -0500, Theodore Ts'o wrote:
> On Mon, Dec 19, 2022 at 05:23:18PM +0800, Jun Nie wrote:
> > 
> > Do you mean we have a chance to expand ea_inode in place for some
> > cases? If so, a new ea_inode with larger space should be created
> > to hold expanded ea_inode data, thus data have to be copied and written
> > out through memory in my mind. Or anything other than CPU/memory can
> > utilized for this to avoid memory usage, such as DMA?
> 
> There are two inodes in question here.  The first is the base inode,
> which in this case is /file0.  The second is the ea_inode which stores
> the value of one of the extended attributes.  In the syzkaller fuzzed
> file system, there is an ea_inode field which is already created; it
> contains a value which is too large to fit in the inode or the
> extended attribute block; but that's OK, because we can put it in a
> ea_inode.  Unfortunately, we are unnecessarily created and deleting
> the ea_inode (which contains the xattr *value*) when we move the xattr
> from in-inode storage to the external xattr block.
> 
> Extended attributes can be stored either in the on-disk inode, or in
> an extended attribute block.  The storage in the on-disk inode is
> limited, but extended attributes stored don't require a random access
> 4k read as in the case of the extended attribute block.  So we try to
> store extended attributes in the inode if possible --- especially the
> ones which might be accessed frequently, such as a POSIX ACL or a
> SELinux security id.
> 
> The ext4 inode is composed of two portions.  The base 128 byte inode,
> which is always present, and which is what was originally used in
> ext2.  And the "extra inode fields", which are these fields as
> currently defined at the end of struct ext4_inode:
> 
> 	__le16	i_extra_isize;
> 	__le16	i_checksum_hi;	/* crc32c(uuid+inum+inode) BE */
> 	__le32  i_ctime_extra;  /* extra Change time      (nsec << 2 | epoch) */
> 	__le32  i_mtime_extra;  /* extra Modification time(nsec << 2 | epoch) */
> 	__le32  i_atime_extra;  /* extra Access time      (nsec << 2 | epoch) */
> 	__le32  i_crtime;       /* File Creation time */
> 	__le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
> 	__le32  i_version_hi;	/* high 32 bits for 64-bit version */
> 	__le32	i_projid;	/* Project ID */
> 
> The i_extra_isize is the field that is always present for inodes
> larger than 128 bytes and for which the ext4 file system feature
> EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE is enabled.  The i_extra_isize
> field tells us how many of the fields beyond the first 128 are
> present.  These fields are necessary for various "advanced" (newer
> than ext2) ext4 features, including metadata checksums, support for
> dates beyond 2038 and sub-second timestamp granularity, file creation
> time, 64-bit i_version, and the project id for project quotas.
> Everything beyond i_extra_isize is used for in-inode extended
> attributes.
> 
> Now, what if we need to add extra space for new ext4 features?  Well,
> ext4 has a way of expanding the extra inode fields, and one of the
> ways to trigger this is via the debugging mount option,
> debug_want_extra_isize.  In this particular syzbot reproducer, the
> mount option, "debug_want_extra_isize=128" sets the i_extra_isize
> field to maximum allowable size for a 256 byte inode size, and this
> means that all extended attributes should be ejected out from in-inode
> storage to the external extended attribute block.  We do this on a
> best efforts bases, when a modified inode is written back to the disk.
> 
> The lazytime mount option delays inode updates until the very last
> minute.  The reason for this is to avoid multiple writes to the inode
> table blocks.  This improves performance by reducing random 4k writes,
> and for flash based storage, reducing flash wearout for flash-based
> storage.  For hard drives (HDD's), it reducing random 4k writes
> reduces the need to perform Adjacent Track Interference (ATI)
> mitigations.  ATI mitigations can significantly increase the 99.9
> percentile tail latency on file system operations, and decreasing tail
> latency can be worth $$$ for some use cases[1].
> 
> [1] https://research.google/pubs/pub44830
> 
> The downside of using lazytime updates is that on a crash, the inode
> timestamps might not get updated --- but very often, this is not a big
> deal.  And normally, when some other inode in the same inode table
> block is updated, we take that opportunity to update all of the
> timestamps that were deferred.  Or, in the worst case, this will get
> delayed until the file system is unounted.
> 
> Now, back to the extra space expansion.  Eexpanding to allow extra
> inode fields to be used in the future is a "nice to have" sort of
> thing.  It can fail for a number of reasons, including there not being
> enough space in the extended attribute block to evict the extended
> attributes in the inode; or if the file system is full, we might not
> be able to allocate an external block for the extended attribute block
> in the first place.
> 
> So it's OK for us to simply pass on making space for the extra inode
> fields if it turns out we happen to be in the process of unmounting
> file system.  However, that doesn't fix the performance problem of
> unnecessarily deleting and creating the ea_inode when moving the xattr
> from the inode to the exernal xattr block.  So fixing that performance
> issue is the ideal solution.  Simply passing on the extra_isize
> expansion is the second best issue.  Backporting an unrelated fix[2]
> which papers over the problem by disallowing the mount option
> nouser_xattr is the worst option, since it doesn't actually fix the
> underlying file system bug.
> 
> [2] commit 2d544ec923dbe5 ("ext4: remove deprecated noacl/nouser_xattr
> options")
> 
> Backporting [2] will shut up the syzbot reproducer, yes.  But that's
> because the syzbot reproducer was inadequately minimized.  *This*
> reproducer, which is a easier for a human to understand and which is
> appropriately minimized will trigger exact same issue, with or without
> 
> #!/bin/bash -vx
> #
> # This reproduces an ext4 bug caused by an unfortunate interaction
> # between lazytime updates happening when a file system is being
> # unmounted and expand_extra_isize
> #
> # Initially discovered via syzkaller:
> # https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08
> #
> img=/tmp/foo.img
> dir=/mnt 
> file=$dir/file0
> 
> rm -f $img
> mke2fs -Fq -t ext4 -I 256 -O ea_inode -b 1024 $img 200k
> mount $img $dir
> v=$(dd if=/dev/zero bs=2000 count=1 2>/dev/null | tr '\0' =)
> touch $file
> attr -q -s test -V $v $file
> umount $dir
> mount -o debug_want_extra_isize=128,lazytime /tmp/foo.img $dir
> cat $file
> umount $dir
> 
> This is why your proposal to backport commit 2d544ec923dbe5 is not the
> right answer.
> 
> > per general understanding of a subsystem uninitialization, a flag
> > shall be marked to reject further operation on the sub-system and
> > flush the pending operation, then free the resource. In such a
> > general method, current handling to create a new ea_inode should not
> > crash even it is stupid.  sb->s_root seems to be a key global
> > resource in ext4 subsystem per my understanding, and should not be
> > set as NULL until the last step of unmount operation.
> 
> That's true in general.  And yes, simply bypassing the extra_isize
> expansion when the file system is being unmounted is certainly better
> that backporting the unrelated commit[2].  But the true correct fix is
> to optimize how we migrate the xattr from the in-inode storage to the
> external xattr block.
> 
> This is also at *best* P2 bug, since (a) it's not real security issue;
> just a null pointer derference, and there is no way this could be
> leveraged into any kind of denial of service or privilege escalation
> attack, and (b) it requires root access, and use of a debugging option
> to enable a code path which is in practice never used in production.
> It is a syzkaller report, and unfortunately, there seems to be this
> assumption that all syzkaller issues are P0 or P1 issues that must be
> remediated right away.  Which is not the case in this instance.  It's
> a real bug, and so it should be fixed; but it's not a high priority
> bug.
> 
> That being said, if you'd like ot become more experienced in a portion
> of ext4 internals, I'd certainly invite you to try to understand how
> ext4 extended attributes are managed, and try your hand at fixing this
> bug.
> 
> Best regards,
> 
> 						- Ted

Thanks for the elabration of the logic here. I guess below change is similiar
with what we are expecting. There are 2 question in the change I do not have
anwser yet.

But for the crash with NULL sb->s_root, below change does not impact anything
because all functions are still called. So I guess the protection with
rejecting further request during umount is still needed. Or I still missed
something?


 fs/ext4/xattr.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7decaaf27e82..546808dbbdd6 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2551,9 +2551,8 @@ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode,
 
 	is = kzalloc(sizeof(struct ext4_xattr_ibody_find), GFP_NOFS);
 	bs = kzalloc(sizeof(struct ext4_xattr_block_find), GFP_NOFS);
-	buffer = kvmalloc(value_size, GFP_NOFS);
 	b_entry_name = kmalloc(entry->e_name_len + 1, GFP_NOFS);
-	if (!is || !bs || !buffer || !b_entry_name) {
+	if (!is || !bs || !b_entry_name) {
 		error = -ENOMEM;
 		goto out;
 	}
@@ -2565,14 +2564,21 @@ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode,
 
 	/* Save the entry name and the entry value */
 	if (entry->e_value_inum) {
+		buffer = kvmalloc(value_size, GFP_NOFS);
+		if (!buffer) {
+			error = -ENOMEM;
+			goto out;
+		}
+
 		error = ext4_xattr_inode_get(inode, entry, buffer, value_size);
 		if (error)
 			goto out;
 	} else {
 		size_t value_offs = le16_to_cpu(entry->e_value_offs);
-		memcpy(buffer, (void *)IFIRST(header) + value_offs, value_size);
+		buffer = (void *)IFIRST(header) + value_offs;
 	}
 
+	/* Can we reuse entry->e_name with assumption of \0 for all e_name? */
 	memcpy(b_entry_name, entry->e_name, entry->e_name_len);
 	b_entry_name[entry->e_name_len] = '\0';
 	i.name = b_entry_name;
@@ -2585,11 +2591,6 @@ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode,
 	if (error)
 		goto out;
 
-	/* Remove the chosen entry from the inode */
-	error = ext4_xattr_ibody_set(handle, inode, &i, is);
-	if (error)
-		goto out;
-
 	i.value = buffer;
 	i.value_len = value_size;
 	error = ext4_xattr_block_find(inode, &i, bs);
@@ -2597,13 +2598,18 @@ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode,
 		goto out;
 
 	/* Add entry which was removed from the inode into the block */
+	/* Can this function remove in inode xattr automatically? */
 	error = ext4_xattr_block_set(handle, inode, &i, bs);
 	if (error)
 		goto out;
-	error = 0;
+
+	/* Remove the chosen entry from the inode */
+	error = ext4_xattr_ibody_set(handle, inode, &i, is);
+
 out:
 	kfree(b_entry_name);
-	kvfree(buffer);
+	if (entry->e_value_inum && buffer)
+		kvfree(buffer);
 	if (is)
 		brelse(is->iloc.bh);
 	if (bs)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-12-20  8:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <166431556706.3511882.843791619431401636.b4-ty@mit.edu>
2022-12-16  3:41 ` [PATCH v1] ext4: Remove deprecated noacl/nouser_xattr options Jun Nie
2022-12-16  5:47   ` Theodore Ts'o
2022-12-16 13:10     ` Theodore Ts'o
2022-12-19  9:23     ` Jun Nie
2022-12-19 11:46       ` Jan Kara
2022-12-19 16:41       ` Theodore Ts'o
2022-12-20  8:39         ` Jun Nie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).