All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: "Luís Henriques" <lhenriques@suse.de>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] ext4: fix a NULL pointer when validating an inode bitmap
Date: Sat, 5 Nov 2022 20:32:08 -0400	[thread overview]
Message-ID: <Y2cAiLNIIJhm4goP@mit.edu> (raw)
In-Reply-To: <20221011155623.14840-1-lhenriques@suse.de>

First of all, you replied to this patch a completely different patch,
"ext4: fix BUG_ON() when directory entry has invalid rec_len".  This
very much confuses b4, so please don't do that.  If you send a patch
series, where the message-id are related, e.g.:

    20221011155623.14840-1-lhenriques@suse.de
    20221011155623.14840-2-lhenriques@suse.de

etc., b4 will figure out what is going on.  But when the message id's
are unrelated, e.g:

    20221011155623.14840-1-lhenriques@suse.de
vs    
    20221012131330.32456-1-lhenriques@suse.de

... b4 will assume that 20221012131330.32456-1-lhenriques@suse.de is a
newer version of 20221011155623.14840-1-lhenriques@suse.de and there
is apparently no way to tell it to not try to use the "newer" version
of the patch.

On Tue, Oct 11, 2022 at 04:56:24PM +0100, Luís Henriques wrote:
> It's possible to hit a NULL pointer exception while accessing the
> sb->s_group_info in ext4_validate_inode_bitmap(), when calling
> ext4_get_group_info().

  ...

> This issue can be fixed by returning NULL in ext4_get_group_info() when
> ->s_group_info is NULL.  This also requires checking the return code from
> ext4_get_group_info() when appropriate.

I don't believe this is a correct diagnosis of what is going on.  Did
you actually confirm the line numbers associated with the call stack?
What makes you believe that?  Look at how s_group_info is initialized
in ext4_mb_alloc_groupinfo() in fs/ext4/mballoc.c.  It's pretty
careful to make sure this is not the case.

>  EXT4-fs (loop0): warning: mounting unchecked fs, running e2fsck is recommended
>  EXT4-fs error (device loop0): ext4_clear_blocks:866: inode #32: comm mount: attempt to clear invalid blocks 16777450 len 1
>  EXT4-fs error (device loop0): ext4_free_branches:1012: inode #32: comm mount: invalid indirect mapped block 1258291200 (level 1)
>  EXT4-fs error (device loop0): ext4_free_branches:1012: inode #32: comm mount: invalid indirect mapped block 7379847 (level 2)
>  BUG: kernel NULL pointer dereference, address: 0000000000000000
>  ...
>  RIP: 0010:ext4_read_inode_bitmap+0x21b/0x5a0
>  ...
>  Call Trace:
>   <TASK>
>   ext4_free_inode+0x172/0x5c0
>   ext4_evict_inode+0x4a5/0x730
>   evict+0xc1/0x1c0
>   ext4_setup_system_zone+0x2ea/0x380
>   ext4_fill_super+0x249f/0x3910
>   ? ext4_reconfigure+0x880/0x880
>   ? snprintf+0x49/0x60
>   ? ext4_reconfigure+0x880/0x880
>   get_tree_bdev+0x169/0x260
>   vfs_get_tree+0x16/0x70
>   path_mount+0x77d/0xa30
>   __x64_sys_mount+0x101/0x140
>   do_syscall_64+0x3c/0x80
>   entry_SYSCALL_64_after_hwframe+0x46/0xb0

So we're evicting an inode while in the middle of calling
ext4_setup_system_zone() in fs/ext4/block_validity.c.  That can only
happen if we are calling iput() on an an inode, and the only place
that we do that in block_validity.c is in the function
ext4_protect_reserved_inode() --- which we call on the journal inode.

Given the error messages, I suspect this was a fuzzed file system
where the journal inode was not in the standard reserved ino, but
rather in a the normal inode number, in s_journal_inum (which is a
leftover relic from the very early ext3 days), and that inode number
was then explicitly/maliciously placed on the orphan list, and then
hilarity ensued from there.

We need to add some better error checking to protect against this case
in ext4_orphan_get().

Do you have the file system image which triggered this failure?  Was
it the same syzkaller report, or perhaps was it some other syzkaller
report?


> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index 860fc5119009..e5ac5c2363df 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -148,6 +148,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
>  	struct super_block *sb = inode->i_sb;
>  	Indirect *p = chain;
>  	struct buffer_head *bh;
> +	unsigned int key;
>  	int ret = -EIO;
>  
>  	*err = 0;
> @@ -156,9 +157,18 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
>  	if (!p->key)
>  		goto no_block;
>  	while (--depth) {
> -		bh = sb_getblk(sb, le32_to_cpu(p->key));
> +		key = le32_to_cpu(p->key);
> +		bh = sb_getblk(sb, key);
>  		if (unlikely(!bh)) {
> -			ret = -ENOMEM;
> +			/*
> +			 * sb_getblk() masks different errors by always
> +			 * returning NULL.  Let's distinguish at least the case
> +			 * where the block is out of range.
> +			 */
> +			if (key > ext4_blocks_count(EXT4_SB(sb)->s_es))
> +				ret = -EFSCORRUPTED;
> +			else
> +				ret = -ENOMEM;
>  			goto failure;
>  		}
>

And this is fixing a completely different problem and should go in a
different patch.  It's also not the best way of fixing it.  What we
should do is check whether key is out of bounds *before* calling
sb_getblkf(), and then call ext4_error() to mark the file system is
corrupted, and then return -EFSCORRUPTED.

Cheers,

						- Ted

  reply	other threads:[~2022-11-06  0:32 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-10 14:20 [PATCH] ext4: fix a NULL pointer when validating an inode bitmap Luís Henriques
2022-10-11 15:56 ` [PATCH v2] " Luís Henriques
2022-11-06  0:32   ` Theodore Ts'o [this message]
2022-11-08 14:06     ` Luís Henriques
2022-11-28 22:28       ` Theodore Ts'o
2022-11-29  3:18         ` Baokun Li
2022-11-29 21:00           ` Theodore Ts'o
2022-11-30  3:20             ` Baokun Li
2022-12-01  4:32               ` Theodore Ts'o
2022-12-01  6:20                 ` Baokun Li
2022-10-12 13:13 ` [PATCH v2] ext4: fix BUG_ON() when directory entry has invalid rec_len Luís Henriques
2022-10-12 13:16   ` Luís Henriques
2022-10-12 14:21     ` Theodore Ts'o
2022-10-12 15:18       ` Luís Henriques
2022-11-06  6:16   ` Theodore Ts'o
  -- strict thread matches above, loose matches on Subject: below --
2022-10-12  7:28 [PATCH v2] ext4: fix a NULL pointer when validating an inode bitmap kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2cAiLNIIJhm4goP@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=lhenriques@suse.de \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.