From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from szxga04-in.huawei.com (szxga04-in.huawei.com [45.249.212.190]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 73FD11D131B; Wed, 16 Oct 2024 08:02:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.190 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729065768; cv=none; b=OkYeJTg1d7mRexyc1AB6rxE4/688t0hJaCgQ2vOX+OIFzTO72/KqdmIBKRTO0blQC7pajX9+aZBD6mZIRHjV5wKrfSEoxDYYotXiR4sBaTV2QyUJ9BysSrCZt5iARu/yOl5M0Zrf94MylmgqDn98p18DCvEpGDtl3t6JnEksIZc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729065768; c=relaxed/simple; bh=lYVoOhBmBv2Oe1O7WtpFdmC6UcsaAGmc/QEnV7hVXz0=; h=Message-ID:Date:MIME-Version:Subject:To:CC:References:From: In-Reply-To:Content-Type; b=fabU0RhQfyQNF7w8BkEdxQuqOI0LQfjTrlysiW51VpU0z2HdHEJq7kq1AeAx8uQXZ5tYR8hfPvUBkACV5g9RJEb86vJp25ofErtyACAWFfJdkn99FpX+dsWOcenmw108dJT7C5uaOe8bc7QF+F0E6JZovsx4t2E9tfPUVEkC6Z8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.190 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.88.234]) by szxga04-in.huawei.com (SkyGuard) with ESMTP id 4XT3MC2Y9dz2DdVd; Wed, 16 Oct 2024 16:01:27 +0800 (CST) Received: from dggpeml100021.china.huawei.com (unknown [7.185.36.148]) by mail.maildlp.com (Postfix) with ESMTPS id 629D6140391; Wed, 16 Oct 2024 16:02:41 +0800 (CST) Received: from [127.0.0.1] (10.174.177.71) by dggpeml100021.china.huawei.com (7.185.36.148) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Wed, 16 Oct 2024 16:02:40 +0800 Message-ID: <3930aad6-174d-4422-944e-6c90a3ea065a@huawei.com> Date: Wed, 16 Oct 2024 16:02:40 +0800 Precedence: bulk X-Mailing-List: linux-ext4@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry To: Jan Kara CC: Qianqiang Liu , , , syzbot , , , , Yang Erkun , Baokun Li References: <66efba95.050a0220.3195df.008c.GAE@google.com> <20241009155028.u7jpzrw6txldt43j@quack3> <05f9c7c2-655a-4f5b-be8e-93f511a954bd@huawei.com> <20241014163120.hinbd5jc6mp4vev7@quack3> Content-Language: en-US From: Baokun Li In-Reply-To: <20241014163120.hinbd5jc6mp4vev7@quack3> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggpeml100021.china.huawei.com (7.185.36.148) On 2024/10/15 0:31, Jan Kara wrote: > On Fri 11-10-24 10:18:04, Baokun Li wrote: >> On 2024/10/9 23:50, Jan Kara wrote: >>>> Or go one step further and add a mechanism like xfs Reverse-Mapping, which >>>> makes sure that allocated blocks do point to the target inode, which could >>>> replace the current block_validity, and could also be used in future online >>>> fscks. >>> Well, that is a rather big change. It requires significant on-disk format >>> change and also performance cost when to maintain. Furthermore for xattr >>> blocks which can be shared by many inodes it is not even clear how to >>> implement this... So I'm not sure we really want to do this either. >> Yes, there can be a lot of work involved. >> >>  * Perhaps we could create an rmap file to store the rmap tree to avoid >>    on-disk format changes. >>  * The performance impact of maintaining rmap really needs to be evaluated, >>    perhaps by writing a simple DEMO to test it. >>  * XFS supports shared blocks(A.K.A. reflink.), so even if the physical >>    blocks are the same, but the inodes are different or the logical blocks >>    are different, they will be recorded multiple times in the tree. So the >>    shared xattr block can be handled similarly. >> >> We have plans to support online fsck in the future, and implementing rmap >> is one of the steps. Perhaps one can wait until rmap is implemented to >> assess whether it is worth a strict check here. > Yes, we could implement something like this be as you wrote, it's going to > be a lot of work. We've briefly discussed this with Ted on ext4 call and we > came to a conclusion that this is a type of corruption ext4 may never > protect agaist. You simply should not mount arbitrarily corrupted > filesystems... Thank you for discussing this with Ted. This kind of problem is really tricky. > But if you want to try, sure go ahead :) As server clusters get larger and larger, server maintenance becomes very difficult. Therefore, timely detection of problems (i.e. online scanning, similar to e2fsck -fn) and timely and non-stop fixing of problems (i.e. online fsck, similar to e2fsck -a) have always been the requirements of our customers. Thus online fsck has been on our TODO list, and it's really time to start doing it. 😀 > One relatively easy solution to similar class of problems would be to store > the type of metadata buffer inside the buffer_head when we are verifying > checksum, clear the info when freeing the block in __ext4_forget(), and > fail with EFSCORRUPTED error when one type -> another type transition would > happen. This solution looks great. If we save checksum further, we can sense not only the change of block type, but also the change of block data. But now journal_head takes up bh->b_private, so to record information, we need to add new fields to buffer_head (e.g. b_info), or modify the logic of journal_head (e.g. make bh->b_private hold ext4_bufdata, and include the journal_head in ext4_bufdata. ocfs2 needs a similar adaptation). >> Implementing rmap may take some time, until then we can avoid the problem >> as much as possible by checking the magic and xattr block csum. >> Maybe something like this? >> >> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c >> index 7647e9f6e190..cd3ae1e3371c 100644 >> --- a/fs/ext4/xattr.c >> +++ b/fs/ext4/xattr.c >> @@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct >> ext4_xattr_info *i, >>                 } >>         } >> >> +       if (WARN_ON_ONCE(last < here)) { >> +               EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s", >> +                                       is_block ? "block" : "ibody"); >> +               ret = -EFSCORRUPTED; >> +               goto out; >> +       } >> + >>         /* Check whether we have enough space. */ >>         if (i->value) { >>                 size_t free; >> @@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode >> *inode, >>         } >> >>         if (s->base) { >> +               struct ext4_xattr_header *hdr; >>                 int offset = (char *)s->here - bs->bh->b_data; >> >>                 BUFFER_TRACE(bs->bh, "get_write_access"); >> @@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode >> *inode, >>                         goto cleanup; >> >>                 lock_buffer(bs->bh); >> +               hdr = header(s->base); >> + >> +               if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) || >> +                   (ext4_has_metadata_csum(inode->i_sb) && >> +                    (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, hdr) >> != >> +                     hdr->h_checksum))) { >> +                       unlock_buffer(bs->bh); >> +                       error = -EFSCORRUPTED; >> +                       goto bad_block; >> +               } >> >>                 if (header(s->base)->h_refcount == cpu_to_le32(1)) { >>                         __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash); > Hum, there are more places in xattr code that access a buffer that could > have been modified. So why do you add check into this place? Is it somehow > special? > > Honza The out-of-bounds access occurs here because the last dentry obtained in ext4_xattr_set_entry() is not the same as the last dentry obtained in ext4_xattr_block_find(). When we modify an xattr, we always hold the inode lock, so the ibody case is not considered. Check_xattrs() is called to do a full check when looking up an xattr, so it's not necessary to consider that either. When inserting an xattr into an xattr block, there are three cases:  * xattr block is newly allocated, and the block is allocated only    after the entry is set in memory.  * xattr block is unshared, insert the new xattr directly.  * xattr block is shared, copy the xattr block and insert the new xattr. Only in the last two cases can a multiply claimed xattr block result in out-of-bounds access, so the buffer lock is held to verify that the data is correct. (It looks like kmemdup should be moved to the buffer lock here as well.🤔) Thanks again! Regards, Baokun