public inbox for linux-kernel@vger.kernel.org
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Arseniy Krasnov <avkrasnov@salutedevices.com>
Cc: oxffffaa@gmail.com, linux-erofs@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, kernel@salutedevices.com,
	Gao Xiang <xiang@kernel.org>
Subject: Re: erofs pointer corruption and kernel crash
Date: Sat, 25 Apr 2026 23:29:32 +0800
Message-ID: <f5789b4d-512b-4596-af79-cd2b80172b88@linux.alibaba.com>
In-Reply-To: <15702a84-ea4f-4d12-b9e5-a37a4c3bb014@salutedevices.com>

Hi Arseniy,

On 2026/4/13 15:20, Arseniy Krasnov wrote:
> 
> 
> 13.04.2026 10:08, Gao Xiang wrote:
>>
>>
>> On 2026/4/11 23:10, Arseniy Krasnov wrote:
>>>
>>>
>>> 10.04.2026 18:41, Gao Xiang wrote:
>>>> Hi Arseniy,
>>>>
>>>> On 2026/4/10 21:27, Arseniy Krasnov wrote:
>>>>>
>>>>>
>>>>> 10.04.2026 15:20, Gao Xiang wrote:
>>>>>>
>>>>>>
>>>>>> On 2026/4/10 19:37, Arseniy Krasnov wrote:
>>>>>>
>>>>>> (dropping unrelated folks since they are all subscribed to the erofs mailing list)
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 10.04.2026 11:31, Gao Xiang wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>
>> ...
>>
>>>>>>>>
>>>>>>>> I need more information to find some clues.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So reproduced again with this debug patch which adds magic to 'struct z_erofs_pcluster' and prints 'struct folio'
>>>>>>> when pointer in 'private' is passed to 'erofs_onlinefolio_end()'. In short - 'private' points to 'struct z_erofs_pcluster'.
>>>>>> First, erofs-utils 1.8.10 doesn't support `-E48bit`:
>>>>>> only erofs-utils 1.9+ ships it as an experimental
>>>>>> feature, see the ChangeLog; so I think you're using
>>>>>> a modified erofs-utils 1.8.10:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/ChangeLog
>>>>>>
>>>>>> ```
>>>>>> erofs-utils 1.9
>>>>>>
>>>>>>     * This release includes the following updates:
>>>>>>       - Add 48-bit layout support for larger filesystems (EXPERIMENTAL);
>>>>>> ```
>>>>>>
>>>>>> Second, I'm pretty sure this issue is related to
>>>>>> experimental `-E48bit`, and that information is
>>>>>> not enough for me to find the root cause, so I
>>>>>> need to find a way to reproduce it myself. It may
>>>>>> take time; you could debug it yourself, but I don't
>>>>>> think it's an easy task if you aren't quite familiar
>>>>>> with the EROFS codebase.
>>>>>>
>>>>>> Anyway, if you need a quick solution for production,
>>>>>> I really suggest not using `-E48bit + zstd` like
>>>>>> this for now: try other options like
>>>>>> `-zzstd -C65536 -Efragments` instead, since those
>>>>>> are common production choices.
>>>>>
>>>>> Ok, thanks for this advice! One more question: currently we use these options:
>>>>> "zstd,22 --max-extent-bytes 65536 -E48bit". Ok, we will remove "zstd,22" and "E48bit",
>>>>> but what about "--max-extent-bytes 65536": is it considered a stable option?
>>>>> Or is it better to use your version: "-zzstd -C65536 -Efragments"?
>>>>
>>>> I'm not sure how you found this
>>>> "zstd,22 --max-extent-bytes 65536 -E48bit" combination.
>>>>
>>>> My suggestion, based on production use, is that as long as
>>>> you don't use `-zzstd` + `-E48bit`, it should be fine.
>>>>
>>>> If you need smaller images, I suggest: `-zlzma,9 -C65536 -Efragments`
>>>> Or like Android, they all use `-zlz4hc`,
>>>> Or zstd, but don't add `-E48bit`.
>>>>
>>>> As for "--max-extent-bytes 65536", it can be dropped
>>>> since if `-E48bit` is not used, it only has negative
>>>> impacts.
>>>>
>>>> In short, `-E48bit` + `-zzstd` + `--max-extent-bytes`
>>>> enables new unaligned compression for zstd, but it's
>>>> a relatively new feature. I still need some time to
>>>> stabilize it, but my own time is limited and everything
>>>> has to be prioritized.
>>>
>>> Ok, thanks for this advice!
>>
>> FYI, I can reproduce this issue locally with `-E48bit`
>> enabled within 600s.
>>
>> I do think it's a `-E48bit` + zstd issue so
>> non-`-E48bit` won't be impacted and I will find time
>> to troubleshoot it this week.
> 
> Yes, without '-E48bit' we also couldn't reproduce it over an entire weekend on several boards. No such panics.

Can you check if the following informal patch resolves
this issue?  I've checked it locally:

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 8a0b15511931..824ffe4b871c 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1509,12 +1509,6 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
  	DBG_BUGON(z_erofs_is_shortlived_page(bvec->bv_page));

  	folio = page_folio(zbv.page);
-	/* For preallocated managed folios, add them to page cache here */
-	if (folio->private == Z_EROFS_PREALLOCATED_FOLIO) {
-		tocache = true;
-		goto out_tocache;
-	}
-
  	mapping = READ_ONCE(folio->mapping);
  	/*
  	 * File-backed folios for inplace I/Os are all locked steady,
@@ -1527,6 +1521,12 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
  		return;
  	}

+	if (cmpxchg(&folio->private, Z_EROFS_PREALLOCATED_FOLIO, NULL) ==
+	    Z_EROFS_PREALLOCATED_FOLIO) {
+		tocache = true;
+		goto out_tocache;
+	}
+
  	folio_lock(folio);
  	if (likely(folio->mapping == mc)) {
  		/*
@@ -1546,14 +1546,8 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
  			}
  			return;
  		}
-		/*
-		 * Already linked with another pcluster, which only appears in
-		 * crafted images by fuzzers for now.  But handle this anyway.
-		 */
-		tocache = false;	/* use temporary short-lived pages */
  	} else {
-		DBG_BUGON(1); /* referenced managed folios can't be truncated */
-		tocache = true;
+		DBG_BUGON(1);		/* referenced managed folios can't be truncated */
  	}
  	folio_unlock(folio);
  	folio_put(folio);


I will send a formal patch later with comments and a commit
message.
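
For readers following the diff: it replaces a plain comparison of
`folio->private` against `Z_EROFS_PREALLOCATED_FOLIO` with `cmpxchg()`,
so at most one caller can observe the sentinel and clear it. Below is a
minimal userspace sketch of that claim pattern, using C11 atomics rather
than the kernel's `cmpxchg()`. `struct slot`, `PREALLOCATED`, and
`try_claim()` are hypothetical names for illustration and are not part
of EROFS:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for Z_EROFS_PREALLOCATED_FOLIO. */
#define PREALLOCATED ((void *)1)

/* Hypothetical stand-in for a folio: only the 'private' word matters here. */
struct slot {
	_Atomic(void *) private;
};

/*
 * Atomically swap PREALLOCATED -> NULL.
 * Returns true only for the single caller that observed the sentinel;
 * any later (or concurrent) caller sees a different value and gets false.
 */
static bool try_claim(struct slot *s)
{
	void *expected = PREALLOCATED;

	return atomic_compare_exchange_strong(&s->private, &expected, NULL);
}
```

With a slot initialized to `PREALLOCATED`, the first `try_claim()` call
returns true and leaves `private` as NULL; a second call returns false.
A separate "compare, then act" sequence would not give that guarantee
when two callers race.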

Thanks,
Gao Xiang

