From: Arseniy Krasnov <oxffffaa@gmail.com>
To: Gao Xiang <hsiangkao@linux.alibaba.com>,
Arseniy Krasnov <avkrasnov@salutedevices.com>
Cc: linux-erofs@lists.ozlabs.org, linux-kernel@vger.kernel.org,
kernel@salutedevices.com, Gao Xiang <xiang@kernel.org>
Subject: Re: erofs pointer corruption and kernel crash
Date: Sun, 26 Apr 2026 14:42:47 +0300
Message-ID: <c09372f2-8387-4c5a-a0a5-218c4e846c89@gmail.com>
In-Reply-To: <f5789b4d-512b-4596-af79-cd2b80172b88@linux.alibaba.com>
25.04.2026 18:29, Gao Xiang wrote:
> Hi Arseniy,
>
> On 2026/4/13 15:20, Arseniy Krasnov wrote:
>>
>>
>> 13.04.2026 10:08, Gao Xiang wrote:
>>>
>>>
>>> On 2026/4/11 23:10, Arseniy Krasnov wrote:
>>>>
>>>>
>>>> 10.04.2026 18:41, Gao Xiang wrote:
>>>>> Hi Arseniy,
>>>>>
>>>>> On 2026/4/10 21:27, Arseniy Krasnov wrote:
>>>>>>
>>>>>>
>>>>>> 10.04.2026 15:20, Gao Xiang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2026/4/10 19:37, Arseniy Krasnov wrote:
>>>>>>>
>>>>>>> (drop unrelated folks since they all subscribed erofs mailing list)
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 10.04.2026 11:31, Gao Xiang wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>>
>>> ...
>>>
>>>>>>>>>
>>>>>>>>> I need more information to find some clues.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So reproduced again with this debug patch which adds magic to 'struct z_erofs_pcluster' and prints 'struct folio'
>>>>>>>> when pointer in 'private' is passed to 'erofs_onlinefolio_end()'. In short - 'private' points to 'struct z_erofs_pcluster'.
>>>>>>> First, erofs-utils 1.8.10 doesn't support `-E48bit`:
>>>>>>> only erofs-utils 1.9+ ship it as an experimental
>>>>>>> feature, see Changelog; so I think you're using
>>>>>>> modified erofs-utils 1.8.10:
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/ChangeLog
>>>>>>>
>>>>>>> ```
>>>>>>> erofs-utils 1.9
>>>>>>>
>>>>>>> * This release includes the following updates:
>>>>>>> - Add 48-bit layout support for larger filesystems (EXPERIMENTAL);
>>>>>>> ```
>>>>>>>
>>>>>>> Second, I'm pretty sure this issue is related to
>>>>>>> experimental `-E48bit`, and this information is
>>>>>>> not enough for me to find the root cause, so I
>>>>>>> need to find a way to reproduce it myself. It may
>>>>>>> take time; you could debug it yourself, but I don't
>>>>>>> think it's an easy task if you aren't quite familiar
>>>>>>> with the EROFS codebase.
>>>>>>>
>>>>>>> Anyway I really suggest if you need a rush solution
>>>>>>> for production, don't use `-E48bit + zstd` like
>>>>>>> this for now: try to use other options like
>>>>>>> `-zzstd -C65536 -Efragments` instead since those
>>>>>>> are common production choices.
>>>>>>
>>>>>> Ok, thanks for this advice! One more question: currently we use these options:
>>>>>> "zstd,22 --max-extent-bytes 65536 -E48bit". Ok, we remove "zstd,22" and "E48bit",
>>>>>> but what about "--max-extent-bytes 65536" - is it considered a stable option?
>>>>>> Or is it better to use your version: "-zzstd -C65536 -Efragments"?
>>>>>
>>>>> I'm not sure how you found this
>>>>> "zstd,22 --max-extent-bytes 65536 -E48bit" combination.
>>>>>
>>>>> My suggestion, based on production use, is that as long as
>>>>> you don't use `-zzstd` + `-E48bit`, it should be fine.
>>>>>
>>>>> If you need smaller images, I suggest: `-zlzma,9 -C65536 -Efragments`
>>>>> Or like Android, they all use `-zlz4hc`,
>>>>> Or zstd, but don't add `-E48bit`.
>>>>>
>>>>> As for "--max-extent-bytes 65536", it can be dropped
>>>>> since if `-E48bit` is not used, it only has negative
>>>>> impacts.
>>>>>
>>>>> In short, `-E48bit` + `-zzstd` + `--max-extent-bytes`
>>>>> enables new unaligned compression for zstd, but it's
>>>>> a relatively new feature. I still need some time to
>>>>> stabilize it, but my own time is limited and all things
>>>>> are always prioritized.
>>>>
>>>> Ok, thanks for this advice!
>>>
>>> FYI, I can reproduce this issue locally with `-E48bit`
>>> enabled within 600s.
>>>
>>> I do think it's a `-E48bit` + zstd issue so
>>> non-`-E48bit` won't be impacted and I will find time
>>> to troubleshoot it this week.
>>
>> Yes, without '-E48bit' we also couldn't reproduce it over an entire weekend on several boards. No such panics.
>
> Can you check if the following informal patch resolves
> this issue? I've checked it locally:
>
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 8a0b15511931..824ffe4b871c 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -1509,12 +1509,6 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
> DBG_BUGON(z_erofs_is_shortlived_page(bvec->bv_page));
>
> folio = page_folio(zbv.page);
> - /* For preallocated managed folios, add them to page cache here */
> - if (folio->private == Z_EROFS_PREALLOCATED_FOLIO) {
> - tocache = true;
> - goto out_tocache;
> - }
> -
> mapping = READ_ONCE(folio->mapping);
> /*
> * File-backed folios for inplace I/Os are all locked steady,
> @@ -1527,6 +1521,12 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
> return;
> }
>
> + if (cmpxchg(&folio->private, Z_EROFS_PREALLOCATED_FOLIO, NULL) ==
> + Z_EROFS_PREALLOCATED_FOLIO) {
> + tocache = true;
> + goto out_tocache;
> + }
> +
> folio_lock(folio);
> if (likely(folio->mapping == mc)) {
> /*
> @@ -1546,14 +1546,8 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
> }
> return;
> }
> - /*
> - * Already linked with another pcluster, which only appears in
> - * crafted images by fuzzers for now. But handle this anyway.
> - */
> - tocache = false; /* use temporary short-lived pages */
> } else {
> - DBG_BUGON(1); /* referenced managed folios can't be truncated */
> - tocache = true;
> + DBG_BUGON(1); /* referenced managed folios can't be truncated */
> }
> folio_unlock(folio);
> folio_put(folio);
>
>
> I will send a formal patch later with comments and a commit
> message.
Hi, thanks! I'll test it!
>
> Thanks,
> Gao Xiang
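
The informal patch quoted above replaces a plain comparison of `folio->private` against `Z_EROFS_PREALLOCATED_FOLIO` with a `cmpxchg()` that swaps the marker for NULL. Below is a minimal userspace sketch of that claim-once pattern, assuming C11 atomics as a stand-in for the kernel's `cmpxchg()`. The struct, the sentinel value, and the function name are hypothetical and exist only for illustration; this is not the kernel code and says nothing about the actual root cause.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel, standing in for Z_EROFS_PREALLOCATED_FOLIO. */
#define PREALLOCATED_MARK ((void *)0x1)

/* Hypothetical stand-in holding the only field this sketch needs. */
struct fake_folio {
	_Atomic(void *) private;
};

/*
 * Try to claim a preallocated folio.
 *
 * The compare-and-exchange succeeds only if `private` still holds the
 * sentinel, and it clears the sentinel in the same atomic step. Any later
 * (or concurrent) caller sees a different value and gets false, so at most
 * one caller can take the "add to cache" path for a given folio.
 */
static bool claim_preallocated(struct fake_folio *folio)
{
	void *expected = PREALLOCATED_MARK;

	return atomic_compare_exchange_strong(&folio->private, &expected, NULL);
}
```

With a plain `==` check, two callers could both observe the sentinel before either one changes it; the compare-and-exchange form closes that window in this simplified model.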