From: Uladzislau Rezki <urezki@gmail.com>
To: David Wang <00107082@163.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
oliver.sang@intel.com, urezki@gmail.com, ahuang12@lenovo.com,
akpm@linux-foundation.org, bhe@redhat.com, hch@infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, lkp@intel.com,
mjguzik@gmail.com, oe-lkp@lists.linux.dev, harry.yoo@oracle.com,
kent.overstreet@linux.dev
Subject: Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
Date: Mon, 23 Jun 2025 13:36:03 +0200
Message-ID: <aFk8I4qNG9ntonTa@pc636>
In-Reply-To: <375419f4.2ba1.1979aad313a.Coremail.00107082@163.com>
On Mon, Jun 23, 2025 at 10:45:31AM +0800, David Wang wrote:
>
> At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
> >On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
> >>
> >> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> >> >
> >> > Hello,
> >> >
> >> > for this change, we reported
> >> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> >> > in
> >> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> >> >
> >> > at that time, we made some tests with x86_64 config which runs well.
> >> >
> >> > now we noticed the commit is in mainline now.
> >>
> >> > the config still has expected diff with parent:
> >> >
> >> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> >> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> >> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> >> > CONFIG_TEST_MISC_MINOR=m
> >> > # CONFIG_TEST_LKM is not set
> >> > CONFIG_TEST_BITOPS=m
> >> > -CONFIG_TEST_VMALLOC=m
> >> > +CONFIG_TEST_VMALLOC=y
> >> > # CONFIG_TEST_BPF is not set
> >> > CONFIG_FIND_BIT_BENCHMARK=m
> >> > # CONFIG_TEST_FIRMWARE is not set
> >> >
> >> >
> >> > then we noticed similar random issue with x86_64 randconfig this time.
> >> >
> >> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> >> > ---------------- ---------------------------
> >> > fail:runs %reproduction fail:runs
> >> > | | |
> >> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> >> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> >> > :199 34% 67:200 dmesg.Mem-Info
> >> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> >> > :199 34% 67:200 dmesg.RIP:down_read_trylock
> >> >
> >> > we don't have enough knowledge to understand the relationship between code
> >> > change and the random issues. just report what we observed in our tests FYI.
> >> >
> >>
> >> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
> >>
> >> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
> >> memory allocation fails show_mem() would invoke alloc_tag_top_users.
> >>
> >> With following configuration:
> >>
> >> CONFIG_TEST_VMALLOC=y
> >> CONFIG_MEM_ALLOC_PROFILING=y
> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> >> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> >>
> >> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
> >> a NULL dereference because alloc_tag_cttype has not been initialized yet.
> >>
> >> I added some debug output to confirm this theory:
> >> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >> index d48b80f3f007..9b8e7501010f 100644
> >> --- a/lib/alloc_tag.c
> >> +++ b/lib/alloc_tag.c
> >> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >> struct codetag *ct;
> >> struct codetag_bytes n;
> >> unsigned int i, nr = 0;
> >> + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >> + return 0;
> >>
> >> if (can_sleep)
> >> codetag_lock_module_list(alloc_tag_cttype, true);
> >> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
> >> shutdown_mem_profiling(true);
> >> return PTR_ERR(alloc_tag_cttype);
> >> }
> >> + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >>
> >> return 0;
> >> }
> >>
> >> When booting the kernel, the log shows:
> >>
> >> $ sudo dmesg -T | grep profiling
> >> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
> >> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
> >>
> >>
> >> vmalloc_test_init should happen after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> >> or show_mem() should check whether alloc_tag has finished initializing before
> >> calling alloc_tag_top_users.
> >
> >Thanks for reporting!
> >So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> >will address this issue as well. Is that correct?
>
> Yes, the panic can be fixed by that patch.
>
> I still feel it is better to delay vmalloc_test_init so that it happens after alloc_tag_init.
>
We can, but then we would not notice the bug in question :)
At the very least, I think we should exclude the tests that trigger warnings
when the test suite is run with the default configuration, i.e. run only the
tests that are not supposed to fail.
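
To make the ordering problem from the quoted analysis concrete, here is a
minimal userspace sketch in C. It is not kernel code and it is not the fix
from the patch linked above: every name in it (cttype_model,
model_alloc_tag_init, model_top_users, model_show_mem) is a hypothetical
stand-in, and the NULL check is only one assumed way to make an early call
harmless.

```c
/* Userspace model of the boot-time ordering problem; NOT kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's codetag type object. */
struct cttype_model {
	int lock_taken;		/* counts simulated lock acquisitions */
};

static struct cttype_model cttype_storage;

/* Models alloc_tag_cttype: stays NULL until the init step has run. */
static struct cttype_model *cttype_model_ptr;

/* Models alloc_tag_init(): publishes the pointer. */
static void model_alloc_tag_init(void)
{
	assert(cttype_model_ptr == NULL);	/* init is expected to run once */
	cttype_model_ptr = &cttype_storage;
}

/*
 * Models alloc_tag_top_users() with an assumed early-exit guard.
 * Returns the number of entries reported, or 0 when called too early.
 */
static size_t model_top_users(size_t count)
{
	if (cttype_model_ptr == NULL)
		return 0;		/* nothing to lock or walk yet */

	cttype_model_ptr->lock_taken++;	/* stands in for taking the module list lock */
	return count;
}

/* Models show_mem() being reached from a failed allocation in an early test. */
static size_t model_show_mem(void)
{
	size_t n = model_top_users(10);

	printf("top users reported: %zu\n", n);
	return n;
}
```

Calling model_show_mem() before model_alloc_tag_init() reports zero entries
instead of dereferencing a NULL pointer; once init has run, the same call
takes the simulated lock and reports the requested count. Delaying the test
instead would avoid the early call, but would also hide the unguarded path.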
--
Uladzislau Rezki