All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chengming Zhou <chengming.zhou@linux.dev>
To: Nhat Pham <nphamcs@gmail.com>, Christian Heusel <christian@heusel.eu>
Cc: Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	Vitaly Wool <vitaly.wool@konsulko.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Runge <dave@sleepmap.de>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	Mark W <instruform@gmail.com>,
	regressions@lists.linux.dev, Johannes Weiner <hannes@cmpxchg.org>,
	Yosry Ahmed <yosryahmed@google.com>
Subject: Re: [REGRESSION] Null pointer dereference while shrinking zswap
Date: Wed, 17 Apr 2024 11:44:45 +0800	[thread overview]
Message-ID: <246c1f4d-af13-40fa-b968-fbaf36b8f91f@linux.dev> (raw)
In-Reply-To: <CAKEwX=MWPUf1NMGdn+1AkRdOUf25ifAbPyoP9zppPTx3U3Tv2Q@mail.gmail.com>

On 2024/4/17 08:22, Nhat Pham wrote:
> On Tue, Apr 16, 2024 at 4:29 PM Nhat Pham <nphamcs@gmail.com> wrote:
>>
>> On Tue, Apr 16, 2024 at 3:14 PM Nhat Pham <nphamcs@gmail.com> wrote:
>>>
>>> On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote:
>>>>
>>>> Hello everyone,
>>>
>>> Thanks for the report, Christian! Looking at it now.
>>>
>>>>
>>>> while rebuilding a few packages in Arch Linux we have recently come
>>>> across a regression in the linux kernel which was made visible by a test
>>>> failure in libguestfs[0], where the booted kernel showed a Call Trace
>>>> like the following one:
>>>>
>>>> [  218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M
>>>
>>> Is this one of the kernel versions that was broken? That looks a bit
>>> odd, as zswap shrinker landed on 6.8...
>>
>> Ah ignore this - I understand the versioning now...
>>
>>>
>>>> [  218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M
>>>> [  218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M
>>>> [  218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M
>>>> [  218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M
>>>> [  218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M
>>>> [  218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M
>>>> [  218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M
>>>> [  218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M
>>>> [  218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M
>>>> [  218.742376] FS:  00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M
>>>> [  218.742569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
>>>> [  218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M
>>>> [  218.743494] PKRU: 55555554^M
>>>> [  218.743593] Call Trace:^M
>>>> [  218.743733]  <TASK>^M
>>>> [  218.743847]  ? __die+0x23/0x70^M
>>>> [  218.743957]  ? page_fault_oops+0x171/0x4e0^M
>>>> [  218.744056]  ? free_unref_page+0xf6/0x180^M
>>>> [  218.744458]  ? exc_page_fault+0x7f/0x180^M
>>>> [  218.744551]  ? asm_exc_page_fault+0x26/0x30^M
>>>> [  218.744684]  ? memcg_page_state+0x9/0x30^M
>>>> [  218.744779]  zswap_shrinker_count+0x9d/0x110^M
>>>> [  218.744896]  do_shrink_slab+0x3a/0x360^M
>>>> [  218.744990]  shrink_slab+0xc7/0x3c0^M
>>>> [  218.745609]  drop_slab+0x85/0x140^M
>>>> [  218.745691]  drop_caches_sysctl_handler+0x7e/0xd0^M
>>>> [  218.745799]  proc_sys_call_handler+0x1c0/0x2e0^M
>>>> [  218.745912]  vfs_write+0x23d/0x400^M
>>>> [  218.745998]  ksys_write+0x6f/0xf0^M
>>>> [  218.746080]  do_syscall_64+0x64/0xe0^M
>>>> [  218.746169]  ? exit_to_user_mode_prepare+0x132/0x1f0^M
>>>> [  218.746873]  entry_SYSCALL_64_after_hwframe+0x6e/0x76^M
>>>>
> 
> Actually, inspecting the code a bit more - can memcg be null here?
> 
> Specifically, if mem_cgroup_disabled() is true, can we see null memcg
> here? Looks like in this case, mem_cgroup_iter() can return null, and
> the first iteration of drop_slab_node() can have null memcg if it's
> returned by mem_cgroup_iter().
> 
> shrink_slab() will still proceed and call do_shrink_slab() if the
> memcg is null - provided that mem_cgroup_disabled() holds:
> 
> if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>       return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
> 

Ah, I think your analysis is very right, here memcg is NULL but memcg_page_state
won't check.

> Inside zswap_shrink_count(), all the memcg accessors in this area seem
> to always check for null memcg (mem_cgroup_lruvec,
> mem_cgroup_zswap_writeback_enabled), *except* memcg_page_state, which
> is the one line that fail.
> 
> If this is all to it, we can simply add a null check or
> mem_cgroup_disabled() check here, and use pool stats instead?

Both look ok to me. The VM could only set sc->memcg to NULL when memcg
disabled, right?

Thanks.

  reply	other threads:[~2024-04-17  3:45 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-16 12:19 [REGRESSION] Null pointer dereference while shrinking zswap Christian Heusel
2024-04-16 19:18 ` Andrew Morton
2024-04-16 19:57   ` Christian Heusel
2024-04-16 22:14 ` Nhat Pham
2024-04-16 22:29   ` Christian Heusel
2024-04-16 23:29   ` Nhat Pham
2024-04-17  0:22     ` Nhat Pham
2024-04-17  3:44       ` Chengming Zhou [this message]
2024-04-17 14:33         ` Johannes Weiner
2024-04-17 15:08           ` Richard W.M. Jones
2024-04-17 17:18           ` Christian Heusel
2024-04-18 12:40             ` Johannes Weiner
2024-04-18 14:25               ` Linux regression tracking (Thorsten Leemhuis)
2024-04-18 20:09               ` Yosry Ahmed
2024-04-19 14:22                 ` Johannes Weiner
2024-04-19 19:10                   ` Yosry Ahmed
2024-04-17  0:33 ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=246c1f4d-af13-40fa-b968-fbaf36b8f91f@linux.dev \
    --to=chengming.zhou@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=christian@heusel.eu \
    --cc=dave@sleepmap.de \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=instruform@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=regressions@lists.linux.dev \
    --cc=rjones@redhat.com \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.