From: Matthew Wilcox <willy@infradead.org>
To: Aaron Tomlin <atomlin@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, mhocko@suse.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
Date: Thu, 20 May 2021 12:56:28 +0100
Message-ID: <YKZObDpduqwWi/Zm@casper.infradead.org>
In-Reply-To: <20210520114257.huqhkqsdrhohn3u5@ava.usersys.com>
On Thu, May 20, 2021 at 12:42:57PM +0100, Aaron Tomlin wrote:
> On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> > On 5/20/21 6:34 AM, Andrew Morton wrote:
> > >
> > > What observed problems motivated this change?
> > >
> > > What were the observed runtime effects of this change?
> >
> > Yep those details from the previous thread should be included here.
>
> Fair enough.
>
> During kernel crash dump (vmcore) analysis, I found that, in the context of
> __alloc_pages_slowpath(), the no_progress_loops variable held the value
> 31,611,688, i.e. well above MAX_RECLAIM_RETRIES, and that a fatal signal
> was pending against current.
While this is true, it's not really answering Andrew's question.
What we want as part of the commit message is something like:
"A customer experienced a low memory situation and sent their task a
fatal signal. Instead of dying promptly, it looped in the page
allocator failing to make progress because ..."
>
> #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
> #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
> #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
>
> // w0 = *(sp + 148) /* no_progress_loops */
> 0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr w0, [sp,#148]
> // w0 = w0 + 0x1
> 0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add w0, w0, #0x1
> // *(sp + 148) = w0
> 0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str w0, [sp,#148]
> // if (w0 > 0x10)
> //     goto __alloc_pages_nodemask+0x904
> 0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp w0, #0x10
> 0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534
>
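The counter logic in the disassembly above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: model_should_retry() and its parameters are hypothetical names, not the kernel's function, and where exactly the patch places the fatal-signal check is not shown in this message. The one detail taken directly from the listing is the comparison: cmp w0, #0x10 followed by b.gt is a signed "greater than", so a count of 16 still retries and 17 does not.

```c
#include <stdbool.h>

/* Same value as the immediate in "cmp w0, #0x10". */
#define MAX_RECLAIM_RETRIES 16

/*
 * Hypothetical model of the retry decision, not kernel code.
 * Returns true if the caller should attempt reclaim again.
 */
static bool model_should_retry(int *no_progress_loops,
                               bool did_some_progress,
                               bool fatal_signal_pending)
{
    /* Assumed placement: a dying task stops retrying immediately. */
    if (fatal_signal_pending)
        return false;

    /* Progress resets the counter; lack of progress bumps it. */
    if (did_some_progress)
        *no_progress_loops = 0;
    else
        (*no_progress_loops)++;

    /* Mirrors b.gt: give up only when the count exceeds the limit. */
    return *no_progress_loops <= MAX_RECLAIM_RETRIES;
}
```

In this model a task with a fatal signal pending never re-enters the loop, whatever value the counter has reached.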
> - The stack pointer was 0xffff00002e78f900
>
> crash> p *(int *)(0xffff00002e78f900+148)
> $1 = 31611688
>
> crash> ps 521171
>      PID  PPID  CPU  TASK              ST  %MEM       VSZ    RSS  COMM
> > 521171     1   36  ffff8080e2128800  RU   0.0  34789440  18624  special
>
> crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
> $2 = (struct sigpending *) 0xffff80809a416e40
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
> $3 = 0x804100
>
> crash> sig -s 0x804100
> SIGKILL SIGTERM SIGXCPU
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
> $4 = 0x100
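The mask arithmetic used in the crash session can be checked outside crash with a short C sketch. The helper names sig_mask() and sig_is_pending() are made up for illustration; the signal numbers are the standard Linux values for SIGKILL, SIGTERM and SIGXCPU. Signal n occupies bit (n - 1) of the first word of the set, which is why SIGKILL (9) shows up as 0x100.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard Linux signal numbers for the three signals reported above. */
enum { SIG_KILL = 9, SIG_TERM = 15, SIG_XCPU = 24 };

/* Bit mask for signal signo (valid for 1..64) in the first sigset word. */
static uint64_t sig_mask(int signo)
{
    return (uint64_t)1 << (signo - 1);
}

/* True if signo is set in sig0, the value read from signal.sig[0]. */
static bool sig_is_pending(uint64_t sig0, int signo)
{
    return (sig0 & sig_mask(signo)) != 0;
}
```

Applied to the observed value 0x804100, sig_is_pending() reports SIGKILL, SIGTERM and SIGXCPU as set, matching the output of sig -s.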
>
>
> Unfortunately, this incident has not been reproduced to date.
>
> Kind regards,
>
> --
> Aaron Tomlin