From: Benjamin Segall <bsegall@google.com>
To: Rishabh Bhatnagar <risbhat@amazon.com>
Cc: <gregkh@linuxfoundation.org>, <shakeelb@google.com>,
<viro@zeniv.linux.org.uk>, <mdecandia@gmail.com>,
<linux-kernel@vger.kernel.org>, <stable@vger.kernel.org>
Subject: Re: [PATCH 5.4 0/2] Fix epoll issue in 5.4 kernels
Date: Mon, 28 Nov 2022 13:05:17 -0800
Message-ID: <xm26wn7en62a.fsf@google.com>
In-Reply-To: <20221124001123.3248571-1-risbhat@amazon.com> (Rishabh Bhatnagar's message of "Thu, 24 Nov 2022 00:11:21 +0000")
Rishabh Bhatnagar <risbhat@amazon.com> writes:
> Hi Greg
> After upgrading to 5.4.211 we started seeing some nodes getting
> stuck in our Kubernetes cluster. All nodes are running this kernel
> version. After taking a closer look it seems that the runc command was
> getting stuck. Looking at the stack it appears the thread is stuck in
> epoll wait for some time.
> [<0>] do_syscall_64+0x48/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> [<0>] ep_poll+0x48d/0x4e0
> [<0>] do_epoll_wait+0xab/0xc0
> [<0>] __x64_sys_epoll_pwait+0x4d/0xa0
> [<0>] do_syscall_64+0x48/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> [<0>] futex_wait_queue_me+0xb6/0x110
> [<0>] futex_wait+0xe2/0x260
> [<0>] do_futex+0x372/0x4f0
> [<0>] __x64_sys_futex+0x134/0x180
> [<0>] do_syscall_64+0x48/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
>
> I noticed there are other discussions going on as well
> regarding this.
> https://lore.kernel.org/all/Y1pY2n6E1Xa58MXv@kroah.com/
> Reverting the below patch does fix the issue:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=cf2db24ec4b8e9d399005ececd6f6336916ab6fc
> We don't see this issue in latest upstream kernel or even latest 5.10
> stable tree. Looking at the patches that went in for 5.10 stable there's
> one that stands out that seems to be missing in 5.4.
> 289caf5d8f6c61c6d2b7fd752a7f483cd153f182 (epoll: check for events when removing
> a timed out thread from the wait queue)
>
> After backporting this patch to 5.4 we no longer see the hangs. It looks
> like this patch fixes timeout scenarios that can cause missed wakeups.
> The other patch in the patch series also fixes a race and is needed for
> the second patch to apply.
Yes, this definitely makes sense to me; the aggressive removal was only
valid because the rest of the epoll machinery did plenty of extra
checking. And I didn't check the backports as carefully when I saw the
-stable emails.
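
For illustration only: the interaction discussed above can be sketched with a toy, single-threaded Python model. This is not kernel code and is not taken from either patch. The class ToyWaitQueue, its methods, and the waiter names "t2" and "t3" are all hypothetical, and the model assumes a wakeup that removes the woken waiter from the queue. It shows why a waiter whose timeout has expired may need to check for events before returning: if the only wakeup is delivered to that waiter and it returns without looking, the event stays pending while the remaining waiter keeps sleeping.

```python
from collections import deque


class ToyWaitQueue:
    """Toy model: pending events plus a queue of sleeping waiters."""

    def __init__(self, waiters):
        self.ready = deque()            # events nobody has consumed yet
        self.waiters = deque(waiters)   # sleeping waiters, in queue order

    def post_event(self, event):
        # Queue the event and wake exactly one waiter. The woken waiter is
        # removed from the queue immediately (the "aggressive removal").
        self.ready.append(event)
        return self.waiters.popleft() if self.waiters else None

    def finish_timed_out_wait(self, name, recheck):
        # Run by a waiter whose timer expired. It may already have been
        # removed from the queue by a wakeup that raced with the timeout.
        if name in self.waiters:
            self.waiters.remove(name)
        if recheck and self.ready:
            # Final check: consume an event that arrived just after the
            # timeout instead of reporting "no events".
            return [self.ready.popleft()]
        return []


def run(recheck):
    q = ToyWaitQueue(["t2", "t3"])      # t2 has a timeout, t3 waits forever
    woken = q.post_event("ev")          # arrives right after t2's timer fired
    assert woken == "t2"                # the single wakeup goes to t2
    got = q.finish_timed_out_wait("t2", recheck)
    return got, list(q.ready), list(q.waiters)


if __name__ == "__main__":
    print("without final check:", run(recheck=False))
    print("with final check:   ", run(recheck=True))
```

Without the final check, t2 returns no events, "ev" is left pending, and t3 is never woken. With the check, t2 picks up "ev" and can act on it.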
Thread overview: 7+ messages
2022-11-24 0:11 [PATCH 5.4 0/2] Fix epoll issue in 5.4 kernels Rishabh Bhatnagar
2022-11-24 0:11 ` [PATCH 5.4 1/2] epoll: call final ep_events_available() check under the lock Rishabh Bhatnagar
2022-11-24 7:48 ` Thadeu Lima de Souza Cascardo
2022-12-01 4:07 ` Samuel Mendoza-Jonas
2022-11-24 0:11 ` [PATCH 2/2] epoll: check for events when removing a timed out thread from the wait queue Rishabh Bhatnagar
2022-11-24 7:49 ` Thadeu Lima de Souza Cascardo
2022-11-28 21:05 ` Benjamin Segall [this message]