stable.vger.kernel.org archive mirror
* [PATCH 5.4 0/2] Fix epoll issue in 5.4 kernels
@ 2022-11-24  0:11 Rishabh Bhatnagar
  2022-11-24  0:11 ` [PATCH 5.4 1/2] epoll: call final ep_events_available() check under the lock Rishabh Bhatnagar
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Rishabh Bhatnagar @ 2022-11-24  0:11 UTC
  To: gregkh, shakeelb, viro, bsegall
  Cc: mdecandia, linux-kernel, stable, Rishabh Bhatnagar

Hi Greg,
After upgrading to 5.4.211 we started seeing some nodes getting
stuck in our Kubernetes cluster. All nodes are running this kernel
version. On closer inspection, the runc command appears to be getting
stuck. The stack shows the thread stuck in epoll wait for
some time:
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] ep_poll+0x48d/0x4e0
[<0>] do_epoll_wait+0xab/0xc0
[<0>] __x64_sys_epoll_pwait+0x4d/0xa0
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] futex_wait_queue_me+0xb6/0x110
[<0>] futex_wait+0xe2/0x260
[<0>] do_futex+0x372/0x4f0
[<0>] __x64_sys_futex+0x134/0x180
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1

I noticed there are other discussions going on regarding this
as well:
https://lore.kernel.org/all/Y1pY2n6E1Xa58MXv@kroah.com/
Reverting the patch below does fix the issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=cf2db24ec4b8e9d399005ececd6f6336916ab6fc
We don't see this issue in the latest upstream kernel or even the latest
5.10 stable tree. Looking at the patches that went into 5.10 stable,
there is one that stands out as missing from 5.4:
289caf5d8f6c61c6d2b7fd752a7f483cd153f182 (epoll: check for events when removing
a timed out thread from the wait queue)

With this patch backported to 5.4 we no longer see the hangs. It looks
like this patch fixes timeout scenarios that can cause missed wakeups.
The other patch in the series also fixes a race and is needed for
the second patch to apply.
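
For anyone who wants to exercise the timed epoll wait path from user
space, here is a minimal sketch. It is not a reproducer for the race
and is not taken from the patches; it only shows the pattern that is
affected (an event arriving around the time an epoll wait with a
timeout expires). The helper names setup_epoll_eventfd() and wait_one()
are hypothetical, and using an eventfd as the event source is an
assumption made to keep the example small.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance that watches one eventfd.
 * On success returns 0 and stores the descriptors in *epfd and *efd.
 * On failure returns -1 and leaves no descriptor open. */
int setup_epoll_eventfd(int *epfd, int *efd)
{
	struct epoll_event ev = { .events = EPOLLIN };

	*efd = eventfd(0, EFD_NONBLOCK);
	if (*efd < 0)
		return -1;

	*epfd = epoll_create1(0);
	if (*epfd < 0) {
		close(*efd);
		return -1;
	}

	ev.data.fd = *efd;
	if (epoll_ctl(*epfd, EPOLL_CTL_ADD, *efd, &ev) < 0) {
		close(*epfd);
		close(*efd);
		return -1;
	}
	return 0;
}

/* Hypothetical helper: wait up to timeout_ms milliseconds for one event.
 * Passes through the epoll_wait() result: 1 if an event is ready,
 * 0 if the timeout expired, -1 on error (errno set by epoll_wait). */
int wait_one(int epfd, int timeout_ms)
{
	struct epoll_event out;

	return epoll_wait(epfd, &out, 1, timeout_ms);
}
```

A caller would typically signal the eventfd from another thread with
eventfd_write(efd, 1) and expect wait_one() to report it. An event
posted before the wait returns should never be lost.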

Roman Penyaev (1):
  epoll: call final ep_events_available() check under the lock

Soheil Hassas Yeganeh (1):
  epoll: check for events when removing a timed out thread from the wait
    queue

 fs/eventpoll.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

-- 
2.37.1


Thread overview: 7+ messages
2022-11-24  0:11 [PATCH 5.4 0/2] Fix epoll issue in 5.4 kernels Rishabh Bhatnagar
2022-11-24  0:11 ` [PATCH 5.4 1/2] epoll: call final ep_events_available() check under the lock Rishabh Bhatnagar
2022-11-24  7:48   ` Thadeu Lima de Souza Cascardo
2022-12-01  4:07   ` Samuel Mendoza-Jonas
2022-11-24  0:11 ` [PATCH 2/2] epoll: check for events when removing a timed out thread from the wait queue Rishabh Bhatnagar
2022-11-24  7:49   ` Thadeu Lima de Souza Cascardo
2022-11-28 21:05 ` [PATCH 5.4 0/2] Fix epoll issue in 5.4 kernels Benjamin Segall
