* Re: Several tst-robust* tests time out with recent Linux kernel [not found] <4bda9f2e06512e375e045f9e72edb205104af19c.camel@xry111.site> @ 2023-11-14 9:46 ` Xi Ruoyao 2023-11-14 15:31 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Xi Ruoyao @ 2023-11-14 9:46 UTC (permalink / raw) To: Peter Zijlstra (Intel), libc-alpha Cc: linux-kernel, linux-api, linux-mm, linux-arch, Thomas Gleixner, André Almeida On Tue, 2023-11-14 at 02:33 +0800, Xi Ruoyao wrote: > Hi, > > With Linux 6.7.0-rc1, several tst-robust* tests time out on x86_64: > > FAIL: nptl/tst-robust1 > FAIL: nptl/tst-robust3 > FAIL: nptl/tst-robust4 > FAIL: nptl/tst-robust6 > FAIL: nptl/tst-robust7 > FAIL: nptl/tst-robust9 > > This does not happen with Linux 6.6.0. Do you have some clue about > it? Bisected to the kernel commit: commit 5694289ce183bc3336407a78c8c722a0b9208f9b (HEAD) Author: peterz@infradead.org <peterz@infradead.org> Date: Thu Sep 21 12:45:08 2023 +0200 futex: Flag conversion Futex has 3 sets of flags: - legacy futex op bits - futex2 flags - internal flags Add a few helpers to convert from the API flags into the internal flags. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andr<C3><A9> Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-14 9:46 ` Several tst-robust* tests time out with recent Linux kernel Xi Ruoyao @ 2023-11-14 15:31 ` Peter Zijlstra 2023-11-14 15:40 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2023-11-14 15:31 UTC (permalink / raw) To: Xi Ruoyao Cc: libc-alpha, linux-kernel, linux-api, linux-mm, linux-arch, Thomas Gleixner, André Almeida On Tue, Nov 14, 2023 at 05:46:43PM +0800, Xi Ruoyao wrote: > On Tue, 2023-11-14 at 02:33 +0800, Xi Ruoyao wrote: > > Hi, > > > > With Linux 6.7.0-rc1, several tst-robust* tests time out on x86_64: > > > > FAIL: nptl/tst-robust1 > > FAIL: nptl/tst-robust3 > > FAIL: nptl/tst-robust4 > > FAIL: nptl/tst-robust6 > > FAIL: nptl/tst-robust7 > > FAIL: nptl/tst-robust9 > > > > This does not happen with Linux 6.6.0. Do you have some clue about > > it? > > Bisected to the kernel commit: > > commit 5694289ce183bc3336407a78c8c722a0b9208f9b (HEAD) > Author: peterz@infradead.org <peterz@infradead.org> > Date: Thu Sep 21 12:45:08 2023 +0200 > > futex: Flag conversion > > Futex has 3 sets of flags: > > - legacy futex op bits > - futex2 flags > - internal flags > > Add a few helpers to convert from the API flags into the internal > flags. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > Reviewed-by: Andr<C3><A9> Almeida <andrealmeid@igalia.com> > Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net I can confirm. I'm also going crazy trying to figure out how this happens. The below is sufficient to make it unhappy... /me most puzzled --- diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index b5379c0e6d6d..1a1f9301251f 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -17,7 +17,7 @@ * restarts. */ #ifdef CONFIG_MMU -# define FLAGS_SHARED 0x01 +# define FLAGS_SHARED 0x10 #else /* * NOMMU does not have per process address space. Let the compiler optimize @@ -25,8 +25,8 @@ */ # define FLAGS_SHARED 0x00 #endif -#define FLAGS_CLOCKRT 0x02 -#define FLAGS_HAS_TIMEOUT 0x04 +#define FLAGS_CLOCKRT 0x20 +#define FLAGS_HAS_TIMEOUT 0x40 #ifdef CONFIG_FAIL_FUTEX extern bool should_fail_futex(bool fshared); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-14 15:31 ` Peter Zijlstra @ 2023-11-14 15:40 ` Peter Zijlstra 2023-11-14 16:43 ` Florian Weimer 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2023-11-14 15:40 UTC (permalink / raw) To: Xi Ruoyao Cc: libc-alpha, linux-kernel, linux-api, linux-mm, linux-arch, Thomas Gleixner, André Almeida On Tue, Nov 14, 2023 at 04:31:00PM +0100, Peter Zijlstra wrote: > On Tue, Nov 14, 2023 at 05:46:43PM +0800, Xi Ruoyao wrote: > > On Tue, 2023-11-14 at 02:33 +0800, Xi Ruoyao wrote: > > > Hi, > > > > > > With Linux 6.7.0-rc1, several tst-robust* tests time out on x86_64: > > > > > > FAIL: nptl/tst-robust1 > > > FAIL: nptl/tst-robust3 > > > FAIL: nptl/tst-robust4 > > > FAIL: nptl/tst-robust6 > > > FAIL: nptl/tst-robust7 > > > FAIL: nptl/tst-robust9 > > > > > > This does not happen with Linux 6.6.0. Do you have some clue about > > > it? > > > > Bisected to the kernel commit: > > > > commit 5694289ce183bc3336407a78c8c722a0b9208f9b (HEAD) > > Author: peterz@infradead.org <peterz@infradead.org> > > Date: Thu Sep 21 12:45:08 2023 +0200 > > > > futex: Flag conversion > > > > Futex has 3 sets of flags: > > > > - legacy futex op bits > > - futex2 flags > > - internal flags > > > > Add a few helpers to convert from the API flags into the internal > > flags. > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > Reviewed-by: Andr<C3><A9> Almeida <andrealmeid@igalia.com> > > Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net > > I can confirm. I'm also going crazy trying to figure out how this > happens. > > The below is sufficient to make it unhappy... > > /me most puzzled > > --- > diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h > index b5379c0e6d6d..1a1f9301251f 100644 > --- a/kernel/futex/futex.h > +++ b/kernel/futex/futex.h > @@ -17,7 +17,7 @@ > * restarts. > */ > #ifdef CONFIG_MMU > -# define FLAGS_SHARED 0x01 > +# define FLAGS_SHARED 0x10 > #else > /* > * NOMMU does not have per process address space. Let the compiler optimize Just the above seems sufficient. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-14 15:40 ` Peter Zijlstra @ 2023-11-14 16:43 ` Florian Weimer 2023-11-14 20:14 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Florian Weimer @ 2023-11-14 16:43 UTC (permalink / raw) To: Peter Zijlstra Cc: Xi Ruoyao, libc-alpha, linux-kernel, linux-api, linux-mm, linux-arch, Thomas Gleixner, André Almeida * Peter Zijlstra: >> diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h >> index b5379c0e6d6d..1a1f9301251f 100644 >> --- a/kernel/futex/futex.h >> +++ b/kernel/futex/futex.h >> @@ -17,7 +17,7 @@ >> * restarts. >> */ >> #ifdef CONFIG_MMU >> -# define FLAGS_SHARED 0x01 >> +# define FLAGS_SHARED 0x10 >> #else >> /* >> * NOMMU does not have per process address space. Let the compiler optimize > > Just the above seems sufficient. There are a few futex_wake calls which hard-code the flags argument as 1: kernel/futex/core.c=637=static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, -- kernel/futex/core.c-686- * this. kernel/futex/core.c-687- */ kernel/futex/core.c-688- owner = uval & FUTEX_TID_MASK; kernel/futex/core.c-689- kernel/futex/core.c-690- if (pending_op && !pi && !owner) { kernel/futex/core.c:691: futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); kernel/futex/core.c-692- return 0; kernel/futex/core.c-693- } kernel/futex/core.c-694- kernel/futex/core.c-695- if (owner != task_pid_vnr(curr)) kernel/futex/core.c-696- return 0; -- kernel/futex/core.c-739- /* kernel/futex/core.c-740- * Wake robust non-PI futexes here. The wakeup of kernel/futex/core.c-741- * PI futexes happens in exit_pi_state(): kernel/futex/core.c-742- */ kernel/futex/core.c-743- if (!pi && (uval & FUTEX_WAITERS)) kernel/futex/core.c:744: futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); kernel/futex/core.c-745- kernel/futex/core.c-746- return 0; kernel/futex/core.c-747-} kernel/futex/core.c-748- kernel/futex/core.c-749-/* Thanks, Florian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-14 16:43 ` Florian Weimer @ 2023-11-14 20:14 ` Peter Zijlstra 2023-11-14 21:56 ` [tip: locking/urgent] futex: Fix hardcoded flags tip-bot2 for Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Peter Zijlstra @ 2023-11-14 20:14 UTC (permalink / raw) To: Florian Weimer Cc: Xi Ruoyao, libc-alpha, linux-kernel, linux-api, linux-mm, linux-arch, Thomas Gleixner, André Almeida On Tue, Nov 14, 2023 at 05:43:20PM +0100, Florian Weimer wrote: > * Peter Zijlstra: > > >> diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h > >> index b5379c0e6d6d..1a1f9301251f 100644 > >> --- a/kernel/futex/futex.h > >> +++ b/kernel/futex/futex.h > >> @@ -17,7 +17,7 @@ > >> * restarts. > >> */ > >> #ifdef CONFIG_MMU > >> -# define FLAGS_SHARED 0x01 > >> +# define FLAGS_SHARED 0x10 > >> #else > >> /* > >> * NOMMU does not have per process address space. Let the compiler optimize > > > > Just the above seems sufficient. > > There are a few futex_wake calls which hard-code the flags argument as > 1: > > kernel/futex/core.c=637=static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, > -- > kernel/futex/core.c-686- * this. > kernel/futex/core.c-687- */ > kernel/futex/core.c-688- owner = uval & FUTEX_TID_MASK; > kernel/futex/core.c-689- > kernel/futex/core.c-690- if (pending_op && !pi && !owner) { > kernel/futex/core.c:691: futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); > kernel/futex/core.c-692- return 0; > kernel/futex/core.c-693- } > kernel/futex/core.c-694- > kernel/futex/core.c-695- if (owner != task_pid_vnr(curr)) > kernel/futex/core.c-696- return 0; > -- > kernel/futex/core.c-739- /* > kernel/futex/core.c-740- * Wake robust non-PI futexes here. The wakeup of > kernel/futex/core.c-741- * PI futexes happens in exit_pi_state(): > kernel/futex/core.c-742- */ > kernel/futex/core.c-743- if (!pi && (uval & FUTEX_WAITERS)) > kernel/futex/core.c:744: futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); > kernel/futex/core.c-745- > kernel/futex/core.c-746- return 0; > kernel/futex/core.c-747-} > kernel/futex/core.c-748- > kernel/futex/core.c-749-/* Urgh, thanks! Confirmed, the below cures things. Although I should probably make that FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree. Let me go do a proper patch. --- kernel/futex/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index d1d7b3c175a4..e7793f0d5757 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -687,7 +687,7 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, owner = uval & FUTEX_TID_MASK; if (pending_op && !pi && !owner) { - futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + futex_wake(uaddr, FLAGS_SHARED, 1, FUTEX_BITSET_MATCH_ANY); return 0; } @@ -740,7 +740,7 @@ static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, * PI futexes happens in exit_pi_state(): */ if (!pi && (uval & FUTEX_WAITERS)) - futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + futex_wake(uaddr, FLAGS_SHARED, 1, FUTEX_BITSET_MATCH_ANY); return 0; } ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [tip: locking/urgent] futex: Fix hardcoded flags 2023-11-14 20:14 ` Peter Zijlstra @ 2023-11-14 21:56 ` tip-bot2 for Peter Zijlstra 2023-11-15 1:11 ` Several tst-robust* tests time out with recent Linux kernel Edgecombe, Rick P 2023-11-15 3:07 ` [tip: locking/urgent] futex: Fix hardcoded flags tip-bot2 for Peter Zijlstra 2 siblings, 0 replies; 14+ messages in thread From: tip-bot2 for Peter Zijlstra @ 2023-11-14 21:56 UTC (permalink / raw) To: linux-tip-commits Cc: Xi Ruoyao, Florian Weimer, Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the locking/urgent branch of tip: Commit-ID: 59f3daffefbf48206dbf310af5472d6d7d7161df Gitweb: https://git.kernel.org/tip/59f3daffefbf48206dbf310af5472d6d7d7161df Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Tue, 14 Nov 2023 21:36:13 +01:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 14 Nov 2023 22:26:41 +01:00 futex: Fix hardcoded flags Xi reported that commit 5694289ce183 ("futex: Flag conversion") broke glibc's robust futex tests. This was narrowed down to the change of FLAGS_SHARED from 0x01 to 0x10, at which point Florian noted that handle_futex_death() has a hardcoded flags argument of 1. Change this to: FLAGS_SIZE_32 | FLAGS_SHARED, matching how futex_to_flags() unconditionally sets FLAGS_SIZE_32 for all legacy futex ops. Fixes: 5694289ce183 ("futex: Flag conversion") Reported-by: Xi Ruoyao <xry111@xry111.site> Reported-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231114201402.GA25315@noisy.programming.kicks-ass.net --- kernel/futex/core.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 52695c5..dad981a 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -700,7 +700,8 @@ retry: owner = uval & FUTEX_TID_MASK; if (pending_op && !pi && !owner) { - futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, + FUTEX_BITSET_MATCH_ANY); return 0; } @@ -752,8 +753,10 @@ retry: * Wake robust non-PI futexes here. The wakeup of * PI futexes happens in exit_pi_state(): */ - if (!pi && (uval & FUTEX_WAITERS)) - futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + if (!pi && (uval & FUTEX_WAITERS)) { + futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, + FUTEX_BITSET_MATCH_ANY); + } return 0; } ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-14 20:14 ` Peter Zijlstra 2023-11-14 21:56 ` [tip: locking/urgent] futex: Fix hardcoded flags tip-bot2 for Peter Zijlstra @ 2023-11-15 1:11 ` Edgecombe, Rick P 2023-11-15 8:51 ` Peter Zijlstra 2023-11-15 3:07 ` [tip: locking/urgent] futex: Fix hardcoded flags tip-bot2 for Peter Zijlstra 2 siblings, 1 reply; 14+ messages in thread From: Edgecombe, Rick P @ 2023-11-15 1:11 UTC (permalink / raw) To: peterz@infradead.org, fweimer@redhat.com Cc: xry111@xry111.site, andrealmeid@igalia.com, linux-api@vger.kernel.org, linux-mm@kvack.org, libc-alpha@sourceware.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, linux-arch@vger.kernel.org On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote: > Urgh, thanks! > > Confirmed, the below cures things. Although I should probably make > that > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree. > > Let me go do a proper patch. I saw these fail on the glibc shadow stack branch today, and I also saw this one failing: FAIL: nptl/tst-robustpi8 It spit out: mutex_timedlock of 41 in thread 1 failed with 22 child did not die of a signal in round 1 After the fix here I saw the others pass, but still not tst-robustpi8. Not sure if it is some shadow stack complication. I can try to dig in tomorrow if the problem doesn't jump out. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-15 1:11 ` Several tst-robust* tests time out with recent Linux kernel Edgecombe, Rick P @ 2023-11-15 8:51 ` Peter Zijlstra 2023-11-15 23:28 ` Edgecombe, Rick P 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2023-11-15 8:51 UTC (permalink / raw) To: Edgecombe, Rick P Cc: fweimer@redhat.com, xry111@xry111.site, andrealmeid@igalia.com, linux-api@vger.kernel.org, linux-mm@kvack.org, libc-alpha@sourceware.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, linux-arch@vger.kernel.org On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote: > On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote: > > Urgh, thanks! > > > > Confirmed, the below cures things. Although I should probably make > > that > > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree. > > > > Let me go do a proper patch. > > I saw these fail on the glibc shadow stack branch today, and I also saw > this one failing: > FAIL: nptl/tst-robustpi8 tip/locking/urgent (branch with the fix on) gets me: root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8 running child verifying locks running child verifying locks running child verifying locks running child verifying locks running child verifying locks root@ivb-ep:/usr/local/src/glibc# Which, to my untrained eye, looks like a pass to me. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-15 8:51 ` Peter Zijlstra @ 2023-11-15 23:28 ` Edgecombe, Rick P 2023-11-17 1:22 ` Edgecombe, Rick P 0 siblings, 1 reply; 14+ messages in thread From: Edgecombe, Rick P @ 2023-11-15 23:28 UTC (permalink / raw) To: peterz@infradead.org Cc: xry111@xry111.site, andrealmeid@igalia.com, fweimer@redhat.com, linux-mm@kvack.org, libc-alpha@sourceware.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, linux-api@vger.kernel.org, linux-arch@vger.kernel.org On Wed, 2023-11-15 at 09:51 +0100, Peter Zijlstra wrote: > On Wed, Nov 15, 2023 at 01:11:20AM +0000, Edgecombe, Rick P wrote: > > On Tue, 2023-11-14 at 21:14 +0100, Peter Zijlstra wrote: > > > Urgh, thanks! > > > > > > Confirmed, the below cures things. Although I should probably > > > make > > > that > > > FLAGS_SIZE_32 | FLAGS_SHARED against Linus' tree. > > > > > > Let me go do a proper patch. > > > > I saw these fail on the glibc shadow stack branch today, and I also > > saw > > this one failing: > > FAIL: nptl/tst-robustpi8 > > tip/locking/urgent (branch with the fix on) gets me: > > root@ivb-ep:/usr/local/src/glibc# ./build/nptl/tst-robustpi8 > running child > verifying locks > running child > verifying locks > running child > verifying locks > running child > verifying locks > running child > verifying locks > root@ivb-ep:/usr/local/src/glibc# > > Which, to my untrained eye, looks like a pass to me. It bisects to this for me: fbeb558b0dd0 ("futex/pi: Fix recursive rt_mutex waiter state") Reading the patch, I'm not immediately clear what is going on but a few comments stood out: "There be dragons here" "What could possibly go wrong..." "This is a somewhat dangerous proposition". Seems a likelihood of some race, but it reproduces reliably on my machine. Haven't dug into debugging it yet. Any pointers? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-15 23:28 ` Edgecombe, Rick P @ 2023-11-17 1:22 ` Edgecombe, Rick P 2024-01-19 13:56 ` Stefan Liebler 0 siblings, 1 reply; 14+ messages in thread From: Edgecombe, Rick P @ 2023-11-17 1:22 UTC (permalink / raw) To: peterz@infradead.org Cc: xry111@xry111.site, andrealmeid@igalia.com, fweimer@redhat.com, linux-mm@kvack.org, libc-alpha@sourceware.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, linux-api@vger.kernel.org, linux-arch@vger.kernel.org A bit more info... The error returned to userspace is originating from: https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295 'uval' is often zero in that error case, but sometimes just a mismatching value like: uval=0x567, task_pid_vnr()=0x564 Depending on the number of CPUs the VM is running on it reproduces or not. When it does reproduce, the newly added path here is taken: https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185 The path is taken a lot during the test, sometimes >400 times before the above linked error is generated during the syscall. When it doesn't reproduce, I never saw that new path taken. More print statements make the reproduction less reliable, so it does seem to have a race in the mix at least somewhat. Otherwise, I haven't tried to understand what is going on here with all this highwire locking. Hope it helps. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2023-11-17 1:22 ` Edgecombe, Rick P @ 2024-01-19 13:56 ` Stefan Liebler 2024-01-29 22:23 ` Edgecombe, Rick P 0 siblings, 1 reply; 14+ messages in thread From: Stefan Liebler @ 2024-01-19 13:56 UTC (permalink / raw) To: Edgecombe, Rick P, peterz@infradead.org Cc: xry111@xry111.site, andrealmeid@igalia.com, fweimer@redhat.com, linux-mm@kvack.org, libc-alpha@sourceware.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, Heiko Carstens, Sven Schnelle [-- Attachment #1: Type: text/plain, Size: 2794 bytes --] On 17.11.23 02:22, Edgecombe, Rick P wrote: > A bit more info... > > The error returned to userspace is originating from: > https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L295 > > 'uval' is often zero in that error case, but sometimes just a > mismatching value like: uval=0x567, task_pid_vnr()=0x564 > > > Depending on the number of CPUs the VM is running on it reproduces or > not. When it does reproduce, the newly added path here is taken: > https://github.com/torvalds/linux/blob/master/kernel/futex/pi.c#L1185 > The path is taken a lot during the test, sometimes >400 times before > the above linked error is generated during the syscall. When it doesn't > reproduce, I never saw that new path taken. > > More print statements make the reproduction less reliable, so it does > seem to have a race in the mix at least somewhat. Otherwise, I haven't > tried to understand what is going on here with all this highwire > locking. > > Hope it helps. Hi, I've also observed fails in glibc testcase nptl/tst-robust8pi with: mutex_timedlock of 66 in thread 7 failed with 22 => pthread_mutex_timedlock returns 22=EINVAL I've saw it on s390x. There I've used kernel with commit 120d99901eb288f1d21db3976df4ba347b28f9c7 s390/vfio-ap: do not reset queue removed from host config But I also saw it on a x86_64 kvm-guest with Fedora 39 and copr-repository with vanilla kernel: Linux fedora 6.7.0-0.rc8.20240107gt52b1853b.366.vanilla.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Jan 7 06:17:30 UTC 2024 x86_64 GNU/Linux And reported it to libc-alpha ("FAILING nptl/tst-robust8pi" https://sourceware.org/pipermail/libc-alpha/2024-January/154150.html) where Florian Weimer pointed me to this thread. I've reduced the test (see attachement) and now have only one process with three threads. I only use one mutex with attributes like the original testcase: PTHREAD_MUTEX_ROBUST_NP, PTHREAD_PROCESS_SHARED, PTHREAD_PRIO_INHERIT. Every thread is doing a loop with pthread_mutex_timedlock(abstime={0,0}) and if locked, pthread_mutex_unlock. I've added some uprobes before and after the futex-syscall in __futex_lock_pi64(in pthread_mutex_timedlock) and futex_unlock_pi(in pthread_mutex_unlock). For me __ASSUME_FUTEX_LOCK_PI2 is not available, but __ASSUME_TIME64_SYSCALLS is defined. For me it looks like this (simplified ubprobes-trace): <thread> <timestamp>: <probe> t1 4309589.419744: before syscall in __futex_lock_pi64 t3 4309589.419745: before syscall in futex_unlock_pi t2 4309589.419745: before syscall in __futex_lock_pi64 t3 4309589.419747: after syscall in futex_unlock_pi t2 4309589.419747: after syscall in __futex_lock_pi64 ret=-22=EINVAL t1 4309589.419748: after syscall in __futex_lock_pi64 ret=-110=ETIMEDOUT Can you please have a look again? Bye, Stefan Liebler [-- Attachment #2: tst-robust8pi-20240118.c --] [-- Type: text/x-csrc, Size: 4161 bytes --] //CFLAGS=-pthread //LDFLAGS=-lpthread #include <stdio.h> #include <stdlib.h> #include <pthread.h> #include <assert.h> #include <errno.h> #include <unistd.h> #define NUM_THREADS 3 #define THREAD_FUNC thr_func #define USE_BARRIER 1 #ifndef ROUNDS # define ROUNDS 100000000 #endif typedef struct thr_info { int nr; pthread_t thread; } __attribute__ ((aligned (256))) thr_info_t; #define THR_INIT() \ thr_info_t *thr = (thr_info_t *) arg; #define THR_PRINTF(fmt, ...) \ printf ("#%d: " fmt, thr->nr, __VA_ARGS__) #define THR_PUTS(msg) \ printf ("#%d: " msg "\n", thr->nr) #if USE_BARRIER != 0 static pthread_barrier_t thrs_barrier; #endif static pthread_mutex_t mtx; static const struct timespec before = { 0, 0 }; /* ################################################################### thread func ############################################################### */ static void * thr_func (void *arg) { THR_INIT (); int state = 0; int fct; #if 0 /* 3 threads, 1xfct=0=pthread_mutex_lock, 2xfct=1=pthread_mutex_timedlock: EINVAL. */ fct = (thr->nr + 1) % 2; #elif 0 /* 3 threads, 2xfct=0=pthread_mutex_lock, 1xfct=1=pthread_mutex_timedlock: no fails. */ fct = (thr->nr) % 2; #elif 1 /* >3 threads, fct=1=only pthread_mutex_timedlock: EINVAL. */ fct = 1; #endif int round = 0; THR_PRINTF ("started: fct=%d\n", fct); #if USE_BARRIER != 0 pthread_barrier_wait (&thrs_barrier); #endif while (1) { if (state == 0) { round ++; int e; switch (fct) { case 0: e = pthread_mutex_lock (&mtx); if (e != 0) { THR_PRINTF ("mutex_lock failed with %d (round=%d)\n", e, round); exit (1); } state = 1; break; case 1: e = pthread_mutex_timedlock (&mtx, &before); if (e != 0 && e != ETIMEDOUT) { THR_PRINTF ("mutex_timedlock failed with %d (round=%d)\n", e, round); exit (1); } break; default: e = pthread_mutex_trylock (&mtx); if (e != 0 && e != EBUSY) { THR_PRINTF ("mutex_trylock failed with %d (round=%d)\n", e, round); exit (1); } break; } if (e == EOWNERDEAD) pthread_mutex_consistent (&mtx); if (e == 0 || e == EOWNERDEAD) state = 1; } else { int e = pthread_mutex_unlock (&mtx); if (e != 0) { THR_PRINTF ("mutex_unlock of failed with %d (round=%d)\n", e, round); exit (1); } state = 0; } if (round >= ROUNDS) { THR_PRINTF ("REACHED round %d. => exit\n", ROUNDS); if (state != 0) { int e = pthread_mutex_unlock (&mtx); if (e != 0) { THR_PRINTF ("mutex_unlock@exit of failed with %d (round=%d)\n", e, round); exit (1); } } break; } } return NULL; } int main (void) { int i; printf ("main: start %d threads.\n", NUM_THREADS); #if USE_BARRIER != 0 pthread_barrier_init (&thrs_barrier, NULL, NUM_THREADS + 1); #endif pthread_mutexattr_t ma; if (pthread_mutexattr_init (&ma) != 0) { puts ("mutexattr_init failed"); return 0; } if (pthread_mutexattr_setrobust (&ma, PTHREAD_MUTEX_ROBUST_NP) != 0) { puts ("mutexattr_setrobust failed"); return 1; } if (pthread_mutexattr_setpshared (&ma, PTHREAD_PROCESS_SHARED) != 0) { puts ("mutexattr_setpshared failed"); return 1; } if (pthread_mutexattr_setprotocol (&ma, PTHREAD_PRIO_INHERIT) != 0) { puts ("pthread_mutexattr_setprotocol failed"); return 1; } if (pthread_mutex_init (&mtx, &ma) != 0) { puts ("pthread_mutex_init failed"); return 1; } thr_info_t thrs[NUM_THREADS]; for (i = 0; i < NUM_THREADS; i++) { thrs[i].nr = i; assert (pthread_create (&(thrs[i].thread), NULL, THREAD_FUNC, &(thrs[i])) == 0);; } #if USE_BARRIER != 0 /* All threads start work after this barrier. */ pthread_barrier_wait (&thrs_barrier); #endif for (i = 0; i < NUM_THREADS; i++) { pthread_join (thrs[i].thread, NULL); } #if USE_BARRIER != 0 pthread_barrier_destroy (&thrs_barrier); #endif printf ("main: end.\n"); return EXIT_SUCCESS; } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2024-01-19 13:56 ` Stefan Liebler @ 2024-01-29 22:23 ` Edgecombe, Rick P 2024-01-30 10:12 ` Stefan Liebler 0 siblings, 1 reply; 14+ messages in thread From: Edgecombe, Rick P @ 2024-01-29 22:23 UTC (permalink / raw) To: peterz@infradead.org, stli@linux.ibm.com Cc: xry111@xry111.site, hca@linux.ibm.com, andrealmeid@igalia.com, fweimer@redhat.com, linux-mm@kvack.org, libc-alpha@sourceware.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, svens@linux.ibm.com, linux-api@vger.kernel.org, linux-arch@vger.kernel.org On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote: > I've reduced the test (see attachement) and now have only one process > with three threads. This tests fails on my setup as well: main: start 3 threads. #0: started: fct=1 #1: started: fct=1 #2: started: fct=1 #2: mutex_timedlock failed with 22 (round=28772) But, after this patch: https://lore.kernel.org/all/20240116130810.ji1YCxpg@linutronix.de/ ...the attached test hangs. However, the glibc test that was failing for me "nptl/tst-robustpi8" passes with the linked patch applied. So I think that patch fixes the issue I hit. What is passing supposed to look like on the attached test? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Several tst-robust* tests time out with recent Linux kernel 2024-01-29 22:23 ` Edgecombe, Rick P @ 2024-01-30 10:12 ` Stefan Liebler 0 siblings, 0 replies; 14+ messages in thread From: Stefan Liebler @ 2024-01-30 10:12 UTC (permalink / raw) To: Edgecombe, Rick P, peterz@infradead.org Cc: xry111@xry111.site, hca@linux.ibm.com, andrealmeid@igalia.com, fweimer@redhat.com, linux-mm@kvack.org, libc-alpha@sourceware.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, svens@linux.ibm.com, linux-api@vger.kernel.org, linux-arch@vger.kernel.org On 29.01.24 23:23, Edgecombe, Rick P wrote: > On Fri, 2024-01-19 at 14:56 +0100, Stefan Liebler wrote: >> I've reduced the test (see attachement) and now have only one process >> with three threads. > > This tests fails on my setup as well: > main: start 3 threads. > #0: started: fct=1 > #1: started: fct=1 > #2: started: fct=1 > #2: mutex_timedlock failed with 22 (round=28772) > > But, after this patch: > https://lore.kernel.org/all/20240116130810.ji1YCxpg@linutronix.de/ > > ...the attached test hangs. > > However, the glibc test that was failing for me "nptl/tst-robustpi8" > passes with the linked patch applied. So I think that patch fixes the > issue I hit. > > What is passing supposed to look like on the attached test? kernel commit "futex: Prevent the reuse of stale pi_state" https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=e626cb02ee8399fd42c415e542d031d185783903 fixes the issue on s390x. With this commit, the test runs to the end: main: start 3 threads. #0: started: fct=1 #1: started: fct=1 #2: started: fct=1 #2: REACHED round 100000000. => exit #0: REACHED round 100000000. => exit #1: REACHED round 100000000. => exit main: end. If you want you can reduce the number of rounds by compiling with -DROUNDS=XYZ or manually adjusting the ROUNDS macro define. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tip: locking/urgent] futex: Fix hardcoded flags 2023-11-14 20:14 ` Peter Zijlstra 2023-11-14 21:56 ` [tip: locking/urgent] futex: Fix hardcoded flags tip-bot2 for Peter Zijlstra 2023-11-15 1:11 ` Several tst-robust* tests time out with recent Linux kernel Edgecombe, Rick P @ 2023-11-15 3:07 ` tip-bot2 for Peter Zijlstra 2 siblings, 0 replies; 14+ messages in thread From: tip-bot2 for Peter Zijlstra @ 2023-11-15 3:07 UTC (permalink / raw) To: linux-tip-commits Cc: Xi Ruoyao, Florian Weimer, Peter Zijlstra (Intel), Ingo Molnar, stable, x86, linux-kernel The following commit has been merged into the locking/urgent branch of tip: Commit-ID: c9bd1568d5462f4108417518ce1af7b924acfb6f Gitweb: https://git.kernel.org/tip/c9bd1568d5462f4108417518ce1af7b924acfb6f Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Tue, 14 Nov 2023 21:36:13 +01:00 Committer: Ingo Molnar <mingo@kernel.org> CommitterDate: Wed, 15 Nov 2023 04:02:25 +01:00 futex: Fix hardcoded flags Xi reported that commit 5694289ce183 ("futex: Flag conversion") broke glibc's robust futex tests. This was narrowed down to the change of FLAGS_SHARED from 0x01 to 0x10, at which point Florian noted that handle_futex_death() has a hardcoded flags argument of 1. Change this to: FLAGS_SIZE_32 | FLAGS_SHARED, matching how futex_to_flags() unconditionally sets FLAGS_SIZE_32 for all legacy futex ops. Reported-by: Xi Ruoyao <xry111@xry111.site> Reported-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20231114201402.GA25315@noisy.programming.kicks-ass.net Fixes: 5694289ce183 ("futex: Flag conversion") Cc: <stable@vger.kernel.org> --- kernel/futex/core.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 52695c5..dad981a 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -700,7 +700,8 @@ retry: owner = uval & FUTEX_TID_MASK; if (pending_op && !pi && !owner) { - futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, + FUTEX_BITSET_MATCH_ANY); return 0; } @@ -752,8 +753,10 @@ retry: * Wake robust non-PI futexes here. The wakeup of * PI futexes happens in exit_pi_state(): */ - if (!pi && (uval & FUTEX_WAITERS)) - futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); + if (!pi && (uval & FUTEX_WAITERS)) { + futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1, + FUTEX_BITSET_MATCH_ANY); + } return 0; } ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-01-30 10:14 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <4bda9f2e06512e375e045f9e72edb205104af19c.camel@xry111.site>
2023-11-14 9:46 ` Several tst-robust* tests time out with recent Linux kernel Xi Ruoyao
2023-11-14 15:31 ` Peter Zijlstra
2023-11-14 15:40 ` Peter Zijlstra
2023-11-14 16:43 ` Florian Weimer
2023-11-14 20:14 ` Peter Zijlstra
2023-11-14 21:56 ` [tip: locking/urgent] futex: Fix hardcoded flags tip-bot2 for Peter Zijlstra
2023-11-15 1:11 ` Several tst-robust* tests time out with recent Linux kernel Edgecombe, Rick P
2023-11-15 8:51 ` Peter Zijlstra
2023-11-15 23:28 ` Edgecombe, Rick P
2023-11-17 1:22 ` Edgecombe, Rick P
2024-01-19 13:56 ` Stefan Liebler
2024-01-29 22:23 ` Edgecombe, Rick P
2024-01-30 10:12 ` Stefan Liebler
2023-11-15 3:07 ` [tip: locking/urgent] futex: Fix hardcoded flags tip-bot2 for Peter Zijlstra
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.