* Re: selftests/livepatch: question about dmesg "signaling remaining tasks"

From: Petr Mladek @ 2025-01-15 15:57 UTC
To: laokz@foxmail.com
Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence, live-patching@vger.kernel.org

On Wed 2025-01-15 08:32:12, laokz@foxmail.com wrote:
> During a livepatch transition, the kernel calls klp_try_complete_transition(), which in turn might call klp_send_signals(). klp_send_signals() has this code:
>
>     if (klp_signals_cnt == SIGNALS_TIMEOUT)
>         pr_notice("signaling remaining tasks\n");
>
> Do we need to match or filter out this message in check_result? And here klp_signals_cnt MUST be equal to SIGNALS_TIMEOUT, right?

Good question. Have you seen this message when running the selftests?

I wonder which test could trigger it. I do not recall any livepatch test where the transition might get blocked for too long.

There is the selftest with a blocked transition ("busy target module"), but the waiting is stopped much earlier there.

The message might get printed when the selftests are run on a huge and very busy system. But then we might also run into trouble with other timeouts. So it would be nice to know more details about when this happens.

Best Regards,
Petr
* Re: selftests/livepatch: question about dmesg "signaling remaining tasks"

From: laokz @ 2025-01-16 5:03 UTC
To: Petr Mladek
Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence, live-patching@vger.kernel.org

Hi Petr,

Thanks for the quick reply.

On 1/15/2025 11:57 PM, Petr Mladek wrote:
> On Wed 2025-01-15 08:32:12, laokz@foxmail.com wrote:
>> During a livepatch transition, the kernel calls klp_try_complete_transition(), which in turn might call klp_send_signals(). klp_send_signals() has this code:
>>
>>     if (klp_signals_cnt == SIGNALS_TIMEOUT)
>>         pr_notice("signaling remaining tasks\n");
>>
>> Do we need to match or filter out this message in check_result?
>> And here klp_signals_cnt MUST be equal to SIGNALS_TIMEOUT, right?

Oops, I got my second question wrong: klp_send_signals() is called whenever (klp_signals_cnt % SIGNALS_TIMEOUT == 0), which does not always mean equal.

> Good question. Have you seen this message when running the selftests?
>
> I wonder which test could trigger it. I do not recall any livepatch test where the transition might get blocked for too long.
>
> There is the selftest with a blocked transition ("busy target module"), but the waiting is stopped much earlier there.
>
> The message might get printed when the selftests are run on a huge and very busy system. But then we might also run into trouble with other timeouts. So it would be nice to know more details about when this happens.

We're trying to port livepatch to RISC-V. In my QEMU virt VM in a cloud environment, all tests passed except test-syscall.sh, which mostly complained about the unexpected dmesg line "signaling remaining tasks". I want to confirm with you experts whether this failure is expected in theory, or whether we could filter out this dmesg message completely.

Thanks,
laokz
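To make the "modulo versus equality" point above concrete, here is a minimal C model of the counter logic. It assumes, based on kernel/livepatch/transition.c in recent mainline trees, that SIGNALS_TIMEOUT is 15, that klp_try_complete_transition() calls klp_send_signals() when klp_signals_cnt is non-zero and divisible by SIGNALS_TIMEOUT, and that a failed transition is retried roughly once per second. The helpers signals_sent(), notice_printed() and signal_rounds() are hypothetical; this is a sketch, not the kernel code.

```c
#include <stdbool.h>

/* Assumption: matches SIGNALS_TIMEOUT in kernel/livepatch/transition.c
 * (15 in recent mainline trees; check your own tree). */
#define SIGNALS_TIMEOUT 15

/* Model of the caller's check in klp_try_complete_transition():
 * true when klp_send_signals() would run for this retry count. */
bool signals_sent(unsigned int cnt)
{
	return cnt && !(cnt % SIGNALS_TIMEOUT);
}

/* Model of the check inside klp_send_signals(): the pr_notice()
 * fires only on the first signaling round, not on later ones. */
bool notice_printed(unsigned int cnt)
{
	return signals_sent(cnt) && cnt == SIGNALS_TIMEOUT;
}

/* Number of signaling rounds among the first `retries` failed attempts
 * (counter values 0 .. retries - 1). */
unsigned int signal_rounds(unsigned int retries)
{
	unsigned int rounds = 0;

	for (unsigned int cnt = 0; cnt < retries; cnt++)
		if (signals_sent(cnt))
			rounds++;
	return rounds;
}
```

Under these assumptions, signals go out at counter values 15, 30, 45, and so on, but the notice is printed only once, after roughly 15 seconds of a stalled transition (assuming the counter is reset when a new transition starts). A selftest therefore sees at most one extra dmesg line per stalled transition.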
* Re: selftests/livepatch: question about dmesg "signaling remaining tasks"

From: Petr Mladek @ 2025-01-16 8:48 UTC
To: laokz
Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence, live-patching@vger.kernel.org

On Thu 2025-01-16 13:03:16, laokz wrote:
> We're trying to port livepatch to RISC-V. In my QEMU virt VM in a cloud environment, all tests passed except test-syscall.sh, which mostly complained about the unexpected dmesg line "signaling remaining tasks". I want to confirm with you experts whether this failure is expected in theory, or whether we could filter out this dmesg message completely.

The test-syscall.sh test spawns many processes that call the SYS_getpid syscall in a busy loop. I could imagine that it might cause problems when the virt VM emulates many more virtual CPUs than the real CPUs assigned to it. It might be even worse when the RISC-V processor is fully emulated on another architecture.

Anyway, we have already limited the maximum number of processes because their messages overflowed the default log buffer size; see commit 46edf5d7aed54380 ("selftests/livepatch: define max test-syscall processes").

Does it help to reduce the MAXPROC limit from 128 to 64, 32, or 16? IMHO, even 16 processes are good enough. We do not need to waste that many resources on QA.

You might also review the setup of your VM and reduce the number of emulated CPUs. If the VM is not able to reasonably handle a high load, then it might show false positives in many tests.

If nothing helps, feel free to send a patch filtering out the "signaling remaining tasks" message. IMHO, it is perfectly fine to hide this message. Just extend the already existing filter in the check_result function.

Best Regards,
Petr
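Petr's suggestion works because of how check_result selects dmesg lines. The livepatch core prints its messages with a "livepatch: " prefix, so the notice shows up as "livepatch: signaling remaining tasks" and passes the include filter, which makes the output differ from the expected log. The C model below is hypothetical: the real check_result is a shell pipeline in tools/testing/selftests/livepatch/functions.sh, and the pattern lists here (including the tainting-kernel exclusions) are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed include patterns: lines from the livepatch core and test modules. */
static const char *const include_patterns[] = { "livepatch:", "test_klp" };

/* Assumed exclude patterns: existing noise, plus the proposed addition. */
static const char *const exclude_patterns[] = {
	"tainting kernel",
	"taints kernel",
	"livepatch: signaling remaining tasks",	/* proposed new exclusion */
};

/* True if `line` contains any of the `n` substrings in `pats`. */
bool contains_any(const char *line, const char *const *pats, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(line, pats[i]))
			return true;
	return false;
}

/* True if a dmesg line should be compared against the expected output. */
bool keep_dmesg_line(const char *line)
{
	return contains_any(line, include_patterns, ARRAY_SIZE(include_patterns)) &&
	       !contains_any(line, exclude_patterns, ARRAY_SIZE(exclude_patterns));
}
```

An actual patch would add the equivalent exclusion to the existing grep pipeline in check_result; the C version only illustrates the include-then-exclude ordering.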
* Re: selftests/livepatch: question about dmesg "signaling remaining tasks"

From: laokz @ 2025-01-16 14:17 UTC
To: Petr Mladek
Cc: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Joe Lawrence, live-patching@vger.kernel.org

Hi Petr,

On 1/16/2025 4:48 PM, Petr Mladek wrote:
> Does it help to reduce the MAXPROC limit from 128 to 64, 32, or 16? IMHO, even 16 processes are good enough. We do not need to waste that many resources on QA.
>
> You might also review the setup of your VM and reduce the number of emulated CPUs. If the VM is not able to reasonably handle a high load, then it might show false positives in many tests.
>
> If nothing helps, feel free to send a patch filtering out the "signaling remaining tasks" message. IMHO, it is perfectly fine to hide this message. Just extend the already existing filter in the check_result function.

With your help, I tried decreasing MAXPROC: no luck. Decreasing the VM's '-smp 8' to 4 worked: all tests passed in all 5 runs (with MAXPROC unmodified). So it was my emulation environment that triggered the false positive. If we later face the same problem on a real machine, we'll try patching the filter.

Thanks a lot,
laokz
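One possible explanation for why lowering MAXPROC did not help while lowering -smp did: assuming test-syscall.sh starts one busy-loop getpid process per online CPU and uses MAXPROC only as an upper cap (my reading of commit 46edf5d7aed54380; please check the script in your tree), the cap never takes effect on an 8-vCPU guest. A minimal C model, with the hypothetical helper busy_loop_procs():

```c
/* Hypothetical model of how test-syscall.sh sizes its workload.
 * Assumption: one busy-loop process per online CPU, capped at `maxproc`
 * (MAXPROC, 128 by default in the script). */
unsigned int busy_loop_procs(unsigned int online_cpus, unsigned int maxproc)
{
	return online_cpus < maxproc ? online_cpus : maxproc;
}
```

Under this assumption, an 8-vCPU guest runs 8 processes whether MAXPROC is 128 or 16, while going from -smp 8 to -smp 4 halves the load, which would match the results reported above.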
* Re: selftests/livepatch: question about dmesg "signaling remaining tasks"

From: Miroslav Benes @ 2025-01-17 13:10 UTC
To: laokz
Cc: Petr Mladek, Josh Poimboeuf, Jiri Kosina, Joe Lawrence, live-patching@vger.kernel.org

Hi,

> We're trying to port livepatch to RISC-V. In my QEMU virt VM in a cloud environment, all tests passed except test-syscall.sh, which mostly complained about the unexpected dmesg line "signaling remaining tasks". I want to confirm with you experts whether this failure is expected in theory, or whether we could filter out this dmesg message completely.

It might also mean that the RISC-V implementation is not complete yet. If there are many unreliable stack traces, for example, the live patching infrastructure retries many times, which causes delays, and you might eventually run into the message. It pays off to enable dynamic_debug for kernel/livepatch/ and check whether there is anything suspicious in the output.

Regards,
Miroslav
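Miroslav's dynamic_debug suggestion amounts to writing one query to the dynamic debug control file, which is usually done from a shell with echo 'file kernel/livepatch/* +p' > /sys/kernel/debug/dynamic_debug/control. The sketch below does the same from C. The function name enable_livepatch_ddebug() is hypothetical, and it assumes a kernel built with CONFIG_DYNAMIC_DEBUG, debugfs mounted at /sys/kernel/debug, and root privileges.

```c
#include <stdio.h>

/* Dynamic debug query enabling every pr_debug() callsite in files under
 * kernel/livepatch/ (the "file" keyword accepts a shell-style glob). */
static const char klp_ddebug_query[] = "file kernel/livepatch/* +p";

/* Writes the query to the control file at `control_path`, normally
 * /sys/kernel/debug/dynamic_debug/control. Returns 0 on success, -1 on
 * error. stdio buffers the write, so a query rejected by the kernel is
 * reported when fclose() flushes it. */
int enable_livepatch_ddebug(const char *control_path)
{
	FILE *f = fopen(control_path, "w");

	if (!f)
		return -1;
	if (fputs(klp_ddebug_query, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

Afterwards, the livepatch pr_debug() output appears in dmesg. In recent trees klp_try_switch_task() reports tasks that are still running or have an unreliable stack, which is the kind of message worth looking for here (check your tree's source for the exact wording).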
* Re: selftests/livepatch: question about dmesg "signaling remaining tasks"

From: laokz @ 2025-01-20 6:38 UTC
To: Miroslav Benes
Cc: Petr Mladek, Josh Poimboeuf, Jiri Kosina, Joe Lawrence, live-patching@vger.kernel.org

Hi Miroslav,

On 1/17/2025 9:10 PM, Miroslav Benes wrote:
> It might also mean that the RISC-V implementation is not complete yet. If there are many unreliable stack traces, for example, the live patching infrastructure retries many times, which causes delays, and you might eventually run into the message. It pays off to enable dynamic_debug for kernel/livepatch/ and check whether there is anything suspicious in the output.

Yes, this is still work in progress, and we'll try your suggestion to help with development.

Thanks a lot,
laokz