* Splat in kernel RT while processing incoming network packets
@ 2023-07-03 12:47 Wander Lairson Costa
  2023-07-03 13:20 ` Wander Lairson Costa
  2023-07-03 14:29 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 9+ messages in thread

From: Wander Lairson Costa @ 2023-07-03 12:47 UTC (permalink / raw)
  To: linux-kernel, linux-rt-users, bigeasy, juri.lelli

Dear all,

I am writing to report a splat issue we encountered while running the
Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet
Steering).

During some testing of the RT kernel version 6.4.0 with Network RPS enabled,
we observed a splat occurring in the SoftIRQ subsystem. The splat message is
as follows:

[ 37.168920] ------------[ cut here ]------------
[ 37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60
[ 37.168935] Modules linked in: xt_conntrack(E) ...
[ 37.168976] Unloaded tainted modules: intel_cstate(E):4 intel_uncore(E):3
[ 37.168994] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E ------- --- 6.4.0-0.rc2.23.test.eln127.x86_64+rt #1
[ 37.168996] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
[ 37.168998] RIP: 0010:do_softirq_post_smp_call_flush+0x2d/0x60
[ 37.169001] Code: 00 0f 1f 44 00 00 53 89 fb 48 c7 c7 f7 98 be 96 e8 d8 97 d2 00 65 66 8b 05 f8 36 ...
[ 37.169002] RSP: 0018:ffffffff97403eb0 EFLAGS: 00010002
[ 37.169004] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000003
[ 37.169005] RDX: ffff992db7a34840 RSI: ffffffff96be98f7 RDI: ffffffff96bc23d8
[ 37.169006] RBP: ffffffff97410000 R08: ffff992db7a34840 R09: ffff992c87f8dbc0
[ 37.169007] R10: 00000000fffbfc67 R11: 0000000000000018 R12: 0000000000000000
[ 37.169008] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 37.169011] FS: 0000000000000000(0000) GS:ffff992db7a00000(0000) knlGS:0000000000000000
[ 37.169013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.169014] CR2: 00007f028b8da3f8 CR3: 0000000118f44001 CR4: 0000000000370eb0
[ 37.169015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 37.169015] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 37.169016] Call Trace:
[ 37.169018]  <TASK>
[ 37.169020]  flush_smp_call_function_queue+0x78/0x80
[ 37.169026]  do_idle+0xb2/0xd0
[ 37.169030]  cpu_startup_entry+0x1d/0x20
[ 37.169032]  rest_init+0xd1/0xe0
[ 37.169037]  arch_call_rest_init+0xe/0x30
[ 37.169044]  start_kernel+0x342/0x420
[ 37.169046]  x86_64_start_reservations+0x18/0x30
[ 37.169051]  x86_64_start_kernel+0x96/0xa0
[ 37.169054]  secondary_startup_64_no_verify+0x10b/0x10b
[ 37.169059]  </TASK>
[ 37.169060] ---[ end trace 0000000000000000 ]---

It comes from [1].

The issue lies in the mechanism RPS uses to defer network packet processing
to other CPUs: it sends an IPI to the target CPU. The registered callback is
rps_trigger_softirq, which will raise a softirq, leading to the following
scenario:

 CPU0                                    CPU1
 |  netif_rx()                           |
 |    enqueue_to_backlog(cpu=1)          |
 |      net_rps_send_ipi()               |
 |                                       |  flush_smp_call_function_queue()
 |                                       |    was_pending = local_softirq_pending()
 |                                       |    __flush_smp_call_function_queue()
 |                                       |      rps_trigger_softirq()
 |                                       |        __raise_softirq_irqoff()
 |                                       |    do_softirq_post_smp_call_flush()

That has the undesired side effect of raising a softirq from within an SMP
function call, leading to the aforementioned splat.

The kernel version is kernel-ark [1], os-build-rt branch. It is essentially
the upstream kernel with the PREEMPT_RT patches and with the RHEL configs. I
can provide the .config.

The only solution I have imagined so far is to modify RPS to process packets
in a kernel thread on RT. But I wonder how that would be different from
processing them in ksoftirqd.

Any inputs on the issue?

[1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306

Cheers,
Wander

^ permalink raw reply	[flat|nested] 9+ messages in thread
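For reference, here is roughly what the two pieces involved look like. This is
a simplified paraphrase of net/core/dev.c and kernel/softirq.c rather than
verbatim kernel code: the RPS IPI callback schedules the backlog NAPI (which
raises NET_RX_SOFTIRQ), and the post-flush check warns when softirqs became
pending during the flush.

    /* RPS IPI callback (net/core/dev.c, simplified): runs on the target CPU
     * from __flush_smp_call_function_queue() and schedules the per-CPU
     * backlog NAPI, which ends in __raise_softirq_irqoff(NET_RX_SOFTIRQ). */
    static void rps_trigger_softirq(void *data)
    {
            struct softnet_data *sd = data;

            ____napi_schedule(sd, &sd->backlog);
            sd->received_rps++;
    }

    /* Post-flush check (kernel/softirq.c, simplified): on PREEMPT_RT there
     * is no suitable task context here to run a newly raised softirq in,
     * hence the WARN at kernel/softirq.c:291 seen in the trace above. */
    void do_softirq_post_smp_call_flush(unsigned int was_pending)
    {
            if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
                    invoke_softirq();
    }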
* Re: Splat in kernel RT while processing incoming network packets 2023-07-03 12:47 Splat in kernel RT while processing incoming network packets Wander Lairson Costa @ 2023-07-03 13:20 ` Wander Lairson Costa 2023-07-03 14:29 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 9+ messages in thread From: Wander Lairson Costa @ 2023-07-03 13:20 UTC (permalink / raw) To: linux-kernel, linux-rt-users, bigeasy, juri.lelli On Mon, Jul 03, 2023 at 09:47:26AM -0300, Wander Lairson Costa wrote: > Dear all, > > I am writing to report a splat issue we encountered while running the > Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet > Steering). > > During some testing of the RT kernel version 6.4.0 with Network RPS enabled, > we observed a splat occurring in the SoftIRQ subsystem. The splat message is as > follows: > > [ 37.168920] ------------[ cut here ]------------ > [ 37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60 > [ 37.168935] Modules linked in: xt_conntrack(E) ... > [ 37.168976] Unloaded tainted modules: intel_cstate(E):4 intel_uncore(E):3 > [ 37.168994] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E ------- --- 6.4.0-0.rc2.23.test.eln127.x86_64+rt #1 > [ 37.168996] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014 > [ 37.168998] RIP: 0010:do_softirq_post_smp_call_flush+0x2d/0x60 > [ 37.169001] Code: 00 0f 1f 44 00 00 53 89 fb 48 c7 c7 f7 98 be 96 e8 d8 97 d2 00 65 66 8b 05 f8 36 ... > [ 37.169002] RSP: 0018:ffffffff97403eb0 EFLAGS: 00010002 > [ 37.169004] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000003 > [ 37.169005] RDX: ffff992db7a34840 RSI: ffffffff96be98f7 RDI: ffffffff96bc23d8 > [ 37.169006] RBP: ffffffff97410000 R08: ffff992db7a34840 R09: ffff992c87f8dbc0 > [ 37.169007] R10: 00000000fffbfc67 R11: 0000000000000018 R12: 0000000000000000 > [ 37.169008] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 37.169011] FS: 0000000000000000(0000) GS:ffff992db7a00000(0000) knlGS:0000000000000000 > [ 37.169013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 37.169014] CR2: 00007f028b8da3f8 CR3: 0000000118f44001 CR4: 0000000000370eb0 > [ 37.169015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 37.169015] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 37.169016] Call Trace: > [ 37.169018] <TASK> > [ 37.169020] flush_smp_call_function_queue+0x78/0x80 > [ 37.169026] do_idle+0xb2/0xd0 > [ 37.169030] cpu_startup_entry+0x1d/0x20 > [ 37.169032] rest_init+0xd1/0xe0 > [ 37.169037] arch_call_rest_init+0xe/0x30 > [ 37.169044] start_kernel+0x342/0x420 > [ 37.169046] x86_64_start_reservations+0x18/0x30 > [ 37.169051] x86_64_start_kernel+0x96/0xa0 > [ 37.169054] secondary_startup_64_no_verify+0x10b/0x10b > [ 37.169059] </TASK> > [ 37.169060] ---[ end trace 0000000000000000 ]--- > > It comes from [1]. > > The issue lies in the mechanism of RPS to defer network packets processing to > other CPUs. It sends an IPI to the to the target CPU. 
The registered callback > is rps_trigger_softirq, which will raise a softirq, leading to the following > scenario: > > CPU0 CPU1 > | netif_rx() | > | | enqueue_to_backlog(cpu=1) | > | | | net_rps_send_ipi() | > | | flush_smp_call_function_queue() > | | | was_pending = local_softirq_pending() > | | | __flush_smp_call_function_queue() > | | | rps_trigger_softirq() > | | | | __raise_softirq_irqoff() > | | | do_softirq_post_smp_call_flush() > > That has the undesired side effect of raising a softirq in a function call, > leading to the aforementioned splat. > > The kernel version is kernel-ark [1], os-build-rt branch. It is essentially the Correction: kernel-ark [2] > upstream kernel with the PREEMPT_RT patches, and with RHEL configs. I can > provide the .config. > > The only solution I imagined so far was to modify RPS to process packtes in a > kernel thread in RT. But I wonder how would be that be different than processing > them in ksoftirqd. > > Any inputs on the issue? > > [1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306 > [2] https://gitlab.com/cki-project/kernel-ark > Cheers, > Wander > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Splat in kernel RT while processing incoming network packets 2023-07-03 12:47 Splat in kernel RT while processing incoming network packets Wander Lairson Costa 2023-07-03 13:20 ` Wander Lairson Costa @ 2023-07-03 14:29 ` Sebastian Andrzej Siewior 2023-07-03 21:15 ` Wander Lairson Costa 1 sibling, 1 reply; 9+ messages in thread From: Sebastian Andrzej Siewior @ 2023-07-03 14:29 UTC (permalink / raw) To: Wander Lairson Costa; +Cc: linux-kernel, linux-rt-users, juri.lelli On 2023-07-03 09:47:26 [-0300], Wander Lairson Costa wrote: > Dear all, Hi, > I am writing to report a splat issue we encountered while running the > Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet > Steering). > > During some testing of the RT kernel version 6.4.0 with Network RPS enabled, > we observed a splat occurring in the SoftIRQ subsystem. The splat message is as > follows: > > [ 37.168920] ------------[ cut here ]------------ > [ 37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60 … > [ 37.169060] ---[ end trace 0000000000000000 ]--- > > It comes from [1]. > > The issue lies in the mechanism of RPS to defer network packets processing to > other CPUs. It sends an IPI to the to the target CPU. The registered callback > is rps_trigger_softirq, which will raise a softirq, leading to the following > scenario: > > CPU0 CPU1 > | netif_rx() | > | | enqueue_to_backlog(cpu=1) | > | | | net_rps_send_ipi() | > | | flush_smp_call_function_queue() > | | | was_pending = local_softirq_pending() > | | | __flush_smp_call_function_queue() > | | | rps_trigger_softirq() > | | | | __raise_softirq_irqoff() > | | | do_softirq_post_smp_call_flush() > > That has the undesired side effect of raising a softirq in a function call, > leading to the aforementioned splat. correct. > The kernel version is kernel-ark [1], os-build-rt branch. It is essentially the > upstream kernel with the PREEMPT_RT patches, and with RHEL configs. I can > provide the .config. It is fine, I see it. > The only solution I imagined so far was to modify RPS to process packtes in a > kernel thread in RT. But I wonder how would be that be different than processing > them in ksoftirqd. > > Any inputs on the issue? Not sure how to proceed. One thing you could do is a hack similar like net-Avoid-the-IPI-to-free-the.patch which does it for defer_csd. On the other hand we could drop net-Avoid-the-IPI-to-free-the.patch and remove the warning because we have now commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"") Prior that, raising softirq from hardirq would wake ksoftirqd which in turn would collect all pending softirqs. As a consequence all following softirqs (networking, …) would run as SCHED_OTHER and compete with SCHED_OTHER tasks for resources. Not good because the networking work is no longer processed within the networking interrupt thread. Also not a DDoS kind of situation where one could want to delay processing. With that change, this isn't the case anymore. Only an "unrelated" IRQ thread could pick up the networking work which is less then ideal. That is because the global softirq set is added, ksoftirq is marked for a wakeup and could be delayed because other tasks are busy. Then the disk interrupt (for instance) could pick it up as part of its threaded interrupt. Now that I think about, we could make the backlog pseudo device a thread. NAPI threading enables one thread but here we would need one thread per-CPU. So it would remain kind of special. 
But we would avoid clobbering the global state and deferring everything to ksoftirqd. Processing it in ksoftirqd might not be ideal from a performance point of view. > [1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306 > > Cheers, > Wander Sebastian ^ permalink raw reply [flat|nested] 9+ messages in thread
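A minimal sketch of the per-CPU backlog-thread idea mentioned above. This is
purely hypothetical, nothing like it exists in the tree; the names
(rps_backlog_thread, rps_backlog_thread_fn, rps_trigger_thread) and the
wake-up condition are invented for illustration.

    /* Hypothetical: one kthread per CPU owns the RPS backlog; the IPI
     * callback only wakes it, so no softirq is raised from the
     * SMP-function-call flush. */
    static DEFINE_PER_CPU(struct task_struct *, rps_backlog_thread);

    static int rps_backlog_thread_fn(void *data)
    {
            struct softnet_data *sd = data;

            while (!kthread_should_stop()) {
                    set_current_state(TASK_INTERRUPTIBLE);
                    if (skb_queue_empty_lockless(&sd->input_pkt_queue))
                            schedule();
                    __set_current_state(TASK_RUNNING);

                    local_bh_disable();
                    /* drain sd->input_pkt_queue, e.g. via the backlog NAPI
                     * poll (process_backlog()); details elided in this
                     * sketch */
                    local_bh_enable();
            }
            return 0;
    }

    /* Replacement for rps_trigger_softirq() in this scheme: */
    static void rps_trigger_thread(void *data)
    {
            wake_up_process(this_cpu_read(rps_backlog_thread));
    }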
* Re: Splat in kernel RT while processing incoming network packets 2023-07-03 14:29 ` Sebastian Andrzej Siewior @ 2023-07-03 21:15 ` Wander Lairson Costa 2023-07-04 10:05 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 9+ messages in thread From: Wander Lairson Costa @ 2023-07-03 21:15 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Paolo Abeni Cc: linux-kernel, linux-rt-users, juri.lelli On Mon, Jul 03, 2023 at 04:29:08PM +0200, Sebastian Andrzej Siewior wrote: > On 2023-07-03 09:47:26 [-0300], Wander Lairson Costa wrote: > > Dear all, > Hi, > > > I am writing to report a splat issue we encountered while running the > > Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet > > Steering). > > > > During some testing of the RT kernel version 6.4.0 with Network RPS enabled, > > we observed a splat occurring in the SoftIRQ subsystem. The splat message is as > > follows: > > > > [ 37.168920] ------------[ cut here ]------------ > > [ 37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60 > … > > [ 37.169060] ---[ end trace 0000000000000000 ]--- > > > > It comes from [1]. > > > > The issue lies in the mechanism of RPS to defer network packets processing to > > other CPUs. It sends an IPI to the to the target CPU. The registered callback > > is rps_trigger_softirq, which will raise a softirq, leading to the following > > scenario: > > > > CPU0 CPU1 > > | netif_rx() | > > | | enqueue_to_backlog(cpu=1) | > > | | | net_rps_send_ipi() | > > | | flush_smp_call_function_queue() > > | | | was_pending = local_softirq_pending() > > | | | __flush_smp_call_function_queue() > > | | | rps_trigger_softirq() > > | | | | __raise_softirq_irqoff() > > | | | do_softirq_post_smp_call_flush() > > > > That has the undesired side effect of raising a softirq in a function call, > > leading to the aforementioned splat. > > correct. > > > The kernel version is kernel-ark [1], os-build-rt branch. It is essentially the > > upstream kernel with the PREEMPT_RT patches, and with RHEL configs. I can > > provide the .config. > > It is fine, I see it. > > > The only solution I imagined so far was to modify RPS to process packtes in a > > kernel thread in RT. But I wonder how would be that be different than processing > > them in ksoftirqd. > > > > Any inputs on the issue? > > Not sure how to proceed. One thing you could do is a hack similar like > net-Avoid-the-IPI-to-free-the.patch which does it for defer_csd. At first sight it seems straightforward to implement. > On the other hand we could drop net-Avoid-the-IPI-to-free-the.patch and > remove the warning because we have now commit > d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"") But I am more in favor of a solution that removes code than one that adds more :) > > Prior that, raising softirq from hardirq would wake ksoftirqd which in > turn would collect all pending softirqs. As a consequence all following > softirqs (networking, …) would run as SCHED_OTHER and compete with > SCHED_OTHER tasks for resources. Not good because the networking work is > no longer processed within the networking interrupt thread. Also not a > DDoS kind of situation where one could want to delay processing. > > With that change, this isn't the case anymore. Only an "unrelated" IRQ > thread could pick up the networking work which is less then ideal. That > is because the global softirq set is added, ksoftirq is marked for a > wakeup and could be delayed because other tasks are busy. 
Then the disk > interrupt (for instance) could pick it up as part of its threaded > interrupt. > > Now that I think about, we could make the backlog pseudo device a > thread. NAPI threading enables one thread but here we would need one > thread per-CPU. So it would remain kind of special. But we would avoid > clobbering the global state and delay everything to ksoftird. Processing > it in ksoftirqd might not be ideal from performance point of view. Before sending this to the ML, I talked to Paolo about using a NAPI thread. He explained that it is implemented per interface. For example, for this specific case, it happened on the loopback interface, which doesn't implement NAPI. I am cc'ing him, so he can correct me if I am saying something wrong. > > > [1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306 > > > > Cheers, > > Wander > > Sebastian > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Splat in kernel RT while processing incoming network packets 2023-07-03 21:15 ` Wander Lairson Costa @ 2023-07-04 10:05 ` Sebastian Andrzej Siewior 2023-07-04 10:29 ` Paolo Abeni 0 siblings, 1 reply; 9+ messages in thread From: Sebastian Andrzej Siewior @ 2023-07-04 10:05 UTC (permalink / raw) To: Wander Lairson Costa Cc: Paolo Abeni, linux-kernel, linux-rt-users, juri.lelli On 2023-07-03 18:15:58 [-0300], Wander Lairson Costa wrote: > > Not sure how to proceed. One thing you could do is a hack similar like > > net-Avoid-the-IPI-to-free-the.patch which does it for defer_csd. > > At first sight it seems straightforward to implement. > > > On the other hand we could drop net-Avoid-the-IPI-to-free-the.patch and > > remove the warning because we have now commit > > d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"") > > But I am more in favor of a solution that removes code than one that > adds more :) Raising the softirq from anonymous (hardirq context) is not ideal for the reasons I stated below. > > Prior that, raising softirq from hardirq would wake ksoftirqd which in > > turn would collect all pending softirqs. As a consequence all following > > softirqs (networking, …) would run as SCHED_OTHER and compete with > > SCHED_OTHER tasks for resources. Not good because the networking work is > > no longer processed within the networking interrupt thread. Also not a > > DDoS kind of situation where one could want to delay processing. > > > > With that change, this isn't the case anymore. Only an "unrelated" IRQ > > thread could pick up the networking work which is less then ideal. That > > is because the global softirq set is added, ksoftirq is marked for a > > wakeup and could be delayed because other tasks are busy. Then the disk > > interrupt (for instance) could pick it up as part of its threaded > > interrupt. > > > > Now that I think about, we could make the backlog pseudo device a > > thread. NAPI threading enables one thread but here we would need one > > thread per-CPU. So it would remain kind of special. But we would avoid > > clobbering the global state and delay everything to ksoftird. Processing > > it in ksoftirqd might not be ideal from performance point of view. > > Before sending this to the ML, I talked to Paolo about using NAPI > thread. He explained that it is implemented per interface. For example, > for this specific case, it happened on the loopback interface, which > doesn't implement NAPI. I am cc'ing him, so the can correct me if I am > saying something wrong. It is per NAPI-queue/instance and you could have multiple instances per interface. However loopback has one and you need per-CPU threads if you want to RPS your skbs to any CPU. We could just remove the warning but then your RPS processes the skbs in SCHED_OTHER. This might not be what you want. Maybe Paolo has a better idea. > > > Cheers, > > > Wander Sebastian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Splat in kernel RT while processing incoming network packets 2023-07-04 10:05 ` Sebastian Andrzej Siewior @ 2023-07-04 10:29 ` Paolo Abeni 2023-07-04 14:47 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 9+ messages in thread From: Paolo Abeni @ 2023-07-04 10:29 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Wander Lairson Costa Cc: linux-kernel, linux-rt-users, juri.lelli On Tue, 2023-07-04 at 12:05 +0200, Sebastian Andrzej Siewior wrote: > On 2023-07-03 18:15:58 [-0300], Wander Lairson Costa wrote: > > > Not sure how to proceed. One thing you could do is a hack similar like > > > net-Avoid-the-IPI-to-free-the.patch which does it for defer_csd. > > > > At first sight it seems straightforward to implement. > > > > > On the other hand we could drop net-Avoid-the-IPI-to-free-the.patch and > > > remove the warning because we have now commit > > > d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"") > > > > But I am more in favor of a solution that removes code than one that > > adds more :) > > Raising the softirq from anonymous (hardirq context) is not ideal for > the reasons I stated below. > > > > Prior that, raising softirq from hardirq would wake ksoftirqd which in > > > turn would collect all pending softirqs. As a consequence all following > > > softirqs (networking, …) would run as SCHED_OTHER and compete with > > > SCHED_OTHER tasks for resources. Not good because the networking work is > > > no longer processed within the networking interrupt thread. Also not a > > > DDoS kind of situation where one could want to delay processing. > > > > > > With that change, this isn't the case anymore. Only an "unrelated" IRQ > > > thread could pick up the networking work which is less then ideal. That > > > is because the global softirq set is added, ksoftirq is marked for a > > > wakeup and could be delayed because other tasks are busy. Then the disk > > > interrupt (for instance) could pick it up as part of its threaded > > > interrupt. > > > > > > Now that I think about, we could make the backlog pseudo device a > > > thread. NAPI threading enables one thread but here we would need one > > > thread per-CPU. So it would remain kind of special. But we would avoid > > > clobbering the global state and delay everything to ksoftird. Processing > > > it in ksoftirqd might not be ideal from performance point of view. > > > > Before sending this to the ML, I talked to Paolo about using NAPI > > thread. He explained that it is implemented per interface. For example, > > for this specific case, it happened on the loopback interface, which > > doesn't implement NAPI. I am cc'ing him, so the can correct me if I am > > saying something wrong. > > It is per NAPI-queue/instance and you could have multiple instances per > interface. However loopback has one and you need per-CPU threads if you > want to RPS your skbs to any CPU. Just to hopefully clarify the networking side of it, napi instances != network backlog (used by RPS). The network backlog (RPS) is available for all the network devices, including the loopback and all the virtual ones. The napi instances (and the threaded mode) are available only on network device drivers implementing the napi model. The loopback driver does not implement the napi model, as most virtual devices and even some H/W NICs (mostily low end ones). The network backlog can't run in threaded mode: there is no API/sysctl nor infrastructure for that. 
The backlog processing threaded mode could be implemented, even if it should
not be completely trivial, and it sounds a bit weird to me.

Just for the record, I mentioned the following in the bz:

It looks like flush_smp_call_function_queue() has only 2 callers: migration
and do_idle().

What about moving softirq processing from flush_smp_call_function_queue()
into cpu_stopper_thread(), outside the unpreemptable critical section?

I *think*/wild guess the call from do_idle() could just be removed (at least
for RT builds), as, according to:

commit b2a02fc43a1f40ef4eb2fb2b06357382608d4d84
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Tue May 26 18:11:01 2020 +0200

    smp: Optimize send_call_function_single_ipi()

it is just an optimization.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread
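A rough illustration of the cpu_stopper_thread() proposal above. This is not a
real patch; the helper name and the exact hook point are assumptions. The
point is only that the stopper already runs in schedulable task context, so a
pending-softirq flush there would not hit the idle-path warning.

    /* Hypothetical helper that cpu_stopper_thread() could call after running
     * its work item, outside the non-preemptible section (name invented). */
    static void stopper_flush_softirqs(void)
    {
            if (!local_softirq_pending())
                    return;

            /* On PREEMPT_RT a disable/enable pair of the BH counter
             * processes whatever is pending in the current task's context. */
            local_bh_disable();
            local_bh_enable();
    }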
* Re: Splat in kernel RT while processing incoming network packets 2023-07-04 10:29 ` Paolo Abeni @ 2023-07-04 14:47 ` Sebastian Andrzej Siewior 2023-07-05 15:59 ` Wander Lairson Costa 0 siblings, 1 reply; 9+ messages in thread From: Sebastian Andrzej Siewior @ 2023-07-04 14:47 UTC (permalink / raw) To: Paolo Abeni Cc: Wander Lairson Costa, linux-kernel, linux-rt-users, juri.lelli On 2023-07-04 12:29:33 [+0200], Paolo Abeni wrote: > Just to hopefully clarify the networking side of it, napi instances != > network backlog (used by RPS). The network backlog (RPS) is available > for all the network devices, including the loopback and all the virtual > ones. Yes. > The napi instances (and the threaded mode) are available only on > network device drivers implementing the napi model. The loopback driver > does not implement the napi model, as most virtual devices and even > some H/W NICs (mostily low end ones). Yes. > The network backlog can't run in threaded mode: there is no API/sysctl > nor infrastructure for that. The backlog processing threaded mode could > be implemented, even if should not be completely trivial and it sounds > a bit weird to me. Yes, I mean that this needs to be done. > > Just for the records, I mentioned the following in the bz: > > It looks like flush_smp_call_function_queue() has 2 only callers, > migration, and do_idle(). > > What about moving softirq processing from > flush_smp_call_function_queue() into cpu_stopper_thread(), outside the > unpreemptable critical section? This doesn't solve anything. You schedule softirq from hardirq and from this moment on you are in "anonymous context" and we solve this by processing it in ksoftirqd. For !RT you process it while leaving the hardirq. For RT, we can't. Processing it in the context of the currently running process (say idle as in the reported backtrace or an another running user task) would lead to processing network related that originated somewhere at someone else's expense. Assume you have a high prio RT task running, not related to networking at all, and suddenly you throw a bunch of skbs on it. Therefore it is preferred to process them within the interrupt thread in which the softirq was raised/ within its origin. The other problem with ksoftirqd processing is that everything is added to a global state and then left for ksoftirqd to process. The global state is considered by every local_bh_enable() instance so random interrupt thread could process it or even a random task doing a syscall involving spin_lock_bh(). The NAPI-threads are nice in a way that they don't clobber the global state. For RPS we would need either per-CPU threads or serve this in ksoftirqd/X. The additional thread per-CPU makes only sense if it runs at higher priority. However without the priority it would be no different to ksoftirqd unless it does only the backlog's work. puh. I'm undecided here. We might want to throw it into ksoftirqd, remove the warning. But then this will be processed with other softirqs (like USB due to tasklet) and at some point and might be picked up by another interrupt thread. > Cheers, > > Paolo Sebastian ^ permalink raw reply [flat|nested] 9+ messages in thread
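To make the "global state" point above concrete, this is roughly what
__raise_softirq_irqoff() does (kernel/softirq.c, paraphrased rather than
quoted verbatim): it only sets a bit in the per-CPU pending mask, so nothing
records which context the work belongs to, and whichever thread next flushes
softirqs on that CPU inherits it.

    void __raise_softirq_irqoff(unsigned int nr)
    {
            lockdep_assert_irqs_disabled();
            trace_softirq_raise(nr);
            or_softirq_pending(1UL << nr);
    }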
* Re: Splat in kernel RT while processing incoming network packets 2023-07-04 14:47 ` Sebastian Andrzej Siewior @ 2023-07-05 15:59 ` Wander Lairson Costa 2023-08-09 10:56 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 9+ messages in thread From: Wander Lairson Costa @ 2023-07-05 15:59 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Paolo Abeni, linux-kernel, linux-rt-users, juri.lelli On Tue, Jul 04, 2023 at 04:47:49PM +0200, Sebastian Andrzej Siewior wrote: > On 2023-07-04 12:29:33 [+0200], Paolo Abeni wrote: > > Just to hopefully clarify the networking side of it, napi instances != > > network backlog (used by RPS). The network backlog (RPS) is available > > for all the network devices, including the loopback and all the virtual > > ones. > > Yes. > > > The napi instances (and the threaded mode) are available only on > > network device drivers implementing the napi model. The loopback driver > > does not implement the napi model, as most virtual devices and even > > some H/W NICs (mostily low end ones). > > Yes. > > > The network backlog can't run in threaded mode: there is no API/sysctl > > nor infrastructure for that. The backlog processing threaded mode could > > be implemented, even if should not be completely trivial and it sounds > > a bit weird to me. > > Yes, I mean that this needs to be done. > > > > > Just for the records, I mentioned the following in the bz: > > > > It looks like flush_smp_call_function_queue() has 2 only callers, > > migration, and do_idle(). > > > > What about moving softirq processing from > > flush_smp_call_function_queue() into cpu_stopper_thread(), outside the > > unpreemptable critical section? > > This doesn't solve anything. You schedule softirq from hardirq and from > this moment on you are in "anonymous context" and we solve this by > processing it in ksoftirqd. > For !RT you process it while leaving the hardirq. For RT, we can't. > Processing it in the context of the currently running process (say idle > as in the reported backtrace or an another running user task) would lead > to processing network related that originated somewhere at someone > else's expense. Assume you have a high prio RT task running, not related > to networking at all, and suddenly you throw a bunch of skbs on it. > > Therefore it is preferred to process them within the interrupt thread in > which the softirq was raised/ within its origin. > > The other problem with ksoftirqd processing is that everything is added > to a global state and then left for ksoftirqd to process. The global > state is considered by every local_bh_enable() instance so random > interrupt thread could process it or even a random task doing a syscall > involving spin_lock_bh(). > > The NAPI-threads are nice in a way that they don't clobber the global > state. > For RPS we would need either per-CPU threads or serve this in > ksoftirqd/X. The additional thread per-CPU makes only sense if it runs > at higher priority. However without the priority it would be no > different to ksoftirqd unless it does only the backlog's work. > > puh. I'm undecided here. We might want to throw it into ksoftirqd, > remove the warning. But then this will be processed with other softirqs > (like USB due to tasklet) and at some point and might be picked up by > another interrupt thread. > Maybe, under RT, some softirq should run in the context of the "target" process. For NET_RX, for example, the softirq's would run in the context of the packet recipient process. 
Each task_struct would have a list of pending softirqs, which would be checked at a few points, like on scheduling, when the process enters the kernel, on softirq raise, etc. The default target process would be ksoftirqd. Does this idea make sense? > > Cheers, > > > > Paolo > > Sebastian > ^ permalink raw reply [flat|nested] 9+ messages in thread
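A very rough sketch of the per-task pending-softirq idea above. It is purely
hypothetical, no such infrastructure exists, and every name here is invented
for illustration.

    /* Hypothetical: a per-task pending set, flushed on schedule(), on kernel
     * entry and when further softirqs are raised (all names invented). */
    struct task_softirq_state {
            unsigned long pending;          /* bitmask of pending softirqs */
    };

    /* Raise a softirq "towards" the task that will consume the result, e.g.
     * the recipient of the skb for NET_RX; fall back to ksoftirqd when the
     * target is unknown. */
    static void raise_softirq_for_task(struct task_struct *target,
                                       unsigned int nr)
    {
            if (!target)
                    target = __this_cpu_read(ksoftirqd);

            set_bit(nr, &target->softirq_state.pending); /* invented field */
            wake_up_process(target);
    }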
* Re: Splat in kernel RT while processing incoming network packets 2023-07-05 15:59 ` Wander Lairson Costa @ 2023-08-09 10:56 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 9+ messages in thread From: Sebastian Andrzej Siewior @ 2023-08-09 10:56 UTC (permalink / raw) To: Wander Lairson Costa Cc: Paolo Abeni, linux-kernel, linux-rt-users, juri.lelli On 2023-07-05 12:59:28 [-0300], Wander Lairson Costa wrote: > Maybe, under RT, some softirq should run in the context of the "target" > process. For NET_RX, for example, the softirq's would run in the context > of the packet recipient process. Each task_struct would have a list of > pending softirq, which would be checked in a few points, like on scheduling, > when the process enters in the kernel, softirq raise, etc. The default > target process would be ksoftirqd. Does this idea make sense? We had something similar. The softirq runs in the context of the task that raised it. So the networking driver raised NET_RX and it was processed in its context (and still is). The only difference now is that we no longer have a task-based "raised bit" but a per-CPU one. For RPS you already pulled the skb from the NIC; you need to process it, and this isn't handled in the task's context but on a specific CPU. Let me look at a per-CPU backlog thread or at ripping the warning out… Sebastian ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-08-09 10:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-03 12:47 Splat in kernel RT while processing incoming network packets Wander Lairson Costa
2023-07-03 13:20 ` Wander Lairson Costa
2023-07-03 14:29 ` Sebastian Andrzej Siewior
2023-07-03 21:15 ` Wander Lairson Costa
2023-07-04 10:05 ` Sebastian Andrzej Siewior
2023-07-04 10:29 ` Paolo Abeni
2023-07-04 14:47 ` Sebastian Andrzej Siewior
2023-07-05 15:59 ` Wander Lairson Costa
2023-08-09 10:56 ` Sebastian Andrzej Siewior