* skb allocation from interrupt handler?
@ 2017-08-08 22:17 Murali Karicheri
  2017-08-08 22:29 ` Matteo Croce
  2017-08-08 23:00 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread

From: Murali Karicheri @ 2017-08-08 22:17 UTC (permalink / raw)
To: open list:TI NETCP ETHERNET DRIVER

Is there an skb_alloc function that can be used from an interrupt handler?
It looks like netdev_alloc_skb() can't be used, since I see the following
trace with the kernel hacking debug options enabled.

[ 652.481713] [<c021007c>] (unwind_backtrace) from [<c020bdcc>] (show_stack+0x10/0x14)
[ 652.481725] [<c020bdcc>] (show_stack) from [<c0517780>] (dump_stack+0x98/0xc4)
[ 652.481736] [<c0517780>] (dump_stack) from [<c0256a70>] (___might_sleep+0x1b8/0x2a4)
[ 652.481746] [<c0256a70>] (___might_sleep) from [<c0939e80>] (rt_spin_lock+0x24/0x5c)
[ 652.481755] [<c0939e80>] (rt_spin_lock) from [<c07d827c>] (__netdev_alloc_skb+0xd0/0x254)
[ 652.481774] [<c07d827c>] (__netdev_alloc_skb) from [<bf23a544>] (emac_rx_hardirq+0x374/0x554 [prueth])
[ 652.481793] [<bf23a544>] (emac_rx_hardirq [prueth]) from [<c02925dc>] (__handle_irq_event_percpu+0x9c/0x128)

This is running under an RT kernel based on 4.9.y

--
Murali Karicheri
Linux Kernel, Keystone

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: skb allocation from interrupt handler?
  2017-08-08 22:17 skb allocation from interrupt handler? Murali Karicheri
@ 2017-08-08 22:29 ` Matteo Croce
  2017-08-09 16:04   ` Murali Karicheri
  2017-08-08 23:00 ` David Miller
  1 sibling, 1 reply; 7+ messages in thread

From: Matteo Croce @ 2017-08-08 22:29 UTC (permalink / raw)
To: Murali Karicheri, open list:TI NETCP ETHERNET DRIVER

On Tue, 2017-08-08 at 18:17 -0400, Murali Karicheri wrote:
> Is there an skb_alloc function that can be used from interrupt
> handler? Looks like netdev_alloc_skb() can't be used since I see
> following trace with kernel hack debug options enabled.
>
> [ 652.481713] [<c021007c>] (unwind_backtrace) from [<c020bdcc>] (show_stack+0x10/0x14)
> [ 652.481725] [<c020bdcc>] (show_stack) from [<c0517780>] (dump_stack+0x98/0xc4)
> [ 652.481736] [<c0517780>] (dump_stack) from [<c0256a70>] (___might_sleep+0x1b8/0x2a4)
> [ 652.481746] [<c0256a70>] (___might_sleep) from [<c0939e80>] (rt_spin_lock+0x24/0x5c)
> [ 652.481755] [<c0939e80>] (rt_spin_lock) from [<c07d827c>] (__netdev_alloc_skb+0xd0/0x254)
> [ 652.481774] [<c07d827c>] (__netdev_alloc_skb) from [<bf23a544>] (emac_rx_hardirq+0x374/0x554 [prueth])
> [ 652.481793] [<bf23a544>] (emac_rx_hardirq [prueth]) from [<c02925dc>] (__handle_irq_event_percpu+0x9c/0x128)
>
> This is running under RT kernel off 4.9.y

netdev_alloc_skb() passes GFP_ATOMIC to alloc_skb(), so it should work
in an interrupt handler too.

--
Matteo Croce
per aspera ad upstream
* Re: skb allocation from interrupt handler?
  2017-08-08 22:29 ` Matteo Croce
@ 2017-08-09 16:04   ` Murali Karicheri
  0 siblings, 0 replies; 7+ messages in thread

From: Murali Karicheri @ 2017-08-09 16:04 UTC (permalink / raw)
To: Matteo Croce, open list:TI NETCP ETHERNET DRIVER

On 08/08/2017 06:29 PM, Matteo Croce wrote:
> netdev_alloc_skb() passes GFP_ATOMIC to alloc_skb() so it should work
> in an interrupt handler too.

I will provide more background on my work in response to your next
message. This is running the RT Linux kernel, and I have
CONFIG_IRQ_FORCED_THREADING enabled. So my understanding is that the irq
handler will be executed as part of a kernel thread, i.e. not actually in
hard irq context. Correct? So why does the following trace complain?

root@am57xx-evm:~#
[ 108.745031] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:987
[ 108.745035] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0
[ 108.745038] no locks held by swapper/0/0.
[ 108.745040] irq event stamp: 292222
[ 108.745054] hardirqs last  enabled at (292221): [<c0208eb0>] arch_cpu_idle+0x20/0x3c
[ 108.745060] hardirqs last disabled at (292222): [<c020c8ac>] __irq_svc+0x4c/0xa8
[ 108.745063] softirqs last  enabled at (0): [<  (null)>]   (null)
[ 108.745066] softirqs last disabled at (0): [<  (null)>]   (null)
[ 108.745076] Preemption disabled at:
[ 108.745077] [<c0936f24>] schedule_preempt_disabled+0x1c/0x20
[ 108.745084] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.38-rt23-02686-gc7fcc4e7-dirty #4
[ 108.745087] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 108.745100] [<c021007c>] (unwind_backtrace) from [<c020bdcc>] (show_stack+0x10/0x14)
[ 108.745111] [<c020bdcc>] (show_stack) from [<c0517780>] (dump_stack+0x98/0xc4)
[ 108.745122] [<c0517780>] (dump_stack) from [<c0256a70>] (___might_sleep+0x1b8/0x2a4)
[ 108.745133] [<c0256a70>] (___might_sleep) from [<c0939e80>] (rt_spin_lock+0x24/0x5c)
[ 108.745143] [<c0939e80>] (rt_spin_lock) from [<c07d827c>] (__netdev_alloc_skb+0xd0/0x254)
[ 108.745166] [<c07d827c>] (__netdev_alloc_skb) from [<bf23a544>] (emac_rx_hardirq+0x374/0x554 [prueth])
[ 108.745212] [<bf23a544>] (emac_rx_hardirq [prueth]) from [<c02925dc>] (__handle_irq_event_percpu+0x9c/0x128)
[ 108.745221] [<c02925dc>] (__handle_irq_event_percpu) from [<c02926b0>] (handle_irq_event_percpu+0x48/0x84)
[ 108.745229] [<c02926b0>] (handle_irq_event_percpu) from [<c0292724>] (handle_irq_event+0x38/0x5c)
[ 108.745238] [<c0292724>] (handle_irq_event) from [<c0295b1c>] (handle_level_irq+0xc4/0x16c)
[ 108.745246] [<c0295b1c>] (handle_level_irq) from [<c0291880>] (generic_handle_irq+0x24/0x34)
[ 108.745257] [<c0291880>] (generic_handle_irq) from [<bf214244>] (pruss_intc_irq_handler+0xdc/0x130 [pruss_intc])
[ 108.745270] [<bf214244>] (pruss_intc_irq_handler [pruss_intc]) from [<c0291880>] (generic_handle_irq+0x24/0x34)
[ 108.745277] [<c0291880>] (generic_handle_irq) from [<c0291de0>] (__handle_domain_irq+0x7c/0xec)
[ 108.745284] [<c0291de0>] (__handle_domain_irq) from [<c020154c>] (gic_handle_irq+0x48/0x8c)
[ 108.745290] [<c020154c>] (gic_handle_irq) from [<c020c8bc>] (__irq_svc+0x5c/0xa8)

Here is the code snippet. This is part of an experiment that I will
explain in my reply to the next message, where NAPI is discussed.

	skb = netdev_alloc_skb(ndev, pkt_info.length + 2);
	if (!skb) {
		if (netif_msg_rx_err(emac) && net_ratelimit())
			netdev_err(ndev, "failed rx buffer alloc\n");
		return -ENOMEM;
	}

	<====== Code to get the packet from the firmware FIFO =====>

	/* send packet up the stack */
	skb_put(skb, pkt_info.length);
	skb->protocol = eth_type_trans(skb, ndev);
	netif_rx(skb);

	/* update stats */
	ndev->stats.rx_bytes += pkt_info.length;
	ndev->stats.rx_packets++;

I also want to know if there is an SKB alloc function that really can be
used from hard irq context.

--
Murali Karicheri
Linux Kernel, Keystone
* Re: skb allocation from interrupt handler?
  2017-08-08 22:17 skb allocation from interrupt handler? Murali Karicheri
  2017-08-08 22:29 ` Matteo Croce
@ 2017-08-08 23:00 ` David Miller
  2017-08-09 16:36   ` Murali Karicheri
  1 sibling, 1 reply; 7+ messages in thread

From: David Miller @ 2017-08-08 23:00 UTC (permalink / raw)
To: m-karicheri2; +Cc: netdev

From: Murali Karicheri <m-karicheri2@ti.com>
Date: Tue, 8 Aug 2017 18:17:52 -0400

> Is there an skb_alloc function that can be used from interrupt handler? Looks like netdev_alloc_skb()
> can't be used since I see following trace with kernel hack debug options enabled.
>
> [ 652.481713] [<c021007c>] (unwind_backtrace) from [<c020bdcc>] (show_stack+0x10/0x14)
> [ 652.481725] [<c020bdcc>] (show_stack) from [<c0517780>] (dump_stack+0x98/0xc4)
> [ 652.481736] [<c0517780>] (dump_stack) from [<c0256a70>] (___might_sleep+0x1b8/0x2a4)
> [ 652.481746] [<c0256a70>] (___might_sleep) from [<c0939e80>] (rt_spin_lock+0x24/0x5c)
> [ 652.481755] [<c0939e80>] (rt_spin_lock) from [<c07d827c>] (__netdev_alloc_skb+0xd0/0x254)
> [ 652.481774] [<c07d827c>] (__netdev_alloc_skb) from [<bf23a544>] (emac_rx_hardirq+0x374/0x554 [prueth])
> [ 652.481793] [<bf23a544>] (emac_rx_hardirq [prueth]) from [<c02925dc>] (__handle_irq_event_percpu+0x9c/0x128)
>
> This is running under RT kernel off 4.9.y

Your receive handler should be running from a NAPI poll, which is in
software interrupt. You should not be doing packet processing in
hardware interrupt context as hardware interrupts should be as short
as possible, and with NAPI polling packet input processing can be
properly distributed amongst several devices, and if the system is
overloaded such processing can be deferred to a kernel thread.

NAPI polling has a large number of other advantages as well: more
streamlined GRO support, automatic support for busypolling... the
list goes on and on and on.

I could show you how to do an SKB allocation in a hardware interrupt,
but instead I'd rather teach you how to fish properly, and encourage
you to convert your driver to NAPI polling instead.

Thanks.
* Re: skb allocation from interrupt handler?
  2017-08-08 23:00 ` David Miller
@ 2017-08-09 16:36   ` Murali Karicheri
  2017-08-09 22:29     ` Francois Romieu
  0 siblings, 1 reply; 7+ messages in thread

From: Murali Karicheri @ 2017-08-09 16:36 UTC (permalink / raw)
To: David Miller; +Cc: netdev

Hi David,

On 08/08/2017 07:00 PM, David Miller wrote:
> From: Murali Karicheri <m-karicheri2@ti.com>
> Date: Tue, 8 Aug 2017 18:17:52 -0400
>
>> Is there an skb_alloc function that can be used from interrupt handler? Looks like netdev_alloc_skb()
>> can't be used since I see following trace with kernel hack debug options enabled.
>>
>> [ 652.481713] [<c021007c>] (unwind_backtrace) from [<c020bdcc>] (show_stack+0x10/0x14)
>> [ 652.481725] [<c020bdcc>] (show_stack) from [<c0517780>] (dump_stack+0x98/0xc4)
>> [ 652.481736] [<c0517780>] (dump_stack) from [<c0256a70>] (___might_sleep+0x1b8/0x2a4)
>> [ 652.481746] [<c0256a70>] (___might_sleep) from [<c0939e80>] (rt_spin_lock+0x24/0x5c)
>> [ 652.481755] [<c0939e80>] (rt_spin_lock) from [<c07d827c>] (__netdev_alloc_skb+0xd0/0x254)
>> [ 652.481774] [<c07d827c>] (__netdev_alloc_skb) from [<bf23a544>] (emac_rx_hardirq+0x374/0x554 [prueth])
>> [ 652.481793] [<bf23a544>] (emac_rx_hardirq [prueth]) from [<c02925dc>] (__handle_irq_event_percpu+0x9c/0x128)
>>
>> This is running under RT kernel off 4.9.y
>
> Your receive handler should be running from a NAPI poll, which is in
> software interrupt. You should not be doing packet processing in
> hardware interrupt context as hardware interrupts should be as short
> as possible, and with NAPI polling packet input processing can be
> properly distributed amongst several devices, and if the system is
> overloaded such processing can be deferred to a kernel thread.

Thanks for responding! I appreciate your feedback. Our NetCP and CPSW
device drivers do use NAPI poll to process receive packets. However,
those devices can use ring buffers or descriptors set up in DDR to
enqueue the received packets to the CPU. The specific hardware here (in
fact firmware running in the ICSS PRU that is available on our
industrial IDK SoCs) has only limited internal memory, shared between
the ARM and the PRU, for enqueueing received packets to the CPU for
processing. This is a 100 Mbps Ethernet link.

As per the NAPI documentation at
https://wiki.linuxfoundation.org/networking/napi, two of the conditions
mentioned there for using NAPI are:

====== Quote from the above link ================================================
DMA ring or enough RAM to store packets in software devices.
Ability to turn off interrupts or maybe events that send packets up the stack.
==================================================================================

The internal memory or FIFO can store only up to 3 MTU sized packets,
so each one has to be processed before the PRU gets another packet to
send to the CPU. Per the above, it is not ideal to run NAPI in this
scenario, right? For comparison, in NetCP we use about 128 descriptors
with MTU sized buffers to handle a 1 Gbps Ethernet link; scaled down,
we would roughly need at least 10-12 buffers in the FIFO.

Currently we have a NAPI implementation in use that gives a throughput
of 95 Mbps for MTU sized packets, but our UDP iperf tests show a bit
less than 1% packet loss for offered traffic of 95 Mbps with MTU sized
packets. This is not good enough for an industrial network using the
HSR/PRP protocol for network redundancy: we need zero packet loss for
MTU sized packets at 95 Mbps throughput. That is the problem
description.

As an experiment, I moved the packet processing into the irq handler to
see if we could use the CPU cycles to process the packet immediately
instead of through NAPI, and to check whether the firmware still
encounters buffer overflow. The result is positive, with no buffer
overflow seen at the firmware and no packet loss in the iperf test.
But when doing more testing with this experiment, we ran into a UART
console lockup after running traffic for about 2 minutes. So I enabled
the kernel hacking debug options to get some clue about what is
happening and ran into the trace I shared earlier. So what function can
I use to allocate an SKB from the interrupt handler? I am also
wondering what the best way is to implement the packet processing in
this case to avoid the packet loss.

> NAPI polling has a large number of other advantages as well, more
> streamlined GRO support, automatic support for busypolling... the
> list goes on and on and on.
>
> I could show you how to do an SKB allocation in a hardware interrupt,
> but instead I'd rather teach you how to fish properly, and encourage
> you to convert your driver to NAPI polling instead.

Would love to use NAPI if we can overcome the packet loss in some way.

Thanks and regards,

Murali

> Thanks.

--
Murali Karicheri
Linux Kernel, Keystone
* Re: skb allocation from interrupt handler?
  2017-08-09 16:36   ` Murali Karicheri
@ 2017-08-09 22:29     ` Francois Romieu
  2017-08-09 23:31       ` Stephen Hemminger
  0 siblings, 1 reply; 7+ messages in thread

From: Francois Romieu @ 2017-08-09 22:29 UTC (permalink / raw)
To: Murali Karicheri; +Cc: David Miller, netdev

Murali Karicheri <m-karicheri2@ti.com> :
[...]
> The internal memory or FIFO can store only up to 3 MTU sized packets. So that has to
> be processed before PRU gets another packets to send to CPU. So per above,
> it is not ideal to run NAPI for this scenario, right? Also for NetCP we use
> about 128 descriptors with MTU size buffers to handle 1Gbps Ethernet link.
> Based on that roughly we would need at least 10-12 buffers in the FIFO.
>
> Currently we have a NAPI implementation in use that gives throughput of 95Mbps for
> MTU sized packets, but our UDP iperf tests shows less than 1% packet loss for an
> offered traffic of 95Mbps with MTU sized packets. This is not good for industrial
> network using HSR/PRP protocol for network redundancy. We need to have zero packet
> loss for MTU sized packets at 95Mbps throughput. That is the problem description.

Imvho you should instrument the kernel to figure out where the excess
latency that prevents NAPI processing from taking place within 125 us of
physical packet reception comes from.

> As an experiment, I have moved the packet processing to irq handler to see if we
> can take advantage of CPU cycle to processing the packet instead of NAPI
> and to check if the firmware encounters buffer overflow. The result is positive
> with no buffer overflow seen at the firmware and no packet loss in the iperf test.
> But we want to do more testing as an experiment and ran into a uart console locks
> up after running traffic for about 2 minutes. So I tried enabling the DEBUG HACK
> options to get some clue on what is happening and ran into the trace I shared
> earlier. So what function can I use to allocate SKB from interrupt handler?

Is your design also so tight on memory that you can't even refill your
own software skb pool from some non-irq context, then only swap buffers
in the irq handler?

--
Ueimor
* Re: skb allocation from interrupt handler?
  2017-08-09 22:29     ` Francois Romieu
@ 2017-08-09 23:31       ` Stephen Hemminger
  0 siblings, 0 replies; 7+ messages in thread

From: Stephen Hemminger @ 2017-08-09 23:31 UTC (permalink / raw)
To: Francois Romieu; +Cc: Murali Karicheri, David Miller, netdev

On Thu, 10 Aug 2017 00:29:19 +0200
Francois Romieu <romieu@fr.zoreil.com> wrote:

> Murali Karicheri <m-karicheri2@ti.com> :
> [...]
> > The internal memory or FIFO can store only up to 3 MTU sized packets. So that has to
> > be processed before PRU gets another packets to send to CPU. So per above,
> > it is not ideal to run NAPI for this scenario, right? Also for NetCP we use
> > about 128 descriptors with MTU size buffers to handle 1Gbps Ethernet link.
> > Based on that roughly we would need at least 10-12 buffers in the FIFO.
> >
> > Currently we have a NAPI implementation in use that gives throughput of 95Mbps for
> > MTU sized packets, but our UDP iperf tests shows less than 1% packet loss for an
> > offered traffic of 95Mbps with MTU sized packets. This is not good for industrial
> > network using HSR/PRP protocol for network redundancy. We need to have zero packet
> > loss for MTU sized packets at 95Mbps throughput. That is the problem description.
>
> Imvho you should instrument the kernel to figure where the excess latency that
> prevents NAPI processing to take place within 125 us of physical packet reception
> comes from.
>
> > As an experiment, I have moved the packet processing to irq handler to see if we
> > can take advantage of CPU cycle to processing the packet instead of NAPI
> > and to check if the firmware encounters buffer overflow. The result is positive
> > with no buffer overflow seen at the firmware and no packet loss in the iperf test.
> > But we want to do more testing as an experiment and ran into a uart console locks
> > up after running traffic for about 2 minutes. So I tried enabling the DEBUG HACK
> > options to get some clue on what is happening and ran into the trace I shared
> > earlier. So what function can I use to allocate SKB from interrupt handler ?
>
> Is your design also so tight on memory that you can't even refill your own
> software skb pool from some non-irq context then only swap buffers in the
> irq handler ?

The current best practice in network drivers is to receive into an
allocated page, then create skb meta data with build_skb() in the NAPI
poll routine.
end of thread, other threads: [~2017-08-09 23:31 UTC | newest]

Thread overview: 7+ messages
2017-08-08 22:17 skb allocation from interrupt handler? Murali Karicheri
2017-08-08 22:29 ` Matteo Croce
2017-08-09 16:04   ` Murali Karicheri
2017-08-08 23:00 ` David Miller
2017-08-09 16:36   ` Murali Karicheri
2017-08-09 22:29     ` Francois Romieu
2017-08-09 23:31       ` Stephen Hemminger