* new NAPI interface broken @ 2007-09-07 9:37 Jan-Bernd Themann 2007-09-12 12:50 ` David Miller 2007-09-14 22:12 ` David Miller 0 siblings, 2 replies; 19+ messages in thread From: Jan-Bernd Themann @ 2007-09-07 9:37 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev, themann, Christoph Raisch Hi Stephen, I saw that you developed most of the new NAPI interface. I already addressed this issue a while ago. Please correct me if I got it wrong. I think there is still a serious problem with the NAPI changes to make NAPI polling independent of struct net_device objects. Its about the question who inserts and removes devices from the poll list. netif_rx_schedule: sets NAPI_STATE_SCHED flag, insert device in poll list. netif_rx_complete: clears NAPI_STATE_SCHED netif_rx_reschedule: sets NAPI_STATE_SCHED, insert device in poll list. net_rx_action: -removes dev from poll list -calls poll function -adds dev to poll list if NAPI_STATE_SCHED still set 1) netif_rx_complete and netif_rx_reschedule don't work together 2) On SMP systems: after netif_rx_complete has been called on CPU1 (+interruts enabled), netif_rx_schedule could be called on CPU2 (irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED. In that case the device would be added to poll lists of CPU1 and CPU2 as net_rx_action would see NAPI_STATE_SCHED set. This must not happen. It will be caught when netif_rx_complete is called the second time (BUG() called) This would mean we have a problem on all SMP machines right now. If I got all this right then we probably need a further flag to tell net_rx_action whether to poll again or to stop (with the possibility that the device has been scheduled on a different CPU in between). The "old" NAPI interface uses the return value of poll to determine if the device has to be polled again or not. We can either switch back or in case we want to stick to the new return value, we might have to add something similar to the NAPI_STATE_SCHED flag or a new parameter... Regards, Jan-Bernd ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-09-07 9:37 new NAPI interface broken Jan-Bernd Themann @ 2007-09-12 12:50 ` David Miller 2007-09-12 13:10 ` new NAPI interface broken for POWER architecture? Christoph Raisch 2007-10-16 7:29 ` new NAPI interface broken Benjamin Herrenschmidt 2007-09-14 22:12 ` David Miller 1 sibling, 2 replies; 19+ messages in thread From: David Miller @ 2007-09-12 12:50 UTC (permalink / raw) To: ossthema; +Cc: shemminger, netdev, themann, raisch From: Jan-Bernd Themann <ossthema@de.ibm.com> Date: Fri, 7 Sep 2007 11:37:02 +0200 > 2) On SMP systems: after netif_rx_complete has been called on CPU1 > (+interruts enabled), netif_rx_schedule could be called on CPU2 > (irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED. > In that case the device would be added to poll lists of CPU1 and CPU2 > as net_rx_action would see NAPI_STATE_SCHED set. > This must not happen. It will be caught when netif_rx_complete is > called the second time (BUG() called) > > This would mean we have a problem on all SMP machines right now. This is not a correct statement. Only on your platform do network device interrupts get moved around, no other platform does this. Sparc64 doesn't, all interrupts stay in one location after the cpu is initially choosen. x86 and x86_64 specifically do not move around network device interrupts, even though other device types do get dynamic IRQ cpu distribution. That's why you are the only person seeing this problem. I agree that it should be fixed, but we should also fix the IRQ distribution scheme used on powerpc platforms which is totally broken in these cases. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken for POWER architecture? 2007-09-12 12:50 ` David Miller @ 2007-09-12 13:10 ` Christoph Raisch 2007-09-12 13:27 ` David Miller 2007-09-12 15:18 ` Arnd Bergmann 2007-10-16 7:29 ` new NAPI interface broken Benjamin Herrenschmidt 1 sibling, 2 replies; 19+ messages in thread From: Christoph Raisch @ 2007-09-12 13:10 UTC (permalink / raw) To: David Miller Cc: Jan-Bernd Themann, netdev, ossthema, shemminger, Arnd Bergmann, Paul Mackerras, Michael Ellerman, linuxppc-dev David Miller <davem@davemloft.net> wrote on 12.09.2007 14:50:04: > From: Jan-Bernd Themann <ossthema@de.ibm.com> > Date: Fri, 7 Sep 2007 11:37:02 +0200 > > > 2) On SMP systems: after netif_rx_complete has been called on CPU1 > > (+interruts enabled), netif_rx_schedule could be called on CPU2 > > (irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED. > > In that case the device would be added to poll lists of CPU1 and CPU2 > > as net_rx_action would see NAPI_STATE_SCHED set. > > This must not happen. It will be caught when netif_rx_complete is > > called the second time (BUG() called) > > > > This would mean we have a problem on all SMP machines right now. > > This is not a correct statement. > > Only on your platform do network device interrupts get moved > around, no other platform does this. > > Sparc64 doesn't, all interrupts stay in one location after > the cpu is initially choosen. > > x86 and x86_64 specifically do not move around network > device interrupts, even though other device types do > get dynamic IRQ cpu distribution. > > That's why you are the only person seeing this problem. > > I agree that it should be fixed, but we should also fix the IRQ > distribution scheme used on powerpc platforms which is totally > broken in these cases. This is definitely not something we can change in the HEA device driver alone. It could also affect any other networking cards on POWER (e1000,s2io...). Paul, Michael, Arndt, what is your opinion here? Gruss / Regards Christoph Raisch ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken for POWER architecture? 2007-09-12 13:10 ` new NAPI interface broken for POWER architecture? Christoph Raisch @ 2007-09-12 13:27 ` David Miller 2007-09-12 15:18 ` Arnd Bergmann 1 sibling, 0 replies; 19+ messages in thread From: David Miller @ 2007-09-12 13:27 UTC (permalink / raw) To: RAISCH Cc: THEMANN, netdev, ossthema, shemminger, ARNDB, pmac, ellerman, linuxppc-dev From: Christoph Raisch <RAISCH@de.ibm.com> Date: Wed, 12 Sep 2007 15:10:08 +0200 > This is definitely not something we can change in the HEA device driver > alone. And it shouldn't be, x86 implements the policy in irq balance daemon, powerpc should do it wherever it would be appropriate there. > Paul, Michael, Arndt, what is your opinion here? I'm all ears too :) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken for POWER architecture? 2007-09-12 13:10 ` new NAPI interface broken for POWER architecture? Christoph Raisch 2007-09-12 13:27 ` David Miller @ 2007-09-12 15:18 ` Arnd Bergmann 1 sibling, 0 replies; 19+ messages in thread From: Arnd Bergmann @ 2007-09-12 15:18 UTC (permalink / raw) To: Christoph Raisch Cc: David Miller, Jan-Bernd Themann, netdev, ossthema, shemminger, Paul Mackerras, Michael Ellerman, linuxppc-dev On Wednesday 12 September 2007, Christoph Raisch wrote: > David Miller <davem@davemloft.net> wrote on 12.09.2007 14:50:04: > > > I agree that it should be fixed, but we should also fix the IRQ > > distribution scheme used on powerpc platforms which is totally > > broken in these cases. > > This is definitely not something we can change in the HEA device driver > alone. > It could also affect any other networking cards on POWER (e1000,s2io...). > > Paul, Michael, Arndt, what is your opinion here? The situation on Cell with the existing south bridge chips is that interrupts _never_ get moved around, but are routed to specific SMT threads by the firmware, while Linux does not interfere with this. We have been thinking about changing this so we can distribute interrupts over all SMT threads in a given NUMA node, or even over all logical CPUs in the system by reprogramming the interrupt controller after each interrupt, but the current Axon bridge chip will always have all devices routed to the same target CPU, so it's unclear whether that is even an advantage. Arnd <>< ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-09-12 12:50 ` David Miller 2007-09-12 13:10 ` new NAPI interface broken for POWER architecture? Christoph Raisch @ 2007-10-16 7:29 ` Benjamin Herrenschmidt 2007-10-16 7:42 ` Benjamin Herrenschmidt 2007-10-16 7:44 ` David Miller 1 sibling, 2 replies; 19+ messages in thread From: Benjamin Herrenschmidt @ 2007-10-16 7:29 UTC (permalink / raw) To: David Miller; +Cc: ossthema, shemminger, netdev, themann, raisch Jumping on an old train ... On Wed, 2007-09-12 at 05:50 -0700, David Miller wrote: > > > This would mean we have a problem on all SMP machines right now. > > This is not a correct statement. > > Only on your platform do network device interrupts get moved > around, no other platform does this. > > Sparc64 doesn't, all interrupts stay in one location after > the cpu is initially choosen. > > x86 and x86_64 specifically do not move around network > device interrupts, even though other device types do > get dynamic IRQ cpu distribution. > > That's why you are the only person seeing this problem. > > I agree that it should be fixed, but we should also fix the IRQ > distribution scheme used on powerpc platforms which is totally > broken in these cases. So the powerpc platform just honors the affinity mask, and depending on the PIC does things that range from nothing to spreading interrupts to CPUs in the affinity mask. All interrupts by defaults are spread to all CPUs (full balancing). At this stage, it's afaik userland business to enforce different policies by changing the affinities via /proc/irq/*. Do you have any pointer to how that is done on x86 or sparc64 ? On my x86 laptop using ubuntu gutsy, I definitely see the IRQ on which the network card is connected (e1000) happily spread between the 2 cores just like powerpc would do. Cheers, Ben. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 7:29 ` new NAPI interface broken Benjamin Herrenschmidt @ 2007-10-16 7:42 ` Benjamin Herrenschmidt 2007-10-16 7:44 ` David Miller 1 sibling, 0 replies; 19+ messages in thread From: Benjamin Herrenschmidt @ 2007-10-16 7:42 UTC (permalink / raw) To: David Miller; +Cc: ossthema, shemminger, netdev, themann, raisch > So the powerpc platform just honors the affinity mask, and depending on > the PIC does things that range from nothing to spreading interrupts to > CPUs in the affinity mask. > > All interrupts by defaults are spread to all CPUs (full balancing). > > At this stage, it's afaik userland business to enforce different > policies by changing the affinities via /proc/irq/*. > > Do you have any pointer to how that is done on x86 or sparc64 ? On my > x86 laptop using ubuntu gutsy, I definitely see the IRQ on which the > network card is connected (e1000) happily spread between the 2 cores > just like powerpc would do. More specifically, IRQF_NOBALANCING doesn't seem to be set anywhere except a few arch specific timer interrupts etc... nowhere I can see in network drivers or the network stack (the stack wouldn't know what IRQ anyway since not all drivers set netdev->irq). We currently don't have a balance kthread like x86 has, though I wonder if we should move this one out of x86 and make it generic (hell, it's even hidden in the IO_APIC code :-) But at this stage, HW balancing by the PIC is the norm and seems to be happening. Ben. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 7:29 ` new NAPI interface broken Benjamin Herrenschmidt 2007-10-16 7:42 ` Benjamin Herrenschmidt @ 2007-10-16 7:44 ` David Miller 2007-10-16 8:28 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 19+ messages in thread From: David Miller @ 2007-10-16 7:44 UTC (permalink / raw) To: benh; +Cc: ossthema, shemminger, netdev, themann, raisch From: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Tue, 16 Oct 2007 17:29:47 +1000 > Do you have any pointer to how that is done on x86 or sparc64 ? Sparc64 does it statically in the kernel. For x86, see http://irqbalance.org/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 7:44 ` David Miller @ 2007-10-16 8:28 ` Benjamin Herrenschmidt 2007-10-16 8:31 ` David Miller 0 siblings, 1 reply; 19+ messages in thread From: Benjamin Herrenschmidt @ 2007-10-16 8:28 UTC (permalink / raw) To: David Miller; +Cc: ossthema, shemminger, netdev, themann, raisch On Tue, 2007-10-16 at 00:44 -0700, David Miller wrote: > From: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Date: Tue, 16 Oct 2007 17:29:47 +1000 > > > Do you have any pointer to how that is done on x86 or sparc64 ? > > Sparc64 does it statically in the kernel. > > For x86, see http://irqbalance.org/ Allright, so that's an out of tree userland thingy... (which may well work on ppc too I suppose). Definitely not installed by default by my distro so IRQs from the network cards on all x86's using ubuntu gutsy at least are spread to all CPUs :-) There's also a balance kernel thread in x86 with has a notion of irq that isn't moveable but that flag isn't set by any driver. Cheers, Ben. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 8:28 ` Benjamin Herrenschmidt @ 2007-10-16 8:31 ` David Miller 2007-10-16 9:01 ` Benjamin Herrenschmidt 2007-10-16 15:56 ` Arjan van de Ven 0 siblings, 2 replies; 19+ messages in thread From: David Miller @ 2007-10-16 8:31 UTC (permalink / raw) To: benh; +Cc: ossthema, shemminger, netdev, themann, raisch, arjan From: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Tue, 16 Oct 2007 18:28:56 +1000 > Allright, so that's an out of tree userland thingy... (which may well > work on ppc too I suppose). Definitely not installed by default by my > distro so IRQs from the network cards on all x86's using ubuntu gutsy at > least are spread to all CPUs :-) But the thing does treat network interfaces differently from other devices. Arjan ran over to my table at kernel summit when I presented this topic to the audience, to remind me of all of this and he should be included in on any discussions about this topic. :-) I've CC:'d him. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 8:31 ` David Miller @ 2007-10-16 9:01 ` Benjamin Herrenschmidt 2007-10-16 21:01 ` Anton Blanchard 2007-10-17 15:26 ` Christoph Raisch 2007-10-16 15:56 ` Arjan van de Ven 1 sibling, 2 replies; 19+ messages in thread From: Benjamin Herrenschmidt @ 2007-10-16 9:01 UTC (permalink / raw) To: David Miller; +Cc: ossthema, shemminger, netdev, themann, raisch, arjan On Tue, 2007-10-16 at 01:31 -0700, David Miller wrote: > From: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Date: Tue, 16 Oct 2007 18:28:56 +1000 > > > Allright, so that's an out of tree userland thingy... (which may well > > work on ppc too I suppose). Definitely not installed by default by my > > distro so IRQs from the network cards on all x86's using ubuntu gutsy at > > least are spread to all CPUs :-) > > But the thing does treat network interfaces differently from > other devices. > > Arjan ran over to my table at kernel summit when I presented this > topic to the audience, to remind me of all of this and he should be > included in on any discussions about this topic. :-) > > I've CC:'d him. As far as I know, the x86 in-kernel thingy doesn't but yeah, the userland one seems much more evolved and does things based on the "class" of the device. Christoph, have any of you tried it on powerpc ? Cheers, Ben. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 9:01 ` Benjamin Herrenschmidt @ 2007-10-16 21:01 ` Anton Blanchard 2007-10-17 15:26 ` Christoph Raisch 1 sibling, 0 replies; 19+ messages in thread From: Anton Blanchard @ 2007-10-16 21:01 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: David Miller, ossthema, shemminger, netdev, themann, raisch, arjan Hi, > As far as I know, the x86 in-kernel thingy doesn't but yeah, the > userland one seems much more evolved and does things based on the > "class" of the device. > > Christoph, have any of you tried it on powerpc ? FYI It works fine on PowerPC and its installed by default on some distros (eg RHEL5). Anton ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 9:01 ` Benjamin Herrenschmidt 2007-10-16 21:01 ` Anton Blanchard @ 2007-10-17 15:26 ` Christoph Raisch 1 sibling, 0 replies; 19+ messages in thread From: Christoph Raisch @ 2007-10-17 15:26 UTC (permalink / raw) To: benh; +Cc: arjan, David Miller, Jan-Bernd Themann, netdev, ossthema, shemminger Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 16.10.2007 11:01:49: > > > Christoph, have any of you tried it on powerpc ? No we didn't try this (yet). This approach makes a lot of sense. Why is this not installed by both large distros on PPC by default? how mature is this for larger SMPs on PPC? > > Cheers, > Ben. > > Gruss / Regards Christoph R. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-10-16 8:31 ` David Miller 2007-10-16 9:01 ` Benjamin Herrenschmidt @ 2007-10-16 15:56 ` Arjan van de Ven 1 sibling, 0 replies; 19+ messages in thread From: Arjan van de Ven @ 2007-10-16 15:56 UTC (permalink / raw) To: David Miller; +Cc: benh, ossthema, shemminger, netdev, themann, raisch David Miller wrote: > From: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Date: Tue, 16 Oct 2007 18:28:56 +1000 > >> Allright, so that's an out of tree userland thingy... (which may well >> work on ppc too I suppose). Definitely not installed by default by my >> distro so IRQs from the network cards on all x86's using ubuntu gutsy at >> least are spread to all CPUs :-) > > But the thing does treat network interfaces differently from > other devices. > and it works on various architectures.... The in-kernel x86 thing is going away (it's actually highly suboptimal).... Yes it's done in userland, this is the right place so far.... out of tree... well... there's no good place for such userspace tools in the kernel tree currently otherwise I'd love to have it there. If your distro doesn't install this by default, please file a bug against the distro; I know we (Intel) worked with Fedora, RHEL, SuSE and Ubuntu to get it included.... maybe others don't? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-09-07 9:37 new NAPI interface broken Jan-Bernd Themann 2007-09-12 12:50 ` David Miller @ 2007-09-14 22:12 ` David Miller 2007-09-18 16:15 ` Jan-Bernd Themann 1 sibling, 1 reply; 19+ messages in thread From: David Miller @ 2007-09-14 22:12 UTC (permalink / raw) To: ossthema; +Cc: shemminger, netdev, themann, raisch From: Jan-Bernd Themann <ossthema@de.ibm.com> Date: Fri, 7 Sep 2007 11:37:02 +0200 > Its about the question who inserts and removes devices from the poll list. > > netif_rx_schedule: sets NAPI_STATE_SCHED flag, insert device in poll list. > netif_rx_complete: clears NAPI_STATE_SCHED > netif_rx_reschedule: sets NAPI_STATE_SCHED, insert device in poll list. > net_rx_action: > -removes dev from poll list > -calls poll function > -adds dev to poll list if NAPI_STATE_SCHED still set Ok, for now I'm going to try and deal with this by reverting the list handling to something approximating the old NAPI code, as per the patch below. I've only quickly test booted into this kernel on my workstation. So take care when trying it out. I took the liberty to add some assertions and comments all over wrt. list and quota handling. One thing to note that (and this was true previously too) that if ->poll() uses the whole quota it _MUST_ _NOT_ modify the NAPI state by doing a complete or reschedule. The net_rx_action code still owns the NAPI state in that case, and therefore is allowed to make modifications to the ->poll_list. I realize this adds a lot of IRQ enable/disable overhead back into the code, but we can't get rid of that cleanly without solving this list ownership and modification issue properly. Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0106fa6..9837ed3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -284,7 +284,14 @@ extern int __init netdev_boot_setup(char *str); * Structure for NAPI scheduling similar to tasklet but with weighting */ struct napi_struct { + /* The poll_list must only be managed by the entity which + * changes the state of the NAPI_STATE_SCHED bit. This means + * whoever atomically sets that bit can add this napi_struct + * to the per-cpu poll_list, and whoever clears that bit + * can remove from the list right before clearing the bit. + */ struct list_head poll_list; + unsigned long state; int weight; int quota; @@ -336,13 +343,21 @@ static inline void napi_schedule(struct napi_struct *n) * * Mark NAPI processing as complete. */ -static inline void napi_complete(struct napi_struct *n) +static inline void __napi_complete(struct napi_struct *n) { BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); + list_del(&n->poll_list); smp_mb__before_clear_bit(); clear_bit(NAPI_STATE_SCHED, &n->state); } +static inline void napi_complete(struct napi_struct *n) +{ + local_irq_disable(); + __napi_complete(n); + local_irq_enable(); +} + /** * napi_disable - prevent NAPI from scheduling * @n: napi context @@ -1184,19 +1199,11 @@ static inline u32 netif_msg_init(int debug_value, int default_msg_enable_bits) return (1 << debug_value) - 1; } -/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete(). - * Do not inline this? - */ +/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete(). */ static inline int netif_rx_reschedule(struct napi_struct *n) { if (napi_schedule_prep(n)) { - unsigned long flags; - - local_irq_save(flags); - list_add_tail(&n->poll_list, - &__get_cpu_var(softnet_data).poll_list); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); - local_irq_restore(flags); + __napi_schedule(n); return 1; } return 0; @@ -1234,7 +1241,7 @@ static inline void netif_rx_schedule(struct net_device *dev, static inline void __netif_rx_complete(struct net_device *dev, struct napi_struct *napi) { - napi_complete(napi); + __napi_complete(napi); dev_put(dev); } diff --git a/net/core/dev.c b/net/core/dev.c index f119dc0..2897352 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2120,6 +2120,13 @@ static int process_backlog(struct napi_struct *napi, int quota) return work; } +static void napi_check_quota_bug(struct napi_struct *n) +{ + /* It is illegal to consume more than the given quota. */ + if (WARN_ON_ONCE(n->quota < 0)) + n->quota = 0; +} + /** * __napi_schedule - schedule for receive * @napi: entry to schedule @@ -2130,10 +2137,8 @@ void fastcall __napi_schedule(struct napi_struct *n) { unsigned long flags; - if (n->quota < 0) - n->quota += n->weight; - else - n->quota = n->weight; + napi_check_quota_bug(n); + n->quota = n->weight; local_irq_save(flags); list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list); @@ -2145,32 +2150,36 @@ EXPORT_SYMBOL(__napi_schedule); static void net_rx_action(struct softirq_action *h) { - struct list_head list; + struct list_head *list = &__get_cpu_var(softnet_data).poll_list; unsigned long start_time = jiffies; int budget = netdev_budget; void *have; local_irq_disable(); - list_replace_init(&__get_cpu_var(softnet_data).poll_list, &list); - local_irq_enable(); - while (!list_empty(&list)) { + while (!list_empty(list)) { struct napi_struct *n; - /* if softirq window is exhuasted then punt */ - if (unlikely(budget <= 0 || jiffies != start_time)) { - local_irq_disable(); - list_splice(&list, &__get_cpu_var(softnet_data).poll_list); - __raise_softirq_irqoff(NET_RX_SOFTIRQ); - local_irq_enable(); - break; - } + /* If softirq window is exhuasted then punt. + * + * Note that this is a slight policy change from the + * previous NAPI code, which would allow up to 2 + * jiffies to pass before breaking out. The test + * used to be "jiffies - start_time > 1". + */ + if (unlikely(budget <= 0 || jiffies != start_time)) + goto softnet_break; - n = list_entry(list.next, struct napi_struct, poll_list); + local_irq_enable(); - have = netpoll_poll_lock(n); + /* Even though interrupts have been re-enabled, this + * access is safe because interrupts can only add new + * entries to the tail of this list, and only ->poll() + * calls can remove this head entry from the list. + */ + n = list_entry(list->next, struct napi_struct, poll_list); - list_del(&n->poll_list); + have = netpoll_poll_lock(n); /* if quota not exhausted process work */ if (likely(n->quota > 0)) { @@ -2180,12 +2189,25 @@ static void net_rx_action(struct softirq_action *h) n->quota -= work; } - /* if napi_complete not called, reschedule */ - if (test_bit(NAPI_STATE_SCHED, &n->state)) - __napi_schedule(n); + local_irq_disable(); + + napi_check_quota_bug(n); + + /* Drivers must not modify the NAPI state if they + * consume the entire quota. In such cases this code + * still "owns" the NAPI instance and therefore can + * move the instance around on the list at-will. + */ + if (unlikely(!n->quota)) { + n->quota = n->weight; + list_move_tail(&n->poll_list, list); + } netpoll_poll_unlock(have); } +out: + local_irq_enable(); + #ifdef CONFIG_NET_DMA /* * There may not be any more sk_buffs coming right now, so push @@ -2200,6 +2222,13 @@ static void net_rx_action(struct softirq_action *h) } } #endif + + return; + +softnet_break: + __get_cpu_var(netdev_rx_stat).time_squeeze++; + __raise_softirq_irqoff(NET_RX_SOFTIRQ); + goto out; } static gifconf_func_t * gifconf_list [NPROTO]; ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-09-14 22:12 ` David Miller @ 2007-09-18 16:15 ` Jan-Bernd Themann 2007-09-18 19:08 ` David Miller 2007-09-19 15:33 ` Roland Dreier 0 siblings, 2 replies; 19+ messages in thread From: Jan-Bernd Themann @ 2007-09-18 16:15 UTC (permalink / raw) To: David Miller; +Cc: shemminger, netdev, themann, raisch Hi, On Saturday 15 September 2007 00:12, David Miller wrote: > Ok, for now I'm going to try and deal with this by reverting > the list handling to something approximating the old NAPI > code, as per the patch below. > > I've only quickly test booted into this kernel on my workstation. > So take care when trying it out. > > I took the liberty to add some assertions and comments all over > wrt. list and quota handling. One thing to note that (and this > was true previously too) that if ->poll() uses the whole quota > it _MUST_ _NOT_ modify the NAPI state by doing a complete or > reschedule. The net_rx_action code still owns the NAPI state in > that case, and therefore is allowed to make modifications to the > ->poll_list. I'm currently updating the eHEA polling function to work with this scheme. Up to now we had sort of a quota for or TX refill part as well, which was also done in the poll function. This was possible as the device could be scheduled again even if the quota has not been used completely. If I got it right this is not longer possible. The idea was to benefit from the same "fairness" scheme of NAPI as it is done for the RX side. One other thing I observed is that I can not unload the module as the ref counter of the eth device is too low. I haven't tracked down the source of this problem yet. Regards, Jan-Bernd ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-09-18 16:15 ` Jan-Bernd Themann @ 2007-09-18 19:08 ` David Miller 2007-09-19 15:33 ` Roland Dreier 1 sibling, 0 replies; 19+ messages in thread From: David Miller @ 2007-09-18 19:08 UTC (permalink / raw) To: ossthema; +Cc: shemminger, netdev, themann, raisch From: Jan-Bernd Themann <ossthema@de.ibm.com> Date: Tue, 18 Sep 2007 18:15:45 +0200 > One other thing I observed is that I can not unload the module as the > ref counter of the eth device is too low. I haven't tracked down the > source of this problem yet. This is probably because of the resched device refcounting bug that Roland Dreier just posted a fix for. Look for subject "Fix refcounting problem with netif_rx_reschedule()" I merge that in shortly to net-2.6.24 as well. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-09-18 16:15 ` Jan-Bernd Themann 2007-09-18 19:08 ` David Miller @ 2007-09-19 15:33 ` Roland Dreier 2007-09-19 15:43 ` Jan-Bernd Themann 1 sibling, 1 reply; 19+ messages in thread From: Roland Dreier @ 2007-09-19 15:33 UTC (permalink / raw) To: Jan-Bernd Themann; +Cc: David Miller, shemminger, netdev, themann, raisch > One other thing I observed is that I can not unload the module as the > ref counter of the eth device is too low. I haven't tracked down the > source of this problem yet. I suspect that this is because netif_rx_reschedule() was missing a dev_hold() to match the dev_put() in netif_rx_complete(). Dave merged a fix for that problem yesterday, so current net-2.6.24 should be OK. Let us know if it's not OK, because then there's another bug... - R. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: new NAPI interface broken 2007-09-19 15:33 ` Roland Dreier @ 2007-09-19 15:43 ` Jan-Bernd Themann 0 siblings, 0 replies; 19+ messages in thread From: Jan-Bernd Themann @ 2007-09-19 15:43 UTC (permalink / raw) To: Roland Dreier; +Cc: David Miller, shemminger, netdev, themann, raisch On Wednesday 19 September 2007 17:33, Roland Dreier wrote: > > One other thing I observed is that I can not unload the module as the > > ref counter of the eth device is too low. I haven't tracked down the > > source of this problem yet. > > I suspect that this is because netif_rx_reschedule() was missing a > dev_hold() to match the dev_put() in netif_rx_complete(). Dave merged > a fix for that problem yesterday, so current net-2.6.24 should be OK. > Let us know if it's not OK, because then there's another bug... > > - R. > When I applied this netif_rx_reschedule fix this problem did not occur anymore (module could be unloaded). So I guess I hit the problem you described. Thanks, Jan-Bernd ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2007-10-17 15:26 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-09-07 9:37 new NAPI interface broken Jan-Bernd Themann 2007-09-12 12:50 ` David Miller 2007-09-12 13:10 ` new NAPI interface broken for POWER architecture? Christoph Raisch 2007-09-12 13:27 ` David Miller 2007-09-12 15:18 ` Arnd Bergmann 2007-10-16 7:29 ` new NAPI interface broken Benjamin Herrenschmidt 2007-10-16 7:42 ` Benjamin Herrenschmidt 2007-10-16 7:44 ` David Miller 2007-10-16 8:28 ` Benjamin Herrenschmidt 2007-10-16 8:31 ` David Miller 2007-10-16 9:01 ` Benjamin Herrenschmidt 2007-10-16 21:01 ` Anton Blanchard 2007-10-17 15:26 ` Christoph Raisch 2007-10-16 15:56 ` Arjan van de Ven 2007-09-14 22:12 ` David Miller 2007-09-18 16:15 ` Jan-Bernd Themann 2007-09-18 19:08 ` David Miller 2007-09-19 15:33 ` Roland Dreier 2007-09-19 15:43 ` Jan-Bernd Themann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).