* [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
@ 2007-12-12 4:01 Joonwoo Park
2007-12-12 5:39 ` Stephen Hemminger
2007-12-12 15:18 ` [PATCH 6/7] : " David Miller
0 siblings, 2 replies; 41+ messages in thread
From: Joonwoo Park @ 2007-12-12 4:01 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: jgarzik, baum, andy
[NETDEV]: tehuti Fix possible causing oops of net_rx_action
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
---
drivers/net/tehuti.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
index 21230c9..955e749 100644
--- a/drivers/net/tehuti.c
+++ b/drivers/net/tehuti.c
@@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
netif_rx_complete(dev, napi);
bdx_enable_interrupts(priv);
+ if (unlikely(work_done == napi->weight))
+ return work_done - 1;
}
return work_done;
}
---
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
2007-12-12 4:01 [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
@ 2007-12-12 5:39 ` Stephen Hemminger
2007-12-12 5:46 ` [RFC] net: napi fix Stephen Hemminger
` (2 more replies)
2007-12-12 15:18 ` [PATCH 6/7] : " David Miller
1 sibling, 3 replies; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-12 5:39 UTC (permalink / raw)
To: Joonwoo Park; +Cc: netdev, linux-kernel, jgarzik, baum, andy
On Wed, 12 Dec 2007 13:01:27 +0900
"Joonwoo Park" <joonwpark81@gmail.com> wrote:
> [NETDEV]: tehuti Fix possible causing oops of net_rx_action
>
> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> ---
> drivers/net/tehuti.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> index 21230c9..955e749 100644
> --- a/drivers/net/tehuti.c
> +++ b/drivers/net/tehuti.c
> @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
>
> netif_rx_complete(dev, napi);
> bdx_enable_interrupts(priv);
> + if (unlikely(work_done == napi->weight))
> + return work_done - 1;
> }
> return work_done;
> }
A better fix would be not going over budget in the first place.
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 41+ messages in thread* [RFC] net: napi fix
2007-12-12 5:39 ` Stephen Hemminger
@ 2007-12-12 5:46 ` Stephen Hemminger
2007-12-12 6:05 ` Joonwoo Park
2007-12-12 15:21 ` David Miller
2007-12-12 5:48 ` [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
2007-12-12 15:20 ` [PATCH 6/7] [NETDEV]: " David Miller
2 siblings, 2 replies; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-12 5:46 UTC (permalink / raw)
To: David S. Miller; +Cc: Joonwoo Park, netdev, linux-kernel, jgarzik, baum, andy
Isn't this a better fix for all drivers, rather than peppering every
driver with the special case. This is how the logic worked up until
2.6.24.
--- a/net/core/dev.c 2007-12-11 12:16:20.000000000 -0800
+++ b/net/core/dev.c 2007-12-11 21:43:39.000000000 -0800
@@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
have = netpoll_poll_lock(n);
- weight = n->weight;
+ weight = min(n->weight, budget);
/* This NAPI_STATE_SCHED test is for avoiding a race
* with netpoll's poll_napi(). Only the entity which
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-12 5:46 ` [RFC] net: napi fix Stephen Hemminger
@ 2007-12-12 6:05 ` Joonwoo Park
2007-12-12 15:22 ` David Miller
2007-12-12 15:21 ` David Miller
1 sibling, 1 reply; 41+ messages in thread
From: Joonwoo Park @ 2007-12-12 6:05 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David S. Miller, netdev, linux-kernel, jgarzik, baum, andy
2007/12/12, Stephen Hemminger <shemminger@linux-foundation.org>:
> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.
>
>
> --- a/net/core/dev.c 2007-12-11 12:16:20.000000000 -0800
> +++ b/net/core/dev.c 2007-12-11 21:43:39.000000000 -0800
> @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
>
> have = netpoll_poll_lock(n);
>
> - weight = n->weight;
> + weight = min(n->weight, budget);
>
> /* This NAPI_STATE_SCHED test is for avoiding a race
> * with netpoll's poll_napi(). Only the entity which
>
Stephen,
Could you explain how it fix the problem?
IMHO I think your patch cannot solve the problem.
The drivers can call netif_rx_complete and net_rx_action can do
list_move_tail also.
Am I missing something?
Thanks
Joonwoo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-12 6:05 ` Joonwoo Park
@ 2007-12-12 15:22 ` David Miller
0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2007-12-12 15:22 UTC (permalink / raw)
To: joonwpark81; +Cc: shemminger, netdev, linux-kernel, jgarzik, baum, andy
From: "Joonwoo Park" <joonwpark81@gmail.com>
Date: Wed, 12 Dec 2007 15:05:26 +0900
> Could you explain how it fix the problem?
> IMHO I think your patch cannot solve the problem.
> The drivers can call netif_rx_complete and net_rx_action can do
> list_move_tail also.
Stephen is confused about what the bug is in these drivers,
that's all.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-12 5:46 ` [RFC] net: napi fix Stephen Hemminger
2007-12-12 6:05 ` Joonwoo Park
@ 2007-12-12 15:21 ` David Miller
1 sibling, 0 replies; 41+ messages in thread
From: David Miller @ 2007-12-12 15:21 UTC (permalink / raw)
To: shemminger; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, baum, andy
From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Tue, 11 Dec 2007 21:46:34 -0800
> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.
Stephen this is not the problem.
The problem is that the driver is doing a NAPI completion and
re-enabling chip interrupts with work_done == weight, and that is
illegal.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
2007-12-12 5:39 ` Stephen Hemminger
2007-12-12 5:46 ` [RFC] net: napi fix Stephen Hemminger
@ 2007-12-12 5:48 ` Joonwoo Park
2007-12-12 5:53 ` Stephen Hemminger
2007-12-12 15:20 ` [PATCH 6/7] [NETDEV]: " David Miller
2 siblings, 1 reply; 41+ messages in thread
From: Joonwoo Park @ 2007-12-12 5:48 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, linux-kernel, jgarzik, baum, andy
2007/12/12, Stephen Hemminger <shemminger@linux-foundation.org>:
> On Wed, 12 Dec 2007 13:01:27 +0900
> "Joonwoo Park" <joonwpark81@gmail.com> wrote:
>
> > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> >
> > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > ---
> > drivers/net/tehuti.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > index 21230c9..955e749 100644
> > --- a/drivers/net/tehuti.c
> > +++ b/drivers/net/tehuti.c
> > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> >
> > netif_rx_complete(dev, napi);
> > bdx_enable_interrupts(priv);
> > + if (unlikely(work_done == napi->weight))
> > + return work_done - 1;
> > }
> > return work_done;
> > }
>
> A better fix would be not going over budget in the first place.
>
> --
> Stephen Hemminger <shemminger@linux-foundation.org>
>
Stephen,
This is code of bd_poll().
Do you mean remove napi_stop stuff?
static int bdx_poll(struct napi_struct *napi, int budget)
{
...
work_done = bdx_rx_receive(priv, &priv->rxd_fifo0, budget);
if ((work_done < budget) ||
(priv->napi_stop++ >= 30)) {
DBG("rx poll is done. backing to isr-driven\n");
/* from time to time we exit to let NAPI layer release
* device lock and allow waiting tasks (eg rmmod) to advance) */
priv->napi_stop = 0;
netif_rx_complete(dev, napi);
bdx_enable_interrupts(priv);
if (unlikely(work_done == napi->weight))
return work_done - 1;
}
return work_done;
}
Thanks,
Joonwoo
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
2007-12-12 5:48 ` [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
@ 2007-12-12 5:53 ` Stephen Hemminger
0 siblings, 0 replies; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-12 5:53 UTC (permalink / raw)
To: Joonwoo Park; +Cc: netdev, linux-kernel, jgarzik, baum, andy
On Wed, 12 Dec 2007 14:48:27 +0900
"Joonwoo Park" <joonwpark81@gmail.com> wrote:
> 2007/12/12, Stephen Hemminger <shemminger@linux-foundation.org>:
> > On Wed, 12 Dec 2007 13:01:27 +0900
> > "Joonwoo Park" <joonwpark81@gmail.com> wrote:
> >
> > > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> > >
> > > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > > ---
> > > drivers/net/tehuti.c | 2 ++
> > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > > index 21230c9..955e749 100644
> > > --- a/drivers/net/tehuti.c
> > > +++ b/drivers/net/tehuti.c
> > > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> > >
> > > netif_rx_complete(dev, napi);
> > > bdx_enable_interrupts(priv);
> > > + if (unlikely(work_done == napi->weight))
> > > + return work_done - 1;
> > > }
> > > return work_done;
> > > }
> >
> > A better fix would be not going over budget in the first place.
> >
> > --
> > Stephen Hemminger <shemminger@linux-foundation.org>
> >
>
> Stephen,
> This is code of bd_poll().
> Do you mean remove napi_stop stuff?
>
> static int bdx_poll(struct napi_struct *napi, int budget)
> {
> ...
> work_done = bdx_rx_receive(priv, &priv->rxd_fifo0, budget);
> if ((work_done < budget) ||
> (priv->napi_stop++ >= 30)) {
Yes remove the napi_stop stuff, because current NAPI expects device
to be constrained only by budget. If you need to stop sooner, just
set napi weight to be smaller.
> DBG("rx poll is done. backing to isr-driven\n");
>
> /* from time to time we exit to let NAPI layer release
> * device lock and allow waiting tasks (eg rmmod) to advance) */
> priv->napi_stop = 0;
>
> netif_rx_complete(dev, napi);
> bdx_enable_interrupts(priv);
With my posted fix to rx_action the following two lines would not be needed.
> if (unlikely(work_done == napi->weight))
> return work_done - 1;
> }
> return work_done;
> }
>
> Thanks,
> Joonwoo
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
2007-12-12 5:39 ` Stephen Hemminger
2007-12-12 5:46 ` [RFC] net: napi fix Stephen Hemminger
2007-12-12 5:48 ` [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
@ 2007-12-12 15:20 ` David Miller
2007-12-12 16:36 ` Stephen Hemminger
2 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2007-12-12 15:20 UTC (permalink / raw)
To: shemminger; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, baum, andy
From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Tue, 11 Dec 2007 21:39:39 -0800
> On Wed, 12 Dec 2007 13:01:27 +0900
> "Joonwoo Park" <joonwpark81@gmail.com> wrote:
>
> > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> >
> > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > ---
> > drivers/net/tehuti.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > index 21230c9..955e749 100644
> > --- a/drivers/net/tehuti.c
> > +++ b/drivers/net/tehuti.c
> > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> >
> > netif_rx_complete(dev, napi);
> > bdx_enable_interrupts(priv);
> > + if (unlikely(work_done == napi->weight))
> > + return work_done - 1;
> > }
> > return work_done;
> > }
>
> A better fix would be not going over budget in the first place.
That's not the problem.
They are not going over the budget, rather, they are hitting
the budget yet doing netif_rx_complete() as well which is
illegal.
Unless you strictly process less than "weight" packets, you must
not netif_rx_complete() and re-enable chip interrupts.
I can't believe people are trying to fix this bug like this.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/7] [NETDEV]: tehuti Fix possible causing oops of net_rx_action
2007-12-12 15:20 ` [PATCH 6/7] [NETDEV]: " David Miller
@ 2007-12-12 16:36 ` Stephen Hemminger
0 siblings, 0 replies; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-12 16:36 UTC (permalink / raw)
To: David Miller, joonwpark81; +Cc: netdev, linux-kernel, jgarzik, baum, andy
On Wed, 12 Dec 2007 07:20:34 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Tue, 11 Dec 2007 21:39:39 -0800
>
> > On Wed, 12 Dec 2007 13:01:27 +0900
> > "Joonwoo Park" <joonwpark81@gmail.com> wrote:
> >
> > > [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> > >
> > > Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> > > ---
> > > drivers/net/tehuti.c | 2 ++
> > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> > > index 21230c9..955e749 100644
> > > --- a/drivers/net/tehuti.c
> > > +++ b/drivers/net/tehuti.c
> > > @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
> > >
> > > netif_rx_complete(dev, napi);
> > > bdx_enable_interrupts(priv);
> > > + if (unlikely(work_done == napi->weight))
> > > + return work_done - 1;
> > > }
> > > return work_done;
> > > }
> >
> > A better fix would be not going over budget in the first place.
>
> That's not the problem.
>
> They are not going over the budget, rather, they are hitting
> the budget yet doing netif_rx_complete() as well which is
> illegal.
>
> Unless you strictly process less than "weight" packets, you must
> not netif_rx_complete() and re-enable chip interrupts.
>
> I can't believe people are trying to fix this bug like this.
Sorry, I was looking at a different possible problem. The issue
is that if netdev_budget was set smaller (say 128) but device
weight was set larger (say 256). The new code would still allow
the device to do a full swipe (256) packets rather than only
128 as in earlier NAPI. I guess it is an okay behaviour change, because
we don't really guarantee that case.
The problem with the tehuti driver is the logic around priv->napi_stop.
That whole early stop concept should be removed since it just
duplicates the logic of netdev->weight but breaks the assumptions
in the calling netif_rx_action.
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
2007-12-12 4:01 [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
2007-12-12 5:39 ` Stephen Hemminger
@ 2007-12-12 15:18 ` David Miller
2007-12-12 21:58 ` [RFT] tehuti: napi fix Stephen Hemminger
2007-12-13 7:07 ` [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
1 sibling, 2 replies; 41+ messages in thread
From: David Miller @ 2007-12-12 15:18 UTC (permalink / raw)
To: joonwpark81; +Cc: netdev, linux-kernel, jgarzik, baum, andy
From: "Joonwoo Park" <joonwpark81@gmail.com>
Date: Wed, 12 Dec 2007 13:01:27 +0900
> @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
>
> netif_rx_complete(dev, napi);
> bdx_enable_interrupts(priv);
> + if (unlikely(work_done == napi->weight))
> + return work_done - 1;
> }
> return work_done;
> }
Any time your trying to make a caller "happy" by adjusting
a return value forcefully, it's a hack.
And I stated this in another reply about this issue.
Please do not fix the problem this way.
The correct way to fix this is, if we did process a full
"weight" or work, we should not netif_rx_complete() and
we should not re-enable chip interrupts.
Instead we should return the true "work_done" value and
allow the caller to thus poll us one more time.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [RFT] tehuti: napi fix
2007-12-12 15:18 ` [PATCH 6/7] : " David Miller
@ 2007-12-12 21:58 ` Stephen Hemminger
2007-12-16 21:38 ` David Miller
2007-12-13 7:07 ` [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
1 sibling, 1 reply; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-12 21:58 UTC (permalink / raw)
To: David Miller; +Cc: joonwpark81, netdev, jgarzik, baum, andy
This should fix the tehuti napi fence post problems by getting
rid of priv->napi_stop, and setting weight to 32 (like other 10G).
Also, used the wierd entry/exit macro's like rest of driver.
--- a/drivers/net/tehuti.c 2007-11-13 22:19:14.000000000 -0800
+++ b/drivers/net/tehuti.c 2007-12-12 13:37:01.000000000 -0800
@@ -295,18 +295,12 @@ static int bdx_poll(struct napi_struct *
ENTER;
bdx_tx_cleanup(priv);
work_done = bdx_rx_receive(priv, &priv->rxd_fifo0, budget);
- if ((work_done < budget) ||
- (priv->napi_stop++ >= 30)) {
+ if (work_done < budget) {
DBG("rx poll is done. backing to isr-driven\n");
-
- /* from time to time we exit to let NAPI layer release
- * device lock and allow waiting tasks (eg rmmod) to advance) */
- priv->napi_stop = 0;
-
netif_rx_complete(dev, napi);
bdx_enable_interrupts(priv);
}
- return work_done;
+ RET(work_done);
}
/* bdx_fw_load - loads firmware to NIC
@@ -2022,7 +2016,7 @@ bdx_probe(struct pci_dev *pdev, const st
priv->nic = nic;
priv->msg_enable = BDX_DEF_MSG_ENABLE;
- netif_napi_add(ndev, &priv->napi, bdx_poll, 64);
+ netif_napi_add(ndev, &priv->napi, bdx_poll, 32);
if ((readl(nic->regs + FPGA_VER) & 0xFFF) == 308) {
DBG("HW statistics not supported\n");
--- a/drivers/net/tehuti.h 2007-10-16 16:48:17.000000000 -0700
+++ b/drivers/net/tehuti.h 2007-12-12 13:25:25.000000000 -0800
@@ -257,7 +257,6 @@ struct bdx_priv {
struct rxd_fifo rxd_fifo0;
struct rxf_fifo rxf_fifo0;
struct rxdb *rxdb; /* rx dbs to store skb pointers */
- int napi_stop;
struct vlan_group *vlgrp;
/* Tx FIFOs: 1 for data desc, 1 for empty (acks) desc */
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [RFT] tehuti: napi fix
2007-12-12 21:58 ` [RFT] tehuti: napi fix Stephen Hemminger
@ 2007-12-16 21:38 ` David Miller
2007-12-17 20:18 ` Stephen Hemminger
0 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2007-12-16 21:38 UTC (permalink / raw)
To: shemminger; +Cc: joonwpark81, netdev, jgarzik, baum, andy
From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Wed, 12 Dec 2007 13:58:52 -0800
> This should fix the tehuti napi fence post problems by getting
> rid of priv->napi_stop, and setting weight to 32 (like other 10G).
>
> Also, used the wierd entry/exit macro's like rest of driver.
It fixes the fench-post problem, but like the comments you
removed explain:
> - /* from time to time we exit to let NAPI layer release
> - * device lock and allow waiting tasks (eg rmmod) to advance) */
> - priv->napi_stop = 0;
> -
We now hang on rmmod during constant packet load.
This change just trades one bug for another, we have to
get the device close issue sorted out before we can go
around removing these things.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFT] tehuti: napi fix
2007-12-16 21:38 ` David Miller
@ 2007-12-17 20:18 ` Stephen Hemminger
0 siblings, 0 replies; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-17 20:18 UTC (permalink / raw)
To: David Miller; +Cc: joonwpark81, netdev, jgarzik, baum, andy
On Sun, 16 Dec 2007 13:38:33 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
> From: Stephen Hemminger <shemminger@linux-foundation.org>
> Date: Wed, 12 Dec 2007 13:58:52 -0800
>
> > This should fix the tehuti napi fence post problems by getting
> > rid of priv->napi_stop, and setting weight to 32 (like other 10G).
> >
> > Also, used the wierd entry/exit macro's like rest of driver.
>
> It fixes the fench-post problem, but like the comments you
> removed explain:
>
> > - /* from time to time we exit to let NAPI layer release
> > - * device lock and allow waiting tasks (eg rmmod) to advance) */
> > - priv->napi_stop = 0;
> > -
>
> We now hang on rmmod during constant packet load.
>
> This change just trades one bug for another, we have to
> get the device close issue sorted out before we can go
> around removing these things.
Well the napi_stop had the same effect as having a smaller weight
value, so my patch just shrunk the weight. That causes the device
to exit NAPI (and should solve the rmmod problem).
--
Stephen Hemminger <stephen.hemminger@vyatta.com>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
2007-12-12 15:18 ` [PATCH 6/7] : " David Miller
2007-12-12 21:58 ` [RFT] tehuti: napi fix Stephen Hemminger
@ 2007-12-13 7:07 ` Joonwoo Park
1 sibling, 0 replies; 41+ messages in thread
From: Joonwoo Park @ 2007-12-13 7:07 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel, jgarzik, baum, andy
2007/12/13, David Miller <davem@davemloft.net>:
> From: "Joonwoo Park" <joonwpark81@gmail.com>
> Date: Wed, 12 Dec 2007 13:01:27 +0900
>
>
> Any time your trying to make a caller "happy" by adjusting
> a return value forcefully, it's a hack.
>
> And I stated this in another reply about this issue.
>
> Please do not fix the problem this way.
>
> The correct way to fix this is, if we did process a full
> "weight" or work, we should not netif_rx_complete() and
> we should not re-enable chip interrupts.
>
> Instead we should return the true "work_done" value and
> allow the caller to thus poll us one more time.
>
Thanks so much for your advice.
I agree, returning work_done itself exactly.
I will rework for these drivers.
Thanks.
Joonwoo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
@ 2007-12-12 17:29 Andrew Gallatin
2007-12-12 17:38 ` David Miller
0 siblings, 1 reply; 41+ messages in thread
From: Andrew Gallatin @ 2007-12-12 17:29 UTC (permalink / raw)
To: David Miller
Cc: joonwpark81, netdev, linux-kernel, jgarzik, Stephen Hemminger
[I apologize for loosing threading, I'm replying from the archives]
> The problem is that the driver is doing a NAPI completion and
> re-enabling chip interrupts with work_done == weight, and that is
> illegal.
The only time at least myri10ge will do this is due to
the !netif_running(netdev) check. Eg, from myri10ge's poll:
work_done = myri10ge_clean_rx_done(mgp, budget);
if (work_done < budget || !netif_running(netdev)) {
netif_rx_complete(netdev, napi);
put_be32(htonl(3), mgp->irq_claim);
}
Is the netif_running() check even required? Is this just
a bad way to solve a race with running NAPI at down() time
that would be better solved by putting a napi_synchronize()
in the driver's down() routine?
I'd rather fix this right than add another check to a
questionable code path.
Thanks,
Drew
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [RFC] net: napi fix
2007-12-12 17:29 [RFC] net: napi fix Andrew Gallatin
@ 2007-12-12 17:38 ` David Miller
2007-12-12 17:40 ` Andrew Gallatin
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: David Miller @ 2007-12-12 17:38 UTC (permalink / raw)
To: gallatin; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, shemminger
From: Andrew Gallatin <gallatin@myri.com>
Date: Wed, 12 Dec 2007 12:29:23 -0500
> Is the netif_running() check even required?
No, it is not.
When a device is brought down, one of the first things
that happens is that we wait for all pending NAPI polls
to complete, then block any new polls from starting.
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [RFC] net: napi fix
2007-12-12 17:38 ` David Miller
@ 2007-12-12 17:40 ` Andrew Gallatin
2007-12-12 18:41 ` Kok, Auke
2007-12-20 9:52 ` Robert Olsson
2 siblings, 0 replies; 41+ messages in thread
From: Andrew Gallatin @ 2007-12-12 17:40 UTC (permalink / raw)
To: David Miller; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, shemminger
David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Wed, 12 Dec 2007 12:29:23 -0500
>
>> Is the netif_running() check even required?
>
> No, it is not.
>
> When a device is brought down, one of the first things
> that happens is that we wait for all pending NAPI polls
> to complete, then block any new polls from starting.
Great, thanks. I will submit a patch to remove the bogus
check. This should fix myri10ge properly.
Thank you,
Drew
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-12 17:38 ` David Miller
2007-12-12 17:40 ` Andrew Gallatin
@ 2007-12-12 18:41 ` Kok, Auke
2007-12-13 7:41 ` Joonwoo Park
2007-12-13 13:49 ` Jarek Poplawski
2007-12-20 9:52 ` Robert Olsson
2 siblings, 2 replies; 41+ messages in thread
From: Kok, Auke @ 2007-12-12 18:41 UTC (permalink / raw)
To: David Miller
Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger,
Jesse Brandeburg
David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Wed, 12 Dec 2007 12:29:23 -0500
>
>> Is the netif_running() check even required?
>
> No, it is not.
>
> When a device is brought down, one of the first things
> that happens is that we wait for all pending NAPI polls
> to complete, then block any new polls from starting.
I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
has this check as well and that's exactly what is causing most of the
net_rx_action oopses in the first place. Without the netif_running() check
previously the drivers were just unusable with NAPI and prone to many races with
down (i.e. touching some ethtool ioctl which wants to do a reset while routing
small packets at high numbers). that's why we added the netif_running() check in
the first place :)
There might be more drivers lurking that need this change...
Auke
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-12 18:41 ` Kok, Auke
@ 2007-12-13 7:41 ` Joonwoo Park
2007-12-13 14:13 ` Andrew Gallatin
2007-12-13 13:49 ` Jarek Poplawski
1 sibling, 1 reply; 41+ messages in thread
From: Joonwoo Park @ 2007-12-13 7:41 UTC (permalink / raw)
To: Kok, Auke
Cc: David Miller, gallatin, netdev, linux-kernel, jgarzik, shemminger,
Jesse Brandeburg
2007/12/13, Kok, Auke <auke-jan.h.kok@intel.com>:
> David Miller wrote:
> > From: Andrew Gallatin <gallatin@myri.com>
> > Date: Wed, 12 Dec 2007 12:29:23 -0500
> >
> >> Is the netif_running() check even required?
> >
> > No, it is not.
> >
> > When a device is brought down, one of the first things
> > that happens is that we wait for all pending NAPI polls
> > to complete, then block any new polls from starting.
>
> I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
> has this check as well and that's exactly what is causing most of the
> net_rx_action oopses in the first place. Without the netif_running() check
> previously the drivers were just unusable with NAPI and prone to many races with
> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
> small packets at high numbers). that's why we added the netif_running() check in
> the first place :)
>
> There might be more drivers lurking that need this change...
>
> Auke
>
Also in my case, without netif_running() check, I cannot do ifconfig down.
It stucked if packet generator was sending packets.
Thanks
Joonwoo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 7:41 ` Joonwoo Park
@ 2007-12-13 14:13 ` Andrew Gallatin
2007-12-13 14:19 ` David Miller
2007-12-14 2:06 ` Joonwoo Park
0 siblings, 2 replies; 41+ messages in thread
From: Andrew Gallatin @ 2007-12-13 14:13 UTC (permalink / raw)
To: Joonwoo Park
Cc: Kok, Auke, David Miller, netdev, linux-kernel, jgarzik,
shemminger, Jesse Brandeburg
Joonwoo Park wrote:
> 2007/12/13, Kok, Auke <auke-jan.h.kok@intel.com>:
>> David Miller wrote:
>>> From: Andrew Gallatin <gallatin@myri.com>
>>> Date: Wed, 12 Dec 2007 12:29:23 -0500
>>>
>>>> Is the netif_running() check even required?
>>> No, it is not.
>>>
>>> When a device is brought down, one of the first things
>>> that happens is that we wait for all pending NAPI polls
>>> to complete, then block any new polls from starting.
>> I think this was previously (pre-2.6.24) not the case, which is why
e1000 et al
>> has this check as well and that's exactly what is causing most of the
>> net_rx_action oopses in the first place. Without the netif_running()
check
>> previously the drivers were just unusable with NAPI and prone to
many races with
>> down (i.e. touching some ethtool ioctl which wants to do a reset
while routing
>> small packets at high numbers). that's why we added the
netif_running() check in
>> the first place :)
>>
>> There might be more drivers lurking that need this change...
>>
>> Auke
>>
>
> Also in my case, without netif_running() check, I cannot do ifconfig
down.
> It stucked if packet generator was sending packets.
If the netif_running() check is indeed required to make a device break
out of napi polling and respond to an ifconfig down, then I think the
netif_running() check should be moved up into net_rx_action() to avoid
potential for driver complexity and bugs like the ones you found.
Drew
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 14:13 ` Andrew Gallatin
@ 2007-12-13 14:19 ` David Miller
2007-12-13 16:45 ` Kok, Auke
2007-12-13 18:22 ` Stephen Hemminger
2007-12-14 2:06 ` Joonwoo Park
1 sibling, 2 replies; 41+ messages in thread
From: David Miller @ 2007-12-13 14:19 UTC (permalink / raw)
To: gallatin
Cc: joonwpark81, auke-jan.h.kok, netdev, linux-kernel, jgarzik,
shemminger, jesse.brandeburg
From: Andrew Gallatin <gallatin@myri.com>
Date: Thu, 13 Dec 2007 09:13:54 -0500
> If the netif_running() check is indeed required to make a device break
> out of napi polling and respond to an ifconfig down, then I think the
> netif_running() check should be moved up into net_rx_action() to avoid
> potential for driver complexity and bugs like the ones you found.
That, or something like it, definitely sounds reasonable and much
better than putting the check into every driver :-)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 14:19 ` David Miller
@ 2007-12-13 16:45 ` Kok, Auke
2007-12-13 18:22 ` Stephen Hemminger
1 sibling, 0 replies; 41+ messages in thread
From: Kok, Auke @ 2007-12-13 16:45 UTC (permalink / raw)
To: David Miller
Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger,
jesse.brandeburg
David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
>
>> If the netif_running() check is indeed required to make a device break
>> out of napi polling and respond to an ifconfig down, then I think the
>> netif_running() check should be moved up into net_rx_action() to avoid
>> potential for driver complexity and bugs like the ones you found.
>
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)
hear hear!
Auke
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 14:19 ` David Miller
2007-12-13 16:45 ` Kok, Auke
@ 2007-12-13 18:22 ` Stephen Hemminger
2007-12-13 19:02 ` Andrew Gallatin
1 sibling, 1 reply; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-13 18:22 UTC (permalink / raw)
To: David Miller
Cc: gallatin, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
jgarzik, jesse.brandeburg
On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
>
> > If the netif_running() check is indeed required to make a device break
> > out of napi polling and respond to an ifconfig down, then I think the
> > netif_running() check should be moved up into net_rx_action() to avoid
> > potential for driver complexity and bugs like the ones you found.
>
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
It is not possible to do netif_running() check in generic code as currently
written because of the case of devices where a single NAPI object is
being used to handle two devices. The association between napi and netdevice
is M to N. There are cases like niu that have multiple NAPI's and one
netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.
The existing pointer from napi to netdevice is only used by netconsole
now. For devices like sky2 it means that netconsole can't work on the the
second port which is a not a big problem. But adding a netif_running()
check would be a big issue.
--
Stephen Hemminger <shemminger@linux-foundation.org>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 18:22 ` Stephen Hemminger
@ 2007-12-13 19:02 ` Andrew Gallatin
2007-12-13 19:09 ` David Miller
0 siblings, 1 reply; 41+ messages in thread
From: Andrew Gallatin @ 2007-12-13 19:02 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David Miller, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
jgarzik, jesse.brandeburg
Stephen Hemminger wrote:
> On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Andrew Gallatin <gallatin@myri.com>
>> Date: Thu, 13 Dec 2007 09:13:54 -0500
>>
>>> If the netif_running() check is indeed required to make a device break
>>> out of napi polling and respond to an ifconfig down, then I think the
>>> netif_running() check should be moved up into net_rx_action() to avoid
>>> potential for driver complexity and bugs like the ones you found.
>> That, or something like it, definitely sounds reasonable and much
>> better than putting the check into every driver :-)
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> It is not possible to do netif_running() check in generic code as currently
> written because of the case of devices where a single NAPI object is
> being used to handle two devices. The association between napi and netdevice
> is M to N. There are cases like niu that have multiple NAPI's and one
> netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.
Ah, now I see. I forgot that not every device has a 1:1::napi:netdev
relationship.
Could we make an optional *dev_state field in the napi structure.
It would be initialized to __LINK_STATE_START. Devices which have
a 1:1 NAPI:netdevice relationship would set it to &netdev->state.
The generic code would then do a test_bit(__LINK_STATE_START,
napi->dev_state), and 1:1 drivers could remove this check.
M:N drivers would pay for a useless (to them) test_bit, and would
have to provide their own netif_running check to get termination
under heavy load.
Just an idea, perhaps there is a better way which is less hacky.
Or perhaps we should just leave things as is.
Drew
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 19:02 ` Andrew Gallatin
@ 2007-12-13 19:09 ` David Miller
2007-12-13 19:35 ` Stephen Hemminger
0 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2007-12-13 19:09 UTC (permalink / raw)
To: gallatin
Cc: shemminger, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
jgarzik, jesse.brandeburg
From: Andrew Gallatin <gallatin@myri.com>
Date: Thu, 13 Dec 2007 14:02:25 -0500
> Or perhaps we should just leave things as is.
We should probably add a "disabling" state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.
net_rx_action() can then check this.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 19:09 ` David Miller
@ 2007-12-13 19:35 ` Stephen Hemminger
2007-12-13 20:38 ` David Miller
0 siblings, 1 reply; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-13 19:35 UTC (permalink / raw)
To: David Miller
Cc: gallatin, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
jgarzik, jesse.brandeburg
David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Thu, 13 Dec 2007 14:02:25 -0500
>
>
>> Or perhaps we should just leave things as is.
>>
>
> We should probably add a "disabling" state bit to the
> napi struct flags, this will be set by napi_disable()
> before it loops trying to set the sched bit.
>
> net_rx_action() can then check this.
>
How about allowing a return value of -1 from napi_poll and letting device
check itself.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 19:35 ` Stephen Hemminger
@ 2007-12-13 20:38 ` David Miller
0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2007-12-13 20:38 UTC (permalink / raw)
To: shemminger
Cc: gallatin, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
jgarzik, jesse.brandeburg
From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Thu, 13 Dec 2007 11:35:07 -0800
> How about allowing a return value of -1 from napi_poll and letting
> device check itself.
It doesn't avoid the code duplication in the ->poll() fast paths.
I don't care, on the other hand, if crap accumulates in non-critical
slow paths like napi_disable() and dev_close(). That's why I'm
suggesting solutions in that area.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 14:13 ` Andrew Gallatin
2007-12-13 14:19 ` David Miller
@ 2007-12-14 2:06 ` Joonwoo Park
1 sibling, 0 replies; 41+ messages in thread
From: Joonwoo Park @ 2007-12-14 2:06 UTC (permalink / raw)
To: Andrew Gallatin
Cc: Kok, Auke, David Miller, netdev, linux-kernel, jgarzik,
shemminger, Jesse Brandeburg
2007/12/13, Andrew Gallatin <gallatin@myri.com>:
>
> If the netif_running() check is indeed required to make a device break
> out of napi polling and respond to an ifconfig down, then I think the
> netif_running() check should be moved up into net_rx_action() to avoid
> potential for driver complexity and bugs like the ones you found.
>
> Drew
>
Yep, It looks good.
Joonwoo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-12 18:41 ` Kok, Auke
2007-12-13 7:41 ` Joonwoo Park
@ 2007-12-13 13:49 ` Jarek Poplawski
2007-12-13 13:50 ` David Miller
1 sibling, 1 reply; 41+ messages in thread
From: Jarek Poplawski @ 2007-12-13 13:49 UTC (permalink / raw)
To: Kok, Auke
Cc: David Miller, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, Jesse Brandeburg
On 12-12-2007 19:41, Kok, Auke wrote:
> David Miller wrote:
>> From: Andrew Gallatin <gallatin@myri.com>
>> Date: Wed, 12 Dec 2007 12:29:23 -0500
>>
>>> Is the netif_running() check even required?
>> No, it is not.
>>
>> When a device is brought down, one of the first things
>> that happens is that we wait for all pending NAPI polls
>> to complete, then block any new polls from starting.
>
> I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
> has this check as well and that's exactly what is causing most of the
> net_rx_action oopses in the first place. Without the netif_running() check
> previously the drivers were just unusable with NAPI and prone to many races with
> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
> small packets at high numbers). that's why we added the netif_running() check in
> the first place :)
>
> There might be more drivers lurking that need this change...
>
As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
I wonder what is the main reason or gain of leaving such a tricky
exception, instead of letting drivers to always decide which is the
best moment for napi_complete()? (Or maybe even, in such a case, they
should call some function with this list_move_tail() if it's so
useful?)
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 13:49 ` Jarek Poplawski
@ 2007-12-13 13:50 ` David Miller
2007-12-13 14:14 ` Jarek Poplawski
2007-12-13 20:16 ` Jarek Poplawski
0 siblings, 2 replies; 41+ messages in thread
From: David Miller @ 2007-12-13 13:50 UTC (permalink / raw)
To: jarkao2
Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, jesse.brandeburg
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 13 Dec 2007 14:49:53 +0100
> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> I wonder what is the main reason or gain of leaving such a tricky
> exception, instead of letting drivers to always decide which is the
> best moment for napi_complete()? (Or maybe even, in such a case, they
> should call some function with this list_move_tail() if it's so
> useful?)
It is the only sane way to synchronize the list manipulations.
There has to be a way for ->poll() to tell net_rx_action() two things:
1) How much work was completed, so we can adjust 'budget'
2) Was the NAPI quota exhausted? So that we know that
net_rx_action() still "owns" the polling context and
thus can do the list manipulation safely.
And these both need to be encoded into one single return value, thus
the adopted convention that "work == weight" means that the device has
not done a NAPI complete.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 13:50 ` David Miller
@ 2007-12-13 14:14 ` Jarek Poplawski
2007-12-13 20:16 ` Jarek Poplawski
1 sibling, 0 replies; 41+ messages in thread
From: Jarek Poplawski @ 2007-12-13 14:14 UTC (permalink / raw)
To: David Miller
Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, jesse.brandeburg
On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
>
> > As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> > I wonder what is the main reason or gain of leaving such a tricky
> > exception, instead of letting drivers to always decide which is the
> > best moment for napi_complete()? (Or maybe even, in such a case, they
> > should call some function with this list_move_tail() if it's so
> > useful?)
>
> It is the only sane way to synchronize the list manipulations.
>
> There has to be a way for ->poll() to tell net_rx_action() two things:
>
> 1) How much work was completed, so we can adjust 'budget'
> 2) Was the NAPI quota exhausted? So that we know that
> net_rx_action() still "owns" the polling context and
> thus can do the list manipulation safely.
>
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.
Thanks! So, I've to rethink this all...
Jarek P.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 13:50 ` David Miller
2007-12-13 14:14 ` Jarek Poplawski
@ 2007-12-13 20:16 ` Jarek Poplawski
2007-12-13 20:37 ` David Miller
1 sibling, 1 reply; 41+ messages in thread
From: Jarek Poplawski @ 2007-12-13 20:16 UTC (permalink / raw)
To: David Miller
Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, jesse.brandeburg
David Miller wrote, On 12/13/2007 02:50 PM:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
>
>> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
>> I wonder what is the main reason or gain of leaving such a tricky
>> exception, instead of letting drivers to always decide which is the
>> best moment for napi_complete()? (Or maybe even, in such a case, they
>> should call some function with this list_move_tail() if it's so
>> useful?)
>
> It is the only sane way to synchronize the list manipulations.
>
> There has to be a way for ->poll() to tell net_rx_action() two things:
>
> 1) How much work was completed, so we can adjust 'budget'
The 'budget' line would stay where it is. IMHO, it's only about this
list_move_tail(). (Probably also doing netpoll_poll_unlock()
during n->poll() could be considered to let the driver even destroy
napi just after napi_complete() - but it's another subject.)
> 2) Was the NAPI quota exhausted? So that we know that
> net_rx_action() still "owns" the polling context and
> thus can do the list manipulation safely.
>
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.
Of course, with some care and explanations to driver maintainers, like in
this case, this all should probably work like it is. But IMHO it would be
easier to remember and maintain if there are some simple rules with no
exceptions, so here e.g. driver always "owns" (with functions like
napi_schedule(), napi_complete(), and maybe napi_move_tail()), and
net_rx_action() only reads the list and runs these functions?!
I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 20:16 ` Jarek Poplawski
@ 2007-12-13 20:37 ` David Miller
2007-12-13 20:41 ` Stephen Hemminger
2007-12-13 22:28 ` Jarek Poplawski
0 siblings, 2 replies; 41+ messages in thread
From: David Miller @ 2007-12-13 20:37 UTC (permalink / raw)
To: jarkao2
Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, jesse.brandeburg
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 13 Dec 2007 21:16:12 +0100
> I see in a nearby thread you would prefer to save some work to drivers
> (like this netif_running() check), but I think this all is at the cost
> of flexibility, and there will probably appear new problems, when a
> driver simply can't wait till the next poll (which btw. looks strange
> with all these hotplugging, usb and powersaving).
As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.
And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implmentation.
For example, if we export the list handling widget into the ->poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/
So we don't want to export datastructure details like that to the
driver.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 20:37 ` David Miller
@ 2007-12-13 20:41 ` Stephen Hemminger
2007-12-13 21:55 ` Jarek Poplawski
2007-12-13 22:28 ` Jarek Poplawski
1 sibling, 1 reply; 41+ messages in thread
From: Stephen Hemminger @ 2007-12-13 20:41 UTC (permalink / raw)
To: David Miller
Cc: jarkao2, auke-jan.h.kok, gallatin, joonwpark81, netdev,
linux-kernel, jgarzik, jesse.brandeburg
David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 21:16:12 +0100
>
>
>> I see in a nearby thread you would prefer to save some work to drivers
>> (like this netif_running() check), but I think this all is at the cost
>> of flexibility, and there will probably appear new problems, when a
>> driver simply can't wait till the next poll (which btw. looks strange
>> with all these hotplugging, usb and powersaving).
>>
>
> As someone who has actually had to edit the NAPI support of _EVERY_
> single driver in the tree I can tell you that code duplication and
> subtle semantic differences are a huge issue.
>
> And when you talk about driver flexibility, it's wise to mention that
> this comes at the expense of flexibility in the core implmentation.
> For example, if we export the list handling widget into the ->poll()
> routines, god help the person who wants to change how the poll list is
> managed in net_rx_action() :-/
>
> So we don't want to export datastructure details like that to the
> driver.
>
Also, most of the drivers should/could be doing the same thing. It is
seems that
driver writers just want to get "creative" and do things differently.
The code is
cleaner, safer, and less buggy if every device uses the interface in the
same way.
When I did the initial pass on this, I didn't see a single variation on
NAPI usage
that was better than the simple "get N packets and return" variation.
But Dave
did way more detailed grunt work on this.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 20:41 ` Stephen Hemminger
@ 2007-12-13 21:55 ` Jarek Poplawski
0 siblings, 0 replies; 41+ messages in thread
From: Jarek Poplawski @ 2007-12-13 21:55 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David Miller, auke-jan.h.kok, gallatin, joonwpark81, netdev,
linux-kernel, jgarzik, jesse.brandeburg
Stephen Hemminger wrote, On 12/13/2007 09:41 PM:
> David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Thu, 13 Dec 2007 21:16:12 +0100
>>
>>
>>> I see in a nearby thread you would prefer to save some work to drivers
>>> (like this netif_running() check), but I think this all is at the cost
>>> of flexibility, and there will probably appear new problems, when a
>>> driver simply can't wait till the next poll (which btw. looks strange
>>> with all these hotplugging, usb and powersaving).
>>>
>> As someone who has actually had to edit the NAPI support of _EVERY_
>> single driver in the tree I can tell you that code duplication and
>> subtle semantic differences are a huge issue.
>>
>> And when you talk about driver flexibility, it's wise to mention that
>> this comes at the expense of flexibility in the core implmentation.
>> For example, if we export the list handling widget into the ->poll()
>> routines, god help the person who wants to change how the poll list is
>> managed in net_rx_action() :-/
>>
>> So we don't want to export datastructure details like that to the
>> driver.
(I hope you both don't mind I save some 'paper' and do this
2 in 1...)
So, you've seen a few drivers, know this much better than me, and
maybe even thought why they all so unnecessarily different... Of
course, if you think that despite those differences they all can
work with simpler napi api then OK (until they don't have to do
any cheating, like with this 'work' here).
> Also, most of the drivers should/could be doing the same thing. It is
> seems that
> driver writers just want to get "creative" and do things differently.
> The code is
> cleaner, safer, and less buggy if every device uses the interface in the
> same way.
>
> When I did the initial pass on this, I didn't see a single variation on
> NAPI usage
> that was better than the simple "get N packets and return" variation.
> But Dave
> did way more detailed grunt work on this.
It seems there are some differences in thinking what is simple/complex.
I think drivers' developers are used to controlling their devices, so
they know better when to turn on/off interrupts. So, maybe similar model
could be appropriate here. Sometimes doing more looks simpler than doing
less and remembering how and when the rest will be done (like
this netif_running() test). But I hope I'm wrong here, and this will
work after all!
Cheers,
Jarek P.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 20:37 ` David Miller
2007-12-13 20:41 ` Stephen Hemminger
@ 2007-12-13 22:28 ` Jarek Poplawski
2007-12-13 22:34 ` David Miller
1 sibling, 1 reply; 41+ messages in thread
From: Jarek Poplawski @ 2007-12-13 22:28 UTC (permalink / raw)
To: David Miller
Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, jesse.brandeburg
David Miller wrote, On 12/13/2007 09:37 PM:
...
> For example, if we export the list handling widget into the ->poll()
> routines, god help the person who wants to change how the poll list is
> managed in net_rx_action() :-/
...I'm afraid I can't understand: I mean doing the same but without
passing this info with 'work == weight': if driver sends this info,
why it can't instead call something like napi_continue() with
this list_move_tail() (and probably additional local_irq_disable()/
enble() - but since it's unlikely()?) which looks much more readable,
and saves one whole unlikely if ()?
Jarek P.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 22:28 ` Jarek Poplawski
@ 2007-12-13 22:34 ` David Miller
2007-12-13 22:58 ` Jarek Poplawski
0 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2007-12-13 22:34 UTC (permalink / raw)
To: jarkao2
Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, jesse.brandeburg
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 13 Dec 2007 23:28:41 +0100
> ...I'm afraid I can't understand: I mean doing the same but without
> passing this info with 'work == weight': if driver sends this info,
> why it can't instead call something like napi_continue() with
> this list_move_tail() (and probably additional local_irq_disable()/
> enble() - but since it's unlikely()?) which looks much more readable,
> and saves one whole unlikely if ()?
Because the poll list is private to net_rx_action() and we don't
want to expose implementation details like that to every
->poll() implementation.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-13 22:34 ` David Miller
@ 2007-12-13 22:58 ` Jarek Poplawski
0 siblings, 0 replies; 41+ messages in thread
From: Jarek Poplawski @ 2007-12-13 22:58 UTC (permalink / raw)
To: David Miller
Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
jgarzik, shemminger, jesse.brandeburg
David Miller wrote, On 12/13/2007 11:34 PM:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 23:28:41 +0100
>
>> ...I'm afraid I can't understand: I mean doing the same but without
>> passing this info with 'work == weight': if driver sends this info,
>> why it can't instead call something like napi_continue() with
>> this list_move_tail() (and probably additional local_irq_disable()/
>> enble() - but since it's unlikely()?) which looks much more readable,
>> and saves one whole unlikely if ()?
>
> Because the poll list is private to net_rx_action() and we don't
> want to expose implementation details like that to every
> ->poll() implementation.
So, it seems 'we' failed e.g. exposing napi_complete()...
OK, no offense, I'll only mention at the end that there is
always a possibility to redefine such a function to {} with any
change of implementation.
Jarek P.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-12 17:38 ` David Miller
2007-12-12 17:40 ` Andrew Gallatin
2007-12-12 18:41 ` Kok, Auke
@ 2007-12-20 9:52 ` Robert Olsson
2007-12-20 11:22 ` David Miller
2 siblings, 1 reply; 41+ messages in thread
From: Robert Olsson @ 2007-12-20 9:52 UTC (permalink / raw)
To: David Miller
Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger
David Miller writes:
> > Is the netif_running() check even required?
>
> No, it is not.
>
> When a device is brought down, one of the first things
> that happens is that we wait for all pending NAPI polls
> to complete, then block any new polls from starting.
Hello!
Yes but the reason was not to wait for all pending polls to
complete so a server/router could be rebooted even under high-
load and DOS. We've experienced some nasty problems with this.
Cheers.
--ro
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [RFC] net: napi fix
2007-12-20 9:52 ` Robert Olsson
@ 2007-12-20 11:22 ` David Miller
0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2007-12-20 11:22 UTC (permalink / raw)
To: Robert.Olsson
Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger
From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Thu, 20 Dec 2007 10:52:17 +0100
> Yes but the reason was not to wait for all pending polls to
> complete so a server/router could be rebooted even under high-
> load and DOS. We've experienced some nasty problems with this.
I know, see the rest of the thread where I agree that
we need to deal with this somehow.
The device is marked down first, and somehow we need to
tip off of that to break out of the NAPI loop. This
"how" is what hasn't been resolved yet.
^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2007-12-20 11:22 UTC | newest]
Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-12 4:01 [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
2007-12-12 5:39 ` Stephen Hemminger
2007-12-12 5:46 ` [RFC] net: napi fix Stephen Hemminger
2007-12-12 6:05 ` Joonwoo Park
2007-12-12 15:22 ` David Miller
2007-12-12 15:21 ` David Miller
2007-12-12 5:48 ` [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
2007-12-12 5:53 ` Stephen Hemminger
2007-12-12 15:20 ` [PATCH 6/7] [NETDEV]: " David Miller
2007-12-12 16:36 ` Stephen Hemminger
2007-12-12 15:18 ` [PATCH 6/7] : " David Miller
2007-12-12 21:58 ` [RFT] tehuti: napi fix Stephen Hemminger
2007-12-16 21:38 ` David Miller
2007-12-17 20:18 ` Stephen Hemminger
2007-12-13 7:07 ` [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
-- strict thread matches above, loose matches on Subject: below --
2007-12-12 17:29 [RFC] net: napi fix Andrew Gallatin
2007-12-12 17:38 ` David Miller
2007-12-12 17:40 ` Andrew Gallatin
2007-12-12 18:41 ` Kok, Auke
2007-12-13 7:41 ` Joonwoo Park
2007-12-13 14:13 ` Andrew Gallatin
2007-12-13 14:19 ` David Miller
2007-12-13 16:45 ` Kok, Auke
2007-12-13 18:22 ` Stephen Hemminger
2007-12-13 19:02 ` Andrew Gallatin
2007-12-13 19:09 ` David Miller
2007-12-13 19:35 ` Stephen Hemminger
2007-12-13 20:38 ` David Miller
2007-12-14 2:06 ` Joonwoo Park
2007-12-13 13:49 ` Jarek Poplawski
2007-12-13 13:50 ` David Miller
2007-12-13 14:14 ` Jarek Poplawski
2007-12-13 20:16 ` Jarek Poplawski
2007-12-13 20:37 ` David Miller
2007-12-13 20:41 ` Stephen Hemminger
2007-12-13 21:55 ` Jarek Poplawski
2007-12-13 22:28 ` Jarek Poplawski
2007-12-13 22:34 ` David Miller
2007-12-13 22:58 ` Jarek Poplawski
2007-12-20 9:52 ` Robert Olsson
2007-12-20 11:22 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox