Re: [RFC] net: napi fix

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [RFC] net: napi fix
@ 2007-12-12 17:29 Andrew Gallatin
  2007-12-12 17:38 ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Gallatin @ 2007-12-12 17:29 UTC (permalink / raw)
  To: David Miller
  Cc: joonwpark81, netdev, linux-kernel, jgarzik, Stephen Hemminger

[I apologize for loosing threading, I'm replying from the archives]

 > The problem is that the driver is doing a NAPI completion and
 > re-enabling chip interrupts with work_done == weight, and that is
 > illegal.

The only time at least myri10ge will do this is due to
the !netif_running(netdev) check.   Eg, from myri10ge's poll:

	work_done = myri10ge_clean_rx_done(mgp, budget);

	if (work_done < budget || !netif_running(netdev)) {
		netif_rx_complete(netdev, napi);
		put_be32(htonl(3), mgp->irq_claim);
	}

Is the netif_running() check even required? Is this just
a bad way to solve a race with running NAPI at down() time
that would be better solved by putting a napi_synchronize()
in the driver's down() routine?

I'd rather fix this right than add another check to a
questionable code path.

Thanks,

Drew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12 17:29 [RFC] net: napi fix Andrew Gallatin
@ 2007-12-12 17:38 ` David Miller
  2007-12-12 17:40   ` Andrew Gallatin
                     ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: David Miller @ 2007-12-12 17:38 UTC (permalink / raw)
  To: gallatin; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, shemminger

From: Andrew Gallatin <gallatin@myri.com>
Date: Wed, 12 Dec 2007 12:29:23 -0500

> Is the netif_running() check even required?

No, it is not.

When a device is brought down, one of the first things
that happens is that we wait for all pending NAPI polls
to complete, then block any new polls from starting.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12 17:38 ` David Miller
@ 2007-12-12 17:40   ` Andrew Gallatin
  2007-12-12 18:41   ` Kok, Auke
  2007-12-20  9:52   ` Robert Olsson
  2 siblings, 0 replies; 30+ messages in thread
From: Andrew Gallatin @ 2007-12-12 17:40 UTC (permalink / raw)
  To: David Miller; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, shemminger

David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Wed, 12 Dec 2007 12:29:23 -0500
> 
>> Is the netif_running() check even required?
> 
> No, it is not.
> 
> When a device is brought down, one of the first things
> that happens is that we wait for all pending NAPI polls
> to complete, then block any new polls from starting.

Great, thanks.  I will submit a patch to remove the bogus
check.  This should fix myri10ge properly.


Thank you,

Drew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12 17:38 ` David Miller
  2007-12-12 17:40   ` Andrew Gallatin
@ 2007-12-12 18:41   ` Kok, Auke
  2007-12-13  7:41     ` Joonwoo Park
  2007-12-13 13:49     ` Jarek Poplawski
  2007-12-20  9:52   ` Robert Olsson
  2 siblings, 2 replies; 30+ messages in thread
From: Kok, Auke @ 2007-12-12 18:41 UTC (permalink / raw)
  To: David Miller
  Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger,
	Jesse Brandeburg

David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Wed, 12 Dec 2007 12:29:23 -0500
> 
>> Is the netif_running() check even required?
> 
> No, it is not.
> 
> When a device is brought down, one of the first things
> that happens is that we wait for all pending NAPI polls
> to complete, then block any new polls from starting.

I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
has this check as well and that's exactly what is causing most of the
net_rx_action oopses in the first place. Without the netif_running() check
previously the drivers were just unusable with NAPI and prone to many races with
down (i.e. touching some ethtool ioctl which wants to do a reset while routing
small packets at high numbers). that's why we added the netif_running() check in
the first place :)

There might be more drivers lurking that need this change...

Auke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12 18:41   ` Kok, Auke
@ 2007-12-13  7:41     ` Joonwoo Park
  2007-12-13 14:13       ` Andrew Gallatin
  2007-12-13 13:49     ` Jarek Poplawski
  1 sibling, 1 reply; 30+ messages in thread
From: Joonwoo Park @ 2007-12-13  7:41 UTC (permalink / raw)
  To: Kok, Auke
  Cc: David Miller, gallatin, netdev, linux-kernel, jgarzik, shemminger,
	Jesse Brandeburg

2007/12/13, Kok, Auke <auke-jan.h.kok@intel.com>:
> David Miller wrote:
> > From: Andrew Gallatin <gallatin@myri.com>
> > Date: Wed, 12 Dec 2007 12:29:23 -0500
> >
> >> Is the netif_running() check even required?
> >
> > No, it is not.
> >
> > When a device is brought down, one of the first things
> > that happens is that we wait for all pending NAPI polls
> > to complete, then block any new polls from starting.
>
> I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
> has this check as well and that's exactly what is causing most of the
> net_rx_action oopses in the first place. Without the netif_running() check
> previously the drivers were just unusable with NAPI and prone to many races with
> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
> small packets at high numbers). that's why we added the netif_running() check in
> the first place :)
>
> There might be more drivers lurking that need this change...
>
> Auke
>

Also in my case, without netif_running() check, I cannot do ifconfig down.
It stucked if packet generator was sending packets.

Thanks
Joonwoo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13  7:41     ` Joonwoo Park
@ 2007-12-13 14:13       ` Andrew Gallatin
  2007-12-13 14:19         ` David Miller
  2007-12-14  2:06         ` Joonwoo Park
  0 siblings, 2 replies; 30+ messages in thread
From: Andrew Gallatin @ 2007-12-13 14:13 UTC (permalink / raw)
  To: Joonwoo Park
  Cc: Kok, Auke, David Miller, netdev, linux-kernel, jgarzik,
	shemminger, Jesse Brandeburg

Joonwoo Park wrote:
 > 2007/12/13, Kok, Auke <auke-jan.h.kok@intel.com>:
 >> David Miller wrote:
 >>> From: Andrew Gallatin <gallatin@myri.com>
 >>> Date: Wed, 12 Dec 2007 12:29:23 -0500
 >>>
 >>>> Is the netif_running() check even required?
 >>> No, it is not.
 >>>
 >>> When a device is brought down, one of the first things
 >>> that happens is that we wait for all pending NAPI polls
 >>> to complete, then block any new polls from starting.
 >> I think this was previously (pre-2.6.24) not the case, which is why 
e1000 et al
 >> has this check as well and that's exactly what is causing most of the
 >> net_rx_action oopses in the first place. Without the netif_running() 
check
 >> previously the drivers were just unusable with NAPI and prone to 
many races with
 >> down (i.e. touching some ethtool ioctl which wants to do a reset 
while routing
 >> small packets at high numbers). that's why we added the 
netif_running() check in
 >> the first place :)
 >>
 >> There might be more drivers lurking that need this change...
 >>
 >> Auke
 >>
 >
 > Also in my case, without netif_running() check, I cannot do ifconfig 
down.
 > It stucked if packet generator was sending packets.

If the netif_running() check is indeed required to make a device break
out of napi polling and respond to an ifconfig down, then I think the
netif_running() check should be moved up into net_rx_action() to avoid
potential for driver complexity and bugs like the ones you found.

Drew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 14:13       ` Andrew Gallatin
@ 2007-12-13 14:19         ` David Miller
  2007-12-13 16:45           ` Kok, Auke
  2007-12-13 18:22           ` Stephen Hemminger
  2007-12-14  2:06         ` Joonwoo Park
  1 sibling, 2 replies; 30+ messages in thread
From: David Miller @ 2007-12-13 14:19 UTC (permalink / raw)
  To: gallatin
  Cc: joonwpark81, auke-jan.h.kok, netdev, linux-kernel, jgarzik,
	shemminger, jesse.brandeburg

From: Andrew Gallatin <gallatin@myri.com>
Date: Thu, 13 Dec 2007 09:13:54 -0500

> If the netif_running() check is indeed required to make a device break
> out of napi polling and respond to an ifconfig down, then I think the
> netif_running() check should be moved up into net_rx_action() to avoid
> potential for driver complexity and bugs like the ones you found.

That, or something like it, definitely sounds reasonable and much
better than putting the check into every driver :-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 14:19         ` David Miller
@ 2007-12-13 16:45           ` Kok, Auke
  2007-12-13 18:22           ` Stephen Hemminger
  1 sibling, 0 replies; 30+ messages in thread
From: Kok, Auke @ 2007-12-13 16:45 UTC (permalink / raw)
  To: David Miller
  Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger,
	jesse.brandeburg

David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
> 
>> If the netif_running() check is indeed required to make a device break
>> out of napi polling and respond to an ifconfig down, then I think the
>> netif_running() check should be moved up into net_rx_action() to avoid
>> potential for driver complexity and bugs like the ones you found.
> 
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)

hear hear!

Auke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 14:19         ` David Miller
  2007-12-13 16:45           ` Kok, Auke
@ 2007-12-13 18:22           ` Stephen Hemminger
  2007-12-13 19:02             ` Andrew Gallatin
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2007-12-13 18:22 UTC (permalink / raw)
  To: David Miller
  Cc: gallatin, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
	jgarzik, jesse.brandeburg

On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Gallatin <gallatin@myri.com>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
> 
> > If the netif_running() check is indeed required to make a device break
> > out of napi polling and respond to an ifconfig down, then I think the
> > netif_running() check should be moved up into net_rx_action() to avoid
> > potential for driver complexity and bugs like the ones you found.
> 
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is not possible to do netif_running() check in generic code as currently
written because of the case of devices where a single NAPI object is
being used to handle two devices. The association between napi and netdevice
is M to N.  There are cases like niu that have multiple NAPI's and one
netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.

The existing pointer from napi to netdevice is only used by netconsole
now. For devices like sky2 it means that netconsole can't work on the the
second port which is a not a big problem. But adding a netif_running()
check would be a big issue.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 18:22           ` Stephen Hemminger
@ 2007-12-13 19:02             ` Andrew Gallatin
  2007-12-13 19:09               ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Gallatin @ 2007-12-13 19:02 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
	jgarzik, jesse.brandeburg

Stephen Hemminger wrote:
> On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Andrew Gallatin <gallatin@myri.com>
>> Date: Thu, 13 Dec 2007 09:13:54 -0500
>>
>>> If the netif_running() check is indeed required to make a device break
>>> out of napi polling and respond to an ifconfig down, then I think the
>>> netif_running() check should be moved up into net_rx_action() to avoid
>>> potential for driver complexity and bugs like the ones you found.
>> That, or something like it, definitely sounds reasonable and much
>> better than putting the check into every driver :-)
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> It is not possible to do netif_running() check in generic code as currently
> written because of the case of devices where a single NAPI object is
> being used to handle two devices. The association between napi and netdevice
> is M to N.  There are cases like niu that have multiple NAPI's and one
> netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.

Ah, now I see.  I forgot that not every device has a 1:1::napi:netdev
relationship.

Could we make an optional *dev_state field in the napi structure.
It would be initialized to __LINK_STATE_START.  Devices which have
a 1:1 NAPI:netdevice relationship would set it to &netdev->state.
The generic code would then do a test_bit(__LINK_STATE_START, 
napi->dev_state), and 1:1 drivers could remove this check.
M:N drivers would pay for a useless (to them) test_bit, and would
have to provide their own netif_running check to get termination
under heavy load.

Just an idea, perhaps there is a better way which is less hacky.

Or perhaps we should just leave things as is.

Drew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 19:02             ` Andrew Gallatin
@ 2007-12-13 19:09               ` David Miller
  2007-12-13 19:35                 ` Stephen Hemminger
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2007-12-13 19:09 UTC (permalink / raw)
  To: gallatin
  Cc: shemminger, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
	jgarzik, jesse.brandeburg

From: Andrew Gallatin <gallatin@myri.com>
Date: Thu, 13 Dec 2007 14:02:25 -0500

> Or perhaps we should just leave things as is.

We should probably add a "disabling" state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.

net_rx_action() can then check this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 19:09               ` David Miller
@ 2007-12-13 19:35                 ` Stephen Hemminger
  2007-12-13 20:38                   ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2007-12-13 19:35 UTC (permalink / raw)
  To: David Miller
  Cc: gallatin, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
	jgarzik, jesse.brandeburg

David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Thu, 13 Dec 2007 14:02:25 -0500
>
>   
>> Or perhaps we should just leave things as is.
>>     
>
> We should probably add a "disabling" state bit to the
> napi struct flags, this will be set by napi_disable()
> before it loops trying to set the sched bit.
>
> net_rx_action() can then check this.
>   
How about allowing a return value of -1 from napi_poll and letting device
check itself.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 19:35                 ` Stephen Hemminger
@ 2007-12-13 20:38                   ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2007-12-13 20:38 UTC (permalink / raw)
  To: shemminger
  Cc: gallatin, joonwpark81, auke-jan.h.kok, netdev, linux-kernel,
	jgarzik, jesse.brandeburg

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Thu, 13 Dec 2007 11:35:07 -0800

> How about allowing a return value of -1 from napi_poll and letting
> device check itself.

It doesn't avoid the code duplication in the ->poll() fast paths.

I don't care, on the other hand, if crap accumulates in non-critical
slow paths like napi_disable() and dev_close().  That's why I'm
suggesting solutions in that area.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 14:13       ` Andrew Gallatin
  2007-12-13 14:19         ` David Miller
@ 2007-12-14  2:06         ` Joonwoo Park
  1 sibling, 0 replies; 30+ messages in thread
From: Joonwoo Park @ 2007-12-14  2:06 UTC (permalink / raw)
  To: Andrew Gallatin
  Cc: Kok, Auke, David Miller, netdev, linux-kernel, jgarzik,
	shemminger, Jesse Brandeburg

2007/12/13, Andrew Gallatin <gallatin@myri.com>:
>
> If the netif_running() check is indeed required to make a device break
> out of napi polling and respond to an ifconfig down, then I think the
> netif_running() check should be moved up into net_rx_action() to avoid
> potential for driver complexity and bugs like the ones you found.
>
> Drew
>

Yep, It looks good.

Joonwoo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12 18:41   ` Kok, Auke
  2007-12-13  7:41     ` Joonwoo Park
@ 2007-12-13 13:49     ` Jarek Poplawski
  2007-12-13 13:50       ` David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Jarek Poplawski @ 2007-12-13 13:49 UTC (permalink / raw)
  To: Kok, Auke
  Cc: David Miller, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, Jesse Brandeburg

On 12-12-2007 19:41, Kok, Auke wrote:
> David Miller wrote:
>> From: Andrew Gallatin <gallatin@myri.com>
>> Date: Wed, 12 Dec 2007 12:29:23 -0500
>>
>>> Is the netif_running() check even required?
>> No, it is not.
>>
>> When a device is brought down, one of the first things
>> that happens is that we wait for all pending NAPI polls
>> to complete, then block any new polls from starting.
> 
> I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
> has this check as well and that's exactly what is causing most of the
> net_rx_action oopses in the first place. Without the netif_running() check
> previously the drivers were just unusable with NAPI and prone to many races with
> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
> small packets at high numbers). that's why we added the netif_running() check in
> the first place :)
> 
> There might be more drivers lurking that need this change...
> 

As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
I wonder what is the main reason or gain of leaving such a tricky
exception, instead of letting drivers to always decide which is the
best moment for napi_complete()? (Or maybe even, in such a case, they
should call some function with this list_move_tail() if it's so
useful?)

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 13:49     ` Jarek Poplawski
@ 2007-12-13 13:50       ` David Miller
  2007-12-13 14:14         ` Jarek Poplawski
  2007-12-13 20:16         ` Jarek Poplawski
  0 siblings, 2 replies; 30+ messages in thread
From: David Miller @ 2007-12-13 13:50 UTC (permalink / raw)
  To: jarkao2
  Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, jesse.brandeburg

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 13 Dec 2007 14:49:53 +0100

> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> I wonder what is the main reason or gain of leaving such a tricky
> exception, instead of letting drivers to always decide which is the
> best moment for napi_complete()? (Or maybe even, in such a case, they
> should call some function with this list_move_tail() if it's so
> useful?)

It is the only sane way to synchronize the list manipulations.

There has to be a way for ->poll() to tell net_rx_action() two things:

1) How much work was completed, so we can adjust 'budget'
2) Was the NAPI quota exhausted?  So that we know that
   net_rx_action() still "owns" the polling context and
   thus can do the list manipulation safely.

And these both need to be encoded into one single return value, thus
the adopted convention that "work == weight" means that the device has
not done a NAPI complete.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 13:50       ` David Miller
@ 2007-12-13 14:14         ` Jarek Poplawski
  2007-12-13 20:16         ` Jarek Poplawski
  1 sibling, 0 replies; 30+ messages in thread
From: Jarek Poplawski @ 2007-12-13 14:14 UTC (permalink / raw)
  To: David Miller
  Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, jesse.brandeburg

On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
> 
> > As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> > I wonder what is the main reason or gain of leaving such a tricky
> > exception, instead of letting drivers to always decide which is the
> > best moment for napi_complete()? (Or maybe even, in such a case, they
> > should call some function with this list_move_tail() if it's so
> > useful?)
> 
> It is the only sane way to synchronize the list manipulations.
> 
> There has to be a way for ->poll() to tell net_rx_action() two things:
> 
> 1) How much work was completed, so we can adjust 'budget'
> 2) Was the NAPI quota exhausted?  So that we know that
>    net_rx_action() still "owns" the polling context and
>    thus can do the list manipulation safely.
> 
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.

Thanks! So, I've to rethink this all...

Jarek P.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 13:50       ` David Miller
  2007-12-13 14:14         ` Jarek Poplawski
@ 2007-12-13 20:16         ` Jarek Poplawski
  2007-12-13 20:37           ` David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Jarek Poplawski @ 2007-12-13 20:16 UTC (permalink / raw)
  To: David Miller
  Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, jesse.brandeburg

David Miller wrote, On 12/13/2007 02:50 PM:

> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
> 
>> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
>> I wonder what is the main reason or gain of leaving such a tricky
>> exception, instead of letting drivers to always decide which is the
>> best moment for napi_complete()? (Or maybe even, in such a case, they
>> should call some function with this list_move_tail() if it's so
>> useful?)
> 
> It is the only sane way to synchronize the list manipulations.
> 
> There has to be a way for ->poll() to tell net_rx_action() two things:
> 
> 1) How much work was completed, so we can adjust 'budget'

The 'budget' line would stay where it is. IMHO, it's only about this
list_move_tail(). (Probably also doing netpoll_poll_unlock()
during n->poll() could be considered to let the driver even destroy
napi just after napi_complete() - but it's another subject.)

> 2) Was the NAPI quota exhausted?  So that we know that
>    net_rx_action() still "owns" the polling context and
>    thus can do the list manipulation safely.
> 
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.

Of course, with some care and explanations to driver maintainers, like in
this case, this all should probably work like it is. But IMHO it would be
easier to remember and maintain if there are some simple rules with no
exceptions, so here e.g. driver always "owns" (with functions like
napi_schedule(), napi_complete(), and maybe napi_move_tail()), and
net_rx_action() only reads the list and runs these functions?!

I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 20:16         ` Jarek Poplawski
@ 2007-12-13 20:37           ` David Miller
  2007-12-13 20:41             ` Stephen Hemminger
  2007-12-13 22:28             ` Jarek Poplawski
  0 siblings, 2 replies; 30+ messages in thread
From: David Miller @ 2007-12-13 20:37 UTC (permalink / raw)
  To: jarkao2
  Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, jesse.brandeburg

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 13 Dec 2007 21:16:12 +0100

> I see in a nearby thread you would prefer to save some work to drivers
> (like this netif_running() check), but I think this all is at the cost
> of flexibility, and there will probably appear new problems, when a
> driver simply can't wait till the next poll (which btw. looks strange
> with all these hotplugging, usb and powersaving).

As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.

And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implmentation.
For example, if we export the list handling widget into the ->poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/

So we don't want to export datastructure details like that to the
driver.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 20:37           ` David Miller
@ 2007-12-13 20:41             ` Stephen Hemminger
  2007-12-13 21:55               ` Jarek Poplawski
  2007-12-13 22:28             ` Jarek Poplawski
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2007-12-13 20:41 UTC (permalink / raw)
  To: David Miller
  Cc: jarkao2, auke-jan.h.kok, gallatin, joonwpark81, netdev,
	linux-kernel, jgarzik, jesse.brandeburg

David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 21:16:12 +0100
>
>   
>> I see in a nearby thread you would prefer to save some work to drivers
>> (like this netif_running() check), but I think this all is at the cost
>> of flexibility, and there will probably appear new problems, when a
>> driver simply can't wait till the next poll (which btw. looks strange
>> with all these hotplugging, usb and powersaving).
>>     
>
> As someone who has actually had to edit the NAPI support of _EVERY_
> single driver in the tree I can tell you that code duplication and
> subtle semantic differences are a huge issue.
>
> And when you talk about driver flexibility, it's wise to mention that
> this comes at the expense of flexibility in the core implmentation.
> For example, if we export the list handling widget into the ->poll()
> routines, god help the person who wants to change how the poll list is
> managed in net_rx_action() :-/
>
> So we don't want to export datastructure details like that to the
> driver.
>   
Also, most of the drivers should/could be doing the same thing. It is 
seems that
driver writers just want to get "creative" and do things differently. 
The code is
cleaner, safer, and less buggy if every device uses the interface in the 
same way.

When I did the initial pass on this, I didn't see a single variation on 
NAPI usage
that was better than the simple "get N packets and return" variation.  
But Dave
did way more detailed grunt work on this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 20:41             ` Stephen Hemminger
@ 2007-12-13 21:55               ` Jarek Poplawski
  0 siblings, 0 replies; 30+ messages in thread
From: Jarek Poplawski @ 2007-12-13 21:55 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, auke-jan.h.kok, gallatin, joonwpark81, netdev,
	linux-kernel, jgarzik, jesse.brandeburg

Stephen Hemminger wrote, On 12/13/2007 09:41 PM:

> David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Thu, 13 Dec 2007 21:16:12 +0100
>>
>>   
>>> I see in a nearby thread you would prefer to save some work to drivers
>>> (like this netif_running() check), but I think this all is at the cost
>>> of flexibility, and there will probably appear new problems, when a
>>> driver simply can't wait till the next poll (which btw. looks strange
>>> with all these hotplugging, usb and powersaving).
>>>     
>> As someone who has actually had to edit the NAPI support of _EVERY_
>> single driver in the tree I can tell you that code duplication and
>> subtle semantic differences are a huge issue.
>>
>> And when you talk about driver flexibility, it's wise to mention that
>> this comes at the expense of flexibility in the core implmentation.
>> For example, if we export the list handling widget into the ->poll()
>> routines, god help the person who wants to change how the poll list is
>> managed in net_rx_action() :-/
>>
>> So we don't want to export datastructure details like that to the
>> driver.
   

(I hope you both don't mind I save some 'paper' and do this
2 in 1...)

So, you've seen a few drivers, know this much better than me, and
maybe even thought why they all so unnecessarily different... Of
course, if you think that despite those differences they all can
work with simpler napi api then OK (until they don't have to do
any cheating, like with this 'work' here).

> Also, most of the drivers should/could be doing the same thing. It is 
> seems that
> driver writers just want to get "creative" and do things differently. 
> The code is
> cleaner, safer, and less buggy if every device uses the interface in the 
> same way.
> 
> When I did the initial pass on this, I didn't see a single variation on 
> NAPI usage
> that was better than the simple "get N packets and return" variation.  
> But Dave
> did way more detailed grunt work on this.

It seems there are some differences in thinking what is simple/complex.
I think drivers' developers are used to controlling their devices, so
they know better when to turn on/off interrupts. So, maybe similar model
could be appropriate here. Sometimes doing more looks simpler than doing
less and remembering how and when the rest will be done (like
this netif_running() test). But I hope I'm wrong here, and this will
work after all!

Cheers,
Jarek P.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 20:37           ` David Miller
  2007-12-13 20:41             ` Stephen Hemminger
@ 2007-12-13 22:28             ` Jarek Poplawski
  2007-12-13 22:34               ` David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Jarek Poplawski @ 2007-12-13 22:28 UTC (permalink / raw)
  To: David Miller
  Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, jesse.brandeburg

David Miller wrote, On 12/13/2007 09:37 PM:
...

> For example, if we export the list handling widget into the ->poll()
> routines, god help the person who wants to change how the poll list is
> managed in net_rx_action() :-/

...I'm afraid I can't understand: I mean doing the same but without
passing this info with 'work == weight': if driver sends this info,
why it can't instead call something like napi_continue() with
this list_move_tail() (and probably additional local_irq_disable()/
enble() - but since it's unlikely()?) which looks much more readable,
and saves one whole unlikely if ()?

Jarek P.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 22:28             ` Jarek Poplawski
@ 2007-12-13 22:34               ` David Miller
  2007-12-13 22:58                 ` Jarek Poplawski
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2007-12-13 22:34 UTC (permalink / raw)
  To: jarkao2
  Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, jesse.brandeburg

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 13 Dec 2007 23:28:41 +0100

> ...I'm afraid I can't understand: I mean doing the same but without
> passing this info with 'work == weight': if driver sends this info,
> why it can't instead call something like napi_continue() with
> this list_move_tail() (and probably additional local_irq_disable()/
> enble() - but since it's unlikely()?) which looks much more readable,
> and saves one whole unlikely if ()?

Because the poll list is private to net_rx_action() and we don't
want to expose implementation details like that to every
->poll() implementation.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-13 22:34               ` David Miller
@ 2007-12-13 22:58                 ` Jarek Poplawski
  0 siblings, 0 replies; 30+ messages in thread
From: Jarek Poplawski @ 2007-12-13 22:58 UTC (permalink / raw)
  To: David Miller
  Cc: auke-jan.h.kok, gallatin, joonwpark81, netdev, linux-kernel,
	jgarzik, shemminger, jesse.brandeburg

David Miller wrote, On 12/13/2007 11:34 PM:

> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 13 Dec 2007 23:28:41 +0100
> 
>> ...I'm afraid I can't understand: I mean doing the same but without
>> passing this info with 'work == weight': if driver sends this info,
>> why it can't instead call something like napi_continue() with
>> this list_move_tail() (and probably additional local_irq_disable()/
>> enble() - but since it's unlikely()?) which looks much more readable,
>> and saves one whole unlikely if ()?
> 
> Because the poll list is private to net_rx_action() and we don't
> want to expose implementation details like that to every
> ->poll() implementation.

So, it seems 'we' failed e.g. exposing napi_complete()...
OK, no offense, I'll only mention at the end that there is
always a possibility to redefine such a function to {} with any
change of implementation.

Jarek P.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12 17:38 ` David Miller
  2007-12-12 17:40   ` Andrew Gallatin
  2007-12-12 18:41   ` Kok, Auke
@ 2007-12-20  9:52   ` Robert Olsson
  2007-12-20 11:22     ` David Miller
  2 siblings, 1 reply; 30+ messages in thread
From: Robert Olsson @ 2007-12-20  9:52 UTC (permalink / raw)
  To: David Miller
  Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger


David Miller writes:

 > > Is the netif_running() check even required?
 > 
 > No, it is not.
 > 
 > When a device is brought down, one of the first things
 > that happens is that we wait for all pending NAPI polls
 > to complete, then block any new polls from starting.

 Hello!

 Yes but the reason was not to wait for all pending polls to
 complete so a server/router could be rebooted even under high-
 load and DOS. We've experienced some nasty problems with this.

 Cheers.
					--ro

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-20  9:52   ` Robert Olsson
@ 2007-12-20 11:22     ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2007-12-20 11:22 UTC (permalink / raw)
  To: Robert.Olsson
  Cc: gallatin, joonwpark81, netdev, linux-kernel, jgarzik, shemminger

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Thu, 20 Dec 2007 10:52:17 +0100

>  Yes but the reason was not to wait for all pending polls to
>  complete so a server/router could be rebooted even under high-
>  load and DOS. We've experienced some nasty problems with this.

I know, see the rest of the thread where I agree that
we need to deal with this somehow.

The device is marked down first, and somehow we need to
tip off of that to break out of the NAPI loop.  This
"how" is what hasn't been resolved yet.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
@ 2007-12-12  4:01 Joonwoo Park
  2007-12-12  5:39 ` Stephen Hemminger
  0 siblings, 1 reply; 30+ messages in thread
From: Joonwoo Park @ 2007-12-12  4:01 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: jgarzik, baum, andy

[NETDEV]: tehuti Fix possible causing oops of net_rx_action

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
---
 drivers/net/tehuti.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
index 21230c9..955e749 100644
--- a/drivers/net/tehuti.c
+++ b/drivers/net/tehuti.c
@@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
 
 		netif_rx_complete(dev, napi);
 		bdx_enable_interrupts(priv);
+		if (unlikely(work_done == napi->weight))
+			return work_done - 1;
 	}
 	return work_done;
 }
---


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action
  2007-12-12  4:01 [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
@ 2007-12-12  5:39 ` Stephen Hemminger
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
  0 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2007-12-12  5:39 UTC (permalink / raw)
  To: Joonwoo Park; +Cc: netdev, linux-kernel, jgarzik, baum, andy

On Wed, 12 Dec 2007 13:01:27 +0900
"Joonwoo Park" <joonwpark81@gmail.com> wrote:

> [NETDEV]: tehuti Fix possible causing oops of net_rx_action
> 
> Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
> ---
>  drivers/net/tehuti.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
> index 21230c9..955e749 100644
> --- a/drivers/net/tehuti.c
> +++ b/drivers/net/tehuti.c
> @@ -305,6 +305,8 @@ static int bdx_poll(struct napi_struct *napi, int budget)
>  
>  		netif_rx_complete(dev, napi);
>  		bdx_enable_interrupts(priv);
> +		if (unlikely(work_done == napi->weight))
> +			return work_done - 1;
>  	}
>  	return work_done;
>  }

A better fix would be not going over budget in the first place.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC] net: napi fix
  2007-12-12  5:39 ` Stephen Hemminger
@ 2007-12-12  5:46   ` Stephen Hemminger
  2007-12-12  6:05     ` Joonwoo Park
  2007-12-12 15:21     ` David Miller
  0 siblings, 2 replies; 30+ messages in thread
From: Stephen Hemminger @ 2007-12-12  5:46 UTC (permalink / raw)
  To: David S. Miller; +Cc: Joonwoo Park, netdev, linux-kernel, jgarzik, baum, andy

Isn't this a better fix for all drivers, rather than peppering every
driver with the special case. This is how the logic worked up until
2.6.24.


--- a/net/core/dev.c	2007-12-11 12:16:20.000000000 -0800
+++ b/net/core/dev.c	2007-12-11 21:43:39.000000000 -0800
@@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
 
 		have = netpoll_poll_lock(n);
 
-		weight = n->weight;
+		weight = min(n->weight, budget);
 
 		/* This NAPI_STATE_SCHED test is for avoiding a race
 		 * with netpoll's poll_napi().  Only the entity which

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
@ 2007-12-12  6:05     ` Joonwoo Park
  2007-12-12 15:22       ` David Miller
  2007-12-12 15:21     ` David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Joonwoo Park @ 2007-12-12  6:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, netdev, linux-kernel, jgarzik, baum, andy

2007/12/12, Stephen Hemminger <shemminger@linux-foundation.org>:
> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.
>
>
> --- a/net/core/dev.c    2007-12-11 12:16:20.000000000 -0800
> +++ b/net/core/dev.c    2007-12-11 21:43:39.000000000 -0800
> @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
>
>                have = netpoll_poll_lock(n);
>
> -               weight = n->weight;
> +               weight = min(n->weight, budget);
>
>                /* This NAPI_STATE_SCHED test is for avoiding a race
>                 * with netpoll's poll_napi().  Only the entity which
>

Stephen,
Could you explain how it fix the problem?
IMHO I think your patch cannot solve the problem.
The drivers can call netif_rx_complete and net_rx_action can do
list_move_tail also.
Am I missing something?

Thanks
Joonwoo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12  6:05     ` Joonwoo Park
@ 2007-12-12 15:22       ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2007-12-12 15:22 UTC (permalink / raw)
  To: joonwpark81; +Cc: shemminger, netdev, linux-kernel, jgarzik, baum, andy

From: "Joonwoo Park" <joonwpark81@gmail.com>
Date: Wed, 12 Dec 2007 15:05:26 +0900

> Could you explain how it fix the problem?
> IMHO I think your patch cannot solve the problem.
> The drivers can call netif_rx_complete and net_rx_action can do
> list_move_tail also.

Stephen is confused about what the bug is in these drivers,
that's all.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC] net: napi fix
  2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
  2007-12-12  6:05     ` Joonwoo Park
@ 2007-12-12 15:21     ` David Miller
  1 sibling, 0 replies; 30+ messages in thread
From: David Miller @ 2007-12-12 15:21 UTC (permalink / raw)
  To: shemminger; +Cc: joonwpark81, netdev, linux-kernel, jgarzik, baum, andy

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Tue, 11 Dec 2007 21:46:34 -0800

> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.

Stephen this is not the problem.

The problem is that the driver is doing a NAPI completion and
re-enabling chip interrupts with work_done == weight, and that is
illegal.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2007-12-20 11:22 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-12 17:29 [RFC] net: napi fix Andrew Gallatin
2007-12-12 17:38 ` David Miller
2007-12-12 17:40   ` Andrew Gallatin
2007-12-12 18:41   ` Kok, Auke
2007-12-13  7:41     ` Joonwoo Park
2007-12-13 14:13       ` Andrew Gallatin
2007-12-13 14:19         ` David Miller
2007-12-13 16:45           ` Kok, Auke
2007-12-13 18:22           ` Stephen Hemminger
2007-12-13 19:02             ` Andrew Gallatin
2007-12-13 19:09               ` David Miller
2007-12-13 19:35                 ` Stephen Hemminger
2007-12-13 20:38                   ` David Miller
2007-12-14  2:06         ` Joonwoo Park
2007-12-13 13:49     ` Jarek Poplawski
2007-12-13 13:50       ` David Miller
2007-12-13 14:14         ` Jarek Poplawski
2007-12-13 20:16         ` Jarek Poplawski
2007-12-13 20:37           ` David Miller
2007-12-13 20:41             ` Stephen Hemminger
2007-12-13 21:55               ` Jarek Poplawski
2007-12-13 22:28             ` Jarek Poplawski
2007-12-13 22:34               ` David Miller
2007-12-13 22:58                 ` Jarek Poplawski
2007-12-20  9:52   ` Robert Olsson
2007-12-20 11:22     ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2007-12-12  4:01 [PATCH 6/7] : tehuti Fix possible causing oops of net_rx_action Joonwoo Park
2007-12-12  5:39 ` Stephen Hemminger
2007-12-12  5:46   ` [RFC] net: napi fix Stephen Hemminger
2007-12-12  6:05     ` Joonwoo Park
2007-12-12 15:22       ` David Miller
2007-12-12 15:21     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).