* [PATCH 0/2] e1000: fixes for netpoll+NAPI, ARM
@ 2006-06-05 23:09 Kok, Auke
2006-06-05 23:11 ` [PATCH 1/2] e1000: fix netpoll with NAPI Kok, Auke
2006-06-05 23:11 ` [PATCH 2/2] e1000: remove risky prefetch on next_skb->data Kok, Auke
0 siblings, 2 replies; 35+ messages in thread
From: Kok, Auke @ 2006-06-05 23:09 UTC (permalink / raw)
To: Garzik, Jeff; +Cc: netdev, Brandeburg, Jesse, Kok, Auke, Kok, Auke
Hi,
This patch series implements two e1000 fixes:
1: fix netpoll with NAPI
2: fix ARM prefetch failure by removing risky prefetches
These changes are available through git:
git://lost.foo-projects.org/~ahkok/git/netdev-2.6 upstream-fixes
these patches are against
netdev-2.6#upstream-fixes c7812d8ccf1d11f29f752f53727ef6b55bc35150
---
Jeff:
The patch earlier sent regarding:
e1000: fix irq sharing when running ethtool test
is also still in this branch.
They are intended for jgarzik/netdev-2.6#upstream-fixes.
Cheers,
Auke
---
drivers/net/e1000/e1000_main.c | 17 ++++++++++-------
1 files changed, 10 insertions(+), 7 deletions(-)
--
Auke Kok <auke-jan.h.kok@intel.com>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-05 23:09 [PATCH 0/2] e1000: fixes for netpoll+NAPI, ARM Kok, Auke
@ 2006-06-05 23:11 ` Kok, Auke
2006-06-06 13:52 ` Neil Horman
2006-06-05 23:11 ` [PATCH 2/2] e1000: remove risky prefetch on next_skb->data Kok, Auke
1 sibling, 1 reply; 35+ messages in thread
From: Kok, Auke @ 2006-06-05 23:11 UTC (permalink / raw)
To: Garzik, Jeff; +Cc: netdev, Brandeburg, Jesse, Kok, Auke, Kok, Auke
Netpoll was broken due to the earlier addition of multiqueue.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---
drivers/net/e1000/e1000_main.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index ed15fca..7103a0e 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -4629,10 +4629,17 @@ static void
e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
+#ifdef CONFIG_E1000_NAPI
+ int budget = 0;
+
+ disable_irq(adapter->pdev->irq);
+ e1000_clean_tx_irq(adapter, adapter->tx_ring);
+ adapter->clean_rx(adapter, adapter->rx_ring, &budget, netdev->weight);
+#else
+
disable_irq(adapter->pdev->irq);
e1000_intr(adapter->pdev->irq, netdev, NULL);
e1000_clean_tx_irq(adapter, adapter->tx_ring);
-#ifndef CONFIG_E1000_NAPI
adapter->clean_rx(adapter, adapter->rx_ring);
#endif
enable_irq(adapter->pdev->irq);
--
Auke Kok <auke-jan.h.kok@intel.com>
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 2/2] e1000: remove risky prefetch on next_skb->data
2006-06-05 23:09 [PATCH 0/2] e1000: fixes for netpoll+NAPI, ARM Kok, Auke
2006-06-05 23:11 ` [PATCH 1/2] e1000: fix netpoll with NAPI Kok, Auke
@ 2006-06-05 23:11 ` Kok, Auke
2006-06-05 23:21 ` Rick Jones
1 sibling, 1 reply; 35+ messages in thread
From: Kok, Auke @ 2006-06-05 23:11 UTC (permalink / raw)
To: Garzik, Jeff; +Cc: netdev, Brandeburg, Jesse, Kok, Auke, Kok, Auke
It was brought to our attention that the prefetches break e1000 traffic
on xscale/arm architectures. Remove them for now. We'll let them
stay in mm for a while, or find a better solution to enable.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---
drivers/net/e1000/e1000_main.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 7103a0e..aef616f 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3519,7 +3519,7 @@ e1000_clean_rx_irq(struct e1000_adapter
buffer_info = &rx_ring->buffer_info[i];
while (rx_desc->status & E1000_RXD_STAT_DD) {
- struct sk_buff *skb, *next_skb;
+ struct sk_buff *skb;
u8 status;
#ifdef CONFIG_E1000_NAPI
if (*work_done >= work_to_do)
@@ -3537,8 +3537,6 @@ e1000_clean_rx_irq(struct e1000_adapter
prefetch(next_rxd);
next_buffer = &rx_ring->buffer_info[i];
- next_skb = next_buffer->skb;
- prefetch(next_skb->data - NET_IP_ALIGN);
cleaned = TRUE;
cleaned_count++;
@@ -3668,7 +3666,7 @@ e1000_clean_rx_irq_ps(struct e1000_adapt
struct e1000_buffer *buffer_info, *next_buffer;
struct e1000_ps_page *ps_page;
struct e1000_ps_page_dma *ps_page_dma;
- struct sk_buff *skb, *next_skb;
+ struct sk_buff *skb;
unsigned int i, j;
uint32_t length, staterr;
int cleaned_count = 0;
@@ -3697,8 +3695,6 @@ e1000_clean_rx_irq_ps(struct e1000_adapt
prefetch(next_rxd);
next_buffer = &rx_ring->buffer_info[i];
- next_skb = next_buffer->skb;
- prefetch(next_skb->data - NET_IP_ALIGN);
cleaned = TRUE;
cleaned_count++;
--
Auke Kok <auke-jan.h.kok@intel.com>
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 2/2] e1000: remove risky prefetch on next_skb->data
2006-06-05 23:11 ` [PATCH 2/2] e1000: remove risky prefetch on next_skb->data Kok, Auke
@ 2006-06-05 23:21 ` Rick Jones
2006-06-06 0:12 ` Brandeburg, Jesse
0 siblings, 1 reply; 35+ messages in thread
From: Rick Jones @ 2006-06-05 23:21 UTC (permalink / raw)
To: Kok, Auke; +Cc: Garzik, Jeff, netdev, Brandeburg, Jesse, Kok, Auke
Kok, Auke wrote:
> It was brought to our attention that the prefetches break e1000 traffic
> on xscale/arm architectures. Remove them for now. We'll let them
> stay in mm for a while, or find a better solution to enable.
Out of curiousity, what breaks?
rick jones
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 2/2] e1000: remove risky prefetch on next_skb->data
2006-06-05 23:21 ` Rick Jones
@ 2006-06-06 0:12 ` Brandeburg, Jesse
2006-06-06 0:16 ` Rick Jones
0 siblings, 1 reply; 35+ messages in thread
From: Brandeburg, Jesse @ 2006-06-06 0:12 UTC (permalink / raw)
To: Rick Jones
Cc: Kok, Auke-jan H, Garzik, Jeff, netdev, Brandeburg, Jesse,
Kok, Auke
On Mon, 5 Jun 2006, Rick Jones wrote:
>
> Kok, Auke wrote:
> > It was brought to our attention that the prefetches break e1000 traffic
> > on xscale/arm architectures. Remove them for now. We'll let them
> > stay in mm for a while, or find a better solution to enable.
>
> Out of curiousity, what breaks?
Hi Rick, according to our reporter, receives break. The prefetch (not
always, but sometimes) lets the processor get junk from the prefetched
area. Apparently this version of arm doesn't quite do as strict
enforcement of bus snoops as x86, ia64, (and even pSeries!) does.
This manifested with a large drop in receive peformance using TCP,
probably because it was retransmitting frequently.
Jesse
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 2/2] e1000: remove risky prefetch on next_skb->data
2006-06-06 0:12 ` Brandeburg, Jesse
@ 2006-06-06 0:16 ` Rick Jones
2006-06-06 0:22 ` Andi Kleen
2006-06-06 0:26 ` Brandeburg, Jesse
0 siblings, 2 replies; 35+ messages in thread
From: Rick Jones @ 2006-06-06 0:16 UTC (permalink / raw)
To: Brandeburg, Jesse; +Cc: Kok, Auke-jan H, Garzik, Jeff, netdev, Kok, Auke
Brandeburg, Jesse wrote:
> On Mon, 5 Jun 2006, Rick Jones wrote:
>>Kok, Auke wrote:
>>
>>>It was brought to our attention that the prefetches break e1000 traffic
>>>on xscale/arm architectures. Remove them for now. We'll let them
>>>stay in mm for a while, or find a better solution to enable.
>>
>>Out of curiousity, what breaks?
>
>
> Hi Rick, according to our reporter, receives break. The prefetch (not
> always, but sometimes) lets the processor get junk from the prefetched
> area. Apparently this version of arm doesn't quite do as strict
> enforcement of bus snoops as x86, ia64, (and even pSeries!) does.
Bear with me, I'm a software guy :) I interpret that to mean that the
processor is basically broken? If so, wouldn't it be the case that
prefetch() needs to become a noop on that processor?
> This manifested with a large drop in receive peformance using TCP,
> probably because it was retransmitting frequently.
I forget - what were the gains on the other CPUs?
rick
>
> Jesse
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 2/2] e1000: remove risky prefetch on next_skb->data
2006-06-06 0:16 ` Rick Jones
@ 2006-06-06 0:22 ` Andi Kleen
2006-06-06 0:26 ` Brandeburg, Jesse
1 sibling, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2006-06-06 0:22 UTC (permalink / raw)
To: Rick Jones
Cc: Brandeburg, Jesse, Kok, Auke-jan H, Garzik, Jeff, netdev,
Kok, Auke
> Bear with me, I'm a software guy :) I interpret that to mean that the
> processor is basically broken? If so, wouldn't it be the case that
> prefetch() needs to become a noop on that processor?
I would agree with Rick - if prefetch() is broken on Xscale it should be disabled
in the architecture, not in individual drivers. Otherwise all other code
that use it might be broken too.
Maybe it's only broken on MMIO and not normal memory - in that
case introducing a mmio_prefetch() (and defining it as an nop on Xscale) would be also
an option.
-Andi
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 2/2] e1000: remove risky prefetch on next_skb->data
2006-06-06 0:16 ` Rick Jones
2006-06-06 0:22 ` Andi Kleen
@ 2006-06-06 0:26 ` Brandeburg, Jesse
1 sibling, 0 replies; 35+ messages in thread
From: Brandeburg, Jesse @ 2006-06-06 0:26 UTC (permalink / raw)
To: Rick Jones
Cc: Brandeburg, Jesse, Kok, Auke-jan H, Garzik, Jeff, netdev,
Kok, Auke
On Mon, 5 Jun 2006, Rick Jones wrote:
> Brandeburg, Jesse wrote:
> > Hi Rick, according to our reporter, receives break. The prefetch (not
> > always, but sometimes) lets the processor get junk from the prefetched
> > area. Apparently this version of arm doesn't quite do as strict
> > enforcement of bus snoops as x86, ia64, (and even pSeries!) does.
>
> Bear with me, I'm a software guy :) I interpret that to mean that the
> processor is basically broken? If so, wouldn't it be the case that
> prefetch() needs to become a noop on that processor?
For a software guy, you're making a large leap :-) I wouldn't say the
processor is broken, it is more sensitive to our (admittedly) bad behavior
when prefetching data for one descriptor *past* any that we currently
*know* are done. ARM/XSCALE is able to use prefetch all over the place in
its arch specific code with no problems (i'm relying on advice from
someone else on this, i haven't validated with mine own eyes)
Someone else had complained about this particular prefetch anyway (and it
was the most speculative with the least amount of gain) on the list
before, so given this information we're trying to take the conservative
route.
This patch was tested by the reporter and he was pleased with the result.
> > This manifested with a large drop in receive peformance using TCP,
> > probably because it was retransmitting frequently.
>
> I forget - what were the gains on the other CPUs?
One system (from a while back) I have numbers for showed a 10% increase in
packets per second that could be handled using netperf udp receive, with
the prefetch code in place.
And to Andi's make that came in while I was typing this, I reiterate I do
not believe ARM/XSCALE prefetch to be broken.
Jesse
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-05 23:11 ` [PATCH 1/2] e1000: fix netpoll with NAPI Kok, Auke
@ 2006-06-06 13:52 ` Neil Horman
2006-06-06 16:39 ` Mitch Williams
0 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2006-06-06 13:52 UTC (permalink / raw)
To: Kok, Auke; +Cc: Garzik, Jeff, netdev, Brandeburg, Jesse, Kok, Auke, jmoyer, mpm
On Mon, Jun 05, 2006 at 04:11:25PM -0700, Kok, Auke wrote:
>
> Netpoll was broken due to the earlier addition of multiqueue.
>
> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> ---
>
> drivers/net/e1000/e1000_main.c | 9 ++++++++-
> 1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index ed15fca..7103a0e 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -4629,10 +4629,17 @@ static void
> e1000_netpoll(struct net_device *netdev)
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
> +#ifdef CONFIG_E1000_NAPI
> + int budget = 0;
> +
> + disable_irq(adapter->pdev->irq);
> + e1000_clean_tx_irq(adapter, adapter->tx_ring);
> + adapter->clean_rx(adapter, adapter->rx_ring, &budget, netdev->weight);
> +#else
> +
I've been speaking about this fix with a Jeff Moyer, and we've come up with some
concerns regarding its implementation. Specifically the call to
adapter->clean_rx in the case of the e1000 driver is rather a layering
violation in the netpoll code, in the sense that the function pointed to by clean_rx
is functionality that is nominally used by the dev->poll method. In fact in
this case, it would appear possible since dev->poll is called under the
poll_lock, but dev->poll_controller is not, that is is possible to have cpus in
a system executing in e1000_clean_rx_irq_[ps] at the same time leading to data
corruption:
CPU0:
netpoll_poll_dev
dev->poll_controller (e1000_netpoll)
adapter->clean_rx (e1000_clean_rx_irq)
CPU1:
napi_poll
dev->poll (e1000_clean)
e1000_clean_rx_irq
I'm not sure what the right fix is here. A spinlock in e1000_clean_rx_irq[_ps]
would seem to be an easy and direct fix, but I don't know that thats the best
solution.
Something I don't understand is why the call to adapter->clean_rx is
required in the first place when the napi_poll routine calls it itself directly.
All other drivers schedule a napi poll and receive frames via that path. My
guess is that it has to do with the fact that we schedule polls on the device in
the polling_netdev array, rather than the actual registered netdev. Looking at
the driver code I note that while an entire array is allocated for polling
netdevs, we only ever use entry 0. Would it make sense to just remove the the
polling_netdev array and use the registered device like all the other drivers.
I expect that would eliminate the need for this patch as well.
Regards
Neil
> disable_irq(adapter->pdev->irq);
> e1000_intr(adapter->pdev->irq, netdev, NULL);
> e1000_clean_tx_irq(adapter, adapter->tx_ring);
> -#ifndef CONFIG_E1000_NAPI
> adapter->clean_rx(adapter, adapter->rx_ring);
> #endif
> enable_irq(adapter->pdev->irq);
>
>
>
> --
> Auke Kok <auke-jan.h.kok@intel.com>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 13:52 ` Neil Horman
@ 2006-06-06 16:39 ` Mitch Williams
2006-06-06 17:05 ` Neil Horman
2006-06-06 17:29 ` Jeff Moyer
0 siblings, 2 replies; 35+ messages in thread
From: Mitch Williams @ 2006-06-06 16:39 UTC (permalink / raw)
To: Neil Horman
Cc: Kok, Auke, Garzik, Jeff, netdev, Brandeburg, Jesse, Kok, Auke,
jmoyer, mpm
On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
> I've been speaking about this fix with a Jeff Moyer, and we've come up with some
> concerns regarding its implementation. Specifically the call to
> adapter->clean_rx in the case of the e1000 driver is rather a layering
> violation in the netpoll code, in the sense that the function pointed to by clean_rx
> is functionality that is nominally used by the dev->poll method. In fact in
> this case, it would appear possible since dev->poll is called under the
> poll_lock, but dev->poll_controller is not, that is is possible to have cpus in
> a system executing in e1000_clean_rx_irq_[ps] at the same time leading to data
> corruption:
>
> CPU0:
> netpoll_poll_dev
> dev->poll_controller (e1000_netpoll)
> adapter->clean_rx (e1000_clean_rx_irq)
>
> CPU1:
> napi_poll
> dev->poll (e1000_clean)
> e1000_clean_rx_irq
Hmmm. You may have a point. I don't think a spinlock is required, but
we do need to check if the poll is already scheduled on another CPU,
like netpoll does in poll_napi().
In practice, of course, we never see this. The only netpoll client in
the kernel is netconsole, which doesn't do receives. A few Major
Distros use netdump, which does do receives, but only after the system
has crashed. In that case, all other CPUs are stopped anyway.
However, just for the sake of correctness (and paranoia), I'll whip up
another patch that does this check.
Jeff, please do not commit this patch.
-Mitch
NB: The root of this problem is that e1000 uses a dummy netdev to
schedule NAPI polling. Since netpoll doesn't know about the dummy
netdev, it checks the "real" e1000 netdev struct to see if it's
scheduled for polling. Since this is never the case, netpoll fails when
e1000 is configured to use NAPI. Hence, this patch.
Why is the dummy netdev in place? To support multi-queue RX. Our PCIe
adapters support this, but the kernel's not _quite_ ready yet.
Hopefully, the VJ net channel stuff will enable this feature.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 16:39 ` Mitch Williams
@ 2006-06-06 17:05 ` Neil Horman
2006-06-06 17:18 ` Auke Kok
2006-06-06 17:29 ` Jeff Moyer
1 sibling, 1 reply; 35+ messages in thread
From: Neil Horman @ 2006-06-06 17:05 UTC (permalink / raw)
To: Mitch Williams
Cc: Kok, Auke, Garzik, Jeff, netdev, Brandeburg, Jesse, Kok, Auke,
jmoyer, mpm
On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
> On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
> > I've been speaking about this fix with a Jeff Moyer, and we've come up with some
> > concerns regarding its implementation. Specifically the call to
> > adapter->clean_rx in the case of the e1000 driver is rather a layering
> > violation in the netpoll code, in the sense that the function pointed to by clean_rx
> > is functionality that is nominally used by the dev->poll method. In fact in
> > this case, it would appear possible since dev->poll is called under the
> > poll_lock, but dev->poll_controller is not, that is is possible to have cpus in
> > a system executing in e1000_clean_rx_irq_[ps] at the same time leading to data
> > corruption:
> >
> > CPU0:
> > netpoll_poll_dev
> > dev->poll_controller (e1000_netpoll)
> > adapter->clean_rx (e1000_clean_rx_irq)
> >
> > CPU1:
> > napi_poll
> > dev->poll (e1000_clean)
> > e1000_clean_rx_irq
>
> Hmmm. You may have a point. I don't think a spinlock is required, but
> we do need to check if the poll is already scheduled on another CPU,
> like netpoll does in poll_napi().
>
> In practice, of course, we never see this. The only netpoll client in
> the kernel is netconsole, which doesn't do receives. A few Major
> Distros use netdump, which does do receives, but only after the system
> has crashed. In that case, all other CPUs are stopped anyway.
>
You are probably right, this is a rare case, if it ever happens at all, but I
think there is (at least as Jeff explained it to me) a corner case, where
netconsole on transmit, notes an exhaustion of tx descriptors, and in response
calls the poll controller method of the driver to clean up and make some of
those descriptors available:
printk
release_console_sem
call_console_drivers
netconsole.c:write_msg
netpoll_send_udp
netpoll_send_skb
if (netif_queue_stopped(np->dev)) <--- out of descriptors ?
netpoll_poll(np); <--- trigger a poll to clean up
If this is being done at the same time as a napi_poll on another cpu, we have a
real set of conditions under which a corruption can occur.
> However, just for the sake of correctness (and paranoia), I'll whip up
> another patch that does this check.
>
Thanks for the quick feedback!
Regards
Neil
> Jeff, please do not commit this patch.
>
> -Mitch
>
> NB: The root of this problem is that e1000 uses a dummy netdev to
> schedule NAPI polling. Since netpoll doesn't know about the dummy
> netdev, it checks the "real" e1000 netdev struct to see if it's
> scheduled for polling. Since this is never the case, netpoll fails when
> e1000 is configured to use NAPI. Hence, this patch.
>
> Why is the dummy netdev in place? To support multi-queue RX. Our PCIe
> adapters support this, but the kernel's not _quite_ ready yet.
> Hopefully, the VJ net channel stuff will enable this feature.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 17:05 ` Neil Horman
@ 2006-06-06 17:18 ` Auke Kok
2006-06-06 17:30 ` Jeff Moyer
0 siblings, 1 reply; 35+ messages in thread
From: Auke Kok @ 2006-06-06 17:18 UTC (permalink / raw)
To: Garzik, Jeff
Cc: Neil Horman, Mitch Williams, netdev, Brandeburg, Jesse, Kok, Auke,
jmoyer, mpm
Neil Horman wrote:
> On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
>> On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
[snip]
>> However, just for the sake of correctness (and paranoia), I'll whip up
>> another patch that does this check.
>>
> Thanks for the quick feedback!
>
> Regards
> Neil
>
>> Jeff, please do not commit this patch.
Jeff,
I've popped the patch off from our gitserver, so you can pull the two
outstanding patches while we revamp this one.
Auke
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 16:39 ` Mitch Williams
2006-06-06 17:05 ` Neil Horman
@ 2006-06-06 17:29 ` Jeff Moyer
1 sibling, 0 replies; 35+ messages in thread
From: Jeff Moyer @ 2006-06-06 17:29 UTC (permalink / raw)
To: Mitch Williams
Cc: Neil Horman, Kok, Auke, Garzik, Jeff, netdev, Brandeburg, Jesse,
Kok, Auke, mpm
==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Mitch Williams <mitch.a.williams@intel.com> adds:
mitch> On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
>> I've been speaking about this fix with a Jeff Moyer, and we've come up with some
>> concerns regarding its implementation. Specifically the call to
adapter-> clean_rx in the case of the e1000 driver is rather a layering
>> violation in the netpoll code, in the sense that the function pointed to by clean_rx
>> is functionality that is nominally used by the dev->poll method. In fact in
>> this case, it would appear possible since dev->poll is called under the
>> poll_lock, but dev->poll_controller is not, that is is possible to have cpus in
>> a system executing in e1000_clean_rx_irq_[ps] at the same time leading to data
>> corruption:
>>
>> CPU0:
>> netpoll_poll_dev
dev-> poll_controller (e1000_netpoll)
adapter-> clean_rx (e1000_clean_rx_irq)
>>
>> CPU1:
>> napi_poll
dev-> poll (e1000_clean)
>> e1000_clean_rx_irq
mitch> Hmmm. You may have a point. I don't think a spinlock is required, but
mitch> we do need to check if the poll is already scheduled on another CPU,
mitch> like netpoll does in poll_napi().
mitch> In practice, of course, we never see this. The only netpoll client in
mitch> the kernel is netconsole, which doesn't do receives. A few Major
mitch> Distros use netdump, which does do receives, but only after the system
mitch> has crashed. In that case, all other CPUs are stopped anyway.
Don't forget about driver/net/kgdb_eth.c, which is in-tree, and does sends
and receives.
-Jeff
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 17:18 ` Auke Kok
@ 2006-06-06 17:30 ` Jeff Moyer
2006-06-06 17:34 ` Auke Kok
0 siblings, 1 reply; 35+ messages in thread
From: Jeff Moyer @ 2006-06-06 17:30 UTC (permalink / raw)
To: Auke Kok
Cc: Garzik, Jeff, Neil Horman, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke, mpm
==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
auke-jan.h.kok> Neil Horman wrote:
>> On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
>>> On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
auke-jan.h.kok> [snip]
>>> However, just for the sake of correctness (and paranoia), I'll whip up
>>> another patch that does this check.
>>>
>> Thanks for the quick feedback!
>> Regards
>> Neil
>>
>>> Jeff, please do not commit this patch.
auke-jan.h.kok> Jeff,
auke-jan.h.kok> I've popped the patch off from our gitserver, so you can pull the two
auke-jan.h.kok> outstanding patches while we revamp this one.
Would you please send patches to this list? I'd certainly like to review
them. I don't think the problem needs solving in the e1000 driver. I
think it is an issue that should be handled properly by netpoll.
-Jeff
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 17:30 ` Jeff Moyer
@ 2006-06-06 17:34 ` Auke Kok
2006-06-06 17:42 ` Jeff Moyer
0 siblings, 1 reply; 35+ messages in thread
From: Auke Kok @ 2006-06-06 17:34 UTC (permalink / raw)
To: Jeff Moyer
Cc: Garzik, Jeff, Neil Horman, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke, mpm
Jeff Moyer wrote:
> ==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
>
> auke-jan.h.kok> Neil Horman wrote:
>>> On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
>>>> On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
>
>
> auke-jan.h.kok> [snip]
>
>
>>>> However, just for the sake of correctness (and paranoia), I'll whip up
>>>> another patch that does this check.
>>>>
>>> Thanks for the quick feedback!
>>> Regards
>>> Neil
>>>
>>>> Jeff, please do not commit this patch.
>
> auke-jan.h.kok> Jeff,
>
> auke-jan.h.kok> I've popped the patch off from our gitserver, so you can pull the two
> auke-jan.h.kok> outstanding patches while we revamp this one.
>
> Would you please send patches to this list? I'd certainly like to review
> them. I don't think the problem needs solving in the e1000 driver. I
> think it is an issue that should be handled properly by netpoll.
???
that message was directed to Jeff Garzik, perhaps that was too confusing.
They were sent here in the first place:
http://marc.theaimsgroup.com/?l=linux-netdev&m=114954878711789&w=2
Cheers,
Auke
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 17:34 ` Auke Kok
@ 2006-06-06 17:42 ` Jeff Moyer
2006-06-06 23:17 ` Matt Mackall
0 siblings, 1 reply; 35+ messages in thread
From: Jeff Moyer @ 2006-06-06 17:42 UTC (permalink / raw)
To: Auke Kok
Cc: Garzik, Jeff, Neil Horman, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke, mpm
==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
auke-jan.h.kok> Jeff Moyer wrote:
>> ==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
auke-jan.h.kok> Neil Horman wrote:
>>>> On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
>>>>> On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
auke-jan.h.kok> [snip]
>>
>>>>> However, just for the sake of correctness (and paranoia), I'll whip up
>>>>> another patch that does this check.
>>>>>
>>>> Thanks for the quick feedback!
>>>> Regards
>>>> Neil
>>>>
>>>>> Jeff, please do not commit this patch.
auke-jan.h.kok> Jeff,
auke-jan.h.kok> I've popped the patch off from our gitserver, so you can
>> pull the two
auke-jan.h.kok> outstanding patches while we revamp this one.
>> Would you please send patches to this list? I'd certainly like to review
>> them. I don't think the problem needs solving in the e1000 driver. I
>> think it is an issue that should be handled properly by netpoll.
auke-jan.h.kok> ???
auke-jan.h.kok> that message was directed to Jeff Garzik, perhaps that was too confusing.
I figured it was directed at Jeff G.
auke-jan.h.kok> They were sent here in the first place:
auke-jan.h.kok> http://marc.theaimsgroup.com/?l=linux-netdev&m=114954878711789&w=2
Thanks for the pointer.
As you noted, e1000 now uses a separate device for polling. Netpoll should
be able to deal with this, though. I'd like to solicit mpm's input on
this, as I'm having difficulties coming up with a clean solution.
For some background, the netpoll_poll_lock calls were introduced to prevent
recursion in a device driver's ->poll routine. By essentially calling the
poll routine from the poll_controller routine, you are no longer prevented
from such recursion.
It would be best if the poll_controller routine was kept simple. It should
do the equivalent of the interrupt processing portion of the work, and
leave the delivery to the network stack for a call to the ->poll routine.
Solving this at the netpoll layer will be a bit difficult, since we have no
insight into the driver design (as your driver illustrates). Up until now,
our model worked well.
We could, potentially, walk the list of devices scheduled for a poll much
the same as net_rx_action does. However, we don't want to process work
pending on other network adapters, only the one associated with the netpoll
net device. I can think of at least one way to make this distinction, but
it feels too much like a hack.
Matt, any ideas on this?
-Jeff
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 17:42 ` Jeff Moyer
@ 2006-06-06 23:17 ` Matt Mackall
2006-06-07 15:05 ` Neil Horman
0 siblings, 1 reply; 35+ messages in thread
From: Matt Mackall @ 2006-06-06 23:17 UTC (permalink / raw)
To: Jeff Moyer
Cc: Auke Kok, Garzik, Jeff, Neil Horman, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke
On Tue, Jun 06, 2006 at 01:42:59PM -0400, Jeff Moyer wrote:
> ==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
>
> auke-jan.h.kok> Jeff Moyer wrote:
> >> ==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
> auke-jan.h.kok> Neil Horman wrote:
> >>>> On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
> >>>>> On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
> auke-jan.h.kok> [snip]
> >>
> >>>>> However, just for the sake of correctness (and paranoia), I'll whip up
> >>>>> another patch that does this check.
> >>>>>
> >>>> Thanks for the quick feedback!
> >>>> Regards
> >>>> Neil
> >>>>
> >>>>> Jeff, please do not commit this patch.
> auke-jan.h.kok> Jeff,
> auke-jan.h.kok> I've popped the patch off from our gitserver, so you can
> >> pull the two
> auke-jan.h.kok> outstanding patches while we revamp this one.
> >> Would you please send patches to this list? I'd certainly like to review
> >> them. I don't think the problem needs solving in the e1000 driver. I
> >> think it is an issue that should be handled properly by netpoll.
>
> auke-jan.h.kok> ???
>
> auke-jan.h.kok> that message was directed to Jeff Garzik, perhaps that was too confusing.
>
> I figured it was directed at Jeff G.
>
> auke-jan.h.kok> They were sent here in the first place:
>
> auke-jan.h.kok> http://marc.theaimsgroup.com/?l=linux-netdev&m=114954878711789&w=2
>
> Thanks for the pointer.
>
> As you noted, e1000 now uses a separate device for polling. Netpoll should
> be able to deal with this, though. I'd like to solicit mpm's input on
> this, as I'm having difficulties coming up with a clean solution.
It's a bit ad-hoc at the moment, but it might be a step towards a
cleaner model.
> For some background, the netpoll_poll_lock calls were introduced to prevent
> recursion in a device driver's ->poll routine. By essentially calling the
> poll routine from the poll_controller routine, you are no longer prevented
> from such recursion.
>
> It would be best if the poll_controller routine was kept simple. It should
> do the equivalent of the interrupt processing portion of the work, and
> leave the delivery to the network stack for a call to the ->poll routine.
>
> Solving this at the netpoll layer will be a bit difficult, since we have no
> insight into the driver design (as your driver illustrates). Up until now,
> our model worked well.
That's probably an overstatement.
> We could, potentially, walk the list of devices scheduled for a poll much
> the same as net_rx_action does. However, we don't want to process work
> pending on other network adapters, only the one associated with the netpoll
> net device. I can think of at least one way to make this distinction, but
> it feels too much like a hack.
Processing work on other devices may not be completely wrong.
> Matt, any ideas on this?
Not at the moment.
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-06 23:17 ` Matt Mackall
@ 2006-06-07 15:05 ` Neil Horman
2006-06-07 16:48 ` Matt Mackall
0 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2006-06-07 15:05 UTC (permalink / raw)
To: Matt Mackall
Cc: Jeff Moyer, Auke Kok, Garzik, Jeff, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke
>
> > Matt, any ideas on this?
>
> Not at the moment.
how about this for a solution? It doesn't make netpoll any more robust, but I
think in the interests of efficiency it would be fair to require that, when
netpolled, a driver must receive frames on the same net device for which it was
polled. With this patch we detect that condition and handle it accordingly in
e1000_intr. This eliminates the need for us to call the clean_rx method from
the poll_controller when napi is configured, instead allowing the poll method to
be called from napi_poll, as the netpoll model currently does. This fixes the
netdump regression, and eliminates the layering violation and the potential race
that we've been discussing. I've just tested it with netdump here and it works
quite well.
Thoughts appreciated.
Thanks & Regards
Neil
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
e1000_main.c | 54 +++++++++++++++++++++++++++++++++++++++++++++---------
1 files changed, 45 insertions(+), 9 deletions(-)
--- linux-2.6.9/drivers/net/e1000/e1000_main.c.neil 2006-06-06 10:37:42.000000000 -0400
+++ linux-2.6.9/drivers/net/e1000/e1000_main.c 2006-06-07 10:48:22.000000000 -0400
@@ -3207,8 +3207,9 @@
* @pt_regs: CPU registers structure
**/
+
static irqreturn_t
-e1000_intr(int irq, void *data, struct pt_regs *regs)
+__e1000_intr(int irq, void *data, struct pt_regs *regs, int netpoll_op)
{
struct net_device *netdev = data;
struct e1000_adapter *adapter = netdev_priv(netdev);
@@ -3217,6 +3218,7 @@
#ifndef CONFIG_E1000_NAPI
int i;
#else
+ struct net_device *dev_to_sched;
/* Interrupt Auto-Mask...upon reading ICR,
* interrupts are masked. No need for the
* IMC write, but it does mean we should
@@ -3255,8 +3257,22 @@
E1000_WRITE_REG(hw, IMC, ~0);
E1000_WRITE_FLUSH(hw);
}
- if (likely(netif_rx_schedule_prep(&adapter->polling_netdev[0])))
- __netif_rx_schedule(&adapter->polling_netdev[0]);
+
+ /*
+ * netpoll operations, in the interests of efficiency
+ * only do napi polling on the device passed to the
+ * poll_controller. Therefore, if we are preforming
+ * a netpoll operation, then we can't schedule a receive
+ * to one of the dummy net devices that exist for sole
+ * purpose of spreading out rx schedules
+ */
+ if (netpoll_op)
+ dev_to_sched = netdev;
+ else
+ dev_to_sched = &adapter->polling_netdev[0];
+
+ if (likely(netif_rx_schedule_prep(dev_to_sched)))
+ __netif_rx_schedule(dev_to_sched);
else
e1000_irq_enable(adapter);
#else
@@ -3288,6 +3304,13 @@
return IRQ_HANDLED;
}
+static irqreturn_t
+e1000_intr(int irq, void *data, struct pt_regs *regs)
+{
+ return __e1000_intr(irq, data, regs, 0);
+}
+
+
#ifdef CONFIG_E1000_NAPI
/**
* e1000_clean - NAPI Rx polling callback
@@ -3300,7 +3323,6 @@
struct e1000_adapter *adapter;
int work_to_do = min(*budget, poll_dev->quota);
int tx_cleaned = 0, i = 0, work_done = 0;
-
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3308,10 +3330,24 @@
if (!netif_carrier_ok(adapter->netdev))
goto quit_polling;
- while (poll_dev != &adapter->polling_netdev[i]) {
- i++;
- if (unlikely(i == adapter->num_rx_queues))
- BUG();
+ /*
+ * only search for a matching polling_netdev in the event
+ * that this isn't a real registered net_device
+ * A real net device can be passed in here in the event
+ * that netdump has been activated (this comes through
+ * netpoll_poll_dev). We detect this by virtue of the
+ * fact that each polling_netdev->priv points to the private
+ * data of its parent (registered) netdev. So if:
+ * poll_dev->priv == netdev_priv(poll_dev), its a real device
+ * otherwise its a polling_netdev.
+ */
+ if (adapter != netdev_priv(poll_dev)) {
+ while (poll_dev != &adapter->polling_netdev[i]) {
+ i++;
+ if (unlikely(i == adapter->num_rx_queues))
+ BUG();
+ }
+
}
if (likely(adapter->num_tx_queues == 1)) {
@@ -4624,7 +4660,7 @@
{
struct e1000_adapter *adapter = netdev_priv(netdev);
disable_irq(adapter->pdev->irq);
- e1000_intr(adapter->pdev->irq, netdev, NULL);
+ __e1000_intr(adapter->pdev->irq, netdev, NULL, 1);
e1000_clean_tx_irq(adapter, adapter->tx_ring);
#ifndef CONFIG_E1000_NAPI
adapter->clean_rx(adapter, adapter->rx_ring);
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-07 15:05 ` Neil Horman
@ 2006-06-07 16:48 ` Matt Mackall
2006-06-07 18:25 ` Auke Kok
0 siblings, 1 reply; 35+ messages in thread
From: Matt Mackall @ 2006-06-07 16:48 UTC (permalink / raw)
To: Neil Horman
Cc: Jeff Moyer, Auke Kok, Garzik, Jeff, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke
On Wed, Jun 07, 2006 at 11:05:22AM -0400, Neil Horman wrote:
> >
> > > Matt, any ideas on this?
> >
> > Not at the moment.
>
> how about this for a solution? It doesn't make netpoll any more robust, but I
> think in the interests of efficiency it would be fair to require that, when
> netpolled, a driver must receive frames on the same net device for which it was
> polled. With this patch we detect that condition and handle it accordingly in
> e1000_intr. This eliminates the need for us to call the clean_rx method from
> the poll_controller when napi is configured, instead allowing the poll method to
> be called from napi_poll, as the netpoll model currently does. This fixes the
> netdump regression, and eliminates the layering violation and the potential race
> that we've been discussing. I've just tested it with netdump here and it works
> quite well.
>
> Thoughts appreciated.
This looks pretty reasonable, mostly from the perspective that it
doesn't put any further ugliness in netpoll. We might want to add a
comment somewhere in netpoll of the new rule we're now observing.
I'll let the e1000 guys comment on the particulars of the driver change.
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-07 16:48 ` Matt Mackall
@ 2006-06-07 18:25 ` Auke Kok
2006-06-07 18:44 ` Jeff Moyer
2006-06-07 18:54 ` John W. Linville
0 siblings, 2 replies; 35+ messages in thread
From: Auke Kok @ 2006-06-07 18:25 UTC (permalink / raw)
To: Matt Mackall, Garzik, Jeff
Cc: Neil Horman, Jeff Moyer, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke
[-- Attachment #1: Type: text/plain, Size: 1982 bytes --]
Matt Mackall wrote:
> On Wed, Jun 07, 2006 at 11:05:22AM -0400, Neil Horman wrote:
>>>
>>>> Matt, any ideas on this?
>>> Not at the moment.
>> how about this for a solution? It doesn't make netpoll any more robust, but I
>> think in the interests of efficiency it would be fair to require that, when
>> netpolled, a driver must receive frames on the same net device for which it was
>> polled. With this patch we detect that condition and handle it accordingly in
>> e1000_intr. This eliminates the need for us to call the clean_rx method from
>> the poll_controller when napi is configured, instead allowing the poll method to
>> be called from napi_poll, as the netpoll model currently does. This fixes the
>> netdump regression, and eliminates the layering violation and the potential race
>> that we've been discussing. I've just tested it with netdump here and it works
>> quite well.
>>
>> Thoughts appreciated.
>
> This looks pretty reasonable, mostly from the perspective that it
> doesn't put any further ugliness in netpoll. We might want to add a
> comment somewhere in netpoll of the new rule we're now observing.
> I'll let the e1000 guys comment on the particulars of the driver change.
Hi,
we're not too happy with this as it puts a branch right in the regular receive
path. We haven't ran the numbers on it yet but it is likely that this will
lower performance significantly during normal receives for something that is
not common use.
Attached below a (revised) patch that adds proper locking around the rx_clean
to prevent the race.
Cheers,
Auke
---
e1000: fix netpoll with NAPI
This fixes netpoll when using NAPI, and protects against a rare race condition
in the netpoll routine, while staying out of our ways from the normal hotpath.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---
e1000_main.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
[-- Attachment #2: e1000_netpoll_napi-r2.patch --]
[-- Type: text/x-patch, Size: 1035 bytes --]
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bd709e5..876251c 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -4584,10 +4584,25 @@ static void
e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
+#ifdef CONFIG_E1000_NAPI
+ int budget = 0;
+
+ disable_irq(adapter->pdev->irq);
+ if (likely(netif_rx_schedule_prep(&adapter->polling_netdev[0]))) {
+ if (spin_trylock(&adapter->tx_queue_lock)) {
+ e1000_clean_tx_irq(adapter, &adapter->tx_ring[0]);
+ spin_unlock(&adapter->tx_queue_lock);
+ }
+ adapter->clean_rx(adapter, adapter->rx_ring,
+ &budget, netdev->weight);
+ clear_bit(__LINK_STATE_RX_SCHED,
+ &adapter->polling_netdev[0].state);
+ }
+#else
+
disable_irq(adapter->pdev->irq);
e1000_intr(adapter->pdev->irq, netdev, NULL);
e1000_clean_tx_irq(adapter, adapter->tx_ring);
-#ifndef CONFIG_E1000_NAPI
adapter->clean_rx(adapter, adapter->rx_ring);
#endif
enable_irq(adapter->pdev->irq);
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-07 18:25 ` Auke Kok
@ 2006-06-07 18:44 ` Jeff Moyer
2006-06-07 19:18 ` Neil Horman
2006-06-08 17:19 ` Mitch Williams
2006-06-07 18:54 ` John W. Linville
1 sibling, 2 replies; 35+ messages in thread
From: Jeff Moyer @ 2006-06-07 18:44 UTC (permalink / raw)
To: Auke Kok
Cc: Matt Mackall, Garzik, Jeff, Neil Horman, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke
==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
auke-jan.h.kok> Hi,
auke-jan.h.kok> we're not too happy with this as it puts a branch right in
auke-jan.h.kok> the regular receive path. We haven't ran the numbers on it
auke-jan.h.kok> yet but it is likely that this will lower performance
auke-jan.h.kok> significantly during normal receives for something that is
auke-jan.h.kok> not common use.
auke-jan.h.kok> Attached below a (revised) patch that adds proper locking
auke-jan.h.kok> around the rx_clean to prevent the race.
That patch locks around the tx clean routine. As such, it doesn't prevent
the problem.
> + disable_irq(adapter->pdev->irq);
> + if (likely(netif_rx_schedule_prep(&adapter->polling_netdev[0]))) {
> + if (spin_trylock(&adapter->tx_queue_lock)) {
> + e1000_clean_tx_irq(adapter, &adapter->tx_ring[0]);
> + spin_unlock(&adapter->tx_queue_lock);
> + }
> + adapter->clean_rx(adapter, adapter->rx_ring,
> + &budget, netdev->weight);
> + clear_bit(__LINK_STATE_RX_SCHED,
> + &adapter->polling_netdev[0].state);
-Jeff
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-07 18:25 ` Auke Kok
2006-06-07 18:44 ` Jeff Moyer
@ 2006-06-07 18:54 ` John W. Linville
2006-06-08 17:23 ` Mitch Williams
1 sibling, 1 reply; 35+ messages in thread
From: John W. Linville @ 2006-06-07 18:54 UTC (permalink / raw)
To: Auke Kok
Cc: Matt Mackall, Garzik, Jeff, Neil Horman, Jeff Moyer,
Mitch Williams, netdev, Brandeburg, Jesse, Kok, Auke
On Wed, Jun 07, 2006 at 11:25:29AM -0700, Auke Kok wrote:
> @@ -4584,10 +4584,25 @@ static void
> e1000_netpoll(struct net_device *netdev)
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
> +#ifdef CONFIG_E1000_NAPI
> + int budget = 0;
> +
> + disable_irq(adapter->pdev->irq);
> + if (likely(netif_rx_schedule_prep(&adapter->polling_netdev[0]))) {
> + if (spin_trylock(&adapter->tx_queue_lock)) {
> + e1000_clean_tx_irq(adapter, &adapter->tx_ring[0]);
> + spin_unlock(&adapter->tx_queue_lock);
> + }
> + adapter->clean_rx(adapter, adapter->rx_ring,
> + &budget, netdev->weight);
> + clear_bit(__LINK_STATE_RX_SCHED,
> + &adapter->polling_netdev[0].state);
> + }
> +#else
> +
> disable_irq(adapter->pdev->irq);
Pedantic objection, but I think this would read easier w/o the extra
newline before disable_irq.
John
--
John W. Linville
linville@tuxdriver.com
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-07 18:44 ` Jeff Moyer
@ 2006-06-07 19:18 ` Neil Horman
2006-06-08 17:19 ` Mitch Williams
1 sibling, 0 replies; 35+ messages in thread
From: Neil Horman @ 2006-06-07 19:18 UTC (permalink / raw)
To: Jeff Moyer
Cc: Auke Kok, Matt Mackall, Garzik, Jeff, Mitch Williams, netdev,
Brandeburg, Jesse, Kok, Auke
On Wed, Jun 07, 2006 at 02:44:54PM -0400, Jeff Moyer wrote:
> ==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok <auke-jan.h.kok@intel.com> adds:
>
> auke-jan.h.kok> Hi,
>
> auke-jan.h.kok> we're not too happy with this as it puts a branch right in
> auke-jan.h.kok> the regular receive path. We haven't ran the numbers on it
> auke-jan.h.kok> yet but it is likely that this will lower performance
> auke-jan.h.kok> significantly during normal receives for something that is
> auke-jan.h.kok> not common use.
>
> auke-jan.h.kok> Attached below a (revised) patch that adds proper locking
> auke-jan.h.kok> around the rx_clean to prevent the race.
>
> That patch locks around the tx clean routine. As such, it doesn't prevent
> the problem.
>
Further to that, do tests on this if you like, but I would certainly think a
properly formed conditional operation is going to provide better performance
than a spin_lock operation in the receive path. Why not just turn the:
if(netpoll_op)
into an
if(unlikely(netpoll_op))
I expect that will reduce the overhead of the conditional to effectively zero
for the normal receive case. The following patch does that, and I expect you
performance won't suffer at all:
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
e1000_main.c | 54 +++++++++++++++++++++++++++++++++++++++++++++---------
1 files changed, 45 insertions(+), 9 deletions(-)
--- linux-2.6.9/drivers/net/e1000/e1000_main.c.neil 2006-06-06 10:37:42.000000000 -0400
+++ linux-2.6.9/drivers/net/e1000/e1000_main.c 2006-06-07 10:48:22.000000000 -0400
@@ -3207,8 +3207,9 @@ e1000_update_stats(struct e1000_adapter
* @pt_regs: CPU registers structure
**/
+
static irqreturn_t
-e1000_intr(int irq, void *data, struct pt_regs *regs)
+__e1000_intr(int irq, void *data, struct pt_regs *regs, int netpoll_op)
{
struct net_device *netdev = data;
struct e1000_adapter *adapter = netdev_priv(netdev);
@@ -3217,6 +3218,7 @@ e1000_intr(int irq, void *data, struct p
#ifndef CONFIG_E1000_NAPI
int i;
#else
+ struct net_device *dev_to_sched;
/* Interrupt Auto-Mask...upon reading ICR,
* interrupts are masked. No need for the
* IMC write, but it does mean we should
@@ -3255,8 +3257,22 @@ e1000_intr(int irq, void *data, struct p
E1000_WRITE_REG(hw, IMC, ~0);
E1000_WRITE_FLUSH(hw);
}
- if (likely(netif_rx_schedule_prep(&adapter->polling_netdev[0])))
- __netif_rx_schedule(&adapter->polling_netdev[0]);
+
+ /*
+ * netpoll operations, in the interests of efficiency
+ * only do napi polling on the device passed to the
+ * poll_controller. Therefore, if we are preforming
+ * a netpoll operation, then we can't schedule a receive
+ * to one of the dummy net devices that exist for sole
+ * purpose of spreading out rx schedules
+ */
+ if (unlikely(netpoll_op))
+ dev_to_sched = netdev;
+ else
+ dev_to_sched = &adapter->polling_netdev[0];
+
+ if (likely(netif_rx_schedule_prep(dev_to_sched)))
+ __netif_rx_schedule(dev_to_sched);
else
e1000_irq_enable(adapter);
#else
@@ -3288,6 +3304,13 @@ e1000_intr(int irq, void *data, struct p
return IRQ_HANDLED;
}
+static irqreturn_t
+e1000_intr(int irq, void *data, struct pt_regs *regs)
+{
+ return __e1000_intr(irq, data, regs, 0);
+}
+
+
#ifdef CONFIG_E1000_NAPI
/**
* e1000_clean - NAPI Rx polling callback
@@ -3300,7 +3323,6 @@ e1000_clean(struct net_device *poll_dev,
struct e1000_adapter *adapter;
int work_to_do = min(*budget, poll_dev->quota);
int tx_cleaned = 0, i = 0, work_done = 0;
-
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3308,10 +3330,24 @@ e1000_clean(struct net_device *poll_dev,
if (!netif_carrier_ok(adapter->netdev))
goto quit_polling;
- while (poll_dev != &adapter->polling_netdev[i]) {
- i++;
- if (unlikely(i == adapter->num_rx_queues))
- BUG();
+ /*
+ * only search for a matching polling_netdev in the event
+ * that this isn't a real registered net_device
+ * A real net device can be passed in here in the event
+ * that netdump has been activated (this comes through
+ * netpoll_poll_dev). We detect this by virtue of the
+ * fact that each polling_netdev->priv points to the private
+ * data of its parent (registered) netdev. So if:
+ * poll_dev->priv == netdev_priv(poll_dev), its a real device
+ * otherwise its a polling_netdev.
+ */
+ if (likely(adapter != netdev_priv(poll_dev))) {
+ while (poll_dev != &adapter->polling_netdev[i]) {
+ i++;
+ if (unlikely(i == adapter->num_rx_queues))
+ BUG();
+ }
+
}
if (likely(adapter->num_tx_queues == 1)) {
@@ -4624,7 +4660,7 @@ e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
disable_irq(adapter->pdev->irq);
- e1000_intr(adapter->pdev->irq, netdev, NULL);
+ __e1000_intr(adapter->pdev->irq, netdev, NULL, 1);
e1000_clean_tx_irq(adapter, adapter->tx_ring);
#ifndef CONFIG_E1000_NAPI
adapter->clean_rx(adapter, adapter->rx_ring);
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-07 18:44 ` Jeff Moyer
2006-06-07 19:18 ` Neil Horman
@ 2006-06-08 17:19 ` Mitch Williams
2006-06-08 17:29 ` Jeff Moyer
1 sibling, 1 reply; 35+ messages in thread
From: Mitch Williams @ 2006-06-08 17:19 UTC (permalink / raw)
To: Jeff Moyer
Cc: Kok, Auke-jan H, Matt Mackall, Garzik, Jeff, Neil Horman, netdev,
Brandeburg, Jesse, Kok, Auke
On Wed, 2006-06-07 at 11:44 -0700, Jeff Moyer wrote:
> That patch locks around the tx clean routine. As such, it doesn't
> prevent
> the problem.
The call to netif_rx_schedule_prep provides locking because it sets the
__LINK_STATE_RX_SCHED bit atomically. The spinlock around
e1000_clean_tx_irq is to protect it from other calls to the transmit
routine, not NAPI.
-Mitch
> > + disable_irq(adapter->pdev->irq);
> > + if
> (likely(netif_rx_schedule_prep(&adapter->polling_netdev[0]))) {
> > + if (spin_trylock(&adapter->tx_queue_lock)) {
> > + e1000_clean_tx_irq(adapter,
> &adapter->tx_ring[0]);
> > + spin_unlock(&adapter->tx_queue_lock);
> > + }
> > + adapter->clean_rx(adapter, adapter->rx_ring,
> > + &budget, netdev->weight);
> > + clear_bit(__LINK_STATE_RX_SCHED,
> > + &adapter->polling_netdev[0].state);
>
> -Jeff
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-07 18:54 ` John W. Linville
@ 2006-06-08 17:23 ` Mitch Williams
2006-06-08 18:39 ` John W. Linville
0 siblings, 1 reply; 35+ messages in thread
From: Mitch Williams @ 2006-06-08 17:23 UTC (permalink / raw)
To: John W. Linville
Cc: Kok, Auke-jan H, Matt Mackall, Garzik, Jeff, Neil Horman,
Jeff Moyer, netdev, Brandeburg, Jesse, Kok, Auke
On Wed, 2006-06-07 at 11:54 -0700, John W. Linville wrote:
> Pedantic objection, but I think this would read easier w/o the extra
> newline before disable_irq.
Heh. I prefer to have a newline between declarations and code. The
real problem is the position of the #ifdef -- that's what makes it
difficult to read. The other solution would be
{
struct e1000_adapter *adapter = netdev_priv(netdev);
#ifdef CONFIG_E1000_NAPI
int budget = 0;
#endif
disable_irq(adapter->pdev->irq);
#ifdef CONFIG_E1000_NAPI
< all that stuff >
#else
<rest of the stuff >
#endif
}
Which I think is worse to read.
-Mitch
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-08 17:19 ` Mitch Williams
@ 2006-06-08 17:29 ` Jeff Moyer
2006-06-12 0:13 ` Neil Horman
0 siblings, 1 reply; 35+ messages in thread
From: Jeff Moyer @ 2006-06-08 17:29 UTC (permalink / raw)
To: Mitch Williams
Cc: Kok, Auke-jan H, Matt Mackall, Garzik, Jeff, Neil Horman, netdev,
Brandeburg, Jesse, Kok, Auke
==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Mitch Williams <mitch.a.williams@intel.com> adds:
mitch.a.williams> On Wed, 2006-06-07 at 11:44 -0700, Jeff Moyer wrote:
>> That patch locks around the tx clean routine. As such, it doesn't
>> prevent the problem.
mitch.a.williams> The call to netif_rx_schedule_prep provides locking
mitch.a.williams> because it sets the __LINK_STATE_RX_SCHED bit atomically.
mitch.a.williams> The spinlock around e1000_clean_tx_irq is to protect it
mitch.a.williams> from other calls to the transmit routine, not NAPI.
Yes, but what prevents recursion in the poll routine? Consider that the
poll routine could end up triggerring a printk (think iptables, here). In
that case, you end up calling into netpoll, and if the tx ring is full, we
call the poll_controller routine. We've now recursed.
The poll lock was originally introduced to prevent recursion, not
concurrent access.
-Jeff
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-08 17:23 ` Mitch Williams
@ 2006-06-08 18:39 ` John W. Linville
0 siblings, 0 replies; 35+ messages in thread
From: John W. Linville @ 2006-06-08 18:39 UTC (permalink / raw)
To: Mitch Williams
Cc: Kok, Auke-jan H, Matt Mackall, Garzik, Jeff, Neil Horman,
Jeff Moyer, netdev, Brandeburg, Jesse, Kok, Auke
On Thu, Jun 08, 2006 at 10:23:56AM -0700, Mitch Williams wrote:
> On Wed, 2006-06-07 at 11:54 -0700, John W. Linville wrote:
>
> > Pedantic objection, but I think this would read easier w/o the extra
> > newline before disable_irq.
>
> Heh. I prefer to have a newline between declarations and code. The
Normally I would agree. But in this case, I find the distraction of
the random newline after the #else to be more compelling.
> real problem is the position of the #ifdef -- that's what makes it
> difficult to read. The other solution would be
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
> #ifdef CONFIG_E1000_NAPI
> int budget = 0;
> #endif
>
> disable_irq(adapter->pdev->irq);
>
> #ifdef CONFIG_E1000_NAPI
> < all that stuff >
> #else
> <rest of the stuff >
> #endif
> }
>
> Which I think is worse to read.
I presume it is the double #ifdef that you find objectionable?
I don't really like it, but at least that idiom is quite common.
Given that the disable_irq appears in both code paths (almost by
necessity), there is a certain appeal to having it outside of the
#ifdef block. That seems more maintainable.
To me, the idiomatic #ifdef placement seems more readable, if for no
other reason than familiarity. I suppose we can agree to disagree.
John
--
John W. Linville
linville@tuxdriver.com
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-08 17:29 ` Jeff Moyer
@ 2006-06-12 0:13 ` Neil Horman
2006-06-12 16:42 ` Mitch Williams
0 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2006-06-12 0:13 UTC (permalink / raw)
To: Jeff Moyer
Cc: Mitch Williams, Kok, Auke-jan H, Matt Mackall, Garzik, Jeff,
netdev, Brandeburg, Jesse, Kok, Auke
On Thu, Jun 08, 2006 at 01:29:00PM -0400, Jeff Moyer wrote:
> ==> Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Mitch Williams <mitch.a.williams@intel.com> adds:
>
> mitch.a.williams> On Wed, 2006-06-07 at 11:44 -0700, Jeff Moyer wrote:
> >> That patch locks around the tx clean routine. As such, it doesn't
> >> prevent the problem.
>
> mitch.a.williams> The call to netif_rx_schedule_prep provides locking
> mitch.a.williams> because it sets the __LINK_STATE_RX_SCHED bit atomically.
> mitch.a.williams> The spinlock around e1000_clean_tx_irq is to protect it
> mitch.a.williams> from other calls to the transmit routine, not NAPI.
>
> Yes, but what prevents recursion in the poll routine? Consider that the
> poll routine could end up triggerring a printk (think iptables, here). In
> that case, you end up calling into netpoll, and if the tx ring is full, we
> call the poll_controller routine. We've now recursed.
>
> The poll lock was originally introduced to prevent recursion, not
> concurrent access.
>
Any further thoughts on this guys? I still think my last solution solves all of
the netpoll problems, and isn't going to have any noticable impact on
performance.
Regards
Neil
> -Jeff
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-12 0:13 ` Neil Horman
@ 2006-06-12 16:42 ` Mitch Williams
2006-06-12 18:06 ` Neil Horman
0 siblings, 1 reply; 35+ messages in thread
From: Mitch Williams @ 2006-06-12 16:42 UTC (permalink / raw)
To: Neil Horman
Cc: Jeff Moyer, Kok, Auke-jan H, Matt Mackall, Garzik, Jeff, netdev,
Brandeburg, Jesse, Kok, Auke
On Sun, 2006-06-11 at 17:13 -0700, Neil Horman wrote:
> Any further thoughts on this guys? I still think my last solution
> solves all of
> the netpoll problems, and isn't going to have any noticable impact on
> performance.
>
I haven't had time to evaluate performance on your patch (sorry!), but
after thinking about it, I agree that it should not have any noticeable
impact. OTOH, performance tuning is a funny thing, and things you think
won't cause problems often do.
Anyway, I'm still not quite ready to ACK this because it's just not
future-proof. Eventually, we will need to support multiple RX queues,
and this solution will not work in that situation.
A simpler short-term solution is just to schedule our NAPI polling on
the "real" netdev instead of our polling netdev. This is a trivial
change and works correctly with a single queue. But, like your patch,
it isn't future-proof.
So, I'm still thinking and pondering on this one.
If we get a patch in to fix the recursive loop in netpoll, my original
patch will work, right? Or is there still another issue?
-Mitch
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-12 16:42 ` Mitch Williams
@ 2006-06-12 18:06 ` Neil Horman
2006-06-14 20:41 ` Neil Horman
0 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2006-06-12 18:06 UTC (permalink / raw)
To: Mitch Williams
Cc: Jeff Moyer, Kok, Auke-jan H, Matt Mackall, Garzik, Jeff, netdev,
Brandeburg, Jesse, Kok, Auke
On Mon, Jun 12, 2006 at 09:42:14AM -0700, Mitch Williams wrote:
> On Sun, 2006-06-11 at 17:13 -0700, Neil Horman wrote:
> > Any further thoughts on this guys? I still think my last solution
> > solves all of
> > the netpoll problems, and isn't going to have any noticable impact on
> > performance.
> >
> I haven't had time to evaluate performance on your patch (sorry!), but
> after thinking about it, I agree that it should not have any noticeable
> impact. OTOH, performance tuning is a funny thing, and things you think
> won't cause problems often do.
>
Thats ok, I just didn't hear out of anyone on friday, so I was curious as to
where we were on this. I don't have the ability to do any real world
performance testing here, but I'll try to record the run time of the interrupt
routine on a limited number of frames here.
> Anyway, I'm still not quite ready to ACK this because it's just not
> future-proof. Eventually, we will need to support multiple RX queues,
> and this solution will not work in that situation.
>
Not sure I understand this. My patch:
http://marc.theaimsgroup.com/?l=linux-netdev&m=114970506406155&w=2
still allows for the use of multiple rx
queues in the nominal case. Only when we have to use a relatively slow netpoll
driven operation (kgdb, netdump, etc), do we need to receive on the same
interface that we transmit on.
> A simpler short-term solution is just to schedule our NAPI polling on
> the "real" netdev instead of our polling netdev. This is a trivial
> change and works correctly with a single queue. But, like your patch,
> it isn't future-proof.
>
Again, not sure what you mean here. My last patch proposal:
http://marc.theaimsgroup.com/?l=linux-netdev&m=114970807606096&w=2
Does precisely what you describe, but it still allows for multiple rx queues in
the nominal (non poll_controller driven) receive case.
> So, I'm still thinking and pondering on this one.
>
> If we get a patch in to fix the recursive loop in netpoll, my original
> patch will work, right? Or is there still another issue?
>
I assume you are referring to this patch:
http://marc.theaimsgroup.com/?l=linux-netdev&m=114970506406155&w=2
If so, then no, that patch is still broken for the reasons Jeff outlined
previously, The recursion patch that I proposed earlier today is related to a
different recursion problem, and while the e1000 driver might be suceptable to
it, your patch is also suceptible to the data corruption that arises from when
the poll_controller calls adapter->clean_rx at the same that that dev->poll is
called for the same adapter on another cpu. If that happens we can have two
cpu's writing to the same private net_device data without the benefit of a
spinlock to protect them. And yes, you can add a spin lock to protect the case
where you have a dev->poll_controller and a dev->poll operation at the same
time, but that seems to me like it will also re-serialize all the parallel
operations that you could otherwise do with multiple rx queues
I'll post again if I get a chance to do some performance measurements
Regrds
Neil
> -Mitch
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-12 18:06 ` Neil Horman
@ 2006-06-14 20:41 ` Neil Horman
2006-06-14 23:44 ` Mitch Williams
0 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2006-06-14 20:41 UTC (permalink / raw)
To: Mitch Williams
Cc: Jeff Moyer, Kok, Auke-jan H, Matt Mackall, Garzik, Jeff, netdev,
Brandeburg, Jesse, Kok, Auke
On Mon, Jun 12, 2006 at 02:06:00PM -0400, Neil Horman wrote:
> On Mon, Jun 12, 2006 at 09:42:14AM -0700, Mitch Williams wrote:
> > On Sun, 2006-06-11 at 17:13 -0700, Neil Horman wrote:
> > > Any further thoughts on this guys? I still think my last solution
> > > solves all of
> > > the netpoll problems, and isn't going to have any noticable impact on
> > > performance.
> > >
> > I haven't had time to evaluate performance on your patch (sorry!), but
> > after thinking about it, I agree that it should not have any noticeable
> > impact. OTOH, performance tuning is a funny thing, and things you think
> > won't cause problems often do.
> >
> Thats ok, I just didn't hear out of anyone on friday, so I was curious as to
> where we were on this. I don't have the ability to do any real world
> performance testing here, but I'll try to record the run time of the interrupt
> routine on a limited number of frames here.
>
Hey, as promised, I've done some rudimentary performance benchmarking on various
ways that we have talked about to solve this problem. As I previously mentioned
I didn't have the equipment to do any real full scale testing here, so what I
did was take a read of the real time counter at the start and end of the
e1000_intr routine with various patches applied, and I recorded the number of
ticks elapsed on the tsc during its run. I did this on my single cpu x86_64
machine here, using the latest unpatched e1000 driver as a base, and then
comparing it to the e1000 driver using my patch and separately with a patch that
spinlocks the e1000_clean_rx_irq routine (so as to serialize the critical
section that would otherwise be subject to data corruption. Here are my
results:
Base line:
Avg. 8145 Ticks on the tsc.
With my patch: http://marc.theaimsgroup.com/?l=linux-netdev&m=114970807606096&w=2
Avg. 8159 Ticks on the tsc. (+0.17% increase)
With a spinlock added to e1000_clean_rx_irq:
Avg. 8238 Ticks on the tsc. (+1.1% increase)
If you like I can send you the time stamp counter patch that I used, as well as
the patch which adds spinlocks to the clean routine. Note that the free running
counter values will vary so you probably want to look at percentage increase.
Either way, I think either solution provides very little impact on interrupt run
time. Given that my patch (granted using my test methodology here) is the
faster of the two, and arguably the more correct in terms of not using the poll
controller method to recieve frames, We should go with that patch.
Thoughts/opinions?
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-14 20:41 ` Neil Horman
@ 2006-06-14 23:44 ` Mitch Williams
2006-06-15 12:44 ` John W. Linville
0 siblings, 1 reply; 35+ messages in thread
From: Mitch Williams @ 2006-06-14 23:44 UTC (permalink / raw)
To: Neil Horman
Cc: Williams, Mitch A, Jeff Moyer, Kok, Auke-jan H, Matt Mackall,
Garzik, Jeff, netdev, Brandeburg, Jesse, Kok, Auke
On Wed, 14 Jun 2006, Neil Horman wrote:
> Hey, as promised, I've done some rudimentary performance benchmarking on various
> ways that we have talked about to solve this problem. As I previously mentioned
We see the same results here, Neil. However, we've got a much less
invasive patch undergoing internal review, and which we will post to
netdev once everybody gets happy with it. Basically, we just do our NAPI
scheduling on the "real" netdev structure instead of our polling netdev,
in the case where we only have one RX queue. Since this is the case for
all our currently-shipping parts under Linux, netpoll works again across
the board. It's a short-term fix because we do want to support multiple
queues going forward, but for now we need to get everybody working.
One of our engineers (on the I/O AT team) has been tasked with modifying
the Linux kernel to properly support multiple hardware queues (both TX and
RX). We'll make sure that he looks at the netpoll interface as part of
that process.
Stay tuned for our impending patch.
-Mitch
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-14 23:44 ` Mitch Williams
@ 2006-06-15 12:44 ` John W. Linville
2006-06-15 20:45 ` Mitch Williams
0 siblings, 1 reply; 35+ messages in thread
From: John W. Linville @ 2006-06-15 12:44 UTC (permalink / raw)
To: Mitch Williams
Cc: Neil Horman, Jeff Moyer, Kok, Auke-jan H, Matt Mackall,
Garzik, Jeff, netdev, Brandeburg, Jesse, Kok, Auke
On Wed, Jun 14, 2006 at 04:44:56PM -0700, Mitch Williams wrote:
> One of our engineers (on the I/O AT team) has been tasked with modifying
> the Linux kernel to properly support multiple hardware queues (both TX and
> RX). We'll make sure that he looks at the netpoll interface as part of
> that process.
Might I ask who this is? I might like to ping him/her on this topic.
There is potentially some overlap with wireless, at least on the
transmit side.
John
--
John W. Linville
linville@tuxdriver.com
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-15 12:44 ` John W. Linville
@ 2006-06-15 20:45 ` Mitch Williams
2006-06-20 8:28 ` Andrew Grover
0 siblings, 1 reply; 35+ messages in thread
From: Mitch Williams @ 2006-06-15 20:45 UTC (permalink / raw)
To: John W. Linville
Cc: Williams, Mitch A, Neil Horman, Jeff Moyer, Kok, Auke-jan H,
Matt Mackall, Garzik, Jeff, netdev, Brandeburg, Jesse, Kok, Auke
Andy Grover <andrew.grover@intel.com> is the guy. I'm not sure when he'll
be working on this; it's somewhere in his TODO pile.
On Thu, 15 Jun 2006, John W. Linville wrote:
>
> On Wed, Jun 14, 2006 at 04:44:56PM -0700, Mitch Williams wrote:
>
> > One of our engineers (on the I/O AT team) has been tasked with modifying
> > the Linux kernel to properly support multiple hardware queues (both TX and
> > RX). We'll make sure that he looks at the netpoll interface as part of
> > that process.
>
> Might I ask who this is? I might like to ping him/her on this topic.
> There is potentially some overlap with wireless, at least on the
> transmit side.
>
> John
> --
> John W. Linville
> linville@tuxdriver.com
>
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 1/2] e1000: fix netpoll with NAPI
2006-06-15 20:45 ` Mitch Williams
@ 2006-06-20 8:28 ` Andrew Grover
0 siblings, 0 replies; 35+ messages in thread
From: Andrew Grover @ 2006-06-20 8:28 UTC (permalink / raw)
To: linville; +Cc: netdev
(trimmed CC to just netdev)
> > > One of our engineers (on the I/O AT team) has been tasked with modifying
> > > the Linux kernel to properly support multiple hardware queues (both TX and
> > > RX). We'll make sure that he looks at the netpoll interface as part of
> > > that process.
> >
> > Might I ask who this is? I might like to ping him/her on this topic.
> > There is potentially some overlap with wireless, at least on the
> > transmit side.
> > John W. Linville
Hi John, so yeah we want multiple TX queues on wired ethernet for QoS,
same as wireless. Also for SMP scaling (maybe not needed until 10G+).
Did wireless people settle on a design since the email thread in
January?
Regards -- Andy
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2006-06-20 8:28 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-05 23:09 [PATCH 0/2] e1000: fixes for netpoll+NAPI, ARM Kok, Auke
2006-06-05 23:11 ` [PATCH 1/2] e1000: fix netpoll with NAPI Kok, Auke
2006-06-06 13:52 ` Neil Horman
2006-06-06 16:39 ` Mitch Williams
2006-06-06 17:05 ` Neil Horman
2006-06-06 17:18 ` Auke Kok
2006-06-06 17:30 ` Jeff Moyer
2006-06-06 17:34 ` Auke Kok
2006-06-06 17:42 ` Jeff Moyer
2006-06-06 23:17 ` Matt Mackall
2006-06-07 15:05 ` Neil Horman
2006-06-07 16:48 ` Matt Mackall
2006-06-07 18:25 ` Auke Kok
2006-06-07 18:44 ` Jeff Moyer
2006-06-07 19:18 ` Neil Horman
2006-06-08 17:19 ` Mitch Williams
2006-06-08 17:29 ` Jeff Moyer
2006-06-12 0:13 ` Neil Horman
2006-06-12 16:42 ` Mitch Williams
2006-06-12 18:06 ` Neil Horman
2006-06-14 20:41 ` Neil Horman
2006-06-14 23:44 ` Mitch Williams
2006-06-15 12:44 ` John W. Linville
2006-06-15 20:45 ` Mitch Williams
2006-06-20 8:28 ` Andrew Grover
2006-06-07 18:54 ` John W. Linville
2006-06-08 17:23 ` Mitch Williams
2006-06-08 18:39 ` John W. Linville
2006-06-06 17:29 ` Jeff Moyer
2006-06-05 23:11 ` [PATCH 2/2] e1000: remove risky prefetch on next_skb->data Kok, Auke
2006-06-05 23:21 ` Rick Jones
2006-06-06 0:12 ` Brandeburg, Jesse
2006-06-06 0:16 ` Rick Jones
2006-06-06 0:22 ` Andi Kleen
2006-06-06 0:26 ` Brandeburg, Jesse
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).