Netdev List
 help / color / mirror / Atom feed
* [net-next 2/4] e1000: remove workaround for Errata 23 from jumbo alloc
From: Jeff Kirsher @ 2012-05-17 12:31 UTC (permalink / raw)
  To: davem; +Cc: Sebastian Andrzej Siewior, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1337257912-8487-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

According to the comment, errata 23 says that the memory we allocate
can't cross a 64KiB boundary. In case of jumbo frames we allocate
complete pages which can never cross the 64KiB boundary because
PAGE_SIZE should be a multiple of 64KiB so we stop either before the
boundary or start after it but never cross it. Furthermore the check
seems bogus because it looks at skb->data which is not seen by the HW
at all because we only pass the DMA address of the page we allocated. So
I *think* the workaround is not required here.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |   24 ------------------------
 1 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f1aef68..fefbf4d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4391,30 +4391,6 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 			break;
 		}
 
-		/* Fix for errata 23, can't cross 64kB boundary */
-		if (!e1000_check_64k_bound(adapter, skb->data, bufsz)) {
-			struct sk_buff *oldskb = skb;
-			e_err(rx_err, "skb align check failed: %u bytes at "
-			      "%p\n", bufsz, skb->data);
-			/* Try again, without freeing the previous */
-			skb = netdev_alloc_skb_ip_align(netdev, bufsz);
-			/* Failed allocation, critical failure */
-			if (!skb) {
-				dev_kfree_skb(oldskb);
-				adapter->alloc_rx_buff_failed++;
-				break;
-			}
-
-			if (!e1000_check_64k_bound(adapter, skb->data, bufsz)) {
-				/* give up */
-				dev_kfree_skb(skb);
-				dev_kfree_skb(oldskb);
-				break; /* while (cleaned_count--) */
-			}
-
-			/* Use new allocation */
-			dev_kfree_skb(oldskb);
-		}
 		buffer_info->skb = skb;
 		buffer_info->length = adapter->rx_buffer_len;
 check_page:
-- 
1.7.7.6

^ permalink raw reply related

* [net-next 1/4] e1000e: fix typo in definition of E1000_CTRL_EXT_FORCE_SMBUS
From: Jeff Kirsher @ 2012-05-17 12:31 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1337257912-8487-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

This define is needed by i217.

Reported-by: Bjorn Mork <bjorn@mork.no>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/defines.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
index 11c4666..351a409 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -76,7 +76,7 @@
 /* Extended Device Control */
 #define E1000_CTRL_EXT_LPCD  0x00000004     /* LCD Power Cycle Done */
 #define E1000_CTRL_EXT_SDP3_DATA 0x00000080 /* Value of SW Definable Pin 3 */
-#define E1000_CTRL_EXT_FORCE_SMBUS 0x00000004 /* Force SMBus mode*/
+#define E1000_CTRL_EXT_FORCE_SMBUS 0x00000800 /* Force SMBus mode */
 #define E1000_CTRL_EXT_EE_RST    0x00002000 /* Reinitialize from EEPROM */
 #define E1000_CTRL_EXT_SPD_BYPS  0x00008000 /* Speed Select Bypass */
 #define E1000_CTRL_EXT_RO_DIS    0x00020000 /* Relaxed Ordering disable */
-- 
1.7.7.6

^ permalink raw reply related

* [net-next v2 0/4][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-05-17 12:31 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series of patches contains updates for e1000, e1000e and igb.

v2: correct the patch for e1000 that looks for the last byte in a page to
    what was actually sent and tested.  What I originally sent in v1 of
    the series was a munge between v1 & v2 of what Sebastian sent me.

The following are changes since commit dc6b9b78234fecdc6d2ca5e1629185718202bcf5:
  net: include/net/sock.h cleanup
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Bruce Allan (1):
  e1000e: fix typo in definition of E1000_CTRL_EXT_FORCE_SMBUS

Matthew Vick (1):
  igb: Disable the BMC-to-OS Watchdog Enable bit for DMAC.

Sebastian Andrzej Siewior (2):
  e1000: remove workaround for Errata 23 from jumbo alloc
  e1000: look in the page and not in skb->data for the last byte

 drivers/net/ethernet/intel/e1000/e1000_main.c  |   30 ++++--------------------
 drivers/net/ethernet/intel/e1000e/defines.h    |    2 +-
 drivers/net/ethernet/intel/igb/e1000_defines.h |    2 +
 drivers/net/ethernet/intel/igb/igb_main.c      |    3 ++
 4 files changed, 11 insertions(+), 26 deletions(-)

-- 
1.7.7.6

^ permalink raw reply

* Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 12:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Hi Eric,

I'm facing a regression in stable 3.2.17 and 3.0.31 which is
exhibited by your patch 'tcp: allow splice() to build full TSO
packets' which unfortunately I am very interested in !

What I'm observing is that TCP transmits using splice() stall
quite quickly if I'm using pipes larger than 64kB (even 65537
is enough to reliably observe the stall).

I'm seeing this on haproxy running on a small ARM machine (a
dockstar), which exchanges data through a gig switch with my
development PC. The NIC (mv643xx) doesn't support TSO but has
GSO enabled. If I disable GSO, the problem remains. I can however
make the problem disappear by disabling SG or Tx checksumming.
BTW, using recv/send() instead of splice() also gets rid of the
problem.

I can also reduce the risk of seeing the problem by increasing
the default TCP buffer sizes in tcp_wmem. By default I'm running
at 16kB, but if I increase the output buffer size above the pipe
size, the problem *seems* to disappear though I can't be certain,
since larger buffers generally means the problem takes longer to
appear, probably due to the fact that the buffers don't need to
be filled. Still I'm certain that with 64k TCP buffers and 128k
pipes I'm still seeing it.

With strace, I'm seeing data fill up the pipe with the splice()
call responsible for pushing the data to the output socket returing
-1 EAGAIN. During this time, the client receives no data.

Something bugs me, I have tested with a dummy server of mine,
httpterm, which uses tee+splice() to push data outside, and it
has no problem filling the gig pipe, and correctly recoverers
from the EAGAIN :

  send(13, "HTTP/1.1 200\r\nConnection: close\r"..., 160, MSG_DONTWAIT|MSG_NOSIGNAL) = 160
  tee(0x3, 0x6, 0x10000, 0x2)             = 42552
  splice(0x5, 0, 0xd, 0, 0xa00000, 0x2)   = 14440
  tee(0x3, 0x6, 0x10000, 0x2)             = 13880
  splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2)   = -1 EAGAIN (Resource temporarily unavailable)
  ...
  tee(0x3, 0x6, 0x10000, 0x2)             = 13880
  splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2)   = 51100
  tee(0x3, 0x6, 0x10000, 0x2)             = 50744
  splice(0x5, 0, 0xd, 0, 0x9efffc, 0x2)   = 32120
  tee(0x3, 0x6, 0x10000, 0x2)             = 30264
  splice(0x5, 0, 0xd, 0, 0x9e8284, 0x2)   = -1 EAGAIN (Resource temporarily unavailable)

etc...

It's only with haproxy which uses splice() to copy data between
two sockets that I'm getting the issue (data forwarded from fd 0xe
to fd 0x6) :

  16:03:17.797144 pipe([36, 37])          = 0
  16:03:17.797318 fcntl64(36, 0x407 /* F_??? */, 0x20000) = 131072 ## note: fcntl(F_SETPIPE_SZ, 128k)
  16:03:17.797473 splice(0xe, 0, 0x25, 0, 0x9f2234, 0x3) = 10220
  16:03:17.797706 splice(0x24, 0, 0x6, 0, 0x27ec, 0x3) = 10220
  16:03:17.802036 gettimeofday({1324652597, 802093}, NULL) = 0
  16:03:17.802200 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 7
  16:03:17.802363 gettimeofday({1324652597, 802419}, NULL) = 0
  16:03:17.802530 splice(0xe, 0, 0x25, 0, 0x9efa48, 0x3) = 16060
  16:03:17.802789 splice(0x24, 0, 0x6, 0, 0x3ebc, 0x3) = 16060
  16:03:17.806593 gettimeofday({1324652597, 806651}, NULL) = 0
  16:03:17.806759 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 4
  16:03:17.806919 gettimeofday({1324652597, 806974}, NULL) = 0
  16:03:17.807087 splice(0xe, 0, 0x25, 0, 0x9ebb8c, 0x3) = 17520
  16:03:17.807356 splice(0x24, 0, 0x6, 0, 0x4470, 0x3) = 17520
  16:03:17.809565 gettimeofday({1324652597, 809620}, NULL) = 0
  16:03:17.809726 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
  16:03:17.809883 gettimeofday({1324652597, 809937}, NULL) = 0
  16:03:17.810047 splice(0xe, 0, 0x25, 0, 0x9e771c, 0x3) = 36500
  16:03:17.810399 splice(0x24, 0, 0x6, 0, 0x8e94, 0x3) = 23360
  16:03:17.810629 epoll_ctl(0x3, 0x1, 0x6, 0x85378) = 0       ## note: epoll_ctl(ADD, fd=6, dir=OUT).
  16:03:17.810792 gettimeofday({1324652597, 810848}, NULL) = 0
  16:03:17.810954 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
  16:03:17.811188 gettimeofday({1324652597, 811246}, NULL) = 0
  16:03:17.811356 splice(0xe, 0, 0x25, 0, 0x9de888, 0x3) = 21900
  16:03:17.811651 splice(0x24, 0, 0x6, 0, 0x88e0, 0x3) = -1 EAGAIN (Resource temporarily unavailable)

So output fd 6 hangs here and will not appear anymore until
here where I pressed Ctrl-C to stop the test :

  16:03:24.740985 gettimeofday({1324652604, 741042}, NULL) = 0
  16:03:24.741148 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 7
  16:03:24.951762 gettimeofday({1324652604, 951838}, NULL) = 0
  16:03:24.951956 splice(0x24, 0, 0x6, 0, 0x88e0, 0x3) = -1 EPIPE (Broken pipe)

I tried disabling LRO/GRO at the input interface (which happens to be
the same) to see if fragmentation of input data had any impact on this
but nothing chnages.

Please note that I'm not even certain the patch is the culprit, I'm
suspecting that by improving splice() efficiency, it might make a
latent issue become more visible. I have no data to back this
feeling, but nothing strikes me in your patch.

I don't know what I can do to troubleshoot this issue. I don't want
to pollute the list with network captures nor strace outputs, but I
have them if you're interested in verifying a few things.

I have another platform available for a test (Atom+82574L supporting
TSO). I'll rebuild and boot on this one to see if I observe the same
behaviour.

If you have any suggestion about things to check of tweaks to change
in the code, I'm quite open to experiment.

Best regards,
Willy

^ permalink raw reply

* Re: [net-next 3/4] e1000: look in the page and not in skb->data for the last byte
From: Jeff Kirsher @ 2012-05-17 12:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: davem, netdev, gospo, sassmann
In-Reply-To: <4FB4E782.3070502@linutronix.de>

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]

On Thu, 2012-05-17 at 13:56 +0200, Sebastian Andrzej Siewior wrote:
> On 05/17/2012 01:50 PM, Jeff Kirsher wrote:
> >
> > Your correct, I apologize.  This was my fault, I applied your v1 of the
> > patch and then realized there was a v2.
> >
> > I will re-send the series with the correct patch.
> 
> Okay. I haven't seen [0] in the series. Did you merge it somewhere?
> 
> [0] http://thread.gmane.org/gmane.linux.drivers.e1000.devel/10019
> 
> Sebastian

No, not yet.  Aaron is still validating that patch since it was actually
the last one you sent me.  I expect to be pushing it in the next day or
so with some ixgbe patches, once it finishes validation.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [net-next 3/4] e1000: look in the page and not in skb->data for the last byte
From: Sebastian Andrzej Siewior @ 2012-05-17 11:56 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: davem, netdev, gospo, sassmann
In-Reply-To: <1337255417.2714.49.camel@jtkirshe-mobl>

On 05/17/2012 01:50 PM, Jeff Kirsher wrote:
>
> Your correct, I apologize.  This was my fault, I applied your v1 of the
> patch and then realized there was a v2.
>
> I will re-send the series with the correct patch.

Okay. I haven't seen [0] in the series. Did you merge it somewhere?

[0] http://thread.gmane.org/gmane.linux.drivers.e1000.devel/10019

Sebastian

^ permalink raw reply

* Re: [PATCH 06/15] batman-adv: Distributed ARP Table - add snooping functions for ARP messages
From: Marek Lindner @ 2012-05-17 11:53 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, David Miller
In-Reply-To: <201205121626.38520.lindner_marek-LWAfsSFWpa4@public.gmane.org>


David,

> On Tuesday, May 01, 2012 08:59:04 David Miller wrote:
> > From: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
> > Date: Tue, 1 May 2012 00:22:30 +0200
> > 
> > > However this patch also contains a procedure which queries the neigh
> > > table in order to understand whether a given host is known or not.
> > > Would it be possible to do that in another way (Without manually
> > > touching the table)?
> > > 
> > > Instead, in the next patch (patch 06/15) batman-adv manually increase
> > > the neigh timeouts. Do you think we should avoid doing that as well?
> > > If we are allowed to do that, how can we perform the same operation in
> > > a cleaner way?
> > > 
> > > Last question: why can't other modules use exported functions? Are you
> > > going to change them as well?
> > 
> > I really don't have time to discuss your neigh issues right now as I'm
> > busy speaking at conferences and dealing with the backlog of other
> > patches.
> > 
> > You'll need to find someone else to discuss it with you, sorry.
> 
> I hope now is a good moment to bring the questions back onto the table. We
> still are not sure how to proceed because we have no clear picture of what
> is going to come and how the exported functions are supposed to be used.
> 
> David, if you don't have the time to discuss the ARP handling with us could
> you name someone who knows your plans and the code equally well ? So far,
> nobody has stepped up.

let me add another piece of information: The distributed ARP table does not 
really depend on the kernel's ARP table. We can easily write our own backend 
to be totally independent of the kernel's ARP table. Initially, we thought it 
might be considered a smart move if the code made use of existing kernel 
infrastructure instead of writing our own storage / user space API / etc, 
hence duplicating what is already there. But if you feel this is the better 
way forward we certainly will make the necessary changes.

Regards,
Marek

^ permalink raw reply

* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-05-17 11:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, sassmann
In-Reply-To: <1337254070-32500-1-git-send-email-jeffrey.t.kirsher@intel.com>

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

On Thu, 2012-05-17 at 04:27 -0700, Jeff Kirsher wrote:
> This series of patches contains updates for e1000, e1000e and igb.
> 
> The following are changes since commit dc6b9b78234fecdc6d2ca5e1629185718202bcf5:
>   net: include/net/sock.h cleanup
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
> 
> Bruce Allan (1):
>   e1000e: fix typo in definition of E1000_CTRL_EXT_FORCE_SMBUS
> 
> Matthew Vick (1):
>   igb: Disable the BMC-to-OS Watchdog Enable bit for DMAC.
> 
> Sebastian Andrzej Siewior (2):
>   e1000: remove workaround for Errata 23 from jumbo alloc
>   e1000: look in the page and not in skb->data for the last byte
> 
>  drivers/net/ethernet/intel/e1000/e1000_main.c  |   30 ++++--------------------
>  drivers/net/ethernet/intel/e1000e/defines.h    |    2 +-
>  drivers/net/ethernet/intel/igb/e1000_defines.h |    2 +
>  drivers/net/ethernet/intel/igb/igb_main.c      |    3 ++
>  4 files changed, 11 insertions(+), 26 deletions(-)
> 

v2 of the series will be coming.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [net-next 3/4] e1000: look in the page and not in skb->data for the last byte
From: Jeff Kirsher @ 2012-05-17 11:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: davem, netdev, gospo, sassmann
In-Reply-To: <4FB4E34F.2050004@linutronix.de>

[-- Attachment #1: Type: text/plain, Size: 1547 bytes --]

On Thu, 2012-05-17 at 13:38 +0200, Sebastian Andrzej Siewior wrote:
> On 05/17/2012 01:27 PM, Jeff Kirsher wrote:
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index fefbf4d..6ac80c8 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -4066,7 +4066,11 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
> >   		/* errors is only valid for DD + EOP descriptors */
> >   		if (unlikely((status&  E1000_RXD_STAT_EOP)&&
> >   		(rx_desc->errors&  E1000_RXD_ERR_FRAME_ERR_MASK))) {
> > -			u8 last_byte = *(skb->data + length - 1);
> > +			u8 *mapped;
> > +			u8 last_byte;
> > +
> > +			mapped = kmap_atomic(buffer_info->page);
> > +			last_byte = *(mapped + length - 1);
> >   			if (TBI_ACCEPT(hw, status, rx_desc->errors, length,
> >   				       last_byte)) {
> >   				spin_lock_irqsave(&adapter->stats_lock,
> 
> This is not what I've sent. My original patch [0] hat a unmap as well. 
> One comment was, that kmap_atomic() is too much overhead because the 
> page can never be highmem. So I changed it to page_address() [1].
> 
> [0] http://permalink.gmane.org/gmane.linux.drivers.e1000.devel/10008
> [1] http://permalink.gmane.org/gmane.linux.drivers.e1000.devel/10012
> 
> Sebastian

Your correct, I apologize.  This was my fault, I applied your v1 of the
patch and then realized there was a v2.

I will re-send the series with the correct patch.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 1/1] smsc95xx: add FLAG_POINTTOPOINT flag for driver_info
From: Ben Hutchings @ 2012-05-17 11:45 UTC (permalink / raw)
  To: Xiao Jiang; +Cc: steve.glendinning, gregkh, netdev, linux-usb, linux-kernel
In-Reply-To: <4FB4B98E.7000208@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

On Thu, 2012-05-17 at 16:40 +0800, Xiao Jiang wrote:
> Ben Hutchings wrote:
> > On Wed, 2012-05-16 at 16:01 +0800, jgq516@gmail.com wrote:
> >   
> >> From: Xiao Jiang <jgq516@gmail.com>
> >>
> >> commit c26134 introduced FLAG_POINTTOPOINT flag for USB ethernet devices
> >> which possibly use "usb%d" names, add this flag to make sure pandaboard
> >> can mount nfs with smsc95xx NIC.
> >>     
> >
> > These are normal Ethernet interfaces, whereas FLAG_POINTTOPOINT is for
> > devices that use non-standard short physical links.
> >
> >   
> This flag is used by some usb NICs, I amn't familiar with those cards 
> perhaps those are
> non-standard short physical links as you said.
> But smsc95xx seems need this flag to use "usb%d" name,

But this is a regular Ethernet interface and should be named
accordingly.

> at least my 
> pandaboard can't
> mount nfs with eth0 name, is there other ways to avoid nfs issue with 
> keep smsc95xx's
> name unchange? thanks.
[...]

I don't know what this NFS issue is, but I don't see how this can be the
correct solution.

Ben.

-- 
Ben Hutchings
Every program is either trivial or else contains at least one bug

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [net-next 3/4] e1000: look in the page and not in skb->data for the last byte
From: Sebastian Andrzej Siewior @ 2012-05-17 11:38 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, netdev, gospo, sassmann
In-Reply-To: <1337254070-32500-4-git-send-email-jeffrey.t.kirsher@intel.com>

On 05/17/2012 01:27 PM, Jeff Kirsher wrote:
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index fefbf4d..6ac80c8 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -4066,7 +4066,11 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
>   		/* errors is only valid for DD + EOP descriptors */
>   		if (unlikely((status&  E1000_RXD_STAT_EOP)&&
>   		(rx_desc->errors&  E1000_RXD_ERR_FRAME_ERR_MASK))) {
> -			u8 last_byte = *(skb->data + length - 1);
> +			u8 *mapped;
> +			u8 last_byte;
> +
> +			mapped = kmap_atomic(buffer_info->page);
> +			last_byte = *(mapped + length - 1);
>   			if (TBI_ACCEPT(hw, status, rx_desc->errors, length,
>   				       last_byte)) {
>   				spin_lock_irqsave(&adapter->stats_lock,

This is not what I've sent. My original patch [0] hat a unmap as well. 
One comment was, that kmap_atomic() is too much overhead because the 
page can never be highmem. So I changed it to page_address() [1].

[0] http://permalink.gmane.org/gmane.linux.drivers.e1000.devel/10008
[1] http://permalink.gmane.org/gmane.linux.drivers.e1000.devel/10012

Sebastian

^ permalink raw reply

* [net-next 2/4] e1000: remove workaround for Errata 23 from jumbo alloc
From: Jeff Kirsher @ 2012-05-17 11:27 UTC (permalink / raw)
  To: davem; +Cc: Sebastian Andrzej Siewior, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1337254070-32500-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

According to the comment, errata 23 says that the memory we allocate
can't cross a 64KiB boundary. In case of jumbo frames we allocate
complete pages which can never cross the 64KiB boundary because
PAGE_SIZE should be a multiple of 64KiB so we stop either before the
boundary or start after it but never cross it. Furthermore the check
seems bogus because it looks at skb->data which is not seen by the HW
at all because we only pass the DMA address of the page we allocated. So
I *think* the workaround is not required here.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |   24 ------------------------
 1 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f1aef68..fefbf4d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4391,30 +4391,6 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 			break;
 		}
 
-		/* Fix for errata 23, can't cross 64kB boundary */
-		if (!e1000_check_64k_bound(adapter, skb->data, bufsz)) {
-			struct sk_buff *oldskb = skb;
-			e_err(rx_err, "skb align check failed: %u bytes at "
-			      "%p\n", bufsz, skb->data);
-			/* Try again, without freeing the previous */
-			skb = netdev_alloc_skb_ip_align(netdev, bufsz);
-			/* Failed allocation, critical failure */
-			if (!skb) {
-				dev_kfree_skb(oldskb);
-				adapter->alloc_rx_buff_failed++;
-				break;
-			}
-
-			if (!e1000_check_64k_bound(adapter, skb->data, bufsz)) {
-				/* give up */
-				dev_kfree_skb(skb);
-				dev_kfree_skb(oldskb);
-				break; /* while (cleaned_count--) */
-			}
-
-			/* Use new allocation */
-			dev_kfree_skb(oldskb);
-		}
 		buffer_info->skb = skb;
 		buffer_info->length = adapter->rx_buffer_len;
 check_page:
-- 
1.7.7.6

^ permalink raw reply related

* [net-next 3/4] e1000: look in the page and not in skb->data for the last byte
From: Jeff Kirsher @ 2012-05-17 11:27 UTC (permalink / raw)
  To: davem; +Cc: Sebastian Andrzej Siewior, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1337254070-32500-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The code seems to want to look at the last byte where the HW puts some
information. Since the skb->data area is never seen by the HW I guess it
does not work as expected. We pass the page address to the HW so I
*think* in order to get to the last byte where the information might be
one should use the page buffer and take a look.
This is of course not more than just compile tested.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index fefbf4d..6ac80c8 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4066,7 +4066,11 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
 		/* errors is only valid for DD + EOP descriptors */
 		if (unlikely((status & E1000_RXD_STAT_EOP) &&
 		    (rx_desc->errors & E1000_RXD_ERR_FRAME_ERR_MASK))) {
-			u8 last_byte = *(skb->data + length - 1);
+			u8 *mapped;
+			u8 last_byte;
+
+			mapped = kmap_atomic(buffer_info->page);
+			last_byte = *(mapped + length - 1);
 			if (TBI_ACCEPT(hw, status, rx_desc->errors, length,
 				       last_byte)) {
 				spin_lock_irqsave(&adapter->stats_lock,
-- 
1.7.7.6

^ permalink raw reply related

* [net-next 4/4] igb: Disable the BMC-to-OS Watchdog Enable bit for DMAC.
From: Jeff Kirsher @ 2012-05-17 11:27 UTC (permalink / raw)
  To: davem; +Cc: Matthew Vick, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1337254070-32500-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Matthew Vick <matthew.vick@intel.com>

Under certain scenarios, it's possible that bursty manageability traffic
over the BMC-to-OS path may overrun the internal manageability receive
buffer causing dropped manageability packets. Clearing this bit prevents
this situation by interrupting coalescing to allow manageability traffic
through.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |    2 ++
 drivers/net/ethernet/intel/igb/igb_main.c      |    3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 6409f85..ec7e4fe 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -301,6 +301,8 @@
 							* transactions */
 #define E1000_DMACR_DMAC_LX_SHIFT       28
 #define E1000_DMACR_DMAC_EN             0x80000000 /* Enable DMA Coalescing */
+/* DMA Coalescing BMC-to-OS Watchdog Enable */
+#define E1000_DMACR_DC_BMC2OSW_EN	0x00008000
 
 #define E1000_DMCTXTH_DMCTTHR_MASK      0x00000FFF /* DMA Coalescing Transmit
 							* Threshold */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9bbf1a2..dd3bfe8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7147,6 +7147,9 @@ static void igb_init_dmac(struct igb_adapter *adapter, u32 pba)
 
 			/* watchdog timer= +-1000 usec in 32usec intervals */
 			reg |= (1000 >> 5);
+
+			/* Disable BMC-to-OS Watchdog Enable */
+			reg &= ~E1000_DMACR_DC_BMC2OSW_EN;
 			wr32(E1000_DMACR, reg);
 
 			/*
-- 
1.7.7.6

^ permalink raw reply related

* [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-05-17 11:27 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series of patches contains updates for e1000, e1000e and igb.

The following are changes since commit dc6b9b78234fecdc6d2ca5e1629185718202bcf5:
  net: include/net/sock.h cleanup
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Bruce Allan (1):
  e1000e: fix typo in definition of E1000_CTRL_EXT_FORCE_SMBUS

Matthew Vick (1):
  igb: Disable the BMC-to-OS Watchdog Enable bit for DMAC.

Sebastian Andrzej Siewior (2):
  e1000: remove workaround for Errata 23 from jumbo alloc
  e1000: look in the page and not in skb->data for the last byte

 drivers/net/ethernet/intel/e1000/e1000_main.c  |   30 ++++--------------------
 drivers/net/ethernet/intel/e1000e/defines.h    |    2 +-
 drivers/net/ethernet/intel/igb/e1000_defines.h |    2 +
 drivers/net/ethernet/intel/igb/igb_main.c      |    3 ++
 4 files changed, 11 insertions(+), 26 deletions(-)

-- 
1.7.7.6

^ permalink raw reply

* [net-next 1/4] e1000e: fix typo in definition of E1000_CTRL_EXT_FORCE_SMBUS
From: Jeff Kirsher @ 2012-05-17 11:27 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1337254070-32500-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

This define is needed by i217.

Reported-by: Bjorn Mork <bjorn@mork.no>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/defines.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
index 11c4666..351a409 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -76,7 +76,7 @@
 /* Extended Device Control */
 #define E1000_CTRL_EXT_LPCD  0x00000004     /* LCD Power Cycle Done */
 #define E1000_CTRL_EXT_SDP3_DATA 0x00000080 /* Value of SW Definable Pin 3 */
-#define E1000_CTRL_EXT_FORCE_SMBUS 0x00000004 /* Force SMBus mode*/
+#define E1000_CTRL_EXT_FORCE_SMBUS 0x00000800 /* Force SMBus mode */
 #define E1000_CTRL_EXT_EE_RST    0x00002000 /* Reinitialize from EEPROM */
 #define E1000_CTRL_EXT_SPD_BYPS  0x00008000 /* Speed Select Bypass */
 #define E1000_CTRL_EXT_RO_DIS    0x00020000 /* Relaxed Ordering disable */
-- 
1.7.7.6

^ permalink raw reply related

* [net] e1000: Prevent reset task killing itself.
From: Jeff Kirsher @ 2012-05-17 11:04 UTC (permalink / raw)
  To: davem; +Cc: Tushar Dave, netdev, gospo, sassmann, stable, Jeff Kirsher

From: Tushar Dave <tushar.n.dave@intel.com>

Killing reset task while adapter is resetting causes deadlock.
Only kill reset task if adapter is not resetting.
Ref bug #43132 on bugzilla.kernel.org

CC: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

@stable - this patch is applicable back to 3.1 kernels

---
 drivers/net/ethernet/intel/e1000/e1000_main.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 37caa88..8d8908d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -493,7 +493,11 @@ out:
 static void e1000_down_and_stop(struct e1000_adapter *adapter)
 {
 	set_bit(__E1000_DOWN, &adapter->flags);
-	cancel_work_sync(&adapter->reset_task);
+
+	/* Only kill reset task if adapter is not resetting */
+	if (!test_bit(__E1000_RESETTING, &adapter->flags))
+		cancel_work_sync(&adapter->reset_task);
+
 	cancel_delayed_work_sync(&adapter->watchdog_task);
 	cancel_delayed_work_sync(&adapter->phy_info_task);
 	cancel_delayed_work_sync(&adapter->fifo_stall_task);
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH net-next] net/mlx4_en: num cores tx rings for every UP
From: Amir Vadai @ 2012-05-17 10:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Oren Duer, Amir Vadai, John Fastabend, Liran Liss

Change the TX ring scheme such that the number of rings for untagged packets
and for tagged packets (per each of the vlan priorities) is the same, unlike
the current situation where for tagged traffic there's one ring per priority
and for untagged rings as the number of core.

Queue selection is done as follows:

If the mqprio qdisc is operates on the interface, such that the core networking
code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
queue set is forced - for both, tagged and untagged traffic.

Else, the egress map skb->priority =>  User priority is used for tagged traffic, and
all untagged traffic is sent through tx rings of UP 0.

The patch follows the convergence of discussing that issue with John Fastabend
over this thread http://comments.gmane.org/gmane.linux.network/229877

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c   |    6 ++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   41 +++++++++++++++++------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c     |   15 +++++---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    9 ++---
 4 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 346fdb2..988b242 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -101,6 +101,8 @@ static int mlx4_en_get_profile(struct mlx4_en_dev *mdev)
 	int i;
 
 	params->udp_rss = udp_rss;
+	params->num_tx_rings_p_up = min_t(int, num_online_cpus(),
+			MLX4_EN_MAX_TX_RING_P_UP);
 	if (params->udp_rss && !(mdev->dev->caps.flags
 					& MLX4_DEV_CAP_FLAG_UDP_RSS)) {
 		mlx4_warn(mdev, "UDP RSS is not supported on this device.\n");
@@ -113,8 +115,8 @@ static int mlx4_en_get_profile(struct mlx4_en_dev *mdev)
 		params->prof[i].tx_ppp = pfctx;
 		params->prof[i].tx_ring_size = MLX4_EN_DEF_TX_RING_SIZE;
 		params->prof[i].rx_ring_size = MLX4_EN_DEF_RX_RING_SIZE;
-		params->prof[i].tx_ring_num = MLX4_EN_NUM_TX_RINGS +
-			MLX4_EN_NUM_PPP_RINGS;
+		params->prof[i].tx_ring_num = params->num_tx_rings_p_up *
+			MLX4_EN_NUM_UP;
 		params->prof[i].rss_rings = 0;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index eaa8fad..926d8aa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -47,9 +47,22 @@
 
 static int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 {
-	if (up != MLX4_EN_NUM_UP)
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	int i;
+	unsigned int q, offset = 0;
+
+	if (up && up != MLX4_EN_NUM_UP)
 		return -EINVAL;
 
+	netdev_set_num_tc(dev, up);
+
+	/* Partition Tx queues evenly amongst UP's */
+	q = priv->tx_ring_num / up;
+	for (i = 0; i < up; i++) {
+		netdev_set_tc_queue(dev, i, q, offset);
+		offset += q;
+	}
+
 	return 0;
 }
 
@@ -661,7 +674,7 @@ int mlx4_en_start_port(struct net_device *dev)
 		/* Configure ring */
 		tx_ring = &priv->tx_ring[i];
 		err = mlx4_en_activate_tx_ring(priv, tx_ring, cq->mcq.cqn,
-				max(0, i - MLX4_EN_NUM_TX_RINGS));
+			i / priv->mdev->profile.num_tx_rings_p_up);
 		if (err) {
 			en_err(priv, "Failed allocating Tx ring\n");
 			mlx4_en_deactivate_cq(priv, cq);
@@ -986,6 +999,9 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 
 	mlx4_en_free_resources(priv);
 
+	kfree(priv->tx_ring);
+	kfree(priv->tx_cq);
+
 	free_netdev(dev);
 }
 
@@ -1091,6 +1107,18 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	priv->ctrl_flags = cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE |
 			MLX4_WQE_CTRL_SOLICITED);
 	priv->tx_ring_num = prof->tx_ring_num;
+	priv->tx_ring = kzalloc(sizeof(struct mlx4_en_tx_ring) *
+			priv->tx_ring_num, GFP_KERNEL);
+	if (!priv->tx_ring) {
+		err = -ENOMEM;
+		goto out;
+	}
+	priv->tx_cq = kzalloc(sizeof(struct mlx4_en_cq) * priv->tx_ring_num,
+			GFP_KERNEL);
+	if (!priv->tx_cq) {
+		err = -ENOMEM;
+		goto out;
+	}
 	priv->rx_ring_num = prof->rx_ring_num;
 	priv->mac_index = -1;
 	priv->msg_enable = MLX4_EN_MSG_LEVEL;
@@ -1138,15 +1166,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	netif_set_real_num_tx_queues(dev, priv->tx_ring_num);
 	netif_set_real_num_rx_queues(dev, priv->rx_ring_num);
 
-	netdev_set_num_tc(dev, MLX4_EN_NUM_UP);
-
-	/* First 9 rings are for UP 0 */
-	netdev_set_tc_queue(dev, 0, MLX4_EN_NUM_TX_RINGS + 1, 0);
-
-	/* Partition Tx queues evenly amongst UP's 1-7 */
-	for (i = 1; i < MLX4_EN_NUM_UP; i++)
-		netdev_set_tc_queue(dev, i, 1, MLX4_EN_NUM_TX_RINGS + i);
-
 	SET_ETHTOOL_OPS(dev, &mlx4_en_ethtool_ops);
 
 	/* Set defualt MAC */
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 9a38483..019d856 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -525,14 +525,17 @@ static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc, struct sk_buff *sk
 
 u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
-	u16 vlan_tag = 0;
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	u16 rings_p_up = priv->mdev->profile.num_tx_rings_p_up;
+	u8 up = 0;
 
-	if (vlan_tx_tag_present(skb)) {
-		vlan_tag = vlan_tx_tag_get(skb);
-		return MLX4_EN_NUM_TX_RINGS + (vlan_tag >> 13);
-	}
+	if (dev->num_tc)
+		return skb_tx_hash(dev, skb);
+
+	if (vlan_tx_tag_present(skb))
+		up = vlan_tx_tag_get(skb) >> VLAN_PRIO_SHIFT;
 
-	return skb_tx_hash(dev, skb);
+	return __skb_tx_hash(dev, skb, rings_p_up) + up * rings_p_up;
 }
 
 static void mlx4_bf_copy(void __iomem *dst, unsigned long *src, unsigned bytecnt)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 5d87637..6ae3509 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -111,9 +111,7 @@ enum {
 #define MLX4_EN_MIN_TX_SIZE	(4096 / TXBB_SIZE)
 
 #define MLX4_EN_SMALL_PKT_SIZE		64
-#define MLX4_EN_NUM_TX_RINGS		8
-#define MLX4_EN_NUM_PPP_RINGS		8
-#define MAX_TX_RINGS			(MLX4_EN_NUM_TX_RINGS + MLX4_EN_NUM_PPP_RINGS)
+#define MLX4_EN_MAX_TX_RING_P_UP	32
 #define MLX4_EN_NUM_UP			8
 #define MLX4_EN_DEF_TX_RING_SIZE	512
 #define MLX4_EN_DEF_RX_RING_SIZE  	1024
@@ -339,6 +337,7 @@ struct mlx4_en_profile {
 	u32 active_ports;
 	u32 small_pkt_int;
 	u8 no_reset;
+	u8 num_tx_rings_p_up;
 	struct mlx4_en_port_profile prof[MLX4_MAX_PORTS + 1];
 };
 
@@ -477,9 +476,9 @@ struct mlx4_en_priv {
 	u16 num_frags;
 	u16 log_rx_info;
 
-	struct mlx4_en_tx_ring tx_ring[MAX_TX_RINGS];
+	struct mlx4_en_tx_ring *tx_ring;
 	struct mlx4_en_rx_ring rx_ring[MAX_RX_RINGS];
-	struct mlx4_en_cq tx_cq[MAX_TX_RINGS];
+	struct mlx4_en_cq *tx_cq;
 	struct mlx4_en_cq rx_cq[MAX_RX_RINGS];
 	struct work_struct mcast_task;
 	struct work_struct mac_task;
-- 
1.7.8.2

^ permalink raw reply related

* Re: [PATCH v4 6/6] net: sh_eth: use NAPI
From: Francois Romieu @ 2012-05-17 10:33 UTC (permalink / raw)
  To: Shimoda, Yoshihiro; +Cc: netdev, SH-Linux
In-Reply-To: <4FB32D17.30404@renesas.com>

Shimoda, Yoshihiro <yoshihiro.shimoda.uh@renesas.com> :
[...]
> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> index c64a31c..edc7dfe 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
[...]
> +static int sh_eth_poll(struct napi_struct *napi, int budget)
> +{
> +	struct sh_eth_private *mdp = container_of(napi, struct sh_eth_private,
> +						  napi);
> +	struct net_device *ndev = mdp->ndev;
> +	struct sh_eth_cpu_data *cd = mdp->cd;
> +	int work_done = 0, txfree_num;
> +	u32 intr_status = sh_eth_read(ndev, EESR);
> +
> +	/* Clear interrupt flags */
> +	sh_eth_write(ndev, intr_status, EESR);
> +
> +	/* check txdesc */
> +	txfree_num = sh_eth_txfree(ndev);

[...]
> @@ -1678,19 +1710,15 @@ static int sh_eth_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>  	struct sh_eth_private *mdp = netdev_priv(ndev);
>  	struct sh_eth_txdesc *txdesc;
>  	u32 entry;
> -	unsigned long flags;
> 
> -	spin_lock_irqsave(&mdp->lock, flags);
>  	if ((mdp->cur_tx - mdp->dirty_tx) >= (mdp->num_tx_ring - 4)) {
>  		if (!sh_eth_txfree(ndev)) {

There are now two racing sh_eth_txfree and there is no [PATCH v4 7/6].

If I may suggest a slightly different approach, I would apply the patch
below before anything NAPI related:

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index d63e09b..6d77462 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1495,18 +1495,6 @@ static int sh_eth_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	u32 entry;
 	unsigned long flags;
 
-	spin_lock_irqsave(&mdp->lock, flags);
-	if ((mdp->cur_tx - mdp->dirty_tx) >= (TX_RING_SIZE - 4)) {
-		if (!sh_eth_txfree(ndev)) {
-			if (netif_msg_tx_queued(mdp))
-				dev_warn(&ndev->dev, "TxFD exhausted.\n");
-			netif_stop_queue(ndev);
-			spin_unlock_irqrestore(&mdp->lock, flags);
-			return NETDEV_TX_BUSY;
-		}
-	}
-	spin_unlock_irqrestore(&mdp->lock, flags);
-
 	entry = mdp->cur_tx % TX_RING_SIZE;
 	mdp->tx_skbuff[entry] = skb;
 	txdesc = &mdp->tx_ring[entry];
@@ -1531,6 +1519,15 @@ static int sh_eth_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	if (!(sh_eth_read(ndev, EDTRR) & sh_eth_get_edtrr_trns(mdp)))
 		sh_eth_write(ndev, sh_eth_get_edtrr_trns(mdp), EDTRR);
 
+	spin_lock_irqsave(&mdp->lock, flags);
+	if ((mdp->cur_tx - mdp->dirty_tx) >= (TX_RING_SIZE - 4)) {
+		if (netif_msg_tx_queued(mdp)) {
+			dev_warn(&ndev->dev, "TxFD exhausted.\n");
+			netif_stop_queue(ndev);
+		}
+	}
+	spin_unlock_irqrestore(&mdp->lock, flags);
+
 	return NETDEV_TX_OK;
 }
 

Rationale: the driver does not need to return NETDEV_TX_BUSY when it
should signal that it will not handle more packets after the current
one. You may add an extra assertion at the start of sh_eth_start_xmit()
and return NETDEV_TX_BUSY but it should be understood as a debug / bug
helper only.

Then you can convert to a {start/stop} queue race free NAPI with adequate
barriers.

-- 
Ueimor

^ permalink raw reply related

* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: KAMEZAWA Hiroyuki @ 2012-05-17 10:27 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Andrew Morton, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, devel-GEFAQzZX7r8dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Li Zefan,
	Johannes Weiner, Michal Hocko
In-Reply-To: <4FB4D14D.4020303-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

(2012/05/17 19:22), Glauber Costa wrote:

> On 05/17/2012 02:18 PM, KAMEZAWA Hiroyuki wrote:
>> (2012/05/17 18:52), Glauber Costa wrote:
>>
>>> On 05/17/2012 09:37 AM, Andrew Morton wrote:
>>>>>>   If that happens, locking in static_key_slow_inc will prevent any damage.
>>>>>>   My previous version had explicit code to prevent that, but we were
>>>>>>   pointed out that this is already part of the static_key expectations, so
>>>>>>   that was dropped.
>>>> This makes no sense.  If two threads run that code concurrently,
>>>> key->enabled gets incremented twice.  Nobody anywhere has a record that
>>>> this happened so it cannot be undone.  key->enabled is now in an
>>>> unknown state.
>>>
>>> Kame, Tejun,
>>>
>>> Andrew is right. It seems we will need that mutex after all. Just this
>>> is not a race, and neither something that should belong in the
>>> static_branch interface.
>>>
>>
>>
>> Hmm....how about having
>>
>> res_counter_xchg_limit(res,&old_limit, new_limit);
>>
>> if (!cg_proto->updated&&  old_limit == RESOURCE_MAX)
>> 	....update labels...
>>
>> Then, no mutex overhead maybe and activated will be updated only once.
>> Ah, but please fix in a way you like. Above is an example.
> 
> I think a mutex is a lot cleaner than adding a new function to the 
> res_counter interface.
> 
> We could do a counter, and then later decrement the key until the 
> counter reaches zero, but between those two, I still think a mutex here 
> is preferable.
> 
> Only that, instead of coming up with a mutex of ours, we could export 
> and reuse set_limit_mutex from memcontrol.c
> 


ok, please.

thx,
-Kame

> 
>> Thanks,
>> -Kame
>> (*) I'm sorry I won't be able to read e-mails, tomorrow.
>>
> Ok Kame. I am not in a terrible hurry to fix this, it doesn't seem to be 
> hurting any real workload.
> 
> 

^ permalink raw reply

* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: Glauber Costa @ 2012-05-17 10:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, devel-GEFAQzZX7r8dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Li Zefan,
	Johannes Weiner, Michal Hocko
In-Reply-To: <4FB4D061.10406-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

On 05/17/2012 02:18 PM, KAMEZAWA Hiroyuki wrote:
> (2012/05/17 18:52), Glauber Costa wrote:
>
>> On 05/17/2012 09:37 AM, Andrew Morton wrote:
>>>>>   If that happens, locking in static_key_slow_inc will prevent any damage.
>>>>>   My previous version had explicit code to prevent that, but we were
>>>>>   pointed out that this is already part of the static_key expectations, so
>>>>>   that was dropped.
>>> This makes no sense.  If two threads run that code concurrently,
>>> key->enabled gets incremented twice.  Nobody anywhere has a record that
>>> this happened so it cannot be undone.  key->enabled is now in an
>>> unknown state.
>>
>> Kame, Tejun,
>>
>> Andrew is right. It seems we will need that mutex after all. Just this
>> is not a race, and neither something that should belong in the
>> static_branch interface.
>>
>
>
> Hmm....how about having
>
> res_counter_xchg_limit(res,&old_limit, new_limit);
>
> if (!cg_proto->updated&&  old_limit == RESOURCE_MAX)
> 	....update labels...
>
> Then, no mutex overhead maybe and activated will be updated only once.
> Ah, but please fix in a way you like. Above is an example.

I think a mutex is a lot cleaner than adding a new function to the 
res_counter interface.

We could do a counter, and then later decrement the key until the 
counter reaches zero, but between those two, I still think a mutex here 
is preferable.

Only that, instead of coming up with a mutex of ours, we could export 
and reuse set_limit_mutex from memcontrol.c


> Thanks,
> -Kame
> (*) I'm sorry I won't be able to read e-mails, tomorrow.
>
Ok Kame. I am not in a terrible hurry to fix this, it doesn't seem to be 
hurting any real workload.

^ permalink raw reply

* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: KAMEZAWA Hiroyuki @ 2012-05-17 10:18 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Andrew Morton, cgroups, linux-mm, devel, netdev, Tejun Heo,
	Li Zefan, Johannes Weiner, Michal Hocko
In-Reply-To: <4FB4CA4D.50608@parallels.com>

(2012/05/17 18:52), Glauber Costa wrote:

> On 05/17/2012 09:37 AM, Andrew Morton wrote:
>>>>  If that happens, locking in static_key_slow_inc will prevent any damage.
>>>>  My previous version had explicit code to prevent that, but we were
>>>>  pointed out that this is already part of the static_key expectations, so
>>>>  that was dropped.
>> This makes no sense.  If two threads run that code concurrently,
>> key->enabled gets incremented twice.  Nobody anywhere has a record that
>> this happened so it cannot be undone.  key->enabled is now in an
>> unknown state.
> 
> Kame, Tejun,
> 
> Andrew is right. It seems we will need that mutex after all. Just this 
> is not a race, and neither something that should belong in the 
> static_branch interface.
> 


Hmm....how about having

res_counter_xchg_limit(res, &old_limit, new_limit);

if (!cg_proto->updated && old_limit == RESOURCE_MAX)
	....update labels...

Then, no mutex overhead maybe and activated will be updated only once.
Ah, but please fix in a way you like. Above is an example.

Thanks,
-Kame
(*) I'm sorry I won't be able to read e-mails, tomorrow.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v5 2/2] decrement static keys on real destroy time
From: Glauber Costa @ 2012-05-17  9:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	devel-GEFAQzZX7r8dnm+yROfE0A,
	kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A,
	netdev-u79uwXL29TY76Z2rM5mHXA, Tejun Heo, Li Zefan,
	Johannes Weiner, Michal Hocko
In-Reply-To: <20120516223715.5d1b4385.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>

On 05/17/2012 09:37 AM, Andrew Morton wrote:
>> >  If that happens, locking in static_key_slow_inc will prevent any damage.
>> >  My previous version had explicit code to prevent that, but we were
>> >  pointed out that this is already part of the static_key expectations, so
>> >  that was dropped.
> This makes no sense.  If two threads run that code concurrently,
> key->enabled gets incremented twice.  Nobody anywhere has a record that
> this happened so it cannot be undone.  key->enabled is now in an
> unknown state.

Kame, Tejun,

Andrew is right. It seems we will need that mutex after all. Just this 
is not a race, and neither something that should belong in the 
static_branch interface.

We want to make sure that enabled is not updated before the jump label 
update, because we need a specific ordering guarantee at the patched 
sites. And *that*, the interface guarantees, and we were wrong to 
believe it did not. That is a correction issue for the accounting, and 
that part is right.

But when we disarm it, we'll need to make sure that happened only once, 
otherwise we may never unpatch it. That, or we'd need that to be a 
counter. The jump label interface does not - and should not - keep track 
of how many updates happened to a key. That's the role of whoever is 
using it.

If you agree with the above, I'll send this patch again with the correction.

Andrew, thank you very much. Do you spot anything else here?

^ permalink raw reply

* Re: [PATCH 1/1] smsc95xx: add FLAG_POINTTOPOINT flag for driver_info
From: Xiao Jiang @ 2012-05-17  9:51 UTC (permalink / raw)
  To: Ming Lei; +Cc: steve.glendinning, gregkh, netdev, linux-usb, linux-kernel
In-Reply-To: <CACVXFVPLf9+8qQKgkikexq3ao=b9fM4jOCasMWVJrbZEVSj_Tg@mail.gmail.com>

Ming Lei wrote:
> On Thu, May 17, 2012 at 10:23 AM, Xiao Jiang <jgq516@gmail.com> wrote:
>   
>> Ming Lei wrote:
>>     
>>> On Wed, May 16, 2012 at 4:01 PM,  <jgq516@gmail.com> wrote:
>>>
>>>       
>>>> From: Xiao Jiang <jgq516@gmail.com>
>>>>
>>>> commit c26134 introduced FLAG_POINTTOPOINT flag for USB ethernet devices
>>>> which possibly use "usb%d" names, add this flag to make sure pandaboard
>>>> can mount nfs with smsc95xx NIC.
>>>>
>>>>         
>>> Without the flag, I also can mount nfs successfully on my Pandaboard...
>>>       
>
> I always mount nfs in console, and not tried to mount nfs as root fs.
>
>   
>>>       
>> I have pulled latest tree
>> (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> commit 0e93b4b304ae052ba1bc73f6d34a68556fe93429), and enable related options
>> (USB_NET_SMSC95XX,
>> USB_EHCI_HCD and USB_EHCI_HCD_OMAP) with omap2plus_config, However the
>> kernel still can't mount
>> nfs, pls see below infos.
>>
>> [    3.114105] smsc95xx v1.0.4
>> [    4.533752] smsc95xx 1-1.1:1.0: *eth0*: register 'smsc95xx' at
>> usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, fe:b9:1b:07:8e:d1
>> [  108.854217] VFS: Unable to mount root fs via NFS, trying floppy.
>> [  108.861114] VFS: Cannot open root device "nfs" or unknown-block(2,0):
>> error -6
>> [  108.868713] Please append a correct "root=" boot option; here are the
>> available partitions:
>> [  108.877655] b300         7761920 mmcblk0  driver: mmcblk
>> [  108.883239]   b301           40131 mmcblk0p1
>> 00000000-0000-0000-0000-000000000mmcblk0p1
>> [  108.891662]   b302         7719232 mmcblk0p2
>> 00000000-0000-0000-0000-000000000mmcblk0p2
>> [  108.900146] Kernel panic - not syncing: VFS: Unable to mount root fs on
>> unknown-block(2,0)
>>
>> BTW: I tested it with OMAP4430 ES2.2 pandaboard, the issue can be solved
>> with apply the patch.
>>
>> Is there something which I missed? thanks.
>>     
>
> What is your kernel parameter? Maybe you use 'usb%d' in kernel parameter for
> mounting nfs as root fs. If so, could you try 'eth%d' in kernel cmd?
>
> In fact, smsc95xx is a real LAN interface, and 'eth%d' should be prefered name
> as described in changelog of commit
> c261344d3ce3edac781f9d3c7eabe2e96d8e8fe8(usbnet:use eth%d name for
> known ethernet devices)
>
>   
Thanks for your notice, I used wrong kernel parameter.

Regards,
Xiao
> Thanks,
>   

^ permalink raw reply

* tcp timestamp issues with google servers
From: Miklos Szeredi @ 2012-05-17  9:39 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Sometimes connection to google.com, gmail.com and other google servers
doesn't work or takes ages to connect.  When this hits it hits all
google servers at the same time and it's persistent.  It never happens
to anything other than google.  Rebooting helps.  Rarely it goes away
spontaneously.

Apparently google is sometimes replying with an invalid TSecr timestamp
value (smaller than the one sent in the last packet) and this confuses
the Linux TCP stack which either discards the packet or sends a Reset.

Network dump attached.

I found only a couple of references to this issue:

http://gotchas.livejournal.com/3028.html

http://groups.google.com/group/comp.os.linux.networking/browse_thread/thread/29f56feded11b42a

Turning tcp timestamps fixes the issue:

  sysctl -w net.ipv4.tcp_timestamps=0

Not sure why this happens only to me and a very few others.

It appears to be an issue with google TCP stack (is it a modified
stack?) but I thought about issues in my network switch (restarting it
doesn't help) or something in the ISP, but those look unlikely.

Any ideas?

Thanks,
Miklos



  1   0.000000 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSV=35355050 TSER=0 WS=5
  2   0.002730 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=0 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184565067 TSER=35325344 WS=6
  3   0.002776 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [RST] Seq=1 Win=0 Len=0
  4   1.001408 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSV=35356052 TSER=0 WS=5
  5   1.004136 74.125.232.226 -> 192.168.28.100 TCP [TCP Previous segment lost] http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184566068 TSER=35325344 WS=6
  6   1.411915 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184566476 TSER=35325344 WS=6
  7   2.011568 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184567076 TSER=35325344 WS=6
  8   3.005400 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSV=35358056 TSER=0 WS=5
  9   3.007972 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184568072 TSER=35325344 WS=6
 10   3.212862 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184568277 TSER=35325344 WS=6
 11   5.612449 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184570677 TSER=35325344 WS=6
 12   7.013405 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSV=35362064 TSER=0 WS=5
 13   7.016627 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184572080 TSER=35325344 WS=6
 14  10.412642 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184575477 TSER=35325344 WS=6
 15  15.029547 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSV=35370080 TSER=0 WS=5
 16  15.032931 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=15638919 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184580097 TSER=35325344 WS=6
 17  31.061400 192.168.28.100 -> 74.125.232.226 TCP 51303 > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSV=35386112 TSER=0 WS=5
 18  31.064538 74.125.232.226 -> 192.168.28.100 TCP [TCP Previous segment lost] http > 51303 [SYN, ACK] Seq=485350292 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184596129 TSER=35325344 WS=6
 19  31.416339 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=485350292 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184596480 TSER=35325344 WS=6
 20  32.015998 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=485350292 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184597081 TSER=35325344 WS=6
 21  33.216276 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=485350292 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184598281 TSER=35325344 WS=6
 22  35.616879 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=485350292 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184600681 TSER=35325344 WS=6
 23  40.417065 74.125.232.226 -> 192.168.28.100 TCP http > 51303 [SYN, ACK] Seq=485350292 Ack=1 Win=14180 Len=0 MSS=1430 SACK_PERM=1 TSV=1184605482 TSER=35325344 WS=6

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox