Netdev List
* Re: [PATCH] virtio-net: Read MAC only after initializing MSI-X
From: Michael S. Tsirkin @ 2011-08-20 20:00 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, Rusty Russell, virtualization, netdev, kvm
In-Reply-To: <1313771587.12243.16.camel@lappy>

On Fri, Aug 19, 2011 at 07:33:07PM +0300, Sasha Levin wrote:
> On Fri, 2011-08-19 at 18:23 +0300, Michael S. Tsirkin wrote:
> > On Sat, Aug 13, 2011 at 11:51:01AM +0300, Sasha Levin wrote:
> > > The MAC of a virtio-net device is located at the first field of the device
> > > specific header. This header is located at offset 20 if the device doesn't
> > > support MSI-X or offset 24 if it does.
> > > 
> > > The current code in virtnet_probe() probes the MAC before checking for
> > > MSI-X, which means that the read is always made from offset 20 regardless
> > > of whether MSI-X is enabled or not.
> > > 
> > > This patch moves the MAC probe to after the detection of whether MSI-X is
> > > enabled. This way the MAC will be read from offset 24 if the device indeed
> > > supports MSI-X.
> > > 
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > Cc: virtualization@lists.linux-foundation.org
> > > Cc: netdev@vger.kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > 
> > I am not sure I see a bug in virtio: the config space layout simply
> > changes as msix is enabled and disabled (and if you look at the latest
> > draft, also on whether 64 bit features are enabled).
> > It doesn't depend on msix capability being present in device.
> > 
> > The spec seems to be explicit enough:
> > 	If MSI-X is enabled for the device, two additional fields immediately
> > 	follow this header.
> > 
> > So I'm guessing the bug is in kvm tools which assume
> > same layout for when msix is enabled and disabled.
> > qemu-kvm seems to do the right thing so the device
> > seems to get the correct mac.
> 
> We assumed that PCI config space has a static layout like most other
> devices. Having a behavior of "First bit 20 does something, but after
> enabling MSI-X it does something completely different" sounds strange.

The layout is always virtio header followed by device specific header.
We started with a small header so when more data was added, we could not
extend the header unconditionally.

We can't change that behaviour for MSI-X now, guests and
hosts rely on it.
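
The layout rule discussed above can be sketched in a few lines. This is only an illustration, assuming the offsets quoted in the patch description (device-specific header at 20 with MSI-X disabled, 24 with it enabled); the function names and the flat `io_region` byte buffer are hypothetical and are not taken from the driver.

```python
# Offsets as quoted in the thread; the helper names are made up for this sketch.
LEGACY_HEADER_LEN = 20   # common virtio header when MSI-X is disabled
MSIX_FIELDS_LEN = 4      # the two extra fields that follow when MSI-X is enabled


def device_config_offset(msix_enabled: bool) -> int:
    """Return where the device-specific header starts for the current state."""
    return LEGACY_HEADER_LEN + (MSIX_FIELDS_LEN if msix_enabled else 0)


def read_mac(io_region: bytes, msix_enabled: bool) -> str:
    """Read the 6-byte MAC, the first field of the virtio-net specific header."""
    off = device_config_offset(msix_enabled)
    return ":".join(f"{b:02x}" for b in io_region[off:off + 6])


if __name__ == "__main__":
    region = bytes(range(40))          # stand-in for the device's I/O region
    print(read_mac(region, msix_enabled=False))
    print(read_mac(region, msix_enabled=True))
```

The point of the sketch is that the offset depends on whether MSI-X is currently enabled, not on whether the device advertises the capability, so it has to be recomputed whenever that state changes.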

>
> I'm wondering why offsets of the config structure change during run time
> and are not statically defined when the device is started.

That's because of backwards compatibility with old guests.
When we know the guest is new, we expose new layout,
but old guests must see old layout.

> It's not like VIRTIO_F_FEATURES_HI can be disabled after it was enabled,

Yes it can, e.g. at guest reset. Generally features can be tweaked
any way guest likes until status is set to OK.

> or MSI-X can be simply disabled during run time.

Not sure what you mean by 'run time'. Guest can reset
or disable the device, change any parameters,
then re-enable.

> Maybe this is better solved by copying the way it was done in PCI itself
> with capability linked list?
> 
> -- 
> 
> Sasha.

There are any number of ways to lay out the structure.  I went for what
seemed the simplest one.  For MSI-X the train has left the station.  We
can probably still tweak where the high 32 bit features
for 64 bit features are.  No idea if it's worth it.


-- 
MST

^ permalink raw reply

* Re: Linux Kernel | Intel Driver Bug - please update
From: Nicolás Sigal | LocalHost Soluciones Innovadoras @ 2011-08-20 20:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: kristoffer, Brandeburg, Jesse, Allan, Bruce W, Ronciak, John,
	netdev
In-Reply-To: <1311547638.2835.72.camel@jtkirshe-mobl>

Please Jeff, we need the update as soon as possible; we are still having
crashes of eth because of the driver.

Best regards;

.................................................
Nicolás Sigal
CEO :: LocalHost Soluciones Innovadoras
Mendoza 2917 :: C1428DKY
Ciudad Autónoma de Buenos Aires :: Argentina
Tel/Fax: 0810 55 LOCALHOST :: (011) 4784.6993
http://www.localhost.net.ar/
nicolas.sigal@localhost.net.ar

----- Original Message ----- 
From: "Jeff Kirsher" <jeffrey.t.kirsher@intel.com>
To: "Nicolás Sigal | LocalHost Soluciones Innovadoras" 
<nicolas.sigal@localhost.net.ar>
Cc: <kristoffer@gaisler.com>; "Brandeburg, Jesse" 
<jesse.brandeburg@intel.com>; "Allan, Bruce W" <bruce.w.allan@intel.com>; 
"Ronciak, John" <john.ronciak@intel.com>; <support@localhost.net.ar>; 
"netdev" <netdev@vger.kernel.org>
Sent: Sunday, July 24, 2011 7:47 PM
Subject: Re: Linux Kernel | Intel Driver Bug - please update

On Sun, 2011-07-24 at 13:28 -0700, Nicolás Sigal | LocalHost Soluciones
Innovadoras wrote:
> Please, can you update the e1000e driver of the kernel to v1.4.4?
>
> http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=3299&DwnldID=15817&ProductFamily=Componentes+Ethernet&ProductLine=Controladores+Ethernet&ProductProduct=Controlador+Intel%C2%AE+82579+Gigabit+Ethernet&DownloadType=Controladoresspa
>
> This version has an important bug fixed:
> * 82579: Fix for Tx Hang on FTS ME Platform
>
> And we have this problem in all of our servers with this NIC and ME
> on.
>
> Debug:
>
> Jul 24 14:23:27 [kernel] e1000e 0000:00:19.0: eth0: Detected Hardware
> Unit Hang:
> Jul 24 14:23:27 [kernel]   TDH                  <0>
> Jul 24 14:23:27 [kernel]   TDT                  <16>
> Jul 24 14:23:27 [kernel]   next_to_use          <16>
> Jul 24 14:23:27 [kernel]   next_to_clean        <0>
> Jul 24 14:23:27 [kernel] buffer_info[next_to_clean]:
> Jul 24 14:23:27 [kernel]   time_stamp           <10282156f>
> Jul 24 14:23:27 [kernel]   next_to_watch        <0>
> Jul 24 14:23:27 [kernel]   jiffies              <102821aed>
> Jul 24 14:23:27 [kernel]   next_to_watch.status <0>
> Jul 24 14:23:27 [kernel] MAC Status             <80143>
> Jul 24 14:23:27 [kernel] PHY Status             <796d>
> Jul 24 14:23:27 [kernel] PHY 1000BASE-T Status  <0>
> Jul 24 14:23:27 [kernel] PHY Extended Status    <3000>
> Jul 24 14:23:27 [kernel] PCI Status             <10>
> Jul 24 14:23:28 [kernel] e1000e 0000:00:19.0: eth0: Reset adapter
> Jul 24 14:23:30 [kernel] e1000e: eth0 NIC Link is Up 100 Mbps Full
> Duplex, Flow Control: Rx/Tx
> Jul 24 14:23:30 [kernel] e1000e 0000:00:19.0: eth0: 10/100 speed:
> disabling TSO
>


[removed ixgbe/igb Intel maintainers, and Linus from the email thread]
Added a more appropriate mailing list (netdev)

We currently have the kernel patches in review and testing to update the
e1000e driver to v1.4.4.  I should be able to push some, if not all of
the changes upstream later this week.


^ permalink raw reply

* Re: [PATCH] atm: br2684: Fix oops due to skb->dev being NULL
From: David Miller @ 2011-08-20 21:13 UTC (permalink / raw)
  To: daniel.schwierzeck; +Cc: netdev, stable
In-Reply-To: <1313791460-13652-1-git-send-email-daniel.schwierzeck@googlemail.com>

From: Daniel Schwierzeck <daniel.schwierzeck@googlemail.com>
Date: Sat, 20 Aug 2011 00:04:20 +0200

> This oops has already been fixed by commit
> 
>     27141666b69f535a4d63d7bc6d9e84ee5032f82a
> 
>     atm: [br2684] Fix oops due to skb->dev being NULL
> 
>     It happens that if a packet arrives in a VC between the call to open it on
>     the hardware and the call to change the backend to br2684, br2684_regvcc
>     processes the packet and oopses dereferencing skb->dev because it is
>     NULL before the call to br2684_push().
> 
> but has been reintroduced by commit
> 
>     b6211ae7f2e56837c6a4849316396d1535606e90
> 
>     atm: Use SKB queue and list helpers instead of doing it by-hand.
> 
> Signed-off-by: Daniel Schwierzeck <daniel.schwierzeck@googlemail.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH v2] dm9000: define debug level as a module parameter
From: David Miller @ 2011-08-20 21:17 UTC (permalink / raw)
  To: vz; +Cc: netdev, ben-linux
In-Reply-To: <1313785900-27367-1-git-send-email-vz@mleia.com>

From: Vladimir Zapolskiy <vz@mleia.com>
Date: Fri, 19 Aug 2011 23:31:40 +0300

> This change allows driver-specific debug messages to be enabled
> through a module parameter. Since the maximum level of verbosity
> is too high, it is lowered by default.
> 
> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: reserve ooo_okay when copying skb header
From: David Miller @ 2011-08-20 21:21 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, netdev
In-Reply-To: <1313765058-9315-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Fri, 19 Aug 2011 22:44:18 +0800

> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

I think you meant "preserve" not "reserve" :-)

I fixed this up and applied your patch, thanks.

^ permalink raw reply

* Re: Linux Kernel | Intel Driver Bug - please update
From: Jeff Kirsher @ 2011-08-20 21:51 UTC (permalink / raw)
  To: Nicolás Sigal | LocalHost Soluciones Innovadoras
  Cc: kristoffer@gaisler.com, Brandeburg, Jesse, Allan, Bruce W,
	Ronciak, John, netdev
In-Reply-To: <504DB0A1B3544F7F8719EF324951BBA7@NOTENIKO>

[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]

On Sat, 2011-08-20 at 13:35 -0700, Nicolás Sigal | LocalHost Soluciones
Innovadoras wrote:
> Please Jeff, we need the update as soon as possible; we are still having
> crashes of eth because of the driver.
> 
> Best regards;
> 

The patches have been pushed and accepted.  Are you using Linus's latest
3.1 tree?


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: Linux Kernel | Intel Driver Bug - please update
From: Nicolás Sigal | LocalHost Soluciones Innovadoras @ 2011-08-20 21:57 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: kristoffer, Brandeburg, Jesse, Allan, Bruce W, Ronciak, John,
	netdev
In-Reply-To: <1313877113.2128.189.camel@jtkirshe-mobl>

Jeff, we always use the latest stable release, in this case, v3.0.3


^ permalink raw reply

* Re: Linux Kernel | Intel Driver Bug - please update
From: Jeff Kirsher @ 2011-08-20 22:04 UTC (permalink / raw)
  To: Nicolás Sigal | LocalHost Soluciones Innovadoras
  Cc: kristoffer@gaisler.com, Brandeburg, Jesse, Allan, Bruce W,
	Ronciak, John, netdev
In-Reply-To: <47DB0021C2D24B869BCEFB9DD7E05303@NOTENIKO>

[-- Attachment #1: Type: text/plain, Size: 4031 bytes --]

On Sat, 2011-08-20 at 14:57 -0700, Nicolás Sigal | LocalHost Soluciones
Innovadoras wrote:
> Jeff, we always use the latest stable release, in this case, v3.0.3

The fixes (and driver bump) went into Linus's 3.1 tree, only one fix was
applied to the "stable" releases.

I would try Linus's 3.1 tree to verify that the issue you have been
seeing is fixed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* inetd and Linux kernel 3.0
From: Dâniel Fraga @ 2011-08-20 22:29 UTC (permalink / raw)
  To: netdev

	Hi, I upgraded to Linux kernel 3.0 and now I have the following
message on the log:

Aug 19 23:22:13 tux inetd[3143]: nntp/tcp: bind: Address family not
supported by protocol

	With kernel 2.6.39 everything worked fine. Is it a problem with
inetd? Maybe some incompatibility?

	Thanks.

	Ps: I asked here because the inetutils maintainer didn't reply to me
and I couldn't find anything on Google. Maybe you can give me some hint
or workaround.

-- 

^ permalink raw reply

* Re: inetd and Linux kernel 3.0
From: David Miller @ 2011-08-20 23:33 UTC (permalink / raw)
  To: fragabr; +Cc: netdev
In-Reply-To: <4e503565.28b8ec0a.650c.5823@mx.google.com>

From: Dâniel Fraga <fragabr@gmail.com>
Date: Sat, 20 Aug 2011 19:29:54 -0300

> 	Hi, I upgraded to Linux kernel 3.0 and now I have the following
> message on the log:
> 
> Aug 19 23:22:13 tux inetd[3143]: nntp/tcp: bind: Address family not
> supported by protocol
> 
> 	With kernel 2.6.39 everything worked fine. Is it a problem with
> inetd? Maybe some incompatibility?

It is trying to bind to an ipv6 address using an ipv4 socket.
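
A minimal sketch of the general remedy, assuming the listener knows its bind address as a literal string: pick the socket family from the address instead of hard-coding one. This is not inetd's code; `family_for` and `make_listener` are hypothetical helpers written for illustration.

```python
import ipaddress
import socket


def family_for(address: str) -> socket.AddressFamily:
    """Choose AF_INET or AF_INET6 to match a literal IP address."""
    ip = ipaddress.ip_address(address)
    return socket.AF_INET6 if ip.version == 6 else socket.AF_INET


def make_listener(address: str, port: int) -> socket.socket:
    """Create a TCP listener whose socket family matches the bind address."""
    sock = socket.socket(family_for(address), socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    print(family_for("127.0.0.1"), family_for("::1"))
```

Creating the socket with a family that differs from the address being bound is the mismatch described above.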


^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: George Spelvin @ 2011-08-20 23:39 UTC (permalink / raw)
  To: davem, linux, mpm; +Cc: dan, gerrit, herbert, linux-kernel, netdev, w
In-Reply-To: <20110816103028.1360.qmail@science.horizon.com>

(Apologies if I'm overbroad on the Cc: list.)

I've been concerned by the recent change to initial sequence number
generation, from a time-varying 24-bit hash of the endpoint addresses
to a fixed 32-bit hash.

First of all, my apologies that I didn't see this when it was posted
for comment August 7; I only noticed when I tried to merge some local
experiments with -stable and found a conflict in drivers/random.c.

My concern primarily is that the local secret used to compute the hashes
is generated very early in the boot sequence, before any significant
amount of entropy is accumulated.  And since it's constant for the uptime
of the machine, an attacker has a considerable length of time to find
and exploit the secret value.

While the increase to 32 bits is definitely desirable, and defends
against a much less sophisticated attack, I'm concerned that this is a
case of robbing Peter to pay Paul.

Trying to improve this, I'm working in a few directions:
1) Postpone the seeding as late in the boot process as possible.
   It's quite low-overhead to generate it only when the first TCP
   connection is made, which hopefully is preceded by running
   init and at least a little bit of device driver activity.

2) Do *both*: Use a fixed 32-bit offset *plus* a time-varying one.
   They can be added together and provide the security advantages of
   both.  The only cost is having to compute two hashes per SYN.

   The main problem here is coming up with a hash function fast enough
   that computing both hashes is no slower than one MD5 invocation.

3) Extend the 24-bit time-varying hash to a 28-bit one.
   This can cause the sequence numbers to wrap in 7/8 of the time
   they would with a fixed offset, but that doesn't seem too bad.
   (That's worst case; it's a triangular distribution centered
   on 15/16.)
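
Options 2 and 3 combined can be sketched as follows. This is only an illustration under stated assumptions: MD5 from hashlib stands in for whatever keyed hash would actually be used, the two secrets and all function names are hypothetical, and the kernel's real sequence number code is not reproduced here.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
MASK28 = 0x0FFFFFFF


def _hash32(secret: bytes, saddr: bytes, daddr: bytes,
            sport: int, dport: int) -> int:
    """Hash the secret and the endpoint addresses down to 32 bits."""
    data = secret + saddr + daddr + struct.pack("!HH", sport, dport)
    return struct.unpack("!I", hashlib.md5(data).digest()[:4])[0]


def fixed_offset(boot_secret, saddr, daddr, sport, dport) -> int:
    """32-bit offset keyed by a secret that stays constant for the uptime."""
    return _hash32(boot_secret, saddr, daddr, sport, dport)


def varying_offset(epoch_secret, saddr, daddr, sport, dport) -> int:
    """28-bit offset keyed by a secret that is replaced periodically."""
    return _hash32(epoch_secret, saddr, daddr, sport, dport) & MASK28


def combined_offset(boot_secret, epoch_secret, saddr, daddr, sport, dport) -> int:
    """Add both offsets modulo 2**32, costing two hashes per SYN."""
    return (fixed_offset(boot_secret, saddr, daddr, sport, dport)
            + varying_offset(epoch_secret, saddr, daddr, sport, dport)) & MASK32
```

The sketch makes the cost visible: every call to `combined_offset` runs the hash twice, which is why the speed of the hash function matters for this proposal.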

It's relatively easy to hash quickly with 15 64-bit registers, but doing
it with 7 32-bit registers is decidedly trickier.

I'm currently playing with a 36-round 6x32-bit variant of the SHA-3
candidate Skein.  I haven't run the genetic algorithm to select optimal
rotation constants, but they shouldn't affect the timing.
(I'm also going to ask the Skein team to look over my work.)

So far, it is notably faster than MD5 (89 ns/hash vs. 148 on a 2.5 GHz
Phenom), as well as being much smaller (383 bytes as opposed to 1951 for
the core transform).  One limitation is that it only hashes 6 32-bit
words per transform.  Thus, IPv6 would need to use two iterations,
or go back to MD5.

As mentioned, we can use a different algorithm for 64-bit processors.
Or even 32-bit ones with more registers.  So the speed problem only
exists for IPv6 on 32-bit x86.

(For example, on a 64-bit processor, two parallel MD5 transforms
can be computed in barely more time than one.)

A few questions, all related to performance requirements:
* Should I worry about 32-bit x86 performance at all, since it's
  pretty unlikely that a 32-bit machine will be running traffic levels
  (1000+ connections/sec) where it matters?
* Should I worry about 32-bit IPv6 performance, since that's even more
  unlikely to be running heavy loads on 32-bit hardware?
* If yes, is this fast enough to be acceptable, or do I need to work
  harder to find more speed?

Willy, apparently you did some benchmarking of various hash functions.
Is that data available somewhere?  Even if not, just a brief description
of the methodology and assumptions would help to make sure I'm measuring
in a reasonable way.

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: David Miller @ 2011-08-20 23:44 UTC (permalink / raw)
  To: linux; +Cc: mpm, dan, gerrit, herbert, linux-kernel, netdev, w
In-Reply-To: <20110820233951.6428.qmail@science.horizon.com>

From: "George Spelvin" <linux@horizon.com>
Date: 20 Aug 2011 19:39:51 -0400

> While the increase to 32 bits is definitely desirable, and defends
> against a much less sophisticated attack, I'm concerned that this is a
> case of robbing Peter to pay Paul.

I disagree, attacking this random number selection is much more theoretical
than the brute force attacks possible on 24-bits of entropy.

Show me a usable attack on a real system, then we can talk.

By comparison, real attacks against the 24-bit value have been
demonstrated.

> 2) Do *both*: Use a fixed 32-bit offset *plus* a time-varying one.
>    They can be added together and provide the security advantages of
>    both.  The only cost is having to compute two hashes per SYN.
> 
>    The main problem here is coming up with a hash function fast enough
>    that computing both hashes is no slower than one MD5 invocation.

Doubling the hashing cost is a non-starter.   Going to MD5 itself was
a huge lose, and was right at the brink of acceptable performance loss.

This whole change was nearly nixed because of the cost introduced
merely by going to MD5.

> 3) Extend the 24-bit time-varying hash to a 28-bit one.
>    This can cause the sequence numbers to wrap in 7/8 of the time
>    they would with a fixed offset, but that doesn't seem too bad.
>    (That's worst case; it's a triangular distribution centered
>    on 15/16.)

I want to stay with 32 bits of entropy, thank you very much.

^ permalink raw reply

* Re: inetd and Linux kernel 3.0
From: Dâniel Fraga @ 2011-08-20 23:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110820.163312.1686705823210711719.davem@davemloft.net>

On Sat, 20 Aug 2011 16:33:12 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> It is trying to bind to an ipv6 address using an ipv4 socket.

	Thanks David! Exactly. I compiled inetd with ipv6 disabled and
now everything is fine.

	Thank you very much!

-- 

^ permalink raw reply

* Re: [net-next 0/6][pull request] Intel Wired LAN Driver Update
From: David Miller @ 2011-08-21  0:29 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo
In-Reply-To: <1313759486-23575-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 19 Aug 2011 06:11:20 -0700

> The following series contains updates to e1000e and ixgbe.
> 
> The following are changes since commit ae1511bf769cafeae5ab61aaf9947a16a22cbd10:
>   net: rps: support PPPOE session messages
> and are available in the git repository at:
>   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master

I had done a net --> net-next merge right before pulling this so there
was a slight merge conflict, which I think I resolved correctly.

Please double-check my work.

Thanks.

^ permalink raw reply

* Re: [PATCH] net: add APIs for manipulating skb page fragments.
From: David Miller @ 2011-08-21  0:31 UTC (permalink / raw)
  To: ian.campbell; +Cc: netdev, linux-kernel, eric.dumazet, mirq-linux
In-Reply-To: <1313771100-22993-1-git-send-email-ian.campbell@citrix.com>

From: Ian Campbell <ian.campbell@citrix.com>
Date: Fri, 19 Aug 2011 17:25:00 +0100

> The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
> bare get/put_page on SKB pages fragments and to isolate users from subsequent
> changes to the skb_frag_t data structure.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

You're going to have to protect all of the things using the interfaces
from linux/dma-mapping.h with CONFIG_HAS_DMA otherwise it won't build
on platforms like S390.

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: George Spelvin @ 2011-08-21  0:49 UTC (permalink / raw)
  To: davem, linux; +Cc: dan, gerrit, herbert, linux-kernel, mpm, netdev, w
In-Reply-To: <20110820.164436.684817976385970137.davem@davemloft.net>

>> While the increase to 32 bits is definitely desirable, and defends
>> against a much less sophisticated attack, I'm concerned that this is a
>> case of robbing Peter to pay Paul.

> I disagree, attacking this random number selection is much more theoretical
> than the brute force attacks possible on 24-bits of entropy.

Can you explain more precisely what you disagree with?

What you state after the comma appears to be agreeing with what I
wrote (it seems like a restatement of my first two clauses), so I'm
unenlightened as to where the disagreement is.

I'm not saying you didn't address a real problem, just that fixing
one problem exposed another, and it would be nice to address *both*.

> Show me a usable attack on a real system, then we can talk.

If you like.  It's about a week of implementation work.  (And I
don't have 1 week/week of free time, so more than that elapsed.)

> By comparison, real attacks against the 24-bit value have been
> demonstrated.

Anywhere that I can see?

>> 2) Do *both*: Use a fixed 32-bit offset *plus* a time-varying one.
>>    They can be added together and provide the security advantages of
>>    both.  The only cost is having to compute two hashes per SYN.
>> 
>>    The main problem here is coming up with a hash function fast enough
>>    that computing both hashes is no slower than one MD5 invocation.

> Doubling the hashing cost is a non-starter.   Going to MD5 itself was
> a huge lose, and was right at the brink of acceptable performance loss.
> 
> This whole change was nearly nixed because of the cost introduced
> merely by going to MD5.

Okay, I'll make certain a proposed solution is strictly faster than MD5.
I was asking about performance goals, and you've given me an answer.
Thank you very much!

The patch comment was fairly offhand about the performance cost, and
prior discussion was apparently private, so it wasn't clear how much
pain people experienced.

My only other question is whether IPv6 on x86-32 specifically needs to be
faster than MD5.  Is that negotiable, or is that also a hard limit?
(This is challenging because it's trying to hash 288 bits of address material
in 224 bits of available registers.)

Eureka!  The possible source addresses are very limited.  It's possible to
pre-hash them, then you only have 160 bits of per-connection variability,
which can fit in a second hash block.

This requires finding somewhere in the network stack to store the
pre-hashed IPv6 addresses, as well as a fallback to use when spoofing
other source addresses, but that shouldn't be TOO difficult.
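A minimal sketch of the pre-hashing idea, assuming a placeholder mixing
function.  The names toy_mix, struct prehashed_saddr, prehash_saddr,
seq_hash_ipv6 and seq_hash_ipv6_full are all hypothetical, and toy_mix is
NOT a cryptographic hash; it only stands in for whatever keyed hash ends
up being used.  The point is the split: the 128-bit source address is
absorbed once per local address, which leaves 128 + 16 + 16 = 160 bits
(destination address and both ports) to hash per connection.

```c
#include <stdint.h>

/* Placeholder mixing step; NOT cryptographic.  Stands in for a real keyed hash. */
uint32_t toy_mix(uint32_t state, uint32_t word)
{
	state ^= word;
	state *= 0x9e3779b1u;
	state = (state << 13) | (state >> 19);
	return state;
}

/* Hypothetical cache entry: hash state after absorbing one local address. */
struct prehashed_saddr {
	uint32_t state;
};

/* Done once per local IPv6 address (128 bits), not once per SYN. */
struct prehashed_saddr prehash_saddr(uint32_t secret, const uint32_t saddr[4])
{
	struct prehashed_saddr p = { .state = secret };

	for (int i = 0; i < 4; i++)
		p.state = toy_mix(p.state, saddr[i]);
	return p;
}

/* Per-connection work: only daddr (128 bits) and the ports (32 bits) remain. */
uint32_t seq_hash_ipv6(struct prehashed_saddr p, const uint32_t daddr[4],
		       uint16_t sport, uint16_t dport)
{
	uint32_t s = p.state;

	for (int i = 0; i < 4; i++)
		s = toy_mix(s, daddr[i]);
	return toy_mix(s, ((uint32_t)sport << 16) | dport);
}

/* Fallback for a source address that is not in the cache: hash everything. */
uint32_t seq_hash_ipv6_full(uint32_t secret, const uint32_t saddr[4],
			    const uint32_t daddr[4],
			    uint16_t sport, uint16_t dport)
{
	return seq_hash_ipv6(prehash_saddr(secret, saddr), daddr, sport, dport);
}
```

The cached and fallback paths produce the same value by construction, so
a spoofed or uncached source address only costs the extra absorb step.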

>> 3) Extend the 24-bit time-varying hash to a 28-bit one.
>>    This can cause the sequence numbers to wrap in 7/8 of the time
>>    they would with a fixed offset, but that doesn't seem too bad.
>>    (That's worst case; it's a triangular distribution centered
>>    on 15/16.)

> I want to stay with a 32-bits of entropy, thank you very much.

My goal is to give you *both*.  32 bits fixed + 28 bits time-varying.
An attacker would have to cryptanalyze the 32 bits (which the 28 bits
makes harder) *and* brute-force the 28 bits.

(It's almost certainly simpler to brute-force 32 bits.)


Thank you for your response!

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: Willy Tarreau @ 2011-08-21  1:28 UTC (permalink / raw)
  To: George Spelvin; +Cc: davem, mpm, dan, gerrit, herbert, linux-kernel, netdev
In-Reply-To: <20110820233951.6428.qmail@science.horizon.com>

Hi George,

On Sat, Aug 20, 2011 at 07:39:51PM -0400, George Spelvin wrote:
(...)
> A few questions, all related to performance requirements:
> * Should I worry about 32-bit x86 performance at all, since it's
>   pretty unlikely that a 32-bit machine will be running traffic levels
>   (1000+ connections/sec) where it matters?

1000 connections per second is a moderately low load even for a
32-bit machine. I'm used to playing in the 10-100k/s range on 32-bit,
depending on the usage pattern; I even reached 300k/s on an anti-DDoS
machine. So yes, performance matters a lot, especially when we risk
slowing down one small operation that is done many times a second.

> * Should I worry about 32-bit IPv6 performance, since that's even more
>   unlikely to be running heavy loads on 32-bit hardware?

On x86 you're probably right, but there are other very fast platforms
such as ARM, which are used to build routers or appliances, and which
are 32-bit and there it may matter.

> * If yes, is this fast enough to be acceptable, or do I need to work
>   harder to find more speed?

I'd suggest that the most important thing is no performance regression.
If you can bring something which wins back what we lost with MD5, your
work would probably gain interest.

> Willy, apparently you did some benchmarking of various hash functions.
> Is that data available somewhere?  Even if not, just a brief description
> of the methodology and assumptions would help to make sure I'm measuring
> in a reasonable way.

I'm copy-pasting here the memo I exchanged in private after my tests, there
is nothing secret in it, so better post the whole explanation :

-------------------------------------------------------------------------
I did an ugly patch which consists of replacing calls to md5_transform()
with sha_transform() in secure_ip_id(), secure_tcp_sequence_number(),
secure_ipv4_port_ephemeral() on top of David's patches. I kept the same
hashing method, without calling sha_init() and by filling the hash with
net_secret, e.g.:

@@ -104,28 +107,32 @@ __u32 secure_ipv6_id(const __be32 daddr[4])
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
                                 __be16 sport, __be16 dport)
 {
-       u32 hash[MD5_DIGEST_WORDS];
+       u32 hash[SHA_DIGEST_WORDS];
+       u32 workspace[SHA_WORKSPACE_WORDS];

        hash[0] = (__force u32)saddr;
        hash[1] = (__force u32)daddr;
        hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-       hash[3] = net_secret[15];
+       hash[3] = net_secret[14];
+       hash[4] = net_secret[15];

-       md5_transform(hash, net_secret);
+       sha_transform(hash, (const char *)net_secret, workspace);

        return seq_scale(hash[0]);
 }


With this I could run tests on mainline (called "MD4" below), David's code
("MD5") and the transform above ("SHA1"). The tests involved connecting
from the test machine to an external HTTP server and retrieving an empty
object. This test was followed by two other series: one on a server which
immediately resets upon accept (to reproduce the SYN, SYN/ACK, ACK, RST
sequence I'm used to encountering when setting up anti-DDoS filters), and
a SYN, RST sequence caused by sending the traffic to a closed port, in
order to more accurately observe the differences.

I switched the test machine to an Atom N450 running in 64-bit mode in order
to benefit from the SHA1 optimizations.

Numbers are in connections per second.

kernel   http   RST server   closed port
-------+------+------------+------------
 MD4     9610      7840       16950
 MD5     9340      7560       16360
 SHA1    9250      7280       15400

In HTTP, performance drops by 2.8% when switching to MD5, and by 3.75%
when using SHA1 instead. With the reset server, MD5 takes a 3.6% hit
and SHA1 7.15%. On the closed port test, which sees only SYN and RST
packets, MD5 takes a 3.5% hit and SHA1 a 9.15% one.

Note that the biggest hit was still the 2.6.35.11 -> 3.0-git upgrade,
because HTTP gives me 10040 cps in 2.6.35.11. I think it's the compiler
and not the kernel : I used to build 2.6.35 with gcc-3.4 but had to
use a more recent toolchain (gcc 4.4) with 3.0 due to cmpxchg16b, and
my experience with gcc has always been a noticeable performance loss
with each new version, so that seems consistent...

All in all, while the SHA1 cost becomes concerning, it could be used
as an alternative to MD5 when we add a sysctl to select between
performance and security.
-------------------------------------------------------------------------
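The percentages in the memo above can be re-derived from its table.  The
sketch below does only that arithmetic; pct_drop and print_drops are
hypothetical helper names, and the numbers are copied from the table
with MD4 taken as the baseline in every row.

```c
#include <stdio.h>

/* Relative throughput loss versus a baseline, in percent. */
double pct_drop(double baseline, double measured)
{
	return (baseline - measured) * 100.0 / baseline;
}

/* Connections per second copied from the table: MD4, MD5, SHA1. */
static const double cps[3][3] = {
	{  9610,  9340,  9250 },	/* http */
	{  7840,  7560,  7280 },	/* RST server */
	{ 16950, 16360, 15400 },	/* closed port */
};

/* Print the MD5 and SHA1 loss relative to MD4 for each test. */
void print_drops(void)
{
	static const char *const name[3] = { "http", "RST server", "closed port" };

	for (int i = 0; i < 3; i++)
		printf("%-12s MD5 %.2f%%  SHA1 %.2f%%\n", name[i],
		       pct_drop(cps[i][0], cps[i][1]),
		       pct_drop(cps[i][0], cps[i][2]));
}
```

Calling print_drops() gives values that agree with the memo to within a
few hundredths of a percentage point.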

Note that this wasn't the best machine for the test, but it was available
and moreover it required little additional hardware to saturate it ;-)

Best regards,
Willy

^ permalink raw reply

* strange routing issue--packets stop getting forwarded for a live connection
From: Corey Hickey @ 2011-08-21  2:15 UTC (permalink / raw)
  To: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 3053 bytes --]

Hi,

Please forgive me for asking a user question on a dev list; does the
linux-net list no longer exist? Majordomo wouldn't subscribe me and I
see no recent history in the archives. If there's a better place for
this question, please tell me. Anyway:

I have a strange issue where, reliably, certain conditions cause my
Linux router to stop forwarding packets for a connection.

----------------------------------------------------------------------

This is my setup:

client      --> linux router          --> vpn --> work desktop
198.18.0.3      198.18.0.1    (eth0)              192.168.10.88
                192.168.6.230 (tun0)

All hosts are running Debian Sid with the stock Debian 3.0.0-1-amd64
kernel. tun0 is set up by openconnect (open-source client for Cisco
AnyConnect), which has been historically reliable for me.
I noticed this problem happening when I replaced the router with a new
host. The old host was 32-bit, running Linux 2.6.38, and configured
identically (I think) with respect to routing and iptables. I didn't
have a problem then.

----------------------------------------------------------------------

I have seen this problem happen with http, sometimes, but the easiest
way to reproduce the issue every time is to use SSH with X11 forwarding
(I have no idea why). I can SSH, through my router and VPN connection,
to my desktop at work. I can log in, poke around, do whatever; as soon
as I run some particular X11 programs, the connection hangs. xlogo and
xeyes are fine, but rxvt and jconsole are not.

So, my baseline test is to run rxvt directly. This command always hangs:

$ ssh -X chickey@192.168.10.88 rxvt

I have run simultaneous tcpdumps on the router: one on eth0 and the
other on tun0. I see the tcp connection and ssh sessions get set up,
then many encrypted packets go back and forth. At a certain, reliably
reproducible point, a 1368 byte packet comes in on eth0 and does not
leave tun0; the retransmissions do not get forwarded either.

I have not been able to figure out the cause of this. Here's what I have
investigated:

1. Number of packets on the connection; doesn't seem to matter, because
I can use SSH for other purposes just fine.

2. Transmission rate; doesn't seem to matter, because I can do
$ ssh -X chickey@192.168.10.88 cat /dev/zero > /dev/null

3. MTU size; 1500 on eth0 and 1406 on tun0. Bigger packets have been
transferred fine.

4. VPN client bug; maybe, but I don't think so yet. I can do the same
thing if I SSH directly from the router. This is fine:
ssh -X 198.18.0.1 "ssh -X chickey@192.168.10.88 rxvt"

5. Connection tracking issue; conntrack shows no change in state for the
connection when it hangs.

6. Some firewall rule. Stripping down my iptables setup to the minimum
does not help. I have also removed all qdiscs.

----------------------------------------------------------------------

Can anybody please suggest something else I should try here? This is
very confusing to me.

I am attaching a tarball of tcpdumps and other pertinent information.


Thank you,
Corey

[-- Attachment #2: problem.tar.bz2 --]
[-- Type: application/octet-stream, Size: 23175 bytes --]

^ permalink raw reply

* Re: [net-next 0/6][pull request] Intel Wired LAN Driver Update
From: Jeff Kirsher @ 2011-08-21  2:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, gospo@redhat.com
In-Reply-To: <20110820.172919.1482966002228420753.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 819 bytes --]

On Sat, 2011-08-20 at 17:29 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Fri, 19 Aug 2011 06:11:20 -0700
> 
> > The following series contains updates to e1000e and ixgbe.
> > 
> > The following are changes since commit ae1511bf769cafeae5ab61aaf9947a16a22cbd10:
> >   net: rps: support PPPOE session messages
> > and are available in the git repository at:
> >   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master
> 
> I had done a net --> net-next merge right before pulling this so there
> was a slight merge conflict, which I think I resolved correctly.
> 
> Please double-check my work.
> 
> Thanks.

I knew the e1000e driver version would conflict as soon as I saw you
updated against net.

e1000e and ixgbe look good, thanks Dave.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: George Spelvin @ 2011-08-21  3:04 UTC (permalink / raw)
  To: linux, w; +Cc: dan, davem, gerrit, herbert, linux-kernel, mpm, netdev
In-Reply-To: <20110821012844.GA15222@1wt.eu>

>> * Should I worry about 32-bit IPv6 performance, since that's even more
>>   unlikely to be running heavy loads on 32-bit hardware?

> On x86 you're probably right, but there are other very fast platforms
> such as ARM, which are used to build routers or appliances, and which
> are 32-bit and there it may matter.

Thanks for the feedback.  Your point about routers is well-taken.
It's particularly x86-32 which gives me fits, but I'll keep ARM
performance in mind, too.

>> * If yes, is this fast enough to be acceptable, or do I need to work
>>   harder to find more speed?

> I'd suggest that the most important thing is no performance regression.
> If you can bring something which wins back what we lost with MD5, your
> work would probably gain interest.

Okay, I'll go back to the drawing board on performance.

Damn, this is going to be tough.

> I'm copy-pasting here the memo I exchanged in private after my tests, there
> is nothing secret in it, so better post the whole explanation :

Thank you very much.  It helps me figure out what the time budget is.

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: Ted Ts'o @ 2011-08-21  3:27 UTC (permalink / raw)
  To: George Spelvin; +Cc: w, dan, davem, gerrit, herbert, linux-kernel, mpm, netdev
In-Reply-To: <20110821030415.18106.qmail@science.horizon.com>

Here's a random thought --- it won't help on anything other than
modern x86's, but who's to say we have to use the same algorithm on
all platforms?  Does the AES-NI facility provide enough of a speedup
that it's worth using it instead of MD5, at least on modern x86
systems which have this support?

					- Ted

^ permalink raw reply

* Re: [PATCH 0/2] Improve sequence number generation.
From: Herbert Xu @ 2011-08-21  4:02 UTC (permalink / raw)
  To: Ted Ts'o, George Spelvin, w, dan, davem, gerrit, linux-kernel,
	mpm, net
In-Reply-To: <20110821032753.GA8992@thunk.org>

On Sat, Aug 20, 2011 at 11:27:53PM -0400, Ted Ts'o wrote:
> Here's a random thought --- it won't help on anything other than
> modern x86's, but who's to say we have to use the same algorithm on
> all platforms?  Does the AES-NI facility provide enough of a speedup
> that it's worth using it instead of MD5, at least on modern x86
> systems which have this support?

It is fast but it also touches SSE state.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: strange routing issue--packets stop getting forwarded for a live connection
From: Julian Anastasov @ 2011-08-21  6:35 UTC (permalink / raw)
  To: Corey Hickey; +Cc: Linux Netdev List
In-Reply-To: <4E506A46.6060407@fatooh.org>


	Hello,

On Sat, 20 Aug 2011, Corey Hickey wrote:

> Hi,
> 
> Please forgive me for asking a user question on a dev list; does the
> linux-net list no longer exist? Majordomo wouldn't subscribe me and I
> see no recent history in the archives. If there's a better place for
> this question, please tell me. Anyway:
> 
> I have a strange issue where, reliably, certain conditions cause my
> Linux router to stop forwarding packets for a connection.
> 
> ----------------------------------------------------------------------
> 
> This is my setup:
> 
> client      --> linux router          --> vpn --> work desktop
> 198.18.0.3      198.18.0.1    (eth0)              192.168.10.88
>                 192.168.6.230 (tun0)
> 
> All hosts are running Debian Sid with the stock Debian 3.0.0-1-amd64
> kernel. tun0 is set up by openconnect (open-source client for Cisco
> AnyConnect), which has been historically reliable for me.
> 
> I noticed this problem happening when I replaced the router with a new
> host. The old host was 32-bit, running Linux 2.6.38, and configured
> identically (I think) with respect to routing and iptables. I didn't
> have a problem then.
> 
> ----------------------------------------------------------------------
> 
> I have seen this problem happen with http, sometimes, but the easiest
> way to reproduce the issue every time is to use SSH with X11 forwarding
> (I have no idea why). I can SSH, through my router and VPN connection,
> to my desktop at work. I can log in, poke around, do whatever; as soon
> as I run some particular X11 programs, the connection hangs. xlogo and
> xeyes are fine, but rxvt and jconsole are not.
> 
> So, my baseline test is to run rxvt directly. This command always hangs:
> 
> $ ssh -X chickey@192.168.10.88 rxvt
> 
> I have run simultaneous tcpdumps on the router: one on eth0 and the
> other on tun0. I see the tcp connection and ssh sessions get set up,
> then many encrypted packets go back and forth. At a certain, reliably
> reproducible point, a 1368 byte packet comes in on eth0 and does not
> leave tun0; the retransmissions do not get forwarded either.
> 
> I have not been able to figure out the cause of this. Here's what I have
> investigated:
> 
> 1. Number of packets on the connection; doesn't seem to matter, because
> I can use SSH for other purposes just fine.
> 
> 2. Transmission rate; doesn't seem to matter, because I can do
> $ ssh -X chickey@192.168.10.88 cat /dev/zero > /dev/null
> 
> 3. MTU size; 1500 on eth0 and 1406 on tun0. Bigger packets have been
> transferred fine.

	Try a lower MTU; it could be a PMTUD problem. At 04:50:24.112658
I see that 7801:9169 is 1420 bytes and no ICMP FRAG NEEDED is generated.
Maybe these two regressions explain it:

http://marc.info/?l=linux-netdev&m=131342172722536&w=2

	There are 2 fixes you can try, or you can use a more recent kernel
tree; for example, 3.1-rc2 has the fixes.
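For reference, the decision a forwarding router is expected to make can
be sketched as below.  This is a simplified model for illustration, not
the kernel's forwarding code; enum fwd_action and forward_decision are
hypothetical names.  With the numbers from this thread (a 1420-byte
packet, tun0 MTU of 1406), and assuming DF is set as is usual for TCP
doing path MTU discovery, the expected result is an ICMP "fragmentation
needed" back to the sender, which is what is missing from the capture.

```c
#include <stdbool.h>
#include <stddef.h>

enum fwd_action {
	FWD_SEND,		/* fits the egress MTU: forward unchanged */
	FWD_FRAGMENT,		/* too big, DF clear: router may fragment */
	FWD_ICMP_FRAG_NEEDED,	/* too big, DF set: drop and tell the sender */
};

/* Simplified IPv4 forwarding decision based on packet length and DF bit. */
enum fwd_action forward_decision(size_t ip_len, bool df_set, size_t egress_mtu)
{
	if (ip_len <= egress_mtu)
		return FWD_SEND;
	return df_set ? FWD_ICMP_FRAG_NEEDED : FWD_FRAGMENT;
}
```

If the router drops the packet without sending the ICMP error, the
sender keeps retransmitting the same oversized segment and the
connection hangs.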

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [net-next 00/10][pull request] Intel Wired LAN Driver Update
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo

The following series contains updates to ixgbe.

The following are changes since commit ca1ba7caa68520864e4b9227e67f3bbc6fed373b:
  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next
and are available in the git repository at:
  master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (10):
  ixgbe: Simplify transmit cleanup path
  ixgbe: convert rings from q_vector bit indexed array to linked list
  ixgbe: Drop the TX work limit and instead just leave it to budget
  ixgbe: consolidate all MSI-X ring interrupts and poll routines into
    one
  ixgbe: cleanup allocation and freeing of IRQ affinity hint
  ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA
  ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt
    types
  ixgbe: Drop unnecessary adapter->hw dereference in loopback test
    setup
  ixgbe: combine PCI_VDEVICE and board declaration to same line
  ixgbe: Update TXDCTL configuration to correctly handle WTHRESH

 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   11 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   25 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  738 +++++++---------------
 3 files changed, 239 insertions(+), 535 deletions(-)

-- 
1.7.6


^ permalink raw reply

* [net-next 01/10] ixgbe: Simplify transmit cleanup path
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher
In-Reply-To: <1313911761-11709-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch helps to simplify the work being done by the transmit path by
removing the unnecessary compares between count and the work limit.  Instead
we can simplify this by just adding a budget value that will act as a count
down from the work limit value.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e8aad76..e5a4eb6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -804,13 +804,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
+	u16 budget = q_vector->tx.work_limit;
 	u16 i = tx_ring->next_to_clean;
-	u16 count;
 
 	tx_buffer = &tx_ring->tx_buffer_info[i];
 	tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
 
-	for (count = 0; count < q_vector->tx.work_limit; count++) {
+	for (; budget; budget--) {
 		union ixgbe_adv_tx_desc *eop_desc = tx_buffer->next_to_watch;
 
 		/* if next_to_watch is not set then there is no work pending */
@@ -891,11 +891,11 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		ixgbe_tx_timeout_reset(adapter);
 
 		/* the adapter is about to reset, no point in enabling stuff */
-		return true;
+		return budget;
 	}
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(count && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
@@ -908,7 +908,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		}
 	}
 
-	return count < q_vector->tx.work_limit;
+	return budget;
 }
 
 #ifdef CONFIG_IXGBE_DCA
-- 
1.7.6


^ permalink raw reply related
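
The count-down pattern from the patch above can be shown outside the
driver with a toy ring.  This is a sketch only: struct toy_ring and
toy_clean_tx are hypothetical stand-ins, not ixgbe code.  The leftover
budget doubles as the return value, so a nonzero budget means the ring
was drained before the work limit was reached.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a TX ring with 'pending' completed descriptors. */
struct toy_ring {
	unsigned int pending;
	unsigned int cleaned;
};

/*
 * Clean up to work_limit descriptors, counting the budget down.
 * Returns true if work ran out before the budget did.
 */
bool toy_clean_tx(struct toy_ring *ring, unsigned short work_limit)
{
	unsigned short budget = work_limit;

	for (; budget; budget--) {
		if (!ring->pending)
			break;		/* no work pending */
		ring->pending--;
		ring->cleaned++;
	}
	return budget != 0;
}
```

As with the original "count < work_limit" test, a ring that holds
exactly work_limit entries reports false, because the budget is used up
on the last descriptor.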

