Netdev List


* REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08  9:23 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Nicholas A. Bellinger, target-devel, Linux Network Development,
	LKML
In-Reply-To: <20140207205142.GA8609@glanzmann.de>

Hello Eric,

[RESEND: in the previous mail the VMFS creation times for on and off
were swapped. With auto corking on it took over 2 minutes; with it off
it took less than 4 seconds.]

> * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png
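
As a rough cross-check of the two timings above (2 minutes 24 seconds with
auto corking on, 4 seconds with it off), the slowdown for the 500 GB case can
be computed as a plain ratio. This is only an illustrative sketch;
slowdown_factor is a hypothetical helper and not part of any tool used in this
report.

```c
#include <assert.h>

/* Slowdown factor between two wall-clock durations, both in seconds. */
double slowdown_factor(double slow_seconds, double fast_seconds)
{
	assert(fast_seconds > 0.0);
	return slow_seconds / fast_seconds;
}
```

With the figures quoted above, slowdown_factor(144.0, 4.0) evaluates to 36.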

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to fill
up and throughput to drop from 30 MB/s to less than 100 KB/s. The
symptom was that logging in to a Windows desktop took more than 10
minutes instead of less than 30 seconds, because the user's profile was
loaded slowly from a CIFS server. At that time the culprit was also
delayed small packets: ACK packets in the CIFS case. So far I have only
proven the regression for iSCSI with tcp auto corking, but I assume we
will see many others if we leave it enabled.

I found the problem by doing the following:
        - I compiled the kernel by executing the following commands:
                yes '' | make oldconfig
                time make -j 24
                / make modules_install
                / mkinitramfs -o /boot/initrd.img-bisect <version>

        - I cleaned the iSCSI configuration after each test by issuing:
                /etc/init.d/target stop
                rm /iscsi?/* /etc/target/*

        - I configured iSCSI after each reboot
                cat > lio-v101.conf <<EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
                targetcli < lio-v101.conf
                Then I pointed a freshly booted ESXi 5.5.0 1331820 (deployed
                via autodeploy) at the iSCSI target, configured the portal,
                rescanned, created a 500 GB VMFS 5 filesystem and noted the
                time. If it took longer than 2 minutes the kernel was bad; if
                it took less than 10 seconds it was good.
                git bisect good/bad
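
The good/bad procedure above is essentially a binary search over the commit
history. A minimal sketch of that idea, assuming a linear list of commits in
which every commit after the first bad one is also bad (real git bisect also
handles merges, which this ignores); first_bad_commit and
regression_from_37 are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Callback: returns nonzero if the commit at this index shows the regression. */
typedef int (*is_bad_fn)(size_t commit_index);

/*
 * Given commits 0..n-1 in history order, where commit 0 is known good and
 * commit n-1 is known bad, return the index of the first bad commit.
 * Each loop iteration corresponds to one build, boot and timing test.
 */
size_t first_bad_commit(size_t n, is_bad_fn is_bad)
{
	size_t good = 0;	/* highest index known to be good */
	size_t bad;		/* lowest index known to be bad */

	assert(n >= 2);
	bad = n - 1;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* like "git bisect bad" */
		else
			good = mid;	/* like "git bisect good" */
	}
	return bad;
}

/* Example predicate: pretend the regression was introduced at index 37. */
int regression_from_37(size_t commit_index)
{
	return commit_index >= 37;
}
```

The point of the sketch is that the number of test cycles grows only with the
logarithm of the number of commits between the good and the bad kernel.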

My network config is:

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

I propose to disable tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking. We should
probably do that for the iSCSI target, but I don't know how. Anyone?
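
For checking the system-wide setting from a program, the sysctl used above
(net.ipv4.tcp_autocorking) is exposed as a file under /proc/sys. The following
is a minimal sketch of reading it; sysctl_to_proc_path and read_int_sysctl are
hypothetical helper names, and the simple dot-to-slash mapping assumes that no
component of the name itself contains a dot. It covers only the global knob,
not a per-socket opt-out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Convert a dotted sysctl name such as "net.ipv4.tcp_autocorking" into its
 * /proc/sys path by replacing each '.' with '/'.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int sysctl_to_proc_path(const char *name, char *out, size_t out_len)
{
	int n;
	char *p;

	assert(name != NULL && out != NULL);
	n = snprintf(out, out_len, "/proc/sys/%s", name);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	for (p = out + strlen("/proc/sys/"); *p != '\0'; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/*
 * Read an integer sysctl. Returns 0 and stores the value in *value on
 * success, -1 if the file cannot be opened or parsed (for example on a
 * kernel that does not have the knob).
 */
int read_int_sysctl(const char *name, int *value)
{
	char path[256];
	FILE *f;
	int ok;

	assert(value != NULL);
	if (sysctl_to_proc_path(name, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass "net.ipv4.tcp_autocorking" to read_int_sysctl and treat a
return value of -1 as "knob not available on this kernel".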

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3

Cheers,
        Thomas


* [PATCH] tcp: disable auto corking by default
From: Thomas Glanzmann @ 2014-02-08  9:19 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208091828.GA16336@glanzmann.de>

When using auto corking with iSCSI the round trip time increases by a factor
of at least 25, probably more. Other protocols are very likely also affected.

Signed-off-by: Thomas Glanzmann <thomas@glanzmann.de>
---
 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4475b3b..da563a4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -285,7 +285,7 @@ int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
 
 int sysctl_tcp_min_tso_segs __read_mostly = 2;
 
-int sysctl_tcp_autocorking __read_mostly = 1;
+int sysctl_tcp_autocorking __read_mostly = 0;
 
 struct percpu_counter tcp_orphan_count;
 EXPORT_SYMBOL_GPL(tcp_orphan_count);
-- 
1.7.10.4


* [PATCH] sections, ipvs: Remove useless __read_mostly for ipvs genl_ops
From: Andi Kleen @ 2014-02-08  7:57 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Andi Kleen, Wensong Zhang, Simon Horman,
	Patrick McHardy, lvs-devel

const __read_mostly does not make any sense, because const
data is already read-only. Remove the __read_mostly
for the ipvs genl_ops. This avoids an LTO
section conflict compile problem.

Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Patrick McHardy <kaber@trash.net>
Cc: lvs-devel@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 35be035..2a68a38 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3580,7 +3580,7 @@ out:
 }
 
 
-static const struct genl_ops ip_vs_genl_ops[] __read_mostly = {
+static const struct genl_ops ip_vs_genl_ops[] = {
 	{
 		.cmd	= IPVS_CMD_NEW_SERVICE,
 		.flags	= GENL_ADMIN_PERM,
-- 
1.8.5.2



* Re: [PATCH] net: rfkill-regulator: Add devicetree support.
From: Bill Fink @ 2014-02-08  6:22 UTC (permalink / raw)
  To: Marek Belisko
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak, rob,
	linville, johannes, davem, grant.likely, neilb, hns, devicetree,
	linux-doc, linux-kernel, linux-wireless, netdev
In-Reply-To: <1391802529-29861-1-git-send-email-marek@goldelico.com>

On Fri,  7 Feb 2014, Marek Belisko wrote:

> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Marek Belisko <marek@goldelico.com>
> ---
> Based on Neil's patch and extend for documentation and bindings include.
> 
>  .../bindings/net/rfkill/rfkill-relugator.txt       | 28 ++++++++++++++++

                                  ^^^^^^^^^
                                  Typo in file name.

					-Bill



>  include/dt-bindings/net/rfkill-regulator.h         | 23 +++++++++++++
>  net/rfkill/rfkill-regulator.c                      | 38 ++++++++++++++++++++++
>  3 files changed, 89 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
>  create mode 100644 include/dt-bindings/net/rfkill-regulator.h
> 
> diff --git a/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt b/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
> new file mode 100644
> index 0000000..cdb7dd7
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
> @@ -0,0 +1,28 @@
> +Regulator consumer for rfkill devices
> +
> +Required properties:
> +- compatible   : Must be "rfkill-regulator".
> +- label  : Name of rfkill device.
> +- type  : Type of rfkill device.
> +
> +Possible values (defined in include/dt-bindings/net/rfkill-regulator.h):
> +	RFKILL_TYPE_ALL
> +	RFKILL_TYPE_WLAN
> +	RFKILL_TYPE_BLUETOOTH
> +	RFKILL_TYPE_UWB
> +	RFKILL_TYPE_WIMAX
> +	RFKILL_TYPE_WWAN
> +	RFKILL_TYPE_GPS
> +	RFKILL_TYPE_FM
> +	RFKILL_TYPE_NFC
> +
> +- vrfkill-supply - regulator device.
> +
> +Example:
> +	gps-rfkill {
> +		compatible = "rfkill-regulator";
> +		label = "GPS";
> +		type = <RFKILL_TYPE_GPS>;
> +		vrfkill-supply = <&reg>;
> +	};
> +


* Re: [PATCH net-next] igb: enable VLAN stripping for VMs with i350
From: Aaron Brown @ 2014-02-08  5:29 UTC (permalink / raw)
  To: Stefan Assmann; +Cc: e1000-devel, netdev, davem
In-Reply-To: <1386754354-22039-1-git-send-email-sassmann@kpanic.de>

On Wed, 2013-12-11 at 10:32 +0100, Stefan Assmann wrote:
> For i350 VLAN stripping for VMs is not enabled in the VMOLR register but in
> the DVMOLR register. Making the changes accordingly. It's not necessary to
> unset the E1000_VMOLR_STRVLAN bit on i350 as the hardware will simply ignore
> it.
> 
> Without this change if a VLAN is configured for a VF assigned to a guest
> via (i.e.)
> ip link set p1p1 vf 0 vlan 10
> the VLAN tag will not be stripped from packets going into the VM. Which they
> should be because the VM itself is not aware of the VLAN at all.
> 
> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>

> ---
>  drivers/net/ethernet/intel/igb/e1000_82575.h | 4 ++++
>  drivers/net/ethernet/intel/igb/e1000_regs.h  | 1 +
>  drivers/net/ethernet/intel/igb/igb_main.c    | 7 +++++++
>  3 files changed, 12 insertions(+)


* Re: ax88179 regression
From: renevant @ 2014-02-08  3:56 UTC (permalink / raw)
  To: renevant; +Cc: netdev
In-Reply-To: <1890496.0JLCYTPnkG@athas>

Everything is still working at commit 63a67a72d63dd077c2313cf19eb29d8e4bfa6963

At this point I'm beginning to think all the issues I've been having are
motherboard BIOS and compiler flag issues.

Regards,

Will Trives


On Saturday 08 February 2014 12:56:10 renevant@internode.on.net wrote:
> Hello,
> 
> I have finally nailed down my other issues and I'm at a point where I can
> bisect from a point that is 100% stable and working.
> 
> 
> I am currently running a kernel checked out at commit
> d194c031994d3fc1038fa09e9e92d9be24a21921
> 
> A point in 3.12-rc4
> 
> At this point the ax88179 works without issue even with scatter gather
> turned on. So somewhere from this point something goes wrong and there is
> some condition that exists that can lock up the NIC.
> 
> 
> I will keep bisecting and report my findings.
> 
> 
> Regards,
> 
> Will Trives


* Re: [PATCH v3 net 2/9] bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr
From: Toshiaki Makita @ 2014-02-08  2:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Toshiaki Makita, David S . Miller, Vlad Yasevich, netdev
In-Reply-To: <20140207093127.56f78187@samsung-9>

On Fri, 2014-02-07 at 09:31 -0700, Stephen Hemminger wrote:
> On Fri,  7 Feb 2014 16:48:19 +0900
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> 
> > Since commit bc9a25d21ef8 ("bridge: Add vlan support for local fdb entries"),
> > br_fdb_changeaddr() has inserted a new local fdb entry only if it can
> > find old one. But if we have two ports where they have the same address
> > or user has deleted a local entry, there will be no entry for one of the
> > ports.
> > 
> > Example of problematic case:
> >   ip link set eth0 address aa:bb:cc:dd:ee:ff
> >   ip link set eth1 address aa:bb:cc:dd:ee:ff
> >   brctl addif br0 eth0
> >   brctl addif br0 eth1 # eth1 will not have a local entry due to dup.
> 
> I think the second addif should fail, it doesn't seem valid to have
> two interfaces on same bridge with same address. Most hardware switches
> would disable the port in that case.

Thank you for your comment, but I don't think so for several reasons.

- From other network elements on the same network, bridge ports don't
appear to have a MAC address, but the bridge appears to have several MAC
addresses through which it can be reached. The duplicated address is simply
seen as one of those addresses. I don't think it is a problem.

- This operation (adding a port that has a duplicated address) has been
allowed for several years, and it is obviously intended, as commented in
fdb_insert().

                /* it is okay to have multiple ports with same
                 * address, just use the first one.
                 */

- Hardware switches usually have one MAC address per switch. Their
ports don't have MAC addresses. It is not reasonable to compare this with
hardware switches.
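
The "just use the first one" behaviour mentioned in the fdb_insert() comment
can be illustrated with a small stand-alone model. This is only a sketch under
simplifying assumptions (a flat array instead of the bridge's hash table, no
VLANs, no locking); all toy_* names are hypothetical and this is not the
kernel's code.

```c
#include <assert.h>
#include <string.h>

#define TOY_ETH_ALEN 6
#define TOY_MAX_LOCAL 16

struct toy_fdb_entry {
	unsigned char addr[TOY_ETH_ALEN];
	int port;		/* port that owns the local entry */
};

struct toy_fdb {
	struct toy_fdb_entry entries[TOY_MAX_LOCAL];
	int count;
};

/*
 * Insert a local entry for (addr, port). If the address is already present,
 * keep the existing entry and still report success, so adding a second port
 * with the same address does not fail.
 * Returns 0 on success, -1 if the table is full.
 */
int toy_fdb_insert_local(struct toy_fdb *fdb, const unsigned char *addr, int port)
{
	int i;

	assert(fdb != NULL && addr != NULL);
	for (i = 0; i < fdb->count; i++)
		if (memcmp(fdb->entries[i].addr, addr, TOY_ETH_ALEN) == 0)
			return 0;	/* duplicate address: first owner wins */
	if (fdb->count == TOY_MAX_LOCAL)
		return -1;
	memcpy(fdb->entries[fdb->count].addr, addr, TOY_ETH_ALEN);
	fdb->entries[fdb->count].port = port;
	fdb->count++;
	return 0;
}

/* Return the port owning the local entry for addr, or -1 if there is none. */
int toy_fdb_lookup_port(const struct toy_fdb *fdb, const unsigned char *addr)
{
	int i;

	assert(fdb != NULL && addr != NULL);
	for (i = 0; i < fdb->count; i++)
		if (memcmp(fdb->entries[i].addr, addr, TOY_ETH_ALEN) == 0)
			return fdb->entries[i].port;
	return -1;
}
```

In this model, inserting aa:bb:cc:dd:ee:ff for port 0 and then for port 1
leaves a single entry owned by port 0, which mirrors the eth0/eth1 example
quoted above.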

Thanks,
Toshiaki Makita


* [PATCH v2] SUNRPC: Allow one callback request to be received from two sk_buff
From: shaobingqing @ 2014-02-08  2:29 UTC (permalink / raw)
  To: trond.myklebust, bfields, davem
  Cc: linux-nfs, netdev, linux-kernel, shaobingqing
In-Reply-To: <no>

In the current code, only one struct rpc_rqst is preallocated. If one
callback request is received in two sk_buffs, xprt_alloc_bc_request
is executed twice with the same transport->tcp_xid. The first
xprt_alloc_bc_request call allocates the struct rpc_rqst, and the
TCP_RCV_COPY_DATA bit of transport->tcp_flags is not cleared. The second
xprt_alloc_bc_request call cannot allocate another struct rpc_rqst and
returns a NULL pointer, so xprt_force_disconnect occurs. I think one
callback request should be allowed to be received in two sk_buffs.

Signed-off-by: shaobingqing <shaobingqing@bwstor.com.cn>
---
 include/linux/sunrpc/xprt.h |    1 +
 net/sunrpc/xprt.c           |    1 +
 net/sunrpc/xprtsock.c       |   13 ++++++++++++-
 3 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index cec7b9b..82bfe01 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -211,6 +211,7 @@ struct rpc_xprt {
 						 * items */
 	struct list_head	bc_pa_list;	/* List of preallocated
 						 * backchannel rpc_rqst's */
+	struct rpc_rqst	*req_first;
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 	struct list_head	recv;
 
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 095363e..93ad8bc 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1256,6 +1256,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
 	spin_lock_init(&xprt->bc_pa_lock);
 	INIT_LIST_HEAD(&xprt->bc_pa_list);
+	xprt->req_first = NULL;
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 
 	xprt->last_used = jiffies;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ee03d35..c43dca4 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1272,7 +1272,16 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
 				container_of(xprt, struct sock_xprt, xprt);
 	struct rpc_rqst *req;
 
-	req = xprt_alloc_bc_request(xprt);
+	if (xprt->req_first != NULL &&
+			xprt->req_first->rq_xid == transport->tcp_xid) {
+		req = xprt->req_first;
+	} else if (xprt->req_first != NULL &&
+			xprt->req_first->rq_xid != transport->tcp_xid) {
+		xprt_free_bc_request(xprt);
+		req = xprt_alloc_bc_request(xprt);
+	} else {
+		req = xprt_alloc_bc_request(xprt);
+	}
 	if (req == NULL) {
 		printk(KERN_WARNING "Callback slot table overflowed\n");
 		xprt_force_disconnect(xprt);
@@ -1297,6 +1306,8 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
 		list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
 		spin_unlock(&bc_serv->sv_cb_lock);
 		wake_up(&bc_serv->sv_cb_waitq);
+	} else {
+		xprt->req_first = req;
 	}
 
 	req->rq_private_buf.len = transport->tcp_copied;
-- 
1.7.4.2
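
The idea behind the patch above (reuse the partially received request when the
next fragment carries the same xid, instead of allocating again) can be
modelled outside the kernel. The following is a simplified sketch with a
single preallocated slot; all toy_* names are hypothetical, and unlike the
patch this model clears the saved pointer once a request is complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_rqst {
	uint32_t xid;
};

struct toy_xprt {
	struct toy_rqst slot;		/* the single preallocated request */
	int slot_in_use;
	struct toy_rqst *req_first;	/* partially received request, if any */
};

/* Hand out the preallocated slot, or NULL if it is already taken. */
struct toy_rqst *toy_alloc(struct toy_xprt *x, uint32_t xid)
{
	if (x->slot_in_use)
		return NULL;
	x->slot_in_use = 1;
	x->slot.xid = xid;
	return &x->slot;
}

/* Give the slot back and forget any partial request. */
void toy_free(struct toy_xprt *x)
{
	x->slot_in_use = 0;
	x->req_first = NULL;
}

/* Pick the request to use for an incoming fragment with the given xid. */
struct toy_rqst *toy_get_request(struct toy_xprt *x, uint32_t xid)
{
	assert(x != NULL);
	if (x->req_first != NULL && x->req_first->xid == xid)
		return x->req_first;	/* later fragment of the same request */
	if (x->req_first != NULL)
		toy_free(x);		/* stale partial request, other xid */
	return toy_alloc(x, xid);
}

/* Remember the request if more fragments are expected, else forget it. */
void toy_fragment_done(struct toy_xprt *x, struct toy_rqst *req, int complete)
{
	assert(x != NULL);
	x->req_first = complete ? NULL : req;
}
```

Without the req_first bookkeeping, the second toy_get_request call for the
same xid would go straight to toy_alloc and get NULL, which corresponds to the
"Callback slot table overflowed" path in the real code.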


* ax88179 regression
From: renevant @ 2014-02-08  1:56 UTC (permalink / raw)
  To: netdev

Hello,

I have finally nailed down my other issues and I'm at a point where I can
bisect from a point that is 100% stable and working.

I am currently running a kernel checked out at commit
d194c031994d3fc1038fa09e9e92d9be24a21921

A point in 3.12-rc4

At this point the ax88179 works without issue even with scatter gather turned
on. So somewhere from this point something goes wrong and there is some
condition that exists that can lock up the NIC.

I will keep bisecting and report my findings.

Regards,

Will Trives


* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Ding Tianhong @ 2014-02-08  1:43 UTC (permalink / raw)
  To: Jay Vosburgh, sfeldma@cumulusnetworks.com
  Cc: Cong Wang, Thomas Glanzmann, Eric Dumazet, Veaceslav Falico, andy,
	Jiří Pírko, netdev
In-Reply-To: <7882.1391822502@death.nxdomain>

On 2014/2/8 9:21, Jay Vosburgh wrote:
> Jay Vosburgh <fubar@us.ibm.com> wrote:
> 
>>
>> Cong Wang <cwang@twopensource.com> wrote:
>>
>>> On Thu, Feb 6, 2014 at 2:07 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>>> Jay Vosburgh <fubar@us.ibm.com> wrote:
>>>>
>>>>> Cong Wang <cwang@twopensource.com> wrote:
>>>>>
>>>>>
>>>>>       That would eliminate the warning, but is suboptimal.  Acquiring
>>>>> RTNL is not necessary on the vast majority of state machine runs
>>>>> (because no state changes take place, i.e., no ports are disabled or
>>>>> enabled).  The above change would add 10 round trips per second to RTNL,
>>>>> which seems excessive.
>>>>>
>>>>>       Also, we cannot unconditionally acquire RTNL in this function,
>>>>> as it would race with the call to cancel_delayed_work_sync from
>>>>> bond_close (via bond_work_cancel_all).
>>>
>>> OK.
>>>
>>>>
>>>>         Thought of one more problem: we can't hold a regular lock while
>>>> calling rtmsg_ifinfo, as it may sleep in alloc_skb.  The rtmsg_ifinfo
>>>> call has to be RTNL and nothing else.
>>>>
>>>
>>> s/GFP_KERNEL/GFP_ATOMIC/
>>
>> 	Yah, that would help with extra locks, but not totally solve
>> things.  I'm looking around, and seeing a number of other places that
>> will end up at one of these rtmsg_ifinfo calls with incorrect locking:
>>
>> 	bond_ab_arp_probe calls via bond_set_slave_active_flags and
>> bond_set_slave_inactive_flags without RTNL.
>>
>> 	bond_change_active_slave calls via bond_set_slave_inactive_flags
>> and bond_set_slave_active_flags with other locks held, and maybe without
>> RTNL; I'm not sure if bond_option_active_slave_set holds RTNL when it
>> calls bond_select_active_slave.
>>
>> 	bond_open calls via bond_set_slave_active_flags and
>> bond_set_slave_inactive_flags with RTNL, but also with other locks held.
>>
>> 	bond_loadbalance_arp_mon calls bond_set_active_slave and
>> bond_set_backup_slave without RTNL.
>>
>> 	This is in addition to the cases in the 802.3ad code from
>> __enable_port and __disable_port calls.
> 
> 	Just an update in case anybody else is looking into this, and
> some questions for Scott.
> 
> 	Acquiring RTNL for the __enable_port and __disable_port cases is
> difficult, as those calls generally already hold the state machine lock,
> and cannot unconditionally call rtnl_lock because either they already
> hold RTNL (for calls via bond_3ad_unbind_slave) or due to the potential
> for deadlock with bond_3ad_adapter_speed_changed,
> bond_3ad_adapter_duplex_changed, bond_3ad_link_change, or
> bond_3ad_update_lacp_rate.  All four of those are called with RTNL held,
> and acquire the state machine lock second.  The calling contexts for
> __enable_port and __disable_port already hold the state machine lock,
> and may or may not need RTNL.

Agreed, it is hard to add RTNL here; a deadlock can easily happen.

> 
> 	Scott: you added these calls, so can you explain what they're
> for?  I'm asking for two reasons:
> 
> 	First, if they do not occur synchronously is it going to be a
> problem?  E.g., for the 802.3ad case, if the rtmsg_ifinfo is called
> either at the end of the state machine run, or for non-state machine
> events, at the next run of the state machine (which is every 100 ms),
> would that be a problem?  Setting a flag in the slave somewhere that an
> rtmsg_ifinfo is needed should be doable for the 802.3ad case.
> 
> 	Second, what do the messages mean?  That the slave is now
> "active and usable"?  I'm asking because I suspect the bond_ab_arp_probe
> usage wherein it adjusts the flags and curr_active_slave should not
> actually call rtmsg_ifinfo, as the slave there is not really "up."
> What's going on there is that the ARP monitor cycles through each slave
> one by one, and tests to see if that slave works.  If it does work, then
> it is set as the active elsewhere in the monitor code.  This function
> adjusts the flags so that the ARP monitor will treat the "testing" slave
> as "active" for purposes of determining whether or not it is up.  I
> suspect this adjustment to the flags should not actually generate an
> rtmsg_ifinfo.
> 
> 	I think the remaining cases can be dealt with, but clarification
> on the above two questions would be very helpful.
> 
> 	-J
> 

Commit 6fde8f037e604e05df1529 fixes the problem for bond_loadbalance_arp_mon(),
and commit 66dd1c077a3f3c130d1 fixes the problem for bond_activebackup_arp_mon(),
but we still miss the 3ad monitor. I think if the slave should send the message
by netlink, it is better to follow fdb_notify() in the bridge code. I doubt we
need to send so many messages; just the slave info is enough, and then RTNL is
not needed here.

Ding
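
For reference, the deferred-notification idea from the quoted text above
("setting a flag in the slave somewhere that an rtmsg_ifinfo is needed") could
look roughly like the following stand-alone model. It is only a sketch: the
toy_* names are hypothetical, there is no real locking, and the notify
callback merely stands in for whatever would send the netlink message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_slave {
	int active;
	int notify_pending;	/* set when a state change still needs reporting */
};

/*
 * Meant to run where the state machine lock is held: only record that a
 * notification is needed, do not send anything from here.
 */
void toy_set_active(struct toy_slave *s, int active)
{
	assert(s != NULL);
	if (s->active != active) {
		s->active = active;
		s->notify_pending = 1;
	}
}

/*
 * Meant to run later, from a context that is allowed to take RTNL: send
 * every pending notification once. Returns the number of slaves reported.
 */
size_t toy_flush_notifications(struct toy_slave *slaves, size_t n,
			       void (*notify)(const struct toy_slave *))
{
	size_t i, sent = 0;

	assert(slaves != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!slaves[i].notify_pending)
			continue;
		slaves[i].notify_pending = 0;
		if (notify != NULL)
			notify(&slaves[i]);
		sent++;
	}
	return sent;
}
```

The trade-off is the one raised in the question to Scott: notifications are no
longer synchronous with the state change, but are delayed until the next flush.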


> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 


* Re: [PATCH] net: fix 'ip rule' iif/oif device rename
From: Eric Dumazet @ 2014-02-08  1:41 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Maciej Żenczykowski, David S. Miller, netdev,
	Willem de Bruijn, Eric Dumazet, Chris Davis, Carlo Contavalli
In-Reply-To: <1391819028-10722-1-git-send-email-zenczykowski@gmail.com>

On Fri, 2014-02-07 at 16:23 -0800, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> ip rules with iif/oif references do not update:
> (detach/attach) across interface renames.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> CC: Willem de Bruijn <willemb@google.com>
> CC: Eric Dumazet <edumazet@google.com>
> CC: Chris Davis <chrismd@google.com>
> CC: Carlo Contavalli <ccontavalli@google.com>
> 
> Google-Bug-Id: 12936021
> ---
>  net/core/fib_rules.c | 7 +++++++
>  1 file changed, 7 insertions(+)

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Jay Vosburgh @ 2014-02-08  1:21 UTC (permalink / raw)
  To: sfeldma@cumulusnetworks.com
  Cc: Cong Wang, Thomas Glanzmann, Eric Dumazet, Veaceslav Falico, andy,
	Jiří Pírko, netdev
In-Reply-To: <31653.1391725983@death.nxdomain>

Jay Vosburgh <fubar@us.ibm.com> wrote:

>
>Cong Wang <cwang@twopensource.com> wrote:
>
>>On Thu, Feb 6, 2014 at 2:07 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>> Jay Vosburgh <fubar@us.ibm.com> wrote:
>>>
>>>>Cong Wang <cwang@twopensource.com> wrote:
>>>>
>>>>
>>>>       That would eliminate the warning, but is suboptimal.  Acquiring
>>>>RTNL is not necessary on the vast majority of state machine runs
>>>>(because no state changes take place, i.e., no ports are disabled or
>>>>enabled).  The above change would add 10 round trips per second to RTNL,
>>>>which seems excessive.
>>>>
>>>>       Also, we cannot unconditionally acquire RTNL in this function,
>>>>as it would race with the call to cancel_delayed_work_sync from
>>>>bond_close (via bond_work_cancel_all).
>>
>>OK.
>>
>>>
>>>         Thought of one more problem: we can't hold a regular lock while
>>> calling rtmsg_ifinfo, as it may sleep in alloc_skb.  The rtmsg_ifinfo
>>> call has to be RTNL and nothing else.
>>>
>>
>>s/GFP_KERNEL/GFP_ATOMIC/
>
>	Yah, that would help with extra locks, but not totally solve
>things.  I'm looking around, and seeing a number of other places that
>will end up at one of these rtmsg_ifinfo calls with incorrect locking:
>
>	bond_ab_arp_probe calls via bond_set_slave_active_flags and
>bond_set_slave_inactive_flags without RTNL.
>
>	bond_change_active_slave calls via bond_set_slave_inactive_flags
>and bond_set_slave_active_flags with other locks held, and maybe without
>RTNL; I'm not sure if bond_option_active_slave_set holds RTNL when it
>calls bond_select_active_slave.
>
>	bond_open calls via bond_set_slave_active_flags and
>bond_set_slave_inactive_flags with RTNL, but also with other locks held.
>
>	bond_loadbalance_arp_mon calls bond_set_active_slave and
>bond_set_backup_slave without RTNL.
>
>	This is in addition to the cases in the 802.3ad code from
>__enable_port and __disable_port calls.

	Just an update in case anybody else is looking into this, and
some questions for Scott.

	Acquiring RTNL for the __enable_port and __disable_port cases is
difficult, as those calls generally already hold the state machine lock,
and cannot unconditionally call rtnl_lock because either they already
hold RTNL (for calls via bond_3ad_unbind_slave) or due to the potential
for deadlock with bond_3ad_adapter_speed_changed,
bond_3ad_adapter_duplex_changed, bond_3ad_link_change, or
bond_3ad_update_lacp_rate.  All four of those are called with RTNL held,
and acquire the state machine lock second.  The calling contexts for
__enable_port and __disable_port already hold the state machine lock,
and may or may not need RTNL.

	Scott: you added these calls, so can you explain what they're
for?  I'm asking for two reasons:

	First, if they do not occur synchronously is it going to be a
problem?  E.g., for the 802.3ad case, if the rtmsg_ifinfo is called
either at the end of the state machine run, or for non-state machine
events, at the next run of the state machine (which is every 100 ms),
would that be a problem?  Setting a flag in the slave somewhere that an
rtmsg_ifinfo is needed should be doable for the 802.3ad case.
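
	To make the deferred approach concrete, here is a minimal user-space
sketch of the "set a flag now, notify at the end of the run" idea.  It is an
illustration only, not bonding code: struct demo_slave, demo_mark_notify()
and demo_flush_notifies() are hypothetical names, and the notify callback
merely stands in for wherever rtmsg_ifinfo would be called once the state
machine lock has been dropped and RTNL is held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bonding slave; not the kernel's struct slave. */
struct demo_slave {
	const char *name;
	bool notify_pending;	/* set while the state machine lock is held */
};

/* Called from inside the state machine: only record that a message is due. */
static void demo_mark_notify(struct demo_slave *s)
{
	s->notify_pending = true;
}

/*
 * Called after the state machine run has finished and its lock is dropped.
 * Invokes notify() once per slave with a pending flag, clears the flag, and
 * returns the number of slaves that had a notification pending.
 */
static size_t demo_flush_notifies(struct demo_slave *slaves, size_t n,
				  void (*notify)(const struct demo_slave *))
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (!slaves[i].notify_pending)
			continue;
		slaves[i].notify_pending = false;
		if (notify)
			notify(&slaves[i]);
		sent++;
	}
	return sent;
}
```

	Marking the same slave twice before a flush yields a single
notification, which is the coalescing one would expect from a per-slave flag.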

	Second, what do the messages mean?  That the slave is now
"active and usable"?  I'm asking because I suspect the bond_ab_arp_probe
usage wherein it adjusts the flags and curr_active_slave should not
actually call rtmsg_ifinfo, as the slave there is not really "up."
What's going on there is that the ARP monitor cycles through each slave
one by one, and tests to see if that slave works.  If it does work, then
it is set as the active elsewhere in the monitor code.  This function
adjusts the flags so that the ARP monitor will treat the "testing" slave
as "active" for purposes of determining whether or not it is up.  I
suspect this adjustment to the flags should not actually generate an
rtmsg_ifinfo.

	I think the remaining cases can be dealt with, but clarification
on the above two questions would be very helpful.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net] net: Clear local_df only if crossing namespace.
From: Hannes Frederic Sowa @ 2014-02-08  0:58 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: David Miller, netdev, Templin, Fred L, nicolas.dichtel
In-Reply-To: <CALnjE+pbcbENWjQcE8T0QLa=d0iia3EBBVJz-sGVzAZbsQarLQ@mail.gmail.com>

[Cc Nicolas]

On Fri, Feb 07, 2014 at 02:49:20PM -0800, Pravin Shelar wrote:
> On Fri, Feb 7, 2014 at 2:28 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > On Fri, Feb 07, 2014 at 02:12:38PM -0800, Pravin wrote:
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -3905,12 +3905,13 @@ EXPORT_SYMBOL(skb_try_coalesce);
> >>   */
> >>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> >>  {
> >> -     if (xnet)
> >> +     if (xnet) {
> >>               skb_orphan(skb);
> >> +             skb->local_df = 0;
> >> +     }
> >>       skb->tstamp.tv64 = 0;
> >>       skb->pkt_type = PACKET_HOST;
> >>       skb->skb_iif = 0;
> >> -     skb->local_df = 0;
> >>       skb_dst_drop(skb);
> >>       skb->mark = 0;
> >>       secpath_reset(skb);
> >
> > I wonder if this should be the right behaviour for tunnels, which should just
> > do fragmentation based on IP_DF, even if the packet originated locally from a
> > socket which allowed local fragmentation (inet->pmtudisc < IP_PMTUDISC_DO).
> >
> This is not about tunneling; skb_scrub_packet() is a generic function
> which should not reset local_df on all packets.
> 
> We can have a separate discussion about the use of local_df and tunneling in
> another thread.

This change only affects tunnel code as of current net branch, how do
you not expect a discussion about that in this thread, I really wonder?

May I ask for which vport type, vxlan or gre, you made this change?

I am feeling a bit uncomfortable handling remote and local packets that
differently on lower tunnel output (local_df is mostly set on locally
originating packets).

Thanks,

  Hannes

^ permalink raw reply

* [PATCH V2] staging: r8188eu: Fix missing header
From: Larry Finger @ 2014-02-08  0:38 UTC (permalink / raw)
  To: gregkh; +Cc: devel, netdev, Larry Finger

Commit 2397c6e0927675d983b34a03401affdb64818d07 entitled "staging: r8188eu:
Remove wrappers around vmalloc and vzalloc" and
commit 03bd6aea7ba610a1a19f840c373624b8b0adde0d entitled "staging: r8188eu:
Remove wrappers around vfree" failed to add the header file needed
to provide vzalloc and vfree.

This problem was reported by the kbuild test robot.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---

V2 - add attribution to the build robot
---

 drivers/staging/rtl8188eu/core/rtw_mlme.c      | 1 +
 drivers/staging/rtl8188eu/core/rtw_mp.c        | 1 +
 drivers/staging/rtl8188eu/core/rtw_recv.c      | 1 +
 drivers/staging/rtl8188eu/core/rtw_sta_mgt.c   | 1 +
 drivers/staging/rtl8188eu/core/rtw_xmit.c      | 1 +
 drivers/staging/rtl8188eu/os_dep/ioctl_linux.c | 1 +
 drivers/staging/rtl8188eu/os_dep/usb_intf.c    | 1 +
 7 files changed, 7 insertions(+)

diff --git a/drivers/staging/rtl8188eu/core/rtw_mlme.c b/drivers/staging/rtl8188eu/core/rtw_mlme.c
index 2037be0..927fc72 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mlme.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mlme.c
@@ -31,6 +31,7 @@
 #include <wlan_bssdef.h>
 #include <rtw_ioctl_set.h>
 #include <usb_osintf.h>
+#include <linux/vmalloc.h>
 
 extern unsigned char	MCS_rate_2R[16];
 extern unsigned char	MCS_rate_1R[16];
diff --git a/drivers/staging/rtl8188eu/core/rtw_mp.c b/drivers/staging/rtl8188eu/core/rtw_mp.c
index 9e97b57..99c06c4 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mp.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mp.c
@@ -23,6 +23,7 @@
 
 #include "odm_precomp.h"
 #include "rtl8188e_hal.h"
+#include <linux/vmalloc.h>
 
 u32 read_macreg(struct adapter *padapter, u32 addr, u32 sz)
 {
diff --git a/drivers/staging/rtl8188eu/core/rtw_recv.c b/drivers/staging/rtl8188eu/core/rtw_recv.c
index 8490d51..ed308ff 100644
--- a/drivers/staging/rtl8188eu/core/rtw_recv.c
+++ b/drivers/staging/rtl8188eu/core/rtw_recv.c
@@ -28,6 +28,7 @@
 #include <ethernet.h>
 #include <usb_ops.h>
 #include <wifi.h>
+#include <linux/vmalloc.h>
 
 static u8 SNAP_ETH_TYPE_IPX[2] = {0x81, 0x37};
 static u8 SNAP_ETH_TYPE_APPLETALK_AARP[2] = {0x80, 0xf3};
diff --git a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
index 6df9669..e8a654d 100644
--- a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
+++ b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
@@ -25,6 +25,7 @@
 #include <xmit_osdep.h>
 #include <mlme_osdep.h>
 #include <sta_info.h>
+#include <linux/vmalloc.h>
 
 static void _rtw_init_stainfo(struct sta_info *psta)
 {
diff --git a/drivers/staging/rtl8188eu/core/rtw_xmit.c b/drivers/staging/rtl8188eu/core/rtw_xmit.c
index aa77270..2c0a40f 100644
--- a/drivers/staging/rtl8188eu/core/rtw_xmit.c
+++ b/drivers/staging/rtl8188eu/core/rtw_xmit.c
@@ -26,6 +26,7 @@
 #include <ip.h>
 #include <usb_ops.h>
 #include <usb_osintf.h>
+#include <linux/vmalloc.h>
 
 static u8 P802_1H_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0xf8 };
 static u8 RFC1042_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0x00 };
diff --git a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
index 0204082..f3584dd 100644
--- a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
+++ b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
@@ -35,6 +35,7 @@
 
 #include <rtw_mp.h>
 #include <rtw_iol.h>
+#include <linux/vmalloc.h>
 
 #define RTL_IOCTL_WPA_SUPPLICANT	(SIOCIWFIRSTPRIV + 30)
 
diff --git a/drivers/staging/rtl8188eu/os_dep/usb_intf.c b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
index 0a585b2..8ad3948 100644
--- a/drivers/staging/rtl8188eu/os_dep/usb_intf.c
+++ b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
@@ -26,6 +26,7 @@
 #include <hal_intf.h>
 #include <rtw_version.h>
 #include <linux/usb.h>
+#include <linux/vmalloc.h>
 #include <osdep_intf.h>
 
 #include <usb_vendor_req.h>
-- 
1.8.4.5

^ permalink raw reply related

* [PATCH] net: fix 'ip rule' iif/oif device rename
From: Maciej Żenczykowski @ 2014-02-08  0:23 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S. Miller
  Cc: netdev, Willem de Bruijn, Eric Dumazet, Chris Davis,
	Carlo Contavalli

From: Maciej Żenczykowski <maze@google.com>

ip rules with iif/oif references do not update
(detach/attach) across interface renames.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
CC: Willem de Bruijn <willemb@google.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Chris Davis <chrismd@google.com>
CC: Carlo Contavalli <ccontavalli@google.com>

Google-Bug-Id: 12936021
---
 net/core/fib_rules.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index f409e0bd35c0..185c341fafbd 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -745,6 +745,13 @@ static int fib_rules_event(struct notifier_block *this, unsigned long event,
 			attach_rules(&ops->rules_list, dev);
 		break;
 
+	case NETDEV_CHANGENAME:
+		list_for_each_entry(ops, &net->rules_ops, list) {
+			detach_rules(&ops->rules_list, dev);
+			attach_rules(&ops->rules_list, dev);
+		}
+		break;
+
 	case NETDEV_UNREGISTER:
 		list_for_each_entry(ops, &net->rules_ops, list)
 			detach_rules(&ops->rules_list, dev);
-- 
1.8.3
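
For readers following along, the effect of the new NETDEV_CHANGENAME case can
be modelled in user space roughly as below. This is a sketch under stated
assumptions, not the kernel code: struct demo_dev, struct demo_rule and the
demo_* helpers are hypothetical, and the model assumes a rule remembers the
interface name it was configured with plus a resolved ifindex, with -1 meaning
"currently detached".

```c
#include <stddef.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16

/* Hypothetical, simplified device and rule; not the kernel structures. */
struct demo_dev {
	int ifindex;
	char name[DEMO_IFNAMSIZ];
};

struct demo_rule {
	char iifname[DEMO_IFNAMSIZ];	/* name the rule was configured with */
	int iifindex;			/* resolved index, -1 when detached */
};

/* Drop every binding that points at this device. */
static void demo_detach(struct demo_rule *rules, size_t n,
			const struct demo_dev *dev)
{
	for (size_t i = 0; i < n; i++)
		if (rules[i].iifindex == dev->ifindex)
			rules[i].iifindex = -1;
}

/* Bind every detached rule whose configured name matches the device name. */
static void demo_attach(struct demo_rule *rules, size_t n,
			const struct demo_dev *dev)
{
	for (size_t i = 0; i < n; i++)
		if (rules[i].iifindex == -1 &&
		    strcmp(rules[i].iifname, dev->name) == 0)
			rules[i].iifindex = dev->ifindex;
}

/* Rename the device, then re-evaluate the bindings: detach, then attach. */
static void demo_rename(struct demo_rule *rules, size_t n,
			struct demo_dev *dev, const char *new_name)
{
	strncpy(dev->name, new_name, DEMO_IFNAMSIZ - 1);
	dev->name[DEMO_IFNAMSIZ - 1] = '\0';
	demo_detach(rules, n, dev);
	demo_attach(rules, n, dev);
}
```

In this model, renaming eth0 to wan0 unbinds a rule configured with "iif eth0"
and binds a previously dangling rule configured with "iif wan0".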

^ permalink raw reply related

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: Eric W. Biederman @ 2014-02-08  0:22 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, rientjes, linux-kernel
In-Reply-To: <20140206.222932.292588043950970246.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 06 Feb 2014 10:42:42 -0800
>
>> From: Eric Dumazet <edumazet@google.com>
>> 
>> sock_alloc_send_pskb() & sk_page_frag_refill()
>> have a loop trying high order allocations to prepare
>> skb with low number of fragments as this increases performance.
>> 
>> Problem is that under memory pressure/fragmentation, this can
>> trigger OOM while the intent was only to try the high order
>> allocations, then fallback to order-0 allocations.
>> 
>> We had various reports of unexpected regressions.
>> 
>> According to David, setting __GFP_NORETRY should be fine,
>> as the asynchronous compaction is still enabled, and this
>> will prevent OOM from kicking in, as in:
>  ...
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Acked-by: David Rientjes <rientjes@google.com>
>
> Applied, do we want this for -stable?

The first hunk goes back to 3.12 and the second hunk goes back to 3.8.

I think so.  The change is safe, and this class of problem can let an
external attacker trigger an OOM on your box by controlling the packet
flow.
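
For anyone skimming the thread, the allocation pattern under discussion (try
high orders opportunistically, then fall back to order-0) looks roughly like
the user-space sketch below. It is an illustration under assumptions, not the
code from the patch: demo_alloc_frag(), demo_try_alloc_fn and
demo_fragmented_pool() are hypothetical, and the no_retry argument only
models the intent of __GFP_NORETRY, i.e. "fail fast rather than push the
allocator towards OOM".

```c
#include <stddef.h>

/*
 * Hypothetical allocator hook: returns nonzero if an allocation of
 * 2^order pages succeeds.  no_retry models the intent of __GFP_NORETRY.
 */
typedef int (*demo_try_alloc_fn)(unsigned int order, int no_retry, void *ctx);

/*
 * Try the largest order first with no_retry set, stepping down on failure.
 * Only the final order-0 attempt is allowed to try hard.
 * Returns the order obtained, or -1 if even order-0 failed.
 */
static int demo_alloc_frag(unsigned int max_order,
			   demo_try_alloc_fn try_alloc, void *ctx)
{
	for (unsigned int order = max_order; order > 0; order--) {
		if (try_alloc(order, 1, ctx))
			return (int)order;
	}
	return try_alloc(0, 0, ctx) ? 0 : -1;
}

/* Demo pool: any order above the limit stored in *ctx is "fragmented". */
static int demo_fragmented_pool(unsigned int order, int no_retry, void *ctx)
{
	(void)no_retry;
	return order <= *(const unsigned int *)ctx;
}
```

With the limit set to 1, a request starting at order 3 ends up with an order-1
allocation; with the limit at 0 it degrades to order-0 rather than failing.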

Eric

^ permalink raw reply

* Fw: [Bug 70151] New: redundant GRE tunnel on same interface not brought down when gre link is downed
From: Stephen Hemminger @ 2014-02-08  0:21 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Thu, 6 Feb 2014 07:19:48 -0800
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 70151] New: redundant GRE tunnel on same interface not brought down when gre link is downed


https://bugzilla.kernel.org/show_bug.cgi?id=70151

            Bug ID: 70151
           Summary: redundant GRE tunnel on same interface not brought
                    down when gre link is downed
           Product: Networking
           Version: 2.5
    Kernel Version: 3.10.0+
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: shemminger@linux-foundation.org
          Reporter: tbeadle@gmail.com
        Regression: No

Say I have two IPs assigned to an interface (either a physical interface or an
802.1q sub-interface) and I create two gre interfaces, using each of the IP
addresses assigned above as the local endpoint.  I can send GRE keepalives to
either gre interface and a response is received.

Now if I bring one of the gre interfaces down and then send a keepalive to it,
a response is still received.  Not until I bring down the other interface also
do I stop getting responses.

Prior to the GRE/tunnel refactoring introduced in 3.10.0 (commit c5441932), the
keepalives to the downed gre interface would not elicit a response.  I have
tested kernels from 3.10.0 up through 3.13.0.

Steps to reproduce:
- Assign 192.168.56.1/24 to eth0 on a machine to be used to send the keepalives
(SENDER).

- Assign 192.168.56.3/24 and 192.168.56.4/24 to eth0 on the device under test
(DUT).
ip addr add 192.168.56.3/24 dev eth0
ip addr add 192.168.56.4/24 dev eth0

- Bring up the 2 GRE interfaces on DUT.
ip tunnel add gre1 mode gre remote 192.168.56.1 local 192.168.56.3 ttl 255
ip link set gre1 up
sysctl -w net.ipv4.conf.gre1.accept_local=1
sysctl -w net.ipv4.conf.gre1.forwarding=1
sysctl -w net.ipv6.conf.gre1.forwarding=1
ip tunnel add gre2 mode gre remote 192.168.56.1 local 192.168.56.4 ttl 255
ip link set gre2 up
sysctl -w net.ipv4.conf.gre2.accept_local=1
sysctl -w net.ipv4.conf.gre2.forwarding=1
sysctl -w net.ipv6.conf.gre2.forwarding=1

- I used scapy on SENDER to generate a GRE keepalive packet to 192.168.56.3
(08:00:27:53:aa:c1 is the MAC address of eth0 on DUT):
>>> ip = IP(src='192.168.56.1', dst='192.168.56.3')
>>> pkt = Ether(dst='08:00:27:53:aa:c1')/ip/GRE()/IP(src=ip.dst, dst=ip.src)/GRE()
>>> sendp(pkt, iface='eth0')

- When issuing the sendp command, run "tcpdump -nvi eth0 proto gre" on SENDER. 
You should see the following request and reply:

22:24:09.608401 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto GRE
(47), length 48)
    192.168.56.1 > 192.168.56.3: GREv0, Flags [none], length 28
        IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto GRE (47),
length 24)
    192.168.56.3 > 192.168.56.1: GREv0, Flags [none], length 4
        gre-proto-0x0
22:24:09.608879 IP (tos 0x0, ttl 63, id 1, offset 0, flags [none], proto GRE
(47), length 24)
    192.168.56.3 > 192.168.56.1: GREv0, Flags [none], length 4
        gre-proto-0x0

- On DUT, run "ip link set gre1 down" and do the sendp command again from
SENDER, watching the tcpdump output as well.  With kernels 3.10.0+, you'll
still see a response packet.  With earlier kernels, no response is sent, which
is the appropriate behavior.

Note that, with 3.10.0+, if you now run "ip link set gre2 down" and do the
sendp again, no response is sent.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Re: [PATCH] staging: r8188eu: Fix missing header
From: Larry Finger @ 2014-02-08  0:09 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, netdev
In-Reply-To: <20140208000423.GB17796@kroah.com>

On 02/07/2014 06:04 PM, Greg KH wrote:
> On Fri, Feb 07, 2014 at 05:12:10PM -0600, Larry Finger wrote:
>> Commit 2397c6e0927675d983b34a03401affdb64818d07 entitled "staging: r8188eu:
>> Remove wrappers around vmalloc and vzalloc" and
>> commit 03bd6aea7ba610a1a19f840c373624b8b0adde0d entitled "staging: r8188eu:
>> Remove wrappers around vfree" failed to add the header file needed
>> to provide vzalloc and vfree.
>>
>> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
>
> Was this reported by the 0-day bot?  If so, please credit it.

Yes it was. V2 coming soon.

Larry

^ permalink raw reply

* Re: [PATCH] staging: r8188eu: Fix missing header
From: Greg KH @ 2014-02-08  0:04 UTC (permalink / raw)
  To: Larry Finger; +Cc: devel, netdev
In-Reply-To: <1391814730-3461-1-git-send-email-Larry.Finger@lwfinger.net>

On Fri, Feb 07, 2014 at 05:12:10PM -0600, Larry Finger wrote:
> Commit 2397c6e0927675d983b34a03401affdb64818d07 entitled "staging: r8188eu:
> Remove wrappers around vmalloc and vzalloc" and
> commit 03bd6aea7ba610a1a19f840c373624b8b0adde0d entitled "staging: r8188eu:
> Remove wrappers around vfree" failed to add the header file needed
> to provide vzalloc and vfree.
> 
> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>

Was this reported by the 0-day bot?  If so, please credit it.

thanks,

greg k-h

^ permalink raw reply

* [PATCH] wan: dlci: Remove unused netdev_priv pointer
From: Christian Engelmayer @ 2014-02-07 23:21 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Zefan Li, Jiri Pirko

[-- Attachment #1: Type: text/plain, Size: 1176 bytes --]

Remove occurrences of unused pointer to network device private data in
functions dlci_header() and dlci_receive().

Detected by Coverity: CID 139844, CID 139845.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
---
 drivers/net/wan/dlci.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/wan/dlci.c b/drivers/net/wan/dlci.c
index 0d1c759..19f7cb2 100644
--- a/drivers/net/wan/dlci.c
+++ b/drivers/net/wan/dlci.c
@@ -71,12 +71,9 @@ static int dlci_header(struct sk_buff *skb, struct net_device *dev,
 		       const void *saddr, unsigned len)
 {
 	struct frhdr		hdr;
-	struct dlci_local	*dlp;
 	unsigned int		hlen;
 	char			*dest;
 
-	dlp = netdev_priv(dev);
-
 	hdr.control = FRAD_I_UI;
 	switch (type)
 	{
@@ -107,11 +104,9 @@ static int dlci_header(struct sk_buff *skb, struct net_device *dev,
 
 static void dlci_receive(struct sk_buff *skb, struct net_device *dev)
 {
-	struct dlci_local *dlp;
 	struct frhdr		*hdr;
 	int					process, header;
 
-	dlp = netdev_priv(dev);
 	if (!pskb_may_pull(skb, sizeof(*hdr))) {
 		netdev_notice(dev, "invalid data no header\n");
 		dev->stats.rx_errors++;
-- 
1.8.3.2



^ permalink raw reply related

* [PATCH] staging: r8188eu: Fix missing header
From: Larry Finger @ 2014-02-07 23:12 UTC (permalink / raw)
  To: gregkh; +Cc: netdev, devel, Larry Finger

Commit 2397c6e0927675d983b34a03401affdb64818d07 entitled "staging: r8188eu:
Remove wrappers around vmalloc and vzalloc" and
commit 03bd6aea7ba610a1a19f840c373624b8b0adde0d entitled "staging: r8188eu:
Remove wrappers around vfree" failed to add the header file needed
to provide vzalloc and vfree.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/staging/rtl8188eu/core/rtw_mlme.c      | 1 +
 drivers/staging/rtl8188eu/core/rtw_mp.c        | 1 +
 drivers/staging/rtl8188eu/core/rtw_recv.c      | 1 +
 drivers/staging/rtl8188eu/core/rtw_sta_mgt.c   | 1 +
 drivers/staging/rtl8188eu/core/rtw_xmit.c      | 1 +
 drivers/staging/rtl8188eu/os_dep/ioctl_linux.c | 1 +
 drivers/staging/rtl8188eu/os_dep/usb_intf.c    | 1 +
 7 files changed, 7 insertions(+)

diff --git a/drivers/staging/rtl8188eu/core/rtw_mlme.c b/drivers/staging/rtl8188eu/core/rtw_mlme.c
index 2037be0..927fc72 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mlme.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mlme.c
@@ -31,6 +31,7 @@
 #include <wlan_bssdef.h>
 #include <rtw_ioctl_set.h>
 #include <usb_osintf.h>
+#include <linux/vmalloc.h>
 
 extern unsigned char	MCS_rate_2R[16];
 extern unsigned char	MCS_rate_1R[16];
diff --git a/drivers/staging/rtl8188eu/core/rtw_mp.c b/drivers/staging/rtl8188eu/core/rtw_mp.c
index 9e97b57..99c06c4 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mp.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mp.c
@@ -23,6 +23,7 @@
 
 #include "odm_precomp.h"
 #include "rtl8188e_hal.h"
+#include <linux/vmalloc.h>
 
 u32 read_macreg(struct adapter *padapter, u32 addr, u32 sz)
 {
diff --git a/drivers/staging/rtl8188eu/core/rtw_recv.c b/drivers/staging/rtl8188eu/core/rtw_recv.c
index 8490d51..ed308ff 100644
--- a/drivers/staging/rtl8188eu/core/rtw_recv.c
+++ b/drivers/staging/rtl8188eu/core/rtw_recv.c
@@ -28,6 +28,7 @@
 #include <ethernet.h>
 #include <usb_ops.h>
 #include <wifi.h>
+#include <linux/vmalloc.h>
 
 static u8 SNAP_ETH_TYPE_IPX[2] = {0x81, 0x37};
 static u8 SNAP_ETH_TYPE_APPLETALK_AARP[2] = {0x80, 0xf3};
diff --git a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
index 6df9669..e8a654d 100644
--- a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
+++ b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
@@ -25,6 +25,7 @@
 #include <xmit_osdep.h>
 #include <mlme_osdep.h>
 #include <sta_info.h>
+#include <linux/vmalloc.h>
 
 static void _rtw_init_stainfo(struct sta_info *psta)
 {
diff --git a/drivers/staging/rtl8188eu/core/rtw_xmit.c b/drivers/staging/rtl8188eu/core/rtw_xmit.c
index aa77270..2c0a40f 100644
--- a/drivers/staging/rtl8188eu/core/rtw_xmit.c
+++ b/drivers/staging/rtl8188eu/core/rtw_xmit.c
@@ -26,6 +26,7 @@
 #include <ip.h>
 #include <usb_ops.h>
 #include <usb_osintf.h>
+#include <linux/vmalloc.h>
 
 static u8 P802_1H_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0xf8 };
 static u8 RFC1042_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0x00 };
diff --git a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
index 0204082..f3584dd 100644
--- a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
+++ b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
@@ -35,6 +35,7 @@
 
 #include <rtw_mp.h>
 #include <rtw_iol.h>
+#include <linux/vmalloc.h>
 
 #define RTL_IOCTL_WPA_SUPPLICANT	(SIOCIWFIRSTPRIV + 30)
 
diff --git a/drivers/staging/rtl8188eu/os_dep/usb_intf.c b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
index 0a585b2..8ad3948 100644
--- a/drivers/staging/rtl8188eu/os_dep/usb_intf.c
+++ b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
@@ -26,6 +26,7 @@
 #include <hal_intf.h>
 #include <rtw_version.h>
 #include <linux/usb.h>
+#include <linux/vmalloc.h>
 #include <osdep_intf.h>
 
 #include <usb_vendor_req.h>
-- 
1.8.4.5

^ permalink raw reply related

* Re: [PATCH net] net: Clear local_df only if crossing namespace.
From: Pravin Shelar @ 2014-02-07 22:49 UTC (permalink / raw)
  To: Pravin, David Miller, netdev, Templin, Fred L
In-Reply-To: <20140207222840.GD16198@order.stressinduktion.org>

On Fri, Feb 7, 2014 at 2:28 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi!
>
> On Fri, Feb 07, 2014 at 02:12:38PM -0800, Pravin wrote:
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3905,12 +3905,13 @@ EXPORT_SYMBOL(skb_try_coalesce);
>>   */
>>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>>  {
>> -     if (xnet)
>> +     if (xnet) {
>>               skb_orphan(skb);
>> +             skb->local_df = 0;
>> +     }
>>       skb->tstamp.tv64 = 0;
>>       skb->pkt_type = PACKET_HOST;
>>       skb->skb_iif = 0;
>> -     skb->local_df = 0;
>>       skb_dst_drop(skb);
>>       skb->mark = 0;
>>       secpath_reset(skb);
>
> I wonder if this should be the right behaviour for tunnels, which should just
> do fragmentation based on IP_DF, even if the packet originated locally from a
> socket which allowed local fragmentation (inet->pmtudisc < IP_PMTUDISC_DO).
>
This is not about tunneling; skb_scrub_packet() is a generic function
which should not reset local_df on all packets.

We can have a separate discussion about the use of local_df and tunneling in
another thread.
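
As a reading aid, the behaviour after the quoted hunk can be modelled in user
space as below. This is a sketch of the diff's logic only: struct demo_skb and
demo_scrub_packet() are hypothetical, has_owner merely models the effect of
skb_orphan(), and fields not relevant to the discussion are left out.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct demo_skb {
	bool local_df;		/* local fragmentation allowed */
	bool has_owner;		/* cleared where the kernel calls skb_orphan() */
	unsigned int mark;
	int skb_iif;
};

/*
 * Models the patched function: local_df is cleared only when the packet
 * crosses a namespace boundary (xnet); the other fields are always reset.
 */
static void demo_scrub_packet(struct demo_skb *skb, bool xnet)
{
	if (xnet) {
		skb->has_owner = false;
		skb->local_df = false;
	}
	skb->skb_iif = 0;
	skb->mark = 0;
}
```

With xnet false, local_df survives the scrub; with xnet true it is cleared
along with the socket ownership.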

^ permalink raw reply

* Re: [PATCH V3] net/dt: Add support for overriding phy configuration from device tree
From: Florian Fainelli @ 2014-02-07 22:43 UTC (permalink / raw)
  To: David Laight
  Cc: Matthew Garrett, netdev,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kishon Vijay Abraham I
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6B8BCA-VkEWCZq2GCInGFn1LkZF6NBPR1lH4CV8@public.gmane.org>

2014-02-05 David Laight <David.Laight-JxhZ9S5GRejQT0dZR+AlfA@public.gmane.org>:
> From: Florian Fainelli
>> It would be good to explain exactly how your hardware is broken.
>> I really do not think that such a fine-grained setting where
>> you could disable, e.g: 100BaseT_Full, but allow 100BaseT_Half to
>> remain usable makes that much sense. In general, Gigabit might be
>> badly broken, but 100 and 10Mbits/sec should work fine. How about the
>> MASTER-SLAVE bit, is overriding it really required?
>
> There are plenty of systems out there where you'd want to disable
> either HDX or FDX modes.
> The MAC unit has to know whether the PHY is in HDX or FDX in order
> to work properly. Many do not need to know the speed - since the
> PHY is responsible for the tx/rx fifo clock.
> Getting the negotiated speed out of the PHY can be difficult, while
> the ANAR can easily be set.
> Unfortunately it is usually impossible to disable the 'fall-back'
> 10M HDX.

The problem that I have with that approach in general is that:

- it bloats the code for a set of properties that are going to be used
by, hopefully, only a small percentage of the actual Device Trees out there
- it puts no limits on what is acceptable/best-practice to be put in
terms of configuration in the Device Tree, how about the 16x16 other
register values out there which are standardized?
- a PHY fixup should be registered based on the top-level compatible
property for a given board where the specific PHY on a specific board
is known to be broken
- it makes things far harder to debug than they are today

I do acknowledge the need to have a solution to these problems, but
this seems to duplicate existing mechanisms available (e.g: PHY
fixups) without leveraging information that should be properly flagged
in the Device Tree (board model, root-node compatible string etc...)
to allow software to take corrective measures.
-- 
Florian

^ permalink raw reply
