Netdev List
* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:53 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208133744.GA20512@glanzmann.de>

On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:
> Hello Eric,

> 
> It fixes my case, but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem, I'm still for
> disabling it by default.
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Very nice.

Now we have to check your NIC and how TX completion is performed.

What is your NIC model and driver ?

^ permalink raw reply

* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:50 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391866389.10160.80.camel@edumazet-glaptop2.roam.corp.google.com>

On Sat, 2014-02-08 at 05:33 -0800, Eric Dumazet wrote:
> On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> > Here is the combined patch, could you test it ?
> 
> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> ("tcp: autocork should not hold first packet in write queue")
> in your tree.
> 
> 

BTW this problem demonstrates there is room for improvement in iSCSI,
using MSG_MORE to avoid sending two small segments in separate frames.

[1] 00:32:35.726568 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 145:193, ack 144, win 235, options [nop,nop,TS val 4294960733 ecr 385385], length 48
[2] 00:32:35.838074 IP 10.101.0.13.27778 > 10.101.99.5.3260: Flags [.], ack 193, win 514, options [nop,nop,TS val 385396 ecr 4294960733], length 0
[3] 00:32:35.838099 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 193:705, ack 144, win 235, options [nop,nop,TS val 4294960761 ecr 385396], length 512

[1] & [3] could be coalesced, and [2] would be avoided.
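
As a rough userspace illustration of the MSG_MORE idea, the sketch below sends a 48-byte header and a 512-byte payload (the sizes seen in [1] and [3]) with MSG_MORE set on the first send(). This is not the iSCSI target code, which lives in the kernel and uses a different send path; send_pdu() and loopback_demo() are hypothetical names introduced only for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue a small header and its payload on a connected
 * TCP socket. MSG_MORE on the first send() hints that more data follows, so
 * the kernel may hold the header and coalesce it with the payload instead of
 * emitting two small segments. Returns bytes queued, or -1 on error. */
static ssize_t send_pdu(int fd, const void *hdr, size_t hdr_len,
                        const void *data, size_t data_len)
{
    ssize_t a, b;

    a = send(fd, hdr, hdr_len, MSG_MORE);
    if (a < 0)
        return -1;
    b = send(fd, data, data_len, 0);    /* no MSG_MORE: push what is queued */
    if (b < 0)
        return -1;
    return a + b;
}

/* Hypothetical demo over loopback: returns the number of bytes the peer
 * received, or -1 on error. */
static ssize_t loopback_demo(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char hdr[48], data[512], buf[1024];
    ssize_t got = -1, n;
    int lfd = -1, cfd = -1, sfd = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    memset(hdr, 'H', sizeof(hdr));
    memset(data, 'D', sizeof(data));
    if (send_pdu(cfd, hdr, sizeof(hdr), data, sizeof(data)) < 0)
        goto out;
    shutdown(cfd, SHUT_WR);             /* signal end of stream to the reader */

    got = 0;
    while ((n = recv(sfd, buf, sizeof(buf), 0)) > 0)
        got += n;
out:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return got;
}
```

The demo only confirms that all 560 bytes arrive. Whether the two writes really leave in a single segment depends on the stack and its settings, so a capture like the one above remains the way to check.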

^ permalink raw reply

* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 13:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391866389.10160.80.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> ("tcp: autocork should not hold first packet in write queue")
> in your tree.

confirmed:

(node-62) [~/work/linux-2.6] git show a181ceb501b31b4bf8812a5c84c716cc31d82c2d | head
commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Dec 17 09:58:30 2013 -0800

    tcp: autocork should not hold first packet in write queue

    Willem noticed a TCP_RR regression caused by TCP autocorking
    on a Mellanox test bed. MLX4_EN_TX_COAL_TIME is 16 us, which can be
    right above RTT between hosts.

Cheers,
        Thomas

^ permalink raw reply

* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 13:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391865273.10160.76.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> > tcp corking kills iSCSI performance

> Here is the combined patch, could you test it?

The patch did not apply, so I applied it by hand. Here is the resulting
patch:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 03d26b8..40d1958 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -698,7 +698,8 @@ static void tcp_tsq_handler(struct sock *sk)
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+			tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+	                               0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1904,7 +1905,16 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
+
 		}
 
 		limit = mss_now;

-- cut here --

It fixes my case, but if you look at the round trip time it is not even
close to what it used to be. So while this fixes my problem, I'm still for
disabling it by default.

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Cheers,
        Thomas

^ permalink raw reply related

* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:33 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391865273.10160.76.camel@edumazet-glaptop2.roam.corp.google.com>

On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> Here is the combined patch, could you test it ?

Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
("tcp: autocork should not hold first packet in write queue")
in your tree.

^ permalink raw reply

* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:14 UTC (permalink / raw)
  To: Thomas Glanzmann, John Ogness
  Cc: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208093808.GD16336@glanzmann.de>

On Sat, 2014-02-08 at 10:38 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> [RESEND: the VMFS creation times were swapped between on and off. With
> autocorking on it took over 2 minutes; with it off it took less than 4
> seconds.]
> 
> [RESEND 2: The throughput graphs were switched as well ;-(]
> 
> > * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > > and 15 minutes on 3.14.0-rc2+.
> 
> * Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> > Would it be possible to try a couple of different stable kernel
> > versions to help track this down?
> 
> I bisected[1] it and found the offending commit f54b311 tcp auto corking
> [2] 'if we have a small send and a previous packet is already in the
> qdisc or device queue, defer until TX completion or we get more data.'
> - Description by David S. Miller
> 
> I gathered a pcap with tcp_autocorking on and off.
> 
> On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png
> 
> Off: - took 4 seconds to create a 500 GB VMFS file system
> sysctl net.ipv4.tcp_autocorking=0
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png
> 
> The first graph can be generated by bunzipping the file, opening it
> in Wireshark, selecting Statistics > IO Graph and changing the unit to
> Bytes/Tick. The second graph can be generated by selecting Statistics >
> TCP Stream Graph > Round Trip Time.
> 
> You can also see that the round trip time increases by a factor of at
> least 25.
> 
> I once saw a similar problem with delayed ACK packets in the
> paravirtualized network driver in Xen. It caused the TCP window to
> fill up and slowed the throughput from 30 MB/s to less than 100
> KB/s. The symptom was that logging in to a Windows desktop took more
> than 10 minutes, where it used to take less than 30 seconds, because the
> user's profile was loaded slowly from a CIFS server. At that time the
> culprits were also delayed small packets: ACK packets in the CIFS case.
> So far I have only proven the iSCSI regression for TCP auto corking, but
> I assume we will see many others if we leave it enabled.
> 
> I found the problem by doing the following:
>         - I compiled the kernel by executing the following commands:
>                 yes '' | make oldconfig
>                 time make -j 24
>                 / make modules_install
>                 / mkinitramfs -o /boot/initrd.img-bisect <version>
> 
>         - I cleaned the iSCSI configuration after each test by issuing:
>                 /etc/init.d/target stop
>                 rm /iscsi?/* /etc/target/*
> 
>         - I configured iSCSI after each reboot
>                 cat > lio-v101.conf <<EOF
> set global auto_cd_after_create=false
> /backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true
> 
> /iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
> 
> saveconfig
> yes
> EOF
>                 targetcli < lio-v101.conf
>                 Then I connected a freshly booted ESXi 5.5.0 1331820 (via autodeploy)
>                 to the iSCSI target, configured the portal, rescanned,
>                 created a 500 GB VMFS 5 filesystem and noted the time: if it
>                 took longer than 2 minutes the kernel was bad; if it took less
>                 than 10 seconds it was good.
>                 git bisect good/bad
> 
> My network config is:
> 
> auto bond0
> iface bond0 inet static
>        address 10.100.4.62
>        netmask 255.255.0.0
>        gateway 10.100.0.1
>        slaves eth0 eth1
>        bond-mode 802.3ad
>        bond-miimon 100
> 
> auto bond0.101
> iface bond0.101 inet static
>        address 10.101.99.4
>        netmask 255.255.0.0
> 
> auto bond1
> iface bond1 inet static
>        address 10.100.5.62
>        netmask 255.255.0.0
>        slaves eth2 eth3
>        bond-mode 802.3ad
>        bond-miimon 100
> 
> auto bond1.101
> iface bond1.101 inet static
>        address 10.101.99.5
>        netmask 255.255.0.0
> 
> I propose to disable tcp_autocorking by default because it obviously degrades
> iSCSI performance and probably that of many other protocols. The commit also
> mentions that applications can explicitly disable auto corking; we should
> probably do that for the iSCSI target, but I don't know how. Anyone?
> 
> [1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
> [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3
> 
> Cheers,
>         Thomas

Hi Thomas, thanks a lot for this very detailed bug report.

I think you are hit by other bug(s); let's try to fix it/them instead of
disabling this feature.

John Ogness started a thread yesterday about TCP_NODELAY being hit by
the TCP Small Queue mechanism, which is the base of TCP auto corking.

Two RFC patches were discussed.

One dealing with the TCP_NODELAY flag that John posted, and I'll adapt
it to the current kernel.

One dealing with a possible race, that I suggested (I doubt this could
trigger at every write, but let's fix it anyway).

Here is the combined patch, could you test it ?

Thanks !

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 10435b3b9d0f..3be16727f058 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -698,7 +698,8 @@ static void tcp_tsq_handler(struct sock *sk)
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+		tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+			       0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1904,7 +1905,15 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
 		}
 
 		limit = mss_now;
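
The second hunk follows a "set the flag, then test the condition again" pattern. A minimal userspace sketch of that pattern with C11 atomics is shown below. It is not the kernel code: struct flow, must_wait(), tx_complete(), drain_all() and the race_hook parameter are hypothetical names for this illustration, and the default sequentially consistent atomics stand in for the barrier used in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_wmem_alloc and the TSQ_THROTTLED bit. */
struct flow {
    atomic_int  queued;     /* bytes still sitting in qdisc / NIC queues */
    atomic_bool throttled;  /* sender asked to be woken on TX completion */
};

/* Completion side: release bytes, clear the flag, and report whether a
 * throttled sender was waiting for a wakeup. */
static bool tx_complete(struct flow *f, int bytes)
{
    atomic_fetch_sub(&f->queued, bytes);
    return atomic_exchange(&f->throttled, false);
}

/* Sender side: returns true if the sender must stop and wait.
 * race_hook is a test-only hook that runs where a concurrent completion
 * could slip in, between setting the flag and testing again. */
static bool must_wait(struct flow *f, int limit,
                      void (*race_hook)(struct flow *))
{
    if (atomic_load(&f->queued) <= limit)
        return false;

    atomic_store(&f->throttled, true);  /* seq_cst store orders the re-test */
    if (race_hook)
        race_hook(f);

    /* Completion may already have run before the flag was set, in which
     * case nobody will wake us up: test the condition again. */
    return atomic_load(&f->queued) > limit;
}

/* Test helper: simulate a completion that frees everything queued. */
static void drain_all(struct flow *f)
{
    tx_complete(f, atomic_load(&f->queued));
}
```

Without the second test, a sender that lost the race would stop sending although no completion is left to restart it.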

^ permalink raw reply related

* REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08  9:38 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Nicholas A. Bellinger, target-devel, Linux Network Development,
	LKML
In-Reply-To: <20140207205142.GA8609@glanzmann.de>

Hello Eric,

[RESEND: the VMFS creation times were swapped between on and off. With
autocorking on it took over 2 minutes; with it off it took less than 4
seconds.]

[RESEND 2: The throughput graphs were switched as well ;-(]

> * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png
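
The sysctl used above is also exposed as the file /proc/sys/net/ipv4/tcp_autocorking. A small sketch of reading and writing such a value from C follows; read_int_file() and write_int_file() are hypothetical helper names for this example, and writing the real sysctl file requires root.

```c
#include <stdio.h>

/* procfs path corresponding to "sysctl net.ipv4.tcp_autocorking" */
#define AUTOCORK_PATH "/proc/sys/net/ipv4/tcp_autocorking"

/* Read one integer from a file. Returns 0 on success, -1 on error. */
static int read_int_file(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write one integer followed by a newline. Returns 0 on success, -1 on error. */
static int write_int_file(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = (fprintf(f, "%d\n", value) > 0);
    if (fclose(f) != 0)     /* write errors may only show up at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_int_file(AUTOCORK_PATH, 0) as root would have the same effect as the sysctl command; on kernels without the feature the file does not exist and both helpers return -1.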

The first graph can be generated by bunzipping the file, opening it
in Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to
fill up and slowed the throughput from 30 MB/s to less than 100
KB/s. The symptom was that logging in to a Windows desktop took more
than 10 minutes, where it used to take less than 30 seconds, because the
user's profile was loaded slowly from a CIFS server. At that time the
culprits were also delayed small packets: ACK packets in the CIFS case.
So far I have only proven the iSCSI regression for TCP auto corking, but
I assume we will see many others if we leave it enabled.

I found the problem by doing the following:
        - I compiled the kernel by executing the following commands:
                yes '' | make oldconfig
                time make -j 24
                / make modules_install
                / mkinitramfs -o /boot/initrd.img-bisect <version>

        - I cleaned the iSCSI configuration after each test by issuing:
                /etc/init.d/target stop
                rm /iscsi?/* /etc/target/*

        - I configured iSCSI after each reboot
                cat > lio-v101.conf <<EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
                targetcli < lio-v101.conf
                Then I connected a freshly booted ESXi 5.5.0 1331820 (via autodeploy)
                to the iSCSI target, configured the portal, rescanned,
                created a 500 GB VMFS 5 filesystem and noted the time: if it
                took longer than 2 minutes the kernel was bad; if it took less
                than 10 seconds it was good.
                git bisect good/bad

My network config is:

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

I propose to disable tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking; we should
probably do that for the iSCSI target, but I don't know how. Anyone?

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3

Cheers,
        Thomas

^ permalink raw reply

* REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08  9:23 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Nicholas A. Bellinger, target-devel, Linux Network Development,
	LKML
In-Reply-To: <20140207205142.GA8609@glanzmann.de>

Hello Eric,

[RESEND: the VMFS creation times were swapped between on and off. With
autocorking on it took over 2 minutes; with it off it took less than 4
seconds.]

> * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be generated by bunzipping the file, opening it
in Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to
fill up and slowed the throughput from 30 MB/s to less than 100
KB/s. The symptom was that logging in to a Windows desktop took more
than 10 minutes, where it used to take less than 30 seconds, because the
user's profile was loaded slowly from a CIFS server. At that time the
culprits were also delayed small packets: ACK packets in the CIFS case.
So far I have only proven the iSCSI regression for TCP auto corking, but
I assume we will see many others if we leave it enabled.

I found the problem by doing the following:
        - I compiled the kernel by executing the following commands:
                yes '' | make oldconfig
                time make -j 24
                / make modules_install
                / mkinitramfs -o /boot/initrd.img-bisect <version>

        - I cleaned the iSCSI configuration after each test by issuing:
                /etc/init.d/target stop
                rm /iscsi?/* /etc/target/*

        - I configured iSCSI after each reboot
                cat > lio-v101.conf <<EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
                targetcli < lio-v101.conf
                Then I connected a freshly booted ESXi 5.5.0 1331820 (via autodeploy)
                to the iSCSI target, configured the portal, rescanned,
                created a 500 GB VMFS 5 filesystem and noted the time: if it
                took longer than 2 minutes the kernel was bad; if it took less
                than 10 seconds it was good.
                git bisect good/bad

My network config is:

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

I propose to disable tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking; we should
probably do that for the iSCSI target, but I don't know how. Anyone?

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3

Cheers,
        Thomas

^ permalink raw reply

* [PATCH] tcp: disable auto corking by default
From: Thomas Glanzmann @ 2014-02-08  9:19 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208091828.GA16336@glanzmann.de>

When using auto corking with iSCSI, the round trip time increases by a
factor of at least 25, probably more. Other protocols are very likely also
affected.

Signed-off-by: Thomas Glanzmann <thomas@glanzmann.de>
---
 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4475b3b..da563a4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -285,7 +285,7 @@ int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
 
 int sysctl_tcp_min_tso_segs __read_mostly = 2;
 
-int sysctl_tcp_autocorking __read_mostly = 1;
+int sysctl_tcp_autocorking __read_mostly = 0;
 
 struct percpu_counter tcp_orphan_count;
 EXPORT_SYMBOL_GPL(tcp_orphan_count);
-- 
1.7.10.4

^ permalink raw reply related

* REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08  9:18 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Nicholas A. Bellinger, target-devel, Linux Network Development,
	LKML
In-Reply-To: <20140207205142.GA8609@glanzmann.de>

Hello Eric,

> * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 4 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be generated by bunzipping the file, opening it
in Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to
fill up and slowed the throughput from 30 MB/s to less than 100
KB/s. The symptom was that logging in to a Windows desktop took more
than 10 minutes, where it used to take less than 30 seconds, because the
user's profile was loaded slowly from a CIFS server. At that time the
culprits were also delayed small packets: ACK packets in the CIFS case.
So far I have only proven the iSCSI regression for TCP auto corking, but
I assume we will see many others if we leave it enabled.

I found the problem by doing the following:
        - I compiled the kernel by executing the following commands:
                yes '' | make oldconfig
                time make -j 24
                / make modules_install
                / mkinitramfs -o /boot/initrd.img-bisect <version>

        - I cleaned the iSCSI configuration after each test by issuing:
                /etc/init.d/target stop
                rm /iscsi?/* /etc/target/*

        - I configured iSCSI after each reboot
                cat > lio-v101.conf <<EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
                targetcli < lio-v101.conf
                Then I connected a freshly booted ESXi 5.5.0 1331820 (via autodeploy)
                to the iSCSI target, configured the portal, rescanned,
                created a 500 GB VMFS 5 filesystem and noted the time: if it
                took longer than 2 minutes the kernel was bad; if it took less
                than 10 seconds it was good.
                git bisect good/bad

My network config is:

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

I propose disabling tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking; we should
probably do that for the iSCSI target, but I don't know how. Anyone?
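
For anyone reproducing this, the global switch is the tcp_autocorking sysctl added by the commit in [2]. Below is a minimal userspace sketch in C for reading and flipping it. It assumes the usual procfs path /proc/sys/net/ipv4/tcp_autocorking; the helper names read_bool_sysctl and write_bool_sysctl are made up for illustration. It does not answer the per-socket question above.

```c
#include <stdio.h>

/* Assumed procfs location of the knob on kernels that carry the commit. */
#define TCP_AUTOCORKING_PATH "/proc/sys/net/ipv4/tcp_autocorking"

/* Read a 0/1 sysctl file.
 * Returns 0 or 1, or -1 if the file is missing or holds no valid integer. */
int read_bool_sysctl(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value < 0 ? -1 : (value != 0);
}

/* Write 0 or 1 to a sysctl file (the real path needs root).
 * Returns 0 on success, -1 on failure. */
int write_bool_sysctl(const char *path, int enabled)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", enabled ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling write_bool_sysctl(TCP_AUTOCORKING_PATH, 0) as root should have the same effect as "sysctl -w net.ipv4.tcp_autocorking=0". Note that this changes the behaviour for every TCP socket on the host, not only for the iSCSI target.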

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3

Cheers,
        Thomas

^ permalink raw reply

* [PATCH] sections, ipvs: Remove useless __read_mostly for ipvs genl_ops
From: Andi Kleen @ 2014-02-08  7:57 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Andi Kleen, Wensong Zhang, Simon Horman,
	Patrick McHardy, lvs-devel

const __read_mostly does not make any sense, because const
data is already read-only. Remove the __read_mostly
for the ipvs genl_ops. This avoids an LTO
section conflict compile problem.

Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Patrick McHardy <kaber@trash.net>
Cc: lvs-devel@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 35be035..2a68a38 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3580,7 +3580,7 @@ out:
 }
 
 
-static const struct genl_ops ip_vs_genl_ops[] __read_mostly = {
+static const struct genl_ops ip_vs_genl_ops[] = {
 	{
 		.cmd	= IPVS_CMD_NEW_SERVICE,
 		.flags	= GENL_ADMIN_PERM,
-- 
1.8.5.2


^ permalink raw reply related

* Re: [PATCH] net: rfkill-regulator: Add devicetree support.
From: Bill Fink @ 2014-02-08  6:22 UTC (permalink / raw)
  To: Marek Belisko
  Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak, rob,
	linville, johannes, davem, grant.likely, neilb, hns, devicetree,
	linux-doc, linux-kernel, linux-wireless, netdev
In-Reply-To: <1391802529-29861-1-git-send-email-marek@goldelico.com>

On Fri,  7 Feb 2014, Marek Belisko wrote:

> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Marek Belisko <marek@goldelico.com>
> ---
> Based on Neil's patch and extend for documentation and bindings include.
> 
>  .../bindings/net/rfkill/rfkill-relugator.txt       | 28 ++++++++++++++++

                                  ^^^^^^^^^
                                  Typo in file name.

					-Bill



>  include/dt-bindings/net/rfkill-regulator.h         | 23 +++++++++++++
>  net/rfkill/rfkill-regulator.c                      | 38 ++++++++++++++++++++++
>  3 files changed, 89 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
>  create mode 100644 include/dt-bindings/net/rfkill-regulator.h
> 
> diff --git a/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt b/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
> new file mode 100644
> index 0000000..cdb7dd7
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/rfkill/rfkill-relugator.txt
> @@ -0,0 +1,28 @@
> +Regulator consumer for rfkill devices
> +
> +Required properties:
> +- compatible   : Must be "rfkill-regulator".
> +- label  : Name of rfkill device.
> +- type  : Type of rfkill device.
> +
> +Possible values (defined in include/dt-bindings/net/rfkill-regulator.h):
> +	RFKILL_TYPE_ALL
> +	RFKILL_TYPE_WLAN
> +	RFKILL_TYPE_BLUETOOTH
> +	RFKILL_TYPE_UWB
> +	RFKILL_TYPE_WIMAX
> +	RFKILL_TYPE_WWAN
> +	RFKILL_TYPE_GPS
> +	RFKILL_TYPE_FM
> +	RFKILL_TYPE_NFC
> +
> +- vrfkill-supply - regulator device.
> +
> +Example:
> +	gps-rfkill {
> +		compatible = "rfkill-regulator";
> +		label = "GPS";
> +		type = <RFKILL_TYPE_GPS>;
> +		vrfkill-supply = <&reg>;
> +	};
> +

^ permalink raw reply

* Re: [PATCH net-next] igb: enable VLAN stripping for VMs with i350
From: Aaron Brown @ 2014-02-08  5:29 UTC (permalink / raw)
  To: Stefan Assmann; +Cc: e1000-devel, netdev, davem
In-Reply-To: <1386754354-22039-1-git-send-email-sassmann@kpanic.de>

On Wed, 2013-12-11 at 10:32 +0100, Stefan Assmann wrote:
> For i350 VLAN stripping for VMs is not enabled in the VMOLR register but in
> the DVMOLR register. Making the changes accordingly. It's not necessary to
> unset the E1000_VMOLR_STRVLAN bit on i350 as the hardware will simply ignore
> it.
> 
> Without this change if a VLAN is configured for a VF assigned to a guest
> via (i.e.)
> ip link set p1p1 vf 0 vlan 10
> the VLAN tag will not be stripped from packets going into the VM. Which they
> should be because the VM itself is not aware of the VLAN at all.
> 
> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>

> ---
>  drivers/net/ethernet/intel/igb/e1000_82575.h | 4 ++++
>  drivers/net/ethernet/intel/igb/e1000_regs.h  | 1 +
>  drivers/net/ethernet/intel/igb/igb_main.c    | 7 +++++++
>  3 files changed, 12 insertions(+)




^ permalink raw reply

* Re: ax88179 regression
From: renevant @ 2014-02-08  3:56 UTC (permalink / raw)
  To: renevant; +Cc: netdev
In-Reply-To: <1890496.0JLCYTPnkG@athas>

Everything is still working at commit 63a67a72d63dd077c2313cf19eb29d8e4bfa6963 

At this point I'm beginning to think all the issues I've been having are
motherboard BIOS and compiler flag issues.

Regards,

Will Trives


On Saturday 08 February 2014 12:56:10 renevant@internode.on.net wrote:
> Hello,
> 
> I have finally nailed down my other issues and I'm at a point where I can
> bisect from a point that is 100% stable and working.
> 
> 
> I am currently running a kernel checked out at commit
> d194c031994d3fc1038fa09e9e92d9be24a21921
> 
> A point in 3.12rc4
> 
> At this point the ax88179 works without issue even with scatter gather
> turned on. So somewhere from this point something goes wrong and there is
> some condition that exists that can lock up the nic.
> 
> 
> I will keep bisecting and report my findings.
> 
> 
> Regards,
> 
> Will Trives

^ permalink raw reply

* Re: [PATCH v3 net 2/9] bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr
From: Toshiaki Makita @ 2014-02-08  2:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Toshiaki Makita, David S . Miller, Vlad Yasevich, netdev
In-Reply-To: <20140207093127.56f78187@samsung-9>

On Fri, 2014-02-07 at 09:31 -0700, Stephen Hemminger wrote:
> On Fri,  7 Feb 2014 16:48:19 +0900
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> 
> > Since commit bc9a25d21ef8 ("bridge: Add vlan support for local fdb entries"),
> > br_fdb_changeaddr() has inserted a new local fdb entry only if it can
> > find old one. But if we have two ports where they have the same address
> > or user has deleted a local entry, there will be no entry for one of the
> > ports.
> > 
> > Example of problematic case:
> >   ip link set eth0 address aa:bb:cc:dd:ee:ff
> >   ip link set eth1 address aa:bb:cc:dd:ee:ff
> >   brctl addif br0 eth0
> >   brctl addif br0 eth1 # eth1 will not have a local entry due to dup.
> 
> I think the second addif should fail, it doesn't seem valid to have
> two interfaces on same bridge with same address. Most hardware switches
> would disable the port in that case.

Thank you for your comment, but I don't think so for several reasons.

- From other network elements on the same network, bridge ports don't
appear to have a MAC address; rather, the bridge appears to have several
MAC addresses through which it can be reached. The duplicated address is
simply seen as one of those addresses. I don't think it is a problem.

- This operation (adding a port that has a duplicated address) has been
allowed for several years, and it is obviously intended, as commented in
fdb_insert().

                /* it is okay to have multiple ports with same
                 * address, just use the first one.
                 */

- Hardware switches usually have one MAC address per switch. Their
ports don't have MAC addresses, so the comparison with hardware
switches is not a reasonable one.
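
The "just use the first one" rule from the fdb_insert() comment can be illustrated with a toy userspace model. This is only a sketch: the table layout and the names fdb_table, fdb_find and fdb_insert_local are invented here and cover only the local-entry case, not the real bridge code.

```c
#include <stddef.h>
#include <string.h>

#define FDB_MAX 16
#define MAC_LEN 6

struct fdb_entry {
	unsigned char addr[MAC_LEN];
	int port;		/* port that owns the local entry */
};

struct fdb_table {
	struct fdb_entry entries[FDB_MAX];
	int count;
};

/* Return the entry for addr, or NULL if there is none. */
struct fdb_entry *fdb_find(struct fdb_table *t, const unsigned char *addr)
{
	for (int i = 0; i < t->count; i++)
		if (memcmp(t->entries[i].addr, addr, MAC_LEN) == 0)
			return &t->entries[i];
	return NULL;
}

/* Insert a local entry for a port's own address.
 * A duplicate address is not an error: the first port keeps the entry.
 * Returns 0 on success, -1 if the table is full. */
int fdb_insert_local(struct fdb_table *t, int port, const unsigned char *addr)
{
	if (fdb_find(t, addr))
		return 0;
	if (t->count == FDB_MAX)
		return -1;
	memcpy(t->entries[t->count].addr, addr, MAC_LEN);
	t->entries[t->count].port = port;
	t->count++;
	return 0;
}
```

In this model, adding two ports with aa:bb:cc:dd:ee:ff succeeds both times and leaves a single entry owned by the first port, which is the situation the patch has to handle when one of those ports later changes its address.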

Thanks,
Toshiaki Makita

^ permalink raw reply

* [PATCH v2] SUNRPC: Allow one callback request to be received from two sk_buff
From: shaobingqing @ 2014-02-08  2:29 UTC (permalink / raw)
  To: trond.myklebust, bfields, davem
  Cc: linux-nfs, netdev, linux-kernel, shaobingqing
In-Reply-To: <no>

In the current code, only one struct rpc_rqst is preallocated. If one
callback request is received in two sk_buffs, xprt_alloc_bc_request
is executed twice with the same transport->tcp_xid. The first call
allocates the one struct rpc_rqst, and the TCP_RCV_COPY_DATA bit of
transport->tcp_flags is not cleared. The second call cannot allocate
another struct rpc_rqst and returns a NULL pointer, after which
xprt_force_disconnect is called. I think one callback request should be
allowed to be received from two sk_buffs.

Signed-off-by: shaobingqing <shaobingqing@bwstor.com.cn>
---
 include/linux/sunrpc/xprt.h |    1 +
 net/sunrpc/xprt.c           |    1 +
 net/sunrpc/xprtsock.c       |   13 ++++++++++++-
 3 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index cec7b9b..82bfe01 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -211,6 +211,7 @@ struct rpc_xprt {
 						 * items */
 	struct list_head	bc_pa_list;	/* List of preallocated
 						 * backchannel rpc_rqst's */
+	struct rpc_rqst	*req_first;
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 	struct list_head	recv;
 
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 095363e..93ad8bc 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1256,6 +1256,7 @@ static void xprt_init(struct rpc_xprt *xprt, struct net *net)
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
 	spin_lock_init(&xprt->bc_pa_lock);
 	INIT_LIST_HEAD(&xprt->bc_pa_list);
+	xprt->req_first = NULL;
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 
 	xprt->last_used = jiffies;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ee03d35..c43dca4 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1272,7 +1272,16 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
 				container_of(xprt, struct sock_xprt, xprt);
 	struct rpc_rqst *req;
 
-	req = xprt_alloc_bc_request(xprt);
+	if (xprt->req_first != NULL &&
+			xprt->req_first->rq_xid == transport->tcp_xid) {
+		req = xprt->req_first;
+	} else if (xprt->req_first != NULL &&
+			xprt->req_first->rq_xid != transport->tcp_xid) {
+		xprt_free_bc_request(xprt);
+		req = xprt_alloc_bc_request(xprt);
+	} else {
+		req = xprt_alloc_bc_request(xprt);
+	}
 	if (req == NULL) {
 		printk(KERN_WARNING "Callback slot table overflowed\n");
 		xprt_force_disconnect(xprt);
@@ -1297,6 +1306,8 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
 		list_add(&req->rq_bc_list, &bc_serv->sv_cb_list);
 		spin_unlock(&bc_serv->sv_cb_lock);
 		wake_up(&bc_serv->sv_cb_waitq);
+	} else {
+		xprt->req_first = req;
 	}
 
 	req->rq_private_buf.len = transport->tcp_copied;
-- 
1.7.4.2
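
The idea in the commit message (reuse the partially received request when the XID matches instead of allocating again) can be sketched as a small userspace model. This is not the patch's exact logic: the types bc_req and bc_xprt and the functions bc_alloc, bc_free and bc_receive are hypothetical, and clearing the cached pointer once the last fragment arrives is an assumption of this sketch.

```c
#include <stddef.h>

struct bc_req {
	unsigned int xid;
	int in_use;
	size_t copied;			/* bytes received so far */
};

struct bc_xprt {
	struct bc_req slot;		/* the single preallocated request */
	struct bc_req *req_first;	/* partially received request, if any */
};

/* Hand out the only slot, or NULL if it is already taken. */
struct bc_req *bc_alloc(struct bc_xprt *x, unsigned int xid)
{
	if (x->slot.in_use)
		return NULL;
	x->slot.in_use = 1;
	x->slot.xid = xid;
	x->slot.copied = 0;
	return &x->slot;
}

/* Release the slot and forget any partial request. */
void bc_free(struct bc_xprt *x)
{
	x->slot.in_use = 0;
	x->req_first = NULL;
}

/* Feed one fragment of a callback request.
 * Returns 0 on success, -1 when no request could be obtained
 * (the point where the real code would force a disconnect). */
int bc_receive(struct bc_xprt *x, unsigned int xid, size_t len, int last)
{
	struct bc_req *req;

	if (x->req_first && x->req_first->xid == xid) {
		req = x->req_first;	/* continuation of the same request */
	} else {
		if (x->req_first)
			bc_free(x);	/* stale partial request, other XID */
		req = bc_alloc(x, xid);
	}
	if (!req)
		return -1;

	req->copied += len;
	x->req_first = last ? NULL : req;
	return 0;
}
```

With this model, two fragments carrying the same XID accumulate into one request, while a second allocation attempt without the cached pointer would have failed because the only slot is still in use.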

^ permalink raw reply related

* ax88179 regression
From: renevant @ 2014-02-08  1:56 UTC (permalink / raw)
  To: netdev

Hello,

I have finally nailed down my other issues and I'm at a point where I can
bisect from a point that is 100% stable and working.


I am currently running a kernel checked out at commit 
d194c031994d3fc1038fa09e9e92d9be24a21921 

A point in 3.12rc4

At this point the ax88179 works without issue even with scatter gather turned 
on. So somewhere from this point something goes wrong and there is some 
condition that exists that can lock up the nic.


I will keep bisecting and report my findings.


Regards,

Will Trives

^ permalink raw reply

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Ding Tianhong @ 2014-02-08  1:43 UTC (permalink / raw)
  To: Jay Vosburgh, sfeldma@cumulusnetworks.com
  Cc: Cong Wang, Thomas Glanzmann, Eric Dumazet, Veaceslav Falico, andy,
	Jiří Pírko, netdev
In-Reply-To: <7882.1391822502@death.nxdomain>

On 2014/2/8 9:21, Jay Vosburgh wrote:
> Jay Vosburgh <fubar@us.ibm.com> wrote:
> 
>>
>> Cong Wang <cwang@twopensource.com> wrote:
>>
>>> On Thu, Feb 6, 2014 at 2:07 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>>> Jay Vosburgh <fubar@us.ibm.com> wrote:
>>>>
>>>>> Cong Wang <cwang@twopensource.com> wrote:
>>>>>
>>>>>
>>>>>       That would eliminate the warning, but is suboptimal.  Acquiring
>>>>> RTNL is not necessary on the vast majority of state machine runs
>>>>> (because no state changes take place, i.e., no ports are disabled or
>>>>> enabled).  The above change would add 10 round trips per second to RTNL,
>>>>> which seems excessive.
>>>>>
>>>>>       Also, we cannot unconditionally acquire RTNL in this function,
>>>>> as it would race with the call to cancel_delayed_work_sync from
>>>>> bond_close (via bond_work_cancel_all).
>>>
>>> OK.
>>>
>>>>
>>>>         Thought of one more problem: we can't hold a regular lock while
>>>> calling rtmsg_ifinfo, as it may sleep in alloc_skb.  The rtmsg_ifinfo
>>>> call has to be RTNL and nothing else.
>>>>
>>>
>>> s/GFP_KERNEL/GFP_ATOMIC/
>>
>> 	Yah, that would help with extra locks, but not totally solve
>> things.  I'm looking around, and seeing a number of other places that
>> will end up at one of these rtmsg_ifinfo calls with incorrect locking:
>>
>> 	bond_ab_arp_probe calls via bond_set_slave_active_flags and
>> bond_set_slave_inactive_flags without RTNL.
>>
>> 	bond_change_active_slave calls via bond_set_slave_inactive_flags
>> and bond_set_slave_active_flags with other locks held, and maybe without
>> RTNL; I'm not sure if bond_option_active_slave_set holds RTNL when it
>> calls bond_select_active_slave.
>>
>> 	bond_open calls via bond_set_slave_active_flags and
>> bond_set_slave_inactive_flags with RTNL, but also with other locks held.
>>
>> 	bond_loadbalance_arp_mon calls bond_set_active_slave and
>> bond_set_backup_slave without RTNL.
>>
>> 	This is in addition to the cases in the 802.3ad code from
>> __enable_port and __disable_port calls.
> 
> 	Just an update in case anybody else is looking into this, and
> some questions for Scott.
> 
> 	Acquiring RTNL for the __enable_port and __disable_port cases is
> difficult, as those calls generally already hold the state machine lock,
> and cannot unconditionally call rtnl_lock because either they already
> hold RTNL (for calls via bond_3ad_unbind_slave) or due to the potential
> for deadlock with bond_3ad_adapter_speed_changed,
> bond_3ad_adapter_duplex_changed, bond_3ad_link_change, or
> bond_3ad_update_lacp_rate.  All four of those are called with RTNL held,
> and acquire the state machine lock second.  The calling contexts for
> __enable_port and __disable_port already hold the state machine lock,
> and may or may not need RTNL.

Agreed, it is hard to add RTNL here; a deadlock can easily happen.

> 
> 	Scott: you added these calls, so can you explain what they're
> for?  I'm asking for two reasons:
> 
> 	First, if they do not occur synchronously is it going to be a
> problem?  E.g., for the 802.3ad case, if the rtmsg_ifinfo is called
> either at the end of the state machine run, or for non-state machine
> events, at the next run of the state machine (which is every 100 ms),
> would that be a problem?  Setting a flag in the slave somewhere that an
> rtmsg_ifinfo is needed should be doable for the 802.3ad case.
> 
> 	Second, what do the messages mean?  That the slave is now
> "active and usable"?  I'm asking because I suspect the bond_ab_arp_probe
> usage wherein it adjusts the flags and curr_active_slave should not
> actually call rtmsg_ifinfo, as the slave there is not really "up."
> What's going on there is that the ARP monitor cycles through each slave
> one by one, and tests to see if that slave works.  If it does work, then
> it is set as the active elsewhere in the monitor code.  This function
> adjusts the flags so that the ARP monitor will treat the "testing" slave
> as "active" for purposes of determining whether or not it is up.  I
> suspect this adjustment to the flags should not actually generate an
> rtmsg_ifinfo.
> 
> 	I think the remaining cases can be dealt with, but clarification
> on the above two questions would be very helpful.
> 
> 	-J
> 

Commit 6fde8f037e604e05df1529 fixes the problem for bond_loadbalance_arp_mon(),
and commit 66dd1c077a3f3c130d1 fixes the problem for bond_activebackup_arp_mon(),
but we still miss the 3ad monitor. I think that if the slave should send the
message by netlink, it is better to follow fdb_notify() in the bridge code. I
also wonder why we need to send so many messages; just the slave info is
enough, and then RTNL is not needed here.

Ding


> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

^ permalink raw reply

* Re: [PATCH] net: fix 'ip rule' iif/oif device rename
From: Eric Dumazet @ 2014-02-08  1:41 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Maciej Żenczykowski, David S. Miller, netdev,
	Willem de Bruijn, Eric Dumazet, Chris Davis, Carlo Contavalli
In-Reply-To: <1391819028-10722-1-git-send-email-zenczykowski@gmail.com>

On Fri, 2014-02-07 at 16:23 -0800, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> ip rules with iif/oif references do not update
> (detach/attach) across interface renames.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> CC: Willem de Bruijn <willemb@google.com>
> CC: Eric Dumazet <edumazet@google.com>
> CC: Chris Davis <chrismd@google.com>
> CC: Carlo Contavalli <ccontavalli@google.com>
> 
> Google-Bug-Id: 12936021
> ---
>  net/core/fib_rules.c | 7 +++++++
>  1 file changed, 7 insertions(+)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Jay Vosburgh @ 2014-02-08  1:21 UTC (permalink / raw)
  To: sfeldma@cumulusnetworks.com
  Cc: Cong Wang, Thomas Glanzmann, Eric Dumazet, Veaceslav Falico, andy,
	Jiří Pírko, netdev
In-Reply-To: <31653.1391725983@death.nxdomain>

Jay Vosburgh <fubar@us.ibm.com> wrote:

>
>Cong Wang <cwang@twopensource.com> wrote:
>
>>On Thu, Feb 6, 2014 at 2:07 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>> Jay Vosburgh <fubar@us.ibm.com> wrote:
>>>
>>>>Cong Wang <cwang@twopensource.com> wrote:
>>>>
>>>>
>>>>       That would eliminate the warning, but is suboptimal.  Acquiring
>>>>RTNL is not necessary on the vast majority of state machine runs
>>>>(because no state changes take place, i.e., no ports are disabled or
>>>>enabled).  The above change would add 10 round trips per second to RTNL,
>>>>which seems excessive.
>>>>
>>>>       Also, we cannot unconditionally acquire RTNL in this function,
>>>>as it would race with the call to cancel_delayed_work_sync from
>>>>bond_close (via bond_work_cancel_all).
>>
>>OK.
>>
>>>
>>>         Thought of one more problem: we can't hold a regular lock while
>>> calling rtmsg_ifinfo, as it may sleep in alloc_skb.  The rtmsg_ifinfo
>>> call has to be RTNL and nothing else.
>>>
>>
>>s/GFP_KERNEL/GFP_ATOMIC/
>
>	Yah, that would help with extra locks, but not totally solve
>things.  I'm looking around, and seeing a number of other places that
>will end up at one of these rtmsg_ifinfo calls with incorrect locking:
>
>	bond_ab_arp_probe calls via bond_set_slave_active_flags and
>bond_set_slave_inactive_flags without RTNL.
>
>	bond_change_active_slave calls via bond_set_slave_inactive_flags
>and bond_set_slave_active_flags with other locks held, and maybe without
>RTNL; I'm not sure if bond_option_active_slave_set holds RTNL when it
>calls bond_select_active_slave.
>
>	bond_open calls via bond_set_slave_active_flags and
>bond_set_slave_inactive_flags with RTNL, but also with other locks held.
>
>	bond_loadbalance_arp_mon calls bond_set_active_slave and
>bond_set_backup_slave without RTNL.
>
>	This is in addition to the cases in the 802.3ad code from
>__enable_port and __disable_port calls.

	Just an update in case anybody else is looking into this, and
some questions for Scott.

	Acquiring RTNL for the __enable_port and __disable_port cases is
difficult, as those calls generally already hold the state machine lock,
and cannot unconditionally call rtnl_lock because either they already
hold RTNL (for calls via bond_3ad_unbind_slave) or due to the potential
for deadlock with bond_3ad_adapter_speed_changed,
bond_3ad_adapter_duplex_changed, bond_3ad_link_change, or
bond_3ad_update_lacp_rate.  All four of those are called with RTNL held,
and acquire the state machine lock second.  The calling contexts for
__enable_port and __disable_port already hold the state machine lock,
and may or may not need RTNL.

	Scott: you added these calls, so can you explain what they're
for?  I'm asking for two reasons:

	First, if they do not occur synchronously is it going to be a
problem?  E.g., for the 802.3ad case, if the rtmsg_ifinfo is called
either at the end of the state machine run, or for non-state machine
events, at the next run of the state machine (which is every 100 ms),
would that be a problem?  Setting a flag in the slave somewhere that an
rtmsg_ifinfo is needed should be doable for the 802.3ad case.

	Second, what do the messages mean?  That the slave is now
"active and usable"?  I'm asking because I suspect the bond_ab_arp_probe
usage wherein it adjusts the flags and curr_active_slave should not
actually call rtmsg_ifinfo, as the slave there is not really "up."
What's going on there is that the ARP monitor cycles through each slave
one by one, and tests to see if that slave works.  If it does work, then
it is set as the active elsewhere in the monitor code.  This function
adjusts the flags so that the ARP monitor will treat the "testing" slave
as "active" for purposes of determining whether or not it is up.  I
suspect this adjustment to the flags should not actually generate an
rtmsg_ifinfo.

	I think the remaining cases can be dealt with, but clarification
on the above two questions would be very helpful.
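
The deferred approach raised in the first question (set a flag under the state machine lock, send the message later under RTNL only) can be sketched as a userspace model. Everything here is hypothetical: the locks are plain booleans so the locking rule can be checked, notify() stands in for rtmsg_ifinfo, and the bond_close race mentioned earlier in the thread is ignored.

```c
#include <stdbool.h>

struct port {
	bool enabled;
	bool notify_pending;	/* hypothetical flag: a message is still owed */
};

struct bond_model {
	bool sm_lock_held;	/* stands in for the state machine lock */
	bool rtnl_held;		/* stands in for RTNL */
	int notifications;	/* messages sent with correct locking */
	int bad_notifications;	/* messages sent with wrong locking */
};

/* Stand-in for rtmsg_ifinfo: valid only with RTNL held and nothing else. */
void notify(struct bond_model *b)
{
	if (b->rtnl_held && !b->sm_lock_held)
		b->notifications++;
	else
		b->bad_notifications++;
}

/* Called with the state machine lock held: record the change, do not notify. */
void set_port_enabled(struct port *p, bool enabled)
{
	if (p->enabled != enabled) {
		p->enabled = enabled;
		p->notify_pending = true;
	}
}

/* One state machine run over n ports; want[i] is the desired new state. */
void state_machine_run(struct bond_model *b, struct port *ports,
		       const bool *want, int n)
{
	bool any = false;

	b->sm_lock_held = true;
	for (int i = 0; i < n; i++) {
		set_port_enabled(&ports[i], want[i]);
		any |= ports[i].notify_pending;
	}
	b->sm_lock_held = false;

	if (!any)
		return;		/* common case: no state change, RTNL untouched */

	b->rtnl_held = true;
	for (int i = 0; i < n; i++) {
		if (ports[i].notify_pending) {
			ports[i].notify_pending = false;
			notify(b);
		}
	}
	b->rtnl_held = false;
}
```

The point of the design is that RTNL is taken only on runs where a port actually changed state, and never while the state machine lock is held, so the lock ordering used by the RTNL-first callers is not inverted.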

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net] net: Clear local_df only if crossing namespace.
From: Hannes Frederic Sowa @ 2014-02-08  0:58 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: David Miller, netdev, Templin, Fred L, nicolas.dichtel
In-Reply-To: <CALnjE+pbcbENWjQcE8T0QLa=d0iia3EBBVJz-sGVzAZbsQarLQ@mail.gmail.com>

[Cc Nicolas]

On Fri, Feb 07, 2014 at 02:49:20PM -0800, Pravin Shelar wrote:
> On Fri, Feb 7, 2014 at 2:28 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > On Fri, Feb 07, 2014 at 02:12:38PM -0800, Pravin wrote:
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -3905,12 +3905,13 @@ EXPORT_SYMBOL(skb_try_coalesce);
> >>   */
> >>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> >>  {
> >> -     if (xnet)
> >> +     if (xnet) {
> >>               skb_orphan(skb);
> >> +             skb->local_df = 0;
> >> +     }
> >>       skb->tstamp.tv64 = 0;
> >>       skb->pkt_type = PACKET_HOST;
> >>       skb->skb_iif = 0;
> >> -     skb->local_df = 0;
> >>       skb_dst_drop(skb);
> >>       skb->mark = 0;
> >>       secpath_reset(skb);
> >
> > I wonder if this should be the right behaviour for tunnels, which should just
> > do fragmentation based on IP_DF, even if the packet originated locally from a
> > socket which allowed local fragmentation (inet->pmtudisc < IP_PMTUDISC_DO).
> >
> This is not about tunneling, skb_scrub_packet() is generic function
> which should not reset local_df on all packets.
> 
> We can have separate discussion about use of local_df and tunneling in
> another thread.

This change only affects tunnel code as of the current net branch, so I
really wonder how you can expect no discussion about that in this thread.

May I ask which vport, vxlan or gre, prompted this change?

I am feeling a bit uncomfortable handling remote and local packets that
differently on lower tunnel output (local_df is mostly set on locally
originating packets).
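
To make the behavioural difference in the quoted hunk concrete, here is a userspace model of the patched function. It is a sketch only: struct pkt and scrub_packet are invented stand-ins with a handful of fields, not the real sk_buff API.

```c
#include <stdbool.h>

struct pkt {
	bool has_owner;		/* stands in for the socket ownership dropped by skb_orphan() */
	bool local_df;
	unsigned int mark;
	int iif;
};

/* Model of the patched skb_scrub_packet(): ownership and local_df are
 * cleared only when crossing a namespace (xnet); the rest is always reset. */
void scrub_packet(struct pkt *p, bool xnet)
{
	if (xnet) {
		p->has_owner = false;
		p->local_df = false;
	}
	p->iif = 0;
	p->mark = 0;
}
```

With xnet false, a locally originated packet keeps local_df set on its way into the tunnel output path, which is exactly the case the questions above are about.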

Thanks,

  Hannes

^ permalink raw reply

* [PATCH V2] staging: r8188eu: Fix missing header
From: Larry Finger @ 2014-02-08  0:38 UTC (permalink / raw)
  To: gregkh; +Cc: devel, netdev, Larry Finger

Commit 2397c6e0927675d983b34a03401affdb64818d07 entitled "staging: r8188eu:
Remove wrappers around vmalloc and vzalloc" and
commit 03bd6aea7ba610a1a19f840c373624b8b0adde0d entitled "staging: r8188eu:
Remove wrappers around vfree" failed to add the header file needed
to provide vzalloc and vfree.

This problem was reported by the kbuild test robot.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
---

V2 - add attribution to the build robot
---

 drivers/staging/rtl8188eu/core/rtw_mlme.c      | 1 +
 drivers/staging/rtl8188eu/core/rtw_mp.c        | 1 +
 drivers/staging/rtl8188eu/core/rtw_recv.c      | 1 +
 drivers/staging/rtl8188eu/core/rtw_sta_mgt.c   | 1 +
 drivers/staging/rtl8188eu/core/rtw_xmit.c      | 1 +
 drivers/staging/rtl8188eu/os_dep/ioctl_linux.c | 1 +
 drivers/staging/rtl8188eu/os_dep/usb_intf.c    | 1 +
 7 files changed, 7 insertions(+)

diff --git a/drivers/staging/rtl8188eu/core/rtw_mlme.c b/drivers/staging/rtl8188eu/core/rtw_mlme.c
index 2037be0..927fc72 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mlme.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mlme.c
@@ -31,6 +31,7 @@
 #include <wlan_bssdef.h>
 #include <rtw_ioctl_set.h>
 #include <usb_osintf.h>
+#include <linux/vmalloc.h>
 
 extern unsigned char	MCS_rate_2R[16];
 extern unsigned char	MCS_rate_1R[16];
diff --git a/drivers/staging/rtl8188eu/core/rtw_mp.c b/drivers/staging/rtl8188eu/core/rtw_mp.c
index 9e97b57..99c06c4 100644
--- a/drivers/staging/rtl8188eu/core/rtw_mp.c
+++ b/drivers/staging/rtl8188eu/core/rtw_mp.c
@@ -23,6 +23,7 @@
 
 #include "odm_precomp.h"
 #include "rtl8188e_hal.h"
+#include <linux/vmalloc.h>
 
 u32 read_macreg(struct adapter *padapter, u32 addr, u32 sz)
 {
diff --git a/drivers/staging/rtl8188eu/core/rtw_recv.c b/drivers/staging/rtl8188eu/core/rtw_recv.c
index 8490d51..ed308ff 100644
--- a/drivers/staging/rtl8188eu/core/rtw_recv.c
+++ b/drivers/staging/rtl8188eu/core/rtw_recv.c
@@ -28,6 +28,7 @@
 #include <ethernet.h>
 #include <usb_ops.h>
 #include <wifi.h>
+#include <linux/vmalloc.h>
 
 static u8 SNAP_ETH_TYPE_IPX[2] = {0x81, 0x37};
 static u8 SNAP_ETH_TYPE_APPLETALK_AARP[2] = {0x80, 0xf3};
diff --git a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
index 6df9669..e8a654d 100644
--- a/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
+++ b/drivers/staging/rtl8188eu/core/rtw_sta_mgt.c
@@ -25,6 +25,7 @@
 #include <xmit_osdep.h>
 #include <mlme_osdep.h>
 #include <sta_info.h>
+#include <linux/vmalloc.h>
 
 static void _rtw_init_stainfo(struct sta_info *psta)
 {
diff --git a/drivers/staging/rtl8188eu/core/rtw_xmit.c b/drivers/staging/rtl8188eu/core/rtw_xmit.c
index aa77270..2c0a40f 100644
--- a/drivers/staging/rtl8188eu/core/rtw_xmit.c
+++ b/drivers/staging/rtl8188eu/core/rtw_xmit.c
@@ -26,6 +26,7 @@
 #include <ip.h>
 #include <usb_ops.h>
 #include <usb_osintf.h>
+#include <linux/vmalloc.h>
 
 static u8 P802_1H_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0xf8 };
 static u8 RFC1042_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0x00 };
diff --git a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
index 0204082..f3584dd 100644
--- a/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
+++ b/drivers/staging/rtl8188eu/os_dep/ioctl_linux.c
@@ -35,6 +35,7 @@
 
 #include <rtw_mp.h>
 #include <rtw_iol.h>
+#include <linux/vmalloc.h>
 
 #define RTL_IOCTL_WPA_SUPPLICANT	(SIOCIWFIRSTPRIV + 30)
 
diff --git a/drivers/staging/rtl8188eu/os_dep/usb_intf.c b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
index 0a585b2..8ad3948 100644
--- a/drivers/staging/rtl8188eu/os_dep/usb_intf.c
+++ b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
@@ -26,6 +26,7 @@
 #include <hal_intf.h>
 #include <rtw_version.h>
 #include <linux/usb.h>
+#include <linux/vmalloc.h>
 #include <osdep_intf.h>
 
 #include <usb_vendor_req.h>
-- 
1.8.4.5

^ permalink raw reply related

* [PATCH] net: fix 'ip rule' iif/oif device rename
From: Maciej Żenczykowski @ 2014-02-08  0:23 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S. Miller
  Cc: netdev, Willem de Bruijn, Eric Dumazet, Chris Davis,
	Carlo Contavalli

From: Maciej Żenczykowski <maze@google.com>

ip rules with iif/oif references do not update
(detach/attach) across interface renames.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
CC: Willem de Bruijn <willemb@google.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Chris Davis <chrismd@google.com>
CC: Carlo Contavalli <ccontavalli@google.com>

Google-Bug-Id: 12936021
---
 net/core/fib_rules.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index f409e0bd35c0..185c341fafbd 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -745,6 +745,13 @@ static int fib_rules_event(struct notifier_block *this, unsigned long event,
 			attach_rules(&ops->rules_list, dev);
 		break;
 
+	case NETDEV_CHANGENAME:
+		list_for_each_entry(ops, &net->rules_ops, list) {
+			detach_rules(&ops->rules_list, dev);
+			attach_rules(&ops->rules_list, dev);
+		}
+		break;
+
 	case NETDEV_UNREGISTER:
 		list_for_each_entry(ops, &net->rules_ops, list)
 			detach_rules(&ops->rules_list, dev);
-- 
1.8.3
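
The effect of the new NETDEV_CHANGENAME case can be sketched with a userspace model. The matching logic below (unbind by ifindex, rebind by name when unbound) is an assumption about what detach_rules() and attach_rules() do, the struct layouts are invented, and only the iif side is modelled.

```c
#include <string.h>

#define NAME_LEN 16

struct rule {
	char iifname[NAME_LEN];
	int iifindex;		/* -1 while not bound to a device */
};

struct dev {
	char name[NAME_LEN];
	int ifindex;
};

/* Unbind every rule currently bound to this device. */
void detach_rules(struct rule *rules, int n, const struct dev *dev)
{
	for (int i = 0; i < n; i++)
		if (rules[i].iifindex == dev->ifindex)
			rules[i].iifindex = -1;
}

/* Bind every unbound rule whose interface name matches the device. */
void attach_rules(struct rule *rules, int n, const struct dev *dev)
{
	for (int i = 0; i < n; i++)
		if (rules[i].iifindex == -1 &&
		    strcmp(dev->name, rules[i].iifname) == 0)
			rules[i].iifindex = dev->ifindex;
}

/* Model of the rename event: dev->name already holds the new name. */
void on_rename(struct rule *rules, int n, const struct dev *dev)
{
	detach_rules(rules, n, dev);
	attach_rules(rules, n, dev);
}
```

After a rename from eth0 to wan0, a rule written against "eth0" is unbound and a rule written against "wan0" picks up the device's ifindex, which is the behaviour that was missing before the patch.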

^ permalink raw reply related

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: Eric W. Biederman @ 2014-02-08  0:22 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, rientjes, linux-kernel
In-Reply-To: <20140206.222932.292588043950970246.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 06 Feb 2014 10:42:42 -0800
>
>> From: Eric Dumazet <edumazet@google.com>
>> 
>> sock_alloc_send_pskb() & sk_page_frag_refill()
>> have a loop trying high order allocations to prepare
>> skb with low number of fragments as this increases performance.
>> 
>> Problem is that under memory pressure/fragmentation, this can
>> trigger OOM while the intent was only to try the high order
>> allocations, then fallback to order-0 allocations.
>> 
>> We had various reports from unexpected regressions.
>> 
>> According to David, setting __GFP_NORETRY should be fine,
>> as the asynchronous compaction is still enabled, and this
>> will prevent OOM from kicking as in :
>  ...
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Acked-by: David Rientjes <rientjes@google.com>
>
> Applied, do we want this for -stable?

The first hunk goes back to 3.12 and the second hunk goes back to 3.8.

I think so. The change is safe, and this class of problem lets an
external attacker trigger an OOM on your box by controlling the packet
flow.
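
The pattern described in the quoted commit message (try high-order allocations opportunistically, then fall back to order-0) can be sketched in userspace C. This is a model only: try_alloc_fn is a hypothetical hook standing in for an allocation made with __GFP_NORETRY, and fragmented_alloc and no_high_order are toy allocators made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_MODEL 4096u

/* Hypothetical allocator hook: gives up quickly and returns NULL
 * instead of trying hard. */
typedef void *(*try_alloc_fn)(size_t bytes);

/* Try orders max_order..1 opportunistically, then fall back to order 0.
 * Stores the order that was actually obtained in *order_out. */
void *alloc_pages_fallback(try_alloc_fn try_alloc, unsigned int max_order,
			   unsigned int *order_out)
{
	for (unsigned int order = max_order; order > 0; order--) {
		void *p = try_alloc((size_t)PAGE_SIZE_MODEL << order);

		if (p) {
			*order_out = order;
			return p;
		}
	}
	*order_out = 0;
	return malloc(PAGE_SIZE_MODEL);	/* order-0 is allowed to try harder */
}

/* Toy "fragmented memory" allocator: anything above two pages fails. */
void *fragmented_alloc(size_t bytes)
{
	return bytes > 2 * PAGE_SIZE_MODEL ? NULL : malloc(bytes);
}

/* Toy allocator under heavy pressure: every high-order attempt fails. */
void *no_high_order(size_t bytes)
{
	(void)bytes;
	return NULL;
}
```

The important property is that a failed high-order attempt costs only a quick NULL return, so memory pressure degrades the caller to smaller fragments rather than escalating to the OOM killer.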

Eric

^ permalink raw reply

