Netdev List
* REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08  9:23 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Nicholas A. Bellinger, target-devel, Linux Network Development,
	LKML
In-Reply-To: <20140207205142.GA8609@glanzmann.de>

Hello Eric,

[RESEND: the VMFS creation times were swapped between on and off. With
autocorking on it took over 2 minutes; with it off it took less than 4
seconds.]

> * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be reproduced by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. The TCP window filled up and
throughput dropped from 30 MB/s to less than 100 KB/s. The symptom was
that logging in to a Windows desktop took more than 10 minutes, where it
used to take less than 30 seconds, because the user's profile was loaded
slowly from a CIFS server. Back then the culprit was also delayed small
packets: ACK packets in the CIFS case. So far I have only demonstrated
the iSCSI regression for tcp auto corking, but I assume we will see many
others if we leave it enabled.

I found the problem by doing the following:
        - I compiled the kernel by executing the following commands:
                yes '' | make oldconfig
                time make -j 24
                / make modules_install
                / mkinitramfs -o /boot/initrd.img-bisect <version>

        - I cleaned the iSCSI configuration after each test by issuing:
                /etc/init.d/target stop
                rm /iscsi?/* /etc/target/*

        - I configured iSCSI after each reboot
                cat > lio-v101.conf <<EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
                targetcli < lio-v101.conf
                Then I pointed a freshly booted ESXi 5.5.0 1331820 (deployed via
                autodeploy) at the iSCSI target, configured the portal, rescanned,
                created a 500 GB VMFS 5 filesystem and noted the time. If it took
                longer than 2 minutes the kernel was bad; if it took less than
                10 seconds it was good.
                git bisect good/bad
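
The good/bad criterion described above could be scripted roughly as follows. This is only a sketch: the function name `bisect_verdict` is hypothetical, and mapping in-between timings to `git bisect skip` is an assumption, since the procedure above only names the two thresholds.

```python
def bisect_verdict(seconds: float) -> str:
    """Map a measured VMFS creation time to a `git bisect` verdict."""
    if seconds > 120:   # longer than 2 minutes: regression present
        return "bad"
    if seconds < 10:    # below 10 seconds: regression absent
        return "good"
    return "skip"       # assumption: in-between timings are inconclusive


# The 2 min 24 s (autocorking on) and 4 s (autocorking off) runs from this mail.
for measured in (144.0, 4.0):
    print(f"{measured:>6.1f} s -> git bisect {bisect_verdict(measured)}")
```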

My network config is:

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

I propose disabling tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking. We should
probably do that for the iSCSI target, but I don't know how. Anyone?
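
For reference, the system-wide switch used in the test above (`sysctl net.ipv4.tcp_autocorking`) is exposed through procfs, so its current state can be checked as sketched below. The helper name `autocorking_enabled` is hypothetical, and this covers only the global knob; it does not answer the per-socket question raised above.

```python
from pathlib import Path

# Standard procfs location of the net.ipv4.tcp_autocorking sysctl.
AUTOCORK_SYSCTL = Path("/proc/sys/net/ipv4/tcp_autocorking")


def autocorking_enabled(path: Path = AUTOCORK_SYSCTL) -> bool:
    """Return True if the sysctl file at `path` holds a non-zero value."""
    return int(path.read_text().strip()) != 0


if AUTOCORK_SYSCTL.exists():  # absent on kernels without the feature
    print("tcp_autocorking enabled:", autocorking_enabled())
```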

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3

Cheers,
        Thomas


* REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08  9:38 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller
  Cc: Nicholas A. Bellinger, target-devel, Linux Network Development,
	LKML
In-Reply-To: <20140207205142.GA8609@glanzmann.de>

Hello Eric,

[RESEND: the VMFS creation times were swapped between on and off. With
autocorking on it took over 2 minutes; with it off it took less than 4
seconds.]

[RESEND 2: The throughput graphs were switched as well ;-(]

> * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be reproduced by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. The TCP window filled up and
throughput dropped from 30 MB/s to less than 100 KB/s. The symptom was
that logging in to a Windows desktop took more than 10 minutes, where it
used to take less than 30 seconds, because the user's profile was loaded
slowly from a CIFS server. Back then the culprit was also delayed small
packets: ACK packets in the CIFS case. So far I have only demonstrated
the iSCSI regression for tcp auto corking, but I assume we will see many
others if we leave it enabled.

I found the problem by doing the following:
        - I compiled the kernel by executing the following commands:
                yes '' | make oldconfig
                time make -j 24
                / make modules_install
                / mkinitramfs -o /boot/initrd.img-bisect <version>

        - I cleaned the iSCSI configuration after each test by issuing:
                /etc/init.d/target stop
                rm /iscsi?/* /etc/target/*

        - I configured iSCSI after each reboot
                cat > lio-v101.conf <<EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
                targetcli < lio-v101.conf
                Then I pointed a freshly booted ESXi 5.5.0 1331820 (deployed via
                autodeploy) at the iSCSI target, configured the portal, rescanned,
                created a 500 GB VMFS 5 filesystem and noted the time. If it took
                longer than 2 minutes the kernel was bad; if it took less than
                10 seconds it was good.
                git bisect good/bad

My network config is:

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

I propose disabling tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking. We should
probably do that for the iSCSI target, but I don't know how. Anyone?

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:14 UTC (permalink / raw)
  To: Thomas Glanzmann, John Ogness
  Cc: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208093808.GD16336@glanzmann.de>

On Sat, 2014-02-08 at 10:38 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> [RESEND: the VMFS creation times were swapped between on and off. With
> autocorking on it took over 2 minutes; with it off it took less than 4
> seconds.]
> 
> [RESEND 2: The throughput graphs were switched as well ;-(]
> 
> > * Thomas Glanzmann <thomas@glanzmann.de> [2014-02-07 08:55]:
> > > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > > and 15 minutes on 3.14.0-rc2+.
> 
> * Nicholas A. Bellinger <nab@linux-iscsi.org> [2014-02-07 20:30]:
> > Would it be possible to try a couple of different stable kernel
> > versions to help track this down?
> 
> I bisected[1] it and found the offending commit f54b311 tcp auto corking
> [2] 'if we have a small send and a previous packet is already in the
> qdisc or device queue, defer until TX completion or we get more data.'
> - Description by David S. Miller
> 
> I gathered a pcap with tcp_autocorking on and off.
> 
> On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png
> 
> Off: - took 4 seconds to create a 500 GB VMFS file system
> sysctl net.ipv4.tcp_autocorking=0
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png
> 
> The first graph can be reproduced by bunzipping the file, opening it in
> Wireshark, selecting Statistics > IO Graph and changing the unit to
> Bytes/Tick. The second graph can be generated by selecting Statistics >
> TCP Stream Graph > Round Trip Time.
> 
> You can also see that the round trip time increases by a factor of at
> least 25.
> 
> I once saw a similar problem with delayed ACK packets in the
> paravirtualized network driver in Xen. The TCP window filled up and
> throughput dropped from 30 MB/s to less than 100 KB/s. The symptom was
> that logging in to a Windows desktop took more than 10 minutes, where it
> used to take less than 30 seconds, because the user's profile was loaded
> slowly from a CIFS server. Back then the culprit was also delayed small
> packets: ACK packets in the CIFS case. So far I have only demonstrated
> the iSCSI regression for tcp auto corking, but I assume we will see many
> others if we leave it enabled.
> 
> I found the problem by doing the following:
>         - I compiled the kernel by executing the following commands:
>                 yes '' | make oldconfig
>                 time make -j 24
>                 / make modules_install
>                 / mkinitramfs -o /boot/initrd.img-bisect <version>
> 
>         - I cleaned the iSCSI configuration after each test by issuing:
>                 /etc/init.d/target stop
>                 rm /iscsi?/* /etc/target/*
> 
>         - I configured iSCSI after each reboot
>                 cat > lio-v101.conf <<EOF
> set global auto_cd_after_create=false
> /backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true
> 
> /iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
> 
> saveconfig
> yes
> EOF
>                 targetcli < lio-v101.conf
>                 Then I pointed a freshly booted ESXi 5.5.0 1331820 (deployed via
>                 autodeploy) at the iSCSI target, configured the portal, rescanned,
>                 created a 500 GB VMFS 5 filesystem and noted the time. If it took
>                 longer than 2 minutes the kernel was bad; if it took less than
>                 10 seconds it was good.
>                 git bisect good/bad
> 
> My network config is:
> 
> auto bond0
> iface bond0 inet static
>        address 10.100.4.62
>        netmask 255.255.0.0
>        gateway 10.100.0.1
>        slaves eth0 eth1
>        bond-mode 802.3ad
>        bond-miimon 100
> 
> auto bond0.101
> iface bond0.101 inet static
>        address 10.101.99.4
>        netmask 255.255.0.0
> 
> auto bond1
> iface bond1 inet static
>        address 10.100.5.62
>        netmask 255.255.0.0
>        slaves eth2 eth3
>        bond-mode 802.3ad
>        bond-miimon 100
> 
> auto bond1.101
> iface bond1.101 inet static
>        address 10.101.99.5
>        netmask 255.255.0.0
> 
> I propose disabling tcp_autocorking by default because it obviously degrades
> iSCSI performance and probably that of many other protocols. The commit also
> mentions that applications can explicitly disable auto corking. We should
> probably do that for the iSCSI target, but I don't know how. Anyone?
> 
> [1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
> [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f54b311142a92ea2e42598e347b84e1655caf8e3
> 
> Cheers,
>         Thomas

Hi Thomas, thanks a lot for this very detailed bug report.

I think you are being hit by other bug(s); let's try to fix it/them instead
of disabling this feature.

John Ogness started a thread yesterday about TCP_NODELAY being hit by
the TCP Small Queue mechanism, which is the base of TCP auto corking.

Two RFC patches were discussed:

One dealing with the TCP_NODELAY flag, which John posted and which I'll
adapt to the current kernel.

One dealing with a possible race, which I suggested (I doubt this could
trigger at every write, but let's fix it anyway).

Here is the combined patch, could you test it?

Thanks !

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 10435b3b9d0f..3be16727f058 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -698,7 +698,8 @@ static void tcp_tsq_handler(struct sock *sk)
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+		tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+			       0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1904,7 +1905,15 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
 		}
 
 		limit = mss_now;
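
The second hunk follows a "set the flag, then test the condition again" pattern. The toy, single-threaded model below only illustrates that control flow. The names `Sock` and `must_stop_sending` are hypothetical, the byte counts are made up, and no real memory barrier is involved, so it says nothing about the actual SMP ordering guarantees.

```python
from typing import Callable


class Sock:
    """Toy stand-in for the socket state the hunk touches (hypothetical)."""

    def __init__(self, read_wmem_alloc: Callable[[], int]) -> None:
        # Models atomic_read(&sk->sk_wmem_alloc).
        self.read_wmem_alloc = read_wmem_alloc
        # Models the TSQ_THROTTLED bit in tp->tsq_flags.
        self.tsq_throttled = False


def must_stop_sending(sk: Sock, limit: int) -> bool:
    """Return True where the patched loop would `break`."""
    if sk.read_wmem_alloc() > limit:
        sk.tsq_throttled = True
        # The kernel issues a memory barrier here; this model has none.
        # Re-read: a TX completion may have freed memory before the flag was set.
        if sk.read_wmem_alloc() > limit:
            return True
    return False


# A TX completion lands between the two reads: 9000 bytes queued, then 1000.
reads = iter([9000, 1000])
sk = Sock(lambda: next(reads))
print(must_stop_sending(sk, limit=4096))  # the loop keeps sending
```

Without the second read, the model would stop sending in this scenario and rely on a completion event that has already passed.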


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:33 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391865273.10160.76.camel@edumazet-glaptop2.roam.corp.google.com>

On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> Here is the combined patch, could you test it ?

Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
("tcp: autocork should not hold first packet in write queue")
in your tree.


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 13:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391865273.10160.76.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> > tcp corking kills iSCSI performance

> Here is the combined patch, could you test it?

The patch did not apply, so I applied it by hand. Here is the resulting
patch:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 03d26b8..40d1958 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -698,7 +698,8 @@ static void tcp_tsq_handler(struct sock *sk)
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+			tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+	                               0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1904,7 +1905,16 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
+
 		}
 
 		limit = mss_now;

-- cut here --

It fixes my case, but if you look at the round trip time it is not even
close to what it used to be. So while this fixes my problem, I'm still in
favor of disabling it by default.

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 13:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391866389.10160.80.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> ("tcp: autocork should not hold first packet in write queue")
> in your tree.

confirmed:

(node-62) [~/work/linux-2.6] git show a181ceb501b31b4bf8812a5c84c716cc31d82c2d | head
commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Dec 17 09:58:30 2013 -0800

    tcp: autocork should not hold first packet in write queue

    Willem noticed a TCP_RR regression caused by TCP autocorking
    on a Mellanox test bed. MLX4_EN_TX_COAL_TIME is 16 us, which can be
    right above RTT between hosts.

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:50 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391866389.10160.80.camel@edumazet-glaptop2.roam.corp.google.com>

On Sat, 2014-02-08 at 05:33 -0800, Eric Dumazet wrote:
> On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> > Here is the combined patch, could you test it ?
> 
> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> ("tcp: autocork should not hold first packet in write queue")
> in your tree.
> 
> 

BTW this problem demonstrates there is room for improvement in iSCSI,
using MSG_MORE to avoid sending two small segments in separate frames.

[1] 00:32:35.726568 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 145:193, ack 144, win 235, options [nop,nop,TS val 4294960733 ecr 385385], length 48
[2] 00:32:35.838074 IP 10.101.0.13.27778 > 10.101.99.5.3260: Flags [.], ack 193, win 514, options [nop,nop,TS val 385396 ecr 4294960733], length 0
[3] 00:32:35.838099 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 193:705, ack 144, win 235, options [nop,nop,TS val 4294960761 ecr 385396], length 512

[1] & [3] could be coalesced, and [2] would be avoided.
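
As a userspace illustration of the MSG_MORE idea (the iSCSI target itself is kernel code, so this is not how it would be changed there), the sketch below sends a small header with `MSG_MORE` and then the payload over a loopback TCP connection. The helper names `send_pdu` and `recv_exact` are hypothetical; the 48-byte and 512-byte sizes are taken from the trace above. `MSG_MORE` is Linux-specific.

```python
import socket


def send_pdu(sock: socket.socket, header: bytes, payload: bytes) -> None:
    """Send header and payload, hinting that more data follows the header."""
    sock.sendall(header, socket.MSG_MORE)  # hint: do not push this small write yet
    sock.sendall(payload)                  # no MSG_MORE: queued data may go out now


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (or fewer if the peer closes early)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


# Loopback demo with the 48-byte and 512-byte sizes seen in the trace.
listener = socket.create_server(("127.0.0.1", 0))
client = socket.create_connection(listener.getsockname())
server, _ = listener.accept()
send_pdu(server, b"H" * 48, b"D" * 512)
data = recv_exact(client, 48 + 512)
print(len(data))
for s in (client, server, listener):
    s.close()
```

Whether the two writes actually leave in one frame depends on the kernel and the path; a capture like the ones in this thread is the way to check.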


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 13:53 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208133744.GA20512@glanzmann.de>

On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:
> Hello Eric,

> 
> It fixes my case, but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem, I'm still in
> favor of disabling it by default.
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Very nice.

Now we have to check your NIC and how TX completion is performed.

What is your NIC model and driver ?


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 13:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391867614.10160.89.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> What is your NIC model and driver?

I have four Intel Corporation I350 Gigabit Network Connection (rev 01).

(node-62) [~/work/linux-2.6] lspci -v | pbot
http://pbot.rmdir.de/rgu6yHMBDVQpflMmbcJACg
(node-62) [~/work/linux-2.6] ip a s | pbot
http://pbot.rmdir.de/xJjRT8u-ekC6mrWgl09ZtQ
(node-62) [~/work/linux-2.6] dmesg | pbot
http://pbot.rmdir.de/MigrSPtxGmp0fI1CRgXsHw

I use 802.3ad link aggregation with a layer 2 hash, with two network cards
going to one switch.

I'm running:
Linux node-62 3.14.0-rc1+ #23 SMP Sat Feb 8 14:27:47 CET 2014 x86_64 GNU/Linux

Driver: igb

If you need remote access to the machine let me know.

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 14:09 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208133744.GA20512@glanzmann.de>

On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:

> 
> It fixes my case, but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem, I'm still in
> favor of disabling it by default.
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2

This pcap was taken on which host ?

10.101.99.5 or  10.101.0.13 ?


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 14:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391868564.10160.91.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

[RESEND: dropped CC accidentally]

> 10.101.99.5 or 10.101.0.13?

10.101.99.5 (iSCSI Target)

tcpdump -i bond0.101 -s 0 -w /tmp/tcp_auto_corking_on_patched.pcap host esx-03.v101.campusvl.de

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 14:13 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391867404.10160.88.camel@edumazet-glaptop2.roam.corp.google.com>

On Sat, 2014-02-08 at 05:50 -0800, Eric Dumazet wrote:
> On Sat, 2014-02-08 at 05:33 -0800, Eric Dumazet wrote:
> > On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> > > Here is the combined patch, could you test it ?
> > 
> > Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> > ("tcp: autocork should not hold first packet in write queue")
> > in your tree.
> > 
> > 
> 
> BTW this problem demonstrates there is room for improvement in iSCSI,
> using MSG_MORE to avoid sending two small segments in separate frames.
> 
> [1] 00:32:35.726568 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 145:193, ack 144, win 235, options [nop,nop,TS val 4294960733 ecr 385385], length 48
> [2] 00:32:35.838074 IP 10.101.0.13.27778 > 10.101.99.5.3260: Flags [.], ack 193, win 514, options [nop,nop,TS val 385396 ecr 4294960733], length 0
> [3] 00:32:35.838099 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 193:705, ack 144, win 235, options [nop,nop,TS val 4294960761 ecr 385396], length 512
> 
> [1] & [3] could be coalesced, and [2] would be avoided.
> 

With the fix, the new pcap shows this suboptimal behavior more clearly:

05:34:16.280900 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.280949 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5328:5376, ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.280982 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54353:54401, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281000 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54401:54913, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

05:34:16.281107 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.281157 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5376:5424, ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.281190 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54913:54961, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281208 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54961:55473, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

05:34:16.281337 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.281390 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5424:5472, ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.281423 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55473:55521, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281440 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55521:56033, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 14:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391868816.10160.93.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> > BTW this problem demonstrates there is room for improvement in iSCSI,
> > using MSG_MORE to avoid sending two small segments in separate frames.

> With the fix, the new pcap is more explicit about this suboptimal behavior:

> 05:34:16.280900 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.280949 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5328:5376, ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.280982 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54353:54401, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281000 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54401:54913, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

> 05:34:16.281107 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.281157 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5376:5424, ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.281190 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54913:54961, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281208 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54961:55473, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

> 05:34:16.281337 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.281390 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5424:5472, ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.281423 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55473:55521, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281440 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55521:56033, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

I get the idea. However, I'm a little confused: when I do a 'git grep
MSG_MORE' I don't see many places in the Linux kernel that use it at
all. So do you have an example for me of where this flag needs to be
applied?

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 14:30 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208141905.GG20512@glanzmann.de>

On Sat, 2014-02-08 at 15:19 +0100, Thomas Glanzmann wrote:
> Hello Eric,

> I get the idea. However, I'm a little confused: when I do a 'git grep
> MSG_MORE' I don't see many places in the Linux kernel that use it at
> all. So do you have an example for me of where this flag needs to be
> applied?

The idea would be to set this flag on the sendmsg() call for the 48 bytes
of the header, and not set it on the sendmsg() call for the 512 bytes of
the payload.

iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
it would be nice to add a new _initial_ flags parameter to
iscsi_sw_tcp_xmit_segment()
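
A minimal userspace sketch of that idea, assuming a connected stream
socket on Linux. send_all() and send_pdu() are hypothetical names used
only for illustration; this is not the kernel's iSCSI code, which would
need the equivalent change on its own send path.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send the whole buffer, looping over short writes. Returns 0 or -1. */
int send_all(int fd, const void *buf, size_t len, int flags)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = send(fd, p, len, flags);

		if (n < 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: the header is sent with MSG_MORE so the stack may
 * hold it back, and the payload is sent without the flag, which lets both
 * parts leave in one segment if they fit.
 */
int send_pdu(int fd, const void *hdr, size_t hdr_len,
	     const void *data, size_t data_len)
{
	/* Without a payload nothing follows, so do not ask the stack to wait. */
	int hdr_flags = data_len > 0 ? MSG_MORE : 0;

	if (send_all(fd, hdr, hdr_len, hdr_flags) < 0)
		return -1;
	return send_all(fd, data, data_len, 0);
}
```

Called as send_pdu(fd, hdr, 48, payload, 512), the two parts can then be
coalesced by the stack instead of going out as one 48-byte and one
512-byte segment.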


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 15:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391869805.10160.97.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Idea would be to set this flag when calling sendmsg() of the 48 bytes
> of the header, and not set it on the sendmsg() of the 512 bytes of the
> payload.

I see.

> iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
> it would be nice to add a new _initial_ flags parameter to
> iscsi_sw_tcp_xmit_segment()

This is for the iSCSI initiator implementation. I'm interested in the
iSCSI target code. I already found it and experimented a little bit, but
I need to dig deeper if I want to prepare a patch.

Cheers,
        Thomas


* Re: [PATCH] tcp: disable auto corking by default
From: Eric Dumazet @ 2014-02-08 15:04 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208091944.GB16336@glanzmann.de>

On Sat, 2014-02-08 at 10:19 +0100, Thomas Glanzmann wrote:
> When using auto corking with iSCSI, the round trip time increases by at
> least a factor of 25, probably more. Other protocols are very likely
> also affected.
> 
> Signed-off-by: Thomas Glanzmann <thomas@glanzmann.de>
> ---
>  net/ipv4/tcp.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I think there is no hurry.

We should leave auto corking on during the 3.14 development cycle so that
we can fix the bugs and think of some optimizations.

auto cork gives a strong incentive to applications to use
TCP_CORK/MSG_MORE to avoid overhead of sending multiple small segments.

In the normal case, the extra delay is something like 10 us, so if an
application is really hit by this delay, it's a strong sign it could be
improved, especially if auto corking is off.
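
For applications that prefer explicit corking over MSG_MORE, a rough
sketch using the TCP_CORK socket option on Linux. tcp_set_cork() and
send_corked() are made-up names for illustration, and the sketch treats
a short write as an error instead of retrying it.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Set (on = 1) or clear (on = 0) TCP_CORK on a TCP socket. Returns 0 or -1. */
int tcp_set_cork(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/*
 * Hypothetical helper: cork the socket, queue header and payload, then
 * uncork so that the queued data is pushed out together.
 */
int send_corked(int fd, const void *hdr, size_t hdr_len,
		const void *data, size_t data_len)
{
	int rc = 0;

	if (tcp_set_cork(fd, 1) < 0)
		return -1;

	if (send(fd, hdr, hdr_len, 0) != (ssize_t)hdr_len ||
	    send(fd, data, data_len, 0) != (ssize_t)data_len)
		rc = -1;

	/* Always uncork, even after an error, so nothing stays held back. */
	if (tcp_set_cork(fd, 0) < 0)
		rc = -1;

	return rc;
}
```

Compared with MSG_MORE this costs two extra setsockopt() calls per
message, so it fits best when many small writes are grouped under one
cork.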

Let's wait for the end of the 3.14 dev cycle before considering this patch.

Don't shoot the messenger :)

Thanks !


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 15:06 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208150001.GI20512@glanzmann.de>

On Sat, 2014-02-08 at 16:00 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > Idea would be to set this flag when calling sendmsg() of the 48 bytes
> > of the header, and not set it on the sendmsg() of the 512 bytes of the
> > payload.
> 
> I see.
> 
> > iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
> > it would be nice to add a new _initial_ flags parameter to
> > iscsi_sw_tcp_xmit_segment()
> 
> This is for the iSCSI initiator implementation. I'm interested in the
> iSCSI target code. I already found it and experimented a little bit, but
> I need to dig deeper if I want to prepare a patch.

Fantastic !

Let me know if you want some help.

Note: We did some patches in the MSG_MORE logic for sendpage(), but in
your case I do not think it's related.

(git grep -n MSG_SENDPAGE_NOTLAST) if you are curious


* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-02-08 16:43 UTC (permalink / raw)
  To: netdev
In-Reply-To: <52F012FF.9030105@candelatech.com>

On 02/03/2014 02:06 PM, Ben Greear wrote:
> On 02/03/2014 02:03 PM, Hannes Frederic Sowa wrote:
>> Hi Ben,
>>
>> On Mon, Feb 03, 2014 at 12:37:52PM -0800, Ben Greear wrote:
>>> The kernel has some additional patches, but not much to IPv6.
>>>
>>> The bug is that when we have lots of mac-vlans on some ixgbe ports
>>> (500 per interface in this case), and boot up the system with the ports unplugged,
>>> we get this crash almost every time.  Boot-up is going to do normal bootup
>>> stuff plus create and configure the 1000 mac-vlans, dump their routing
>>> tables, etc.
>>>
>>> We are using one routing table per network device, and some
>>> ip rules.
>>>
>>> If we plug in the ixgbe ports, we do not ever see a crash.
>>>
>>> We have not yet tried reproducing it on other drivers, but I suspect
>>> the issue is not related to ixgbe.
>>>
>>> Any ideas on this one?
>>
>> Could you bring the machine to a panic again with enabling RT6_DEBUG at the
>> top of ip6_fib.c and send a dump of the trace?
> 
> Yes, but it will be a bit until we can create a duplicate machine.
> We ended up delivering the machine with a note to make sure the
> interfaces were plugged in (we found the bug hours before shipping
> the system, of course).

According to my system test guy, it took a lot longer to reproduce
the problem with the debug-enabled kernel, but I do not see any extra
debug messages in the serial console log or in /var/log/messages.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [PATCH] tcp: disable auto corking by default
From: Thomas Glanzmann @ 2014-02-08 16:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391871850.10160.103.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> > Disable auto corking by default

> We should leave auto corking on during the 3.14 development cycle so that
> we can fix the bugs and think of some optimizations.

I agree that leaving it enabled helps to find bugs, however I'm not
happy with the round trip time degradation.

> auto cork gives a strong incentive to applications to use
> TCP_CORK/MSG_MORE to avoid overhead of sending multiple small
> segments.

I agree. But if it breaks the application, many people won't be happy;
for example, I have already spent 5 hours tracking it down.

> In the normal case, the extra delay is something like 10 us, so if an
> application is really hit by this delay, it's a strong sign it could be
> improved, especially if auto corking is off.

Yes, but 230 microseconds for others. :-(

> Let's wait for the end of the 3.14 dev cycle before considering this patch.

I agree.

Btw. I mixed up the pcaps for autocork on and off, so I moved the files
so that they now show what they should show.

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 16:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391871986.10160.105.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Note: We did some patches in the MSG_MORE logic for sendpage(), but
> in your case I do not think it's related
> (git grep -n MSG_SENDPAGE_NOTLAST) if you are curious

thank you for the pointer. The iSCSI target code actually uses sendpage
whenever it can.

Cheers,
        Thomas


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Eric Dumazet @ 2014-02-08 17:08 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208165732.GB22359@glanzmann.de>

On Sat, 2014-02-08 at 17:57 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > Note: We did some patches in the MSG_MORE logic for sendpage(), but
> > in your case I do not think it's related
> > (git grep -n MSG_SENDPAGE_NOTLAST) if you are curious
> 
> thank you for the pointer. The iSCSI target code actually uses sendpage
> whenever it can.

Yep, but the problem (at least in your pcap) is about sending the 48-byte
header in a TCP segment of its own, then the 512-byte payload in a
separate segment.

I suspect the sendpage() is only used for the payload. No need for
MSG_MORE here.

MSG_MORE would need to be set on the first part (the 48-byte header), so
that the TCP stack defers the push of the segment until the 512-byte
payload is added.


* [PATCH] 3c59x: Remove unused pointer in vortex_eisa_cleanup()
From: Christian Engelmayer @ 2014-02-08 17:11 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

Remove unused network device private data pointer 'vp' in function
vortex_eisa_cleanup(). Detected by Coverity: CID 139826.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
---
 drivers/net/ethernet/3com/3c59x.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 0f4241c..238ccea 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -3294,7 +3294,6 @@ static int __init vortex_init(void)
 
 static void __exit vortex_eisa_cleanup(void)
 {
-	struct vortex_private *vp;
 	void __iomem *ioaddr;
 
 #ifdef CONFIG_EISA
@@ -3303,7 +3302,6 @@ static void __exit vortex_eisa_cleanup(void)
 #endif
 
 	if (compaq_net_device) {
-		vp = netdev_priv(compaq_net_device);
 		ioaddr = ioport_map(compaq_net_device->base_addr,
 		                    VORTEX_TOTAL_SIZE);
 
-- 
1.8.3.2



* Re: [PATCH] tcp: disable auto corking by default
From: Eric Dumazet @ 2014-02-08 17:12 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <20140208165539.GA22359@glanzmann.de>

On Sat, 2014-02-08 at 17:55 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > > Disable auto corking by default
> 
> > We should leave auto corking on during the 3.14 development cycle so that
> > we can fix the bugs and think of some optimizations.
> 
> I agree that leaving it enabled helps to find bugs, however I'm not
> happy with the round trip time degradation.
> 
> > auto cork gives a strong incentive to applications to use
> > TCP_CORK/MSG_MORE to avoid overhead of sending multiple small
> > segments.
> 
> I agree. But if it breaks the application, many people won't be happy;
> for example, I have already spent 5 hours tracking it down.

Sure, but if we set this flag to zero, nobody will ever use it and find
any bugs.
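
To check which way the flag is set on a given machine: the
net.ipv4.tcp_autocorking sysctl is exposed as
/proc/sys/net/ipv4/tcp_autocorking. A small sketch for reading it, with
read_int_sysctl() and tcp_autocorking_enabled() as hypothetical helper
names; -1 is returned when the file is missing, for example on a kernel
without auto corking.

```c
#include <stdio.h>

/* Read one integer from a /proc/sys style file. Returns -1 on failure. */
int read_int_sysctl(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* 1 if auto corking is on, 0 if off, -1 if the sysctl cannot be read. */
int tcp_autocorking_enabled(void)
{
	return read_int_sysctl("/proc/sys/net/ipv4/tcp_autocorking");
}
```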

Thanks for running the latest git tree and being part of Linux improvement.

If we can add MSG_MORE at the right place, your workload might gain
~20% exec time, and maybe 30% better efficiency, since you'll halve
the total number of network segments.

Just to be clear: no stable kernel has this issue yet, right?


* Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]
From: Thomas Glanzmann @ 2014-02-08 17:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: John Ogness, Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391879318.10160.108.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Yep, but the problem (at least in your pcap) is about sending the 48-byte
> header in a TCP segment of its own, then the 512-byte payload in a
> separate segment.

I agree.

> I suspect the sendpage() is only used for the payload. No need for
> MSG_MORE here.

I see.

> MSG_MORE would need to be set on the first part (the 48-byte header), so
> that the TCP stack defers the push of the segment until the 512-byte
> payload is added.

The iSCSI target uses one function to send all outbound data. So in
order to do it right, every function that sends data in multiple
chunks needs to mark it correctly. Of course, someone could also do some
wild guessing and say that everything below 512 bytes gets
pushed out. I wonder what Nab has to say about this?
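
One possible shape for the "mark it correctly" variant, as a userspace
sketch only and not the target code: callers hand the single transmit
function all chunks of one message, and the function derives MSG_MORE
from the chunk position. struct tx_chunk, chunk_flags() and
xmit_chunks() are invented names for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* One piece of an outgoing message (illustrative type). */
struct tx_chunk {
	const void *buf;
	size_t len;
};

/* MSG_MORE for every chunk except the last one of the message. */
int chunk_flags(size_t idx, size_t count)
{
	return (idx + 1 < count) ? MSG_MORE : 0;
}

/* Single transmit path: callers describe the message, flags are derived. */
int xmit_chunks(int fd, const struct tx_chunk *chunks, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		const char *p = chunks[i].buf;
		size_t left = chunks[i].len;
		int flags = chunk_flags(i, count);

		/* Loop over short writes for this chunk. */
		while (left > 0) {
			ssize_t n = send(fd, p, left, flags);

			if (n < 0)
				return -1;
			p += n;
			left -= (size_t)n;
		}
	}
	return 0;
}
```

This avoids guessing from the chunk size: a 48-byte header followed by a
payload gets MSG_MORE, while a 48-byte message that stands alone does
not.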

Cheers,
        Thomas


* Re: [PATCH] tcp: disable auto corking by default
From: Thomas Glanzmann @ 2014-02-08 17:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S. Miller, Nicholas A. Bellinger,
	target-devel, Linux Network Development, LKML
In-Reply-To: <1391879558.10160.112.camel@edumazet-glaptop2.roam.corp.google.com>

Hello Eric,

> Sure, but if we put this flag to zero, nobody will ever use it and
> find any bug.

I agree.

> If we can add MSG_MORE at the right place, your workload might gain
> ~20% exec time, and maybe 30% better efficiency, since you'll halve
> the total number of network segments.

That is in fact promising.

> Just to be clear: No stable kernel has yet any issue, right?

Not with TCP auto corking, as it was only recently introduced in the
development branch, but it will become stable at some point.

Cheers,
        Thomas


