Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] tcp: temporarily disable Fast Open on SYN timeout
From: David Miller @ 2013-10-30  2:51 UTC (permalink / raw)
  To: ycheng; +Cc: ncardwell, edumazet, netdev
In-Reply-To: <1383066545-27348-1-git-send-email-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Tue, 29 Oct 2013 10:09:05 -0700

> Fast Open currently has a fall back feature to address SYN-data
> being dropped by but it requires the middle-box to pass on regular
> SYN retry after SYN-data. This is implemented in commit aab487435
> ("net-tcp: Fast Open client - detecting SYN-data drops")
> 
> However some NAT boxes will drop all subsequent packets after first
> SYN-data and blackholes the entire connections.  An example is incommit
> 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast Open".
> 
> The sender should note such incidents and falls back to use regular
> TCP handshake on subsequent attempt temporarily as well: after the
> second SYN timeouts the original Fast Open SYN is most likely lost.
> When such an event recurs Fast Open is disabled based on the number
> of recurrences exponentially.
> 
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2] octeon_mgmt: drop redundant mac address check
From: David Miller @ 2013-10-30  2:53 UTC (permalink / raw)
  To: luka; +Cc: netdev
In-Reply-To: <1383088152-24869-1-git-send-email-luka@openwrt.org>

From: Luka Perkov <luka@openwrt.org>
Date: Wed, 30 Oct 2013 00:09:12 +0100

> Checking if MAC address is valid using is_valid_ether_addr() is already done in
> of_get_mac_address().
> 
> Signed-off-by: Luka Perkov <luka@openwrt.org>
> Acked-by: David Daney <david.daney@cavium.com>
> CC: David Miller <davem@davemloft.net>

Applied.

^ permalink raw reply

* Re: [PATCH v2] mvneta: drop redundant mac address check
From: David Miller @ 2013-10-30  2:53 UTC (permalink / raw)
  To: luka; +Cc: netdev
In-Reply-To: <1383088201-25136-1-git-send-email-luka@openwrt.org>

From: Luka Perkov <luka@openwrt.org>
Date: Wed, 30 Oct 2013 00:10:01 +0100

> Checking if MAC address is valid using is_valid_ether_addr() is already done in
> of_get_mac_address().
> 
> Signed-off-by: Luka Perkov <luka@openwrt.org>
> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] arc_emac: drop redundant mac address check
From: David Miller @ 2013-10-30  2:53 UTC (permalink / raw)
  To: luka; +Cc: netdev, Alexey.Brodkin
In-Reply-To: <1383088260-25422-1-git-send-email-luka@openwrt.org>

From: Luka Perkov <luka@openwrt.org>
Date: Wed, 30 Oct 2013 00:11:00 +0100

> Checking if MAC address is valid using is_valid_ether_addr() is already done in
> of_get_mac_address(). While at it, reorganize checking so it matches checks in
> other drivers.
> 
> Signed-off-by: Luka Perkov <luka@openwrt.org>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] ibm emac: Don't call napi_complete if napi_reschedule failed
From: David Miller @ 2013-10-30  2:58 UTC (permalink / raw)
  To: alistair; +Cc: netdev, benh
In-Reply-To: <1383090638-21526-1-git-send-email-alistair@popple.id.au>

From: Alistair Popple <alistair@popple.id.au>
Date: Wed, 30 Oct 2013 10:50:37 +1100

> This patch fixes a bug which would trigger the BUG_ON() at
> net/core/dev.c:4156. It was found that this was due to continuing
> processing in the current poll call even when the call to
> napi_reschedule failed, indicating the device was already on the
> polling list. This resulted in an extra call to napi_complete which
> triggered the BUG_ON().
> 
> This patch ensures that we only contine processing rotting packets in
> the current mal_poll call if we are not already on the polling list.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] ibm emac: Fix locking for enable/disable eob irq
From: David Miller @ 2013-10-30  2:58 UTC (permalink / raw)
  To: alistair; +Cc: netdev, benh
In-Reply-To: <1383090638-21526-2-git-send-email-alistair@popple.id.au>

From: Alistair Popple <alistair@popple.id.au>
Date: Wed, 30 Oct 2013 10:50:38 +1100

> Calls to mal_enable_eob_irq perform a read-write-modify of a dcr to
> enable device irqs which is protected by a spin lock. However calls to
> mal_disable_eob_irq do not take the corresponding lock.
> 
> This patch resolves the problem by ensuring that calls to
> mal_disable_eob_irq also take the lock.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>

Applied.

^ permalink raw reply

* Re: [PATCH v2] bgmac: don't update slot on skb alloc/dma mapping error
From: David Miller @ 2013-10-30  2:59 UTC (permalink / raw)
  To: nlhintz; +Cc: netdev, zajec5
In-Reply-To: <BLU0-SMTP255BD097115C7E52E264D41AC0A0@phx.gbl>

From: Nathan Hintz <nlhintz@hotmail.com>
Date: Tue, 29 Oct 2013 19:32:01 -0700

> Don't update the slot in "bgmac_dma_rx_skb_for_slot" unless both the
> skb alloc and dma mapping are successful; and free the newly allocated
> skb if a dma mapping error occurs.  This relieves the caller of the need
> to deduce/execute the appropriate cleanup action required when an error
> occurs.
> 
> Signed-off-by: Nathan Hintz <nlhintz@hotmail.com>
> Acked-by: Rafał Miłecki <zajec5@gmail.com>

Applied, thanks.

^ permalink raw reply

* RE: [PATCH net 2/3] r8152: modify the tx flow
From: hayeswang @ 2013-10-30  3:03 UTC (permalink / raw)
  To: 'David Miller'
  Cc: netdev, 'nic_swsd', linux-kernel, linux-usb
In-Reply-To: <20131029.174936.13317032766742363.davem@davemloft.net>

 David Miller [mailto:davem@davemloft.net] 
> Sent: Wednesday, October 30, 2013 5:50 AM
> To: Hayeswang
> Cc: netdev@vger.kernel.org; nic_swsd; 
> linux-kernel@vger.kernel.org; linux-usb@vger.kernel.org
> Subject: Re: [PATCH net 2/3] r8152: modify the tx flow
> 
> From: Hayes Wang <hayeswang@realtek.com>
> Date: Tue, 29 Oct 2013 15:56:16 +0800
> 
> > Support stopping and waking tx queue. The maximum tx queue length
> > is 60.
> 
> What is so special about the number 60?  It seems arbitrary, and if
> it isn't arbitrary you haven't described why this value was choosen.

The value is arbitrary. I think it is better to stop tx when
queuing many packets, otherwise all the available memory may
be used for tx skb. The queue length could be any value or
unlimited if the memory is enough. Should I remove it?

> I've asked you politely last time around to significantly improve
> the quality of your commit messages, and you haven't done this at
> all.

I thought you indicated the last patch only and the others are clear enough.
I would improve them.

> I'm not applying any of these patches until your commit messages
> properly describe your changes completely.
> 

^ permalink raw reply

* Re: [PATCH net 2/3] r8152: modify the tx flow
From: David Miller @ 2013-10-30  3:08 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <7F69554959F74FA58AE07F06B9FCF029@realtek.com.tw>

From: hayeswang <hayeswang@realtek.com>
Date: Wed, 30 Oct 2013 11:03:55 +0800

>  David Miller [mailto:davem@davemloft.net] 
>> Sent: Wednesday, October 30, 2013 5:50 AM
>> To: Hayeswang
>> Cc: netdev@vger.kernel.org; nic_swsd; 
>> linux-kernel@vger.kernel.org; linux-usb@vger.kernel.org
>> Subject: Re: [PATCH net 2/3] r8152: modify the tx flow
>> 
>> From: Hayes Wang <hayeswang@realtek.com>
>> Date: Tue, 29 Oct 2013 15:56:16 +0800
>> 
>> > Support stopping and waking tx queue. The maximum tx queue length
>> > is 60.
>> 
>> What is so special about the number 60?  It seems arbitrary, and if
>> it isn't arbitrary you haven't described why this value was choosen.
> 
> The value is arbitrary. I think it is better to stop tx when
> queuing many packets, otherwise all the available memory may
> be used for tx skb. The queue length could be any value or
> unlimited if the memory is enough. Should I remove it?

You should at least pick some value that you have analyzed in some
way.  We've done a lot of work to strongly limit the amount of SKB
data which sits in device queues on transmit, and what you're doing
here works against to those goals.

Ideally you should pick a value which is sufficient to meet two
goals at the same time:

1) With constant transmit traffic coming from the networking stack,
   the device never starves for new transmit data to send.

2) We never queue up more traffic than we need to satisfy requirement
   #1.

^ permalink raw reply

* [PATCH net-next v2] tipc: remove two indentation levels in tipc_recv_msg routine
From: Ying Xue @ 2013-10-30  3:26 UTC (permalink / raw)
  To: davem, David.Laight, maloy
  Cc: Paul.Gortmaker, jon.maloy, erik.hugne, andreas.bofjall, ying.xue,
	tipc-discussion, netdev

The message dispatching part of tipc_recv_msg() is wrapped layers of
while/if/if/switch, causing out-of-control indentation and does not
look very good. We reduce two indentation levels by separating the
message dispatching from the blocks that checks link state and
sequence numbers, allowing longer function and arg names to be
consistently indented without wrapping. Additionally we also rename
"cont" label to "discard" and add one new label called "unlock_discard"
to make code clearer. In all, these are cosmetic changes that do not
alter the operation of TIPC in any way.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com>
---
v2: Incorporated comments from David Laight and Andreas Bofjäll

 net/tipc/link.c |  173 +++++++++++++++++++++++++++----------------------------
 1 file changed, 84 insertions(+), 89 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index e8153f6..54163f9 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1507,15 +1507,15 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr)
 
 		/* Ensure bearer is still enabled */
 		if (unlikely(!b_ptr->active))
-			goto cont;
+			goto discard;
 
 		/* Ensure message is well-formed */
 		if (unlikely(!link_recv_buf_validate(buf)))
-			goto cont;
+			goto discard;
 
 		/* Ensure message data is a single contiguous unit */
 		if (unlikely(skb_linearize(buf)))
-			goto cont;
+			goto discard;
 
 		/* Handle arrival of a non-unicast link message */
 		msg = buf_msg(buf);
@@ -1531,20 +1531,18 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr)
 		/* Discard unicast link messages destined for another node */
 		if (unlikely(!msg_short(msg) &&
 			     (msg_destnode(msg) != tipc_own_addr)))
-			goto cont;
+			goto discard;
 
 		/* Locate neighboring node that sent message */
 		n_ptr = tipc_node_find(msg_prevnode(msg));
 		if (unlikely(!n_ptr))
-			goto cont;
+			goto discard;
 		tipc_node_lock(n_ptr);
 
 		/* Locate unicast link endpoint that should handle message */
 		l_ptr = n_ptr->links[b_ptr->identity];
-		if (unlikely(!l_ptr)) {
-			tipc_node_unlock(n_ptr);
-			goto cont;
-		}
+		if (unlikely(!l_ptr))
+			goto unlock_discard;
 
 		/* Verify that communication with node is currently allowed */
 		if ((n_ptr->block_setup & WAIT_PEER_DOWN) &&
@@ -1554,10 +1552,8 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr)
 			!msg_redundant_link(msg))
 			n_ptr->block_setup &= ~WAIT_PEER_DOWN;
 
-		if (n_ptr->block_setup) {
-			tipc_node_unlock(n_ptr);
-			goto cont;
-		}
+		if (n_ptr->block_setup)
+			goto unlock_discard;
 
 		/* Validate message sequence number info */
 		seq_no = msg_seqno(msg);
@@ -1593,98 +1589,97 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr)
 
 		/* Now (finally!) process the incoming message */
 protocol_check:
-		if (likely(link_working_working(l_ptr))) {
-			if (likely(seq_no == mod(l_ptr->next_in_no))) {
-				l_ptr->next_in_no++;
-				if (unlikely(l_ptr->oldest_deferred_in))
-					head = link_insert_deferred_queue(l_ptr,
-									  head);
-deliver:
-				if (likely(msg_isdata(msg))) {
-					tipc_node_unlock(n_ptr);
-					tipc_port_recv_msg(buf);
-					continue;
-				}
-				switch (msg_user(msg)) {
-					int ret;
-				case MSG_BUNDLER:
-					l_ptr->stats.recv_bundles++;
-					l_ptr->stats.recv_bundled +=
-						msg_msgcnt(msg);
-					tipc_node_unlock(n_ptr);
-					tipc_link_recv_bundle(buf);
-					continue;
-				case NAME_DISTRIBUTOR:
-					n_ptr->bclink.recv_permitted = true;
-					tipc_node_unlock(n_ptr);
-					tipc_named_recv(buf);
-					continue;
-				case BCAST_PROTOCOL:
-					tipc_link_recv_sync(n_ptr, buf);
-					tipc_node_unlock(n_ptr);
-					continue;
-				case CONN_MANAGER:
-					tipc_node_unlock(n_ptr);
-					tipc_port_recv_proto_msg(buf);
-					continue;
-				case MSG_FRAGMENTER:
-					l_ptr->stats.recv_fragments++;
-					ret = tipc_link_recv_fragment(
-						&l_ptr->defragm_buf,
-						&buf, &msg);
-					if (ret == 1) {
-						l_ptr->stats.recv_fragmented++;
-						goto deliver;
-					}
-					if (ret == -1)
-						l_ptr->next_in_no--;
-					break;
-				case CHANGEOVER_PROTOCOL:
-					type = msg_type(msg);
-					if (link_recv_changeover_msg(&l_ptr,
-								     &buf)) {
-						msg = buf_msg(buf);
-						seq_no = msg_seqno(msg);
-						if (type == ORIGINAL_MSG)
-							goto deliver;
-						goto protocol_check;
-					}
-					break;
-				default:
-					kfree_skb(buf);
-					buf = NULL;
-					break;
-				}
+		if (unlikely(!link_working_working(l_ptr))) {
+			if (msg_user(msg) == LINK_PROTOCOL) {
+				link_recv_proto_msg(l_ptr, buf);
+				head = link_insert_deferred_queue(l_ptr, head);
+				tipc_node_unlock(n_ptr);
+				continue;
+			}
+
+			/* Traffic message. Conditionally activate link */
+			link_state_event(l_ptr, TRAFFIC_MSG_EVT);
+
+			if (link_working_working(l_ptr)) {
+				/* Re-insert buffer in front of queue */
+				buf->next = head;
+				head = buf;
 				tipc_node_unlock(n_ptr);
-				tipc_net_route_msg(buf);
 				continue;
 			}
+			goto unlock_discard;
+		}
+
+		/* Link is now in state WORKING_WORKING */
+		if (unlikely(seq_no != mod(l_ptr->next_in_no))) {
 			link_handle_out_of_seq_msg(l_ptr, buf);
 			head = link_insert_deferred_queue(l_ptr, head);
 			tipc_node_unlock(n_ptr);
 			continue;
 		}
-
-		/* Link is not in state WORKING_WORKING */
-		if (msg_user(msg) == LINK_PROTOCOL) {
-			link_recv_proto_msg(l_ptr, buf);
+		l_ptr->next_in_no++;
+		if (unlikely(l_ptr->oldest_deferred_in))
 			head = link_insert_deferred_queue(l_ptr, head);
+deliver:
+		if (likely(msg_isdata(msg))) {
 			tipc_node_unlock(n_ptr);
+			tipc_port_recv_msg(buf);
 			continue;
 		}
-
-		/* Traffic message. Conditionally activate link */
-		link_state_event(l_ptr, TRAFFIC_MSG_EVT);
-
-		if (link_working_working(l_ptr)) {
-			/* Re-insert buffer in front of queue */
-			buf->next = head;
-			head = buf;
+		switch (msg_user(msg)) {
+			int ret;
+		case MSG_BUNDLER:
+			l_ptr->stats.recv_bundles++;
+			l_ptr->stats.recv_bundled += msg_msgcnt(msg);
+			tipc_node_unlock(n_ptr);
+			tipc_link_recv_bundle(buf);
+			continue;
+		case NAME_DISTRIBUTOR:
+			n_ptr->bclink.recv_permitted = true;
+			tipc_node_unlock(n_ptr);
+			tipc_named_recv(buf);
+			continue;
+		case BCAST_PROTOCOL:
+			tipc_link_recv_sync(n_ptr, buf);
 			tipc_node_unlock(n_ptr);
 			continue;
+		case CONN_MANAGER:
+			tipc_node_unlock(n_ptr);
+			tipc_port_recv_proto_msg(buf);
+			continue;
+		case MSG_FRAGMENTER:
+			l_ptr->stats.recv_fragments++;
+			ret = tipc_link_recv_fragment(&l_ptr->defragm_buf,
+						      &buf, &msg);
+			if (ret == 1) {
+				l_ptr->stats.recv_fragmented++;
+				goto deliver;
+			}
+			if (ret == -1)
+				l_ptr->next_in_no--;
+			break;
+		case CHANGEOVER_PROTOCOL:
+			type = msg_type(msg);
+			if (link_recv_changeover_msg(&l_ptr, &buf)) {
+				msg = buf_msg(buf);
+				seq_no = msg_seqno(msg);
+				if (type == ORIGINAL_MSG)
+					goto deliver;
+				goto protocol_check;
+			}
+			break;
+		default:
+			kfree_skb(buf);
+			buf = NULL;
+			break;
 		}
 		tipc_node_unlock(n_ptr);
-cont:
+		tipc_net_route_msg(buf);
+		continue;
+unlock_discard:
+
+		tipc_node_unlock(n_ptr);
+discard:
 		kfree_skb(buf);
 	}
 	read_unlock_bh(&tipc_net_lock);
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] bnx2: Use dev_kfree_skb_any() in bnx2_tx_int()
From: Cong Wang @ 2013-10-30  3:50 UTC (permalink / raw)
  To: David Miller; +Cc: tdmackey, mchan, Linux Kernel Network Developers, LKML
In-Reply-To: <20131029.224227.1700872646112764527.davem@davemloft.net>

On Tue, Oct 29, 2013 at 7:42 PM, David Miller <davem@davemloft.net> wrote:
> From: David Mackey <tdmackey@booleanhaiku.com>
> Date: Tue, 29 Oct 2013 15:16:38 -0700
>
>> Using dev_kfree_skb_any() will resolve the below issue when a
>> netconsole message is transmitted in an irq.
>  ...
>> Signed-off-by: David Mackey <tdmackey@booleanhaiku.com>
>
> This is absolutely not the correct fix.
>
> The netpoll facility must invoke ->poll() in an environment which
> is compatible, locking and interrupt/soft-interrupt wise, as that
> in which it is normally called.
>

Normally ->poll() is called in softirq context, while netpoll could
be called in any context depending on its caller.

Also, netpoll disables IRQ for tx, which is another difference.

^ permalink raw reply

* Re: Bug in skb_segment: fskb->len != len
From: Eric Dumazet @ 2013-10-30  4:03 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Christoph Paasch, netdev
In-Reply-To: <20131030015002.GB1347@gondor.apana.org.au>

On Wed, 2013-10-30 at 09:50 +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 08:08:13AM -0700, Eric Dumazet wrote:
> >
> > GRO layer was updated to be able to stack two or three sk_buff,
> > fully populated with page frags.
> > 
> > Thats quite mandatory to support line rate for 40Gb links.
> 
> Indeed I missed this.  Which commit introduced this change?

This was mentioned in the changelog :

<quote>

Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")

</quote>

^ permalink raw reply

* Re: Bug in skb_segment: fskb->len != len
From: Herbert Xu @ 2013-10-30  4:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Christoph Paasch, netdev
In-Reply-To: <1383105785.4857.21.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 29, 2013 at 09:03:05PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-30 at 09:50 +0800, Herbert Xu wrote:
> > 
> > Indeed I missed this.  Which commit introduced this change?
> 
> This was mentioned in the changelog :
> 
> <quote>
> 
> Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
> by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
> 
> </quote>

Thanks.

In that case we should be able to segment these full-sized skbs
without linearising.  What we should do is iterate through
each frag_list entry and apply the usual GSO code to them.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-30  4:06 UTC (permalink / raw)
  To: David Miller; +Cc: christoph.paasch, herbert, netdev, hkchu, mwdalton
In-Reply-To: <20131029.220253.1263087684709722001.davem@davemloft.net>

On Tue, 2013-10-29 at 22:02 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 29 Oct 2013 17:53:48 -0700
> 
> > So should we apply the first fix to avoid the BUG_ON() ?
> 
> Please be more specific, are you talking about splitting up
> this patch in some way?

I am referring to the first version I sent to Christoph :

http://www.spinics.net/lists/netdev/msg255452.html

Then I added the sysctl to avoid future packets to get a frag_list in
the first place.

Doing a smart skb_segment() is possible, but this function is already
complex.

I am not sure 64K GRO packets that must be segmented are going to be
faster than 22K packets without segmentation at all (TSO path on xmit)

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30  4:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <1383106012.4857.26.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
>
> I am not sure 64K GRO packets that must be segmented are going to be
> faster than 22K packets without segmentation at all (TSO path on xmit)

Indeed that is a tough call, but I think conceptually the 64K
case is much nicer than a sysctl that gets magically turned off.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30  4:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <20131030040818.GB2755@gondor.apana.org.au>

On Wed, Oct 30, 2013 at 12:08:18PM +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
> >
> > I am not sure 64K GRO packets that must be segmented are going to be
> > faster than 22K packets without segmentation at all (TSO path on xmit)
> 
> Indeed that is a tough call, but I think conceptually the 64K
> case is much nicer than a sysctl that gets magically turned off.

Also at some point we'll want to do >64K GRO/GSO too so we'll
have to face this complexity one day.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Jerry Chu @ 2013-10-30  4:15 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, David Miller, Christoph Paasch,
	netdev@vger.kernel.org, Michael Dalton
In-Reply-To: <20131030040949.GA2912@gondor.apana.org.au>

On Tue, Oct 29, 2013 at 9:09 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Oct 30, 2013 at 12:08:18PM +0800, Herbert Xu wrote:
>> On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
>> >
>> > I am not sure 64K GRO packets that must be segmented are going to be
>> > faster than 22K packets without segmentation at all (TSO path on xmit)
>>
>> Indeed that is a tough call, but I think conceptually the 64K
>> case is much nicer than a sysctl that gets magically turned off.
>
> Also at some point we'll want to do >64K GRO/GSO too so we'll
> have to face this complexity one day.

Not sure how this is possible with the IP datagram length limit. (Am I
missing something, like pkt chaining?)

Jerry

>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-30  4:16 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <20131030040818.GB2755@gondor.apana.org.au>

On Wed, 2013-10-30 at 12:08 +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 09:06:52PM -0700, Eric Dumazet wrote:
> >
> > I am not sure 64K GRO packets that must be segmented are going to be
> > faster than 22K packets without segmentation at all (TSO path on xmit)
> 
> Indeed that is a tough call, but I think conceptually the 64K
> case is much nicer than a sysctl that gets magically turned off.

The thing is this only matters for hosts receiving at line rate on few
TCP flows.

A router should not build too big GRO packets, as it adds latencies.

Really, we had to make TSO packets being auto sized, lets not add the
syndrome again.

So I do not really understand David concern about emitting a warning.

If a machine is used as a router, building GRO packets of 17 MSS is
absolutely fine.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30  4:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <1383106577.4857.30.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 29, 2013 at 09:16:17PM -0700, Eric Dumazet wrote:
>
> The thing is this only matters for hosts receiving at line rate on few
> TCP flows.
> 
> A router should not build too big GRO packets, as it adds latencies.
> 
> Really, we had to make TSO packets being auto sized, lets not add the
> syndrome again.
> 
> So I do not really understand David concern about emitting a warning.
> 
> If a machine is used as a router, building GRO packets of 17 MSS is
> absolutely fine.

It's not just routers you know, we use the same code on bridges
as part of virtualisation.  So it absolutely does matter.

In fact this is why I wrote GRO in the first place, to make it
work for virtualisation.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: named network namespace -- setns() with Invalid argument (errno 22)
From: Eric W. Biederman @ 2013-10-30  4:33 UTC (permalink / raw)
  To: dilip.daya; +Cc: netdev
In-Reply-To: <1383092184.12859.78.camel@dilip-laptop>

Dilip Daya <dilip.daya@hp.com> writes:

> Hi All,
>
> Is the following intended behavior for adding "nested" named network namespaces ?

Not exactly intended but this is not misbehavior either.

Mostly this is a don't do that then scenario.

Eric


> Steps to reproduce:
>
> # uname -r
> 3.10.1
>
>
> # /sbin/ip -V
> ip utility, iproute2-ss130903
>
>
> Existing network namespaces:
> # ip netns list
> NETNS0
> NETNS1
>
>
> List of named network namespace objects with inode/permissions:
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
>
> Enter existing named network namespace:
> # ip netns exec NETNS0 bash
>
> List network devices for named netns:
> # ls -l /sys/class/net/
> total 0
> lrwxrwxrwx 1 root root 0 Oct 29 12:25 lo -> ../../devices/virtual/net/lo/
>
> List of named network namespace objects with inode/permissions:
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
>
>
> # ip netns add NETNS0a    <<< adding NETNS0a from within NETNS0
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
> 4026532423 -r--r--r-- 1 root root 0 Oct 29 12:28 NETNS0a
> ^^^^^^^^^^ ^^^^^^^^^^
>   inode    permissions
>
>
> # ip netns exec NETNS0a ls -l /sys/class/net/
> total 0
> lrwxrwxrwx 1 root root 0 Oct 29 12:28 lo -> ../../devices/virtual/net/lo
>
> # exit   <<< exiting from NETNS0
>
> Listing from host/default namespace:
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
>     964863 ---------- 1 root root 0 Oct 29 12:28 NETNS0a  <<< NULL permissions
>     ^^^^^^ ^^^^^^^^^^
>
>
> Re-enter NETNS0:
> # ip netns exec NETNS0 bash
> # ls -li /var/run/netns/
> total 0
> 4026532310 -r--r--r-- 1 root root 0 Oct 29 09:11 NETNS0
> 4026532366 -r--r--r-- 1 root root 0 Oct 29 09:13 NETNS1
>     964863 ---------- 1 root root 0 Oct 29 12:28 NETNS0a  <<< NULL permissions
>            ^^^^^^^^^^
>
>
> # ip netns exec NETNS0a ls -l /sys/class/net/
> seting the network namespace "NETNS0a" failed: Invalid argument
>
> => It seems the bash shell that created the nested named netns is the only
>    one that can view/enter the nested named netns. All other attempts from
>    either another bash shell or host/default namespace will get a different 
>    inode with NULL permissions. Once the initial bash shell that created the
>    nested named netns exists the nested netns is rendered unusable due to
>    NULL permissions on its inode. setns() Invalid argument (errno 22) seems
>    to be due to NULL permissions on /var/run/netns/<netnsName> object.
>
>
> Thanks.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-30  4:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <20131030041959.GA3162@gondor.apana.org.au>

On Wed, 2013-10-30 at 12:19 +0800, Herbert Xu wrote:
> On Tue, Oct 29, 2013 at 09:16:17PM -0700, Eric Dumazet wrote:
> >
> > The thing is this only matters for hosts receiving at line rate on few
> > TCP flows.
> > 
> > A router should not build too big GRO packets, as it adds latencies.
> > 
> > Really, we had to make TSO packets being auto sized, lets not add the
> > syndrome again.
> > 
> > So I do not really understand David concern about emitting a warning.
> > 
> > If a machine is used as a router, building GRO packets of 17 MSS is
> > absolutely fine.
> 
> It's not just routers you know, we use the same code on bridges
> as part of virtualisation.  So it absolutely does matter.
> 

What matters ?

GRO ?

Or making size of GRO packets not too big, or making them bigger ?

Before my patch, GRO packets were 17 MSS, and nobody complained packets
were too small, so what are you saying exactly ?

^ permalink raw reply

* Re: Bug in skb_segment: fskb->len != len
From: Eric Dumazet @ 2013-10-30  4:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Christoph Paasch, netdev
In-Reply-To: <20131030040644.GA2755@gondor.apana.org.au>

On Wed, 2013-10-30 at 12:06 +0800, Herbert Xu wrote:

> In that case we should be able to segment these full-sized skbs
> without linearising.  What we should do is iterate through
> each frag_list entry and apply the usual GSO code to them.

Well, if you really want to be smart, we could build GSO packets of ~16
MSS, as most NIC support TSO.

So a 64K packet containing 44 MSS could be split into 3 TSO packets.

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: migrate mergeable rx buffers to page frag allocators
From: Jason Wang @ 2013-10-30  4:41 UTC (permalink / raw)
  To: Michael Dalton, Eric Northup
  Cc: Michael S. Tsirkin, netdev, Eric Dumazet, lf-virt,
	Daniel Borkmann, David S. Miller
In-Reply-To: <CANJ5vP+=wCrzJ3Fdg0_5ZTTT-V-fw2AXUG=EgwsUcxXO=UpMgw@mail.gmail.com>

On 10/30/2013 03:05 AM, Michael Dalton wrote:
> Agreed Eric, the buffer size should be increased so that we can accommodate a
> MTU-sized packet + mergeable virtio net header in a single buffer. I will send
> a patch to fix shortly cleaning up the #define headers as Rusty indicated and
> increasing the buffer size slightly by VirtioNet header size bytes per Eric.
>
> Jason, I'll followup with you directly - I'd like to know your exact workload
> (single steam or multi-stream netperf?), VM configuration, etc, and also see if
> the nit that Erichas pointed out affects your results.

It's just a single TCP_STREAM from local host to guest with vhost_net
enabled. The host and guest were connected with bridge.
>   It is also
> worth noting that
> we may want to tune the queue sizes for your benchmarks, e.g, by reducing
> buffer size from 4KB to MTU-sized but keeping queue length constant, we're
> implicitly decreasing the number of bytes stored in the VirtioQueue for the
> VirtioNet device, so increasing the queue size may help.

The problem is the overhead of introduced by the frag refill and frag
list. Looking at tg3 and bnx2x, the frag were only used for small
packets not for jumbo frame which does not need a frag list. But your
patch tires to use it for GSO packets. So we'd better to solve it in
device level (e.g multiple rings with different size of buffers).

I try to change the queue size from 256 to 1024. Unfortunately, I didn't
see improvement. As a workaround, If you care only the MTU size packet
and do not want GSO packet to be received, you can just disable it by
specifying gso_tso{4|6}=off in qemu command line or let's make it
configurable through ethtool.
>
> Best,
>
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Herbert Xu @ 2013-10-30  4:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, christoph.paasch, netdev, hkchu, mwdalton
In-Reply-To: <1383107681.4857.33.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 29, 2013 at 09:34:41PM -0700, Eric Dumazet wrote:
>
> What matters ?
> 
> GRO ?

What matters is that you should not treat the forwarding case
separately from the host case.

For virtualisation the host case looks exactly like the forwarding
case.

IOW, if having a 64KB packet matters for the host, then it matters
for forwarding as well.

> Before my patch, GRO packets were 17 MSS, and nobody complained packets
> were too small, so what are you saying exactly ?

I'm not criticsing your mega-GRO patch at all.  That one is great
and means that we'll get aggregated packets up to 64K.  What we need
to do is just to patch up the GSO code so that it can handle these
mega-packets properly.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Bug in skb_segment: fskb->len != len
From: Herbert Xu @ 2013-10-30  4:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Christoph Paasch, netdev
In-Reply-To: <1383107858.4857.35.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 29, 2013 at 09:37:38PM -0700, Eric Dumazet wrote:
>
> Well, if you really want to be smart, we could build GSO packets of ~16
> MSS, as most NIC support TSO.
> 
> So a 64K packet containing 44 MSS could be split into 3 TSO packets.

Yes that would be nice for sure.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox