Netdev List

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
From: Eric Dumazet @ 2012-06-26  8:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120626.004658.2123525722448546355.davem@davemloft.net>

On Tue, 2012-06-26 at 00:46 -0700, David Miller wrote:

> And for legitimate traffic it's completely the wrong thing to do.
> 
> There is absolutely zero reason to purge valid entries when hash chains
> have an average length of one.
> 
> I've been monitoring routing cache activity, and it's the height of
> stupidity.  Every 5 minutes we purge, and then they all get regenerated
> again.
> 

That's because gc_interval (60) is big compared to ip_rt_gc_timeout
(300).

So each time rt_check_expire() triggers, we handle a big part of the
cache. On big servers I had to lower gc_interval to smooth things.

Garbage collection is needed to avoid wasting kernel memory, even with
legitimate traffic on a typical web server.

Taken from my 8GB machine:

# cat /proc/sys/net/ipv4/route/gc_thresh
262144

320 bytes per dst: 262144*320 = 83886080 bytes (80 MiB) to store one dst per hash chain.

Also, why keep a dst in the cache if no traffic uses it in a 5-minute period?
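
The arithmetic above can be double-checked with a small C sketch. The
gc_thresh value and the 320-byte dst size are taken from this message; the
function name is made up for illustration and is not a kernel symbol.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed to keep one dst entry per hash chain,
 * assuming one entry per chain and gc_thresh chains' worth of entries.
 */
uint64_t route_cache_bytes(uint64_t gc_thresh, uint64_t dst_size)
{
	return gc_thresh * dst_size;
}
```

With the figures quoted here, route_cache_bytes(262144, 320) gives
83886080 bytes, which is 80 MiB.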

^ permalink raw reply

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
From: Eric Dumazet @ 2012-06-26  7:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120626.004658.2123525722448546355.davem@davemloft.net>

On Tue, 2012-06-26 at 00:46 -0700, David Miller wrote:

> But regardless, could you actually answer the question I asked of you?

I did a revert, no more no less.

^ permalink raw reply

* [net] ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
From: Jeff Kirsher @ 2012-06-26  7:54 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, stable, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

FCoE target mode was experiencing issues due to the fact that we were
sending up data frames that were padded to 60 bytes after the DDP logic had
already stripped the frame down to 52 or 56 depending on the use of VLANs.
This was resulting in the FCoE DDP logic having issues since it thought the
frame still had data in it due to the padding.

To resolve this, add code so that we do not pad FCoE frames prior to
handing them to the stack.

CC: <stable@vger.kernel.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   14 ++++++++++----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 3ef3c52..7af291e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -196,7 +196,7 @@ enum ixgbe_ring_state_t {
 	__IXGBE_HANG_CHECK_ARMED,
 	__IXGBE_RX_RSC_ENABLED,
 	__IXGBE_RX_CSUM_UDP_ZERO_ERR,
-	__IXGBE_RX_FCOE_BUFSZ,
+	__IXGBE_RX_FCOE,
 };
 
 #define check_for_tx_hang(ring) \
@@ -290,7 +290,7 @@ struct ixgbe_ring_feature {
 #if defined(IXGBE_FCOE) && (PAGE_SIZE < 8192)
 static inline unsigned int ixgbe_rx_pg_order(struct ixgbe_ring *ring)
 {
-	return test_bit(__IXGBE_RX_FCOE_BUFSZ, &ring->state) ? 1 : 0;
+	return test_bit(__IXGBE_RX_FCOE, &ring->state) ? 1 : 0;
 }
 #else
 #define ixgbe_rx_pg_order(_ring) 0
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index af1a531..c377706 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -634,7 +634,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 			f = &adapter->ring_feature[RING_F_FCOE];
 			if ((rxr_idx >= f->mask) &&
 			    (rxr_idx < f->mask + f->indices))
-				set_bit(__IXGBE_RX_FCOE_BUFSZ, &ring->state);
+				set_bit(__IXGBE_RX_FCOE, &ring->state);
 		}
 
 #endif /* IXGBE_FCOE */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbb05d6..18ca3bc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1058,17 +1058,17 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring,
 #ifdef IXGBE_FCOE
 /**
  * ixgbe_rx_is_fcoe - check the rx desc for incoming pkt type
- * @adapter: address of board private structure
+ * @ring: structure containing ring specific data
  * @rx_desc: advanced rx descriptor
  *
  * Returns : true if it is FCoE pkt
  */
-static inline bool ixgbe_rx_is_fcoe(struct ixgbe_adapter *adapter,
+static inline bool ixgbe_rx_is_fcoe(struct ixgbe_ring *ring,
 				    union ixgbe_adv_rx_desc *rx_desc)
 {
 	__le16 pkt_info = rx_desc->wb.lower.lo_dword.hs_rss.pkt_info;
 
-	return (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) &&
+	return test_bit(__IXGBE_RX_FCOE, &ring->state) &&
 	       ((pkt_info & cpu_to_le16(IXGBE_RXDADV_PKTTYPE_ETQF_MASK)) ==
 		(cpu_to_le16(IXGBE_ETQF_FILTER_FCOE <<
 			     IXGBE_RXDADV_PKTTYPE_ETQF_SHIFT)));
@@ -1549,6 +1549,12 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 		skb->truesize -= ixgbe_rx_bufsz(rx_ring);
 	}
 
+#ifdef IXGBE_FCOE
+	/* do not attempt to pad FCoE Frames as this will disrupt DDP */
+	if (ixgbe_rx_is_fcoe(rx_ring, rx_desc))
+		return false;
+
+#endif
 	/* if skb_pad returns an error the skb was freed */
 	if (unlikely(skb->len < 60)) {
 		int pad_len = 60 - skb->len;
@@ -1775,7 +1781,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 #ifdef IXGBE_FCOE
 		/* if ddp, not passing to ULD unless for FCP_RSP or error */
-		if (ixgbe_rx_is_fcoe(adapter, rx_desc)) {
+		if (ixgbe_rx_is_fcoe(rx_ring, rx_desc)) {
 			ddp_bytes = ixgbe_fcoe_ddp(adapter, rx_desc, skb);
 			if (!ddp_bytes) {
 				dev_kfree_skb_any(skb);
-- 
1.7.10.2
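
The padding decision that the patch changes can be sketched as a userspace
model. This is not driver code: rx_pad_len() and MIN_ETH_FRAME_LEN are
names assumed here for illustration, and the 60-byte minimum is the value
used by the pad check in the hunk above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_ETH_FRAME_LEN 60	/* threshold used by the patch's pad check */

/*
 * Userspace model, not driver code: returns the number of pad bytes that
 * would be appended to a received frame of length len.
 */
size_t rx_pad_len(size_t len, bool is_fcoe)
{
	if (is_fcoe)
		return 0;	/* DDP already trimmed the frame; do not pad */
	if (len < MIN_ETH_FRAME_LEN)
		return MIN_ETH_FRAME_LEN - len;
	return 0;
}
```

In this model a 52-byte FCoE frame stays at 52 bytes, while a 52-byte
non-FCoE frame still gets 8 bytes of padding.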

^ permalink raw reply related

* Re: [net] ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
From: Jeff Kirsher @ 2012-06-26  7:53 UTC (permalink / raw)
  To: David Miller; +Cc: alexander.h.duyck, netdev, gospo, sassmann, stable
In-Reply-To: <20120626.005029.909254566674577767.davem@davemloft.net>

On Tue, 2012-06-26 at 00:50 -0700, David Miller wrote:
> Sorry, quotes don't work either, what you did is still a SMTP syntax error,
> here's what is in the bounce I get back:
> 
> 	<stable@vger.kernel.org> "[3.4]",
> 	Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Illegal-Object:	Syntax error in Cc: address found on vger.kernel.org:
> 	Cc:	<stable@vger.kernel.org>"[3.4]"
> 						^-missing end of address

Grrr...

I will re-send without the "[3.4]", Greg will just have to deal with it.

^ permalink raw reply

* Re: [net] ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
From: David Miller @ 2012-06-26  7:50 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: alexander.h.duyck, netdev, gospo, sassmann, stable
In-Reply-To: <1340696980.2255.18.camel@jtkirshe-mobl>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 26 Jun 2012 00:49:40 -0700

> On Tue, 2012-06-26 at 00:43 -0700, David Miller wrote:
>> You can't put things like "[3.4]" unquoted into the CC: list, that's
>> not kosher and vger rejected it.
> 
> Re-sent with the proper quoting.

Doesn't work, see my reply.

^ permalink raw reply

* Re: [net] ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
From: David Miller @ 2012-06-26  7:50 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: alexander.h.duyck, netdev, gospo, sassmann, stable
In-Reply-To: <1340696951-2686-1-git-send-email-jeffrey.t.kirsher@intel.com>


Sorry, quotes don't work either, what you did is still a SMTP syntax error,
here's what is in the bounce I get back:

	<stable@vger.kernel.org> "[3.4]",
	Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Illegal-Object:	Syntax error in Cc: address found on vger.kernel.org:
	Cc:	<stable@vger.kernel.org>"[3.4]"
						^-missing end of address

^ permalink raw reply

* Re: [net] ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
From: Jeff Kirsher @ 2012-06-26  7:49 UTC (permalink / raw)
  To: David Miller; +Cc: alexander.h.duyck, netdev, gospo, sassmann, stable
In-Reply-To: <20120626.004302.1065943933933136878.davem@davemloft.net>

On Tue, 2012-06-26 at 00:43 -0700, David Miller wrote:
> You can't put things like "[3.4]" unquoted into the CC: list, that's
> not kosher and vger rejected it.

Re-sent with the proper quoting.

^ permalink raw reply

* Re: [net] ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
From: Jeff Kirsher @ 2012-06-26  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: alexander.h.duyck, netdev, gospo, sassmann, stable
In-Reply-To: <20120626.004302.1065943933933136878.davem@davemloft.net>

On Tue, 2012-06-26 at 00:43 -0700, David Miller wrote:
> You can't put things like "[3.4]" unquoted into the CC: list, that's
> not kosher and vger rejected it.

Sorry, that was what Greg told me to do.  I did not realize it needed to
be in quotes, my bad.

^ permalink raw reply

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
From: David Miller @ 2012-06-26  7:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1340696398.10893.209.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 26 Jun 2012 09:39:58 +0200

> On Tue, 2012-06-26 at 00:21 -0700, David Miller wrote:
>> IPv4 routing cache entries no longer use dst->expires, because the
>> metrics, PMTU, and redirect information are stored in the inetpeer
>> cache.
>> 
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>> ---
>> 
>> Eric, when you did commit 9f28a2fc0bd77511f649c0a788c7bf9a5fd04edb
>> (ipv4: reintroduce route cache garbage collector) do you remember
>> if the thing we needed was the real expiry or both the
>> rt_is_expired() and the rt_may_expire() cases?
>> 
>> I really want to remove rt_may_expire() from this conditional because
>> it results in absolutely stupid behavior.  If your system is idle
>> for 5 minutes, all of your input routing cache entries are purged.
> 
> Hmm, after a DDOS, purging all those routing cache entries in 5 minutes
> is good to recover some Mbytes of kernel memory.

And for legitimate traffic it's completely the wrong thing to do.

There is absolutely zero reason to purge valid entries when hash chains
have an average length of one.

I've been monitoring routing cache activity, and it's the height of
stupidity.  Every 5 minutes we purge, and then they all get regenerated
again.

Routing cache entries are expensive to recreate, it's much easier to
just keep them around than to potentially eat four full trie lookups
because that's what it will cost to reconstitute those guys.

But regardless, could you actually answer the question I asked of you?

^ permalink raw reply

* Re: [net] ixgbe: Do not pad FCoE frames as this can cause issues with FCoE DDP
From: David Miller @ 2012-06-26  7:43 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: alexander.h.duyck, netdev, gospo, sassmann, stable
In-Reply-To: <1340695993-1911-1-git-send-email-jeffrey.t.kirsher@intel.com>


You can't put things like "[3.4]" unquoted into the CC: list, that's
not kosher and vger rejected it.

^ permalink raw reply

* Re: [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
From: Eric Dumazet @ 2012-06-26  7:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120626.002124.2220875506847485306.davem@davemloft.net>

On Tue, 2012-06-26 at 00:21 -0700, David Miller wrote:
> IPv4 routing cache entries no longer use dst->expires, because the
> metrics, PMTU, and redirect information are stored in the inetpeer
> cache.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
> 
> Eric, when you did commit 9f28a2fc0bd77511f649c0a788c7bf9a5fd04edb
> (ipv4: reintroduce route cache garbage collector) do you remember
> if the thing we needed was the real expiry or both the
> rt_is_expired() and the rt_may_expire() cases?
> 
> I really want to remove rt_may_expire() from this conditional because
> it results in absolutely stupid behavior.  If your system is idle
> for 5 minutes, all of your input routing cache entries are purged.

Hmm, after a DDOS, purging all those routing cache entries in 5 minutes
is good to recover some Mbytes of kernel memory.

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: avoid tx starvation by SYNACK packets
From: Hans Schillstrom @ 2012-06-26  7:27 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet@gmail.com, subramanian.vijay@gmail.com,
	dave.taht@gmail.com, netdev@vger.kernel.org, ncardwell@google.com,
	therbert@google.com, brouer@redhat.com
In-Reply-To: <20120626.001124.36486380031998542.davem@davemloft.net>

On Tuesday 26 June 2012 09:11:24 David Miller wrote:
> From: Hans Schillstrom <hans.schillstrom@ericsson.com>
> Date: Tue, 26 Jun 2012 07:34:31 +0200
> 
> > The big cycle consumer during a syn attack is SHA sum right now, 
> > so from that perspective it's better to add aes crypto (by using AES-NI) 
> > to the syn cookies instead of SHA sum. Even if only newer x86_64 can use it.
> 
> I'm surprised that x86 lacks an SHA1 instruction, even shitty sparcs
> have one now.
> 
> SHA1 seems overkill for what the syncookie code is trying to do, could
> you give the following a try?
> 

Sure, I'll give it a try later today 

Thanks

^ permalink raw reply

* Re: [PATCH net] net: qmi_wwan: fix Oops while disconnecting
From: Ming Lei @ 2012-06-26  7:23 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Bjørn Mork, netdev, linux-usb, Marius Bjørnstad Kotsbak
In-Reply-To: <201206251410.13454.oliver@neukum.org>

On Mon, Jun 25, 2012 at 8:10 PM, Oliver Neukum <oliver@neukum.org> wrote:
> Am Montag, 25. Juni 2012, 09:15:21 schrieb Ming Lei:
>> No locking is needed if the set to NULL is put after
>> driver_info->unbind,  since ->unbind will call subdriver->disconnect,
>> which will hold the open/disconnect lock of wdm_mutex.
>
> True for cdc_wdm. But usbnet needs to work well for everything.

Suppose there are other usbnet drivers that have this kind of
subdriver and have to take a lock to avoid the open/disconnect
race. There are only two ways to do it:

          - the lock is held before calling usbnet_disconnect
          - the lock is held inside driver_info->unbind

So putting the set to NULL after driver_info->unbind should work
for both of the ways above.

Also we can document the usage in comments.

>
>> > We can move to after unregister_netdev()
>> > I am unhappy with it going after unbind.
>> >
>>
>> Could you let us know the reason? I think it may make the
>> patch unnecessary.
>
> Very well. This is the code:
>
>  void usbnet_disconnect (struct usb_interface *intf)
> {
>        struct usbnet           *dev;
>        struct usb_device       *xdev;
>        struct net_device       *net;
>
>        dev = usb_get_intfdata(intf);
>        usb_set_intfdata(intf, NULL);
>        if (!dev)
>                return;
>
> This code needs to check for NULL (cdc_ether and similar drivers)
> It is cleaner that if we need to check for NULL we also set to NULL.
> But that is no good reason to keep it if there's real trouble
>
>        xdev = interface_to_usbdev (intf);
>
>        netif_info(dev, probe, dev->net, "unregister '%s' usb-%s-%s, %s\n",
>                   intf->dev.driver->name,
>                   xdev->bus->bus_name, xdev->devpath,
>                   dev->driver_info->description);
>
>        net = dev->net;
>        unregister_netdev (net);
>
> Here intfdata is NULL.
>
>        cancel_work_sync(&dev->kevent);
>
>        if (dev->driver_info->unbind)
>                dev->driver_info->unbind (dev, intf);
>
> At this point a minidriver must not follow the intfdata pointer,
> because the interface may again be probed. So if here a minidriver

IMO, probe is serialized strictly with driver unbind since both the parent
lock and its own device lock have been held, so the probe may only be
started after driver unbinding is completed.

> still uses intfdata, locking will be needed. We want to catch those
> cases.

Suppose intfdata is used somewhere here; that is surely a bug, because
the usbnet instance pointed to by intfdata will be freed soon.

So it looks like putting the set to NULL after driver_info->unbind is good,
doesn't it?

>
>        usb_kill_urb(dev->interrupt);
>        usb_free_urb(dev->interrupt);
>
>        free_netdev(net);
>        usb_put_dev (xdev);
> }
>
>> > Sure, it is a debugging aid. It has the drawback that minidrivers have
>> > to be able to deal with intfdata being NULL. That is not hard to do.
>>
>> The check isn't needed if the set to NULL is put after  driver_info->unbind
>> in usbnet_disconnect.
>
> True, but we don't catch bugs.

If the check is added, the bugs may be hidden, and no stack will be
dumped, :-)
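
The ordering argued for in this message can be sketched as a small
userspace model. Everything here is hypothetical (mock_intf, mock_dev,
mock_unbind and mock_disconnect are stand-ins, not usbnet symbols); the
only point is that a callback run before the pointer is cleared can still
see it.

```c
#include <stddef.h>

/* Stand-in for struct usb_interface; hypothetical. */
struct mock_intf {
	void *data;	/* models the pointer managed by usb_set_intfdata() */
};

/* Stand-in for struct usbnet; hypothetical. */
struct mock_dev {
	int unbind_saw_intfdata;	/* 1 if unbind ran while data was set */
};

/* Models a minidriver unbind callback that still looks at intfdata. */
void mock_unbind(struct mock_dev *dev, struct mock_intf *intf)
{
	dev->unbind_saw_intfdata = (intf->data != NULL);
}

/* Models the proposed ordering: clear intfdata only after unbind returns. */
void mock_disconnect(struct mock_intf *intf)
{
	struct mock_dev *dev = intf->data;

	if (!dev)
		return;

	mock_unbind(dev, intf);
	intf->data = NULL;
}
```

A second call to mock_disconnect() on the same interface returns early,
which mirrors the existing NULL check at the top of usbnet_disconnect().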


Thanks,
-- 
Ming Lei

^ permalink raw reply

* [PATCH] ipv4: Remove unnecessary code from rt_check_expire().
From: David Miller @ 2012-06-26  7:21 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet


IPv4 routing cache entries no longer use dst->expires, because the
metrics, PMTU, and redirect information are stored in the inetpeer
cache.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Eric, when you did commit 9f28a2fc0bd77511f649c0a788c7bf9a5fd04edb
(ipv4: reintroduce route cache garbage collector) do you remember
if the thing we needed was the real expiry or both the
rt_is_expired() and the rt_may_expire() cases?

I really want to remove rt_may_expire() from this conditional because
it results in absolutely stupid behavior.  If your system is idle
for 5 minutes, all of your input routing cache entries are purged.

 net/ipv4/route.c |   34 +++++++++++-----------------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8d62d85..846961c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -870,34 +870,22 @@ static void rt_check_expire(void)
 		while ((rth = rcu_dereference_protected(*rthp,
 					lockdep_is_held(rt_hash_lock_addr(i)))) != NULL) {
 			prefetch(rth->dst.rt_next);
-			if (rt_is_expired(rth)) {
+			if (rt_is_expired(rth) ||
+			    rt_may_expire(rth, tmo, ip_rt_gc_timeout)) {
 				*rthp = rth->dst.rt_next;
 				rt_free(rth);
 				continue;
 			}
-			if (rth->dst.expires) {
-				/* Entry is expired even if it is in use */
-				if (time_before_eq(jiffies, rth->dst.expires)) {
-nofree:
-					tmo >>= 1;
-					rthp = &rth->dst.rt_next;
-					/*
-					 * We only count entries on
-					 * a chain with equal hash inputs once
-					 * so that entries for different QOS
-					 * levels, and other non-hash input
-					 * attributes don't unfairly skew
-					 * the length computation
-					 */
-					length += has_noalias(rt_hash_table[i].chain, rth);
-					continue;
-				}
-			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout))
-				goto nofree;
 
-			/* Cleanup aged off entries. */
-			*rthp = rth->dst.rt_next;
-			rt_free(rth);
+			/* We only count entries on a chain with equal
+			 * hash inputs once so that entries for
+			 * different QOS levels, and other non-hash
+			 * input attributes don't unfairly skew the
+			 * length computation
+			 */
+			tmo >>= 1;
+			rthp = &rth->dst.rt_next;
+			length += has_noalias(rt_hash_table[i].chain, rth);
 		}
 		spin_unlock_bh(rt_hash_lock_addr(i));
 		sum += length;
-- 
1.7.10

^ permalink raw reply related

* [rfc] virtio-spec: introduce VIRTIO_NET_F_MULTIQUEUE
From: Jason Wang @ 2012-06-26  7:15 UTC (permalink / raw)
  To: jasowang, mst, rusty; +Cc: netdev, virtualization

This patch introduces multiqueue capabilities for virtio net devices. The
number of tx/rx queue pairs available in the device is exposed through config
space, and the driver can negotiate the number of pairs it wishes to use
through the ctrl vq.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 virtio-0.9.5.lyx |  180 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 176 insertions(+), 4 deletions(-)

diff --git a/virtio-0.9.5.lyx b/virtio-0.9.5.lyx
index 3c80ecf..480e9c7 100644
--- a/virtio-0.9.5.lyx
+++ b/virtio-0.9.5.lyx
@@ -56,6 +56,7 @@
 \html_math_output 0
 \html_css_as_file 0
 \html_be_strict false
+\author 2090695081 "Jason Wang" 
 \end_header
 
 \begin_body
@@ -3854,11 +3855,22 @@ ID 1
 \end_layout
 
 \begin_layout Description
-Virtqueues 0:receiveq.
+Virtqueues 
+\change_inserted 2090695081 1340693104
+
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted 2090695081 1340693118
+When VIRTIO_NET_F_MULTIQUEUE is not set: 
+\change_unchanged
+0:receiveq.
  1:transmitq.
  2:controlq
 \begin_inset Foot
-status open
+status collapsed
 
 \begin_layout Plain Layout
 Only if VIRTIO_NET_F_CTRL_VQ set
@@ -3867,9 +3879,60 @@ Only if VIRTIO_NET_F_CTRL_VQ set
 \end_inset
 
 
+\change_inserted 2090695081 1340693122
+
 \end_layout
 
 \begin_layout Description
+
+\change_inserted 2090695081 1340693866
+When VIRTIO_NET_F_MULTIQUEUE is set and there's N tx/rx queue pairs: 0:receiveq1.
+ 1:transmitq1.
+ 2:controlq
+\begin_inset Foot
+status collapsed
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693141
+Only if VIRTIO_NET_F_CTRL_VQ set
+\end_layout
+
+\end_inset
+
+ ...
+ 2N-1
+\begin_inset Foot
+status collapsed
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693284
+2N-2 If VIRTIO_NET_F_CTRL_VQ not set
+\end_layout
+
+\end_inset
+
+:receiveqN.
+ 2N
+\begin_inset Foot
+status collapsed
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693302
+2N-1 If VIRTIO_NET_F_CTRL_VQ is not set
+\end_layout
+
+\end_inset
+
+: transmitqN
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Description
 Feature
 \begin_inset space ~
 \end_inset
@@ -4027,6 +4090,16 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 2090695081 1340692965
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 2090695081 1340693017
+VIRTIO_NET_F_MULTIQUEUE (22) Device has multiple tx/rx queues.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4039,11 +4112,22 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout T
+\change_inserted 2090695081 1340693345
+hree
+\change_deleted 2090695081 1340693344
+wo
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 2090695081 1340693398
+ The num queue pairs fields only exist if VIRTIO_NET_F_MULTIQUEUE is set.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4076,6 +4160,17 @@ struct virtio_net_config {
 \begin_layout Plain Layout
 
     u16 status;
+\change_inserted 2090695081 1340692955
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340692962
+
+	u16 num_queue_pairs;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -4527,7 +4622,7 @@ O features are used, the Guest will need to accept packets of up to 65550
  So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every buffer in the receive
  queue needs to be at least this length 
 \begin_inset Foot
-status open
+status collapsed
 
 \begin_layout Plain Layout
 Obviously each one can be split across multiple descriptor elements.
@@ -4980,6 +5075,83 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
 \begin_layout Enumerate
 .
  
+\change_inserted 2090695081 1340693446
+
+\end_layout
+
+\begin_layout Subsection*
+
+\change_inserted 2090695081 1340693500
+Negotiating the number of queue pairs
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1340693733
+If the driver negotiates the VIRTIO_NET_F_MULTIQUEUE (depends on VIRTIO_NET_F_CT
+RL_VQ), it can then negotiate the number of queue pairs it wish to use by
+ placing the number in num_queue_pairs field of virtio_net_ctrl_multiqueue
+ through VIRTIO_NET_CTRL_MULTIQUEUE_NUM command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1340693782
+If the driver doesn't negotiate the number, all tx/rx queues were enabled
+ by default.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1340693616
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693620
+
+struct virtio_net_ctrl_multiqueue {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693627
+
+	u16 num_queue_pairs;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693616
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693616
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693639
+
+#define VIRTIO_NET_CTRL_MULTIQUEUE    4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1340693646
+
+ #define VIRTIO_NET_CTRL_MULTIQUEUE_NUM        0 
+\end_layout
+
+\end_inset
+
+
 \end_layout
 
 \begin_layout Chapter*
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH v2 net-next] tcp: avoid tx starvation by SYNACK packets
From: David Miller @ 2012-06-26  7:11 UTC (permalink / raw)
  To: hans.schillstrom
  Cc: eric.dumazet, subramanian.vijay, dave.taht, netdev, ncardwell,
	therbert, brouer
In-Reply-To: <201206260734.33472.hans.schillstrom@ericsson.com>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Tue, 26 Jun 2012 07:34:31 +0200

> The big cycle consumer during a syn attack is SHA sum right now, 
> so from that perspective it's better to add aes crypto (by using AES-NI) 
> to the syn cookies instead of SHA sum. Even if only newer x86_64 can use it.

I'm surprised that x86 lacks an SHA1 instruction, even shitty sparcs
have one now.

SHA1 seems overkill for what the syncookie code is trying to do, could
you give the following a try?

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6660ffc..b280bf4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -435,7 +435,6 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb);
 int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size);
 
 /* From syncookies.c */
-extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
 extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, 
 				    struct ip_options *opt);
 #ifdef CONFIG_SYN_COOKIES
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index eab2a7f..60116dc 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -13,9 +13,9 @@
 #include <linux/tcp.h>
 #include <linux/slab.h>
 #include <linux/random.h>
-#include <linux/cryptohash.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
+#include <linux/jhash.h>
 #include <net/tcp.h>
 #include <net/route.h>
 
@@ -25,7 +25,7 @@
 
 extern int sysctl_tcp_syncookies;
 
-__u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
+__u32 syncookie_secret[2];
 EXPORT_SYMBOL(syncookie_secret);
 
 static __init int init_syncookies(void)
@@ -38,22 +38,14 @@ __initcall(init_syncookies);
 #define COOKIEBITS 24	/* Upper bits store count */
 #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
 
-static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS],
-		      ipv4_cookie_scratch);
-
 static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport,
 		       u32 count, int c)
 {
-	__u32 *tmp = __get_cpu_var(ipv4_cookie_scratch);
-
-	memcpy(tmp + 4, syncookie_secret[c], sizeof(syncookie_secret[c]));
-	tmp[0] = (__force u32)saddr;
-	tmp[1] = (__force u32)daddr;
-	tmp[2] = ((__force u32)sport << 16) + (__force u32)dport;
-	tmp[3] = count;
-	sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
-
-	return tmp[17];
+	return jhash_3words((__force __u32) saddr + count,
+			    (__force __u32) daddr,
+			    (((__force __u32) sport) << 16 |
+			     (__force __u32) dport),
+			    syncookie_secret[c]);
 }
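
How the patched cookie_hash() folds its five inputs into the three words
passed to jhash_3words() can be sketched in userspace as follows. The
function name cookie_hash_words() is made up, and addresses and ports are
plain integers here rather than the kernel's __be32/__be16 types; the hash
itself is left out, with syncookie_secret[c] serving as its initval in the
patch.

```c
#include <stdint.h>

/*
 * Userspace sketch of the input packing in the patched cookie_hash().
 * The kernel code casts __be32/__be16 values with __force instead of
 * taking host integers.
 */
void cookie_hash_words(uint32_t saddr, uint32_t daddr,
		       uint16_t sport, uint16_t dport,
		       uint32_t count, uint32_t words[3])
{
	words[0] = saddr + count;	/* counter folded into the source address */
	words[1] = daddr;
	words[2] = ((uint32_t)sport << 16) | dport;	/* both ports in one word */
}
```

The SHA-based version being removed kept count in a word of its own
(tmp[3]); here it shares a word with saddr so that everything fits the
three-word jhash variant.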
 
 

^ permalink raw reply related

* Re: [PATCH 0/2 net-next]: fixups for wrong conflict resolution
From: David Miller @ 2012-06-26  6:55 UTC (permalink / raw)
  To: ordex; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1340693391-16434-1-git-send-email-ordex@autistici.org>

From: Antonio Quartulli <ordex@autistici.org>
Date: Tue, 26 Jun 2012 08:49:49 +0200

> here are our two fixes to recover from the little problems introduced during the
> last conflict resolution involving translation-table.c.
> Sorry for the inconvenience.

Both applied, thanks.

^ permalink raw reply

* Re: [PATCH] net/sh-eth: Check return value of sh_eth_reset when chip reset fail
From: David Miller @ 2012-06-26  6:55 UTC (permalink / raw)
  To: nobuhiro.iwamatsu.yj; +Cc: netdev, florian
In-Reply-To: <1340681712-2212-1-git-send-email-nobuhiro.iwamatsu.yj@renesas.com>

From: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Date: Tue, 26 Jun 2012 12:35:12 +0900

> The sh_eth_reset function resets the chip, but does nothing when the reset
> fails. This changes sh_eth_reset to return an error when the reset fails.
> 
> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v3] net/sh-eth: Add support selecting MII function for SH7734 and R8A7740
From: David Miller @ 2012-06-26  6:55 UTC (permalink / raw)
  To: nobuhiro.iwamatsu.yj; +Cc: netdev
In-Reply-To: <1340681654-2159-1-git-send-email-nobuhiro.iwamatsu.yj@renesas.com>

From: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Date: Tue, 26 Jun 2012 12:34:14 +0900

> The Ethernet IP of SH7734 and R8A7740 has an MII selection register.
> The user needs to change its value according to the MII in use.
> This adds a function to change the value of this register.
> 
> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v2] net/ipv6/route.c: packets originating on device match lo
From: David Miller @ 2012-06-26  6:54 UTC (permalink / raw)
  To: david_mccullough; +Cc: netdev
In-Reply-To: <20120626014226.GB3455@mcafee.com>

From: David McCullough <david_mccullough@mcafee.com>
Date: Tue, 26 Jun 2012 11:42:26 +1000

> Fix to allow IPv6 packets originating locally to match rules with the "iif"
> set to "lo".  This allows IPv6 rule matching to work the same as it does for
> IPv4.  From the iproute2 man page:
> 
>    iif NAME
> 		  select the incoming device to match.  If the interface is
> 		  loopback, the rule only matches packets originating from this
> 		  host.  This means that you may create separate routing tables
> 		  for forwarded and local packets and, hence, completely
> 		  segregate them.
> 
> Signed-off-by: David McCullough <david_mccullough@mcafee.com>

Applied to net-next.

^ permalink raw reply

* [PATCH 1/2 net-next] batman-adv: fix condition in AP isolation
From: Antonio Quartulli @ 2012-06-26  6:49 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1340693391-16434-1-git-send-email-ordex@autistici.org>

During the last conflict resolution involving translation-table.c something went
wrong and a condition in the AP isolation code was reversed. This patch fixes
this problem.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/translation-table.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index cf79883..e4f27a8 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -2187,7 +2187,7 @@ bool batadv_is_ap_isolated(struct bat_priv *bat_priv, uint8_t *src,
 	if (!tt_global_entry)
 		goto out;
 
-	if (_batadv_is_ap_isolated(tt_local_entry, tt_global_entry))
+	if (!_batadv_is_ap_isolated(tt_local_entry, tt_global_entry))
 		goto out;
 
 	ret = true;
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 2/2 net-next] batman-adv: fix global TT entry deletion
From: Antonio Quartulli @ 2012-06-26  6:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli
In-Reply-To: <1340693391-16434-1-git-send-email-ordex@autistici.org>

During the last merge involving translation-table.c something went wrong and two
lines disappeared from translation-table.c. This patch recovers them.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/translation-table.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index e4f27a8..c673b58 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -149,6 +149,8 @@ static void batadv_tt_orig_list_entry_free_rcu(struct rcu_head *rcu)
 static void
 batadv_tt_orig_list_entry_free_ref(struct tt_orig_list_entry *orig_entry)
 {
+	/* to avoid race conditions, immediately decrease the tt counter */
+	atomic_dec(&orig_entry->orig_node->tt_size);
 	call_rcu(&orig_entry->rcu, batadv_tt_orig_list_entry_free_rcu);
 }
 
-- 
1.7.9.4
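
The ordering restored by this patch (decrement the counter immediately, defer
only the free) can be sketched in userspace as below. This is a minimal
illustration, not the batman-adv code: every sketch_* name is hypothetical,
and a plain singly linked list stands in for call_rcu(), so the list itself is
not thread-safe.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the orig node and the per-orig list entry. */
struct sketch_orig_node {
	atomic_int tt_size;
};

struct sketch_entry {
	struct sketch_orig_node *orig_node;
	struct sketch_entry *next_deferred;
};

/* Entries waiting to be freed; a stand-in for the RCU callback queue. */
static struct sketch_entry *sketch_deferred;

/* Drop the counter right away so readers of tt_size never see a stale
 * value, then queue the entry for a later free. */
static void sketch_entry_free_ref(struct sketch_entry *entry)
{
	atomic_fetch_sub(&entry->orig_node->tt_size, 1);
	entry->next_deferred = sketch_deferred;
	sketch_deferred = entry;
}

/* Free everything that was deferred; returns the number of entries freed. */
static int sketch_run_deferred(void)
{
	int freed = 0;

	while (sketch_deferred) {
		struct sketch_entry *entry = sketch_deferred;

		sketch_deferred = entry->next_deferred;
		free(entry);
		freed++;
	}
	return freed;
}
```

In the sketch, the counter already reflects the removal while the entry is
still waiting in the deferred list, which is the property the restored comment
describes.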

^ permalink raw reply related

* [PATCH 0/2 net-next]: fixups for wrong conflict resolution
From: Antonio Quartulli @ 2012-06-26  6:49 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20120625.161933.730861831753419928.davem@davemloft.net>

Hello David,

here are our two fixes to recover from the little problems introduced during the
last conflict resolution involving translation-table.c.
Sorry for the inconvenience.

Thank you,
	Antonio

^ permalink raw reply

* Re: [PATCH 5/6] tuntap: per queue 64 bit stats
From: Jason Wang @ 2012-06-26  6:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: mst, akong, habanero, tahm, haixiao, jwhan, ernesto.martin,
	mashirle, davem, netdev, linux-kernel, krkumar2, shemminger,
	edumazet
In-Reply-To: <1340691016.10893.197.camel@edumazet-glaptop>

On 06/26/2012 02:10 PM, Eric Dumazet wrote:
> On Tue, 2012-06-26 at 14:00 +0800, Jason Wang wrote:
>
>> Yes, it looks like it's hard to use NETIF_F_LLTX without breaking the u64
>> statistics; it may be worth using the tx lock and alloc_netdev_mq().
> Yes, this probably needs percpu storage (if you really want to use
> include/linux/u64_stats_sync.h).
>
> But percpu storage seems a bit overkill with a rising number of cpus
> on typical machines.
>
> For the loopback device, it's fine because we only have one lo device per
> network namespace, and some workloads really hit this device hard.
>
> But for tuntap, I am not sure.
>

The problem is that we want to collect per-queue statistics. So if we
convert tuntap to use alloc_netdev_mq(), the tx statistics would be
updated under the tx lock, which looks safe.
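
A rough userspace sketch of the idea, assuming one lock per queue: each queue
keeps its own 64-bit counters and updates them only while holding that queue's
lock. All sketch_* names are hypothetical, a C11 atomic_flag spinlock stands in
for the tx lock, and the reader simply takes the same lock, which is a
simplification.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NUM_QUEUES 4

/* Hypothetical per-queue state; the flag plays the role of the tx lock. */
struct sketch_queue_stats {
	atomic_flag tx_lock;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

static struct sketch_queue_stats sketch_queues[SKETCH_NUM_QUEUES];

static void sketch_lock(struct sketch_queue_stats *q)
{
	while (atomic_flag_test_and_set_explicit(&q->tx_lock,
						 memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static void sketch_unlock(struct sketch_queue_stats *q)
{
	atomic_flag_clear_explicit(&q->tx_lock, memory_order_release);
}

static void sketch_stats_init(void)
{
	for (int i = 0; i < SKETCH_NUM_QUEUES; i++) {
		atomic_flag_clear(&sketch_queues[i].tx_lock);
		sketch_queues[i].tx_packets = 0;
		sketch_queues[i].tx_bytes = 0;
	}
}

/* Account one transmitted packet of len bytes on the given queue. */
static void sketch_xmit(unsigned int queue, uint64_t len)
{
	struct sketch_queue_stats *q = &sketch_queues[queue];

	sketch_lock(q);
	q->tx_packets++;
	q->tx_bytes += len;
	sketch_unlock(q);
}

/* Sum tx_bytes over all queues, reading each counter under its lock so a
 * 64-bit value is never observed half-updated. */
static uint64_t sketch_total_tx_bytes(void)
{
	uint64_t total = 0;

	for (int i = 0; i < SKETCH_NUM_QUEUES; i++) {
		sketch_lock(&sketch_queues[i]);
		total += sketch_queues[i].tx_bytes;
		sketch_unlock(&sketch_queues[i]);
	}
	return total;
}
```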


^ permalink raw reply

* Re: [PATCH 5/6] tuntap: per queue 64 bit stats
From: Eric Dumazet @ 2012-06-26  6:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, akong, habanero, tahm, haixiao, jwhan, ernesto.martin,
	mashirle, davem, netdev, linux-kernel, krkumar2, shemminger,
	edumazet
In-Reply-To: <4FE95015.7000707@redhat.com>

On Tue, 2012-06-26 at 14:00 +0800, Jason Wang wrote:

> Yes, it looks like it's hard to use NETIF_F_LLTX without breaking the u64
> statistics; it may be worth using the tx lock and alloc_netdev_mq().

Yes, this probably needs percpu storage (if you really want to use
include/linux/u64_stats_sync.h).

But percpu storage seems a bit overkill with a rising number of cpus
on typical machines.

For the loopback device, it's fine because we only have one lo device per
network namespace, and some workloads really hit this device hard.

But for tuntap, I am not sure.

^ permalink raw reply
