Netdev List


* Re: [PATCH v2] ucc_geth: Reduce IRQ off in xmit path
From: David Miller @ 2012-09-20 21:08 UTC (permalink / raw)
  To: Joakim.Tjernlund; +Cc: netdev, romieu
In-Reply-To: <1348129021-28333-1-git-send-email-Joakim.Tjernlund@transmode.se>

From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Thu, 20 Sep 2012 10:17:01 +0200

> Currently ucc_geth_start_xmit wraps IRQ off for the
> whole body just to be safe.
> Reduce the IRQ off period to a minimum.
> 
> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> ---
> 
>  v2: Move assignment of ugeth->tx_skbuff[txQ][ugeth->skb_curtx[txQ]]
>      inside IRQ off section to prevent racing against
>      ucc_geth_tx(). Spotted by Francois Romieu <romieu@fr.zoreil.com>

I agree with Francois's initial analysis, and disagree with your
response to him, wrt. the suggestion to remove all locking entirely.

Unlike what you claim, there isn't much of a gain at all from merely
making the window of lock holding smaller, especially on the scale
at which you are doing it here.

Whereas removing the lock and the atomic completely, as tg3 does,
will give very significant performance gains.

The locking cost of grabbing the spinlock, and the memory transactions
associated with it, dominate.

Furthermore, even if the gains of your change are non-trivial, you
haven't documented them.  So unless you show some noticeable gains from
this, it's just code masturbation as far as I'm concerned and I'm
therefore inclined to not apply patches like this.

TG3's core interrupt locking is not that difficult to understand and
replicate in other drivers, so I dismiss your attempts to avoid that
approach on difficulty grounds as well.


* Re: [RFC PATCH 0/3] usbnet: support runtime PM triggered by link change
From: Oliver Neukum @ 2012-09-20 21:04 UTC (permalink / raw)
  To: David Miller
  Cc: bjorn-yOkvZcmFvRU, ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r, finik-l0cyMroinI0,
	rjw-KKrjLPT3xs0, stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120920.170227.258356702969458329.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Thursday 20 September 2012 17:02:27 David Miller wrote:
> 
> There seems to be some discussion about the legitimacy of doing things
> this way, and in any event the patches were an RFC.
> 
> Please resubmit as a non-RFC once all the issues have been worked
> out, if appropriate.

Just to make this clear, I'd like to state that the discussion involved
only the third, last patch in the series. The first two are fine and make
sense by themselves.

	Regards
		Oliver


* Re: [RFC PATCH v1 1/3] usbnet: introduce usbnet_link_change API
From: David Miller @ 2012-09-20 21:02 UTC (permalink / raw)
  To: ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw
  Cc: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r, oneukum-l3A5Bk7waGM,
	finik-l0cyMroinI0, rjw-KKrjLPT3xs0,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1347978201-6219-2-git-send-email-ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

From: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Date: Tue, 18 Sep 2012 22:23:19 +0800

> +void usbnet_link_change(struct usbnet *dev, int link, int need_reset)

Please use 'bool' for link and need_reset.

* Re: [RFC PATCH 0/3] usbnet: support runtime PM triggered by link change
From: David Miller @ 2012-09-20 21:02 UTC (permalink / raw)
  To: bjorn; +Cc: ming.lei, oneukum, gregkh, finik, rjw, stern, netdev, linux-usb
In-Reply-To: <87r4pxumdd.fsf@nemi.mork.no>

There seems to be some discussion about the legitimacy of doing things
this way, and in any event the patches were an RFC.

Please resubmit as a non-RFC once all the issues have been worked
out, if appropriate.

Thanks.


* Re: [PATCH net-next V4] IB/ipoib: Add rtnl_link_ops support
From: David Miller @ 2012-09-20 20:58 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, netdev, erezsh
In-Reply-To: <1347551797-2495-2-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Thu, 13 Sep 2012 18:56:36 +0300

> Add rtnl_link_ops to IPoIB, with the first usage being child device
> create/delete through them. Childs devices are now either legacy ones,
> created/deleted through the ipoib sysfs entries, or RTNL ones.
> 
> Adding support for RTNL childs involved refactoring of ipoib_vlan_add
> which is now used by both the sysfs and the link_ops code.
> 
> Also, added ndo_uninit entry to support calling unregister_netdevice_queue
> from the rtnl dellink entry. This required removal of calls to
> ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
> since the networking core will invoke ipoib_uninit which does exactly that.
> 
> Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

Applied to net-next, thanks.


* Re: linux-next: build failure after merge of the final tree (net-next tree related)
From: David Miller @ 2012-09-20 20:45 UTC (permalink / raw)
  To: mika.westerberg; +Cc: sfr, netdev, linux-next, linux-kernel
In-Reply-To: <20120920091013.GV15548@intel.com>

From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Thu, 20 Sep 2012 12:10:14 +0300

> On Thu, Sep 20, 2012 at 05:36:22PM +1000, Stephen Rothwell wrote:
>> Hi all,
>> 
>> After merging the final tree, today's linux-next build (powerpc
>> allyesconfig) failed like this:
>> 
>> drivers/net/ethernet/i825xx/znet.c: In function 'hardware_init':
>> drivers/net/ethernet/i825xx/znet.c:868:2: error: implicit declaration of function 'isa_virt_to_bus' [-Werror=implicit-function-declaration]
>> 
>> Caused by commit 1d3ff76759b7 ("i825xx: znet: fix compiler warnings when
>> building a 64-bit kernel").  Is there some Kconfig dependency missing (CONFIG_ISA)?
> 
> If we make it dependent on CONFIG_ISA then the driver cannot be built with
> 64-bit kernel. Then again is there someone running 64-bit kernel on Zenith
> Z-note notebook? From the pictures it looks like very ancient "laptop".
> 
> An alternative is to make it depend on X86 like this:

I think the powerpc port is at fault here.

Part of being able to advertise ISA_DMA_API is providing isa_virt_to_bus().


* Re: Pull request: sfc-next 2012-09-19
From: David Miller @ 2012-09-20 20:42 UTC (permalink / raw)
  To: bhutchings; +Cc: richardcochran, giometti, linux-net-drivers, netdev, ajackson
In-Reply-To: <1348081775.2636.15.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 19 Sep 2012 20:09:35 +0100

> The following changes since commit b4516a288e71c64d7e214902250baf78b7b3cdcf:
> 
>   llc: Remove stray reference to sysctl_llc_station_ack_timeout. (2012-09-17 13:13:24 -0400)
> 
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next.git for-davem
> 
> (commit 450783747f42dfa3883920acfad4acdd93ce69af)
> 
> 1. Extension to PPS/PTP to allow for PHC devices where pulses are
>    subject to a variable but measurable delay.
> 2. PPS/PTP/PHC support for Solarflare boards with a timestamping 
>    peripheral.
> 3. MTD support for updating the timestamping peripheral on those boards.
> 4. Fix for potential over-length requests to firmware.

Pulled, thanks Ben.


* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: Eric Dumazet @ 2012-09-20 20:25 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, subramanian.vijay, netdev
In-Reply-To: <505B773E.9070501@hp.com>

On Thu, 2012-09-20 at 13:06 -0700, Rick Jones wrote:

> 
> Yes, I was being too fast and loose with my wording, paying more 
> attention to the netperf tests than the rest of it.  While loopback may 
> be lossless, TCP retransmissions over loopback shouldn't be all *that* 
> surprising.

Sending perfect packets (large packets) should trigger no retransmits.

In your tests, you send one-byte packets, so obviously the receiver will
drop some of them, because sk_rcvbuf limit (or the backlog limit) is hit
very fast.

(This should be less frequent with TCP coalescing that was recently
introduced : We are able to coalesce about 1600 'one-byte packets' into
a single one.)

netperf -t TCP_STREAM over loopback should not drop packets or
retransmit them.

# netstat -s|grep TCPRcvCoalesce
    TCPRcvCoalesce: 0

# netperf -t TCP_RR -- -b 1024 -D -S 16K -o local_transport_retrans,remote_transport_retrans
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 () port 0 AF_INET to localhost () port 0 AF_INET : nodelay : first burst 1024
Local Transport Retransmissions,Remote Transport Retransmissions
0,0

# netstat -s|grep TCPRcvCoalesce
    TCPRcvCoalesce: 2072191


* RE: bnx2x: link detected up at startup even when it should be down
From: Dmitry Kravkov @ 2012-09-20 20:08 UTC (permalink / raw)
  To: Jean-Michel Hautbois, netdev
  Cc: Barak Witkowski, Eilon Greenstein, davem@davemloft.net
In-Reply-To: <CAL8zT=hFPQ-NZw8eKQwnktRrcpsOFk3aa_ac5mSWa+TAwyTGBQ@mail.gmail.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Jean-Michel Hautbois
> Sent: Thursday, September 20, 2012 6:39 PM
> To: netdev
> Cc: Barak Witkowski; Eilon Greenstein; davem@davemloft.net
> Subject: bnx2x: link detected up at startup even when it should be down
> 
> Hi all,
> 
> I am working with a HP blade which has a bnx2x based card (Broadcom
> NetXtreme II BCM57810 10 Gigabit Ethernet).
> I am using a 3.2 linux kernel, which works very well except on
> detecting the link state at startup.
> I have my ethernet interfaces linked with a bond, and I want to
> configure it for HA (in miimon mode).
> I am using a managed switch which helps me in disabling/enabling ports.
> 
> When the port is disabled, at startup, the link is detected "UP".
> When I enable the port, it is still "UP", and when I disable it again,
> then it is detected "DOWN".
> 
> I have tried the latest 3.6-rc6 kernel, and it works well (link is
> "DOWN" at startup when port is disabled).
> Then I bisected it, and I found out that the commit which makes it
> working (yes, it is an inverse bisect, thanks to this powerful git
> tool :)) is :
> 
> a334872224a67b614dc888460377862621f3dac7 is the first bad commit
> commit a334872224a67b614dc888460377862621f3dac7
> Author: Barak Witkowski <barak@broadcom.com>
> Date:   Mon Apr 23 03:04:46 2012 +0000
> 
>     bnx2x: add afex support
> 
>     Following patch adds afex multifunction support to the driver (afex
>     multifunction is based on vntag header) and updates FW version
> used to 7.2.51.
> 
>     Support includes the following:
> 
>     1. Configure vif parameters in firmware (default vlan, vif id, default
>        priority, allowed priorities) according to values received from NIC.
>     2. Configure FW to strip/add default vlan according to afex vlan mode.
>     3. Notify link up to OS only after vif is fully initialized.
>     4. Support vif list set/get requests and configure FW accordingly.
>     5. Supply afex statistics upon request from NIC.
>     6. Special handling to L2 interface in case of FCoE vif.
> 
>     Signed-off-by: Barak Witkowski <barak@broadcom.com>
>     Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> This commit is present in the 3.5.y stable branch, but not the 3.2.y one.
> Is there a workaround which would make this feature work correctly
> even on older kernels ?
> It does not seem to be trivial, but I may miss something as this
> driver is pretty big...

Jean,
I have gone over the patch, but was unable to spot any link-related change outside of the
AFEX flow. We will take a closer look ASAP in our lab (the guys are already out for the weekend).

Can you double-check the bisect result for me, please?

Thanks
> 
> Cheers,
> JM

* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: Rick Jones @ 2012-09-20 20:06 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, subramanian.vijay, netdev
In-Reply-To: <20120920.154007.1073423645697694793.davem@davemloft.net>

On 09/20/2012 12:40 PM, David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Thu, 20 Sep 2012 10:10:43 -0700
>
>> On 09/19/2012 10:37 PM, Eric Dumazet wrote:
>>> loopback is lossless, so its always surprising we can have TCP
>>> retransmits on this medium ;)
>>
>> Is it lossless?
>>
>> raj@tardy:~/netperf2_trunk$ netstat -s | grep pru
>>      19 packets pruned from receive queue because of socket buffer overrun
>
> Those packets are not being dropped by the loopback device.
>

Yes, I was being too fast and loose with my wording, paying more 
attention to the netperf tests than the rest of it.  While loopback may 
be lossless, TCP retransmissions over loopback shouldn't be all *that* 
surprising.

rick


* [PATCH v3 5/7] xfrm_user: ensure user supplied esn replay window is valid
From: Mathias Krause @ 2012-09-20 20:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: Steffen Klassert, netdev, linux-kernel, Mathias Krause,
	Martin Willi, Ben Hutchings
In-Reply-To: <20120920070508.GA4221@secunet.com>

The current code fails to ensure that the netlink message actually
contains as many bytes as the header indicates. If a user creates a new
state or updates an existing one but does not supply the bytes for the
whole ESN replay window, the kernel copies random heap bytes into the
replay bitmap, namely the ones that happen to follow the
XFRMA_REPLAY_ESN_VAL netlink attribute. This leads to the following issues:

1. The replay window has random bits set confusing the replay handling
   code later on.

2. A malicious user could use this flaw to leak up to ~3.5kB of heap
   memory when she has access to the XFRM netlink interface (requires
   CAP_NET_ADMIN).

Known users of the ESN replay window are strongSwan and Steffen's
iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter
uses the interface with a bitmap supplied while the former does not.
strongSwan is therefore prone to run into issue 1.

To fix both issues without breaking existing userland allow using the
XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
fully specified one. For the former case we initialize the in-kernel
bitmap with zero, for the latter we copy the user supplied bitmap. For
state updates the full bitmap must be supplied.

To prevent overflows in the bitmap length calculation the maximum size
of bmp_len is limited to 128 by this patch -- resulting in a maximum
replay window of 4096 packets. This should be sufficient for all real
life scenarios (RFC 4303 recommends a default replay window size of 64).
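
The length rules described above can be sketched in userspace as follows.
This is only an illustration, not the kernel code: struct esn_hdr is a
hypothetical stand-in for the fixed part of struct xfrm_replay_state_esn,
and esn_len(), esn_verify() and esn_example() are invented names. It
assumes bmp_len counts 32-bit bitmap words, as the limit of 128 words for
4096 packets implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESN_MAX_BITS 4096 /* mirrors XFRMA_REPLAY_ESN_MAX from the patch */

/* Hypothetical stand-in for the fixed part of struct xfrm_replay_state_esn;
 * the 32-bit bitmap words would follow this header directly. */
struct esn_hdr {
	uint32_t bmp_len; /* number of 32-bit bitmap words */
	uint32_t oseq, seq, oseq_hi, seq_hi, replay_window;
};

/* Bytes needed for the header plus a fully specified bitmap. */
static int esn_len(const struct esn_hdr *h)
{
	return (int)(sizeof(*h) + h->bmp_len * sizeof(uint32_t));
}

/* Return 0 if an attribute payload of attr_len bytes is acceptable when
 * creating a state, -1 otherwise. */
static int esn_verify(const struct esn_hdr *h, int attr_len)
{
	/* 4096 bits / 4 bytes per word / 8 bits per byte = 128 words */
	if (h->bmp_len > ESN_MAX_BITS / sizeof(uint32_t) / 8)
		return -1;
	/* Accept either a full bitmap or none at all (header only). */
	if (attr_len < esn_len(h) && attr_len != (int)sizeof(*h))
		return -1;
	return 0;
}

/* Usage example: the largest allowed window, sent in three ways. */
void esn_example(void)
{
	struct esn_hdr h = { .bmp_len = 128 }; /* 128 words * 32 bits = 4096 */

	assert(esn_verify(&h, esn_len(&h)) == 0);     /* full bitmap */
	assert(esn_verify(&h, (int)sizeof(h)) == 0);  /* empty bitmap */
	assert(esn_verify(&h, esn_len(&h) - 1) != 0); /* truncated bitmap */
}
```

Checking bmp_len before computing the total length keeps the length
arithmetic well inside the range of an int, which is the overflow concern
the limit is meant to address.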

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Martin Willi <martin@revosec.ch>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
---
v3:
- revert size_t change to xfrm_replay_state_esn_len() (requested by Steffen)
- switch to int types for lengths (suggested by Ben)
- implement 4096 packets limit for bmp_len to avoid overflows in
  xfrm_replay_state_esn_len()
v2:
- compare against klen in xfrm_alloc_replay_state_esn (suggested by Ben)
- make xfrm_replay_state_esn_len() return size_t

 include/linux/xfrm.h |    2 ++
 net/xfrm/xfrm_user.c |   31 +++++++++++++++++++++++++------
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index 22e61fd..28e493b 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -84,6 +84,8 @@ struct xfrm_replay_state {
 	__u32	bitmap;
 };

+#define XFRMA_REPLAY_ESN_MAX	4096
+
 struct xfrm_replay_state_esn {
 	unsigned int	bmp_len;
 	__u32		oseq;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 9f1e749..e761562 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -123,9 +123,21 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
 				struct nlattr **attrs)
 {
 	struct nlattr *rt = attrs[XFRMA_REPLAY_ESN_VAL];
+	struct xfrm_replay_state_esn *rs;

-	if ((p->flags & XFRM_STATE_ESN) && !rt)
-		return -EINVAL;
+	if (p->flags & XFRM_STATE_ESN) {
+		if (!rt)
+			return -EINVAL;
+
+		rs = nla_data(rt);
+
+		if (rs->bmp_len > XFRMA_REPLAY_ESN_MAX / sizeof(rs->bmp[0]) / 8)
+			return -EINVAL;
+
+		if (nla_len(rt) < xfrm_replay_state_esn_len(rs) &&
+		    nla_len(rt) != sizeof(*rs))
+			return -EINVAL;
+	}

 	if (!rt)
 		return 0;
@@ -370,14 +382,15 @@ static inline int xfrm_replay_verify_len(struct xfrm_replay_state_esn *replay_es
 					 struct nlattr *rp)
 {
 	struct xfrm_replay_state_esn *up;
+	int ulen;

 	if (!replay_esn || !rp)
 		return 0;

 	up = nla_data(rp);
+	ulen = xfrm_replay_state_esn_len(up);

-	if (xfrm_replay_state_esn_len(replay_esn) !=
-			xfrm_replay_state_esn_len(up))
+	if (nla_len(rp) < ulen || xfrm_replay_state_esn_len(replay_esn) != ulen)
 		return -EINVAL;

 	return 0;
@@ -388,22 +401,28 @@ static int xfrm_alloc_replay_state_esn(struct xfrm_replay_state_esn **replay_esn
 				       struct nlattr *rta)
 {
 	struct xfrm_replay_state_esn *p, *pp, *up;
+	int klen, ulen;

 	if (!rta)
 		return 0;

 	up = nla_data(rta);
+	klen = xfrm_replay_state_esn_len(up);
+	ulen = nla_len(rta) >= klen ? klen : sizeof(*up);

-	p = kmemdup(up, xfrm_replay_state_esn_len(up), GFP_KERNEL);
+	p = kzalloc(klen, GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;

-	pp = kmemdup(up, xfrm_replay_state_esn_len(up), GFP_KERNEL);
+	pp = kzalloc(klen, GFP_KERNEL);
 	if (!pp) {
 		kfree(p);
 		return -ENOMEM;
 	}

+	memcpy(p, up, ulen);
+	memcpy(pp, up, ulen);
+
 	*replay_esn = p;
 	*preplay_esn = pp;

-- 
1.7.10.4


* Re: [PATCH v3] net-tcp: TCP/IP stack bypass for loopback connections
From: David Miller @ 2012-09-20 19:41 UTC (permalink / raw)
  To: rick.jones2; +Cc: eric.dumazet, sclark46, brutus, edumazet, netdev
In-Reply-To: <505B5154.3020002@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 20 Sep 2012 10:24:36 -0700

> On 09/20/2012 04:51 AM, Eric Dumazet wrote:
>> On Thu, 2012-09-20 at 07:28 -0400, Stephen Clark wrote:
>>>
>>> Does this mean traffic on the loopback interface will not traverse
>>> netfilter?
>>>
>>
>> Yes this was already mentioned.
>>
>> Only the SYN / SYNACK messages will
>>
>> All data will bypass IP stack, qdisc (if any), loopback driver, and
>> netfilter.
> 
> Does that then lift the tent flap for TOE?  As I recall, TOE's
> bypassing of all those things is one of the reasons used to reject
> TOE.

Wrong.  This bypassing is completely in software, and completely
controlled by us.

Which is completely opposite to TOE.

Don't spread fud like this, even on a whim.


* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: David Miller @ 2012-09-20 19:40 UTC (permalink / raw)
  To: rick.jones2; +Cc: eric.dumazet, subramanian.vijay, netdev
In-Reply-To: <505B4E13.5000501@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 20 Sep 2012 10:10:43 -0700

> On 09/19/2012 10:37 PM, Eric Dumazet wrote:
>> loopback is lossless, so its always surprising we can have TCP
>> retransmits on this medium ;)
> 
> Is it lossless?
> 
> raj@tardy:~/netperf2_trunk$ netstat -s | grep pru
>     19 packets pruned from receive queue because of socket buffer overrun

Those packets are not being dropped by the loopback device.


* Re: [PATCH v3] net-tcp: TCP/IP stack bypass for loopback connections
From: David Miller @ 2012-09-20 19:30 UTC (permalink / raw)
  To: sclark46; +Cc: brutus, eric.dumazet, edumazet, netdev
In-Reply-To: <505AFDE9.4080602@earthlink.net>

From: Stephen Clark <sclark46@earthlink.net>
Date: Thu, 20 Sep 2012 07:28:41 -0400

> Does this mean traffic on the loopback interface will not traverse
> netfilter?

Please, we've had this discussion before, let's not have it again.
Read the archives for the postings of the previous versions of
this patch.


* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: Yuchung Cheng @ 2012-09-20 18:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, Vijay Subramanian, David Miller, netdev
In-Reply-To: <1348163031.3440.3.camel@edumazet-glaptop>

On Thu, Sep 20, 2012 at 10:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-09-20 at 10:10 -0700, Rick Jones wrote:
>> On 09/19/2012 10:37 PM, Eric Dumazet wrote:
>> > loopback is lossless, so its always surprising we can have TCP
>> > retransmits on this medium ;)
>>
>> Is it lossless?
>>
>
> It is lossless, yes.
>
> But packets can be dropped by TCP stack for various reasons, including
> reordering and retransmits.
I'd recommend checking reordering stats. If it's lossless, set
tp->reordering = 127 for loopback.

* Re: [PATCH] openvswitch: using kfree_rcu() to simplify the code
From: Paul E. McKenney @ 2012-09-20 17:47 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <CAPgLHd_71Qh90j9FCkT2cQ35wNMkZeLDwHT2QLP55_3gfzfjTQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, Aug 27, 2012 at 12:20:45PM +0800, Wei Yongjun wrote:
> From: Wei Yongjun <yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY@public.gmane.org>
> 
> The callback function of call_rcu() just calls a kfree(), so we
> can use kfree_rcu() instead of call_rcu() + callback function.
> 
> spatch with a semantic match is used to found this problem.
> (http://coccinelle.lip6.fr/)
> 
> Signed-off-by: Wei Yongjun <yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY@public.gmane.org>

Reviewed-by: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

> ---
>  net/openvswitch/flow.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index b7f38b1..c7bf2f2 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -427,19 +427,11 @@ void ovs_flow_deferred_free(struct sw_flow *flow)
>  	call_rcu(&flow->rcu, rcu_free_flow_callback);
>  }
> 
> -/* RCU callback used by ovs_flow_deferred_free_acts. */
> -static void rcu_free_acts_callback(struct rcu_head *rcu)
> -{
> -	struct sw_flow_actions *sf_acts = container_of(rcu,
> -			struct sw_flow_actions, rcu);
> -	kfree(sf_acts);
> -}
> -
>  /* Schedules 'sf_acts' to be freed after the next RCU grace period.
>   * The caller must hold rcu_read_lock for this to be sensible. */
>  void ovs_flow_deferred_free_acts(struct sw_flow_actions *sf_acts)
>  {
> -	call_rcu(&sf_acts->rcu, rcu_free_acts_callback);
> +	kfree_rcu(sf_acts, rcu);
>  }
> 
>  static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
> 
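
The simplification quoted above can be illustrated with a small userspace
sketch. Everything here is an assumption made for illustration:
rcu_head_sketch, container_of_sketch, acts_sketch and kfree_rcu_sketch are
invented stand-ins, and the objects are freed immediately, so no RCU grace
period is modelled. The point is only that a callback which does nothing
but free the enclosing object can be replaced by a generic helper that
knows the offset of the embedded head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Invented stand-in for the kernel's struct rcu_head. */
struct rcu_head_sketch {
	void (*func)(struct rcu_head_sketch *head);
};

/* Userspace imitation of container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Invented stand-in for struct sw_flow_actions. */
struct acts_sketch {
	int actions_len;
	struct rcu_head_sketch rcu;
};

int freed_count;

/* Hand-written callback pattern: recover the enclosing object from the
 * embedded head, then free it. */
static void free_acts_callback(struct rcu_head_sketch *head)
{
	struct acts_sketch *a =
		container_of_sketch(head, struct acts_sketch, rcu);

	free(a);
	freed_count++;
}

/* Generic helper in the spirit of kfree_rcu(): the offset of the head is
 * enough to find and free the enclosing object, so no per-type callback
 * is needed. NOTE: frees immediately, no grace period is waited for. */
static void free_by_offset(struct rcu_head_sketch *head, size_t offset)
{
	free((char *)head - offset);
	freed_count++;
}

#define kfree_rcu_sketch(ptr, member) \
	free_by_offset(&(ptr)->member, offsetof(__typeof__(*(ptr)), member))

/* Usage example: both paths free the object the head is embedded in. */
void acts_example(void)
{
	struct acts_sketch *a = malloc(sizeof(*a));
	struct acts_sketch *b = malloc(sizeof(*b));

	assert(a && b);
	assert(container_of_sketch(&a->rcu, struct acts_sketch, rcu) == a);
	free_acts_callback(&a->rcu); /* old style: dedicated callback */
	kfree_rcu_sketch(b, rcu);    /* new style: one generic helper */
}
```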

* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: Eric Dumazet @ 2012-09-20 17:43 UTC (permalink / raw)
  To: Rick Jones; +Cc: Vijay Subramanian, David Miller, netdev
In-Reply-To: <505B4E13.5000501@hp.com>

On Thu, 2012-09-20 at 10:10 -0700, Rick Jones wrote:
> On 09/19/2012 10:37 PM, Eric Dumazet wrote:
> > loopback is lossless, so its always surprising we can have TCP
> > retransmits on this medium ;)
> 
> Is it lossless?
> 

It is lossless, yes.

But packets can be dropped by TCP stack for various reasons, including
reordering and retransmits.


* Re: [PATCH v3] net-tcp: TCP/IP stack bypass for loopback connections
From: Bill Fink @ 2012-09-20 16:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: sclark46, Bruce Curtis, David Miller, edumazet, netdev
In-Reply-To: <1348141871.31352.66.camel@edumazet-glaptop>

On Thu, 20 Sep 2012, Eric Dumazet wrote:

> On Thu, 2012-09-20 at 07:28 -0400, Stephen Clark wrote:
> >  
> > Does this mean traffic on the loopback interface will not traverse 
> > netfilter?
> > 
> 
> Yes this was already mentioned.
> 
> Only the SYN / SYNACK messages will
> 
> All data will bypass IP stack, qdisc (if any), loopback driver, and
> netfilter.

These restrictions and any others should be documented in ip-sysctl.txt.

From Eric's earlier e-mail:

-> no iptables, 
   no qdisc (by default there is no qdisc on lo),
   no loopback stats (ifconfig lo).
   some SNMP stats missing as well (netstat -s)

						-Bill


* Re: [PATCH v3] net-tcp: TCP/IP stack bypass for loopback connections
From: Rick Jones @ 2012-09-20 17:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: sclark46, Bruce Curtis, David Miller, edumazet, netdev
In-Reply-To: <1348141871.31352.66.camel@edumazet-glaptop>

On 09/20/2012 04:51 AM, Eric Dumazet wrote:
> On Thu, 2012-09-20 at 07:28 -0400, Stephen Clark wrote:
>>
>> Does this mean traffic on the loopback interface will not traverse
>> netfilter?
>>
>
> Yes this was already mentioned.
>
> Only the SYN / SYNACK messages will
>
> All data will bypass IP stack, qdisc (if any), loopback driver, and
> netfilter.

Does that then lift the tent flap for TOE?  As I recall, TOE's bypassing 
of all those things is one of the reasons used to reject TOE.

rick jones


* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: Rick Jones @ 2012-09-20 17:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Vijay Subramanian, David Miller, netdev
In-Reply-To: <1348119475.31352.60.camel@edumazet-glaptop>

On 09/19/2012 10:37 PM, Eric Dumazet wrote:
> loopback is lossless, so its always surprising we can have TCP
> retransmits on this medium ;)

Is it lossless?

raj@tardy:~/netperf2_trunk$ netstat -s | grep pru
     19 packets pruned from receive queue because of socket buffer overrun

raj@tardy:~/netperf2_trunk$ src/netperf -t TCP_RR -- -b 256 -D -o burst_size,local_transport_retrans,remote_transport_retrans
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain () port 0 AF_INET : nodelay : histogram : demo : first burst 256
Initial Burst Requests,Local Transport Retransmissions,Remote Transport Retransmissions
256,151,94

raj@tardy:~/netperf2_trunk$ netstat -s | grep pru
     26 packets pruned from receive queue because of socket buffer overrun
raj@tardy:~/netperf2_trunk$ uname -a
Linux tardy 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 17:58:38 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Admittedly, my test is on an older kernel, but have things changed in 
this regard since then?   I had to get a bit more contrived on a later 
kernel in a VM (vs what is running directly on my workstation):

raj@tardy-ubuntu-1204:~$ netstat -s | grep -e prune -e retrans
     1 segments retransmited
     4 packets pruned from receive queue because of socket buffer overrun
     1 fast retransmits
raj@tardy-ubuntu-1204:~$ netperf -t TCP_RR -- -b 1024 -D -S 16K -o local_transport_retrans,remote_transport_retrans
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost () port 0 AF_INET : nodelay : demo : first burst 1024
Local Transport Retransmissions,Remote Transport Retransmissions
1,0
raj@tardy-ubuntu-1204:~$ netstat -s | grep -e prune -e retrans
     2 segments retransmited
     7 packets pruned from receive queue because of socket buffer overrun
     2 fast retransmits
raj@tardy-ubuntu-1204:~$ uname -a
Linux tardy-ubuntu-1204 3.6.0-rc3+ #7 SMP Mon Sep 10 14:46:05 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux

rick


* Re: Oops with latest (netfilter) nf-next tree, when unloading iptable_nat
From: Patrick McHardy @ 2012-09-20 17:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, Florian Westphal, netfilter-devel, netdev,
	yongjun_wei
In-Reply-To: <Pine.GSO.4.63.1209201212480.1775@stinky-local.trash.net>

On Thu, 20 Sep 2012, Patrick McHardy wrote:

>>> diff --git a/net/netfilter/nf_conntrack_core.c 
>>> b/net/netfilter/nf_conntrack_core.c
>>> index dcb2791..0f241be 100644
>>> --- a/net/netfilter/nf_conntrack_core.c
>>> +++ b/net/netfilter/nf_conntrack_core.c
>>> @@ -1224,6 +1224,8 @@ get_next_corpse(struct net *net, int (*iter)(struct 
>>> nf_conn *i, void *data),
>>>  	spin_lock_bh(&nf_conntrack_lock);
>>>  	for (; *bucket < net->ct.htable_size; (*bucket)++) {
>>>  		hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], 
>>> hnnode) {
>>> +			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>> +				continue;
>> 
>> I think this will make the deletion of entries via `conntrack -F'
>> slowier as we'll have to iterate over more entries (we won't delete
>> entries for the reply tuple).
>
> Slightly maybe, but I doubt it makes much of a difference.
>
>> I think I prefer Florian's patch, it's fairly small and it does not
>> change the current nf_ct_iterate behaviour or adding some
>> nf_nat_iterate cleanup.
>
> I don't think I've received it. Could you forward it to me please?

Florian forwarded the patch to me. While it fixes the problem, it
is a workaround and it certainly is inelegant to do the
list_del_rcu_init() and memset up to *four* times for a single conntrack.

The correct thing IMO is to invoke the callbacks exactly once per
conntrack, either through my nf_ct_iterate_cleanup() change or through
a new iteration function for callers that don't kill conntracks. As
soon as we start generating events for NAT section cleanup this will be
needed in any case.

Unless I'm missing something, conntrack flushing is also a really rare
operation anyway, and for large tables, where this might make a small
difference, it will take quite a long time in any case.


* [PATCH net-next] gianfar: Change default HW Tx queue scheduling mode
From: Claudiu Manoil @ 2012-09-20 15:57 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Paul Gortmaker, Claudiu Manoil

This is primarily to address transmission timeout occurrences, when
multiple H/W Tx queues are being used concurrently. Because in
the priority scheduling mode the controller does not service the
Tx queues equally (but in ascending index order), Tx timeouts are
being triggered right away for a basic test with multiple simultaneous
connections like:
iperf -c <server_ip> -n 100M -P 8

resulting in kernel trace:
NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue <X> timed out
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255
...
and controller reset during intense traffic, and possibly further
complications.

This patch changes the default H/W Tx scheduling setting (TXSCHED)
for multi-queue devices, from priority scheduling mode to a weighted
round robin mode with equal weights for all H/W Tx queues, and
addresses the issue above.
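
The starvation effect described above can be sketched with a toy model.
This is not the controller's real arbiter: NUM_TXQ, pick_prio(), pick_rr()
and simulate() are invented names, all queues are assumed to stay
backlogged, and the equal-weight mode is modelled as plain round robin
over packets rather than weights in 64-byte units.

```c
#include <assert.h>

#define NUM_TXQ 8 /* assumed queue count for the model */

/* Strict priority: always serve the lowest-index queue that has work. */
static int pick_prio(const int backlog[NUM_TXQ])
{
	for (int q = 0; q < NUM_TXQ; q++)
		if (backlog[q] > 0)
			return q;
	return -1;
}

/* Equal-weight round robin: continue after the last served queue. */
static int pick_rr(const int backlog[NUM_TXQ], int last)
{
	for (int i = 1; i <= NUM_TXQ; i++) {
		int q = (last + i) % NUM_TXQ;

		if (backlog[q] > 0)
			return q;
	}
	return -1;
}

/* Send `slots` packets while every queue stays backlogged, and count how
 * many packets each queue got to send. */
void simulate(int use_prio, int slots, int served[NUM_TXQ])
{
	int backlog[NUM_TXQ];
	int last = -1;

	for (int q = 0; q < NUM_TXQ; q++) {
		backlog[q] = slots; /* saturated senders */
		served[q] = 0;
	}
	for (int s = 0; s < slots; s++) {
		int q = use_prio ? pick_prio(backlog) : pick_rr(backlog, last);

		backlog[q]--;
		served[q]++;
		last = q;
	}
}

/* Usage example: compare both modes over 800 transmit slots. */
void txsched_example(void)
{
	int served[NUM_TXQ];

	simulate(1, 800, served);
	assert(served[0] == 800 && served[NUM_TXQ - 1] == 0); /* starved */

	simulate(0, 800, served);
	assert(served[0] == 100 && served[NUM_TXQ - 1] == 100); /* fair */
}
```

In the model, the higher-index queues never transmit under strict
priority, which is the condition under which a transmit watchdog would be
expected to fire for them.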

The TXSCHED setting may be changed at runtime, via sysfs, for devices
using multiple H/W Tx queues. For single queue devices this config
option is disabled, as the TXSCHED setting is superfluous in those cases.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
---
 drivers/net/ethernet/freescale/gianfar.c       |   11 +++-
 drivers/net/ethernet/freescale/gianfar.h       |   11 +++-
 drivers/net/ethernet/freescale/gianfar_sysfs.c |   71 ++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 4d5b58c..a1b52ec 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -394,7 +394,13 @@ static void gfar_init_mac(struct net_device *ndev)
 	if (ndev->features & NETIF_F_IP_CSUM)
 		tctrl |= TCTRL_INIT_CSUM;
 
-	tctrl |= TCTRL_TXSCHED_PRIO;
+	if (priv->prio_sched_en)
+		tctrl |= TCTRL_TXSCHED_PRIO;
+	else {
+		tctrl |= TCTRL_TXSCHED_WRRS;
+		gfar_write(&regs->tr03wt, DEFAULT_WRRS_WEIGHT);
+		gfar_write(&regs->tr47wt, DEFAULT_WRRS_WEIGHT);
+	}
 
 	gfar_write(&regs->tctrl, tctrl);
 
@@ -1160,6 +1166,9 @@ static int gfar_probe(struct platform_device *ofdev)
 	priv->rx_filer_enable = 1;
 	/* Enable most messages by default */
 	priv->msg_enable = (NETIF_MSG_IFUP << 1 ) - 1;
+	/* use priority h/w tx queue scheduling for single queue devices */
+	if (priv->num_tx_queues == 1)
+		priv->prio_sched_en = 1;
 
 	/* Carrier starts down, phylib will bring it up */
 	netif_carrier_off(dev);
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 2136c7f..4141ef2 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -301,8 +301,16 @@ extern const char gfar_driver_version[];
 #define TCTRL_TFCPAUSE		0x00000008
 #define TCTRL_TXSCHED_MASK	0x00000006
 #define TCTRL_TXSCHED_INIT	0x00000000
+/* priority scheduling */
 #define TCTRL_TXSCHED_PRIO	0x00000002
+/* weighted round-robin scheduling (WRRS) */
 #define TCTRL_TXSCHED_WRRS	0x00000004
+/* default WRRS weight and policy setting,
+ * tailored to the tr03wt and tr47wt registers:
+ * equal weight for all Tx Qs, measured in 64byte units
+ */
+#define DEFAULT_WRRS_WEIGHT	0x18181818
+
 #define TCTRL_INIT_CSUM		(TCTRL_TUCSEN | TCTRL_IPCSEN)
 
 #define IEVENT_INIT_CLEAR	0xffffffff
@@ -1098,7 +1106,8 @@ struct gfar_private {
 		extended_hash:1,
 		bd_stash_en:1,
 		rx_filer_enable:1,
-		wol_en:1; /* Wake-on-LAN enabled */
+		wol_en:1, /* Wake-on-LAN enabled */
+		prio_sched_en:1; /* Enable priority based Tx scheduling in Hw */
 	unsigned short padding;
 
 	/* PHY stuff */
diff --git a/drivers/net/ethernet/freescale/gianfar_sysfs.c b/drivers/net/ethernet/freescale/gianfar_sysfs.c
index cd14a4d..b942bfc 100644
--- a/drivers/net/ethernet/freescale/gianfar_sysfs.c
+++ b/drivers/net/ethernet/freescale/gianfar_sysfs.c
@@ -319,6 +319,76 @@ static ssize_t gfar_set_fifo_starve_off(struct device *dev,
 static DEVICE_ATTR(fifo_starve_off, 0644, gfar_show_fifo_starve_off,
 		   gfar_set_fifo_starve_off);
 
+static ssize_t gfar_show_tx_prio_sched(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
+
+	return sprintf(buf, "%s\n", priv->prio_sched_en ? "on" : "off");
+}
+
+static ssize_t gfar_set_tx_prio_sched(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct net_device *ndev = to_net_dev(dev);
+	struct gfar_private *priv = netdev_priv(to_net_dev(dev));
+	bool new_setting;
+
+	/* no changes if device doesn't have multiple Tx queues */
+	if (priv->num_tx_queues <= 1)
+		return count;
+
+	/* find out the new setting */
+	if (!strncmp("on", buf, count - 1) || !strncmp("1", buf, count - 1))
+		new_setting = 1;
+	else if (!strncmp("off", buf, count - 1) ||
+		 !strncmp("0", buf, count - 1))
+		new_setting = 0;
+	else
+		return count;
+
+	if (priv->prio_sched_en == new_setting)
+		return count;
+
+	/* Only stop and start the controller if it isn't already
+	 * stopped and we're about to update TCTRL with the new Tx
+	 * scheduling policy
+	 */
+	if (ndev->flags & IFF_UP) {
+		unsigned long flags;
+
+		/* Halt TX and RX, and process the frames which
+		 * have already been received
+		 */
+		netif_tx_stop_all_queues(ndev);
+		local_irq_save(flags);
+		lock_tx_qs(priv);
+		lock_rx_qs(priv);
+
+		gfar_halt(ndev);
+
+		unlock_tx_qs(priv);
+		unlock_rx_qs(priv);
+		local_irq_restore(flags);
+
+		/* take down the rings to rebuild them */
+		stop_gfar(ndev);
+
+		priv->prio_sched_en = new_setting;
+
+		startup_gfar(ndev);
+		netif_tx_wake_all_queues(ndev);
+	} else
+		priv->prio_sched_en = new_setting;
+
+	return count;
+}
+
+static DEVICE_ATTR(tx_prio_sched, 0644, gfar_show_tx_prio_sched,
+		   gfar_set_tx_prio_sched);
+
 void gfar_init_sysfs(struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
@@ -336,6 +406,7 @@ void gfar_init_sysfs(struct net_device *dev)
 	rc |= device_create_file(&dev->dev, &dev_attr_fifo_threshold);
 	rc |= device_create_file(&dev->dev, &dev_attr_fifo_starve);
 	rc |= device_create_file(&dev->dev, &dev_attr_fifo_starve_off);
+	rc |= device_create_file(&dev->dev, &dev_attr_tx_prio_sched);
 	if (rc)
 		dev_err(&dev->dev, "Error creating gianfar sysfs files.\n");
 }
-- 
1.7.6.5

^ permalink raw reply related
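
A note on the input parsing in gfar_set_tx_prio_sched() above: as far as I
can tell, comparing with strncmp(..., count - 1) is a prefix match that
assumes a trailing newline, so "o\n" would be accepted as "on", and a write
of "0" without a newline compares zero characters and would also be treated
as "on". The user-space sketch below shows one stricter way to parse the
value. The helpers token_eq() and parse_on_off() are hypothetical and not
part of the patch; in-kernel code would more likely use an existing helper
such as sysfs_streq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: exact match of buf[0..len) against a C string. */
static bool token_eq(const char *buf, size_t len, const char *word)
{
	return strlen(word) == len && memcmp(buf, word, len) == 0;
}

/*
 * Hypothetical helper: parse "on"/"1" or "off"/"0", tolerating at most
 * one trailing newline. Returns 0 and stores the result in *value, or
 * -1 if the input is not one of the accepted tokens.
 */
static int parse_on_off(const char *buf, size_t count, bool *value)
{
	assert(buf != NULL && value != NULL);

	/* Strip the newline that "echo" appends, if present. */
	if (count > 0 && buf[count - 1] == '\n')
		count--;

	if (token_eq(buf, count, "on") || token_eq(buf, count, "1")) {
		*value = true;
		return 0;
	}
	if (token_eq(buf, count, "off") || token_eq(buf, count, "0")) {
		*value = false;
		return 0;
	}
	return -1;
}
```

With this, "on\n", "on", "1\n" and "1" all enable the setting, "off" and "0"
disable it with or without a newline, and anything else (including "o\n" or
an empty write) is rejected. A store handler could then return an error for
rejected input instead of silently returning count.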

* bnx2x: link detected up at startup even when it should be down
From: Jean-Michel Hautbois @ 2012-09-20 15:39 UTC (permalink / raw)
  To: netdev; +Cc: barak, eilong, davem

Hi all,

I am working with a HP blade which has a bnx2x based card (Broadcom
NetXtreme II BCM57810 10 Gigabit Ethernet).
I am using a 3.2 linux kernel, which works very well except on
detecting the link state at startup.
I have my ethernet interfaces linked with a bond, and I want to
configure it for HA (in miimon mode).
I am using a managed switch which helps me in disabling/enabling ports.

When the port is disabled, at startup, the link is detected "UP".
When I enable the port, it is still "UP", and when I disable it again,
then it is detected "DOWN".

I have tried the latest 3.6-rc6 kernel, and it works well (link is
"DOWN" at startup when port is disabled).
Then I bisected it, and I found out that the commit which makes it
work (yes, it is an inverse bisect, thanks to this powerful git
tool :)) is:

a334872224a67b614dc888460377862621f3dac7 is the first bad commit
commit a334872224a67b614dc888460377862621f3dac7
Author: Barak Witkowski <barak@broadcom.com>
Date:   Mon Apr 23 03:04:46 2012 +0000

    bnx2x: add afex support

    Following patch adds afex multifunction support to the driver (afex
    multifunction is based on vntag header) and updates FW version used
    to 7.2.51.

    Support includes the following:

    1. Configure vif parameters in firmware (default vlan, vif id, default
       priority, allowed priorities) according to values received from NIC.
    2. Configure FW to strip/add default vlan according to afex vlan mode.
    3. Notify link up to OS only after vif is fully initialized.
    4. Support vif list set/get requests and configure FW accordingly.
    5. Supply afex statistics upon request from NIC.
    6. Special handling to L2 interface in case of FCoE vif.

    Signed-off-by: Barak Witkowski <barak@broadcom.com>
    Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This commit is present in the 3.5.y stable branch, but not the 3.2.y one.
Is there a workaround which would make this feature work correctly
even on older kernels?
It does not seem to be trivial, but I may be missing something as this
driver is pretty big...

Cheers,
JM

^ permalink raw reply

* RE: New commands to configure IOV features
From: Rose, Gregory V @ 2012-09-20 15:39 UTC (permalink / raw)
  To: Ben Hutchings, Yinghai Lu
  Cc: Bjorn Helgaas, Yuval Mintz, davem@davemloft.net,
	netdev@vger.kernel.org, Ariel Elior, Eilon Greenstein, linux-pci
In-Reply-To: <1348104190.4836.61.camel@deadeye.wl.decadent.org.uk>

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Wednesday, September 19, 2012 6:23 PM
> To: Yinghai Lu
> Cc: Bjorn Helgaas; Rose, Gregory V; Yuval Mintz; davem@davemloft.net;
> netdev@vger.kernel.org; Ariel Elior; Eilon Greenstein; linux-pci
> Subject: Re: New commands to configure IOV features
> 
> On Wed, 2012-09-19 at 17:19 -0700, Yinghai Lu wrote:
> > On Wed, Sep 19, 2012 at 3:46 PM, Ben Hutchings
> > <bhutchings@solarflare.com> wrote:
> > > On Wed, 2012-09-19 at 15:17 -0700, Yinghai Lu wrote:
> > >> +max_vfs_store(struct device *dev, struct device_attribute *attr,
> > >> +                const char *buf, size_t count) {
> > >> +       unsigned long val;
> > >> +       struct pci_dev *pdev = to_pci_dev(dev);
> > >> +
> > >> +       if (strict_strtoul(buf, 0, &val) < 0)
> > >> +               return -EINVAL;
> > >> +
> > >> +       pdev->max_vfs = val;
> > >> +
> > >> +       return count;
> > >> +}
> > > [...]
> > >
> > > Then what would actually trigger creation of the VFs?  There's no
> > > way we can assume that some sysfs attribute will be written before
> > > the PF driver is loaded (what if it's built-in?).  I thought the
> > > idea was to add a driver callback that would be called when the
> > > sysfs attribute was written.
> >
> > could just stop the device and add it back again?
> 
> This is highly disruptive and I think it would be totally unacceptable for
> at least networking devices.

Agreed.

We need the driver callback.

- Greg
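
To make the "driver callback" idea discussed above concrete, here is a
minimal user-space sketch of a store handler that forwards the written value
to a hook supplied by the PF driver, instead of only recording it. Every
name in it (struct pf_dev, struct pf_driver_ops, configure_vfs,
max_vfs_store_sketch, demo_configure_vfs) is hypothetical and chosen for
illustration; none is an existing PCI core interface, and the limit of 8 VFs
in the demo driver is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct pf_dev;

/* Hypothetical driver hook, invoked when the attribute is written. */
struct pf_driver_ops {
	int (*configure_vfs)(struct pf_dev *pf, unsigned long num_vfs);
};

struct pf_dev {
	const struct pf_driver_ops *ops;	/* NULL until a driver binds */
	unsigned long num_vfs;
};

/* Hypothetical store handler: parse the value, then ask the driver. */
static long max_vfs_store_sketch(struct pf_dev *pf, const char *buf,
				 size_t count)
{
	char *end;
	unsigned long val;
	int rc;

	assert(pf != NULL && buf != NULL);

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;

	/* No driver bound, or the driver does not support the hook. */
	if (!pf->ops || !pf->ops->configure_vfs)
		return -ENODEV;

	rc = pf->ops->configure_vfs(pf, val);
	if (rc < 0)
		return rc;

	pf->num_vfs = val;
	return (long)count;
}

/* Demo driver: pretends the device supports at most 8 VFs. */
static int demo_configure_vfs(struct pf_dev *pf, unsigned long num_vfs)
{
	(void)pf;
	return num_vfs > 8 ? -ERANGE : 0;
}

static const struct pf_driver_ops demo_ops = {
	.configure_vfs = demo_configure_vfs,
};
```

The point of this shape is that a write takes effect immediately through the
bound driver, so it works whether the PF driver is built-in or a module, and
it does not require removing and re-adding the device.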


^ permalink raw reply

* [PATCH net] bnx2x: remove false warning regarding interrupt number
From: Ariel Elior @ 2012-09-20 15:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eilon Greenstein, Ariel Elior

Since version 7.4, the FW configures in the PCI config space the max
number of interrupts available to the physical function, instead of
the exact number to use.
This causes a false warning in the driver when comparing the number of
configured interrupts to the number about to be used.
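
The behavioural change can be sketched in a few lines of user-space C:
instead of warning when the two counts differ, take the smaller one. The
names min_int() and effective_sb_count() are hypothetical stand-ins (the
patch itself uses the kernel's min_t() on bp->igu_sb_cnt and igu_sb_cnt).

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's min_t(int, a, b). */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: given the two status block counts the driver
 * compares, return the smaller one as the count to actually use.
 */
static int effective_sb_count(int count_a, int count_b)
{
	assert(count_a >= 0 && count_b >= 0);
	return min_int(count_a, count_b);
}
```

So if one source reports 17 and the other 9, the driver would proceed with 9
rather than emitting a warning.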

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 211753e..0875ecf 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9831,12 +9831,13 @@ static void __devinit bnx2x_get_igu_cam_info(struct bnx2x *bp)
 	}
 
 #ifdef CONFIG_PCI_MSI
-	/*
-	 * It's expected that number of CAM entries for this functions is equal
-	 * to the number evaluated based on the MSI-X table size. We want a
-	 * harsh warning if these values are different!
+	/* Due to new PF resource allocation by MFW T7.4 and above, it is
+	 * possible that the number of CAM entries will not equal the value
+	 * advertised in PCI.
+	 * The driver should use the smaller of the two values as the actual
+	 * status block count.
 	 */
-	WARN_ON(bp->igu_sb_cnt != igu_sb_cnt);
+	bp->igu_sb_cnt = min_t(int, bp->igu_sb_cnt, igu_sb_cnt);
 #endif
 
 	if (igu_sb_cnt == 0)
-- 
1.7.9.GIT

^ permalink raw reply related
