Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] dev: add support of flag IFF_NOPROC
From: David Miller @ 2013-10-03 19:09 UTC (permalink / raw)
  To: stephen; +Cc: nicolas.dichtel, netdev
In-Reply-To: <20131003104627.411f5cc4@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Thu, 3 Oct 2013 10:46:27 -0700

> What about speeding up proc or sysfs? Or providing a bulk create/destroy.

+1 +1 +1

This will benefit more people than the just the envisioned users for
this IFF_NOPROC thing.

I really don't want to take the IFF_NOPROC approach.

^ permalink raw reply

* Re: Ideas on why using WPA2 encryption speeds up many TCP connections?
From: Rick Jones @ 2013-10-03 18:50 UTC (permalink / raw)
  To: Ben Greear, netdev,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <524DB6F6.6020405-my8/4N5VtI7c+919tysfdA@public.gmane.org>

On 10/03/2013 11:27 AM, Ben Greear wrote:
> I'm seeing something a bit strange and wondering if anyone had an
> opinion on why...
>
> I am testing up to 200 wifi station systems, each with a TCP connection
> running on them (download only, from VAP to stations).
>
> Without encryption (ie, open network), I see total throughput go from
> about 108Mbps down to 69Mbps as I add more stations (I add 25 at a time,
> so the 108Mbps is with 25 active, and 69Mbps is with 200 active).
>
> However, if I enable encryption, the throughput is actually higher
> (111Mbps to 71Mbps).  I'm doing encryption in software, so it adds a fair
> bit of CPU load in this test.  The numbers bounce around since this is
> wifi after all, but in general encryption tends to win reliably in this
> test.
>
> When testing with a single station (and 5 tcp streams with jacked up
> snd/rcv buffers) the open networks perform significantly better at total throughput:
> 263Mbps vs 246Mbps.
>
> Maybe the extra delay for decryption increases odds that GRO will take
> affect for the many, slower streams (and maybe that will decrease ACK
> traffic?)
>
> Any other ideas?

Fewer times two or more stations step on one another?  The recievers 
will only try to transmit when they receive data.  Modulo timing, if the 
individual downloads are a bit slower, less chance of the receivers 
looking to send ACKs back through at the same time?  Got any low-level 
stats for the health and well being of the wireless network?

rick jones

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [stable 3.0] add some CVE fixes
From: Greg KH @ 2013-10-03 18:44 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: stable, netdev
In-Reply-To: <524D36DC.5070506@suse.cz>

On Thu, Oct 03, 2013 at 11:20:28AM +0200, Jiri Slaby wrote:
> Hi,
> 
> could you consider adding fixes for the following CVEs to 3.0 (and
> possibly later)?
> CVE-2013-4163: 75a493e60ac4bbe2e977e7129d6d8cbb0dd236be
> CVE-2013-2206: f2815633504b442ca0b0605c16bf3d88a3a0fcea

Network patches for stable releases need to be asked of the networking
maintainer, who then forwards them on to me.

> Plus the backports that are replied to this mail?

I don't see any backports, did you forget to send them?

thanks,

greg k-h

^ permalink raw reply

* Ideas on why using WPA2 encryption speeds up many TCP connections?
From: Ben Greear @ 2013-10-03 18:27 UTC (permalink / raw)
  To: netdev, linux-wireless@vger.kernel.org

I'm seeing something a bit strange and wondering if anyone had an opinion on why...

I am testing up to 200 wifi station systems, each with a TCP connection running
on them (download only, from VAP to stations).

Without encryption (ie, open network), I see total throughput go from
about 108Mbps down to 69Mbps as I add more stations (I add 25 at a time,
so the 108Mbps is with 25 active, and 69Mbps is with 200 active).

However, if I enable encryption, the throughput is actually higher
(111Mbps to 71Mbps).  I'm doing encryption in software, so it adds a fair
bit of CPU load in this test.  The numbers bounce around since this is
wifi after all, but in general encryption tends to win reliably in this
test.

When testing with a single station (and 5 tcp streams with jacked up snd/rcv buffers)
the open networks perform significantly better at total throughput:  263Mbps vs 246Mbps.

Maybe the extra delay for decryption increases odds that GRO will take
affect for the many, slower streams (and maybe that will decrease ACK traffic?)

Any other ideas?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [net-next 2/3] udp: Add udp early demux
From: Eric Dumazet @ 2013-10-03 18:06 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev
In-Reply-To: <20131003173946.GA5684@sbohrermbp13-local.rgmadvisors.com>

On Thu, 2013-10-03 at 12:39 -0500, Shawn Bohrer wrote:
> On Wed, Oct 02, 2013 at 02:38:52PM -0700, Eric Dumazet wrote:
> > I suggested that for unicast, you do a limited lookup to the first
> > socket found in bucket.
> > 
> > If its an exact match, you take the socket.
> > 
> > If not, you give up, and do not scan the whole chain.
> 
> So something like the following?
> 
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 02185a5..d202e5b 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1849,7 +1849,42 @@ begin:
>  	}
>  	rcu_read_unlock();
>  	return result;
> +}
>  
> +/* For unicast we should only early demux connected sockets or we can
> + * break forwarding setups.  The chains here can be long so only check
> + * if the first socket is an exact match and if not move on.
> + */
> +static struct sock *__udp4_lib_demux_lookup(struct net *net,
> +					    __be16 loc_port, __be32 loc_addr,
> +					    __be16 rmt_port, __be32 rmt_addr,
> +					    int dif)
> +{
> +	struct sock *sk, *result;
> +	struct hlist_nulls_node *node;
> +	unsigned short hnum = ntohs(loc_port);
> +	unsigned int slot = udp_hashfn(net, hnum, udp_table.mask);
> +	struct udp_hslot *hslot = &udp_table.hash[slot];
> +	const int exact_match = 18;
> +	int score;
> +
> +	rcu_read_lock();
> +	result = NULL;
> +	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
> +		score = compute_score(sk, net, rmt_addr, hnum, rmt_port,
> +				      loc_addr, loc_port, dif);
> +		if (score == exact_match)
> +			result = sk;
> +		/* Only check first socket in chain */
> +		break;
> +	}
> +
> +	if (result) {
> +		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
> +			result = NULL;
> +	}
> +	rcu_read_unlock();
> +	return result;
>  }
>  

Just do the tuple comparison instead of compute_score(),
since you know we want full L4 match.

The standard way is to use the INET_MATCH() macro

^ permalink raw reply

* Re: [PATCH net-next] dev: add support of flag IFF_NOPROC
From: Stephen Hemminger @ 2013-10-03 17:46 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem
In-Reply-To: <1380806905-4461-1-git-send-email-nicolas.dichtel@6wind.com>

On Thu,  3 Oct 2013 15:28:25 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> This flag allows to create netdevices without creating directories in
> /proc, ie no /proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and no
> /proc/net/dev_snmp6/<dev>.
> 
> When a system creates a lot of virtual netdevices, this allows to speed up the
> creation time. For systems which continuously create and destroy virtual
> netdevices, proc entries for these netdevices may not be used, hence adding this
> flag is interesting.
> 
> Note that the flag should be specified at the creation time (before calling
> register_netdevice()) and cannot be removed during the life of the netdevice.
> 
> Here are some numbers:
> 
> dummy20000.batch contains 20 000 times 'link add type dummy' and
> dummy20000-noproc.batch 20 000 times 'link add noproc type dummy'.
> 
> time ip -b dummy20000.batch
> real    0m56.367s
> user    0m0.200s
> sys     0m53.070s
> 
> time ip -b dummy20000-noproc.batch
> real    0m42.417s
> user    0m0.310s
> sys     0m38.470s
> 
> Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Seems like a special case. The problem is that you just created devices
that are unmanageable and might well break other tools in the system.
What about speeding up proc or sysfs? Or providing a bulk create/destroy.
Also if you used a custom program it could have seperate netlink send
and receive threads to pipeline the creation.

^ permalink raw reply

* Re: [PATCH RFC 59/77] qla2xxx: Update MSI/MSI-X interrupts enablement code
From: Saurav Kashyap @ 2013-10-03 17:42 UTC (permalink / raw)
  To: Alexander Gordeev, linux-kernel
  Cc: Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips@linux-mips.org,
	linuxppc-dev@lists.ozlabs.org, linux390@de.ibm.com,
	linux-s390@vger.kernel.org, x86@kernel.org,
	linux-ide@vger.kernel.org
In-Reply-To: <54f6b89372f51cd27a6adf6ecc91b8bf6bb5ba74.1380703263.git.agordeev@redhat.com>

Acked-by: Saurav Kashyap <saurav.kashyap@qlogic.com>


>As result of recent re-design of the MSI/MSI-X interrupts enabling
>pattern this driver has to be updated to use the new technique to
>obtain a optimal number of MSI/MSI-X interrupts required.
>
>Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
>---
> drivers/scsi/qla2xxx/qla_isr.c |   18 +++++++++++-------
> 1 files changed, 11 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/scsi/qla2xxx/qla_isr.c
>b/drivers/scsi/qla2xxx/qla_isr.c
>index df1b30b..6c11ab9 100644
>--- a/drivers/scsi/qla2xxx/qla_isr.c
>+++ b/drivers/scsi/qla2xxx/qla_isr.c
>@@ -2836,16 +2836,20 @@ qla24xx_enable_msix(struct qla_hw_data *ha,
>struct rsp_que *rsp)
> 	for (i = 0; i < ha->msix_count; i++)
> 		entries[i].entry = i;
> 
>-	ret = pci_enable_msix(ha->pdev, entries, ha->msix_count);
>-	if (ret) {
>+	ret = pci_msix_table_size(ha->pdev);
>+	if (ret < 0) {
>+		goto msix_failed;
>+	} else {
> 		if (ret < MIN_MSIX_COUNT)
> 			goto msix_failed;
> 
>-		ql_log(ql_log_warn, vha, 0x00c6,
>-		    "MSI-X: Failed to enable support "
>-		    "-- %d/%d\n Retry with %d vectors.\n",
>-		    ha->msix_count, ret, ret);
>-		ha->msix_count = ret;
>+		if (ret < ha->msix_count) {
>+			ql_log(ql_log_warn, vha, 0x00c6,
>+			    "MSI-X: Failed to enable support "
>+			    "-- %d/%d\n Retry with %d vectors.\n",
>+			    ha->msix_count, ret, ret);
>+			ha->msix_count = ret;
>+		}
> 		ret = pci_enable_msix(ha->pdev, entries, ha->msix_count);
> 		if (ret) {
> msix_failed:
>-- 
>1.7.7.6
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Fw: [Bug 62491] New: alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
From: Stephen Hemminger @ 2013-10-03 17:40 UTC (permalink / raw)
  To: Jay Cliburn, Chris Snook, Johannes Berg; +Cc: netdev



Begin forwarded message:

Date: Thu, 3 Oct 2013 10:21:34 -0700
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 62491] New: alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff


https://bugzilla.kernel.org/show_bug.cgi?id=62491

            Bug ID: 62491
           Summary: alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
           Product: Networking
           Version: 2.5
    Kernel Version: 3.12-rc3
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: shemminger@linux-foundation.org
          Reporter: vaniaz@msn.com
        Regression: No

Hardware: lenovo g780 using alx network driver; software - opensuse tambleweed
with kernel 3.12-rc3.
After entering sleep mode and waking up console at alt + f is flooding with
messages like:
[672.000000] alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Re: [net-next 2/3] udp: Add udp early demux
From: Shawn Bohrer @ 2013-10-03 17:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tomk, netdev
In-Reply-To: <1380749932.19002.127.camel@edumazet-glaptop.roam.corp.google.com>

On Wed, Oct 02, 2013 at 02:38:52PM -0700, Eric Dumazet wrote:
> I suggested that for unicast, you do a limited lookup to the first
> socket found in bucket.
> 
> If its an exact match, you take the socket.
> 
> If not, you give up, and do not scan the whole chain.

So something like the following?


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 02185a5..d202e5b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1849,7 +1849,42 @@ begin:
 	}
 	rcu_read_unlock();
 	return result;
+}
 
+/* For unicast we should only early demux connected sockets or we can
+ * break forwarding setups.  The chains here can be long so only check
+ * if the first socket is an exact match and if not move on.
+ */
+static struct sock *__udp4_lib_demux_lookup(struct net *net,
+					    __be16 loc_port, __be32 loc_addr,
+					    __be16 rmt_port, __be32 rmt_addr,
+					    int dif)
+{
+	struct sock *sk, *result;
+	struct hlist_nulls_node *node;
+	unsigned short hnum = ntohs(loc_port);
+	unsigned int slot = udp_hashfn(net, hnum, udp_table.mask);
+	struct udp_hslot *hslot = &udp_table.hash[slot];
+	const int exact_match = 18;
+	int score;
+
+	rcu_read_lock();
+	result = NULL;
+	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
+		score = compute_score(sk, net, rmt_addr, hnum, rmt_port,
+				      loc_addr, loc_port, dif);
+		if (score == exact_match)
+			result = sk;
+		/* Only check first socket in chain */
+		break;
+	}
+
+	if (result) {
+		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+			result = NULL;
+	}
+	rcu_read_unlock();
+	return result;
 }
 
 void udp_v4_early_demux(struct sk_buff *skb)
@@ -1870,8 +1905,8 @@ void udp_v4_early_demux(struct sk_buff *skb)
 		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
 						   uh->source, iph->saddr, dif);
 	else if (skb->pkt_type == PACKET_HOST)
-		sk = __udp4_lib_lookup(net, iph->saddr, uh->source,
-				       iph->daddr, uh->dest, dif, &udp_table);
+		sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
+					     uh->source, iph->saddr, dif);
 	else
 		return;
 

-- 

---------------------------------------------------------------
This email, along with any attachments, is confidential. If you 
believe you received this message in error, please contact the 
sender immediately and delete all copies of the message.  
Thank you.

^ permalink raw reply related

* Re: [PATCH net 1/2] sit: allow to use rtnl ops on fb tunnel
From: David Miller @ 2013-10-03 17:37 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, steffen.klassert, pshelar
In-Reply-To: <524D2580.9040702@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 03 Oct 2013 10:06:24 +0200

> In fact, I just notice that 3.9 branch is EoL (bug is only in 3.8 and
> 3.9).
> Should I still send a patch ? If yes, based on which tree/branch?

No stable backports are needed then.

^ permalink raw reply

* I NEED YOUR HELP.
From: FROM MRS GRACE MANDA @ 2013-10-03 16:36 UTC (permalink / raw)

In-Reply-To: <1380809157.62922.YahooMailNeo@web5706.biz.mail.ne1.yahoo.com>

[-- Attachment #1: Type: text/plain, Size: 56 bytes --]



I PRAY THAT THIS MAIL GETS TO YOU IN BETTER HEALTH. 

[-- Attachment #2: From Grace Manda.pdf --]
[-- Type: application/pdf, Size: 41077 bytes --]

^ permalink raw reply

* Re: [PATCH RFC 51/77] mthca: Update MSI/MSI-X interrupts enablement code
From: Jack Morgenstein @ 2013-10-03 16:11 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers,
	"VMware, Inc." <pv-dr
In-Reply-To: <9d424912ef78993dc75e2af5006cd12913e9e7e7.1380703263.git.agordeev@redhat.com>

On Wed,  2 Oct 2013 12:49:07 +0200
Alexander Gordeev <agordeev@redhat.com> wrote:

> Subject: [PATCH RFC 51/77] mthca: Update MSI/MSI-X interrupts
> enablement code Date: Wed,  2 Oct 2013 12:49:07 +0200
> Sender: linux-rdma-owner@vger.kernel.org
> X-Mailer: git-send-email 1.7.7.6
> 
> As result of recent re-design of the MSI/MSI-X interrupts enabling
> pattern this driver has to be updated to use the new technique to
> obtain a optimal number of MSI/MSI-X interrupts required.
> 
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---

ACK.

-Jack

^ permalink raw reply

* tx checksum offload in rtl8168evl disabled in driver
From: jason.morgan @ 2013-10-03 14:27 UTC (permalink / raw)
  To: netdev

Hi,

I'm try to get close to saturating a 1G ethernet.

I'm at 517Mbps and I've found that there seems to be a cpu bottleneck.

I'm using 2k to 4k frames with a rtl8168evl.

I notice from ethtool that tx-checksum is turned off and refuse to turn 
on.

I've found this message
http://www.spinics.net/lists/netdev/msg216530.html

Which indicates the cause being the driver.

I've looked at the driver code rtl8169.c in kernel 3.8 and the line 

        [RTL_GIGA_MAC_VER_34] =
                _R("RTL8168evl/8111evl",RTL_TD_1, FIRMWARE_8168E_3,
                                                        JUMBO_9K, false),

indicates the reason for this.

However the message thread, above indicates that this is not a problem and 

can be changed to make tx-checksum offload possible.

However we are using a newer chip to the on in the message thread.  I've 
tried to find other, more recent citations without success.

So, why is it still turned off?

What will be the effect of turning it on (changing false to true, in the 
driver line) for our chip?

Thanks in advance,
Jason

^ permalink raw reply

* [PATCHv2] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Oussama Ghorbel @ 2013-10-03 13:49 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Oussama Ghorbel

The (inner) MTU of a ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530

This patch allows to check the minimum MTU for ipv6 tunnel according to these rules:
-In case the tunnel is configured with ipip6 mode the minimum MTU is 68.
-In case the tunnel is configured with ip6ip6 or any mode the minimum MTU is 1280.

Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
---
 net/ipv6/ip6_tunnel.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 46ba243..4b51b03 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1429,9 +1429,17 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 static int
 ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
 {
-	if (new_mtu < IPV6_MIN_MTU) {
-		return -EINVAL;
+	struct ip6_tnl *tnl = netdev_priv(dev);
+
+	if (tnl->parms.proto == IPPROTO_IPIP) {
+		if (new_mtu < 68)
+			return -EINVAL;
+	} else {
+		if (new_mtu < IPV6_MIN_MTU)
+			return -EINVAL;
 	}
+	if (new_mtu > 0xFFF8 - dev->hard_header_len)
+		return -EINVAL;
 	dev->mtu = new_mtu;
 	return 0;
 }
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH iproute2 net-next-3.11] ip: add support of link flag IFF_NOPROC
From: Nicolas Dichtel @ 2013-10-03 13:30 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, davem, Nicolas Dichtel
In-Reply-To: <1380806905-4461-1-git-send-email-nicolas.dichtel@6wind.com>

When this flag is specified, /proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and
/proc/net/dev_snmp6/<dev> directories are not created.

This flag cannot be removed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/if.h    | 2 ++
 ip/ipaddress.c        | 1 +
 ip/iplink.c           | 3 +++
 man/man8/ip-link.8.in | 8 ++++++++
 4 files changed, 14 insertions(+)

diff --git a/include/linux/if.h b/include/linux/if.h
index 7f261c08e816..5b8a5ebff599 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -53,6 +53,8 @@
 
 #define IFF_ECHO	0x40000		/* echo sent packets		*/
 
+#define IFF_NOPROC	0x80000		/* no proc/sysctl directories	*/
+
 #define IFF_VOLATILE	(IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_ECHO|\
 		IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT)
 
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 1c3e4da0d0da..b2e35028c844 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -116,6 +116,7 @@ static void print_link_flags(FILE *fp, unsigned flags, unsigned mdown)
 	_PF(LOWER_UP);
 	_PF(DORMANT);
 	_PF(ECHO);
+	_PF(NOPROC);
 #undef _PF
 	if (flags)
 		fprintf(fp, "%x", flags);
diff --git a/ip/iplink.c b/ip/iplink.c
index ada9d4255ba2..253ed1cc3f6f 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -50,6 +50,7 @@ void iplink_usage(void)
 		fprintf(stderr, "                   [ mtu MTU ]\n");
 		fprintf(stderr, "                   [ numtxqueues QUEUE_COUNT ]\n");
 		fprintf(stderr, "                   [ numrxqueues QUEUE_COUNT ]\n");
+		fprintf(stderr, "                   [ noproc ]\n");
 		fprintf(stderr, "                   type TYPE [ ARGS ]\n");
 		fprintf(stderr, "       ip link delete DEV type TYPE [ ARGS ]\n");
 		fprintf(stderr, "\n");
@@ -480,6 +481,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 				invarg("Invalid \"numrxqueues\" value\n", *argv);
 			addattr_l(&req->n, sizeof(*req), IFLA_NUM_RX_QUEUES,
 				  &numrxqueues, 4);
+		} else if (matches(*argv, "noproc") == 0) {
+			req->i.ifi_flags |= IFF_NOPROC;
 		} else {
 			if (strcmp(*argv, "dev") == 0) {
 				NEXT_ARG();
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 76f92ddbd82c..b16d1a1f8a41 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -45,6 +45,8 @@ ip-link \- network device configuration
 .RB "[ " numrxqueues
 .IR QUEUE_COUNT " ]"
 .br
+.RB "[ " noproc " ]"
+.br
 .BR type " TYPE"
 .RI "[ " ARGS " ]"
 
@@ -197,6 +199,12 @@ specifies the number of transmit queues for new device.
 specifies the number of receive queues for new device.
 
 .TP
+.BI noproc
+specifies to no create iface related directories under /proc
+(/proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and
+/proc/net/dev_snmp6/<dev>)
+
+.TP
 VXLAN Type Support
 For a link of type 
 .I VXLAN
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next] dev: add support of flag IFF_NOPROC
From: Nicolas Dichtel @ 2013-10-03 13:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, Nicolas Dichtel

This flag allows to create netdevices without creating directories in
/proc, ie no /proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and no
/proc/net/dev_snmp6/<dev>.

When a system creates a lot of virtual netdevices, this allows to speed up the
creation time. For systems which continuously create and destroy virtual
netdevices, proc entries for these netdevices may not be used, hence adding this
flag is interesting.

Note that the flag should be specified at the creation time (before calling
register_netdevice()) and cannot be removed during the life of the netdevice.

Here are some numbers:

dummy20000.batch contains 20 000 times 'link add type dummy' and
dummy20000-noproc.batch 20 000 times 'link add noproc type dummy'.

time ip -b dummy20000.batch
real    0m56.367s
user    0m0.200s
sys     0m53.070s

time ip -b dummy20000-noproc.batch
real    0m42.417s
user    0m0.310s
sys     0m38.470s

Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/uapi/linux/if.h | 2 ++
 net/core/dev.c          | 2 +-
 net/core/rtnetlink.c    | 1 +
 net/ipv4/devinet.c      | 3 +++
 net/ipv6/addrconf.c     | 3 +++
 net/ipv6/proc.c         | 5 +++++
 6 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 1ec407b01e46..bb9fe5eb38bf 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -53,6 +53,8 @@
 
 #define IFF_ECHO	0x40000		/* echo sent packets		*/
 
+#define IFF_NOPROC	0x80000		/* no proc/sysctl directories	*/
+
 #define IFF_VOLATILE	(IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_ECHO|\
 		IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT)
 
diff --git a/net/core/dev.c b/net/core/dev.c
index c25db20a4246..13f6dd360c74 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5199,7 +5199,7 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
 			       IFF_DYNAMIC | IFF_MULTICAST | IFF_PORTSEL |
 			       IFF_AUTOMEDIA)) |
 		     (dev->flags & (IFF_UP | IFF_VOLATILE | IFF_PROMISC |
-				    IFF_ALLMULTI));
+				    IFF_ALLMULTI | IFF_NOPROC));
 
 	/*
 	 *	Load in the correct multicast list now the flags have changed.
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4aedf03da052..5bad28e66fa2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1860,6 +1860,7 @@ replay:
 		}
 
 		dev->ifindex = ifm->ifi_index;
+		dev->flags |= ifm->ifi_flags & IFF_NOPROC;
 
 		if (ops->newlink)
 			err = ops->newlink(net, dev, tb, data);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a1b5bcbd04ae..13b4089d8996 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2160,6 +2160,9 @@ static void __devinet_sysctl_unregister(struct ipv4_devconf *cnf)
 
 static void devinet_sysctl_register(struct in_device *idev)
 {
+	if (idev->dev->flags & IFF_NOPROC)
+		return;
+
 	neigh_sysctl_register(idev->dev, idev->arp_parms, "ipv4", NULL);
 	__devinet_sysctl_register(dev_net(idev->dev), idev->dev->name,
 					&idev->cnf);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index cd3fb301da38..e06d15ea2dba 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5032,6 +5032,9 @@ static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
 
 static void addrconf_sysctl_register(struct inet6_dev *idev)
 {
+	if (idev->dev->flags & IFF_NOPROC)
+		return;
+
 	neigh_sysctl_register(idev->dev, idev->nd_parms, "ipv6",
 			      &ndisc_ifinfo_sysctl_change);
 	__addrconf_sysctl_register(dev_net(idev->dev), idev->dev->name,
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 091d066a57b3..f89911116aa7 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -274,6 +274,9 @@ int snmp6_register_dev(struct inet6_dev *idev)
 	if (!idev || !idev->dev)
 		return -EINVAL;
 
+	if (idev->dev->flags & IFF_NOPROC)
+		return 0;
+
 	net = dev_net(idev->dev);
 	if (!net->mib.proc_net_devsnmp6)
 		return -ENOENT;
@@ -291,6 +294,8 @@ int snmp6_register_dev(struct inet6_dev *idev)
 int snmp6_unregister_dev(struct inet6_dev *idev)
 {
 	struct net *net = dev_net(idev->dev);
+	if (idev->dev->flags & IFF_NOPROC)
+		return 0;
 	if (!net->mib.proc_net_devsnmp6)
 		return -ENOENT;
 	if (!idev->stats.proc_dir_entry)
-- 
1.8.2.1

^ permalink raw reply related

* Re: [PATCH net-next] tcp: rcvbuf autotuning improvements
From: Eric Dumazet @ 2013-10-03 13:13 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Francesco Fusco
In-Reply-To: <1380787003-20488-1-git-send-email-dborkman@redhat.com>

On Thu, 2013-10-03 at 09:56 +0200, Daniel Borkmann wrote:

> We also renamed tcp_fixup_rcvbuf() to tcp_rcvbuf_expand() to be
> consistent with tcp_sndbuf_expand().

BTW we renamed the function only because it was used both for initial
sizing, and from tcp_new_space()

As is, tcp_fixup_rcvbuf() is only called at connection setup.

^ permalink raw reply

* Re: [PATCH net-next] tcp: rcvbuf autotuning improvements
From: Eric Dumazet @ 2013-10-03 13:03 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, netdev, Francesco Fusco, Michael Dalton, ycheng, ncardwell
In-Reply-To: <1380787003-20488-1-git-send-email-dborkman@redhat.com>

On Thu, 2013-10-03 at 09:56 +0200, Daniel Borkmann wrote:
> This is a complementary patch for commit 6ae705323 ("tcp: sndbuf
> autotuning improvements") that fixes a performance regression on
> receiver side in setups with low to mid latency, high throughput,
> and senders with TSO/GSO off (receivers w/ default settings).
> 
> The following measurements in Mbit/s were done for 60sec w/ netperf
> on virtio w/ TSO/GSO off:
> 
> (ms)    1)              2)              3)
>   0     2762.11         1150.32         2906.17
>  10     1083.61          538.89         1091.03
>  25      471.81          313.18          474.60
>  50      242.33          187.84          242.36
>  75      162.14          134.45          161.95
> 100      121.55          101.96          121.49
> 150       80.64           57.75           80.48
> 200       58.97           54.11           59.90
> 250       47.10           46.92           47.31
> 
> Same setup w/ TSO/GSO on:
> 
> (ms)    1)              2)              3)
>   0     12225.91        12366.89        16514.37
>  10      1526.64         1525.79         2176.63
>  25       655.13          647.79          871.52
>  50       338.51          377.88          439.46
>  75       246.49          278.46          295.62
> 100       210.93          207.56          217.34
> 150       127.88          129.56          141.33
> 200        94.95           94.50          107.29
> 250        67.39           73.88           88.35
> 
> Similarly as in 6ae705323, we fixed up power-of-two rounding and
> took cached mss into account, thus bringing per_mss calculations
> closer to each other, the rest stays as is.
> 
> We also renamed tcp_fixup_rcvbuf() to tcp_rcvbuf_expand() to be
> consistent with tcp_sndbuf_expand().
> 
> While we do think that 6ae705323b71 is the right way to go, also
> this follow-up seems necessary to restore performance for
> receivers.

Hmm, I think you based this patch on some virtio requirements.

I would rather fix virtio, because virtio has poor truesize/payload
ratio.

Michael Dalton is working on this right now.

Really I don't understand how 'fixing' initial rcvbuf could explain such
difference in a 60 second transfert.

Normally, if autotuning was working, the first sk_rcvbuf value would
only matter in the very beginning of a flow (maybe one, two or even
three RTT)

It looks like you only need to set sk_rcvbuf to tcp_rmem[2],
so you probably have to fix the autotuning, or virtio to give normal
skbs, not fat ones ;)


Thanks

^ permalink raw reply

* Re: [PATCH] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Oussama Ghorbel @ 2013-10-03 12:37 UTC (permalink / raw)
  To: Oussama Ghorbel, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel,
	Oussama Ghorbel
In-Reply-To: <CABfLueHOR1HNsRC_-+Phc=9LMPTiOVuWoEjguG64L=9hiZLeVg@mail.gmail.com>

I will send a new patch, the diff will be:

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 46ba243..4b51b03 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1429,9 +1429,17 @@ ip6_tnl_ioctl(struct net_device *dev, struct
ifreq *ifr, int cmd)
 static int
 ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
 {
-       if (new_mtu < IPV6_MIN_MTU) {
-               return -EINVAL;
+       struct ip6_tnl *tnl = netdev_priv(dev);
+
+       if (tnl->parms.proto == IPPROTO_IPIP) {
+               if (new_mtu < 68)
+                       return -EINVAL;
+       } else {
+               if (new_mtu < IPV6_MIN_MTU)
+                       return -EINVAL;
        }
+       if (new_mtu > 0xFFF8 - dev->hard_header_len)
+               return -EINVAL;
        dev->mtu = new_mtu;
        return 0;
 }


On Sun, Sep 29, 2013 at 5:33 PM, Oussama Ghorbel <ou.ghorbel@gmail.com> wrote:
> On Sun, Sep 29, 2013 at 4:45 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On Sun, Sep 29, 2013 at 10:40:11AM +0100, Oussama Ghorbel wrote:
>>> On Fri, Sep 27, 2013 at 6:03 PM, Hannes Frederic Sowa
>>> <hannes@stressinduktion.org> wrote:
>>> > Ok, let's go with one function per protocol type. Seems easier.
>>> >
>>> > It seems to get more hairy, because it depends on the tunnel driver if the
>>> > prepended ip header is accounted in hard_header_len. :/
>>> >
>>> > I don't know if it works out cleanly. Otherwise I would be ok if the checks
>>> > just get repeated in ip6_tunnel and leave the rest as-is.
>>> >
>>> Yes, It will be the clean way to do it.
>>
>> Fine. :)
>>
>>> >
>>> > Linux currently cannot create "jumbograms" (only the receiving side
>>> > is supported).
>>> >
>>> I understand, but what are the benefit from this limit or the harm
>>> from not specifying it?
>>> Please check this comment from eth.c
>>>
>>> /**
>>>  * eth_change_mtu - set new MTU size
>>>  * @dev: network device
>>>  * @new_mtu: new Maximum Transfer Unit
>>>  *
>>>  * Allow changing MTU size. Needs to be overridden for devices
>>>  * supporting jumbo frames.
>>>  */
>>> int eth_change_mtu(struct net_device *dev, int new_mtu)
>>
>> Hmm, I cannot judge without the full patch. Will it be applicable
>> to all net_devices or just ethernet ones? The name could be a bit
>> misleading. Remindes me a lot of dev_set_mtu based on the signature, btw.
>
> Normally to all net_devices, otherwise it could get complicated to
> check for every dev separately ...
> But, never mind, the comment below solve the issue
>
>>
>>> So wouldn't be a good idea to let our function open for jumbo frames...?
>>
>> Hm, we can document the fact that the function would needed to be updated in
>> that case. But we should not allow to set a mtu which would require jumbograms
>> currently.
>
> OK, sounds a good. I will check the mtu against the limit
> IPV6_MAXPLEN, and document the jumbo restriction ...
>
>>
>> Greetings,
>>
>>   Hannes
>>
>
> Regards,
> Oussama

^ permalink raw reply related

* Re: [PATCH net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-03 11:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Daniel Borkmann, Paul E. McKenney, Xi Wang, x86,
	Eric Dumazet, linux-kernel, Heiko Carstens
In-Reply-To: <1380776250.19002.147.camel@edumazet-glaptop.roam.corp.google.com>

On Wed, Oct 2, 2013 at 9:57 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-10-02 at 21:53 -0700, Eric Dumazet wrote:
>> On Wed, 2013-10-02 at 21:44 -0700, Alexei Starovoitov wrote:
>>
>> > I think ifdef config_x86 is a bit ugly inside struct sk_filter, but
>> > don't mind whichever way.
>>
>> Its not fair to make sk_filter bigger, because it means that simple (non
>> JIT) filter might need an extra cache line.
>>
>> You could presumably use the following layout instead :
>>
>> struct sk_filter
>> {
>>         atomic_t                refcnt;
>>         struct rcu_head         rcu;
>>       struct work_struct      work;
>>
>>         unsigned int            len ____cacheline_aligned;    /* Number of filter blocks */
>>         unsigned int            (*bpf_func)(const struct sk_buff *skb,
>>                                             const struct sock_filter *filter);
>>         struct sock_filter      insns[0];
>> };
>
> And since @len is not used by sk_run_filter() use :
>
> struct sk_filter {
>         atomic_t                refcnt;
>         int                     len; /* number of filter blocks */
>         struct rcu_head         rcu;
>         struct work_struct      work;
>
>         unsigned int            (*bpf_func)(const struct sk_buff *skb,
>                                             const struct sock_filter *filter) ____cacheline_aligned;
>         struct sock_filter      insns[0];
> };

yes. make sense to avoid first insn cache miss inside sk_run_filter()
at the expense
of 8-byte gap between work and bpf_func (on x86_64 w/o lockdep)

Probably even better to overlap work and insns fields.
Pro: sk_filter size the same, no impact on non-jit case
Con: would be harder to understand the code

another problem is that kfree(sk_filter) inside
sk_filter_release_rcu() needs to move inside bpf_jit_free().
so self nack. Let me fix these issues and respin

Thanks
Alexei

^ permalink raw reply

* [PATCH net v2] be2net: Warn users of possible broken functionality on BE2 cards with very old FW versions with latest driver
From: Somnath Kotur @ 2013-10-03 10:04 UTC (permalink / raw)
  To: netdev; +Cc: davem, Somnath Kotur

On very old FW versions < 4.0, the mailbox command to set interrupts
on the card succeeds even though it is not supported and should have
failed, leading to a scenario where interrupts do not work.
Hence warn users to upgrade to a suitable FW version to avoid seeing
broken functionality.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
v2: Replaced all occurences of 'F/W' with 'FW' or 'Firmware' as suggested by
Sathya

 drivers/net/ethernet/emulex/benet/be_main.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 2c38cc4..9563ced 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3247,6 +3247,11 @@ static int be_setup(struct be_adapter *adapter)
 
 	be_cmd_get_fw_ver(adapter, adapter->fw_ver, adapter->fw_on_flash);
 
+	if (BE2_chip(adapter) && memcmp(adapter->fw_ver, "4.", 2) < 0) {
+		dev_err(dev, "Firmware version is too old.IRQs may not work\n");
+		dev_err(dev, "Pls upgrade firmware to version >= 4.0\n");
+	}
+
 	if (adapter->vlans_added)
 		be_vid_config(adapter);
 
-- 
1.6.0.2

^ permalink raw reply related

* Re: [PATCH RFC 46/77] mlx4: Update MSI/MSI-X interrupts enablement code
From: Jack Morgenstein @ 2013-10-03  8:39 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers,
	"VMware, Inc." <pv-dr
In-Reply-To: <b0a9f6f455aa03b7769e6d9cc2e7fdbc06732b2f.1380703263.git.agordeev@redhat.com>

On Wed,  2 Oct 2013 12:49:02 +0200
Alexander Gordeev <agordeev@redhat.com> wrote:

> As result of recent re-design of the MSI/MSI-X interrupts enabling
> pattern this driver has to be updated to use the new technique to
> obtain a optimal number of MSI/MSI-X interrupts required.
> 
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>

New review -- ACK (i.e., patch is OK), subject to acceptance of patches
05 and 07 of this patch set.

I sent my previous review (NACK) when I was not yet aware that
changes proposed were due to the two earlier patches (mentioned above)
in the current patch set.

The change log here should actually read something like the following:

As a result of changes to the MSI/MSI_X enabling procedures, this driver
must be modified in order to preserve its current msi/msi_x enablement
logic.

-Jack

^ permalink raw reply

* Re: [PATCH RFC 46/77] mlx4: Update MSI/MSI-X interrupts enablement code
From: Jack Morgenstein @ 2013-10-03  8:27 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers,
	"VMware, Inc." <pv-dr
In-Reply-To: <b0a9f6f455aa03b7769e6d9cc2e7fdbc06732b2f.1380703263.git.agordeev@redhat.com>

On Wed,  2 Oct 2013 12:49:02 +0200
Alexander Gordeev <agordeev@redhat.com> wrote:

UPDATING THIS REPLY.
Your change log confused me. The change below is not from a "recent
re-design", it is required due to an earlier patch in this patch set.
>From the log, I assumed that the change you are talking about is already
upstream.

I will re-review.

-Jack

NACK.  This change does not do anything logically as far as I can tell.
pci_enable_msix in the current upstream kernel itself calls
pci_msix_table_size.  The current code yields the same resultswill
as the code suggested below. (i.e., the suggested code has no effect on
optimality).

BTW, pci_msix_table_size never returns a value < 0 (if msix is not
enabled, it returns 0 for the table size), so the (err < 0) check here
is not correct. (I also do not like using "err" here anyway for the
value returned by pci_msix_table_size().  There is no error here, and
it is simply confusing.

-Jack

> As result of recent re-design of the MSI/MSI-X interrupts enabling
> pattern this driver has to be updated to use the new technique to
> obtain a optimal number of MSI/MSI-X interrupts required.
> 
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/main.c |   17 ++++++++---------
>  1 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c
> b/drivers/net/ethernet/mellanox/mlx4/main.c index 60c9f4f..377a5ea
> 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -1852,8 +1852,16 @@ static void mlx4_enable_msi_x(struct mlx4_dev
> *dev) int i;
>  
>  	if (msi_x) {
> +		err = pci_msix_table_size(dev->pdev);
> +		if (err < 0)
> +			goto no_msi;
> +
> +		/* Try if at least 2 vectors are available */
>  		nreq = min_t(int, dev->caps.num_eqs -
> dev->caps.reserved_eqs, nreq);
> +		nreq = min_t(int, nreq, err);
> +		if (nreq < 2)
> +			goto no_msi;
>  
>  		entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
>  		if (!entries)
> @@ -1862,17 +1870,8 @@ static void mlx4_enable_msi_x(struct mlx4_dev
> *dev) for (i = 0; i < nreq; ++i)
>  			entries[i].entry = i;
>  
> -	retry:
>  		err = pci_enable_msix(dev->pdev, entries, nreq);
>  		if (err) {
> -			/* Try again if at least 2 vectors are
> available */
> -			if (err > 1) {
> -				mlx4_info(dev, "Requested %d
> vectors, "
> -					  "but only %d MSI-X vectors
> available, "
> -					  "trying again\n", nreq,
> err);
> -				nreq = err;
> -				goto retry;
> -			}
>  			kfree(entries);
>  			goto no_msi;
>  		}


^ permalink raw reply

* [ethtool] ethtool: ixgbe DCB registers dump for 82599 and x540
From: Jeff Kirsher @ 2013-10-03  8:11 UTC (permalink / raw)
  To: bhutchings
  Cc: Leonardo Potenza, netdev, gospo, sassmann, Maryam Tahhan,
	Jeff Kirsher

From: Leonardo Potenza <leonardo.potenza@intel.com>

Added support for DCB registers dump using ethtool -d option both for
82599 and x540 ethernet controllers

Signed-off-by: Leonardo Potenza <leonardo.potenza@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Jack Morgan <jack.morgan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 ixgbe.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 125 insertions(+), 29 deletions(-)

diff --git a/ixgbe.c b/ixgbe.c
index 854e463..6a663e4 100644
--- a/ixgbe.c
+++ b/ixgbe.c
@@ -26,6 +26,8 @@
 #define IXGBE_HLREG0_LPBK          0x00008000
 #define IXGBE_RMCS_TFCE_802_3X     0x00000008
 #define IXGBE_RMCS_TFCE_PRIORITY   0x00000010
+#define IXGBE_FCCFG_TFCE_802_3X    0x00000008
+#define IXGBE_FCCFG_TFCE_PRIORITY  0x00000010
 #define IXGBE_MFLCN_PMCF           0x00000001 /* Pass MAC Control Frames */
 #define IXGBE_MFLCN_DPF            0x00000002 /* Discard Pause Frame */
 #define IXGBE_MFLCN_RPFCE          0x00000004 /* Receive Priority FC Enable */
@@ -127,6 +129,7 @@ int
 ixgbe_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 {
 	u32 *regs_buff = (u32 *)regs->data;
+	u32 regs_buff_len = regs->len / sizeof(*regs_buff);
 	u32 reg;
 	u16 hw_device_id = (u16) regs->version;
 	u8 version = (u8)(regs->version >> 24);
@@ -208,13 +211,23 @@ ixgbe_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 	(reg & IXGBE_SRRCTL_BSIZEPKT_MASK) <= 0x10 ? (reg & IXGBE_SRRCTL_BSIZEPKT_MASK) : 0x10);
 
 	reg = regs_buff[829];
-	fprintf(stdout,
-	"0x03D00: RMCS (Receive Music Control register)        0x%08X\n"
-	"       Transmit Flow Control:                         %s\n"
-	"       Priority Flow Control:                         %s\n",
-	reg,
-	reg & IXGBE_RMCS_TFCE_802_3X     ? "enabled"  : "disabled",
-	reg & IXGBE_RMCS_TFCE_PRIORITY   ? "enabled"  : "disabled");
+	if (mac_type == ixgbe_mac_82598EB) {
+		fprintf(stdout,
+		"0x03D00: RMCS (Receive Music Control register)        0x%08X\n"
+		"       Transmit Flow Control:                         %s\n"
+		"       Priority Flow Control:                         %s\n",
+		reg,
+		reg & IXGBE_RMCS_TFCE_802_3X     ? "enabled"  : "disabled",
+		reg & IXGBE_RMCS_TFCE_PRIORITY   ? "enabled"  : "disabled");
+	} else if (mac_type >= ixgbe_mac_82599EB) {
+		fprintf(stdout,
+		"0x03D00: FCCFG (Flow Control Configuration)           0x%08X\n"
+		"       Transmit Flow Control:                         %s\n"
+		"       Priority Flow Control:                         %s\n",
+		reg,
+		reg & IXGBE_FCCFG_TFCE_802_3X     ? "enabled"  : "disabled",
+		reg & IXGBE_FCCFG_TFCE_PRIORITY   ? "enabled"  : "disabled");
+	}
 
 	reg = regs_buff[1047];
 	fprintf(stdout,
@@ -428,7 +441,7 @@ ixgbe_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 		"0x02F00: RDRXCTL     (Receive DMA Control)            0x%08X\n",
 		regs_buff[469]);
 
-	for (i = 0; i < 8; i++ )
+	for (i = 0; i < 8; i++)
 		fprintf(stdout,
 		"0x%05X: RXPBSIZE%d   (Receive Packet Buffer Size %d)   0x%08X\n",
 		0x3C00 + (4 * i), i, i, regs_buff[470 + i]);
@@ -592,50 +605,133 @@ ixgbe_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 		"0x09000: FHFT        (Flexible Host Filter Table)     0x%08X\n",
 		regs_buff[828]);
 
-	/* DCE */
-	fprintf(stdout,
+	/* DCB */
+	if (mac_type == ixgbe_mac_82598EB) {
+		fprintf(stdout,
 		"0x07F40: DPMCS       (Desc. Plan Music Ctrl Status)   0x%08X\n",
 		regs_buff[830]);
 
-	fprintf(stdout,
+		fprintf(stdout,
 		"0x0CD00: PDPMCS      (Pkt Data Plan Music ctrl Stat)  0x%08X\n",
 		regs_buff[831]);
 
-	if (mac_type == ixgbe_mac_82598EB) {
 		fprintf(stdout,
-			"0x050A0: RUPPBMR     (Rx User Prior to Pkt Buff Map)  0x%08X\n",
-			regs_buff[832]);
+		"0x050A0: RUPPBMR     (Rx User Prior to Pkt Buff Map)  0x%08X\n",
+		regs_buff[832]);
 
 		for (i = 0; i < 8; i++)
 			fprintf(stdout,
-				"0x%05X: RT2CR%d      (Receive T2 Configure %d)         0x%08X\n",
-				0x03C20 + (4 * i), i, i, regs_buff[833 + i]);
+			"0x%05X: RT2CR%d      (Receive T2 Configure %d)         0x%08X\n",
+			0x03C20 + (4 * i), i, i, regs_buff[833 + i]);
 
 		for (i = 0; i < 8; i++)
 			fprintf(stdout,
-				"0x%05X: RT2SR%d      (Recieve T2 Status %d)            0x%08X\n",
-				0x03C40 + (4 * i), i, i, regs_buff[841 + i]);
+			"0x%05X: RT2SR%d      (Receive T2 Status %d)            0x%08X\n",
+			0x03C40 + (4 * i), i, i, regs_buff[841 + i]);
 
 		for (i = 0; i < 8; i++)
 			fprintf(stdout,
-				"0x%05X: TDTQ2TCCR%d  (Tx Desc TQ2 TC Config %d)        0x%08X\n",
-				0x0602C + (0x40 * i), i, i, regs_buff[849 + i]);
+			"0x%05X: TDTQ2TCCR%d  (Tx Desc TQ2 TC Config %d)        0x%08X\n",
+			0x0602C + (0x40 * i), i, i, regs_buff[849 + i]);
 
 		for (i = 0; i < 8; i++)
 			fprintf(stdout,
-				"0x%05X: TDTQ2TCSR%d  (Tx Desc TQ2 TC Status %d)        0x%08X\n",
-				0x0622C + (0x40 * i), i, i, regs_buff[857 + i]);
-	}
+			"0x%05X: TDTQ2TCSR%d  (Tx Desc TQ2 TC Status %d)        0x%08X\n",
+			0x0622C + (0x40 * i), i, i, regs_buff[857 + i]);
 
-	for (i = 0; i < 8; i++)
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: TDPT2TCCR%d  (Tx Data Plane T2 TC Config %d)   0x%08X\n",
+			0x0CD20 + (4 * i), i, i, regs_buff[865 + i]);
+
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: TDPT2TCSR%d  (Tx Data Plane T2 TC Status %d)   0x%08X\n",
+			0x0CD40 + (4 * i), i, i, regs_buff[873 + i]);
+	} else if (mac_type >= ixgbe_mac_82599EB) {
 		fprintf(stdout,
-		"0x%05X: TDPT2TCCR%d  (Tx Data Plane T2 TC Config %d)   0x%08X\n",
-		0x0CD20 + (4 * i), i, i, regs_buff[865 + i]);
+			"0x04900: RTTDCS      (Tx Descr Plane Ctrl&Status)     0x%08X\n",
+			regs_buff[830]);
+
+		fprintf(stdout,
+			"0x0CD00: RTTPCS      (Tx Pkt Plane Ctrl&Status)       0x%08X\n",
+			regs_buff[831]);
 
-	for (i = 0; i < 8; i++)
 		fprintf(stdout,
-		"0x%05X: TDPT2TCSR%d  (Tx Data Plane T2 TC Status %d)   0x%08X\n",
-		0x0CD40 + (4 * i), i, i, regs_buff[873 + i]);
+			"0x02430: RTRPCS      (Rx Packet Plane Ctrl&Status)    0x%08X\n",
+			regs_buff[832]);
+
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: RTRPT4C%d    (Rx Packet Plane T4 Config %d)    0x%08X\n",
+			0x02140 + (4 * i), i, i, regs_buff[833 + i]);
+
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: RTRPT4S%d    (Rx Packet Plane T4 Status %d)    0x%08X\n",
+			0x02160 + (4 * i), i, i, regs_buff[841 + i]);
+
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: RTTDT2C%d    (Tx Descr Plane T2 Config %d)     0x%08X\n",
+			0x04910 + (4 * i), i, i, regs_buff[849 + i]);
+
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: RTTDT2S%d    (Tx Descr Plane T2 Status %d)     0x%08X\n",
+			0x04930 + (4 * i), i, i, regs_buff[857 + i]);
+
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: RTTPT2C%d    (Tx Packet Plane T2 Config %d)    0x%08X\n",
+			0x0CD20 + (4 * i), i, i, regs_buff[865]);
+
+		for (i = 0; i < 8; i++)
+			fprintf(stdout,
+			"0x%05X: RTTPT2S%d    (Tx Packet Plane T2 Status %d)    0x%08X\n",
+			0x0CD40 + (4 * i), i, i, regs_buff[873 + i]);
+
+		if (regs_buff_len > 1129) {
+			fprintf(stdout,
+			"0x03020: RTRUP2TC    (Rx User Prio to Traffic Classes)0x%08X\n",
+			regs_buff[1129]);
+
+			fprintf(stdout,
+			"0x0C800: RTTUP2TC    (Tx User Prio to Traffic Classes)0x%08X\n",
+			regs_buff[1130]);
+
+			for (i = 0; i < 4; i++)
+				fprintf(stdout,
+				"0x%05X: TXLLQ%d      (Strict Low Lat Tx Queues %d)     0x%08X\n",
+				0x082E0 + (4 * i), i, i, regs_buff[1131 + i]);
+
+			if (mac_type == ixgbe_mac_82599EB) {
+				fprintf(stdout,
+				"0x04980: RTTBCNRM    (DCB TX Rate Sched MMW)          0x%08X\n",
+				regs_buff[1135]);
+
+				fprintf(stdout,
+				"0x0498C: RTTBCNRD    (DCB TX Rate-Scheduler Drift)    0x%08X\n",
+				regs_buff[1136]);
+			} else if (mac_type == ixgbe_mac_X540) {
+				fprintf(stdout,
+				"0x04980: RTTQCNRM    (DCB TX QCN Rate Sched MMW)      0x%08X\n",
+				regs_buff[1135]);
+
+				fprintf(stdout,
+				"0x0498C: RTTQCNRR    (DCB TX QCN Rate Reset)          0x%08X\n",
+				regs_buff[1136]);
+
+				fprintf(stdout,
+				"0x08B00: RTTQCNCR    (DCB TX QCN Control)             0x%08X\n",
+				regs_buff[1137]);
+
+				fprintf(stdout,
+				"0x04A90: RTTQCNTG    (DCB TX QCN Tagging)             0x%08X\n",
+				regs_buff[1138]);
+			}
+		}
+	}
 
 	/* Statistics */
 	fprintf(stdout,
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net 1/2] sit: allow to use rtnl ops on fb tunnel
From: Nicolas Dichtel @ 2013-10-03  8:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, steffen.klassert, pshelar
In-Reply-To: <20131002.170822.1627691739724018976.davem@davemloft.net>

Le 02/10/2013 23:08, David Miller a écrit :
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Wed, 02 Oct 2013 09:36:02 +0200
>
>> Le 02/10/2013 09:15, Nicolas Dichtel a écrit :
>>> Le 01/10/2013 18:59, David Miller a écrit :
>>>> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>>> Date: Tue,  1 Oct 2013 18:04:59 +0200
>>>>
>>>>> rtnl ops where introduced by ba3e3f50a0e5 ("sit: advertise tunnel
>>>>> param via
>>>>> rtnl"), but I forget to assign rtnl ops to fb tunnels.
>>>>>
>>>>> Now that it is done, we must remove the explicit call to
>>>>> unregister_netdevice_queue(), because the fallback tunnel is added to
>>>>> the queue
>>>>> in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices
>>>>> (this
>>>>> is valid since commit 5e6700b3bf98 ("sit: add support of x-netns")).
>>>>>
>>>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>>>
>>>> Applied and queued up for -stable.
>> Another things about ipip: between 0974658da47c ("ipip: advertise
>> tunnel param
>> via rtnl", v3.8) and fd58156e456d ("IPIP: Use ip-tunneling code.",
>> v3.10) the
>> fb device of ipip module has the same problem.
>> Should I send a patch?
>
> Yes please do, thanks for noticing this.
>
In fact, I just notice that 3.9 branch is EoL (bug is only in 3.8 and 3.9).
Should I still send a patch ? If yes, based on which tree/branch?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox