Netdev List


* Re: [PATCH net-next] tcp: shrink tcp6_timewait_sock by one cache line
From: David Miller @ 2013-10-03 19:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1380711604.19002.78.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 02 Oct 2013 04:00:04 -0700

> +	tmo = tw->tw_ttd - (u32)jiffies;
 ...
> +		tw->tw_ttd = (u32)(jiffies + timeo);
 ...
> +		tw->tw_ttd = (u32)(jiffies + (slot << INET_TWDR_RECYCLE_TICK));
 ...
> +	s32 delta = tw->tw_ttd - (u32)jiffies;
 ...
> +	s32 delta = tw->tw_ttd - (u32)jiffies;

Eric, just use tcp_time_stamp in all of these locations, then you can
lose the casts and still achieve your stated objective.
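
For context, the wraparound-safe arithmetic that the quoted hunks rely on can be sketched in plain user-space C. This is only an illustration under assumptions: `deadline_after()` and `ticks_remaining()` are hypothetical helper names, `uint32_t` stands in for the kernel's `u32`, and the `now` argument stands in for whatever 32-bit tick counter is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute a 32-bit deadline `timeout` ticks after `now`.
 * Unsigned addition wraps modulo 2^32, which is the intended behaviour. */
static inline uint32_t deadline_after(uint32_t now, uint32_t timeout)
{
	/* The signed delta below is only meaningful for spans < 2^31 ticks. */
	assert(timeout <= (uint32_t)INT32_MAX);
	return now + timeout;
}

/* Hypothetical helper: ticks left until `deadline`; negative once expired.
 * The unsigned subtraction wraps, and the cast reinterprets it as signed. */
static inline int32_t ticks_remaining(uint32_t deadline, uint32_t now)
{
	return (int32_t)(deadline - now);
}
```

Because both operands are already 32 bits wide, no casts are needed at the call sites, and the result stays correct when the counter wraps between arming the timer and checking it.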

Thanks.

^ permalink raw reply

* Re: [PATCH 1/1] hso: fix problem with wrong status code sent by OPTION GTM601 during RING indication
From: David Miller @ 2013-10-03 19:29 UTC (permalink / raw)
  To: hns; +Cc: j.dumon, marek.belisko, linux-usb, netdev, linux-kernel
In-Reply-To: <B1808224-5850-41E3-9A8F-0F350F84FF89@goldelico.com>

From: "Dr. H. Nikolaus Schaller" <hns@goldelico.com>
Date: Wed, 2 Oct 2013 09:00:18 +0200

> From f5c7e15b61f2ce4fe3105ff914f6bfaf5d74af0d Mon Sep 17 00:00:00 2001
> From: "H. Nikolaus Schaller" <hns@goldelico.com>
> Date: Thu, 15 Nov 2012 14:40:57 +0100
> Subject: [PATCH 1/1] hso: fix problem with wrong status code sent by OPTION
>  GTM601 during RING indication
> 
>  It has been observed that the GTM601 with 1.7 firmware sometimes sends a value
>  wIndex that has bit 0x04 set instead of being reset as the code expects. So we
>  mask it for the error check.
>  
>  See http://lists.goldelico.com/pipermail/gta04-owner/2012-February/001643.html
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.de>

I think we should look more deeply into what this bit might mean
and why the firmware might be setting it before we even consider
applying a patch like this one.

^ permalink raw reply

* [PATCH] atl1e: enable support for NETIF_F_RXALL and NETIF_F_RXCRC features
From: Andrea Merello @ 2013-10-03 19:18 UTC (permalink / raw)
  To: jie.yang, xiong.huang; +Cc: davem, netdev, linux-kernel, Andrea Merello

This patch allows (optionally, via ethtool) the atl1e NIC to:
- Receive bad frames (runt, bad-fcs, etc.)
- Receive full frames without stripping the FCS.

This has been tested on my board by injecting runt and bad-fcs
frames with an FPGA-based device.

The particular scenario of receiving very short frames (<4 bytes)
without passing the FCS to the upper layer has also been tested.
This could potentially be dangerous because the driver subtracts
4 bytes from the frame length, but in the end I have NOT added
anything to guard against it, because the NIC appears to always
discard frames that short.
If anyone still has reason to worry about this, please tell me
and I will add an explicit SW check.
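
For reference, the kind of explicit SW check mentioned above could look roughly like the following user-space sketch. It is an assumption-laden illustration, not code from the atl1e driver: `rx_payload_len()` and `FCS_LEN` are hypothetical names, and the point is only that subtracting 4 from an unsigned length below 4 wraps to a huge value unless it is guarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FCS_LEN 4u /* Ethernet frame check sequence, in bytes */

/* Hypothetical helper, not part of the atl1e driver: compute the number of
 * bytes to hand to the stack. Returns false if the frame is too short to
 * contain an FCS, so the caller can drop it instead of underflowing. */
static bool rx_payload_len(uint32_t hw_len, bool keep_fcs, uint32_t *out)
{
	assert(out != NULL);

	if (keep_fcs) {
		/* RXFCS-style mode: pass the frame up untouched. */
		*out = hw_len;
		return true;
	}
	if (hw_len < FCS_LEN)
		return false;	/* hw_len - 4 would wrap around */
	*out = hw_len - FCS_LEN;
	return true;
}
```

A caller would skip the packet when the helper returns false, which costs one comparison per frame on the path where the FCS is stripped.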

Signed-off-by: Andrea Merello <andrea.merello@gmail.com>
---
 drivers/net/ethernet/atheros/atl1e/atl1e_main.c | 46 ++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
index 1966444..7a73f3a 100644
--- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
+++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
@@ -313,6 +313,34 @@ static void atl1e_set_multi(struct net_device *netdev)
 	}
 }
 
+static void __atl1e_rx_mode(netdev_features_t features, u32 *mac_ctrl_data)
+{
+
+	if (features & NETIF_F_RXALL) {
+		/* enable RX of ALL frames */
+		*mac_ctrl_data |= MAC_CTRL_DBG;
+	} else {
+		/* disable RX of ALL frames */
+		*mac_ctrl_data &= ~MAC_CTRL_DBG;
+	}
+}
+
+static void atl1e_rx_mode(struct net_device *netdev,
+	netdev_features_t features)
+{
+	struct atl1e_adapter *adapter = netdev_priv(netdev);
+	u32 mac_ctrl_data = 0;
+
+	netdev_dbg(adapter->netdev, "%s\n", __func__);
+
+	atl1e_irq_disable(adapter);
+	mac_ctrl_data = AT_READ_REG(&adapter->hw, REG_MAC_CTRL);
+	__atl1e_rx_mode(features, &mac_ctrl_data);
+	AT_WRITE_REG(&adapter->hw, REG_MAC_CTRL, mac_ctrl_data);
+	atl1e_irq_enable(adapter);
+}
+
+
 static void __atl1e_vlan_mode(netdev_features_t features, u32 *mac_ctrl_data)
 {
 	if (features & NETIF_F_HW_VLAN_CTAG_RX) {
@@ -394,6 +422,10 @@ static int atl1e_set_features(struct net_device *netdev,
 	if (changed & NETIF_F_HW_VLAN_CTAG_RX)
 		atl1e_vlan_mode(netdev, features);
 
+	if (changed & NETIF_F_RXALL)
+		atl1e_rx_mode(netdev, features);
+
+
 	return 0;
 }
 
@@ -1057,7 +1089,8 @@ static void atl1e_setup_mac_ctrl(struct atl1e_adapter *adapter)
 		value |= MAC_CTRL_PROMIS_EN;
 	if (netdev->flags & IFF_ALLMULTI)
 		value |= MAC_CTRL_MC_ALL_EN;
-
+	if (netdev->features & NETIF_F_RXALL)
+		value |= MAC_CTRL_DBG;
 	AT_WRITE_REG(hw, REG_MAC_CTRL, value);
 }
 
@@ -1405,7 +1438,8 @@ static void atl1e_clean_rx_irq(struct atl1e_adapter *adapter, u8 que,
 			rx_page_desc[que].rx_nxseq++;
 
 			/* error packet */
-			if (prrs->pkt_flag & RRS_IS_ERR_FRAME) {
+			if ((prrs->pkt_flag & RRS_IS_ERR_FRAME) &&
+			    !(netdev->features & NETIF_F_RXALL)) {
 				if (prrs->err_flag & (RRS_ERR_BAD_CRC |
 					RRS_ERR_DRIBBLE | RRS_ERR_CODE |
 					RRS_ERR_TRUNC)) {
@@ -1418,7 +1452,10 @@ static void atl1e_clean_rx_irq(struct atl1e_adapter *adapter, u8 que,
 			}
 
 			packet_size = ((prrs->word1 >> RRS_PKT_SIZE_SHIFT) &
-					RRS_PKT_SIZE_MASK) - 4; /* CRC */
+					RRS_PKT_SIZE_MASK);
+			if (likely(!(netdev->features & NETIF_F_RXFCS)))
+				packet_size -= 4; /* CRC */
+
 			skb = netdev_alloc_skb_ip_align(netdev, packet_size);
 			if (skb == NULL)
 				goto skip_pkt;
@@ -2245,7 +2282,8 @@ static int atl1e_init_netdev(struct net_device *netdev, struct pci_dev *pdev)
 			      NETIF_F_HW_VLAN_CTAG_RX;
 	netdev->features = netdev->hw_features | NETIF_F_LLTX |
 			   NETIF_F_HW_VLAN_CTAG_TX;
-
+	/* not enabled by default */
+	netdev->hw_features |= NETIF_F_RXALL | NETIF_F_RXFCS;
 	return 0;
 }
 
-- 
1.8.1.2

^ permalink raw reply related

* Re: Ideas on why using WPA2 encryption speeds up many TCP connections?
From: Ben Greear @ 2013-10-03 19:17 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, linux-wireless@vger.kernel.org
In-Reply-To: <524DBC93.1070400@hp.com>

On 10/03/2013 11:50 AM, Rick Jones wrote:
> On 10/03/2013 11:27 AM, Ben Greear wrote:
>> I'm seeing something a bit strange and wondering if anyone had an
>> opinion on why...
>>
>> I am testing up to 200 wifi station systems, each with a TCP connection
>> running on them (download only, from VAP to stations).
>>
>> Without encryption (ie, open network), I see total throughput go from
>> about 108Mbps down to 69Mbps as I add more stations (I add 25 at a time,
>> so the 108Mbps is with 25 active, and 69Mbps is with 200 active).
>>
>> However, if I enable encryption, the throughput is actually higher
>> (111Mbps to 71Mbps).  I'm doing encryption in software, so it adds a fair
>> bit of CPU load in this test.  The numbers bounce around since this is
>> wifi after all, but in general encryption tends to win reliably in this
>> test.
>>
>> When testing with a single station (and 5 tcp streams with jacked up
>> snd/rcv buffers) the open networks perform significantly better at total throughput:
>> 263Mbps vs 246Mbps.
>>
>> Maybe the extra delay for decryption increases odds that GRO will take
>> effect for the many, slower streams (and maybe that will decrease ACK
>> traffic?)
>>
>> Any other ideas?
>
> Fewer times two or more stations step on one another?  The receivers will only try to transmit when they receive data.  Modulo timing, if the individual
> downloads are a bit slower, less chance of the receivers looking to send ACKs back through at the same time?  Got any low-level stats for the health and well
> being of the wireless network?

The tcp connection stats are taken after running for 60 seconds, and I take 3-sec running averages
as well as 60 second averages.  So, I think that it would have to be total decrease in ACKs,
not just timing, to make a difference.  The 3 and 60 second stats show consistently higher throughput
with encryption when using 25+ stations/connections.

Also, it works out that the sending sockets all sort of send randomly as they
are able, so I don't think there would be any particular ACK flood seen.

I have great quantities of low level stats, but I have not dug into them in detail
just yet.  In general, my RF environment in this test is fairly controlled, as
I am cabling the systems using good semi-rigid SMA cables and an RF attenuator.
There will be some external interference of course, as they are not in an
isolation chamber.

As for the difference between 1 station and 25+, it is very likely related to
low level things like MPDU working better with a single station, and probably
better ACK avoidance (I recall about 20kpps download, 4kpps upload in a previous
test with a single station, which indicates to me we must not be acking every
packet-on-the-air..somehow).

(For grins, I played with the delayed-ack-segs from an out-of-tree patch and
can get TCP throughput up to 300Mbps by setting delayed ack segs to 64 in
single station/5 stream, open network test).

Thanks,
Ben

>
> rick jones

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [stable 3.0] add some CVE fixes
From: Jiri Slaby @ 2013-10-03 19:11 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, netdev
In-Reply-To: <20131003184411.GG8901@kroah.com>

On 10/03/2013 08:44 PM, Greg KH wrote:
> On Thu, Oct 03, 2013 at 11:20:28AM +0200, Jiri Slaby wrote:
>> Plus the backports that are replied to this mail?
> 
> I don't see any backports, did you forget to send them?

I don't think so, they were sent and this is a log of one of them:
OK. Log says:
Sendmail: /usr/sbin/sendmail -f jslaby@suse.cz -i stable@vger.kernel.org
jslaby@suse.cz
From: Jiri Slaby <jslaby@suse.cz>
To: <stable@vger.kernel.org>
Cc: jslaby@suse.cz
Subject: [PATCH 4/4] Tools: hv: verify origin of netlink connector message
Date: Thu,  3 Oct 2013 11:23:50 +0200
Message-Id: <1380792230-27255-4-git-send-email-jslaby@suse.cz>
X-Mailer: git-send-email 1.8.4
In-Reply-To: <1380792230-27255-1-git-send-email-jslaby@suse.cz>
References: <524D36DC.5070506@suse.cz>
 <1380792230-27255-1-git-send-email-jslaby@suse.cz>

Result: OK

Could you check your spam folder? Or I can bounce them directly to you?

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: [PATCH net-next] dev: add support of flag IFF_NOPROC
From: David Miller @ 2013-10-03 19:09 UTC (permalink / raw)
  To: stephen; +Cc: nicolas.dichtel, netdev
In-Reply-To: <20131003104627.411f5cc4@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Thu, 3 Oct 2013 10:46:27 -0700

> What about speeding up proc or sysfs? Or providing a bulk create/destroy.

+1 +1 +1

This will benefit more people than just the envisioned users for
this IFF_NOPROC thing.

I really don't want to take the IFF_NOPROC approach.

^ permalink raw reply

* Re: Ideas on why using WPA2 encryption speeds up many TCP connections?
From: Rick Jones @ 2013-10-03 18:50 UTC (permalink / raw)
  To: Ben Greear, netdev,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <524DB6F6.6020405-my8/4N5VtI7c+919tysfdA@public.gmane.org>

On 10/03/2013 11:27 AM, Ben Greear wrote:
> I'm seeing something a bit strange and wondering if anyone had an
> opinion on why...
>
> I am testing up to 200 wifi station systems, each with a TCP connection
> running on them (download only, from VAP to stations).
>
> Without encryption (ie, open network), I see total throughput go from
> about 108Mbps down to 69Mbps as I add more stations (I add 25 at a time,
> so the 108Mbps is with 25 active, and 69Mbps is with 200 active).
>
> However, if I enable encryption, the throughput is actually higher
> (111Mbps to 71Mbps).  I'm doing encryption in software, so it adds a fair
> bit of CPU load in this test.  The numbers bounce around since this is
> wifi after all, but in general encryption tends to win reliably in this
> test.
>
> When testing with a single station (and 5 tcp streams with jacked up
> snd/rcv buffers) the open networks perform significantly better at total throughput:
> 263Mbps vs 246Mbps.
>
> Maybe the extra delay for decryption increases odds that GRO will take
> effect for the many, slower streams (and maybe that will decrease ACK
> traffic?)
>
> Any other ideas?

Fewer times two or more stations step on one another?  The receivers 
will only try to transmit when they receive data.  Modulo timing, if the 
individual downloads are a bit slower, less chance of the receivers 
looking to send ACKs back through at the same time?  Got any low-level 
stats for the health and well being of the wireless network?

rick jones


^ permalink raw reply

* Re: [stable 3.0] add some CVE fixes
From: Greg KH @ 2013-10-03 18:44 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: stable, netdev
In-Reply-To: <524D36DC.5070506@suse.cz>

On Thu, Oct 03, 2013 at 11:20:28AM +0200, Jiri Slaby wrote:
> Hi,
> 
> could you consider adding fixes for the following CVEs to 3.0 (and
> possibly later)?
> CVE-2013-4163: 75a493e60ac4bbe2e977e7129d6d8cbb0dd236be
> CVE-2013-2206: f2815633504b442ca0b0605c16bf3d88a3a0fcea

Network patches for stable releases need to be asked of the networking
maintainer, who then forwards them on to me.

> Plus the backports that are replied to this mail?

I don't see any backports, did you forget to send them?

thanks,

greg k-h

^ permalink raw reply

* Ideas on why using WPA2 encryption speeds up many TCP connections?
From: Ben Greear @ 2013-10-03 18:27 UTC (permalink / raw)
  To: netdev, linux-wireless@vger.kernel.org

I'm seeing something a bit strange and wondering if anyone had an opinion on why...

I am testing up to 200 wifi station systems, each with a TCP connection running
on them (download only, from VAP to stations).

Without encryption (ie, open network), I see total throughput go from
about 108Mbps down to 69Mbps as I add more stations (I add 25 at a time,
so the 108Mbps is with 25 active, and 69Mbps is with 200 active).

However, if I enable encryption, the throughput is actually higher
(111Mbps to 71Mbps).  I'm doing encryption in software, so it adds a fair
bit of CPU load in this test.  The numbers bounce around since this is
wifi after all, but in general encryption tends to win reliably in this
test.

When testing with a single station (and 5 tcp streams with jacked up snd/rcv buffers)
the open networks perform significantly better at total throughput:  263Mbps vs 246Mbps.

Maybe the extra delay for decryption increases odds that GRO will take
effect for the many, slower streams (and maybe that will decrease ACK traffic?)

Any other ideas?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [net-next 2/3] udp: Add udp early demux
From: Eric Dumazet @ 2013-10-03 18:06 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: David Miller, tomk, netdev
In-Reply-To: <20131003173946.GA5684@sbohrermbp13-local.rgmadvisors.com>

On Thu, 2013-10-03 at 12:39 -0500, Shawn Bohrer wrote:
> On Wed, Oct 02, 2013 at 02:38:52PM -0700, Eric Dumazet wrote:
> > I suggested that for unicast, you do a limited lookup to the first
> > socket found in bucket.
> > 
> > If it's an exact match, you take the socket.
> > 
> > If not, you give up, and do not scan the whole chain.
> 
> So something like the following?
> 
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 02185a5..d202e5b 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1849,7 +1849,42 @@ begin:
>  	}
>  	rcu_read_unlock();
>  	return result;
> +}
>  
> +/* For unicast we should only early demux connected sockets or we can
> + * break forwarding setups.  The chains here can be long so only check
> + * if the first socket is an exact match and if not move on.
> + */
> +static struct sock *__udp4_lib_demux_lookup(struct net *net,
> +					    __be16 loc_port, __be32 loc_addr,
> +					    __be16 rmt_port, __be32 rmt_addr,
> +					    int dif)
> +{
> +	struct sock *sk, *result;
> +	struct hlist_nulls_node *node;
> +	unsigned short hnum = ntohs(loc_port);
> +	unsigned int slot = udp_hashfn(net, hnum, udp_table.mask);
> +	struct udp_hslot *hslot = &udp_table.hash[slot];
> +	const int exact_match = 18;
> +	int score;
> +
> +	rcu_read_lock();
> +	result = NULL;
> +	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
> +		score = compute_score(sk, net, rmt_addr, hnum, rmt_port,
> +				      loc_addr, loc_port, dif);
> +		if (score == exact_match)
> +			result = sk;
> +		/* Only check first socket in chain */
> +		break;
> +	}
> +
> +	if (result) {
> +		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
> +			result = NULL;
> +	}
> +	rcu_read_unlock();
> +	return result;
>  }
>  

Just do the tuple comparison instead of compute_score(),
since you know we want full L4 match.

The standard way is to use the INET_MATCH() macro
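
As a rough user-space illustration of the difference, a full 4-tuple comparison is a handful of equality tests rather than a scoring pass. The struct and function names below are hypothetical; this is not the kernel's INET_MATCH() macro, which operates on socket fields and, as far as I recall, also checks the network namespace and bound interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the addressing fields of a connected UDP socket.
 * Values are kept in network byte order, as they appear in the headers. */
struct udp4_tuple {
	uint32_t loc_addr;
	uint32_t rmt_addr;
	uint16_t loc_port;
	uint16_t rmt_port;
};

/* Exact match: every field must be equal; no wildcards, no scoring. */
static bool udp4_tuple_match(const struct udp4_tuple *sk,
			     const struct udp4_tuple *pkt)
{
	assert(sk != NULL && pkt != NULL);

	return sk->loc_addr == pkt->loc_addr &&
	       sk->rmt_addr == pkt->rmt_addr &&
	       sk->loc_port == pkt->loc_port &&
	       sk->rmt_port == pkt->rmt_port;
}
```

A socket with a wildcard (zero) remote address never matches here, which is the desired behaviour when only connected sockets should be picked up by early demux.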

^ permalink raw reply

* Re: [PATCH net-next] dev: add support of flag IFF_NOPROC
From: Stephen Hemminger @ 2013-10-03 17:46 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem
In-Reply-To: <1380806905-4461-1-git-send-email-nicolas.dichtel@6wind.com>

On Thu,  3 Oct 2013 15:28:25 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> This flag allows netdevices to be created without creating directories in
> /proc, i.e. no /proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and no
> /proc/net/dev_snmp6/<dev>.
> 
> When a system creates a lot of virtual netdevices, this speeds up
> creation time. For systems which continuously create and destroy virtual
> netdevices, proc entries for these netdevices may not be used, hence adding this
> flag is interesting.
> 
> Note that the flag should be specified at the creation time (before calling
> register_netdevice()) and cannot be removed during the life of the netdevice.
> 
> Here are some numbers:
> 
> dummy20000.batch contains 20 000 times 'link add type dummy' and
> dummy20000-noproc.batch 20 000 times 'link add noproc type dummy'.
> 
> time ip -b dummy20000.batch
> real    0m56.367s
> user    0m0.200s
> sys     0m53.070s
> 
> time ip -b dummy20000-noproc.batch
> real    0m42.417s
> user    0m0.310s
> sys     0m38.470s
> 
> Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Seems like a special case. The problem is that you just created devices
that are unmanageable and might well break other tools in the system.
What about speeding up proc or sysfs? Or providing a bulk create/destroy.
Also if you used a custom program it could have separate netlink send
and receive threads to pipeline the creation.

^ permalink raw reply

* Re: [PATCH RFC 59/77] qla2xxx: Update MSI/MSI-X interrupts enablement code
From: Saurav Kashyap @ 2013-10-03 17:42 UTC (permalink / raw)
  To: Alexander Gordeev, linux-kernel
  Cc: Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips@linux-mips.org,
	linuxppc-dev@lists.ozlabs.org, linux390@de.ibm.com,
	linux-s390@vger.kernel.org, x86@kernel.org,
	linux-ide@vger.kernel.org
In-Reply-To: <54f6b89372f51cd27a6adf6ecc91b8bf6bb5ba74.1380703263.git.agordeev@redhat.com>

Acked-by: Saurav Kashyap <saurav.kashyap@qlogic.com>


>As a result of the recent redesign of the MSI/MSI-X interrupts enabling
>pattern this driver has to be updated to use the new technique to
>obtain an optimal number of MSI/MSI-X interrupts required.
>
>Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
>---
> drivers/scsi/qla2xxx/qla_isr.c |   18 +++++++++++-------
> 1 files changed, 11 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/scsi/qla2xxx/qla_isr.c
>b/drivers/scsi/qla2xxx/qla_isr.c
>index df1b30b..6c11ab9 100644
>--- a/drivers/scsi/qla2xxx/qla_isr.c
>+++ b/drivers/scsi/qla2xxx/qla_isr.c
>@@ -2836,16 +2836,20 @@ qla24xx_enable_msix(struct qla_hw_data *ha,
>struct rsp_que *rsp)
> 	for (i = 0; i < ha->msix_count; i++)
> 		entries[i].entry = i;
> 
>-	ret = pci_enable_msix(ha->pdev, entries, ha->msix_count);
>-	if (ret) {
>+	ret = pci_msix_table_size(ha->pdev);
>+	if (ret < 0) {
>+		goto msix_failed;
>+	} else {
> 		if (ret < MIN_MSIX_COUNT)
> 			goto msix_failed;
> 
>-		ql_log(ql_log_warn, vha, 0x00c6,
>-		    "MSI-X: Failed to enable support "
>-		    "-- %d/%d\n Retry with %d vectors.\n",
>-		    ha->msix_count, ret, ret);
>-		ha->msix_count = ret;
>+		if (ret < ha->msix_count) {
>+			ql_log(ql_log_warn, vha, 0x00c6,
>+			    "MSI-X: Failed to enable support "
>+			    "-- %d/%d\n Retry with %d vectors.\n",
>+			    ha->msix_count, ret, ret);
>+			ha->msix_count = ret;
>+		}
> 		ret = pci_enable_msix(ha->pdev, entries, ha->msix_count);
> 		if (ret) {
> msix_failed:
>-- 
>1.7.7.6
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Fw: [Bug 62491] New: alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
From: Stephen Hemminger @ 2013-10-03 17:40 UTC (permalink / raw)
  To: Jay Cliburn, Chris Snook, Johannes Berg; +Cc: netdev



Begin forwarded message:

Date: Thu, 3 Oct 2013 10:21:34 -0700
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 62491] New: alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff


https://bugzilla.kernel.org/show_bug.cgi?id=62491

            Bug ID: 62491
           Summary: alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
           Product: Networking
           Version: 2.5
    Kernel Version: 3.12-rc3
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: shemminger@linux-foundation.org
          Reporter: vaniaz@msn.com
        Regression: No

Hardware: Lenovo G780 using the alx network driver; software: openSUSE Tumbleweed
with kernel 3.12-rc3.
After entering sleep mode and waking up, the console at alt + f is flooded with
messages like:
[672.000000] alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Re: [net-next 2/3] udp: Add udp early demux
From: Shawn Bohrer @ 2013-10-03 17:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, tomk, netdev
In-Reply-To: <1380749932.19002.127.camel@edumazet-glaptop.roam.corp.google.com>

On Wed, Oct 02, 2013 at 02:38:52PM -0700, Eric Dumazet wrote:
> I suggested that for unicast, you do a limited lookup to the first
> socket found in bucket.
> 
> If it's an exact match, you take the socket.
> 
> If not, you give up, and do not scan the whole chain.

So something like the following?


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 02185a5..d202e5b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1849,7 +1849,42 @@ begin:
 	}
 	rcu_read_unlock();
 	return result;
+}
 
+/* For unicast we should only early demux connected sockets or we can
+ * break forwarding setups.  The chains here can be long so only check
+ * if the first socket is an exact match and if not move on.
+ */
+static struct sock *__udp4_lib_demux_lookup(struct net *net,
+					    __be16 loc_port, __be32 loc_addr,
+					    __be16 rmt_port, __be32 rmt_addr,
+					    int dif)
+{
+	struct sock *sk, *result;
+	struct hlist_nulls_node *node;
+	unsigned short hnum = ntohs(loc_port);
+	unsigned int slot = udp_hashfn(net, hnum, udp_table.mask);
+	struct udp_hslot *hslot = &udp_table.hash[slot];
+	const int exact_match = 18;
+	int score;
+
+	rcu_read_lock();
+	result = NULL;
+	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
+		score = compute_score(sk, net, rmt_addr, hnum, rmt_port,
+				      loc_addr, loc_port, dif);
+		if (score == exact_match)
+			result = sk;
+		/* Only check first socket in chain */
+		break;
+	}
+
+	if (result) {
+		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+			result = NULL;
+	}
+	rcu_read_unlock();
+	return result;
 }
 
 void udp_v4_early_demux(struct sk_buff *skb)
@@ -1870,8 +1905,8 @@ void udp_v4_early_demux(struct sk_buff *skb)
 		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
 						   uh->source, iph->saddr, dif);
 	else if (skb->pkt_type == PACKET_HOST)
-		sk = __udp4_lib_lookup(net, iph->saddr, uh->source,
-				       iph->daddr, uh->dest, dif, &udp_table);
+		sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
+					     uh->source, iph->saddr, dif);
 	else
 		return;
 


^ permalink raw reply related

* Re: [PATCH net 1/2] sit: allow to use rtnl ops on fb tunnel
From: David Miller @ 2013-10-03 17:37 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, steffen.klassert, pshelar
In-Reply-To: <524D2580.9040702@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 03 Oct 2013 10:06:24 +0200

> In fact, I just notice that 3.9 branch is EoL (bug is only in 3.8 and
> 3.9).
> Should I still send a patch ? If yes, based on which tree/branch?

No stable backports are needed then.

^ permalink raw reply

* Re: [PATCH RFC 51/77] mthca: Update MSI/MSI-X interrupts enablement code
From: Jack Morgenstein @ 2013-10-03 16:11 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers,
	"VMware, Inc." <pv-dr
In-Reply-To: <9d424912ef78993dc75e2af5006cd12913e9e7e7.1380703263.git.agordeev@redhat.com>

On Wed,  2 Oct 2013 12:49:07 +0200
Alexander Gordeev <agordeev@redhat.com> wrote:

> Subject: [PATCH RFC 51/77] mthca: Update MSI/MSI-X interrupts
> enablement code Date: Wed,  2 Oct 2013 12:49:07 +0200
> Sender: linux-rdma-owner@vger.kernel.org
> X-Mailer: git-send-email 1.7.7.6
> 
> As a result of the recent redesign of the MSI/MSI-X interrupts enabling
> pattern this driver has to be updated to use the new technique to
> obtain an optimal number of MSI/MSI-X interrupts required.
> 
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---

ACK.

-Jack

^ permalink raw reply

* tx checksum offload in rtl8168evl disabled in driver
From: jason.morgan @ 2013-10-03 14:27 UTC (permalink / raw)
  To: netdev

Hi,

I'm trying to get close to saturating a 1G Ethernet link.

I'm at 517Mbps and I've found that there seems to be a cpu bottleneck.

I'm using 2k to 4k frames with a rtl8168evl.

I notice from ethtool that tx-checksum is turned off and refuses to turn
on.

I've found this message
http://www.spinics.net/lists/netdev/msg216530.html

Which indicates the cause being the driver.

I've looked at the driver code rtl8169.c in kernel 3.8 and the line 

        [RTL_GIGA_MAC_VER_34] =
                _R("RTL8168evl/8111evl",RTL_TD_1, FIRMWARE_8168E_3,
                                                        JUMBO_9K, false),

indicates the reason for this.

However, the message thread above indicates that this is not a problem and
can be changed to make tx-checksum offload possible.

However, we are using a newer chip than the one in the message thread.  I've
tried to find other, more recent citations without success.

So, why is it still turned off?

What will be the effect of turning it on (changing false to true, in the 
driver line) for our chip?

Thanks in advance,
Jason

^ permalink raw reply

* [PATCHv2] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Oussama Ghorbel @ 2013-10-03 13:49 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Oussama Ghorbel

The (inner) MTU of an ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530

This patch checks the minimum MTU for an IPv6 tunnel according to these rules:
- If the tunnel is configured in ipip6 mode, the minimum MTU is 68.
- If the tunnel is configured in ip6ip6 or any other mode, the minimum MTU is 1280.

Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
---
 net/ipv6/ip6_tunnel.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 46ba243..4b51b03 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1429,9 +1429,17 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 static int
 ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
 {
-	if (new_mtu < IPV6_MIN_MTU) {
-		return -EINVAL;
+	struct ip6_tnl *tnl = netdev_priv(dev);
+
+	if (tnl->parms.proto == IPPROTO_IPIP) {
+		if (new_mtu < 68)
+			return -EINVAL;
+	} else {
+		if (new_mtu < IPV6_MIN_MTU)
+			return -EINVAL;
 	}
+	if (new_mtu > 0xFFF8 - dev->hard_header_len)
+		return -EINVAL;
 	dev->mtu = new_mtu;
 	return 0;
 }
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH iproute2 net-next-3.11] ip: add support of link flag IFF_NOPROC
From: Nicolas Dichtel @ 2013-10-03 13:30 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, davem, Nicolas Dichtel
In-Reply-To: <1380806905-4461-1-git-send-email-nicolas.dichtel@6wind.com>

When this flag is specified, /proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and
/proc/net/dev_snmp6/<dev> directories are not created.

This flag cannot be removed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/if.h    | 2 ++
 ip/ipaddress.c        | 1 +
 ip/iplink.c           | 3 +++
 man/man8/ip-link.8.in | 8 ++++++++
 4 files changed, 14 insertions(+)

diff --git a/include/linux/if.h b/include/linux/if.h
index 7f261c08e816..5b8a5ebff599 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -53,6 +53,8 @@
 
 #define IFF_ECHO	0x40000		/* echo sent packets		*/
 
+#define IFF_NOPROC	0x80000		/* no proc/sysctl directories	*/
+
 #define IFF_VOLATILE	(IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_ECHO|\
 		IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT)
 
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 1c3e4da0d0da..b2e35028c844 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -116,6 +116,7 @@ static void print_link_flags(FILE *fp, unsigned flags, unsigned mdown)
 	_PF(LOWER_UP);
 	_PF(DORMANT);
 	_PF(ECHO);
+	_PF(NOPROC);
 #undef _PF
 	if (flags)
 		fprintf(fp, "%x", flags);
diff --git a/ip/iplink.c b/ip/iplink.c
index ada9d4255ba2..253ed1cc3f6f 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -50,6 +50,7 @@ void iplink_usage(void)
 		fprintf(stderr, "                   [ mtu MTU ]\n");
 		fprintf(stderr, "                   [ numtxqueues QUEUE_COUNT ]\n");
 		fprintf(stderr, "                   [ numrxqueues QUEUE_COUNT ]\n");
+		fprintf(stderr, "                   [ noproc ]\n");
 		fprintf(stderr, "                   type TYPE [ ARGS ]\n");
 		fprintf(stderr, "       ip link delete DEV type TYPE [ ARGS ]\n");
 		fprintf(stderr, "\n");
@@ -480,6 +481,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 				invarg("Invalid \"numrxqueues\" value\n", *argv);
 			addattr_l(&req->n, sizeof(*req), IFLA_NUM_RX_QUEUES,
 				  &numrxqueues, 4);
+		} else if (matches(*argv, "noproc") == 0) {
+			req->i.ifi_flags |= IFF_NOPROC;
 		} else {
 			if (strcmp(*argv, "dev") == 0) {
 				NEXT_ARG();
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 76f92ddbd82c..b16d1a1f8a41 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -45,6 +45,8 @@ ip-link \- network device configuration
 .RB "[ " numrxqueues
 .IR QUEUE_COUNT " ]"
 .br
+.RB "[ " noproc " ]"
+.br
 .BR type " TYPE"
 .RI "[ " ARGS " ]"
 
@@ -197,6 +199,12 @@ specifies the number of transmit queues for new device.
 specifies the number of receive queues for new device.
 
 .TP
+.BI noproc
+specifies not to create interface-related directories under /proc
+(/proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and
+/proc/net/dev_snmp6/<dev>)
+
+.TP
 VXLAN Type Support
 For a link of type 
 .I VXLAN
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next] dev: add support of flag IFF_NOPROC
From: Nicolas Dichtel @ 2013-10-03 13:28 UTC (permalink / raw)
  To: netdev; +Cc: davem, Nicolas Dichtel

This flag allows netdevices to be created without the corresponding
directories in /proc, i.e. no /proc/sys/net/ipv[4|6]/[conf|neigh]/<dev> and no
/proc/net/dev_snmp6/<dev>.

When a system creates a lot of virtual netdevices, this speeds up creation.
On systems which continuously create and destroy virtual netdevices, the proc
entries for these netdevices may never be used, hence this flag is useful.

Note that the flag must be specified at creation time (before calling
register_netdevice()) and cannot be removed during the life of the netdevice.
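
To illustrate why the flag cannot be changed after creation, here is a
minimal user-space sketch of the masking that the net/core/dev.c hunk below
extends. The helper name merge_flags() and the SETTABLE/PRESERVED macros are
hypothetical, and both masks are simplified subsets of what
__dev_change_flags() really uses; only the flag values are taken from if.h
and from this patch.

```c
/* Flag values as in include/uapi/linux/if.h; IFF_NOPROC is the value
 * proposed by this patch. */
#define IFF_UP        0x1
#define IFF_DEBUG     0x2
#define IFF_NOARP     0x80
#define IFF_PROMISC   0x100
#define IFF_ALLMULTI  0x200
#define IFF_MULTICAST 0x1000
#define IFF_NOPROC    0x80000

/* Simplified subsets of the two masks (assumption: the real masks contain
 * more flags, and IFF_UP/IFF_PROMISC/IFF_ALLMULTI get extra handling). */
#define SETTABLE  (IFF_DEBUG | IFF_NOARP | IFF_MULTICAST)
#define PRESERVED (IFF_UP | IFF_PROMISC | IFF_ALLMULTI | IFF_NOPROC)

/* Hypothetical helper: take the settable bits from the request and keep
 * the preserved bits from the device's current flags. */
unsigned int merge_flags(unsigned int current, unsigned int requested)
{
	return (requested & SETTABLE) | (current & PRESERVED);
}
```

With IFF_NOPROC in the preserved mask, a request that omits the flag leaves
it set, and a request that adds it later is ignored, which matches the
"creation time only" behaviour described above.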

Here are some numbers:

dummy20000.batch contains 20 000 times 'link add type dummy' and
dummy20000-noproc.batch 20 000 times 'link add noproc type dummy'.

time ip -b dummy20000.batch
real    0m56.367s
user    0m0.200s
sys     0m53.070s

time ip -b dummy20000-noproc.batch
real    0m42.417s
user    0m0.310s
sys     0m38.470s
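
For readers who want the relative gain, the figures above can be reduced
with a small helper. This is only an arithmetic sketch; the function names
are hypothetical and nothing here is part of the patch.

```c
/* Relative reduction, in percent, when going from `before` to `after`. */
double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Average cost per device in milliseconds when `count` devices take
 * `total_seconds` in total. */
double per_device_ms(double total_seconds, int count)
{
	return total_seconds * 1000.0 / count;
}
```

Fed with the numbers above, reduction_pct(56.367, 42.417) gives roughly a
25% shorter wall-clock time and reduction_pct(53.070, 38.470) roughly 27.5%
less system time; per device that is about 2.8 ms versus about 2.1 ms.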

Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/uapi/linux/if.h | 2 ++
 net/core/dev.c          | 2 +-
 net/core/rtnetlink.c    | 1 +
 net/ipv4/devinet.c      | 3 +++
 net/ipv6/addrconf.c     | 3 +++
 net/ipv6/proc.c         | 5 +++++
 6 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 1ec407b01e46..bb9fe5eb38bf 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -53,6 +53,8 @@
 
 #define IFF_ECHO	0x40000		/* echo sent packets		*/
 
+#define IFF_NOPROC	0x80000		/* no proc/sysctl directories	*/
+
 #define IFF_VOLATILE	(IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_ECHO|\
 		IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT)
 
diff --git a/net/core/dev.c b/net/core/dev.c
index c25db20a4246..13f6dd360c74 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5199,7 +5199,7 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
 			       IFF_DYNAMIC | IFF_MULTICAST | IFF_PORTSEL |
 			       IFF_AUTOMEDIA)) |
 		     (dev->flags & (IFF_UP | IFF_VOLATILE | IFF_PROMISC |
-				    IFF_ALLMULTI));
+				    IFF_ALLMULTI | IFF_NOPROC));
 
 	/*
 	 *	Load in the correct multicast list now the flags have changed.
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4aedf03da052..5bad28e66fa2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1860,6 +1860,7 @@ replay:
 		}
 
 		dev->ifindex = ifm->ifi_index;
+		dev->flags |= ifm->ifi_flags & IFF_NOPROC;
 
 		if (ops->newlink)
 			err = ops->newlink(net, dev, tb, data);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a1b5bcbd04ae..13b4089d8996 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2160,6 +2160,9 @@ static void __devinet_sysctl_unregister(struct ipv4_devconf *cnf)
 
 static void devinet_sysctl_register(struct in_device *idev)
 {
+	if (idev->dev->flags & IFF_NOPROC)
+		return;
+
 	neigh_sysctl_register(idev->dev, idev->arp_parms, "ipv4", NULL);
 	__devinet_sysctl_register(dev_net(idev->dev), idev->dev->name,
 					&idev->cnf);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index cd3fb301da38..e06d15ea2dba 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5032,6 +5032,9 @@ static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
 
 static void addrconf_sysctl_register(struct inet6_dev *idev)
 {
+	if (idev->dev->flags & IFF_NOPROC)
+		return;
+
 	neigh_sysctl_register(idev->dev, idev->nd_parms, "ipv6",
 			      &ndisc_ifinfo_sysctl_change);
 	__addrconf_sysctl_register(dev_net(idev->dev), idev->dev->name,
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 091d066a57b3..f89911116aa7 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -274,6 +274,9 @@ int snmp6_register_dev(struct inet6_dev *idev)
 	if (!idev || !idev->dev)
 		return -EINVAL;
 
+	if (idev->dev->flags & IFF_NOPROC)
+		return 0;
+
 	net = dev_net(idev->dev);
 	if (!net->mib.proc_net_devsnmp6)
 		return -ENOENT;
@@ -291,6 +294,8 @@ int snmp6_register_dev(struct inet6_dev *idev)
 int snmp6_unregister_dev(struct inet6_dev *idev)
 {
 	struct net *net = dev_net(idev->dev);
+	if (idev->dev->flags & IFF_NOPROC)
+		return 0;
 	if (!net->mib.proc_net_devsnmp6)
 		return -ENOENT;
 	if (!idev->stats.proc_dir_entry)
-- 
1.8.2.1

^ permalink raw reply related

* Re: [PATCH net-next] tcp: rcvbuf autotuning improvements
From: Eric Dumazet @ 2013-10-03 13:13 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Francesco Fusco
In-Reply-To: <1380787003-20488-1-git-send-email-dborkman@redhat.com>

On Thu, 2013-10-03 at 09:56 +0200, Daniel Borkmann wrote:

> We also renamed tcp_fixup_rcvbuf() to tcp_rcvbuf_expand() to be
> consistent with tcp_sndbuf_expand().

BTW we renamed the function only because it was used both for initial
sizing, and from tcp_new_space()

As is, tcp_fixup_rcvbuf() is only called at connection setup.

^ permalink raw reply

* Re: [PATCH net-next] tcp: rcvbuf autotuning improvements
From: Eric Dumazet @ 2013-10-03 13:03 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, netdev, Francesco Fusco, Michael Dalton, ycheng, ncardwell
In-Reply-To: <1380787003-20488-1-git-send-email-dborkman@redhat.com>

On Thu, 2013-10-03 at 09:56 +0200, Daniel Borkmann wrote:
> This is a complementary patch for commit 6ae705323 ("tcp: sndbuf
> autotuning improvements") that fixes a performance regression on
> receiver side in setups with low to mid latency, high throughput,
> and senders with TSO/GSO off (receivers w/ default settings).
> 
> The following measurements in Mbit/s were done for 60sec w/ netperf
> on virtio w/ TSO/GSO off:
> 
> (ms)    1)              2)              3)
>   0     2762.11         1150.32         2906.17
>  10     1083.61          538.89         1091.03
>  25      471.81          313.18          474.60
>  50      242.33          187.84          242.36
>  75      162.14          134.45          161.95
> 100      121.55          101.96          121.49
> 150       80.64           57.75           80.48
> 200       58.97           54.11           59.90
> 250       47.10           46.92           47.31
> 
> Same setup w/ TSO/GSO on:
> 
> (ms)    1)              2)              3)
>   0     12225.91        12366.89        16514.37
>  10      1526.64         1525.79         2176.63
>  25       655.13          647.79          871.52
>  50       338.51          377.88          439.46
>  75       246.49          278.46          295.62
> 100       210.93          207.56          217.34
> 150       127.88          129.56          141.33
> 200        94.95           94.50          107.29
> 250        67.39           73.88           88.35
> 
> Similarly as in 6ae705323, we fixed up power-of-two rounding and
> took cached mss into account, thus bringing per_mss calculations
> closer to each other, the rest stays as is.
> 
> We also renamed tcp_fixup_rcvbuf() to tcp_rcvbuf_expand() to be
> consistent with tcp_sndbuf_expand().
> 
> While we do think that 6ae705323b71 is the right way to go, also
> this follow-up seems necessary to restore performance for
> receivers.

Hmm, I think you based this patch on some virtio requirements.

I would rather fix virtio, because virtio has poor truesize/payload
ratio.

Michael Dalton is working on this right now.

Really I don't understand how 'fixing' initial rcvbuf could explain such
a difference in a 60 second transfer.

Normally, if autotuning was working, the first sk_rcvbuf value would
only matter in the very beginning of a flow (maybe one, two or even
three RTT).

It looks like you only need to set sk_rcvbuf to tcp_rmem[2],
so you probably have to fix the autotuning, or virtio to give normal
skbs, not fat ones ;)


Thanks

^ permalink raw reply

* Re: [PATCH] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Oussama Ghorbel @ 2013-10-03 12:37 UTC (permalink / raw)
  To: Oussama Ghorbel, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel,
	Oussama Ghorbel
In-Reply-To: <CABfLueHOR1HNsRC_-+Phc=9LMPTiOVuWoEjguG64L=9hiZLeVg@mail.gmail.com>

I will send a new patch, the diff will be:

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 46ba243..4b51b03 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1429,9 +1429,17 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 static int
 ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
 {
-       if (new_mtu < IPV6_MIN_MTU) {
-               return -EINVAL;
+       struct ip6_tnl *tnl = netdev_priv(dev);
+
+       if (tnl->parms.proto == IPPROTO_IPIP) {
+               if (new_mtu < 68)
+                       return -EINVAL;
+       } else {
+               if (new_mtu < IPV6_MIN_MTU)
+                       return -EINVAL;
        }
+       if (new_mtu > 0xFFF8 - dev->hard_header_len)
+               return -EINVAL;
        dev->mtu = new_mtu;
        return 0;
 }
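
The bounds in the diff above can be tried out in user space with a sketch
like the following. The function check_tunnel_mtu() and the SKETCH_* names
are hypothetical; the constants mirror the diff (68 for the IPv4-in-IPv6
case, 1280 for IPV6_MIN_MTU, and the 0xFFF8 ceiling minus the header length).

```c
#include <errno.h>

#define SKETCH_IPPROTO_IPIP 4       /* IPv4-in-IPv6 payload */
#define SKETCH_IPPROTO_IPV6 41      /* IPv6-in-IPv6 payload */
#define SKETCH_IPV4_MIN_MTU 68      /* the literal 68 used in the diff */
#define SKETCH_IPV6_MIN_MTU 1280    /* value of IPV6_MIN_MTU */
#define SKETCH_MTU_CEILING  0xFFF8  /* upper limit used in the diff */

/* Returns 0 if new_mtu is acceptable for a tunnel carrying `proto`,
 * -EINVAL otherwise. hard_header_len stands in for dev->hard_header_len. */
int check_tunnel_mtu(int proto, int hard_header_len, int new_mtu)
{
	int min_mtu = (proto == SKETCH_IPPROTO_IPIP) ? SKETCH_IPV4_MIN_MTU
						     : SKETCH_IPV6_MIN_MTU;

	if (new_mtu < min_mtu)
		return -EINVAL;
	if (new_mtu > SKETCH_MTU_CEILING - hard_header_len)
		return -EINVAL;
	return 0;
}
```

For example, an MTU of 1000 passes for SKETCH_IPPROTO_IPIP but is rejected
for any other protocol value, and anything above 0xFFF8 minus the header
length is rejected in both cases.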


On Sun, Sep 29, 2013 at 5:33 PM, Oussama Ghorbel <ou.ghorbel@gmail.com> wrote:
> On Sun, Sep 29, 2013 at 4:45 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On Sun, Sep 29, 2013 at 10:40:11AM +0100, Oussama Ghorbel wrote:
>>> On Fri, Sep 27, 2013 at 6:03 PM, Hannes Frederic Sowa
>>> <hannes@stressinduktion.org> wrote:
>>> > Ok, let's go with one function per protocol type. Seems easier.
>>> >
>>> > It seems to get more hairy, because it depends on the tunnel driver if the
>>> > prepended ip header is accounted in hard_header_len. :/
>>> >
>>> > I don't know if it works out cleanly. Otherwise I would be ok if the checks
>>> > just get repeated in ip6_tunnel and leave the rest as-is.
>>> >
>>> Yes, It will be the clean way to do it.
>>
>> Fine. :)
>>
>>> >
>>> > Linux currently cannot create "jumbograms" (only the receiving side
>>> > is supported).
>>> >
>>> I understand, but what is the benefit of this limit, or the harm
>>> in not specifying it?
>>> Please check this comment from eth.c
>>>
>>> /**
>>>  * eth_change_mtu - set new MTU size
>>>  * @dev: network device
>>>  * @new_mtu: new Maximum Transfer Unit
>>>  *
>>>  * Allow changing MTU size. Needs to be overridden for devices
>>>  * supporting jumbo frames.
>>>  */
>>> int eth_change_mtu(struct net_device *dev, int new_mtu)
>>
>> Hmm, I cannot judge without the full patch. Will it be applicable
>> to all net_devices or just ethernet ones? The name could be a bit
>> misleading. Reminds me a lot of dev_set_mtu based on the signature, btw.
>
> Normally to all net_devices, otherwise it could get complicated to
> check for every dev separately ...
> But, never mind, the comment below solves the issue
>
>>
>>> So wouldn't be a good idea to let our function open for jumbo frames...?
>>
>> Hm, we can document the fact that the function would need to be updated in
>> that case. But we should not allow setting an mtu which would require
>> jumbograms currently.
>
> OK, sounds good. I will check the mtu against the limit
> IPV6_MAXPLEN, and document the jumbo restriction ...
>
>>
>> Greetings,
>>
>>   Hannes
>>
>
> Regards,
> Oussama

^ permalink raw reply related

* Re: [PATCH net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-03 11:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Daniel Borkmann, Paul E. McKenney, Xi Wang, x86,
	Eric Dumazet, linux-kernel, Heiko Carstens
In-Reply-To: <1380776250.19002.147.camel@edumazet-glaptop.roam.corp.google.com>

On Wed, Oct 2, 2013 at 9:57 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-10-02 at 21:53 -0700, Eric Dumazet wrote:
>> On Wed, 2013-10-02 at 21:44 -0700, Alexei Starovoitov wrote:
>>
>> > I think ifdef config_x86 is a bit ugly inside struct sk_filter, but
>> > don't mind whichever way.
>>
>> It's not fair to make sk_filter bigger, because it means that a simple (non
>> JIT) filter might need an extra cache line.
>>
>> You could presumably use the following layout instead :
>>
>> struct sk_filter
>> {
>>         atomic_t                refcnt;
>>         struct rcu_head         rcu;
>>         struct work_struct      work;
>>
>>         unsigned int            len ____cacheline_aligned;    /* Number of filter blocks */
>>         unsigned int            (*bpf_func)(const struct sk_buff *skb,
>>                                             const struct sock_filter *filter);
>>         struct sock_filter      insns[0];
>> };
>
> And since @len is not used by sk_run_filter() use :
>
> struct sk_filter {
>         atomic_t                refcnt;
>         int                     len; /* number of filter blocks */
>         struct rcu_head         rcu;
>         struct work_struct      work;
>
>         unsigned int            (*bpf_func)(const struct sk_buff *skb,
>                                             const struct sock_filter *filter) ____cacheline_aligned;
>         struct sock_filter      insns[0];
> };

Yes, it makes sense to avoid a first-insn cache miss inside sk_run_filter(),
at the expense of an 8-byte gap between work and bpf_func (on x86_64 w/o
lockdep).
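
The gap can be checked with a user-space sketch of the second layout quoted
above. The *_stub types are hypothetical stand-ins sized like the x86_64
kernel types without lockdep (they are not the real definitions), and the
64-byte cache line is an assumption.

```c
#include <stddef.h>

#define SKETCH_CACHELINE 64	/* assumed cache line size */

/* Hypothetical stand-ins for the kernel types, sizes only. */
struct rcu_head_stub { void *next; void (*func)(void *); };
struct work_struct_stub { long data; void *entry[2]; void (*func)(void *); };
struct sock_filter_stub {
	unsigned short code;
	unsigned char jt, jf;
	unsigned int k;
};

struct sk_filter_sketch {
	int refcnt;			/* atomic_t in the kernel */
	int len;			/* number of filter blocks */
	struct rcu_head_stub rcu;
	struct work_struct_stub work;
	/* Start of the hot part: aligned so bpf_func and the first
	 * instructions share a cache line. */
	unsigned int (*bpf_func)(const void *skb,
				 const struct sock_filter_stub *filter)
		__attribute__((aligned(SKETCH_CACHELINE)));
	struct sock_filter_stub insns[];
};

/* Offset of bpf_func from the start of the structure. */
size_t bpf_func_offset(void)
{
	return offsetof(struct sk_filter_sketch, bpf_func);
}

/* Offset of the first filter instruction. */
size_t insns_offset(void)
{
	return offsetof(struct sk_filter_sketch, insns);
}

/* Padding the compiler inserts between the end of work and bpf_func. */
size_t gap_before_bpf_func(void)
{
	return bpf_func_offset() -
	       (offsetof(struct sk_filter_sketch, work) +
		sizeof(struct work_struct_stub));
}
```

On a typical LP64 build with GCC or clang, work ends at offset 56 and
bpf_func lands at 64, so gap_before_bpf_func() should report the 8 bytes
mentioned above; with lockdep enabled the real work_struct is larger and the
gap would differ.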

Probably even better to overlap work and insns fields.
Pro: sk_filter size the same, no impact on non-jit case
Con: would be harder to understand the code

Another problem is that kfree(sk_filter) inside
sk_filter_release_rcu() needs to move inside bpf_jit_free().
So self-NACK. Let me fix these issues and respin.

Thanks
Alexei

^ permalink raw reply
