Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 5/7] drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c: adjust suspicious bit operation
From: Franky Lin @ 2012-06-06 23:43 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Brett Rudley, kernel-janitors-u79uwXL29TY76Z2rM5mHXA,
	Roland Vossen, Arend van Spriel, Kan Yan, John W. Linville,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, joe-6d6DIl74uiNBDgjK7y7TUQ,
	Julia Lawall
In-Reply-To: <1339018901-28439-6-git-send-email-Julia.Lawall-L2FTfq7BK8M@public.gmane.org>

On 06/06/2012 02:41 PM, Julia Lawall wrote:
> From: Julia Lawall<Julia.Lawall-L2FTfq7BK8M@public.gmane.org>
>
> IRQF_TRIGGER_HIGH is 0x00000004, so it seems that&  was intended rather than |.
>
> This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
>
> Signed-off-by: Julia Lawall<julia-dAYI7NvHqcQ@public.gmane.org>

Thanks, Julia. But this has already been fixed by Joe Perches [1] and 
the patch has arrived at Linux wireless tree.

Franky

[1] https://lkml.org/lkml/2012/5/30/482

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH IPROUTE2] ss: Add support for sk_meminfo_backlog
From: Stephen Hemminger @ 2012-06-06 23:23 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: netdev, Eric Dumazet
In-Reply-To: <1339024292-4361-1-git-send-email-subramanian.vijay@gmail.com>

On Wed,  6 Jun 2012 16:11:32 -0700
Vijay Subramanian <subramanian.vijay@gmail.com> wrote:

> This adds the ability to print the backlog length of sockets that is provided by
> recent Linux kernels since commit (d594e987c6 sock_diag: add
> SK_MEMINFO_BACKLOG).
> 
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>

Since the SK_MEMINFO_BACKLOG is not in 3.5 (it is in net-next);
please resubmit this when it is accepted in Linus's tree.

^ permalink raw reply

* [PATCH IPROUTE2] ss: Add support for sk_meminfo_backlog
From: Vijay Subramanian @ 2012-06-06 23:11 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Eric Dumazet, Vijay Subramanian

This adds the ability to print the backlog length of sockets that is provided by
recent Linux kernels since commit (d594e987c6 sock_diag: add
SK_MEMINFO_BACKLOG).

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
 include/linux/sock_diag.h |    1 +
 misc/ss.c                 |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index 39e4b1c..ac9db19 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -18,6 +18,7 @@ enum {
 	SK_MEMINFO_FWD_ALLOC,
 	SK_MEMINFO_WMEM_QUEUED,
 	SK_MEMINFO_OPTMEM,
+	SK_MEMINFO_BACKLOG,
 
 	SK_MEMINFO_VARS,
 };
diff --git a/misc/ss.c b/misc/ss.c
index cf529ef..ea14e2b 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1338,14 +1338,15 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r)
 
 	if (tb[INET_DIAG_SKMEMINFO]) {
 		const __u32 *skmeminfo =  RTA_DATA(tb[INET_DIAG_SKMEMINFO]);
-		printf(" skmem:(r%u,rb%u,t%u,tb%u,f%u,w%u,o%u)",
+		printf(" skmem:(r%u,rb%u,t%u,tb%u,f%u,w%u,o%u,bl%u)",
 			skmeminfo[SK_MEMINFO_RMEM_ALLOC],
 			skmeminfo[SK_MEMINFO_RCVBUF],
 			skmeminfo[SK_MEMINFO_WMEM_ALLOC],
 			skmeminfo[SK_MEMINFO_SNDBUF],
 			skmeminfo[SK_MEMINFO_FWD_ALLOC],
 			skmeminfo[SK_MEMINFO_WMEM_QUEUED],
-			skmeminfo[SK_MEMINFO_OPTMEM]);
+			skmeminfo[SK_MEMINFO_OPTMEM],
+			skmeminfo[SK_MEMINFO_BACKLOG]);
 	}else if (tb[INET_DIAG_MEMINFO]) {
 		const struct inet_diag_meminfo *minfo
 			= RTA_DATA(tb[INET_DIAG_MEMINFO]);
-- 
1.7.0.4

^ permalink raw reply related

* Re: pull-request: can-next 2012-06-04
From: David Miller @ 2012-06-06 22:42 UTC (permalink / raw)
  To: mkl; +Cc: socketcan, netdev, linux-can
In-Reply-To: <4FCFD236.8010002@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 06 Jun 2012 23:57:10 +0200

> I've some patches for net-next which depend on net. When do you merge
> net to net-next?

Periodically, and when specific needs arise such as your's.

I've just merged net into net-next

^ permalink raw reply

* Re: [PATCH 3/3] Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
From: Yuchung Cheng @ 2012-06-06 22:41 UTC (permalink / raw)
  To: Damian Lukowski; +Cc: Jerry Chu, Netdev, David Miller, Ilpo Järvinen
In-Reply-To: <1339014958.4294.2.camel@nexus>

On Wed, Jun 6, 2012 at 1:35 PM, Damian Lukowski
<damian@tvk.rwth-aachen.de> wrote:
> Hi,
>
> backoffs are reverted when an RTO retransmission triggers an ICMP
> destination unreachable along the path towards the target.
> Consider A --- R --- B, where A and B are TCP endpoints, and R is some
> router in between. When the link between R and B breaks for a longer
> time, A will perform RTO retransmissions if there are outstanding ACKs.
> Those packets which arrive at R will hopefully trigger an ICMP response
> back towards A, as R has no more route towards B. The ICMP packet is an
> indication for A, that the retransmission has not been lost because of
> congestion but because of a link outage, and the backoff will be
> reverted for the corresponding TCP session. In the best case, every RTO
> retransmission triggers an ICMP response, so every backoff is reverted,
> and the time between retransmissions remains at the original value.
> If icsk_retransmits is decremented at this point within the original
> code's logic, the connection might never time out. And we cannot take
> tcp_retriesX literally here, as the above scenario would time out after
> tcp_retries2 x base_rto, where the base_rto might be as small as 0.2
> seconds.
so wouldn't upping the threshold counter a (much) simpler solution?

IMO, this is a valid concern and listed in the TODOs of the expired LCD draft.
But it creates a new rule that's not compatible with others in the stack: an RTO
backoff can be reverted but TCP should still timeout based on exp.
backoff rule. But
we can also do linear RTOs and use counter-based checks. End result:
unpredictable
and unclear timeout behavior.

I second Jerry's idea to revert back to simple counter checks.

>
> I am not sure, how an additional counter variable should help. You still
> cannot take tcp_retriesX literally. Besides, I think that changing the
> socket structure is too heavy machinery, isn't it?

^ permalink raw reply

* Re: Reoccuring kern.log events after running xl2tp with ethernet adapter Realtek 8111E
From: Francois Romieu @ 2012-06-06 22:25 UTC (permalink / raw)
  To: Dustin Schumm; +Cc: netdev
In-Reply-To: <CANcVnzjKCn00kF8NG+ZLLrzGCs_jx7WYPKyM7uwrxJ=kVLiQ4Q@mail.gmail.com>

Dustin Schumm <shodid@gmail.com> :
[...]
> Do you see any indication if this is a problem with the mainline
> kernel, distro kernel, driver, or otherwise? It's not definitive, been
> I've been good using kernels 2.6.x and getting errors on everything
> 3.x I have tried. I am looking for some direction to pursue. Thanks.

See http://marc.info/?l=linux-kernel&m=133897670115862&w=2

-- 
Ueimor

^ permalink raw reply

* Re: pull-request: can-next 2012-06-04
From: Marc Kleine-Budde @ 2012-06-06 21:57 UTC (permalink / raw)
  To: David Miller; +Cc: socketcan, netdev, linux-can
In-Reply-To: <20120606.111538.2166268452793199990.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 867 bytes --]

On 06/06/2012 08:15 PM, David Miller wrote:
> From: Oliver Hartkopp <socketcan@hartkopp.net>
> Date: Wed, 06 Jun 2012 20:02:35 +0200
> 
>> i think there was a confusion about
>>
>> 	linux-can and linux-can-next
>>
>> pull requests that overlapped in the mail traffic that day.
>>
>> The pull request of linux-can-next is still pending.
>>
>> git://gitorious.org/linux-can/linux-can-next.git master
> 
> Strange, I thought I had pulled it, I've done so now, thanks.

Thanks.

I've some patches for net-next which depend on net. When do you merge
net to net-next?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: [PATCH 7/7] drivers/net/can/cc770/cc770_platform.c: adjust suspicious bit operation
From: Marc Kleine-Budde @ 2012-06-06 21:52 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Wolfgang Grandegger, kernel-janitors, linux-can, netdev,
	linux-kernel, joe, Julia Lawall
In-Reply-To: <1339018901-28439-8-git-send-email-Julia.Lawall@lip6.fr>

[-- Attachment #1: Type: text/plain, Size: 822 bytes --]

On 06/06/2012 11:41 PM, Julia Lawall wrote:
> From: Julia Lawall <Julia.Lawall@lip6.fr>
> 
> CPUIF_DSC is 0x40, so it would seem that & is wanted rather than |.
> 
> This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
> 
> Signed-off-by: Julia Lawall <julia@diku.dk>

Good catch, Joe Perches has already found and fixed that problem. It has
been fixed in:

    dc605db can: cc770: Fix likely misuse of | for &

The commit is currently in David's net/master, and scheduled for the
v3.5 release.

Regards, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* [PATCH 7/7] drivers/net/can/cc770/cc770_platform.c: adjust suspicious bit operation
From: Julia Lawall @ 2012-06-06 21:41 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: kernel-janitors, Marc Kleine-Budde, linux-can, netdev,
	linux-kernel, joe, Julia Lawall
In-Reply-To: <1339018901-28439-1-git-send-email-Julia.Lawall@lip6.fr>

From: Julia Lawall <Julia.Lawall@lip6.fr>

CPUIF_DSC is 0x40, so it would seem that & is wanted rather than |.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/can/cc770/cc770_platform.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/cc770/cc770_platform.c b/drivers/net/can/cc770/cc770_platform.c
index 53115ee..688371c 100644
--- a/drivers/net/can/cc770/cc770_platform.c
+++ b/drivers/net/can/cc770/cc770_platform.c
@@ -154,7 +154,7 @@ static int __devinit cc770_get_platform_data(struct platform_device *pdev,
 	struct cc770_platform_data *pdata = pdev->dev.platform_data;
 
 	priv->can.clock.freq = pdata->osc_freq;
-	if (priv->cpu_interface | CPUIF_DSC)
+	if (priv->cpu_interface & CPUIF_DSC)
 		priv->can.clock.freq /= 2;
 	priv->clkout = pdata->cor;
 	priv->bus_config = pdata->bcr;

^ permalink raw reply related

* [PATCH 5/7] drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c: adjust suspicious bit operation
From: Julia Lawall @ 2012-06-06 21:41 UTC (permalink / raw)
  To: Brett Rudley
  Cc: kernel-janitors, Roland Vossen, Arend van Spriel,
	Franky (Zhenhui) Lin, Kan Yan, John W. Linville, linux-wireless,
	netdev, linux-kernel, joe, Julia Lawall
In-Reply-To: <1339018901-28439-1-git-send-email-Julia.Lawall@lip6.fr>

From: Julia Lawall <Julia.Lawall@lip6.fr>

IRQF_TRIGGER_HIGH is 0x00000004, so it seems that & was intended rather than |.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c
index e2480d1..bed208f 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c
@@ -91,7 +91,7 @@ int brcmf_sdio_intr_register(struct brcmf_sdio_dev *sdiodev)
 
 	/* redirect, configure ane enable io for interrupt signal */
 	data = SDIO_SEPINT_MASK | SDIO_SEPINT_OE;
-	if (sdiodev->irq_flags | IRQF_TRIGGER_HIGH)
+	if (sdiodev->irq_flags & IRQF_TRIGGER_HIGH)
 		data |= SDIO_SEPINT_ACT_HI;
 	brcmf_sdio_regwb(sdiodev, SDIO_CCCR_BRCM_SEPINT, data, &ret);
 

^ permalink raw reply related

* [PATCH] ieee802154: verify packet size before trying to allocate it
From: Sasha Levin @ 2012-06-06 21:32 UTC (permalink / raw)
  To: dbaryshkov-Re5JQEeQqe8AvxtiuMwx3w, slapin-9cOl001CZnBAfugRpC6u6w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Sasha Levin

Currently when sending data over datagram, the send function will attempt to
allocate any size passed on from the userspace.

We should make sure that this size is checked and limited. The maximum size
of an IP packet seemed like the safest limit here.

Signed-off-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 net/ieee802154/dgram.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index 6fbb2ad..cf5070b 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -232,6 +232,10 @@ static int dgram_sendmsg(struct kiocb *iocb, struct sock *sk,
 
 	hlen = LL_RESERVED_SPACE(dev);
 	tlen = dev->needed_tailroom;
+	if (hlen + tlen + size > USHRT_MAX) {
+		err = -EMSGSIZE;
+		goto out;
+	}
 	skb = sock_alloc_send_skb(sk, hlen + tlen + size,
 			msg->msg_flags & MSG_DONTWAIT,
 			&err);
-- 
1.7.8.6


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

^ permalink raw reply related

* Re: [v3 net-next PATCH 1/3] Added kernel support in EEE Ethtool commands
From: Ben Hutchings @ 2012-06-06 21:30 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: davem, netdev, eilong, peppe.cavallaro
In-Reply-To: <1339009643-22951-2-git-send-email-yuvalmin@broadcom.com>

On Wed, 2012-06-06 at 22:07 +0300, Yuval Mintz wrote:
> This patch extends the kernel's ethtool interface by adding support
> for 2 new EEE commands - get_eee and set_eee.
> 
> Thanks goes to Giuseppe Cavallaro for his original patch adding this support.
> 
> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>

but with one tiny change required:

[...]
> +static int ethtool_set_eee(struct net_device *dev, char __user *useraddr)
> +{
> +	struct ethtool_eee edata;
> +
> +	if (!dev->ethtool_ops->get_eee)

This must check set_eee, not get_eee.

Ben.

> +		return -EOPNOTSUPP;
> +
> +	if (copy_from_user(&edata, useraddr, sizeof(edata)))
> +		return -EFAULT;
> +
> +	return dev->ethtool_ops->set_eee(dev, &edata);
> +}
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] net: sierra_net: device IDs for Aircard 320U++
From: Greg KH @ 2012-06-06 21:15 UTC (permalink / raw)
  To: David Miller
  Cc: bjorn-yOkvZcmFvRU, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, dcbw-H+wXaHxf7aLQT0dZR+AlfA,
	linux-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8,
	autif.mlist-Re5JQEeQqe8AvxtiuMwx3w,
	tomas.cassidy-Re5JQEeQqe8AvxtiuMwx3w,
	stable-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120606.104018.1721491470277229085.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Wed, Jun 06, 2012 at 10:40:18AM -0700, David Miller wrote:
> From: Greg KH <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
> Date: Wed, 6 Jun 2012 21:16:33 +0900
> 
> > Ok, thanks for clearing that up, I'll take the serial patch, and I'm
> > sure that David will take this one.
> > 
> > 	Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
> 
> I'll apply this, thanks.
> 
> Greg, could you not put that TAB at the beginning of your
> signoffs?  It causes patchwork not to pick them up so I have
> to add it manually.

Sorry, I didn't know that, I'll not do that in the future.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] NFC: Fix possible NULL ptr deref when getting the name of a socket
From: Sasha Levin @ 2012-06-06 21:02 UTC (permalink / raw)
  To: lauro.venancio, aloisio.almeida, sameo, davem, linville
  Cc: linux-wireless, netdev, linux-kernel, Sasha Levin

llcp_sock_getname() might get called before the LLCP socket was created.
This condition isn't checked, and llcp_sock_getname will simply deref a
NULL ptr in that case.

This exists starting with d646960 ("NFC: Initial LLCP support").

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 net/nfc/llcp/sock.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/nfc/llcp/sock.c b/net/nfc/llcp/sock.c
index 3f339b1..17a707d 100644
--- a/net/nfc/llcp/sock.c
+++ b/net/nfc/llcp/sock.c
@@ -292,6 +292,9 @@ static int llcp_sock_getname(struct socket *sock, struct sockaddr *addr,
 
 	pr_debug("%p\n", sk);
 
+	if (llcp_sock == NULL)
+		return -EBADFD;
+
 	addr->sa_family = AF_NFC;
 	*len = sizeof(struct sockaddr_nfc_llcp);
 
-- 
1.7.8.6

^ permalink raw reply related

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 20:43 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Eric Dumazet, netdev, linux-kernel, virtualization,
	Stephen Hemminger
In-Reply-To: <1339014904.2836.56.camel@bwh-desktop.uk.solarflarecom.com>

On Wed, Jun 06, 2012 at 09:35:04PM +0100, Ben Hutchings wrote:
> On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote:
> > > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > Absolutely, I am talking about virtio here.  I'm not kicking
> > > > u64_stats_sync idea I am just saying that simple locking
> > > > would work for virtio and might be better as it
> > > > gives us a way to get counters atomically.
> > > 
> > > Which lock do you own in the RX path ?
> > 
> > We can just disable napi, everything is updated from napi callback.
> 
> Seriously, though: don't do that; this is going to hurt performance for
> minimal benefit.
> 
> Ben.

Yea, it doesn't work anyway. Maybe take a xmit lock for tx and keep
using the per-cpu counters for rx. Or does this sound too disruptive
too?

> > > You'll have to add a lock in fast path. This sounds really a bad choice
> > > to me.
> > 
> > .ndo_get_stats64 is not data path though, is it?
> > 
> 
> -- 
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 20:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1339014259.26966.70.camel@edumazet-glaptop>

On Wed, 2012-06-06 at 22:24 +0200, Eric Dumazet wrote:

> (ndo_get_stats64() is not allowed to sleep, and I cant see how you are
> going to disable napi without sleeping)
> 
> 

In case you wonder, take a look at bond_get_stats() in
drivers/net/bonding/bond_main.c

^ permalink raw reply

* Re: Change in alloc_skb() behavior in 3.2+ kernels?
From: Grant Edwards @ 2012-06-06 20:35 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1339014685.26966.73.camel@edumazet-glaptop>

On 2012-06-06, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2012-06-06 at 20:24 +0000, Grant Edwards wrote:
>
>> Is skb_tailroom() guaranteed to be >= the requested size?
>> 
>
> Of course, but only right after alloc_skb().

That's good enough.
     
-- 
Grant Edwards               grant.b.edwards        Yow! Okay ... I'm going
                                  at               home to write the "I HATE
                              gmail.com            RUBIK's CUBE HANDBOOK FOR
                                                   DEAD CAT LOVERS" ...

^ permalink raw reply

* Re: [PATCH 3/3] Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
From: Damian Lukowski @ 2012-06-06 20:35 UTC (permalink / raw)
  To: Jerry Chu; +Cc: Netdev, David Miller, Ilpo Järvinen
In-Reply-To: <CAPshTCjEkyLg+BdYvA3vW4C92rAyfuC_mVEDgrbRz4NDyGh9Ug@mail.gmail.com>

Am Dienstag, den 05.06.2012, 14:22 -0700 schrieb Jerry Chu:
> On Tue, Jun 5, 2012 at 11:39 AM, Damian Lukowski
> <damian@tvk.rwth-aachen.de> wrote:
> > Am Dienstag, den 05.06.2012, 10:42 -0700 schrieb Jerry Chu:
> >> On Mon, Jun 4, 2012 at 4:50 PM, Jerry Chu <hkchu@google.com> wrote:
> >> > Hi Damian,
> >> >
> >> > On Mon, Jun 4, 2012 at 10:50 AM, Damian Lukowski
> >> > <damian@tvk.rwth-aachen.de> wrote:
> >> >> Hi Jerry,
> >> >>
> >> >> please verify, I understood you correctly.
> >> >>
> >> >> You have set TCP_RTO_MIN to a lower value, e.g. 0.002 seconds to improve
> >> >> your internal low-latency traffic. Because of the improvement, R1
> >> >> timeouts are triggered too fast for external high-RTT traffic. Is that
> >> >> correct?
> >> >
> >> > Correct.
> >> >
> >> >> If so, may I suggest to set tcp_retries1 to a higher value? For
> >> >> TCP_RTO_MIN == 0.002 and tcp_retries1 ==  10, R1 will be calculated to
> >> >> approximately 4 seconds.
> >> >
> >> > I think hacking tcp_retries1 is the wrong solution. E.g., 10 retries may be too
> >> > generous for those short RTT flows.
> >> >
> >> > I think the fundamental problem is - the ideal fix for your original RTO revert
> >> > problem should've used the per-flow RTO to compute R1 & R2. But that
> >> > computation may be too expensive so you used TCP_RTO_MIN as an
> >> > approximation - not a good idea IMHO!
> >>
> >> Just realized the correct fix of using the original, non-backoff per flow RTO is
> >> not any more expensive than the current code through ilog2(). What's needed
> >> is a new field "base_rto" to record the original RTO before backoff. I'm leaning
> >> toward this more accurate fix now without any fudge because fudging almost
> >> always causes bugs.
> >
> >
> > The current version of retransmits_timed_out() uses such a field
> > already. I suppose, we can do a combination like the following?
> >
> > -       unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
> > +       unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : __tcp_set_rto(tcp_sk(sk));
> 
> Yes that could work and we probably don't need a new field for the original RTO.
> 
> But I started wondering what the problem you tried to solve initially. The old
> counter (icsk_retransmits) based code was really easy to understand, debug, and
> matched well with the API (sysctl_tcp_retries1, sysctl_tcp_retries2,
> TCP_SYNCNT,...), which are all counter based. Moreover, my simple brain has
> a strong prejudice against complex code unless the complexity is justified.
> 
> Could you point out where backoff revert might happen? (tcp_v4_err() when
> handing ICMP errors?) And for those cases is it possible to either not increment
> icsk_retransmits (as long as it won't get us into infinite
> retransmissions), or invent
> a separate field for the sole purpose of timeout check? Won't that be
> much simpler than your current fix?

Hi,

backoffs are reverted when an RTO retransmission triggers an ICMP
destination unreachable along the path towards the target.
Consider A --- R --- B, where A and B are TCP endpoints, and R is some
router in between. When the link between R and B breaks for a longer
time, A will perform RTO retransmissions if there are outstanding ACKs.
Those packets which arrive at R will hopefully trigger an ICMP response
back towards A, as R has no more route towards B. The ICMP packet is an
indication for A, that the retransmission has not been lost because of
congestion but because of a link outage, and the backoff will be
reverted for the corresponding TCP session. In the best case, every RTO
retransmission triggers an ICMP response, so every backoff is reverted,
and the time between retransmissions remains at the original value.
If icsk_retransmits is decremented at this point within the original
code's logic, the connection might never time out. And we cannot take
tcp_retriesX literally here, as the above scenario would time out after
tcp_retries2 x base_rto, where the base_rto might be as small as 0.2
seconds.

I am not sure, how an additional counter variable should help. You still
cannot take tcp_retriesX literally. Besides, I think that changing the
socket structure is too heavy machinery, isn't it?

Regards
 Damian

> 
> Best,
> 
> Jerry
> 
> > +       rto_base = rto_base ? : TCP_RTO_MIN;
> >
> > -       if (!inet_csk(sk)->icsk_retransmits)
> > +       if (inet_csk(sk)->icsk_retransmits < boundary)
> >
> >
> > Regards
> >  Damian
> >
> >>
> >> Any comment is welcome. I'm not sure in the existing code if it makes sense
> >> to apply the exponential backoff based computation to thin stream but it's a
> >> separate question so I won't touch it.
> >>
> >> Jerry
> >>
> >> >
> >> > The easiest solution I can see so far is to replace the check
> >> >
> >> > if (!inet_csk(sk)->icsk_retransmits)
> >> >                return false;
> >> >
> >> > at the beginning of retransmits_timed_out() with
> >> >
> >> > if (inet_csk(sk)->icsk_retransmits < boundary)
> >> >                return false;
> >> >
> >> > Best,
> >> >
> >> > Jerry
> >> >
> >> >>
> >> >> Is that ok?
> >> >>
> >> >> Best regards
> >> >>  Damian
> >> >>
> >> >> Am Freitag, den 01.06.2012, 15:58 -0700 schrieb Jerry Chu:
> >> >>> > From: Damian Lukowski <damian@tvk.rwth-aachen.de>
> >> >>> > Date: Wed, Aug 26, 2009 at 3:16 AM
> >> >>> > Subject: [PATCH 3/3] Revert Backoff [v3]: Calculate TCP's connection close
> >> >>> > threshold as a time value.
> >> >>> > To: Netdev <netdev@vger.kernel.org>
> >> >>> >
> >> >>> >
> >> >>> > RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
> >> >>> > which may represent a number of allowed retransmissions or a timeout value.
> >> >>> > Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds
> >> >>> > in number of allowed retransmissions.
> >> >>> >
> >> >>> > For any desired threshold R2 (by means of time) one can specify tcp_retries2
> >> >>> > (by means of number of retransmissions) such that TCP will not time out
> >> >>> > earlier than R2. This is the case, because the RTO schedule follows a fixed
> >> >>> > pattern, namely exponential backoff.
> >> >>> >
> >> >>> > However, the RTO behaviour is not predictable any more if RTO backoffs can
> >> >>> > be
> >> >>> > reverted, as it is the case in the draft
> >> >>> > "Make TCP more Robust to Long Connectivity Disruptions"
> >> >>> > (http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).
> >> >>> >
> >> >>> > In the worst case TCP would time out a connection after 3.2 seconds, if the
> >> >>> > initial RTO equaled MIN_RTO and each backoff has been reverted.
> >> >>> >
> >> >>> > This patch introduces a function retransmits_timed_out(N),
> >> >>> > which calculates the timeout of a TCP connection, assuming an initial
> >> >>> > RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.
> >> >>> >
> >> >>> > Whenever timeout decisions are made by comparing the retransmission counter
> >> >>> > to some value N, this function can be used, instead.
> >> >>> >
> >> >>> > The meaning of tcp_retries2 will be changed, as many more RTO
> >> >>> > retransmissions
> >> >>> > can occur than the value indicates. However, it yields a timeout which is
> >> >>> > similar to the one of an unpatched, exponentially backing off TCP in the
> >> >>> > same
> >> >>> > scenario. As no application could rely on an RTO greater than MIN_RTO, there
> >> >>> > should be no risk of a regression.
> >> >>>
> >> >>> This looks like a typical "fix one problem, introducing a few more" patch :(.
> >> >>> What do you mean by "no application could rely on an RTO greater than
> >> >>> MIN_RTO..."
> >> >>> above? How can you make the assumption that RTO is not too far off
> >> >>> from TCP_RTO_MIN?
> >> >>>
> >> >>> While you tried to address a problem where the retransmission count
> >> >>> was high but the actual
> >> >>> timeout duration was too short, have you considered the other case
> >> >>> around, i.e., the timeout
> >> >>> duration is long but the retransmission count is too short? This is
> >> >>> exactly what's happening
> >> >>> to us with your patch. We've much reduced TCP_RTO_MIN for our internal
> >> >>> traffic, but not
> >> >>> noticing your change has severely shortened the R1 & R2 recommended by
> >> >>> RFC1122 for our
> >> >>> long haul traffic until now. In many cases R1 threshold was met upon
> >> >>> the first retrans timeout.
> >> >>>
> >> >>> I think retransmits_timed_out() should check against both time
> >> >>> duration and retrans count
> >> >>> (icsk_retransmits).
> >> >>>
> >> >>> Thought?
> >> >>>
> >> >>> Jerry
> >> >>>
> >> >>> >
> >> >>> > Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
> >> >>> > ---
> >> >>> >  include/net/tcp.h    |   18 ++++++++++++++++++
> >> >>> >  net/ipv4/tcp_timer.c |   11 +++++++----
> >> >>> >  2 files changed, 25 insertions(+), 4 deletions(-)
> >> >>> >
> >> >>> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> >> >>> > index c35b329..17d1a88 100644
> >> >>> > --- a/include/net/tcp.h
> >> >>> > +++ b/include/net/tcp.h
> >> >>> > @@ -1247,6 +1247,24 @@ static inline struct sk_buff
> >> >>> > *tcp_write_queue_prev(struct sock *sk, struct sk_bu
> >> >>> >  #define tcp_for_write_queue_from_safe(skb, tmp, sk)                    \
> >> >>> >        skb_queue_walk_from_safe(&(sk)->sk_write_queue, skb, tmp)
> >> >>> >
> >> >>> > +static inline bool retransmits_timed_out(const struct sock *sk,
> >> >>> > +                                        unsigned int boundary)
> >> >>> > +{
> >> >>> > +       int limit, K;
> >> >>> > +       if (!inet_csk(sk)->icsk_retransmits)
> >> >>> > +               return false;
> >> >>> > +
> >> >>> > +       K = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
> >> >>> > +
> >> >>> > +       if (boundary <= K)
> >> >>> > +               limit = ((2 << boundary) - 1) * TCP_RTO_MIN;
> >> >>> > +       else
> >> >>> > +               limit = ((2 << K) - 1) * TCP_RTO_MIN +
> >> >>> > +                       (boundary - K) * TCP_RTO_MAX;
> >> >>> > +
> >> >>> > +       return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= limit;
> >> >>> > +}
> >> >>> > +
> >> >>> >  static inline struct sk_buff *tcp_send_head(struct sock *sk)
> >> >>> >  {
> >> >>> >        return sk->sk_send_head;
> >> >>> > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> >> >>> > index a3ba494..2972d7b 100644
> >> >>> > --- a/net/ipv4/tcp_timer.c
> >> >>> > +++ b/net/ipv4/tcp_timer.c
> >> >>> > @@ -137,13 +137,14 @@ static int tcp_write_timeout(struct sock *sk)
> >> >>> >  {
> >> >>> >        struct inet_connection_sock *icsk = inet_csk(sk);
> >> >>> >        int retry_until;
> >> >>> > +       bool do_reset;
> >> >>> >
> >> >>> >        if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
> >> >>> >                if (icsk->icsk_retransmits)
> >> >>> >                        dst_negative_advice(&sk->sk_dst_cache);
> >> >>> >                retry_until = icsk->icsk_syn_retries ? :
> >> >>> > sysctl_tcp_syn_retries;
> >> >>> >        } else {
> >> >>> > -               if (icsk->icsk_retransmits >= sysctl_tcp_retries1) {
> >> >>> > +               if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
> >> >>> >                        /* Black hole detection */
> >> >>> >                        tcp_mtu_probing(icsk, sk);
> >> >>> >
> >> >>> > @@ -155,13 +156,15 @@ static int tcp_write_timeout(struct sock *sk)
> >> >>> >                        const int alive = (icsk->icsk_rto < TCP_RTO_MAX);
> >> >>> >
> >> >>> >                        retry_until = tcp_orphan_retries(sk, alive);
> >> >>> > +                       do_reset = alive ||
> >> >>> > +                                  !retransmits_timed_out(sk, retry_until);
> >> >>> >
> >> >>> > -                       if (tcp_out_of_resources(sk, alive ||
> >> >>> > icsk->icsk_retransmits < retry_until))
> >> >>> > +                       if (tcp_out_of_resources(sk, do_reset))
> >> >>> >                                return 1;
> >> >>> >                }
> >> >>> >        }
> >> >>> >
> >> >>> > -       if (icsk->icsk_retransmits >= retry_until) {
> >> >>> > +       if (retransmits_timed_out(sk, retry_until)) {
> >> >>> >                /* Has it gone just too far? */
> >> >>> >                tcp_write_err(sk);
> >> >>> >                return 1;
> >> >>> > @@ -385,7 +388,7 @@ void tcp_retransmit_timer(struct sock *sk)
> >> >>> >  out_reset_timer:
> >> >>> >        icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
> >> >>> >        inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto,
> >> >>> > TCP_RTO_MAX);
> >> >>> > -       if (icsk->icsk_retransmits > sysctl_tcp_retries1)
> >> >>> > +       if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
> >> >>> >                __sk_dst_reset(sk);
> >> >>> >
> >> >>> >  out:;
> >> >>> > --
> >> >>> > 1.6.3.3
> >> >>> >
> >> >>> > --
> >> >>> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> >>> > the body of a message to majordomo@vger.kernel.org
> >> >>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >>> >
> >> >>
> >> >>
> >
> >

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Ben Hutchings @ 2012-06-06 20:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Dumazet, netdev, linux-kernel, virtualization,
	Stephen Hemminger
In-Reply-To: <20120606201620.GA23358@redhat.com>

On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> > 
> > > Absolutely, I am talking about virtio here.  I'm not kicking
> > > u64_stats_sync idea I am just saying that simple locking
> > > would work for virtio and might be better as it
> > > gives us a way to get counters atomically.
> > 
> > Which lock do you own in the RX path ?
> 
> We can just disable napi, everything is updated from napi callback.

Seriously, though: don't do that; this is going to hurt performance for
minimal benefit.

Ben.

> > You'll have to add a lock in fast path. This sounds really a bad choice
> > to me.
> 
> .ndo_get_stats64 is not data path though, is it?
> 

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: Change in alloc_skb() behavior in 3.2+ kernels?
From: Eric Dumazet @ 2012-06-06 20:31 UTC (permalink / raw)
  To: Grant Edwards; +Cc: netdev
In-Reply-To: <jqoe90$fs3$2@dough.gmane.org>

On Wed, 2012-06-06 at 20:24 +0000, Grant Edwards wrote:

> Is skb_tailroom() guaranteed to be >= the requested size?
> 

Of course, but only right after alloc_skb().

^ permalink raw reply

* Re: Change in alloc_skb() behavior in 3.2+ kernels?
From: Grant Edwards @ 2012-06-06 20:26 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1339011742.26966.44.camel@edumazet-glaptop>

On 2012-06-06, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2012-06-06 at 11:51 -0700, David Miller wrote:
>> From: Grant Edwards <grant.b.edwards@gmail.com>
>> Date: Wed, 6 Jun 2012 18:32:57 +0000 (UTC)
>> 
>> > The kernel module that's started failing fills the allocated sk_buff
>> > until tailroom() indicates it is full and then sends it.  The problem
>> > is that sending a packet with a length of 1850 won't work (it's a
>> > MAC-layer Ethernet packet).
>> 
>> The amount of tailroom an SKB has is implementation dependent.
>> 
>> It's incredibly poor form to rely upon it to determine whether a
>> fully sized frame has been constructed or not.
>> 
>> Please fix the code that does this.
>
> By the way, we had a similar problem, and the fix was :
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=a21d45726acacc963d8baddf74607d9b74e2b723
>
> Grant, depending on the context, you might use skb->avail_size and
> skb_availroom() as well.
>
> Beware skb->avail_size is unioned with skb->{mark|dropcount}

Thanks for the pointer.

-- 
Grant Edwards               grant.b.edwards        Yow! ANN JILLIAN'S HAIR
                                  at               makes LONI ANDERSON'S
                              gmail.com            HAIR look like RICARDO
                                                   MONTALBAN'S HAIR!

^ permalink raw reply

* Re: pull request: wireless 2012-06-06
From: David Miller @ 2012-06-06 20:29 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20120606183613.GD2338@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 6 Jun 2012 14:36:13 -0400

> Here is a batch of wireless/bluetooth fixes intended for 3.5...
 ...
> Please let me know if there are problems!

Pulled, thanks for the detailed rundown, it helps.

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 20:25 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Michael S. Tsirkin, netdev, linux-kernel, virtualization,
	Stephen Hemminger
In-Reply-To: <1339013979.2836.52.camel@bwh-desktop.uk.solarflarecom.com>

On Wed, 2012-06-06 at 21:19 +0100, Ben Hutchings wrote:
> On Wed, 2012-06-06 at 22:08 +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> > 
> > > Absolutely, I am talking about virtio here.  I'm not kicking
> > > u64_stats_sync idea I am just saying that simple locking
> > > would work for virtio and might be better as it
> > > gives us a way to get counters atomically.
> > 
> > Which lock do you own in the RX path ?
> > 
> > You'll have to add a lock in fast path. This sounds really a bad choice
> > to me.
> 
> You have the NAPI 'lock', so when gathering stats you can synchronise
> using napi_disable() ;-)

Nice, this adds one new bug in network stack.

Really guys, can we stop this thread, please ?

^ permalink raw reply

* Re: Change in alloc_skb() behavior in 3.2+ kernels?
From: Grant Edwards @ 2012-06-06 20:24 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20120606.131703.870120997635192180.davem@davemloft.net>

On 2012-06-06, David Miller <davem@davemloft.net> wrote:
> From: Grant Edwards <grant.b.edwards@gmail.com>
> Date: Wed, 6 Jun 2012 19:01:40 +0000 (UTC)
>
>> That's what I'll do as soon as I can find a definition of what the API
>> for alloc_skb() actually _is_.  It has clearly changed in the past few
>> years.
>
> And it will continue to change.  There are no stable APIs inside of
> the kernel, none.

Oh, I know (as does anybody who has maintained a driver for more than
a few months).  Right now, I'm just trying to find out what the
current API for alloc_skb() is.

Is skb_tailroom() guaranteed to be >= the requested size?

-- 
Grant Edwards               grant.b.edwards        Yow! I'm wet!  I'm wild!
                                  at               
                              gmail.com            

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 20:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606201620.GA23358@redhat.com>

On Wed, 2012-06-06 at 23:16 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 10:08:09PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 22:58 +0300, Michael S. Tsirkin wrote:
> > 
> > > Absolutely, I am talking about virtio here.  I'm not kicking
> > > u64_stats_sync idea I am just saying that simple locking
> > > would work for virtio and might be better as it
> > > gives us a way to get counters atomically.
> > 
> > Which lock do you own in the RX path ?
> 
> We can just disable napi, everything is updated from napi callback.

This is very disruptive, and illegal from ndo_get_stats64()

(ndo_get_stats64() is not allowed to sleep, and I cant see how you are
going to disable napi without sleeping)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox