Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH net-next 3/5] be2net: fix erx->rx_drops_no_frags wrap around
From: Eric Dumazet @ 2011-08-22 12:44 UTC (permalink / raw)
  To: Sathya.Perla; +Cc: netdev
In-Reply-To: <3367B80B08154D42A3B2BC708B5D41F642D8EDB85A@EXMAIL.ad.emulex.com>

Le lundi 22 août 2011 à 05:15 -0700, Sathya.Perla@Emulex.Com a écrit :
> >-----Original Message-----
> >From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> >Sent: Monday, August 22, 2011 5:18 PM
> >
> >Le lundi 22 août 2011 à 13:43 +0200, Eric Dumazet a écrit :
> >
> >> This adds a race for lockless SNMP readers (in be_get_stats64())
> >>
> >> They can see the 32bit value going backward.
> >>
> >> You have to be very careful to read the *acc once, and write it once.
> >
> >The "write once" is the only requirement.
> >
> >Something like :
> >
> >u32 newval = hi(*acc) + val;
> >
> >if (wrapped)
> >	newval += 65536;
> >ACCESS_ONCE(*acc) = newval;
> >
> 
> Eric,
> 
> I'm wondering why you'd need ACCESS_ONCE() here? Wouldn't using the temp variable (newval) as you suggested,
> ensure that the reader doesn't see a half-baked value?
> 

If you write :

u32 newval = hi(*acc) + val;
if (wrapped)
	newval += 65536;
*acc = newval;


C compiler (gcc) is free to eliminate the temp variable and to generate
same code than :

*acc = hi(*acc) + val;
if (wrapped)
	*acc += 65536;

ACCESS_ONCE() here is the clean way (and self explanatory/documented) to
keep compiler to write to *acc twice.




^ permalink raw reply

* Re: net_device leak after "bridge: allow creating bridge devices with netlink"?
From: Eric Dumazet @ 2011-08-22 12:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: shemminger, davem, netdev
In-Reply-To: <4E525ADA020000780005278A@nat28.tlf.novell.com>

Le lundi 22 août 2011 à 12:34 +0100, Jan Beulich a écrit :
> Stephen,
> 
> directly returning the result of register_netdev() from br_add_bridge()
> would - to me - appear to be leaking "dev" in case of failure. Or am I
> overlooking some implicit mechanism by which this would get freed?
> 
> Thanks, Jan

You're 100 % right, there is a leak here.

(But "ip" or "brctl" commands first check link already exist before
trying create it, so you need to tweak them to actually hit this bug)

We have to add a free_netdev() call to free the net_device.



diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 2cdf007..e738154 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -231,6 +231,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
 int br_add_bridge(struct net *net, const char *name)
 {
 	struct net_device *dev;
+	int res;
 
 	dev = alloc_netdev(sizeof(struct net_bridge), name,
 			   br_dev_setup);
@@ -240,7 +241,10 @@ int br_add_bridge(struct net *net, const char *name)
 
 	dev_net_set(dev, net);
 
-	return register_netdev(dev);
+	res = register_netdev(dev);
+	if (res)
+		free_netdev(dev);
+	return res;
 }
 
 int br_del_bridge(struct net *net, const char *name)



^ permalink raw reply related

* (unknown)
From: sohanes @ 2011-08-22 13:18 UTC (permalink / raw)




^ permalink raw reply

* Re: [PATCH] fix pushed_seq in keepalive / zero window probe timer
From: Li Yu @ 2011-08-22 13:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org, davem, ilpo.jarvinen
In-Reply-To: <1314012322.2307.16.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

2011/8/22 Eric Dumazet <eric.dumazet@gmail.com>:
> Le lundi 22 août 2011 à 14:57 +0800, Li Yu a écrit :
>> In tcp_write_wakeup(), we may split the probe segment since send window or mss is larger than it.
>> so I think that we should update tp->pushed_seq after tcp_fragment(), is it right? thanks.
>>
>
> I am not sure what you want to fix.
>
> Fact is the skb has the TCPHDR_PSH, even if tcp_transmit_skb() (or
> tcp_fragment()) is/are not able to clone/split it.
>
>

Indeed, you are right. The goal of this patch is to fix tp->pushed_seq
may point to unsent byte seq number.

>> Signed-off-by: Li Yu <raise.sail@gmail.com>
>> CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>> CC: David S. Miller <davem@davemloft.net>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 882e0b0..659a71f 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -2784,9 +2784,6 @@ int tcp_write_wakeup(struct sock *sk)
>>               unsigned int mss = tcp_current_mss(sk);
>>               unsigned int seg_size = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
>>
>> -             if (before(tp->pushed_seq, TCP_SKB_CB(skb)->end_seq))
>> -                     tp->pushed_seq = TCP_SKB_CB(skb)->end_seq;
>> -
>>               /* We are probing the opening of a window
>>                * but the window size is != 0
>>                * must have been a result SWS avoidance ( sender )
>> @@ -2803,8 +2800,11 @@ int tcp_write_wakeup(struct sock *sk)
>>               TCP_SKB_CB(skb)->flags |= TCPHDR_PSH;
>>               TCP_SKB_CB(skb)->when = tcp_time_stamp;
>>               err = tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
>> -             if (!err)
>> +             if (!err) {
>>                       tcp_event_new_data_sent(sk, skb);
>> +                     if (before(tp->pushed_seq, TCP_SKB_CB(skb)->end_seq))
>> +                             tp->pushed_seq = TCP_SKB_CB(skb)->end_seq;
>> +             }
>>               return err;
>>       } else {
>>               if (between(tp->snd_up, tp->snd_una + 1, tp->snd_una + 0xFFFF))
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>

^ permalink raw reply

* [PATCH] stmmac: remove the STBus bridge setting from the GMAC code
From: Giuseppe CAVALLARO @ 2011-08-22 13:58 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro

This patch removes a piece of code (actually commented)
only useful for some ST platforms in the past.

This kind of setting now can be done by using the platform
callbacks provided in linux/stmmac.h (see the stmmac.txt for
further details).

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/stmmac/dwmac1000_core.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/stmmac/dwmac1000_core.c b/drivers/net/stmmac/dwmac1000_core.c
index 0f63b3c..eea184a 100644
--- a/drivers/net/stmmac/dwmac1000_core.c
+++ b/drivers/net/stmmac/dwmac1000_core.c
@@ -37,9 +37,6 @@ static void dwmac1000_core_init(void __iomem *ioaddr)
 	value |= GMAC_CORE_INIT;
 	writel(value, ioaddr + GMAC_CONTROL);
 
-	/* STBus Bridge Configuration */
-	/*writel(0xc5608, ioaddr + 0x00007000);*/
-
 	/* Freeze MMC counters */
 	writel(0x8, ioaddr + GMAC_MMC_CTRL);
 	/* Mask GMAC interrupts */
-- 
1.7.4.4


^ permalink raw reply related

* Re: [RFC] bridge: add netfilter hook for forwarding 802.1D group addresses
From: David Lamparter @ 2011-08-22 13:59 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Lamparter, Nick Carter, Ed Swierk, netdev, bridge,
	netfilter-devel, David Miller
In-Reply-To: <20110819135810.1a529ab2@nehalam.ftrdhcpuser.net>

On Fri, Aug 19, 2011 at 01:58:10PM -0700, Stephen Hemminger wrote:
> The IEEE standard expects that link local multicast packets will not
> be forwarded by a bridge. But there are cases like 802.1X which may
> require that packets be forwarded. For maximum flexibilty implement
> this via netfilter.
> 
> The netfilter chain is slightly different from other chains in that
> if packet is ACCEPTED by the chain, it means it should be forwarded.
> And if the packet verdict result is DROP, the packet is processed
> as a local packet.

Exactly this functionality already exists by way of the BROUTING chain
in the broute table. Currently, link-local packets are hardcodedly
treated as local before they even reach that chain. Nick's patch, in
conjunction with BROUTING, provides exactly what you're trying to do.

Now, without bridge netfilter, your patch becomes rather useless while
Nick's patch still allows per-group (and therefore per-protocol)
control.

Further, Nick's patch is considerably less intrusive.

I would therefore ask for Nick's patch to be merged.


-David


P.S.: this whole issue is starting to get rather annoying

^ permalink raw reply

* [PATCH 4/4] drivers/net/wireless/mwifiex/scan.c: test the just-initialized value
From: Julia Lawall @ 2011-08-22 14:00 UTC (permalink / raw)
  To: Bing Zhao
  Cc: kernel-janitors-u79uwXL29TY76Z2rM5mHXA, John W. Linville,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

From: Julia Lawall <julia-dAYI7NvHqcQ@public.gmane.org>

Test the just-initialized value rather than some other one.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier x,y,f!={PTR_ERR,ERR_PTR,ERR_CAST};
statement S;
@@

x = f(...);
(
if (\(x == NULL\|IS_ERR(x)\)) S
|
*if (\(y == NULL\|IS_ERR(y)\))
 { ... when != x
   return ...; }
)
// </smpl>

Signed-off-by: Julia Lawall <julia-dAYI7NvHqcQ@public.gmane.org>

---
 drivers/net/wireless/mwifiex/scan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mwifiex/scan.c b/drivers/net/wireless/mwifiex/scan.c
index b28241c..d3111c9 100644
--- a/drivers/net/wireless/mwifiex/scan.c
+++ b/drivers/net/wireless/mwifiex/scan.c
@@ -1480,7 +1480,7 @@ mwifiex_update_curr_bss_params(struct mwifiex_private *priv,
 		return -ENOMEM;
 	}
 	beacon_ie = kzalloc(ie_len, GFP_KERNEL);
-	if (!bss_desc) {
+	if (!beacon_ie) {
 		dev_err(priv->adapter->dev, " failed to alloc bss_desc\n");
 		return -ENOMEM;
 	}

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH 4/4] drivers/net/wireless/mwifiex/scan.c: test the just-initialized value
From: Pierre Louis Aublin @ 2011-08-22 14:11 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Bing Zhao, kernel-janitors, John W. Linville, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <1314021636-11528-4-git-send-email-julia@diku.dk>

Hello all

On 08/22/2011 04:00 PM, Julia Lawall wrote:
> -	if (!bss_desc) {
> +	if (!beacon_ie) {
>   		dev_err(priv->adapter->dev, " failed to alloc bss_desc\n");
Shouldn't we also modify the error message, from "failed to alloc 
bss_desc" to "failed to alloc beacon_ie" ?

Sincerely
Pierre Louis Aublin

^ permalink raw reply

* Re: [PATCH 4/4] drivers/net/wireless/mwifiex/scan.c: test the just-initialized value
From: Julia Lawall @ 2011-08-22 14:12 UTC (permalink / raw)
  To: Pierre Louis Aublin
  Cc: Bing Zhao, kernel-janitors, John W. Linville, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <4E52638C.8080407@inria.fr>

On Mon, 22 Aug 2011, Pierre Louis Aublin wrote:

> Hello all
> 
> On 08/22/2011 04:00 PM, Julia Lawall wrote:
> > -	if (!bss_desc) {
> > +	if (!beacon_ie) {
> >     dev_err(priv->adapter->dev, " failed to alloc bss_desc\n");
> Shouldn't we also modify the error message, from "failed to alloc bss_desc" to
> "failed to alloc beacon_ie" ?

Sure :)

julia

^ permalink raw reply

* [PATCH 4/4] drivers/net/wireless/mwifiex/scan.c: test the just-initialized value
From: Julia Lawall @ 2011-08-22 14:16 UTC (permalink / raw)
  To: Pierre Louis Aublin
  Cc: Bing Zhao, kernel-janitors-u79uwXL29TY76Z2rM5mHXA,
	John W. Linville, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4E52638C.8080407-MZpvjPyXg2s@public.gmane.org>

From: Julia Lawall <julia-dAYI7NvHqcQ@public.gmane.org>

Test the just-initialized value rather than some other one.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier x,y,f!={PTR_ERR,ERR_PTR,ERR_CAST};
statement S;
@@

x = f(...);
(
if (\(x == NULL\|IS_ERR(x)\)) S
|
*if (\(y == NULL\|IS_ERR(y)\))
 { ... when != x
   return ...; }
)
// </smpl>

Signed-off-by: Julia Lawall <julia-dAYI7NvHqcQ@public.gmane.org>

---
 drivers/net/wireless/mwifiex/scan.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/scan.c b/drivers/net/wireless/mwifiex/scan.c
index b28241c..37ca2f9 100644
--- a/drivers/net/wireless/mwifiex/scan.c
+++ b/drivers/net/wireless/mwifiex/scan.c
@@ -1480,8 +1480,8 @@ mwifiex_update_curr_bss_params(struct mwifiex_private *priv,
 		return -ENOMEM;
 	}
 	beacon_ie = kzalloc(ie_len, GFP_KERNEL);
-	if (!bss_desc) {
-		dev_err(priv->adapter->dev, " failed to alloc bss_desc\n");
+	if (!beacon_ie) {
+		dev_err(priv->adapter->dev, " failed to alloc beacon_ie\n");
 		return -ENOMEM;
 	}
 	memcpy(beacon_ie, ie_buf, ie_len);
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [omega-g1:11072] Re: [PATCH] net: configurable sysctl parameter  "net.core.tcp_lowat" for sk_stream_min_wspace()
From: Hagen Paul Pfeifer @ 2011-08-22 14:21 UTC (permalink / raw)
  To: jun.kondo
  Cc: David Miller, linux-kernel, omega-g1, notsuki, motokazu.kozaki,
	htaira, netdev, tomohiko.takahashi, kotaro.sakai, ken.sugawara
In-Reply-To: <4E51A3F0.5010500@ctc-g.co.jp>


On Mon, 22 Aug 2011 09:33:52 +0900, "Jun.Kondo" wrote:

> By using this patch, we want to prevent "timeout occured over the

network

> that is low throughput but available".

> 

> But in the current implementation, both blocking and non-blocking,

> user processes can't recognize the reason in detail

> when failed to write to socket buffer, we think.



For your application it should not matter WHY the data can be written to

the peer. It can be happened that the peer close the window, some

scheduling bottleneck or whatever else. A blocking socket means for you

that some data is in the pipe, waiting for transmit. This is the knowledge

that you require, and you should deal with it. A blocking socket does not

mean FAILED, a failure is returned via ECONNRESET or otherwise. So

everything is fine when your socket blocks. Probably you should adjust your

Apache timeouts or other parts of the program logic.



> As stated above, we think it is difficult for user processes to handle

> timeout of writing socket buffer,

> when wmem is configured large value.(to ensure high throughput over the

> high ralency network, like 3G).



No, you should adjust your code and account that the socket has data in

the pipe. That's all.



Changing tcp_lowat

^ permalink raw reply

* Re: [OT]Any open source community in wireless network?
From: hz hanks @ 2011-08-22 14:25 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Sven-Haegar Koch, netdev
In-Reply-To: <20110821114041.c9bae50d.rdunlap@xenotime.net>

Thank you for your advices!

But these communities still concentrate on the network applications in
linux. Is there any community mainly about the network itself, such as
how mac layer organized and its protocols?

Thanks very much!

2011/8/22 Randy Dunlap <rdunlap@xenotime.net>:
> On Sun, 21 Aug 2011 19:20:32 +0200 (CEST) Sven-Haegar Koch wrote:
>
>> On Sun, 21 Aug 2011, hz hanks wrote:
>>
>> > I am looking for some open source communities in wireless network.
>> > This mailing list mainly concentrated on computer networks such as
>> > TCP, but my  interests are mainly on embedded wireless system and
>> > mobile phone system. So is there any kind person could provide me with
>> > some open source communities which mainly concentrate on those topics?
>> > Thanks a lot.
>>
>> Like:
>> linux-wireless@vger.kernel.org  (linux wireless driver development)
>> http://www.openwrt.org  (embedded linux distribution, for accesspoints
>> and more)
>> http://www.android.com  (smartphone foobar)
>
> http://wireless.kernel.org/  (Linux Wireless wiki)
>
> ---
> ~Randy
>

^ permalink raw reply

* Re: [PATCH 4/4] drivers/net/wireless/mwifiex/scan.c: test the just-initialized value
From: Larry Finger @ 2011-08-22 15:19 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Bing Zhao, kernel-janitors, John W. Linville, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <1314021636-11528-4-git-send-email-julia@diku.dk>

On 08/22/2011 09:00 AM, Julia Lawall wrote:
> From: Julia Lawall<julia@diku.dk>
>
> Test the just-initialized value rather than some other one.
>
> The semantic match that finds this problem is as follows:
> (http://coccinelle.lip6.fr/)
>
> //<smpl>
> @r@
> identifier x,y,f!={PTR_ERR,ERR_PTR,ERR_CAST};
> statement S;
> @@
>
> x = f(...);
> (
> if (\(x == NULL\|IS_ERR(x)\)) S
> |
> *if (\(y == NULL\|IS_ERR(y)\))
>   { ... when != x
>     return ...; }
> )
> //</smpl>
>
> Signed-off-by: Julia Lawall<julia@diku.dk>
>
> ---
>   drivers/net/wireless/mwifiex/scan.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/mwifiex/scan.c b/drivers/net/wireless/mwifiex/scan.c
> index b28241c..d3111c9 100644
> --- a/drivers/net/wireless/mwifiex/scan.c
> +++ b/drivers/net/wireless/mwifiex/scan.c
> @@ -1480,7 +1480,7 @@ mwifiex_update_curr_bss_params(struct mwifiex_private *priv,
>   		return -ENOMEM;
>   	}
>   	beacon_ie = kzalloc(ie_len, GFP_KERNEL);
> -	if (!bss_desc) {
> +	if (!beacon_ie) {
>   		dev_err(priv->adapter->dev, " failed to alloc bss_desc\n");
>   		return -ENOMEM;
>   	}

The error message should also get the bss_desc => beacon_ie chang.

Larry

^ permalink raw reply

* Re: [PATCH 4/4] drivers/net/wireless/mwifiex/scan.c: test the just-initialized value
From: Julia Lawall @ 2011-08-22 15:28 UTC (permalink / raw)
  To: Larry Finger
  Cc: Bing Zhao, kernel-janitors, John W. Linville, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <4E527394.4080900@lwfinger.net>

On Mon, 22 Aug 2011, Larry Finger wrote:

> On 08/22/2011 09:00 AM, Julia Lawall wrote:
> > From: Julia Lawall<julia@diku.dk>
> >
> > Test the just-initialized value rather than some other one.
> >
> > The semantic match that finds this problem is as follows:
> > (http://coccinelle.lip6.fr/)
> >
> > //<smpl>
> > @r@
> > identifier x,y,f!={PTR_ERR,ERR_PTR,ERR_CAST};
> > statement S;
> > @@
> >
> > x = f(...);
> > (
> > if (\(x == NULL\|IS_ERR(x)\)) S
> > |
> > *if (\(y == NULL\|IS_ERR(y)\))
> >   { ... when != x
> >     return ...; }
> > )
> > //</smpl>
> >
> > Signed-off-by: Julia Lawall<julia@diku.dk>
> >
> > ---
> >   drivers/net/wireless/mwifiex/scan.c |    2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/wireless/mwifiex/scan.c
> > b/drivers/net/wireless/mwifiex/scan.c
> > index b28241c..d3111c9 100644
> > --- a/drivers/net/wireless/mwifiex/scan.c
> > +++ b/drivers/net/wireless/mwifiex/scan.c
> > @@ -1480,7 +1480,7 @@ mwifiex_update_curr_bss_params(struct mwifiex_private
> > *priv,
> >    	return -ENOMEM;
> >    }
> >    beacon_ie = kzalloc(ie_len, GFP_KERNEL);
> > -	if (!bss_desc) {
> > +	if (!beacon_ie) {
> >     dev_err(priv->adapter->dev, " failed to alloc bss_desc\n");
> >     return -ENOMEM;
> >    }
> 
> The error message should also get the bss_desc => beacon_ie chang.

Thanks.  Pierre Louis Aublin made the same observation and I resumitted 
the patch.

julia

^ permalink raw reply

* Miscalculated TCP ACK ?
From: Gabriel Beddingfield @ 2011-08-22 15:34 UTC (permalink / raw)
  To: netdev

I'm having trouble with a particular server via HTTPS.  It appears
that my local linux machines are sending incorrect ACK.  However, I
don't have enough expertise to know for sure.

Using wireshark, the server sends:

    Transmission Control Protocol, Src Port: https (443), Dst Port:
36015 (36015), Seq: 27658, Ack: 827, Len: 18

Local machine replies:

    Transmission Control Protocol, Src Port: 36015 (36015), Dst Port:
https (443), Seq: 827, Ack: 27677, Len: 0

It appears to me that the ACK is off-by-one (should have been 27676).
The server replies again with retransmissions.  This is with several
kernel versions on local machines, ranging from 2.6.31 to 2.6.38.

Can anyone shed insight into what's happening?  This looks like a bug to me.

Thanks,
Gabriel

^ permalink raw reply

* [PATCH 0/2]  sctp: patches for HEARTBEAT bug fix and improvement
From: Michio Honda @ 2011-08-22 15:44 UTC (permalink / raw)
  To: netdev; +Cc: Wei Yongjun

>From e7cbeb4d796fec8351ede4affbd4269346639db9 Mon Sep 17 00:00:00 2001
From: Michio Honda <micchie@sfc.wide.ad.jp>
Date: Tue, 23 Aug 2011 00:22:47 +0900
Subject: [PATCH 0/2] sctp: patches for HEARTBEAT bug fix and improvement

Series of 2 patches for bug fix and improvement around HEARTBEAT after the 
ASCONF event

Michio Honda (2):
  sctp: HEARTBEAT negotiation after ASCONF
  sctp: Bundle HEAERTBEAT into ASCONF_ACK

 include/net/sctp/structs.h |    1 +
 net/sctp/associola.c       |    1 +
 net/sctp/outqueue.c        |    4 ++++
 net/sctp/sm_make_chunk.c   |    1 +
 net/sctp/sm_statefuns.c    |    5 +++++
 5 files changed, 12 insertions(+), 0 deletions(-)

-- 
1.7.3.2



^ permalink raw reply

* [PATCH 1/2] sctp: HEARTBEAT negotiation after ASCONF
From: Michio Honda @ 2011-08-22 15:44 UTC (permalink / raw)
  To: netdev; +Cc: Wei Yongjun

>From c2f008caef8deb7dbb4ce629ec6b3c096bceddd9 Mon Sep 17 00:00:00 2001
From: Michio Honda <micchie@sfc.wide.ad.jp>
Date: Thu, 16 Jun 2011 10:54:23 +0900
Subject: [PATCH 1/2] sctp: HEARTBEAT negotiation after ASCONF

This patch fixes BUG that the ASCONF receiver transmits DATA chunks
to the newly added UNCONFIRMED destination.

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
---
 net/sctp/outqueue.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index a6d27bf..bcecdba 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -917,6 +917,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 		 * current cwnd).
 		 */
 		if (!list_empty(&q->retransmit)) {
+			if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED) 
+				goto sctp_flush_out;
 			if (transport == asoc->peer.retran_path)
 				goto retran;
 
@@ -989,6 +991,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 			    ((new_transport->state == SCTP_INACTIVE) ||
 			     (new_transport->state == SCTP_UNCONFIRMED)))
 				new_transport = asoc->peer.active_path;
+			if (new_transport->state == SCTP_UNCONFIRMED) 
+				continue;
 
 			/* Change packets if necessary.  */
 			if (new_transport != transport) {
-- 
1.7.3.2



^ permalink raw reply related

* [PATCH 2/2] sctp: Bundle HEAERTBEAT into ASCONF_ACK
From: Michio Honda @ 2011-08-22 15:44 UTC (permalink / raw)
  To: netdev; +Cc: Wei Yongjun

>From e7cbeb4d796fec8351ede4affbd4269346639db9 Mon Sep 17 00:00:00 2001
From: Michio Honda <micchie@sfc.wide.ad.jp>
Date: Thu, 16 Jun 2011 17:14:34 +0900
Subject: [PATCH 2/2] sctp: Bundle HEAERTBEAT into ASCONF_ACK

With this patch a HEARTBEAT chunk is bundled into the ASCONF-ACK
for ADD IP ADDRESS, confirming the new destination as quickly as
possible.

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
---
 include/net/sctp/structs.h |    1 +
 net/sctp/associola.c       |    1 +
 net/sctp/sm_make_chunk.c   |    1 +
 net/sctp/sm_statefuns.c    |    5 +++++
 4 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index f7d9c3f..e90e7a9 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1915,6 +1915,7 @@ struct sctp_association {
 	__u32 addip_serial;
 	union sctp_addr *asconf_addr_del_pending;
 	int src_out_of_asoc_ok;
+	struct sctp_transport *new_transport;
 
 	/* SCTP AUTH: list of the endpoint shared keys.  These
 	 * keys are provided out of band by the user applicaton
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index dc16b90..152b5b3 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -282,6 +282,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 		asoc->peer.asconf_capable = 1;
 	asoc->asconf_addr_del_pending = NULL;
 	asoc->src_out_of_asoc_ok = 0;
+	asoc->new_transport = NULL;
 
 	/* Create an input queue.  */
 	sctp_inq_init(&asoc->base.inqueue);
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 81db4e3..0121e0a 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3015,6 +3015,7 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
 		/* Start the heartbeat timer. */
 		if (!mod_timer(&peer->hb_timer, sctp_transport_timeout(peer)))
 			sctp_transport_hold(peer);
+		asoc->new_transport = peer;
 		break;
 	case SCTP_PARAM_DEL_IP:
 		/* ADDIP 4.3 D7) If a request is received to delete the
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 49b847b..73d14fc 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3612,6 +3612,11 @@ sctp_disposition_t sctp_sf_do_asconf(const struct sctp_endpoint *ep,
 	 */
 	asconf_ack->dest = chunk->source;
 	sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(asconf_ack));
+	if (asoc->new_transport) {
+	        sctp_sf_heartbeat(ep, asoc, type, asoc->new_transport,
+                    commands);
+		((struct sctp_association *)asoc)->new_transport = NULL;
+	}
 
 	return SCTP_DISPOSITION_CONSUME;
 }
-- 
1.7.3.2



^ permalink raw reply related

* Re: net_device leak after "bridge: allow creating bridge devices with netlink"?
From: Stephen Hemminger @ 2011-08-22 15:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jan Beulich, davem, netdev
In-Reply-To: <1314017401.2307.37.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Mon, 22 Aug 2011 14:50:01 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le lundi 22 août 2011 à 12:34 +0100, Jan Beulich a écrit :
> > Stephen,
> > 
> > directly returning the result of register_netdev() from br_add_bridge()
> > would - to me - appear to be leaking "dev" in case of failure. Or am I
> > overlooking some implicit mechanism by which this would get freed?
> > 
> > Thanks, Jan
> 
> You're 100 % right, there is a leak here.
> 
> (But "ip" or "brctl" commands first check link already exist before
> trying create it, so you need to tweak them to actually hit this bug)
> 
> We have to add a free_netdev() call to free the net_device.
> 
> 
> 
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 2cdf007..e738154 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -231,6 +231,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
>  int br_add_bridge(struct net *net, const char *name)
>  {
>  	struct net_device *dev;
> +	int res;
>  
>  	dev = alloc_netdev(sizeof(struct net_bridge), name,
>  			   br_dev_setup);
> @@ -240,7 +241,10 @@ int br_add_bridge(struct net *net, const char *name)
>  
>  	dev_net_set(dev, net);
>  
> -	return register_netdev(dev);
> +	res = register_netdev(dev);
> +	if (res)
> +		free_netdev(dev);
> +	return res;
>  }
>  
>  int br_del_bridge(struct net *net, const char *name)

Missing Signed-off-by:?

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

^ permalink raw reply

* Re: Bridge stays down until a port is added
From: Stephen Hemminger @ 2011-08-22 15:57 UTC (permalink / raw)
  To: Marc Haber; +Cc: netdev, Sven-Haegar Koch
In-Reply-To: <20110821121305.GA22232@torres.zugschlus.de>

On Sun, 21 Aug 2011 14:13:05 +0200
Marc Haber <mh+netdev@zugschlus.de> wrote:

> On Sat, Aug 20, 2011 at 09:30:59AM -0700, Stephen Hemminger wrote:
> > The problem is that IPv6 Duplicate Address Detection needs to
> > work. This is not a simple problem.  If the bridge asserted
> > carrier with no ports then:
> > 
> > 1. IPv6 address assigned and IPv6 decides it is okay.
> > 2. Port added later
> > 3. Another system has the same address.
> > *broke*
> 
> Same situation when a system-to-system-link is added after bringing up
> an interface. I agree that the issue is not an issue if a real switch
> is being used.
> 
> > If you want to avoid DAD, then you can configure disable DAD
> > by setting /proc/sys/net/ipv6/conf/br0/accept_dad to 0
> 
> I'd like to avoid that.
> 
> 2001:db8::1
> 
> Would it acceptable (and clean!) to have:
> 
> eth0: 2001:db8:1::100/64 default gw to the internet is 2001:db8:1::1
> lo: 127.0.0.1/8, ::1/128, 2001:db8:2::100/64
> br0: 2001:db8:2::1/64 (being default gw for the VMs connected to br0,
> routing 2001:db8:2::/64 to the Internet)
> 
> Note 2001:db8:2::/64 being used both on lo and br0, with 2001:db8:2::100 meant
> to be reachable from the Internet even if no VM is already up. my
> hostname will be A-Recorded to 2001:db8:2::100 with proper reverse
> DNS. The background for doing so is that I cannot control the reverse
> DNS for the IP addresses inside 200a:db8:1::/64 in a lot of my setups
> (for example, if my IPv6 comes in via Sixxs).
>

No, DAD config is per interface and addrconf doesn't know anything
about your topology.

^ permalink raw reply

* Re: net_device leak after "bridge: allow creating bridge devices with netlink"?
From: Eric Dumazet @ 2011-08-22 16:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jan Beulich, davem, netdev
In-Reply-To: <20110822085532.2ff4479c@nehalam.ftrdhcpuser.net>

Le lundi 22 août 2011 à 08:55 -0700, Stephen Hemminger a écrit :
> On Mon, 22 Aug 2011 14:50:01 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Le lundi 22 août 2011 à 12:34 +0100, Jan Beulich a écrit :
> > > Stephen,
> > > 
> > > directly returning the result of register_netdev() from br_add_bridge()
> > > would - to me - appear to be leaking "dev" in case of failure. Or am I
> > > overlooking some implicit mechanism by which this would get freed?
> > > 
> > > Thanks, Jan
> > 
> > You're 100 % right, there is a leak here.
> > 
> > (But "ip" or "brctl" commands first check link already exist before
> > trying create it, so you need to tweak them to actually hit this bug)
> > 
> > We have to add a free_netdev() call to free the net_device.
> > 
> > 
> > 
> > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> > index 2cdf007..e738154 100644
> > --- a/net/bridge/br_if.c
> > +++ b/net/bridge/br_if.c
> > @@ -231,6 +231,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
> >  int br_add_bridge(struct net *net, const char *name)
> >  {
> >  	struct net_device *dev;
> > +	int res;
> >  
> >  	dev = alloc_netdev(sizeof(struct net_bridge), name,
> >  			   br_dev_setup);
> > @@ -240,7 +241,10 @@ int br_add_bridge(struct net *net, const char *name)
> >  
> >  	dev_net_set(dev, net);
> >  
> > -	return register_netdev(dev);
> > +	res = register_netdev(dev);
> > +	if (res)
> > +		free_netdev(dev);
> > +	return res;
> >  }
> >  
> >  int br_del_bridge(struct net *net, const char *name)
> 
> Missing Signed-off-by:?
> 
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Well, its only a direct coding of Jan report ;)

Thanks

[PATCH] bridge: fix a possible net_device leak

Jan Beulich reported a possible net_device leak in bridge code after
commit bb900b27a2f4 (bridge: allow creating bridge devices with netlink)

Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
---
 net/bridge/br_if.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 2cdf007..e738154 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -231,6 +231,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
 int br_add_bridge(struct net *net, const char *name)
 {
 	struct net_device *dev;
+	int res;
 
 	dev = alloc_netdev(sizeof(struct net_bridge), name,
 			   br_dev_setup);
@@ -240,7 +241,10 @@ int br_add_bridge(struct net *net, const char *name)
 
 	dev_net_set(dev, net);
 
-	return register_netdev(dev);
+	res = register_netdev(dev);
+	if (res)
+		free_netdev(dev);
+	return res;
 }
 
 int br_del_bridge(struct net *net, const char *name)



^ permalink raw reply related

* Re: [PATCH 0/2] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods (v3)
From: Ben Hutchings @ 2011-08-22 16:17 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev
In-Reply-To: <20110822002641.GA2550@neilslaptop.think-freely.org>

On Sun, 2011-08-21 at 20:27 -0400, Neil Horman wrote:
> On Wed, Aug 17, 2011 at 04:07:17PM +0100, Ben Hutchings wrote:
> > On Tue, 2011-07-26 at 12:05 -0400, Neil Horman wrote:
> > > Ok, after considering all your comments, Dave suggested this as an alternate
> > > approach:
> > > 
> > > 1) We create a new priv_flag, IFF_SKB_TX_SHARED, to identify drivers capable of
> > > handling shared skbs.  Default is to not set this flag
> > > 
> > > 2) Modify ether_setup to enable this flag, under the assumption that any driver
> > > calling this  function is initalizing a real ethernet device and as such can
> > > handle shared skbs since they don't tend to store state in the skb struct.
> > > Pktgen can then query this flag when a user script attempts to issue the
> > > clone_skb command and decide if it is to be alowed or not.
> > [...]
> > 
> > A bunch of Ethernet drivers do skb_pad() or skb_padto() in their
> > ndo_start_xmit implementations, either to avoid hardware bugs or because
> > the MAC doesn't automatically pad to the minimum frame length.  This
> > presumably means they can't generally handle shared skbs, though in the
> > specific case of pktgen it should be safe as long as a single skb is not
> > submitted by multiple threads at once.
> > 
> Agreed, given that pktgen is doing skb sharing in a serialized manner (i.e. one
> thread of execution increasing skb->users rather than in multiple threads), the
> skb_pad[to] cases are safe.  Are there cases in which shared skbs are
> transmitted in parallel threads that we need to check for?

Not that I'm aware of.  However, you have defined the flag to mean 'safe
to send shared skbs', and this is not generally true for those drivers.

So please decide whether:
a. The flag means 'safe to send shared skbs' (i.e. the TX path does not
modify the skb) and drivers which pad must clear it on their devices.
b. The flag means 'safe to send shared skbs in the way ptkgen does', and
the restrictions this places on the user and driver should be specified.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
From: Alexander Duyck @ 2011-08-22 16:30 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jeff Kirsher, davem, netdev, gospo
In-Reply-To: <1313935304.3142.22.camel@deadeye>

On 08/21/2011 07:01 AM, Ben Hutchings wrote:
> On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>>
>> This change makes it so that the TX work limit is now obsolete.  Instead of
>> using it we can instead rely on the NAPI budget for the number of packets
>> we should clean per interrupt.  The advantage to this approach is that it
>> results in a much more balanced work flow since the same number of RX and
>> TX packets should be cleaned per interrupts.
> [...]
>
> This seems kind of sensible, but it's not how Dave has been recommending
> people to account for TX work in NAPI.
>
> Ben.
>
I wasn't aware there was a recommended approach.  Could you tell me more 
about it?

As I stated in the patch description this approach works very well for 
me, especially in routing workloads since it typically keeps the TX 
clean-up in polling as long as the RX is in polling.

Thanks,

Alex

^ permalink raw reply

* Re: Miscalculated TCP ACK ?
From: Eric Dumazet @ 2011-08-22 16:38 UTC (permalink / raw)
  To: Gabriel Beddingfield; +Cc: netdev
In-Reply-To: <CAPbw_hwQSXtmFSxK49jDDgF2LuG8=cu5t=ZjQ_aO7=M9fm9xeA@mail.gmail.com>

Le lundi 22 août 2011 à 10:34 -0500, Gabriel Beddingfield a écrit :
> I'm having trouble with a particular server via HTTPS.  It appears
> that my local linux machines are sending incorrect ACK.  However, I
> don't have enough expertise to know for sure.
> 
> Using wireshark, the server sends:
> 
>     Transmission Control Protocol, Src Port: https (443), Dst Port:
> 36015 (36015), Seq: 27658, Ack: 827, Len: 18
> 
> Local machine replies:
> 
>     Transmission Control Protocol, Src Port: 36015 (36015), Dst Port:
> https (443), Seq: 827, Ack: 27677, Len: 0
> 
> It appears to me that the ACK is off-by-one (should have been 27676).
> The server replies again with retransmissions.  This is with several
> kernel versions on local machines, ranging from 2.6.31 to 2.6.38.
> 
> Can anyone shed insight into what's happening?  This looks like a bug to me.

Not a bug if frame carried a FIN

18:32:16.177801 IP 10.150.51.213.52264 > 10.150.45.194.22: Flags [F.], seq 3312, ack 3241, win 412, length 0
18:32:16.178096 IP 10.150.45.194.22 > 10.150.51.213.52264: Flags [F.], seq 3241, ack 3313, win 203, length 0
18:32:16.256949 IP 10.150.51.213.52264 > 10.150.45.194.22: Flags [.], ack 3242, win 412, length 0



^ permalink raw reply

* Re: [RFC][PATCH] Failed neighbors attached to routes are not released
From: Eric Dumazet @ 2011-08-22 16:44 UTC (permalink / raw)
  To: Guy Yur; +Cc: netdev
In-Reply-To: <CAC67Hz-MpmPGX4c+DTCgXkfnKdinNCrc7xc9Q03=35t=LLnW+g@mail.gmail.com>

Le vendredi 19 août 2011 à 21:53 +0300, Guy Yur a écrit :
> Hi,
> 
> The issue I am seeing is with a neighbor used as a gateway in a route,
> when the neighbor gets nud FAILED it will not be removed from the
> neighbor cache.
> The reference count for the neighbor remains > 1
> when neigh_periodic_work() is called.
> 
> Issue noticed with IPv6 neighbors on Arch Linux with kernel 3.0.3
> kernel config includes CONFIG_IPV6_ROUTER_PREF
> 
> The problem affects routing when the neighbor loses connectivity and returns.
> 
> Scenario: Using a default route and a static route through different interfaces.
> When the neighbor gateway of the static route goes down the traffic
> will move to the default gateway as expected.
> Once the static route neighbor comes back up it is not asked for
> neighbor solicitation
> because the route is marked as FAILED and the traffic will continue to
> pass through the default gateway.
> 
> Steps to reproduce the route not being removed:
> 1. add an IPv6 address on an interface
> 2. add a route to a network through a gateway on the interface's network
> 3. make sure the gateway address is not reachable
> 4. ping6 a host in the route network
> 5. "ip -6 nei" will show the gateway neighbor as FAILED and it won't be released
> 
> Steps to reproduce the routing problem:
> 1. client and two gateway machines (A and B)
> 2. on the client define a static route through A and a default route through B
> 3. disconnect eth on A
> 4. ping6 a host in the network that should go through A
>    after a while the traffic will move through B which is the default route
> 5. reconnect eth on A
> 6. ping6 a host in the network again, the traffic will still go through B
>    "ip -6 nei" on the client will still show A as FAILED
> 
> 
> Patch to change the nud state to NONE if the reference count > 1
> allowing the neighbor to be released in a future pass.

I wonder why a 'future pass' is needed at all.

Shouldnt we immediately detect link becomes alive and force an immediate
flush at this point, before waiting a garbage collect timer ?


> 
> --- linux/net/core/neighbour.c.orig	2011-08-19 13:15:35.041524169 +0300
> +++ linux/net/core/neighbour.c	2011-08-19 18:19:07.271276096 +0300
> @@ -802,14 +802,16 @@ static void neigh_periodic_work(struct w
>  			if (time_before(n->used, n->confirmed))
>  				n->used = n->confirmed;
> 
> -			if (atomic_read(&n->refcnt) == 1 &&
> -			    (state == NUD_FAILED ||
> +			if ((state == NUD_FAILED ||
>  			     time_after(jiffies, n->used + n->parms->gc_staletime))) {
> -				*np = n->next;
> -				n->dead = 1;
> -				write_unlock(&n->lock);
> -				neigh_cleanup_and_release(n);
> -				continue;
> +				if (atomic_read(&n->refcnt) == 1) {
> +					*np = n->next;
> +					n->dead = 1;
> +					write_unlock(&n->lock);
> +					neigh_cleanup_and_release(n);
> +					continue;
> +				} else
> +					n->nud_state = NUD_NONE;
>  			}
>  			write_unlock(&n->lock);
> --



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox