Netdev List

* Re: [PATCH] fragment: add fast path
From: Eric Dumazet @ 2010-06-14  7:04 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <AANLkTimN1-uyAvpxgpno55XoyPzBptdZimz-MA-0MCWZ@mail.gmail.com>

> 
> Without this branch, prev needs to be initialized to NULL again (of
> course, we can avoid this by moving prev = NULL into the previous
> branch). next needs an assignment, and the check for an empty queue
> is duplicated, although the result is already known in the above
> branch. Sorry, but I can't see which path I slow down.
> 
>         prev = NULL;
>         for (next = qp->q.fragments; next != NULL; next = next->next) {
>                 if (FRAG_CB(next)->offset >= offset)
>                         break;  /* bingo! */
>                 prev = next;
>         }
> 

The concept of a 'fast path' has changed over the years. It used to be
about CPU instructions and cycles; it is now about the number of memory
transactions.

The only things we need to address are the cache lines we must bring
into CPU caches, and keeping the code short.

These days, one cache line miss costs more than one hundred instructions
that could have been executed during the CPU stall. CPU cycles are cheap
if the code is already in the instruction cache.

Adding a test to avoid entering a loop over an empty list (no fragment
is stored yet) just bloats the code, making it larger than necessary.

You don't need the else branch:

if (prev) {
	if (FRAG_CB(prev)->offset < offset) {
		next = NULL;
		goto found;
	}
} else {
	next = NULL;
	goto found;
}

Just write:

next = NULL;
if (prev && FRAG_CB(prev)->offset < offset)
	goto found;
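
A minimal userspace sketch of why the two forms place a fragment
identically. struct frag, struct pos, slow_scan() and both find_*()
helpers are hypothetical stand-ins for sk_buff, FRAG_CB() and the lookup
in ip_frag_queue(), not kernel code. With an empty queue the shorter
form simply falls through to a loop that runs zero times.

```c
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff plus its FRAG_CB() offset. */
struct frag {
	struct frag *next;
	int offset;
};

/* Insertion point: the new fragment goes between prev and next. */
struct pos {
	struct frag *prev;
	struct frag *next;
};

/* The pre-existing linear scan from the head of the list. */
static struct pos slow_scan(struct frag *head, int offset)
{
	struct pos p = { NULL, NULL };

	for (p.next = head; p.next != NULL; p.next = p.next->next) {
		if (p.next->offset >= offset)
			break;
		p.prev = p.next;
	}
	return p;
}

/* Lookup with the explicit else branch for an empty queue. */
static struct pos find_with_else(struct frag *head, struct frag *tail,
				 int offset)
{
	struct pos p;

	p.prev = tail;
	if (p.prev) {
		if (p.prev->offset < offset) {
			p.next = NULL;
			return p;		/* "goto found" */
		}
	} else {
		p.next = NULL;
		return p;			/* "goto found" */
	}
	return slow_scan(head, offset);
}

/* Shorter lookup: an empty queue falls through to the scan. */
static struct pos find_without_else(struct frag *head, struct frag *tail,
				    int offset)
{
	struct pos p;

	p.prev = tail;
	p.next = NULL;
	if (p.prev && p.prev->offset < offset)
		return p;			/* "goto found" */
	return slow_scan(head, offset);
}
```

Both helpers return the same prev/next pair for an empty queue, a tail
hit and a mid-list insert; the shorter one only drops a branch.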


* Re: [PATCH] fragment: add fast path
From: Eric Dumazet @ 2010-06-14  6:40 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1276493743.2448.41.camel@edumazet-laptop>


> I am not sure why we need the "struct inet_skb_parm h;" field in struct
> ipfrag_skb_cb...
> 
> I probably need to wake up this Monday morning?
> 
> Untested patch follows, only compiled.
> 

Oh well, I see the light now that I woke up ;)

> [PATCH] frags: Remove unnecessary bits
> 
> While trying to move 'offset' to the beginning of frag CB, I found
> the inet_skb_parm field was unused.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Wrong patch of course.

We need a comment, or better documentation, explaining why it's there.





* Re: [PATCH] fragment: add fast path
From: Mitchell Erblich @ 2010-06-14  6:19 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki
  Cc: David Miller, xiaosuo, kuznet, pekkas, jmorris, kaber, netdev
In-Reply-To: <4C15BAE5.6010900@linux-ipv6.org>

On Jun 13, 2010, at 10:15 PM, YOSHIFUJI Hideaki wrote:

> David Miller wrote:
>>> As the fragments are usually in order,
>> In what universe does this happen "usually"?
>> Linux has been outputting fragments in reverse order for more than 10
>> years.
>> I'm not applying this patch.
> 
> Dave, I know we've been sending in reverse order, of course.
> And, as far as I know, Linux seems to be the only implementation
> that sends fragments in reverse order.
> 
> This is the receiving side.  I think we should accept the fact
> that Linux is not the only implementation, no?
> 
> --yoshfuji

Group,

With respect to IPv4 and path MTU discovery, aren't we
unlikely to generate fragments?

With respect to IPv6, aren't fragments less likely
(intermediate nodes are not allowed to fragment;
see RFC 2460, section 4.5, Fragment Header), as I would
expect the source to use the path MTU?

Thus, IMO, a faster path is to support the largest jumbo
MTUs (MTU >= 9000: 16,110; 14,336; 10,240)
where possible for bulk data transfers, with
proper page orders.

Mitchell Erblich



* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  5:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1276493743.2448.41.camel@edumazet-laptop>

On Mon, Jun 14, 2010 at 1:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday, 14 June 2010 at 07:16 +0800, Changli Gao wrote:
>> add fast path
>>
>> As the fragments are usually in order, it is likely the new fragments are
>> at the end of the inet_frag_queue. In the fast path, we check if the skb at the
>> end of the inet_frag_queue is the prev we expect.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>> ----
>>  include/net/inet_frag.h |    1 +
>>  net/ipv4/ip_fragment.c  |   17 +++++++++++++++++
>>  net/ipv6/reassembly.c   |   16 ++++++++++++++++
>>  3 files changed, 34 insertions(+)
>> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
>> index 39f2dc9..16ff29a 100644
>> --- a/include/net/inet_frag.h
>> +++ b/include/net/inet_frag.h
>> @@ -20,6 +20,7 @@ struct inet_frag_queue {
>>       atomic_t                refcnt;
>>       struct timer_list       timer;      /* when will this queue expire? */
>>       struct sk_buff          *fragments; /* list of received fragments */
>> +     struct sk_buff          *fragments_tail;
>>       ktime_t                 stamp;
>>       int                     len;        /* total length of orig datagram */
>>       int                     meat;
>> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
>> index 75347ea..d8c36d4 100644
>> --- a/net/ipv4/ip_fragment.c
>> +++ b/net/ipv4/ip_fragment.c
>> @@ -317,6 +317,7 @@ static int ip_frag_reinit(struct ipq *qp)
>>       qp->q.len = 0;
>>       qp->q.meat = 0;
>>       qp->q.fragments = NULL;
>> +     qp->q.fragments_tail = NULL;
>>       qp->iif = 0;
>>
>>       return 0;
>> @@ -389,6 +390,16 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>>        * in the chain of fragments so far.  We must know where to put
>>        * this fragment, right?
>>        */
>> +     prev = qp->q.fragments_tail;
>> +     if (prev) {
>> +             if (FRAG_CB(prev)->offset < offset) {
>> +                     next = NULL;
>> +                     goto found;
>> +             }
>> +     } else {
>> +             next = NULL;
>
> How can this chunk be a win? The queue is empty anyway.
> You add tests and slow down the 'other path'.
>
>> +             goto found;
>> +     }

Without this branch, prev needs to be initialized to NULL again (of
course, we can avoid this by moving prev = NULL into the previous
branch). next needs an assignment, and the check for an empty queue
is duplicated, although the result is already known in the above
branch. Sorry, but I can't see which path I slow down.

        prev = NULL;
        for (next = qp->q.fragments; next != NULL; next = next->next) {
                if (FRAG_CB(next)->offset >= offset)
                        break;  /* bingo! */
                prev = next;
        }

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fragment: add fast path
From: Eric Dumazet @ 2010-06-14  5:35 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1276470995-21713-1-git-send-email-xiaosuo@gmail.com>

On Monday, 14 June 2010 at 07:16 +0800, Changli Gao wrote:
> add fast path
> 
> As the fragments are usually in order, it is likely the new fragments are
> at the end of the inet_frag_queue. In the fast path, we check if the skb at the
> end of the inet_frag_queue is the prev we expect.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  include/net/inet_frag.h |    1 +
>  net/ipv4/ip_fragment.c  |   17 +++++++++++++++++
>  net/ipv6/reassembly.c   |   16 ++++++++++++++++
>  3 files changed, 34 insertions(+)
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 39f2dc9..16ff29a 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -20,6 +20,7 @@ struct inet_frag_queue {
>  	atomic_t		refcnt;
>  	struct timer_list	timer;      /* when will this queue expire? */
>  	struct sk_buff		*fragments; /* list of received fragments */
> +	struct sk_buff		*fragments_tail;
>  	ktime_t			stamp;
>  	int			len;        /* total length of orig datagram */
>  	int			meat;
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 75347ea..d8c36d4 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -317,6 +317,7 @@ static int ip_frag_reinit(struct ipq *qp)
>  	qp->q.len = 0;
>  	qp->q.meat = 0;
>  	qp->q.fragments = NULL;
> +	qp->q.fragments_tail = NULL;
>  	qp->iif = 0;
>  
>  	return 0;
> @@ -389,6 +390,16 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  	 * in the chain of fragments so far.  We must know where to put
>  	 * this fragment, right?
>  	 */
> +	prev = qp->q.fragments_tail;
> +	if (prev) {
> +		if (FRAG_CB(prev)->offset < offset) {
> +			next = NULL;
> +			goto found;
> +		}
> +	} else {
> +		next = NULL;

How can this chunk be a win? The queue is empty anyway.
You add tests and slow down the 'other path'.

> +		goto found;
> +	}

Quite frankly, one easy way to speed things up would be to move 'offset'
from ipfrag_skb_cb close to the skb->next field, so that only one cache
line per frag is used during lookup.
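
A rough way to reason about that, as a sketch only: the structures
below are hypothetical stand-ins, not the real sk_buff or ipfrag_skb_cb
layout, and the 64-byte line size and line-aligned allocation are
assumptions. The helper just checks whether two byte offsets inside
one object share a cache line.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size; real hardware varies */

/* Hypothetical buffer: list pointer first, control block further in. */
struct toy_skb {
	struct toy_skb *next;
	char other_fields[40];	/* placeholder for unrelated members */
	char cb[48];		/* per-protocol control block */
};

/* Control block with 'offset' stored after a 32-byte header block. */
struct cb_offset_last {
	char h[32];		/* placeholder, not struct inet_skb_parm */
	int offset;
};

/* Control block with 'offset' moved to the front. */
struct cb_offset_first {
	int offset;
	char h[32];
};

/* True if byte positions a and b of a line-aligned object share a line. */
static bool same_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}

/* Position of 'offset' inside toy_skb for each control block layout. */
static size_t offset_pos_last(void)
{
	return offsetof(struct toy_skb, cb) +
	       offsetof(struct cb_offset_last, offset);
}

static size_t offset_pos_first(void)
{
	return offsetof(struct toy_skb, cb) +
	       offsetof(struct cb_offset_first, offset);
}
```

In this toy layout the list walk touches two lines per fragment when
'offset' sits behind the header block, and one when it comes first.
Whether the same holds for the real structures depends on their actual
field offsets.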

I am not sure why we need the "struct inet_skb_parm h;" field in struct
ipfrag_skb_cb...

I probably need to wake up this Monday morning?

Untested patch follows, only compiled.

[PATCH] frags: Remove unnecessary bits

While trying to move 'offset' to the beginning of frag CB, I found
the inet_skb_parm field was unused.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 75347ea..0f51ae0 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -55,7 +55,6 @@ static int sysctl_ipfrag_max_dist __read_mostly = 64;
 
 struct ipfrag_skb_cb
 {
-	struct inet_skb_parm	h;
 	int			offset;
 };
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 6d4292f..122e0be 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -57,7 +57,6 @@
 
 struct ip6frag_skb_cb
 {
-	struct inet6_skb_parm	h;
 	int			offset;
 };
 




* Re: [PATCH] fragment: add fast path
From: YOSHIFUJI Hideaki @ 2010-06-14  5:15 UTC (permalink / raw)
  To: David Miller
  Cc: xiaosuo, kuznet, pekkas, jmorris, kaber, netdev,
	YOSHIFUJI Hideaki
In-Reply-To: <20100613.181857.189703148.davem@davemloft.net>

David Miller wrote:
>> As the fragments are usually in order,
> 
> In what universe does this happen "usually"?
> 
> Linux has been outputting fragments in reverse order for more than 10
> years.
> 
> I'm not applying this patch.

Dave, I know we've been sending in reverse order, of course.
And, as far as I know, Linux seems to be the only implementation
that sends fragments in reverse order.

This is the receiving side.  I think we should accept the fact
that Linux is not the only implementation, no?

--yoshfuji


* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  2:55 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <AANLkTilf218t898mfK1jeIFtFBFE8OvAeoOIxQTnmEt3@mail.gmail.com>

On Mon, Jun 14, 2010 at 10:03 AM, Changli Gao <xiaosuo@gmail.com> wrote:
>
> Later I'll test Windows.
>

Windows also sends fragments in order.

10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 0, flags [+],
proto UDP (17), length 1452) 221.238.33.71.4194 > 202.113.29.4.8888:
UDP, length 32768
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 1432, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 2864, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 4296, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 5728, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 7160, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 8592, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 10024, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 11456, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 12888, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 14320, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.064453 IP (tos 0x0, ttl 128, id 35511, offset 15752, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 17184, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 18616, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 20048, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 21480, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 22912, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 24344, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 25776, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 27208, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 28640, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 30072, flags
[+], proto UDP (17), length 1452) 221.238.33.71 > 202.113.29.4: udp
10:40:29.066406 IP (tos 0x0, ttl 128, id 35511, offset 31504, flags
[none], proto UDP (17), length 1292) 221.238.33.71 > 202.113.29.4: udp
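
The offsets in the trace above follow from simple arithmetic, sketched
here under two assumptions: a 20-byte IPv4 header without options, and
a per-fragment payload rounded down to a multiple of 8 bytes. The
struct and function names are made up for illustration.

```c
#define IP_HDR_LEN	20	/* assumed: IPv4 header without options */
#define UDP_HDR_LEN	8

/* Illustrative summary of how one UDP datagram gets fragmented. */
struct frag_plan {
	unsigned int count;		/* number of fragments */
	unsigned int per_frag;		/* payload bytes per non-final fragment */
	unsigned int last_offset;	/* byte offset of the final fragment */
	unsigned int last_len;		/* IP total length of the final fragment */
};

/*
 * udp_payload: the "UDP, length" value printed by tcpdump.
 * ip_len:      the IP total length of each full-sized fragment.
 */
static struct frag_plan plan_udp_fragments(unsigned int udp_payload,
					   unsigned int ip_len)
{
	struct frag_plan p;
	unsigned int total = udp_payload + UDP_HDR_LEN;

	/* Fragment offsets are expressed in units of 8 bytes. */
	p.per_frag = (ip_len - IP_HDR_LEN) & ~7u;
	p.count = (total + p.per_frag - 1) / p.per_frag;
	p.last_offset = (p.count - 1) * p.per_frag;
	p.last_len = total - p.last_offset + IP_HDR_LEN;
	return p;
}
```

For the 32768-byte datagram with 1452-byte fragments this gives 23
fragments of 1432 payload bytes each, the last one at offset 31504 with
total length 1292, matching the capture.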

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  2:52 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki; +Cc: David Miller, kuznet, pekkas, jmorris, kaber, netdev
In-Reply-To: <4C1592CE.5030501@linux-ipv6.org>

On Mon, Jun 14, 2010 at 10:24 AM, YOSHIFUJI Hideaki
<yoshfuji@linux-ipv6.org> wrote:
> NIC?
>

Linux: e1000
Darwin: tun

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fragment: add fast path
From: YOSHIFUJI Hideaki @ 2010-06-14  2:24 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, kuznet, pekkas, jmorris, kaber, netdev,
	YOSHIFUJI Hideaki
In-Reply-To: <AANLkTilf218t898mfK1jeIFtFBFE8OvAeoOIxQTnmEt3@mail.gmail.com>

NIC?

--yoshfuji

(2010/06/14 11:03), Changli Gao wrote:
> On Mon, Jun 14, 2010 at 9:18 AM, David Miller <davem@davemloft.net> wrote:
>> From: Changli Gao <xiaosuo@gmail.com>
>> Date: Mon, 14 Jun 2010 07:16:35 +0800
>>
>>> As the fragments are usually in order,
>>
>> In what universe does this happen "usually"?
>>
>> Linux has been outputting fragments in reverse order for more than 10
>> years.
>>
>
> I have tested net-next-2.6 and Darwin, and found that both send
> fragments in order:
>
> Darwin:
>
> Darwin localhost 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15
> 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386 i386
>
> 09:53:26.891820 IP (tos 0x0, ttl 64, id 19628, offset 0, flags [+],
> proto UDP (17), length 1500) 10.13.3.10.52189 > 10.13.150.1.8888: UDP,
> length 8192
> 09:53:26.892048 IP (tos 0x0, ttl 64, id 19628, offset 1480, flags [+],
> proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
> 09:53:26.892229 IP (tos 0x0, ttl 64, id 19628, offset 2960, flags [+],
> proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
> 09:53:26.892397 IP (tos 0x0, ttl 64, id 19628, offset 4440, flags [+],
> proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
> 09:53:26.892529 IP (tos 0x0, ttl 64, id 19628, offset 5920, flags [+],
> proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
> 09:53:26.892670 IP (tos 0x0, ttl 64, id 19628, offset 7400, flags
> [none], proto UDP (17), length 820) 10.13.3.10 > 10.13.150.1: udp
>
> Linux:
>
> Linux localhost 2.6.35-rc1 #88 SMP Sun Jun 13 14:25:07 CST 2010 x86_64
> Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux
>
> 08:01:53.730902 IP (tos 0x0, ttl 64, id 1263, offset 0, flags [+],
> proto UDP (17), length 1500) 10.13.150.50.45295 > 10.13.150.1.8888:
> UDP, length 8192
> 08:01:53.730955 IP (tos 0x0, ttl 64, id 1263, offset 1480, flags [+],
> proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
> 08:01:53.731113 IP (tos 0x0, ttl 64, id 1263, offset 2960, flags [+],
> proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
> 08:01:53.731139 IP (tos 0x0, ttl 64, id 1263, offset 4440, flags [+],
> proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
> 08:01:53.731280 IP (tos 0x0, ttl 64, id 1263, offset 5920, flags [+],
> proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
> 08:01:53.731306 IP (tos 0x0, ttl 64, id 1263, offset 7400, flags
> [none], proto UDP (17), length 820) 10.13.150.50 > 10.13.150.1: udp
>
> Later I'll test Windows.
>



* Re: [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-14  2:03 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <20100613.181857.189703148.davem@davemloft.net>

On Mon, Jun 14, 2010 at 9:18 AM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Mon, 14 Jun 2010 07:16:35 +0800
>
>> As the fragments are usually in order,
>
> In what universe does this happen "usually"?
>
> Linux has been outputting fragments in reverse order for more than 10
> years.
>

I have tested net-next-2.6 and Darwin, and found that both send
fragments in order:

Darwin:

Darwin localhost 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15
16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386 i386

09:53:26.891820 IP (tos 0x0, ttl 64, id 19628, offset 0, flags [+],
proto UDP (17), length 1500) 10.13.3.10.52189 > 10.13.150.1.8888: UDP,
length 8192
09:53:26.892048 IP (tos 0x0, ttl 64, id 19628, offset 1480, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892229 IP (tos 0x0, ttl 64, id 19628, offset 2960, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892397 IP (tos 0x0, ttl 64, id 19628, offset 4440, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892529 IP (tos 0x0, ttl 64, id 19628, offset 5920, flags [+],
proto UDP (17), length 1500) 10.13.3.10 > 10.13.150.1: udp
09:53:26.892670 IP (tos 0x0, ttl 64, id 19628, offset 7400, flags
[none], proto UDP (17), length 820) 10.13.3.10 > 10.13.150.1: udp

Linux:

Linux localhost 2.6.35-rc1 #88 SMP Sun Jun 13 14:25:07 CST 2010 x86_64
Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz GenuineIntel GNU/Linux

08:01:53.730902 IP (tos 0x0, ttl 64, id 1263, offset 0, flags [+],
proto UDP (17), length 1500) 10.13.150.50.45295 > 10.13.150.1.8888:
UDP, length 8192
08:01:53.730955 IP (tos 0x0, ttl 64, id 1263, offset 1480, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731113 IP (tos 0x0, ttl 64, id 1263, offset 2960, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731139 IP (tos 0x0, ttl 64, id 1263, offset 4440, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731280 IP (tos 0x0, ttl 64, id 1263, offset 5920, flags [+],
proto UDP (17), length 1500) 10.13.150.50 > 10.13.150.1: udp
08:01:53.731306 IP (tos 0x0, ttl 64, id 1263, offset 7400, flags
[none], proto UDP (17), length 820) 10.13.150.50 > 10.13.150.1: udp

Later I'll test Windows.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [net-2.6 PATCH] ixgbe: fix for race with 8259(8|9) during shutdown
From: David Miller @ 2010-06-14  1:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, asaitou, donald.c.skidmore
In-Reply-To: <20100611232029.31430.75582.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 11 Jun 2010 16:20:29 -0700

> From: Don Skidmore <donald.c.skidmore@intel.com>
> 
> There is a small window where the watchdog could be running as the
> interface is brought down on a NIC with two ports wired back to back.
> If ixgbe_update_status is then called, it can lead to a panic.  This patch
> allows the update to bail if we are in that condition.
> 
> This issue was originally reported, and a fix proposed, by Akihiko Saitou.
> 
> CC: Akihiko Saitou <asaitou@users.sourceforge.net>
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [net-2.6 PATCH] e1000: Fix message logging defect
From: David Miller @ 2010-06-14  1:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, joe
In-Reply-To: <20100611225148.31194.61945.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 11 Jun 2010 15:51:49 -0700

> From: Joe Perches <joe@perches.com>
> 
> commit 675ad47375c76a7c3be4ace9554d92cd55518ced
> removed the capability to use ethtool.set_msglevel to
> control the types of messages emitted by the driver.
> 
> That commit should probably be reverted.
> 
> If not, then this patch fixes a message logging defect
> introduced by converting a printk without KERN_<level>
> to e_info.
> 
> This also reduces text by about 200 bytes.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [net-2.6 PATCH] ixgbe: fix automatic LRO/RSC settings for low latency
From: David Miller @ 2010-06-14  1:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, andy, sgruszka, stable
In-Reply-To: <20100611224629.30958.22500.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 11 Jun 2010 15:47:03 -0700

> From: Andy Gospodarek <andy@greyhouse.net>
> 
> This patch added to 2.6.34:
> 
> 	commit f8d1dcaf88bddc7f282722ec1fdddbcb06a72f18
> 	Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 	Date:   Tue Apr 27 01:37:20 2010 +0000
> 
> 	    ixgbe: enable extremely low latency
> 
> introduced a feature where LRO (called RSC on the hardware) was disabled
> automatically when setting rx-usecs to 0 via ethtool.  Some might not
> like the fact that LRO was disabled automatically, but I'm fine with
> that.  What I don't like is that LRO/RSC is automatically enabled when
> rx-usecs is set >0 via ethtool.
> 
> This would certainly be a problem if the device was used for forwarding
> and it was determined that the low latency wasn't needed after the
> device was already forwarding.  I played around with saving the state of
> LRO in the driver, but it just didn't seem worthwhile and would require
> a small change to dev_disable_lro() that I did not like.
> 
> This patch simply leaves LRO disabled when setting rx-usecs >0 and
> requires that the user enable it again.  An extra informational message
> will also now appear in the log so users can understand why LRO isn't
> being enabled as they expect.
> 
> Inconsistency of LRO setting first noticed by Stanislaw Gruszka.
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> CC: Stanislaw Gruszka <sgruszka@redhat.com>
> CC: stable@kernel.org
> Tested-by: Stephen Ko <stephen.s.ko@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [PATCH] fragment: add fast path
From: David Miller @ 2010-06-14  1:18 UTC (permalink / raw)
  To: xiaosuo; +Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1276470995-21713-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Mon, 14 Jun 2010 07:16:35 +0800

> As the fragments are usually in order,

In what universe does this happen "usually"?

Linux has been outputting fragments in reverse order for more than 10
years.

I'm not applying this patch.


* [PATCH] fragment: add fast path
From: Changli Gao @ 2010-06-13 23:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, Changli Gao

add fast path

As the fragments are usually in order, it is likely the new fragments are
at the end of the inet_frag_queue. In the fast path, we check if the skb at the
end of the inet_frag_queue is the prev we expect.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/inet_frag.h |    1 +
 net/ipv4/ip_fragment.c  |   17 +++++++++++++++++
 net/ipv6/reassembly.c   |   16 ++++++++++++++++
 3 files changed, 34 insertions(+)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 39f2dc9..16ff29a 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -20,6 +20,7 @@ struct inet_frag_queue {
 	atomic_t		refcnt;
 	struct timer_list	timer;      /* when will this queue expire? */
 	struct sk_buff		*fragments; /* list of received fragments */
+	struct sk_buff		*fragments_tail;
 	ktime_t			stamp;
 	int			len;        /* total length of orig datagram */
 	int			meat;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 75347ea..d8c36d4 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -317,6 +317,7 @@ static int ip_frag_reinit(struct ipq *qp)
 	qp->q.len = 0;
 	qp->q.meat = 0;
 	qp->q.fragments = NULL;
+	qp->q.fragments_tail = NULL;
 	qp->iif = 0;
 
 	return 0;
@@ -389,6 +390,16 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 	 * in the chain of fragments so far.  We must know where to put
 	 * this fragment, right?
 	 */
+	prev = qp->q.fragments_tail;
+	if (prev) {
+		if (FRAG_CB(prev)->offset < offset) {
+			next = NULL;
+			goto found;
+		}
+	} else {
+		next = NULL;
+		goto found;
+	}
 	prev = NULL;
 	for (next = qp->q.fragments; next != NULL; next = next->next) {
 		if (FRAG_CB(next)->offset >= offset)
@@ -396,6 +407,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 		prev = next;
 	}
 
+found:
 	/* We found where to put this one.  Check for overlap with
 	 * preceding fragment, and, if needed, align things so that
 	 * any overlaps are eliminated.
@@ -454,6 +466,8 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 
 	/* Insert this fragment in the chain of fragments. */
 	skb->next = next;
+	if (!next)
+		qp->q.fragments_tail = skb;
 	if (prev)
 		prev->next = skb;
 	else
@@ -507,6 +521,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 			goto out_nomem;
 
 		fp->next = head->next;
+		if (!fp->next)
+			qp->q.fragments_tail = fp;
 		prev->next = fp;
 
 		skb_morph(head, qp->q.fragments);
@@ -578,6 +594,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 	iph->tot_len = htons(len);
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
 	qp->q.fragments = NULL;
+	qp->q.fragments_tail = NULL;
 	return 0;
 
 out_nomem:
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 6d4292f..dc15624 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -336,6 +336,16 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 	 * in the chain of fragments so far.  We must know where to put
 	 * this fragment, right?
 	 */
+	prev = fq->q.fragments_tail;
+	if (prev) {
+		if (FRAG6_CB(prev)->offset < offset) {
+			next = NULL;
+			goto found;
+		}
+	} else {
+		next = NULL;
+		goto found;
+	}
 	prev = NULL;
 	for(next = fq->q.fragments; next != NULL; next = next->next) {
 		if (FRAG6_CB(next)->offset >= offset)
@@ -343,6 +353,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 		prev = next;
 	}
 
+found:
 	/* We found where to put this one.  Check for overlap with
 	 * preceding fragment, and, if needed, align things so that
 	 * any overlaps are eliminated.
@@ -400,6 +411,8 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 
 	/* Insert this fragment in the chain of fragments. */
 	skb->next = next;
+	if (!next)
+		fq->q.fragments_tail = skb;
 	if (prev)
 		prev->next = skb;
 	else
@@ -466,6 +479,8 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 			goto out_oom;
 
 		fp->next = head->next;
+		if (!fp->next)
+			fq->q.fragments_tail = fp;
 		prev->next = fp;
 
 		skb_morph(head, fq->q.fragments);
@@ -553,6 +568,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
 	rcu_read_unlock();
 	fq->q.fragments = NULL;
+	fq->q.fragments_tail = NULL;
 	return 1;
 
 out_oversize:
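
The lookup and tail bookkeeping in this patch can be sketched in
userspace as follows. This is an illustration under assumptions, not
the kernel code: struct frag and struct frag_queue are hypothetical
stand-ins, overlap handling is left out, and the 'visited' counter
exists only to show how many stored offsets each arrival order reads.

```c
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff plus its FRAG_CB() offset. */
struct frag {
	struct frag *next;
	int offset;
};

/* Hypothetical stand-in for the list part of inet_frag_queue. */
struct frag_queue {
	struct frag *fragments;		/* head of the offset-sorted list */
	struct frag *fragments_tail;	/* last element, NULL when empty */
	unsigned long visited;		/* stored offsets read so far */
};

/* Insert f in offset order, trying the tail before the linear scan. */
static void frag_insert(struct frag_queue *q, struct frag *f)
{
	struct frag *prev = q->fragments_tail;
	struct frag *next = NULL;

	/* Fast path: the new fragment belongs after the current tail. */
	if (prev) {
		q->visited++;
		if (prev->offset < f->offset)
			goto found;
	}

	/* Slow path: scan from the head for the first offset >= ours. */
	prev = NULL;
	for (next = q->fragments; next != NULL; next = next->next) {
		q->visited++;
		if (next->offset >= f->offset)
			break;
		prev = next;
	}

found:
	f->next = next;
	if (!next)
		q->fragments_tail = f;	/* appended: f is the new tail */
	if (prev)
		prev->next = f;
	else
		q->fragments = f;	/* inserted at the head */
}
```

With six fragments arriving in order, this sketch reads 5 stored
offsets in total; with the same six arriving in reverse order it reads
10, because each arrival checks the tail and then stops at the head.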


* Re: [net-next PATCH] bnx2x: Fix link problem with some DACs
From: David Miller @ 2010-06-14  0:52 UTC (permalink / raw)
  To: ole; +Cc: yanivr, netdev
In-Reply-To: <4C154D5B.8000004@ans.pl>

From: Krzysztof Olędzki <ole@ans.pl>
Date: Sun, 13 Jun 2010 23:27:55 +0200

> On 2010-06-13 13:43, Yaniv Rosner wrote:
>> Change 2wire transfer rate of SFP+ module EEPROM from 400Khz to 100Khz
>> since some DACs(direct attached cables) do not work at 400Khz.
>>
>> Reported-by: Krzysztof Olędzki <ole@ans.pl>
>> Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
>> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
>> ---
>> diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
>> index a22a7e0..9b15b64 100644
>> --- a/drivers/net/bnx2x_link.c
>> +++ b/drivers/net/bnx2x_link.c
>> @@ -4274,7 +4274,7 @@ static u8 bnx2x_ext_phy_init(struct link_params *params, struct link_vars *vars)
>>   				       ext_phy_addr,
>>   				       MDIO_PMA_DEVAD,
>>   				       MDIO_PMA_REG_8727_TWO_WIRE_SLAVE_ADDR,
>> -				       0xa001);
>> +				       0xa101);
>>
>>   			/* Set TX PreEmphasis if needed */
>>   			if ((params->feature_config_flags&
> 
> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
> 
> However, I believe the comment before this code should also be
> changed, because it is now: "Set 2-wire transfer rate to 400Khz since
> 100Khz is not operational".

Agreed, Yaniv please update your patch.


* Re: mpd client timeouts (bisected) 2.6.35-rc3
From: David Miller @ 2010-06-14  0:14 UTC (permalink / raw)
  To: markus
  Cc: john.r.fastabend, linux-kernel, netdev, yanmin_zhang, alex.shi,
	tim.c.chen
In-Reply-To: <20100613205922.GA1806@arch.tripp.de>

From: "markus@trippelsdorf.de" <markus@trippelsdorf.de>
Date: Sun, 13 Jun 2010 22:59:22 +0200

> On Sun, Jun 13, 2010 at 01:36:30PM -0700, John Fastabend wrote:
>> [PATCH] net: fix deliver_no_wcard regression on loopback device
> 
> This solves the problem here. Thanks.
> 
> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>

Thanks for testing.


* Re: mpd client timeouts (bisected) 2.6.35-rc3
From: David Miller @ 2010-06-14  0:14 UTC (permalink / raw)
  To: eric.dumazet
  Cc: john.r.fastabend, markus, linux-kernel, netdev, yanmin_zhang,
	alex.shi, tim.c.chen
In-Reply-To: <1276462246.2448.17.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 13 Jun 2010 22:50:46 +0200

> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Eric, please never ACK a patch in the same mail in which you are
posting new patch.  Patchwork won't add your ACK to the patchwork
entry you are ACK'ing because all of your text will go into a new
patchwork entry for the patch you are posting.

> BTW, David, it seems there is a double rxhash copy...
> 
> [PATCH] net: rxhash already set in __copy_skb_header
> 
> No need to copy rxhash again in __skb_clone()
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I'll apply this, thanks.

^ permalink raw reply

* Re: mpd client timeouts (bisected) 2.6.35-rc3
From: David Miller @ 2010-06-14  0:13 UTC (permalink / raw)
  To: john.r.fastabend
  Cc: markus, linux-kernel, netdev, yanmin_zhang, alex.shi, tim.c.chen
In-Reply-To: <4C15414E.5090201@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Sun, 13 Jun 2010 13:36:30 -0700

> Needed to set the wcard bit in copy_skb_header otherwise it will not
> be cleared when called from skb_clone.  Which then hits the loopback
> device gets pushed into the rx path and is eventually dropped. The
> following patch fixes this. Hopefully, this is easy and fast enough
> for you Dave.
> 
> 
> [PATCH] net: fix deliver_no_wcard regression on loopback device
> 
> deliver_no_wcard is not being set in skb_copy_header.
> In the skb_cloned case it is not being cleared and
> may cause the skb to be dropped when the loopback device
> pushes it back up the stack.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Applied, but your email client corrupted this patch in many
ways.  Please correct this for next time, thanks.

^ permalink raw reply

* [PATCH 0/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE  and HIDIOCSFEATURE
From: Alan Ott @ 2010-06-13 22:18 UTC (permalink / raw)
  To: Marcel Holtmann, David S. Miller, Jiri Kosina, Michael Poole,
	Bastien Nocera

This patch adds support to the bluetooth hidp module for getting and 
setting FEATURE reports from hidraw, as requested by Jiri Kosina. This 
patch depends on the patch named:

   [PATCH v2] HID: Add Support for Setting and Getting Feature Reports from hidraw

I have a couple of concerns with this patch, which I hope someone here 
can clarify and/or help me with.

1. Is it ok to use test_bit()/set_bit()/clear_bit() on session->flags, 
when other parts in the code may not be using these functions to access 
it? This currently isn't a problem because the other code which uses 
flags only sets bits at initialization time (and deletion time). As best I 
can tell, flags is never actually used or read other than by my new code 
(using the *_bit() functions). The solution here may be to change the 
other code to use the *_bit() functions to access flags.

2. Is the loop in hidp_get_raw_report() sufficient without a mutex, 
since I'm synchronizing with the atomic call to test_bit() (and 
clear_bit())? I have convinced myself that in this case, with one 
reader, and one writer, to one pointer, synchronized with 
wait_event_interruptible_timeout() and atomic access through test_bit(), 
that a mutex is not needed.

3. A blocking, synchronous GET_REPORT transfer was easy when I 
implemented this for USB because data is both sent and received as part 
of a single control transfer. Because of the nature of Bluetooth 
however, where it is viewed more as an asynchronous network device, and 
with hidraw allowing multiple handles to a single device to exist, there 
could be a race when two handles call the hidp_get_raw_report() function 
concurrently, requesting the same report. I've convinced myself that 
this is not a problem, because since both callers requested the same 
report, the worst that could happen is that one could get a report which 
is slightly out of date.

Consider the following case:
     1. Client 1 requests report (Userspace call to HIDIOCGFEATURE)
     2. Client 2 requests report (Userspace call to HIDIOCGFEATURE)
     3. Client 1's report is returned, and delivered to BOTH clients
     4. Client 2's report is returned (and discarded)

Note here that Client 1's report and Client 2's report are the same 
report, ie: they reflect the state of the same data on the device, just 
at different times. In this case, they are indeed exactly the same data, 
but consider this case:
     1. Client 1 requests report (Userspace call to HIDIOCGFEATURE)
     2. Client 2 SETS report (Userspace call to HIDIOCSFEATURE)
     3. Client 2 requests report (Userspace call to HIDIOCGFEATURE)
     4. Client 1's report is returned, and delivered to Clients 1 and 2
     5. Client 2's report is returned (and discarded)

In this case, client 2 receives OLD data, even though it set new data 
first, because the call to write the reports is currently not 
synchronous. Making writes synchronous would run into the same problem: 
two writes could happen concurrently, with the 2nd one receiving the ACK 
meant for the first one.
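
The second case can be replayed in a small userspace model. This is only a 
sketch of the scenario as described above, not of the patch itself: it 
assumes a single device value, replies that arrive in request order, and 
delivery of each reply to every client that is waiting at that moment. All 
names (struct sim_device, sim_get(), sim_set(), sim_reply_arrives()) are 
hypothetical and do not exist in hidp.

```c
#include <assert.h>
#include <string.h>

#define SIM_MAX_CLIENTS 4

/* Hypothetical model of one device shared by several hidraw handles. */
struct sim_device {
	int value;                     /* feature value currently on the device */
	int inflight[SIM_MAX_CLIENTS]; /* value snapshotted when each GET was sent */
	int ninflight;                 /* number of replies not yet delivered */
	int waiting[SIM_MAX_CLIENTS];  /* 1 while a client is blocked in a GET */
	int result[SIM_MAX_CLIENTS];   /* value handed back to each client */
};

void sim_init(struct sim_device *d, int value)
{
	memset(d, 0, sizeof(*d));
	d->value = value;
}

/* Client issues a GET: the reply will carry the value as of now. */
void sim_get(struct sim_device *d, int client)
{
	assert(client >= 0 && client < SIM_MAX_CLIENTS);
	assert(d->ninflight < SIM_MAX_CLIENTS);
	d->inflight[d->ninflight++] = d->value;
	d->waiting[client] = 1;
}

/* Asynchronous SET: takes effect immediately, nobody waits for an ACK. */
void sim_set(struct sim_device *d, int value)
{
	d->value = value;
}

/*
 * The oldest outstanding reply arrives. It is delivered to every client
 * that is waiting; with no waiter it is simply discarded.
 * Returns the number of clients woken.
 */
int sim_reply_arrives(struct sim_device *d)
{
	int i, woken = 0, reply;

	assert(d->ninflight > 0);
	reply = d->inflight[0];
	for (i = 1; i < d->ninflight; i++)
		d->inflight[i - 1] = d->inflight[i];
	d->ninflight--;

	for (i = 0; i < SIM_MAX_CLIENTS; i++) {
		if (d->waiting[i]) {
			d->result[i] = reply;
			d->waiting[i] = 0;
			woken++;
		}
	}
	return woken;
}
```

Replaying the five steps (GET by client 0, SET, GET by client 1, then two 
replies) leaves client 1 holding the value from before its own SET, which 
is the stale read in question.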

The questions here are:
1. Is this a problem? It's only an issue if two handles (in two 
separate threads) are reading and writing the device concurrently. I'd 
expect that there would be bigger problems in this case than receiving 
an old report.
2. If this is a problem, is there a way to synchronize on the control 
socket for the device (as opposed to just this session)? In this case 
GET_REPORT and SET_REPORT would lock access to the control socket (for 
all clients accessing the device) while they are active.

Your feedback is most appreciated,

Alan.

^ permalink raw reply

* [PATCH 1/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Alan Ott @ 2010-06-13 22:20 UTC (permalink / raw)
  To: Marcel Holtmann, David S Miller, Jiri Kosina, Michael Poole,
	Bastien Nocera
  Cc: Alan Ott

This patch adds support for getting and setting feature reports for bluetooth
HID devices from HIDRAW.

Signed-off-by: Alan Ott <alan@signal11.us>
---
 net/bluetooth/hidp/core.c |  121 +++++++++++++++++++++++++++++++++++++++++++--
 net/bluetooth/hidp/hidp.h |    8 +++
 2 files changed, 125 insertions(+), 4 deletions(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index bfe641b..0f068a0 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -36,6 +36,7 @@
 #include <linux/file.h>
 #include <linux/init.h>
 #include <linux/wait.h>
+#include <linux/mutex.h>
 #include <net/sock.h>
 
 #include <linux/input.h>
@@ -313,6 +314,93 @@ static int hidp_send_report(struct hidp_session *session, struct hid_report *rep
 	return hidp_queue_report(session, buf, rsize);
 }
 
+static int hidp_get_raw_report(struct hid_device *hid,
+		unsigned char report_number,
+		unsigned char *data, size_t count,
+		unsigned char report_type)
+{
+	struct hidp_session *session = hid->driver_data;
+	struct sk_buff *skb;
+	size_t len;
+	int numbered_reports = hid->report_enum[report_type].numbered;
+
+	switch (report_type) {
+	case HID_FEATURE_REPORT:
+		report_type = HIDP_TRANS_GET_REPORT | HIDP_DATA_RTYPE_FEATURE;
+		break;
+	case HID_INPUT_REPORT:
+		report_type = HIDP_TRANS_GET_REPORT | HIDP_DATA_RTYPE_INPUT;
+		break;
+	case HID_OUTPUT_REPORT:
+		report_type = HIDP_TRANS_GET_REPORT | HIDP_DATA_RTYPE_OUPUT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (mutex_lock_interruptible(&session->report_mutex))
+		return -ERESTARTSYS;
+
+	/* Set up our wait, and send the report request to the device. */
+	session->waiting_report_type = report_type & HIDP_DATA_RTYPE_MASK;
+	session->waiting_report_number = numbered_reports ? report_number : -1;
+	set_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	data[0] = report_number;
+	if (hidp_send_ctrl_message(hid->driver_data, report_type, data, 1))
+		goto err_eio;
+
+	/* Wait for the return of the report. The returned report
+	   gets put in session->report_return.  */
+	while (test_bit(HIDP_WAITING_FOR_RETURN, &session->flags)) {
+		int res;
+
+		res = wait_event_interruptible_timeout(session->report_queue,
+			!test_bit(HIDP_WAITING_FOR_RETURN, &session->flags),
+			5*HZ);
+		if (res == 0) {
+			/* timeout */
+			goto err_eio;
+		}
+		if (res < 0) {
+			/* signal */
+			goto err_restartsys;
+		}
+	}
+
+	skb = session->report_return;
+	if (skb) {
+		if (numbered_reports) {
+			/* Strip off the report number. */
+			size_t rpt_len = skb->len-1;
+			len = rpt_len < count ? rpt_len : count;
+			memcpy(data, skb->data+1, len);
+		} else {
+			len = skb->len < count ? skb->len : count;
+			memcpy(data, skb->data, len);
+		}
+
+		kfree_skb(skb);
+		session->report_return = NULL;
+	} else {
+		/* Device returned a HANDSHAKE, indicating  protocol error. */
+		len = -EIO;
+	}
+
+	clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	mutex_unlock(&session->report_mutex);
+
+	return len;
+
+err_restartsys:
+	clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	mutex_unlock(&session->report_mutex);
+	return -ERESTARTSYS;
+err_eio:
+	clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+	mutex_unlock(&session->report_mutex);
+	return -EIO;
+}
+
 static int hidp_output_raw_report(struct hid_device *hid, unsigned char *data, size_t count,
 		unsigned char report_type)
 {
@@ -367,6 +455,10 @@ static void hidp_process_handshake(struct hidp_session *session,
 	case HIDP_HSHK_ERR_INVALID_REPORT_ID:
 	case HIDP_HSHK_ERR_UNSUPPORTED_REQUEST:
 	case HIDP_HSHK_ERR_INVALID_PARAMETER:
+		if (test_bit(HIDP_WAITING_FOR_RETURN, &session->flags)) {
+			clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+			wake_up_interruptible(&session->report_queue);
+		}
 		/* FIXME: Call into SET_ GET_ handlers here */
 		break;
 
@@ -403,9 +495,11 @@ static void hidp_process_hid_control(struct hidp_session *session,
 	}
 }
 
-static void hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
+/* Returns true if the passed-in skb should be freed by the caller. */
+static int hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
 				unsigned char param)
 {
+	int done_with_skb = 1;
 	BT_DBG("session %p skb %p len %d param 0x%02x", session, skb, skb->len, param);
 
 	switch (param) {
@@ -417,7 +511,6 @@ static void hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
 
 		if (session->hid)
 			hid_input_report(session->hid, HID_INPUT_REPORT, skb->data, skb->len, 0);
-
 		break;
 
 	case HIDP_DATA_RTYPE_OTHER:
@@ -429,12 +522,27 @@ static void hidp_process_data(struct hidp_session *session, struct sk_buff *skb,
 		__hidp_send_ctrl_message(session,
 			HIDP_TRANS_HANDSHAKE | HIDP_HSHK_ERR_INVALID_PARAMETER, NULL, 0);
 	}
+
+	if (test_bit(HIDP_WAITING_FOR_RETURN, &session->flags) &&
+				param == session->waiting_report_type) {
+		if (session->waiting_report_number < 0 ||
+		    session->waiting_report_number == skb->data[0]) {
+			/* hidp_get_raw_report() is waiting on this report. */
+			session->report_return = skb;
+			done_with_skb = 0;
+			clear_bit(HIDP_WAITING_FOR_RETURN, &session->flags);
+			wake_up_interruptible(&session->report_queue);
+		}
+	}
+
+	return done_with_skb;
 }
 
 static void hidp_recv_ctrl_frame(struct hidp_session *session,
 					struct sk_buff *skb)
 {
 	unsigned char hdr, type, param;
+	int free_skb = 1;
 
 	BT_DBG("session %p skb %p len %d", session, skb, skb->len);
 
@@ -454,7 +562,7 @@ static void hidp_recv_ctrl_frame(struct hidp_session *session,
 		break;
 
 	case HIDP_TRANS_DATA:
-		hidp_process_data(session, skb, param);
+		free_skb = hidp_process_data(session, skb, param);
 		break;
 
 	default:
@@ -463,7 +571,8 @@ static void hidp_recv_ctrl_frame(struct hidp_session *session,
 		break;
 	}
 
-	kfree_skb(skb);
+	if (free_skb)
+		kfree_skb(skb);
 }
 
 static void hidp_recv_intr_frame(struct hidp_session *session,
@@ -797,6 +906,7 @@ static int hidp_setup_hid(struct hidp_session *session,
 	hid->dev.parent = hidp_get_device(session);
 	hid->ll_driver = &hidp_hid_driver;
 
+	hid->hid_get_raw_report = hidp_get_raw_report;
 	hid->hid_output_raw_report = hidp_output_raw_report;
 
 	err = hid_add_device(hid);
@@ -857,6 +967,9 @@ int hidp_add_connection(struct hidp_connadd_req *req, struct socket *ctrl_sock,
 	skb_queue_head_init(&session->ctrl_transmit);
 	skb_queue_head_init(&session->intr_transmit);
 
+	mutex_init(&session->report_mutex);
+	init_waitqueue_head(&session->report_queue);
+
 	session->flags   = req->flags & (1 << HIDP_BLUETOOTH_VENDOR_ID);
 	session->idle_to = req->idle_to;
 
diff --git a/net/bluetooth/hidp/hidp.h b/net/bluetooth/hidp/hidp.h
index 8d934a1..00e71dd 100644
--- a/net/bluetooth/hidp/hidp.h
+++ b/net/bluetooth/hidp/hidp.h
@@ -80,6 +80,7 @@
 #define HIDP_VIRTUAL_CABLE_UNPLUG	0
 #define HIDP_BOOT_PROTOCOL_MODE		1
 #define HIDP_BLUETOOTH_VENDOR_ID	9
+#define	HIDP_WAITING_FOR_RETURN		10
 
 struct hidp_connadd_req {
 	int   ctrl_sock;	// Connected control socket
@@ -154,6 +155,13 @@ struct hidp_session {
 	struct sk_buff_head ctrl_transmit;
 	struct sk_buff_head intr_transmit;
 
+	/* Used in hidp_get_raw_report() */
+	int waiting_report_type; /* HIDP_DATA_RTYPE_* */
+	int waiting_report_number; /* -1 for not numbered */
+	struct mutex report_mutex;
+	struct sk_buff *report_return;
+	wait_queue_head_t report_queue;
+
 	/* Report descriptor */
 	__u8 *rd_data;
 	uint rd_size;
-- 
1.7.0.4



^ permalink raw reply related

* Re: [PATCH] vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)
From: Ben Hutchings @ 2010-06-13 21:56 UTC (permalink / raw)
  To: Pedro Garcia; +Cc: netdev
In-Reply-To: <c25bca517739ada31d698235c3a4d045@dondevamos.com>

I have no particular opinion on this change, but you need to read and
follow Documentation/SubmittingPatches.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH net-next-2.6] syncookies: check decoded options against sysctl settings
From: Florian Westphal @ 2010-06-13 21:34 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

Discard the ACK if we find options that do not match current sysctl
settings.

Previously it was possible to create a connection with sack,
wscale, etc. enabled even if the feature was disabled via sysctl.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/tcp.h     |    2 +-
 net/ipv4/syncookies.c |   25 +++++++++++++++++++------
 net/ipv6/syncookies.c |    4 ++--
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5731664..cca040e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -464,7 +464,7 @@ extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
 				     __u16 *mss);
 
 extern __u32 cookie_init_timestamp(struct request_sock *req);
-extern void cookie_check_timestamp(struct tcp_options_received *tcp_opt);
+extern bool cookie_check_timestamp(struct tcp_options_received *tcp_opt);
 
 /* From net/ipv6/syncookies.c */
 extern struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 02bef6a..51b5662 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -230,23 +230,36 @@ static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb,
  * The lowest 4 bits are for snd_wscale
  * The next 4 lsb are for rcv_wscale
  * The next lsb is for sack_ok
+ *
+ * return false if we decode an option that should not be.
  */
-void cookie_check_timestamp(struct tcp_options_received *tcp_opt)
+bool cookie_check_timestamp(struct tcp_options_received *tcp_opt)
 {
 	/* echoed timestamp, 9 lowest bits contain options */
 	u32 options = tcp_opt->rcv_tsecr & TSMASK;
 
+	if (!tcp_opt->saw_tstamp)  {
+		tcp_clear_options(tcp_opt);
+		return true;
+	}
+
+	if (!sysctl_tcp_timestamps)
+		return false;
+
 	tcp_opt->snd_wscale = options & 0xf;
 	options >>= 4;
 	tcp_opt->rcv_wscale = options & 0xf;
 
 	tcp_opt->sack_ok = (options >> 4) & 0x1;
 
-	if (tcp_opt->sack_ok)
-		tcp_sack_reset(tcp_opt);
+	if (tcp_opt->sack_ok && !sysctl_tcp_sack)
+		return false;
 
-	if (tcp_opt->snd_wscale || tcp_opt->rcv_wscale)
+	if (tcp_opt->snd_wscale || tcp_opt->rcv_wscale) {
 		tcp_opt->wscale_ok = 1;
+		return sysctl_tcp_window_scaling != 0;
+	}
+	return true;
 }
 EXPORT_SYMBOL(cookie_check_timestamp);
 
@@ -281,8 +294,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	memset(&tcp_opt, 0, sizeof(tcp_opt));
 	tcp_parse_options(skb, &tcp_opt, &hash_location, 0);
 
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
+	if (!cookie_check_timestamp(&tcp_opt))
+		goto out;
 
 	ret = NULL;
 	req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 70d330f..c7ee574 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -180,8 +180,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	memset(&tcp_opt, 0, sizeof(tcp_opt));
 	tcp_parse_options(skb, &tcp_opt, &hash_location, 0);
 
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
+	if (!cookie_check_timestamp(&tcp_opt))
+		goto out;
 
 	ret = NULL;
 	req = inet6_reqsk_alloc(&tcp6_request_sock_ops);
-- 
1.6.4.4
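
For readers following the check outside the kernel tree, the decode-and-verify 
logic in the patch above can be approximated in userspace. This is a 
simplified sketch, not the kernel function: the 9-bit mask follows the 
"9 lowest bits" comment in the patch, the three sysctls are modeled as 
plain booleans, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: options live in the lowest 9 bits of the echoed timestamp. */
#define SKETCH_TSBITS 9
#define SKETCH_TSMASK ((UINT32_C(1) << SKETCH_TSBITS) - 1)

/* Stand-in for the tcp_timestamps, tcp_sack, tcp_window_scaling sysctls. */
struct sketch_sysctl {
	bool timestamps;
	bool sack;
	bool window_scaling;
};

/* Reduced stand-in for struct tcp_options_received. */
struct sketch_opts {
	bool saw_tstamp;
	uint32_t rcv_tsecr;
	unsigned int snd_wscale;
	unsigned int rcv_wscale;
	bool sack_ok;
	bool wscale_ok;
};

/* Returns false if a decoded option is disabled by the current settings. */
bool sketch_check_timestamp(struct sketch_opts *opt,
			    const struct sketch_sysctl *cfg)
{
	uint32_t options;

	assert(opt && cfg);

	if (!opt->saw_tstamp) {
		/* No timestamp echoed, so no options were encoded. */
		opt->snd_wscale = 0;
		opt->rcv_wscale = 0;
		opt->sack_ok = false;
		opt->wscale_ok = false;
		return true;
	}

	if (!cfg->timestamps)
		return false;

	options = opt->rcv_tsecr & SKETCH_TSMASK;
	opt->snd_wscale = options & 0xf;        /* lowest 4 bits */
	opt->rcv_wscale = (options >> 4) & 0xf; /* next 4 bits */
	opt->sack_ok = (options >> 8) & 0x1;    /* next bit */

	if (opt->sack_ok && !cfg->sack)
		return false;

	if (opt->snd_wscale || opt->rcv_wscale) {
		opt->wscale_ok = true;
		return cfg->window_scaling;
	}
	return true;
}
```

A caller would drop the ACK whenever this returns false, mirroring the 
"goto out" in cookie_v4_check() and cookie_v6_check().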


^ permalink raw reply related

* [PATCH net-next-2.6] ipv6: syncookies: do not skip ->iif initialization
From: Florian Westphal @ 2010-06-13 21:29 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, Glenn Griffin

When syncookies are in effect, req->iif is left uninitialized.
In case of e.g. link-local addresses the route lookup then fails
and no syn-ack is sent.

Rearrange things so ->iif is also initialized in the syncookie case.

want_cookie can only be true when the isn was zero, thus move the want_cookie
check into the "!isn" branch.

Cc: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv6/tcp_ipv6.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5887141..f875345 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1272,10 +1272,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (!want_cookie)
 		TCP_ECN_create_request(req, tcp_hdr(skb));
 
-	if (want_cookie) {
-		isn = cookie_v6_init_sequence(sk, skb, &req->mss);
-		req->cookie_ts = tmp_opt.tstamp_ok;
-	} else if (!isn) {
+	if (!isn) {
 		if (ipv6_opt_accepted(sk, skb) ||
 		    np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo ||
 		    np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) {
@@ -1288,8 +1285,12 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 		if (!sk->sk_bound_dev_if &&
 		    ipv6_addr_type(&treq->rmt_addr) & IPV6_ADDR_LINKLOCAL)
 			treq->iif = inet6_iif(skb);
-
-		isn = tcp_v6_init_sequence(skb);
+		if (!want_cookie) {
+			isn = tcp_v6_init_sequence(skb);
+		} else {
+			isn = cookie_v6_init_sequence(sk, skb, &req->mss);
+			req->cookie_ts = tmp_opt.tstamp_ok;
+		}
 	}
 	tcp_rsk(req)->snt_isn = isn;
 
-- 
1.6.4.4
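
The effect of the rearrangement above can be seen in a reduced userspace 
model of the two branch layouts. This is a sketch under stated assumptions: 
the socket is not bound to a device, the ISN generators are replaced by 
fixed placeholder constants, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder ISNs; the real values come from the sequence generators. */
#define SKETCH_NORMAL_ISN 1000u
#define SKETCH_COOKIE_ISN 2000u

struct sketch_req {
	bool iif_set; /* true once the incoming interface was recorded */
	int iif;
	uint32_t isn;
};

/* Record the incoming interface for link-local peers. */
static void sketch_init_iif(struct sketch_req *req, bool link_local,
			    int ifindex)
{
	if (link_local) {
		req->iif = ifindex;
		req->iif_set = true;
	}
}

/* Layout before the patch: the cookie branch skips the iif setup. */
void sketch_old_flow(struct sketch_req *req, bool want_cookie, uint32_t isn,
		     bool link_local, int ifindex)
{
	assert(req);
	if (want_cookie) {
		isn = SKETCH_COOKIE_ISN;
	} else if (!isn) {
		sketch_init_iif(req, link_local, ifindex);
		isn = SKETCH_NORMAL_ISN;
	}
	req->isn = isn;
}

/* Layout after the patch: iif setup runs first, then the ISN is chosen. */
void sketch_new_flow(struct sketch_req *req, bool want_cookie, uint32_t isn,
		     bool link_local, int ifindex)
{
	assert(req);
	if (!isn) {
		sketch_init_iif(req, link_local, ifindex);
		isn = want_cookie ? SKETCH_COOKIE_ISN : SKETCH_NORMAL_ISN;
	}
	req->isn = isn;
}
```

With want_cookie set and a link-local peer, only the second layout records 
the interface; both layouts still pick the cookie ISN.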


^ permalink raw reply related

* Re: [net-next PATCH] bnx2x: Fix link problem with some DACs
From: Krzysztof Olędzki @ 2010-06-13 21:27 UTC (permalink / raw)
  To: Yaniv Rosner; +Cc: davem, netdev
In-Reply-To: <1276429385.13056.13.camel@lb-tlvb-yanivr.il.broadcom.com>

On 2010-06-13 13:43, Yaniv Rosner wrote:
> Change 2wire transfer rate of SFP+ module EEPROM from 400Khz to 100Khz since some DACs(direct attached cables) do not work at 400Khz.
>
> Reported-by: Krzysztof Oldzki <ole@ans.pl>
> Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
> diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
> index a22a7e0..9b15b64 100644
> --- a/drivers/net/bnx2x_link.c
> +++ b/drivers/net/bnx2x_link.c
> @@ -4274,7 +4274,7 @@ static u8 bnx2x_ext_phy_init(struct link_params *params, struct link_vars *vars)
>   				       ext_phy_addr,
>   				       MDIO_PMA_DEVAD,
>   				       MDIO_PMA_REG_8727_TWO_WIRE_SLAVE_ADDR,
> -				       0xa001);
> +				       0xa101);
>
>   			/* Set TX PreEmphasis if needed */
>   			if ((params->feature_config_flags&

Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>

However, I believe the comment before this code should also be changed, 
because it is now: "Set 2-wire transfer rate to 400Khz since 100Khz
is not operational ".

Thanks.

Best regards,

			Krzysztof Olędzki

^ permalink raw reply
