Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression


* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
       [not found]     ` <20070930223503.M8966@nuclearcat.com>
@ 2007-10-01  5:59       ` Eric Dumazet
  2007-10-01  7:12         ` David Miller
  2007-10-01 20:10         ` Eric Dumazet
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2007-10-01  5:59 UTC (permalink / raw)
  To: Denys; +Cc: linux-kernel, David S. Miller, Linux Netdev List

Denys wrote:
> Hi 
> 
> I got
> 
> pi linux-git # git bisect bad
> Bisecting: 0 revisions left to test after this
> [f85958151900f9d30fa5ff941b0ce71eaa45a7de] [NET]: random functions can use 
> nsec resolution instead of usec
> 
> I will make sure and will try to reverse this patch on 2.6.22
> 
> But it seems "that's it".

Well... that's interesting...

No problem here on bigger servers, so I CC David Miller and netdev on this one.

AFAIK do_gettimeofday() and ktime_get_real() should use the same underlying 
hardware functions on PC and no performance problem should happen here.

Relevant part of this patch:

@@ -1521,7 +1515,6 @@ __u32 secure_ip_id(__be32 daddr)
  __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
                                  __be16 sport, __be16 dport)
  {
-       struct timeval tv;
         __u32 seq;
         __u32 hash[4];
         struct keydata *keyptr = get_keyptr();
@@ -1543,12 +1536,11 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
          *      As close as possible to RFC 793, which
          *      suggests using a 250 kHz clock.
          *      Further reading shows this assumes 2 Mb/s networks.
-        *      For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+        *      For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
          *      That's funny, Linux has one built in!  Use it!
          *      (Networks are faster now - should this be increased?)
          */
-       do_gettimeofday(&tv);
-       seq += tv.tv_usec + tv.tv_sec * 1000000;
+       seq += ktime_get_real().tv64;


Thank you for doing this research.

> 
> 
> On Sun, 30 Sep 2007 14:25:37 +1000, Nick Piggin wrote
>> Hi Denys, thanks for reporting (btw. please reply-to-all when 
>> replying on lkml).
>>
>> You say that SLAB is better than SLUB on an otherwise identical 
>> kernel, but I didn't see if you quantified the actual numbers? It 
>> sounds like there is still a regression with SLAB?
>>
>> On Monday 01 October 2007 03:48, Eric Dumazet wrote:
>>> Denys wrote:
>>>> I recently moved one of my proxies (squid and some compressing
>>>> application) from 2.6.21 to 2.6.22, and noticed a huge performance drop. I
>>>> think this is important, because it can cause serious regressions in
>>>> other workloads like busy web servers etc.
>>>>
>>>> After some analysis of different options I can give more exact numbers:
>>>>
>>>> 2.6.21 is able to process 500-550 requests/second and 15-20 Mbit/s of
>>>> traffic, and works great without any slowdown or instability.
>>>>
>>>> 2.6.22 is able to process only 250-300 requests/second and 8-10 Mbit/s of
>>>> traffic, and ssh and the console "freeze" (there is a delay even when
>>>> typing characters).
>>>>
>>>> Both proxies run on identical hardware (Sun Fire X4100) with an identical
>>>> configuration (small system, LFS-like, on USB flash); only the kernel
>>>> differs.
>>>>
>>>> I tried to disable/enable various options and optimisations - nothing
>>>> changed until I reached the SLUB/SLAB option.
>>>>
>>>> I loaded the proxy configuration onto a gentoo PC with 2.6.22 (then upgraded
>>>> it to 2.6.23-rc8), and saw the same effect.
>>>> Additionally, when the load reaches its maximum I notice a whole-system
>>>> slowdown; for example ssh and scp take much more time to run, even if I
>>>> run them with nice -n -5.
>>>>
>>>> But even with 2.6.23-rc8+SLAB I noticed the same "freezing" of ssh (and
>>>> surely it slows down other kinds of network performance too), but much less
>>>> than with SLUB. On top of that, I see ksoftirqd taking almost 100%
>>>> (sometimes ksoftirqd/0, sometimes ksoftirqd/1).
>>>>
>>>> I also tried different tricks with the scheduler (/proc/sys/kernel/sched*),
>>>> but that didn't help either.
>>>>
>>>> When it freezes it looks like:
>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>     7 root      15  -5     0    0    0 R   64  0.0   2:47.48 ksoftirqd/1
>>>>  5819 root      20   0  134m 130m  596 R   57  3.3   4:36.78 globax
>>>>  5911 squid     20   0 1138m 1.1g 2124 R   26 28.9   2:24.87 squid
>>>>    10 root      15  -5     0    0    0 S    1  0.0   0:01.86 events/1
>>>>  6130 root      20   0  3960 2416 1592 S    0  0.1   0:08.02 oprofiled
>>>>
>>>>
>>>> Oprofile results:
>>>>
>>>>
>>>> That's the oprofile output with 2.6.23-rc8 - SLUB
>>>>
>>>> 73918    21.5521  check_bytes
>>>> 38361    11.1848  acpi_pm_read
>>>> 14077     4.1044  init_object
>>>> 13632     3.9747  ip_send_reply
>>>> 8486      2.4742  __slab_alloc
>>>> 7199      2.0990  nf_iterate
>>>> 6718      1.9588  page_address
>>>> 6716      1.9582  tcp_v4_rcv
>>>> 6425      1.8733  __slab_free
>>>> 5604      1.6339  on_freelist
>>>>
>>>>
>>>> That's the oprofile output with 2.6.23-rc8 - SLAB
>>>>
>>>> CPU: AMD64 processors, speed 2592.64 MHz (estimated)
>>>> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
>>>> unit mask of 0x00 (No unit mask) count 100000
>>>> samples  %        symbol name
>>>> 138991   14.0627  acpi_pm_read
>>>> 52401     5.3018  tcp_v4_rcv
>>>> 48466     4.9037  nf_iterate
>>>> 38043     3.8491  __slab_alloc
>>>> 34155     3.4557  ip_send_reply
>>>> 20963     2.1210  ip_rcv
>>>> 19475     1.9704  csum_partial
>>>> 19084     1.9309  kfree
>>>> 17434     1.7639  ip_output
>>>> 17278     1.7481  netif_receive_skb
>>>> 15248     1.5428  nf_hook_slow
>>>>
>>>> My .config is at http://www.nuclearcat.com/.config (there is SPARSEMEM
>>>> enabled, it doesn't make any noticeable difference)
>>>>
>>>> Please CC me on replies, I am not on the list.
>>> Could you try with SLUB but with CONFIG_SLUB_DEBUG disabled?
> 
> 
> --
> Denys Fedoryshchenko
> Technical Manager
> Virtual ISP S.A.L.
> 



* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01  5:59       ` 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression Eric Dumazet
@ 2007-10-01  7:12         ` David Miller
  2007-10-01  8:07           ` Denys
  2007-10-01 20:10         ` Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2007-10-01  7:12 UTC (permalink / raw)
  To: dada1; +Cc: nuclearcat, linux-kernel, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Mon, 01 Oct 2007 07:59:12 +0200

> No problem here on bigger servers, so I CC David Miller and netdev
> on this one.  AFAIK do_gettimeofday() and ktime_get_real() should
> use the same underlying hardware functions on PC and no performance
> problem should happen here.

One thing that jumps out at me is that on 32-bit (and to a certain
extent on 64-bit) there are a lot of stack accesses and missed
optimizations because all of the work occurs, and gets expanded,
inside of ktime_get_real().

The timespec_to_ktime() inside of there constructs the ktime_t return
value on the stack, then returns that as an aggregate to the caller.

That cannot be without some cost.

ktime_get_real() is definitely a candidate for inlining especially in
these kinds of cases where we'll happily get computations in local
registers instead of all of this on-stack nonsense.  And in several
cases (if the caller only needs the tv_sec value, for example)
computations can be elided entirely.

It would be constructive to experiment and see if this is in fact part
of the problem.


* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01  7:12         ` David Miller
@ 2007-10-01  8:07           ` Denys
  2007-10-01  8:20             ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Denys @ 2007-10-01  8:07 UTC (permalink / raw)
  To: David Miller, dada1; +Cc: linux-kernel, netdev

Well, I can experiment a bit more on "live" servers. I now have a hot-swap server with
full gentoo, where I can rebuild any kernel you want, with any patch applied.
But it doesn't look like constant overhead: the load becomes too "spiky", rather than
just permanently higher. Also, it is not normal for the whole system to become
unresponsive (for example, ping 127.0.0.1 rises to 300ms for a while, when softirq
usage jumps to 100%).

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01  8:07           ` Denys
@ 2007-10-01  8:20             ` Eric Dumazet
  2007-10-01  8:35               ` Eric Dumazet
                                 ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Eric Dumazet @ 2007-10-01  8:20 UTC (permalink / raw)
  To: Denys; +Cc: David Miller, linux-kernel, netdev

Denys wrote:
> Well, I can experiment a bit more on "live" servers. I now have a hot-swap server with
> full gentoo, where I can rebuild any kernel you want, with any patch applied.
> But it doesn't look like constant overhead: the load becomes too "spiky", rather than
> just permanently higher. Also, it is not normal for the whole system to become
> unresponsive (for example, ping 127.0.0.1 rises to 300ms for a while, when softirq
> usage jumps to 100%).
>
>   
Could you try a pristine 2.6.22.9 with a patch to
secure_tcp_sequence_number() like this:

--- drivers/char/random.c.orig 2007-10-01 10:18:42.000000000 +0200
+++ drivers/char/random.c 2007-10-01 10:19:58.000000000 +0200
@@ -1554,7 +1554,7 @@
 	 *	That's funny, Linux has one built in!  Use it!
 	 *	(Networks are faster now - should this be increased?)
 	 */
-	seq += ktime_get_real().tv64;
+	seq += ktime_get_real().tv64 / 1000;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);

Thank you




* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01  8:20             ` Eric Dumazet
@ 2007-10-01  8:35               ` Eric Dumazet
  2007-10-01 12:10               ` Denys
  2007-10-01 13:26               ` Denys
  2 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2007-10-01  8:35 UTC (permalink / raw)
  Cc: Denys, David Miller, linux-kernel, netdev

Eric Dumazet wrote:
> Could you try a pristine 2.6.22.9 with a patch to
> secure_tcp_sequence_number() like this:
>
> --- drivers/char/random.c.orig 2007-10-01 10:18:42.000000000 +0200
> +++ drivers/char/random.c 2007-10-01 10:19:58.000000000 +0200
> @@ -1554,7 +1554,7 @@
>  	 *	That's funny, Linux has one built in!  Use it!
>  	 *	(Networks are faster now - should this be increased?)
>  	 */
> -	seq += ktime_get_real().tv64;
> +	seq += ktime_get_real().tv64 / 1000;
>  #if 0
>  	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
>  	       saddr, daddr, sport, dport, seq);
On 32-bit machines, replace the divide with a shift to avoid a linker
error (undefined reference to `__divdi3'):

seq += ktime_get_real().tv64 >> 10;







* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01  8:20             ` Eric Dumazet
  2007-10-01  8:35               ` Eric Dumazet
@ 2007-10-01 12:10               ` Denys
  2007-10-01 13:26               ` Denys
  2 siblings, 0 replies; 9+ messages in thread
From: Denys @ 2007-10-01 12:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev

I am not able to compile the kernel with the patch:

drivers/built-in.o: In function `secure_tcp_sequence_number':
(.text+0x3ad02): undefined reference to `__divdi3'
make: *** [.tmp_vmlinux1] Error 1


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01  8:20             ` Eric Dumazet
  2007-10-01  8:35               ` Eric Dumazet
  2007-10-01 12:10               ` Denys
@ 2007-10-01 13:26               ` Denys
  2 siblings, 0 replies; 9+ messages in thread
From: Denys @ 2007-10-01 13:26 UTC (permalink / raw)
  To: linux-kernel, netdev

Resending to the mailing lists (the previous copy was discarded as spam because of encoding issues).

Everything looks fine, for sure. Confirmed on a second server.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01  5:59       ` 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression Eric Dumazet
  2007-10-01  7:12         ` David Miller
@ 2007-10-01 20:10         ` Eric Dumazet
  2007-10-01 20:57           ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2007-10-01 20:10 UTC (permalink / raw)
  To: Denys, David S. Miller; +Cc: linux-kernel, Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 673 bytes --]

So maybe the following patch is necessary...

I believe IPV6 & DCCP are immune to this problem.

Thanks again Denys for spotting this.

Eric

[PATCH] TCP : secure_tcp_sequence_number() should not use a too fast clock

TCP v4 sequence numbers are 32 bits, and RFC 793 assumed a 250 kHz clock.
To keep up with increasing network speeds we can use a faster clock, but
we should limit it so that the delay between two rollovers is
greater than the MSL (TCP Maximum Segment Lifetime: 2 minutes).

Choosing a 64 nsec clock should be OK, since the rollovers occur every
274 seconds.

Problem spotted by Denys Fedoryshchenko

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


[-- Attachment #2: seq.patch --]
[-- Type: text/plain, Size: 990 bytes --]

--- linux-2.6.22/drivers/char/random.c	2007-10-01 10:18:42.000000000 +0200
+++ linux-2.6.22-ed/drivers/char/random.c	2007-10-01 21:47:58.000000000 +0200
@@ -1550,11 +1550,13 @@ __u32 secure_tcp_sequence_number(__be32 
 	 *	As close as possible to RFC 793, which
 	 *	suggests using a 250 kHz clock.
 	 *	Further reading shows this assumes 2 Mb/s networks.
-	 *	For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
-	 *	That's funny, Linux has one built in!  Use it!
-	 *	(Networks are faster now - should this be increased?)
+	 *	For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+	 *	For 10 Gb/s Ethernet, a 1 GHz clock should be ok, but
+	 *	we also need to limit the resolution so that the u32 seq
+	 *	overlaps less than one time per MSL (2 minutes).
+	 *	Choosing a clock of 64 ns period is OK. (period of 274 s)
 	 */
-	seq += ktime_get_real().tv64;
+	seq += ktime_get_real().tv64 >> 6;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);


* Re: 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression
  2007-10-01 20:10         ` Eric Dumazet
@ 2007-10-01 20:57           ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2007-10-01 20:57 UTC (permalink / raw)
  To: dada1; +Cc: nuclearcat, linux-kernel, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Mon, 01 Oct 2007 22:10:03 +0200

> So maybe the following patch is necessary...
> 
> I believe IPV6 & DCCP are immune to this problem.
> 
> Thanks again Denys for spotting this.
> 
> Eric
> 
> [PATCH] TCP : secure_tcp_sequence_number() should not use a too fast clock
> 
> TCP v4 sequence numbers are 32 bits, and RFC 793 assumed a 250 kHz clock.
> To keep up with increasing network speeds we can use a faster clock, but
> we should limit it so that the delay between two rollovers is
> greater than the MSL (TCP Maximum Segment Lifetime: 2 minutes).
> 
> Choosing a 64 nsec clock should be OK, since the rollovers occur every
> 274 seconds.
> 
> Problem spotted by Denys Fedoryshchenko
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Thanks a lot Eric for bringing closure to this.

I'll apply this and add a reference in the commit message to the
changeset that introduced this problem, since it might help
others who look at this.


end of thread, other threads:[~2007-10-01 20:57 UTC | newest]

Thread overview: 9+ messages
     [not found] <20070930144443.M52139@visp.net.lb>
     [not found] ` <46FFE17C.9020202@cosmosbay.com>
     [not found]   ` <200709301425.37564.nickpiggin@yahoo.com.au>
     [not found]     ` <20070930223503.M8966@nuclearcat.com>
2007-10-01  5:59       ` 2.6.21 -> 2.6.22 & 2.6.23-rc8 performance regression Eric Dumazet
2007-10-01  7:12         ` David Miller
2007-10-01  8:07           ` Denys
2007-10-01  8:20             ` Eric Dumazet
2007-10-01  8:35               ` Eric Dumazet
2007-10-01 12:10               ` Denys
2007-10-01 13:26               ` Denys
2007-10-01 20:10         ` Eric Dumazet
2007-10-01 20:57           ` David Miller
