[PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
@ 2012-03-02  7:59 Sergei Trofimovich
  2012-03-02  9:22 ` Jason Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Sergei Trofimovich @ 2012-03-02  7:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, Sergei Trofimovich, Jason Wang, Glauber Costa,
	David S. Miller

From: Sergei Trofimovich <slyfox@gentoo.org>

The commit c43b874d5d7 introduced NFS file transfer hangup (proved by bisection).
> Commit 4acb4190 tries to fix the using uninitialized value
> introduced by commit 3dc43e3,  but it would make the
> per-socket memory limits too small.
>
> This patch fixes this and also remove the redundant codes
> introduced in 4acb4190.

The change looks like a typo (division flipped to multiplication):
> limit = nr_free_buffer_pages() / 8;
> limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);

Patch restores division.

CC: Jason Wang <jasowang@redhat.com>
CC: Glauber Costa <glommer@parallels.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
---

Change since v1: added forgotten commit ID

 net/ipv4/tcp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 22ef5f9..b11667b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3299,7 +3299,7 @@ void __init tcp_init(void)
 
 	tcp_init_mem(&init_net);
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
-	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
+	limit = nr_free_buffer_pages() >> (PAGE_SHIFT - 10);
 	limit = max(limit, 128UL);
 	max_share = min(4UL*1024*1024, limit);
 
-- 
1.7.8.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-02  7:59 [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression) Sergei Trofimovich
@ 2012-03-02  9:22 ` Jason Wang
  2012-03-02 17:24   ` Sergei Trofimovich
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2012-03-02  9:22 UTC (permalink / raw)
  To: Sergei Trofimovich
  Cc: linux-kernel, netdev, Sergei Trofimovich, Glauber Costa,
	David S. Miller

On 03/02/2012 03:59 PM, Sergei Trofimovich wrote:
> From: Sergei Trofimovich<slyfox@gentoo.org>
>
> The commit c43b874d5d7 introduced NFS file transfer hangup (proved by bisection).
>> Commit 4acb4190 tries to fix the using uninitialized value
>> introduced by commit 3dc43e3,  but it would make the
>> per-socket memory limits too small.
>>
>> This patch fixes this and also remove the redundant codes
>> introduced in 4acb4190.
> The change looks like a typo (division flipped to multiplication):
>> limit = nr_free_buffer_pages() / 8;
>> limit = nr_free_buffer_pages()<<  (PAGE_SHIFT - 10);

Hi, thanks for the reporting. It's not a typo. It was previously: 
sysctl_tcp_mem[1] << (PAGE_SHIFT -  7). Looks like we need to do the 
limit check before shift the value. Please try the following patch, thanks.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 22ef5f9..4035aab 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3299,8 +3299,8 @@ void __init tcp_init(void)

         tcp_init_mem(&init_net);
         /* Set per-socket limits to no more than 1/128 the pressure 
threshold */
-       limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
-       limit = max(limit, 128UL);
+       limit = nr_free_buffer_pages() / 8;
+       limit = max(limit, 128UL) << (PAGE_SHIFT - 7);
         max_share = min(4UL*1024*1024, limit);

         sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-02  9:22 ` Jason Wang
@ 2012-03-02 17:24   ` Sergei Trofimovich
  2012-03-02 17:50     ` Sergei Trofimovich
  0 siblings, 1 reply; 11+ messages in thread
From: Sergei Trofimovich @ 2012-03-02 17:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, netdev, Sergei Trofimovich, Glauber Costa,
	David S. Miller

[-- Attachment #1: Type: text/plain, Size: 2236 bytes --]

> > The change looks like a typo (division flipped to multiplication):
> >> limit = nr_free_buffer_pages() / 8;
> >> limit = nr_free_buffer_pages()<<  (PAGE_SHIFT - 10);
> 
> Hi, thanks for the reporting. It's not a typo. It was previously: 
> sysctl_tcp_mem[1] << (PAGE_SHIFT -  7). Looks like we need to do the 
> limit check before shift the value. Please try the following patch, thanks.

Still does not help. I test it by checking sha1sum of a large file over NFS
(small files seem to work simetimes):

    $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2 
    ...
    open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
    <HUNG>
After a certain timeout dmesg gets odd spam:
[  314.848094] nfs: server vmhost not responding, still trying
[  314.848134] nfs: server vmhost not responding, still trying
[  314.848145] nfs: server vmhost not responding, still trying
[  314.957047] nfs: server vmhost not responding, still trying
[  314.957066] nfs: server vmhost not responding, still trying
[  314.957075] nfs: server vmhost not responding, still trying
[  314.957085] nfs: server vmhost not responding, still trying
[  314.957100] nfs: server vmhost not responding, still trying
[  314.958023] nfs: server vmhost not responding, still trying
[  314.958035] nfs: server vmhost not responding, still trying
[  314.958044] nfs: server vmhost not responding, still trying
[  314.958054] nfs: server vmhost not responding, still trying

looks like bogus messages. Might be relevant to mishandled timings
somewhere else or a bug in nfs code.

> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 22ef5f9..4035aab 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3299,8 +3299,8 @@ void __init tcp_init(void)
> 
>          tcp_init_mem(&init_net);
>          /* Set per-socket limits to no more than 1/128 the pressure 
> threshold */
> -       limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
> -       limit = max(limit, 128UL);
> +       limit = nr_free_buffer_pages() / 8;
> +       limit = max(limit, 128UL) << (PAGE_SHIFT - 7);
>          max_share = min(4UL*1024*1024, limit);
> 
>          sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
> 
> 
> 


-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-02 17:24   ` Sergei Trofimovich
@ 2012-03-02 17:50     ` Sergei Trofimovich
  2012-03-03 14:16       ` Glauber Costa
  0 siblings, 1 reply; 11+ messages in thread
From: Sergei Trofimovich @ 2012-03-02 17:50 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, netdev, Glauber Costa, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 3817 bytes --]

> > > The change looks like a typo (division flipped to multiplication):
> > >> limit = nr_free_buffer_pages() / 8;
> > >> limit = nr_free_buffer_pages()<<  (PAGE_SHIFT - 10);
> > 
> > Hi, thanks for the reporting. It's not a typo. It was previously: 
> > sysctl_tcp_mem[1] << (PAGE_SHIFT -  7). Looks like we need to do the 
> > limit check before shift the value. Please try the following patch, thanks.
> 
> Still does not help. I test it by checking sha1sum of a large file over NFS
> (small files seem to work simetimes):
> 
>     $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2 
>     ...
>     open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
>     <HUNG>
> After a certain timeout dmesg gets odd spam:
> [  314.848094] nfs: server vmhost not responding, still trying
> [  314.848134] nfs: server vmhost not responding, still trying
> [  314.848145] nfs: server vmhost not responding, still trying
> [  314.957047] nfs: server vmhost not responding, still trying
> [  314.957066] nfs: server vmhost not responding, still trying
> [  314.957075] nfs: server vmhost not responding, still trying
> [  314.957085] nfs: server vmhost not responding, still trying
> [  314.957100] nfs: server vmhost not responding, still trying
> [  314.958023] nfs: server vmhost not responding, still trying
> [  314.958035] nfs: server vmhost not responding, still trying
> [  314.958044] nfs: server vmhost not responding, still trying
> [  314.958054] nfs: server vmhost not responding, still trying
> 
> looks like bogus messages. Might be relevant to mishandled timings
> somewhere else or a bug in nfs code.

And after 120 seconds hung tasks shows it might be an OOM issue
Likely caused by patch, as it's a 2GB RAM +4GB swap amd64 box
not running anything heavy:

[  720.798052] INFO: task sha1sum:3811 blocked for more than 120 seconds.
[  720.798056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.798059] sha1sum         D ffff88007bd11d40     0  3811      1 0x00000005
[  720.798065]  ffff880073de9c08 0000000000000082 ffff880073de9af8 ffff880073de9fd8
[  720.798070]  ffff880070db1620 ffff880073de9fd8 ffff880073de8000 0000000000004000
[  720.798075]  ffff880073de8000 ffff880073de9fd8 ffff8800790e0000 ffff880070db1620
[  720.798079] Call Trace:
[  720.798089]  [<ffffffff810fdd53>] ? kfree+0x123/0x150
[  720.798094]  [<ffffffff8123227d>] ? nfs_access_free_entry+0x1d/0x30
[  720.798097]  [<ffffffff810fdd53>] ? kfree+0x123/0x150
[  720.798101]  [<ffffffff8123227d>] ? nfs_access_free_entry+0x1d/0x30
[  720.798104]  [<ffffffff81233cb8>] ? nfs_do_access+0x3a8/0x3d0
[  720.798109]  [<ffffffff8166525a>] schedule+0x3a/0x50
[  720.798112]  [<ffffffff8166390e>] __mutex_lock_slowpath+0xee/0x190
[  720.798117]  [<ffffffff81639228>] ? put_rpccred+0x48/0x130
[  720.798120]  [<ffffffff8166374e>] mutex_lock+0x1e/0x40
[  720.798125]  [<ffffffff81114927>] do_lookup+0x277/0x3a0
[  720.798128]  [<ffffffff811162b8>] do_last.clone.39+0x148/0x7e0
[  720.798132]  [<ffffffff81116a61>] path_openat+0xd1/0x3e0
[  720.798136]  [<ffffffff810604d1>] ? get_parent_ip+0x11/0x50
[  720.798140]  [<ffffffff81060675>] ? add_preempt_count+0x95/0xd0
[  720.798144]  [<ffffffff81666677>] ? _raw_spin_lock_irq+0x17/0x40
[  720.798147]  [<ffffffff81116e84>] do_filp_open+0x44/0xa0
[  720.798151]  [<ffffffff810605a5>] ? sub_preempt_count+0x95/0xd0
[  720.798154]  [<ffffffff81666371>] ? _raw_spin_unlock+0x11/0x40
[  720.798158]  [<ffffffff81123014>] ? alloc_fd+0xe4/0x130
[  720.798163]  [<ffffffff81106f7d>] do_sys_open+0xfd/0x1e0
[  720.798169]  [<ffffffff8100f290>] ? syscall_trace_enter+0xf0/0x1a0
[  720.798172]  [<ffffffff8110707c>] sys_open+0x1c/0x20
[  720.798176]  [<ffffffff81667219>] tracesys+0xd0/0xd5

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-02 17:50     ` Sergei Trofimovich
@ 2012-03-03 14:16       ` Glauber Costa
  2012-03-03 14:43         ` Sergei Trofimovich
  0 siblings, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2012-03-03 14:16 UTC (permalink / raw)
  To: Sergei Trofimovich; +Cc: Jason Wang, linux-kernel, netdev, David S. Miller

On 03/02/2012 02:50 PM, Sergei Trofimovich wrote:
>>>> The change looks like a typo (division flipped to multiplication):
>>>>> limit = nr_free_buffer_pages() / 8;
>>>>> limit = nr_free_buffer_pages()<<   (PAGE_SHIFT - 10);
>>>
>>> Hi, thanks for the reporting. It's not a typo. It was previously:
>>> sysctl_tcp_mem[1]<<  (PAGE_SHIFT -  7). Looks like we need to do the
>>> limit check before shift the value. Please try the following patch, thanks.
>>
>> Still does not help. I test it by checking sha1sum of a large file over NFS
>> (small files seem to work simetimes):
>>
>>      $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2
>>      ...
>>      open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
>>      <HUNG>
>> After a certain timeout dmesg gets odd spam:
>> [  314.848094] nfs: server vmhost not responding, still trying
>> [  314.848134] nfs: server vmhost not responding, still trying
>> [  314.848145] nfs: server vmhost not responding, still trying
>> [  314.957047] nfs: server vmhost not responding, still trying
>> [  314.957066] nfs: server vmhost not responding, still trying
>> [  314.957075] nfs: server vmhost not responding, still trying
>> [  314.957085] nfs: server vmhost not responding, still trying
>> [  314.957100] nfs: server vmhost not responding, still trying
>> [  314.958023] nfs: server vmhost not responding, still trying
>> [  314.958035] nfs: server vmhost not responding, still trying
>> [  314.958044] nfs: server vmhost not responding, still trying
>> [  314.958054] nfs: server vmhost not responding, still trying
>>
>> looks like bogus messages. Might be relevant to mishandled timings
>> somewhere else or a bug in nfs code.
>
> And after 120 seconds hung tasks shows it might be an OOM issue
> Likely caused by patch, as it's a 2GB RAM +4GB swap amd64 box
> not running anything heavy:

That is a bit weird.

First because with Jason's patch, we should end up with the very same 
calculation, at the same exact order, as it was in older kernels.
Second, because by shifting << 10, you should be ending up with very 
small numbers, effectively having tcp_rmem[1] == tcp_rmem[2], and the 
same for wmem.

Can you share which numbers you end up with at 
/proc/sys/net/ipv4/tcp_{r,w}mem ?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-03 14:16       ` Glauber Costa
@ 2012-03-03 14:43         ` Sergei Trofimovich
  2012-03-03 23:27           ` Glauber Costa
  0 siblings, 1 reply; 11+ messages in thread
From: Sergei Trofimovich @ 2012-03-03 14:43 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Jason Wang, linux-kernel, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 2711 bytes --]

On Sat, 3 Mar 2012 11:16:41 -0300
Glauber Costa <glommer@parallels.com> wrote:

> On 03/02/2012 02:50 PM, Sergei Trofimovich wrote:
> >>>> The change looks like a typo (division flipped to multiplication):
> >>>>> limit = nr_free_buffer_pages() / 8;
> >>>>> limit = nr_free_buffer_pages()<<   (PAGE_SHIFT - 10);
> >>>
> >>> Hi, thanks for the reporting. It's not a typo. It was previously:
> >>> sysctl_tcp_mem[1]<<  (PAGE_SHIFT -  7). Looks like we need to do the
> >>> limit check before shift the value. Please try the following patch, thanks.
> >>
> >> Still does not help. I test it by checking sha1sum of a large file over NFS
> >> (small files seem to work simetimes):
> >>
> >>      $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2
> >>      ...
> >>      open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
> >>      <HUNG>
> >> After a certain timeout dmesg gets odd spam:
> >> [  314.848094] nfs: server vmhost not responding, still trying
> >> [  314.848134] nfs: server vmhost not responding, still trying
> >> [  314.848145] nfs: server vmhost not responding, still trying
> >> [  314.957047] nfs: server vmhost not responding, still trying
> >> [  314.957066] nfs: server vmhost not responding, still trying
> >> [  314.957075] nfs: server vmhost not responding, still trying
> >> [  314.957085] nfs: server vmhost not responding, still trying
> >> [  314.957100] nfs: server vmhost not responding, still trying
> >> [  314.958023] nfs: server vmhost not responding, still trying
> >> [  314.958035] nfs: server vmhost not responding, still trying
> >> [  314.958044] nfs: server vmhost not responding, still trying
> >> [  314.958054] nfs: server vmhost not responding, still trying
> >>
> >> looks like bogus messages. Might be relevant to mishandled timings
> >> somewhere else or a bug in nfs code.
> >
> > And after 120 seconds hung tasks shows it might be an OOM issue
> > Likely caused by patch, as it's a 2GB RAM +4GB swap amd64 box
> > not running anything heavy:
> 
> That is a bit weird.
> 
> First because with Jason's patch, we should end up with the very same 
> calculation, at the same exact order, as it was in older kernels.
> Second, because by shifting << 10, you should be ending up with very 
> small numbers, effectively having tcp_rmem[1] == tcp_rmem[2], and the 
> same for wmem.
> 
> Can you share which numbers you end up with at 
> /proc/sys/net/ipv4/tcp_{r,w}mem ?
> 

Sure:

    $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
    4096    87380   1999072
    4096    16384   1999072

Nothing special with NFS nere, so I guess it uses UDP.
TCP works fine on machine (I do everything via SSH).

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-03 14:43         ` Sergei Trofimovich
@ 2012-03-03 23:27           ` Glauber Costa
  2012-03-04  9:14             ` Sergei Trofimovich
  0 siblings, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2012-03-03 23:27 UTC (permalink / raw)
  To: Sergei Trofimovich; +Cc: Jason Wang, linux-kernel, netdev, David S. Miller

On 03/03/2012 11:43 AM, Sergei Trofimovich wrote:
> On Sat, 3 Mar 2012 11:16:41 -0300
> Glauber Costa<glommer@parallels.com>  wrote:
>
>> On 03/02/2012 02:50 PM, Sergei Trofimovich wrote:
>>>>>> The change looks like a typo (division flipped to multiplication):
>>>>>>> limit = nr_free_buffer_pages() / 8;
>>>>>>> limit = nr_free_buffer_pages()<<    (PAGE_SHIFT - 10);
>>>>>
>>>>> Hi, thanks for the reporting. It's not a typo. It was previously:
>>>>> sysctl_tcp_mem[1]<<   (PAGE_SHIFT -  7). Looks like we need to do the
>>>>> limit check before shift the value. Please try the following patch, thanks.
>>>>
>>>> Still does not help. I test it by checking sha1sum of a large file over NFS
>>>> (small files seem to work simetimes):
>>>>
>>>>       $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2
>>>>       ...
>>>>       open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
>>>>       <HUNG>
>>>> After a certain timeout dmesg gets odd spam:
>>>> [  314.848094] nfs: server vmhost not responding, still trying
>>>> [  314.848134] nfs: server vmhost not responding, still trying
>>>> [  314.848145] nfs: server vmhost not responding, still trying
>>>> [  314.957047] nfs: server vmhost not responding, still trying
>>>> [  314.957066] nfs: server vmhost not responding, still trying
>>>> [  314.957075] nfs: server vmhost not responding, still trying
>>>> [  314.957085] nfs: server vmhost not responding, still trying
>>>> [  314.957100] nfs: server vmhost not responding, still trying
>>>> [  314.958023] nfs: server vmhost not responding, still trying
>>>> [  314.958035] nfs: server vmhost not responding, still trying
>>>> [  314.958044] nfs: server vmhost not responding, still trying
>>>> [  314.958054] nfs: server vmhost not responding, still trying
>>>>
>>>> looks like bogus messages. Might be relevant to mishandled timings
>>>> somewhere else or a bug in nfs code.
>>>
>>> And after 120 seconds hung tasks shows it might be an OOM issue
>>> Likely caused by patch, as it's a 2GB RAM +4GB swap amd64 box
>>> not running anything heavy:
>>
>> That is a bit weird.
>>
>> First because with Jason's patch, we should end up with the very same
>> calculation, at the same exact order, as it was in older kernels.
>> Second, because by shifting<<  10, you should be ending up with very
>> small numbers, effectively having tcp_rmem[1] == tcp_rmem[2], and the
>> same for wmem.
>>
>> Can you share which numbers you end up with at
>> /proc/sys/net/ipv4/tcp_{r,w}mem ?
>>
>
> Sure:
>
>      $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
>      4096    87380   1999072
>      4096    16384   1999072
>
Sergei,

Sorry for not being clearer. I was expecting you'd post those values
both in the scenario in which you see the bug, and in the scenario you
don't.

> Nothing special with NFS nere, so I guess it uses UDP.
> TCP works fine on machine (I do everything via SSH).

Can you confirm that? If you're using nfs through udp, it makes
even less sense that the default values of tcp sock mem will harm
you. So it might be a bug somewhere else...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-03 23:27           ` Glauber Costa
@ 2012-03-04  9:14             ` Sergei Trofimovich
  2012-03-05  6:18               ` Jason Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Sergei Trofimovich @ 2012-03-04  9:14 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Jason Wang, linux-kernel, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 4294 bytes --]

On Sat, 3 Mar 2012 20:27:17 -0300
Glauber Costa <glommer@parallels.com> wrote:

> On 03/03/2012 11:43 AM, Sergei Trofimovich wrote:
> > On Sat, 3 Mar 2012 11:16:41 -0300
> > Glauber Costa<glommer@parallels.com>  wrote:
> >
> >> On 03/02/2012 02:50 PM, Sergei Trofimovich wrote:
> >>>>>> The change looks like a typo (division flipped to multiplication):
> >>>>>>> limit = nr_free_buffer_pages() / 8;
> >>>>>>> limit = nr_free_buffer_pages()<<    (PAGE_SHIFT - 10);
> >>>>>
> >>>>> Hi, thanks for the reporting. It's not a typo. It was previously:
> >>>>> sysctl_tcp_mem[1]<<   (PAGE_SHIFT -  7). Looks like we need to do the
> >>>>> limit check before shift the value. Please try the following patch, thanks.
> >>>>
> >>>> Still does not help. I test it by checking sha1sum of a large file over NFS
> >>>> (small files seem to work simetimes):
> >>>>
> >>>>       $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2
> >>>>       ...
> >>>>       open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
> >>>>       <HUNG>
> >>>> After a certain timeout dmesg gets odd spam:
> >>>> [  314.848094] nfs: server vmhost not responding, still trying
> >>>> [  314.848134] nfs: server vmhost not responding, still trying
> >>>> [  314.848145] nfs: server vmhost not responding, still trying
> >>>> [  314.957047] nfs: server vmhost not responding, still trying
> >>>> [  314.957066] nfs: server vmhost not responding, still trying
> >>>> [  314.957075] nfs: server vmhost not responding, still trying
> >>>> [  314.957085] nfs: server vmhost not responding, still trying
> >>>> [  314.957100] nfs: server vmhost not responding, still trying
> >>>> [  314.958023] nfs: server vmhost not responding, still trying
> >>>> [  314.958035] nfs: server vmhost not responding, still trying
> >>>> [  314.958044] nfs: server vmhost not responding, still trying
> >>>> [  314.958054] nfs: server vmhost not responding, still trying
> >>>>
> >>>> looks like bogus messages. Might be relevant to mishandled timings
> >>>> somewhere else or a bug in nfs code.
> >>>
> >>> And after 120 seconds hung tasks shows it might be an OOM issue
> >>> Likely caused by patch, as it's a 2GB RAM +4GB swap amd64 box
> >>> not running anything heavy:
> >>
> >> That is a bit weird.
> >>
> >> First because with Jason's patch, we should end up with the very same
> >> calculation, at the same exact order, as it was in older kernels.
> >> Second, because by shifting<<  10, you should be ending up with very
> >> small numbers, effectively having tcp_rmem[1] == tcp_rmem[2], and the
> >> same for wmem.
> >>
> >> Can you share which numbers you end up with at
> >> /proc/sys/net/ipv4/tcp_{r,w}mem ?
> >>
> >
> > Sure:
> >
> >      $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
> >      4096    87380   1999072
> >      4096    16384   1999072
> >
> Sergei,
> 
> Sorry for not being clearer. I was expecting you'd post those values
> both in the scenario in which you see the bug, and in the scenario you
> don't.

Ah, I see.  Sorry. Patches are on top of v3.3-rc5-166-g1f033c1. Buggy one:
> -       limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
> -       limit = max(limit, 128UL);
> +       limit = nr_free_buffer_pages() / 8;
> +       limit = max(limit, 128UL) << (PAGE_SHIFT - 7);
>         max_share = min(4UL*1024*1024, limit);
> +       printk(KERN_INFO "TCP: max_share=%u\n", max_share);
    $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
    4096    87380   1999072
    4096    16384   1999072

Working one:
> -       limit = nr_free_buffer_pages() << (PAGE_SHIFT - 10);
> +       limit = nr_free_buffer_pages() >> (PAGE_SHIFT - 10);
>         limit = max(limit, 128UL);
>         max_share = min(4UL*1024*1024, limit);
> +       printk(KERN_INFO "TCP: max_share=%u\n", max_share);
    $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
    4096    87380   124942
    4096    16384   124942

> > Nothing special with NFS nere, so I guess it uses UDP.
> > TCP works fine on machine (I do everything via SSH).
> 
> Can you confirm that? If you're using nfs through udp, it makes
> even less sense that the default values of tcp sock mem will harm
> you. So it might be a bug somewhere else...

Rechecked with tcpdump. It uses TCP.

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-04  9:14             ` Sergei Trofimovich
@ 2012-03-05  6:18               ` Jason Wang
  2012-03-05 18:22                 ` Sergei Trofimovich
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2012-03-05  6:18 UTC (permalink / raw)
  To: Sergei Trofimovich; +Cc: Glauber Costa, linux-kernel, netdev, David S. Miller

On 03/04/2012 05:14 PM, Sergei Trofimovich wrote:
> On Sat, 3 Mar 2012 20:27:17 -0300
> Glauber Costa<glommer@parallels.com>  wrote:
>
>> On 03/03/2012 11:43 AM, Sergei Trofimovich wrote:
>>> On Sat, 3 Mar 2012 11:16:41 -0300
>>> Glauber Costa<glommer@parallels.com>   wrote:
>>>
>>>> On 03/02/2012 02:50 PM, Sergei Trofimovich wrote:
>>>>>>>> The change looks like a typo (division flipped to multiplication):
>>>>>>>>> limit = nr_free_buffer_pages() / 8;
>>>>>>>>> limit = nr_free_buffer_pages()<<     (PAGE_SHIFT - 10);
>>>>>>> Hi, thanks for the reporting. It's not a typo. It was previously:
>>>>>>> sysctl_tcp_mem[1]<<    (PAGE_SHIFT -  7). Looks like we need to do the
>>>>>>> limit check before shift the value. Please try the following patch, thanks.
>>>>>> Still does not help. I test it by checking sha1sum of a large file over NFS
>>>>>> (small files seem to work simetimes):
>>>>>>
>>>>>>        $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2
>>>>>>        ...
>>>>>>        open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
>>>>>>        <HUNG>
Hi Sergei:

Looks like the client does not even start to read the file.
>>>>>> After a certain timeout dmesg gets odd spam:
>>>>>> [  314.848094] nfs: server vmhost not responding, still trying
>>>>>> [  314.848134] nfs: server vmhost not responding, still trying
>>>>>> [  314.848145] nfs: server vmhost not responding, still trying
>>>>>> [  314.957047] nfs: server vmhost not responding, still trying
>>>>>> [  314.957066] nfs: server vmhost not responding, still trying
>>>>>> [  314.957075] nfs: server vmhost not responding, still trying
>>>>>> [  314.957085] nfs: server vmhost not responding, still trying
>>>>>> [  314.957100] nfs: server vmhost not responding, still trying
>>>>>> [  314.958023] nfs: server vmhost not responding, still trying
>>>>>> [  314.958035] nfs: server vmhost not responding, still trying
>>>>>> [  314.958044] nfs: server vmhost not responding, still trying
>>>>>> [  314.958054] nfs: server vmhost not responding, still trying
>>>>>>
>>>>>> looks like bogus messages. Might be relevant to mishandled timings
>>>>>> somewhere else or a bug in nfs code.

  Did you use a virtual machine as your NFS server? Have you tried to 
bisect the server side code?
>>>>> And after 120 seconds hung tasks shows it might be an OOM issue
>>>>> Likely caused by patch, as it's a 2GB RAM +4GB swap amd64 box
>>>>> not running anything heavy:
>>>> That is a bit weird.
>>>>
>>>> First because with Jason's patch, we should end up with the very same
>>>> calculation, at the same exact order, as it was in older kernels.
>>>> Second, because by shifting<<   10, you should be ending up with very
>>>> small numbers, effectively having tcp_rmem[1] == tcp_rmem[2], and the
>>>> same for wmem.
>>>>
>>>> Can you share which numbers you end up with at
>>>> /proc/sys/net/ipv4/tcp_{r,w}mem ?
>>>>
>>> Sure:
>>>
>>>       $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
>>>       4096    87380   1999072
>>>       4096    16384   1999072
>>>
>> Sergei,
>>
>> Sorry for not being clearer. I was expecting you'd post those values
>> both in the scenario in which you see the bug, and in the scenario you
>> don't.
> Ah, I see.  Sorry. Patches are on top of v3.3-rc5-166-g1f033c1. Buggy one:
>> -       limit = nr_free_buffer_pages()<<  (PAGE_SHIFT - 10);
>> -       limit = max(limit, 128UL);
>> +       limit = nr_free_buffer_pages() / 8;
>> +       limit = max(limit, 128UL)<<  (PAGE_SHIFT - 7);
>>          max_share = min(4UL*1024*1024, limit);
>> +       printk(KERN_INFO "TCP: max_share=%u\n", max_share);
>      $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
>      4096    87380   1999072
>      4096    16384   1999072

Nothing strange to me.
> Working one:
>> -       limit = nr_free_buffer_pages()<<  (PAGE_SHIFT - 10);
>> +       limit = nr_free_buffer_pages()>>  (PAGE_SHIFT - 10);
>>          limit = max(limit, 128UL);
>>          max_share = min(4UL*1024*1024, limit);
>> +       printk(KERN_INFO "TCP: max_share=%u\n", max_share);
>      $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
>      4096    87380   124942
>      4096    16384   124942

This one looks small to me, as the tcp_{r,w}mem were count by bytes and 
limit were count by number of pages, so we need to shift PAGE_SHIFT.

As I can't reproduce this locally, in order to narrow down the problem, 
could you please help to check whether the issue were 
introduced/eliminated by commit  4acb4190 or 3dc43e3?

Thanks
>>> Nothing special with NFS nere, so I guess it uses UDP.
>>> TCP works fine on machine (I do everything via SSH).
>> Can you confirm that? If you're using nfs through udp, it makes
>> even less sense that the default values of tcp sock mem will harm
>> you. So it might be a bug somewhere else...
> Rechecked with tcpdump. It uses TCP.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-05  6:18               ` Jason Wang
@ 2012-03-05 18:22                 ` Sergei Trofimovich
  2012-03-06  3:22                   ` Jason Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Sergei Trofimovich @ 2012-03-05 18:22 UTC (permalink / raw)
  To: Jason Wang; +Cc: Glauber Costa, linux-kernel, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 4903 bytes --]

> >>>>>>>> The change looks like a typo (division flipped to multiplication):
> >>>>>>>>> limit = nr_free_buffer_pages() / 8;
> >>>>>>>>> limit = nr_free_buffer_pages()<<     (PAGE_SHIFT - 10);
> >>>>>>> Hi, thanks for the reporting. It's not a typo. It was previously:
> >>>>>>> sysctl_tcp_mem[1]<<    (PAGE_SHIFT -  7). Looks like we need to do the
> >>>>>>> limit check before shift the value. Please try the following patch, thanks.
> >>>>>> Still does not help. I test it by checking sha1sum of a large file over NFS
> >>>>>> (small files seem to work simetimes):
> >>>>>>
> >>>>>>        $ strace sha1sum /gentoo/distfiles/gcc-4.6.2.tar.bz2
> >>>>>>        ...
> >>>>>>        open("/gentoo/distfiles/gcc-4.6.2.tar.bz2", O_RDONLY
> >>>>>>        <HUNG>
> Hi Sergei:
> 
> Looks like the client does not even start to read the file.
> >>>>>> After a certain timeout dmesg gets odd spam:
> >>>>>> [  314.848094] nfs: server vmhost not responding, still trying
> >>>>>> [  314.848134] nfs: server vmhost not responding, still trying
> >>>>>> [  314.848145] nfs: server vmhost not responding, still trying
> >>>>>> [  314.957047] nfs: server vmhost not responding, still trying
> >>>>>> [  314.957066] nfs: server vmhost not responding, still trying
> >>>>>> [  314.957075] nfs: server vmhost not responding, still trying
> >>>>>> [  314.957085] nfs: server vmhost not responding, still trying
> >>>>>> [  314.957100] nfs: server vmhost not responding, still trying
> >>>>>> [  314.958023] nfs: server vmhost not responding, still trying
> >>>>>> [  314.958035] nfs: server vmhost not responding, still trying
> >>>>>> [  314.958044] nfs: server vmhost not responding, still trying
> >>>>>> [  314.958054] nfs: server vmhost not responding, still trying
> >>>>>>
> >>>>>> looks like bogus messages. Might be relevant to mishandled timings
> >>>>>> somewhere else or a bug in nfs code.
> 
>   Did you use a virtual machine as your NFS server? Have you tried to 
> bisect the server side code?
> >>>>> And after 120 seconds hung tasks shows it might be an OOM issue
> >>>>> Likely caused by patch, as it's a 2GB RAM +4GB swap amd64 box
> >>>>> not running anything heavy:
> >>>> That is a bit weird.
> >>>>
> >>>> First because with Jason's patch, we should end up with the very same
> >>>> calculation, at the same exact order, as it was in older kernels.
> >>>> Second, because by shifting<<   10, you should be ending up with very
> >>>> small numbers, effectively having tcp_rmem[1] == tcp_rmem[2], and the
> >>>> same for wmem.
> >>>>
> >>>> Can you share which numbers you end up with at
> >>>> /proc/sys/net/ipv4/tcp_{r,w}mem ?
> >>>>
> >>> Sure:
> >>>
> >>>       $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
> >>>       4096    87380   1999072
> >>>       4096    16384   1999072
> >>>
> >> Sergei,
> >>
> >> Sorry for not being clearer. I was expecting you'd post those values
> >> both in the scenario in which you see the bug, and in the scenario you
> >> don't.
> > Ah, I see.  Sorry. Patches are on top of v3.3-rc5-166-g1f033c1. Buggy one:
> >> -       limit = nr_free_buffer_pages()<<  (PAGE_SHIFT - 10);
> >> -       limit = max(limit, 128UL);
> >> +       limit = nr_free_buffer_pages() / 8;
> >> +       limit = max(limit, 128UL)<<  (PAGE_SHIFT - 7);
> >>          max_share = min(4UL*1024*1024, limit);
> >> +       printk(KERN_INFO "TCP: max_share=%u\n", max_share);
> >      $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
> >      4096    87380   1999072
> >      4096    16384   1999072
> 
> Nothing strange to me.
> > Working one:
> >> -       limit = nr_free_buffer_pages()<<  (PAGE_SHIFT - 10);
> >> +       limit = nr_free_buffer_pages()>>  (PAGE_SHIFT - 10);
> >>          limit = max(limit, 128UL);
> >>          max_share = min(4UL*1024*1024, limit);
> >> +       printk(KERN_INFO "TCP: max_share=%u\n", max_share);
> >      $ cat /proc/sys/net/ipv4/tcp_{r,w}mem
> >      4096    87380   124942
> >      4096    16384   124942
> 
> This one looks small to me, as the tcp_{r,w}mem were count by bytes and 
> limit were count by number of pages, so we need to shift PAGE_SHIFT.
> 
> As I can't reproduce this locally, in order to narrow down the problem, 
> could you please help to check whether the issue were 
> introduced/eliminated by commit  4acb4190 or 3dc43e3?

I didn't think of server problem. I did run 3.3-rc0 kernel there
from the kvm tree (v3.2-10396-g05ef4c6):
    commit 05ef4c60568ed1740f65bf66a76da30b19060119
    Author: Michael S. Tsirkin <mst@redhat.com>
    Date:   Wed Jan 18 20:07:09 2012 +0200

        kvm: fix error handling for out of range irq

    from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 
Updating to current vanilla 3.3-rc6 solved the problem.
Are you interested in digging that issue further to find commit
breaking the server?

-- 

  Sergei

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression)
  2012-03-05 18:22                 ` Sergei Trofimovich
@ 2012-03-06  3:22                   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2012-03-06  3:22 UTC (permalink / raw)
  To: Sergei Trofimovich; +Cc: Glauber Costa, linux-kernel, netdev, David S. Miller

On 03/06/2012 02:22 AM, Sergei Trofimovich wrote:
>> This one looks small to me, as the tcp_{r,w}mem were count by bytes and
>> >  limit were count by number of pages, so we need to shift PAGE_SHIFT.
>> >  
>> >  As I can't reproduce this locally, in order to narrow down the problem,
>> >  could you please help to check whether the issue were
>> >  introduced/eliminated by commit  4acb4190 or 3dc43e3?
> I didn't think of server problem. I did run 3.3-rc0 kernel there
> from the kvm tree (v3.2-10396-g05ef4c6):
>      commit 05ef4c60568ed1740f65bf66a76da30b19060119
>      Author: Michael S. Tsirkin<mst@redhat.com>
>      Date:   Wed Jan 18 20:07:09 2012 +0200
>
>          kvm: fix error handling for out of range irq
>
>      from git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>
> Updating to current vanilla 3.3-rc6 solved the problem.
> Are you interested in digging that issue further to find commit
> breaking the server?
Hi Sergei:

So it's very possible that it's a bug which is blacklisted by early 
commits and uncovered by my commit ( 
c43b874d5d714f271b80d4c3f49e05d0cbf51ed2). I wish I can help in nfs 
debugging, but I'm not familiar with that.

thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-03-06  3:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-02  7:59 [PATCHv 2] tcp: properly initialize tcp memory limits part 2 (fix nfs regression) Sergei Trofimovich
2012-03-02  9:22 ` Jason Wang
2012-03-02 17:24   ` Sergei Trofimovich
2012-03-02 17:50     ` Sergei Trofimovich
2012-03-03 14:16       ` Glauber Costa
2012-03-03 14:43         ` Sergei Trofimovich
2012-03-03 23:27           ` Glauber Costa
2012-03-04  9:14             ` Sergei Trofimovich
2012-03-05  6:18               ` Jason Wang
2012-03-05 18:22                 ` Sergei Trofimovich
2012-03-06  3:22                   ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).