From: Greg Banks <gnb@fastmail.fm>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: NeilBrown <neilb@suse.de>,
	linux-nfs@vger.kernel.org, David Miller <davem@davemloft.net>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>
Subject: Re: Fw: [PATCH] sunrpc: use better NUMA affinities
Date: Fri, 29 Jul 2011 16:53:21 +1000
Message-ID: <4E3258E1.6020000@fastmail.fm>
In-Reply-To: <1311921035.7845.10.camel@edumazet-laptop>

On 29/07/11 16:30, Eric Dumazet wrote:
> On Friday 29 July 2011 at 16:05 +1000, Greg Banks wrote:
>> On 29/07/11 15:32, NeilBrown wrote:
>>
>> I seem to remember coming to the conclusion that Jeff eventually
>> addressed this problem...am I misremembering or did something regress?
>>
> Currently, all nfsd kthreads use memory for their kernel stack and
> various initial data from a _single_ node, even if you use
> sunrpc.pool_mode=pernode  (or percpu)

That's just plain broken and I'm very pleased to see you fix it.

I was just surprised that it was still broken and wondering how that 
happened.  Looking at ToT I see that because I dropped the ball in 2008, 
Jeff's patches didn't address the problem.  In ToT 
svc_pool_map_set_cpumask() is called *after* kthread_create() and 
applies to the child thread, *after* its stack has been allocated on
the wrong node.  In the working SGI code, svc_pool_map_set_cpumask() is
called by the parent thread on itself *before* calling kernel_thread() or
doing any of the data structure allocations, thus ensuring that 
everything gets allocated using the default memory allocation policy, 
which on SGI NFS servers was globally tuned to be "node-local".

> With my patch, we make sure each thread gets its stack from its local
> node.
>
> Check commit 94dcf29a11b3d20a (kthread: use kthread_create_on_node()) to
> see how this strategy already was adopted for ksoftirqd, kworker,
> migration, and pktgend kthreads.

Ah, I see.  It's unfortunate that the kthread_create() API ends up being 
passed a CPU number but that's only used to format the name and not for 
sensible things :(

-- 
Greg.

