From: Bron Gondwana <brong@fastmail.fm>
To: Christoph Lameter <cl@linux.com>
Cc: Bron Gondwana <brong@fastmail.fm>,
Robert Mueller <robm@fastmail.fm>,
Shaohua Li <shaohua.li@intel.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>, Mel Gorman <mel@csn.ul.ie>
Subject: Re: Default zone_reclaim_mode = 1 on NUMA kernel is bad for file/email/web servers
Date: Sat, 18 Sep 2010 09:01:48 +1000
Message-ID: <20100917230148.GA10636@brong.net>
In-Reply-To: <alpine.DEB.2.00.1009170916130.11900@router.home>
On Fri, Sep 17, 2010 at 09:22:00AM -0500, Christoph Lameter wrote:
> On Sat, 18 Sep 2010, Bron Gondwana wrote:
>
> > > From the first look that seems to be the problem. You do not need to be
> > > bound to a particular cpu, the scheduler will just leave a single process
> > > on the same cpu by default. If you then allocate all memory only from this
> > > process then you get the scenario that you described.
> >
> > Huh? Which bit of forking server makes you think one process is allocating
> > lots of memory? They're opening and reading from files. Unless you're
> > calling the kernel a "single process".
>
> I have no idea what your app does.
OK, Cyrus IMAPd has been around for ages. It's an open source email
server built on a very traditional process-per-connection model:
* a master process which reads config files and manages the other processes
* multiple imapd processes, one per connection
* multiple pop3d processes, one per connection
* multiple lmtpd processes, one per connection
* periodic "cleanup" processes.
Each of these is started by the lightweight master forking and then
execing the appropriate daemon.
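The fork-then-exec startup described above can be sketched in C as follows. This is a minimal illustration, not the Cyrus master's code: spawn_service and wait_exit_status are hypothetical names, and exiting with status 127 after a failed exec is borrowed from the shell's convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, then replace the child's image with
 * the service binary (imapd, pop3d, lmtpd, ...). Returns the child's pid
 * in the parent, or -1 if fork() failed. */
pid_t spawn_service(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;                /* no child was created */
    if (pid == 0) {
        execv(path, argv);        /* returns only if the exec failed */
        _exit(127);               /* shell convention for "could not exec" */
    }
    return pid;                   /* parent: the child is now running */
}

/* Hypothetical helper: reap a child and return its exit status, or -1 if
 * waitpid() failed or the child did not exit normally (e.g. was killed). */
int wait_exit_status(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```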
In our configuration we run 20 separate "master" processes, each
managing a single disk partition's worth of email. We do this to
reduce lock contention on the central mailboxes database, and to get
better replication concurrency: each instance runs a single
replication process, so replication within an instance is sequential.
> The data that I glanced over looks as
> if most allocations happen for a particular memory node
Sorry, which data?
> and since the
> memory is optimized to be local to that node other memory is not used
> intensively. This can occur because of allocations through one process /
> thread that is always running on the same cpu and therefore always
> allocates from the memory node local to that cpu.
As Rob said, there are thousands of independent processes, each opening
a single mailbox (3 separate metadata files plus possibly hundreds of
individual email files). It's likely that different processes will open
the same mailbox over time: for example, an email client opening multiple
concurrent connections while an lmtpd connects and delivers new emails
to the same mailbox.
> It can also happen f.e. if a driver always allocates memory local to the
> I/O bus that it is using.
None of what we're doing is super weird advanced stuff; it's a vanilla
forking daemon where each process runs and does stuff on behalf of
a user. The only slightly interesting things are:
1) each "service" has a single lock file, and all the idle processes of
that type (e.g. imapd) block on that lock while they're waiting for
a connection. This is to avoid a thundering herd on operating systems
which aren't nice about it. The winner does the accept and handles
the connection.
2) once it has finished processing a request, the process waits for
another connection rather than exiting.
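A minimal C sketch of points 1 and 2, under stated assumptions: it takes a POSIX fcntl() record lock on the per-service lock file, which may not be how Cyrus actually takes its lock, and every function name here (set_lock, locked_accept, worker_loop, open_loopback_listener) is hypothetical. fcntl() locks are owned per process, so workers that inherit the same descriptor across fork() still exclude one another; flock() locks belong to the shared open file description, which is why this sketch avoids them.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

/* Acquire (F_WRLCK) or release (F_UNLCK) a POSIX record lock covering the
 * whole lock file, blocking until it is granted. Returns 0 or -1. */
int set_lock(int lockfd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 = through end of file */
    while (fcntl(lockfd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR)       /* retry only if a signal interrupted us */
            return -1;
    }
    return 0;
}

/* Point 1: idle workers queue on the lock; only the holder sits in
 * accept(), so a new connection cannot wake every idle worker's accept(). */
int locked_accept(int lockfd, int listenfd)
{
    if (set_lock(lockfd, F_WRLCK) == -1)
        return -1;
    int client;
    do {
        client = accept(listenfd, NULL, NULL);
    } while (client == -1 && errno == EINTR);
    int saved_errno = errno;
    set_lock(lockfd, F_UNLCK);    /* hand the lock to the next idle worker */
    errno = saved_errno;
    return client;
}

/* Point 2: after finishing with a client, go back to waiting for another
 * connection instead of exiting. handle_connection is a hypothetical
 * per-service callback; returning 0 asks the worker to stop. */
void worker_loop(int lockfd, int listenfd, int (*handle_connection)(int))
{
    for (;;) {
        int client = locked_accept(lockfd, listenfd);
        if (client == -1)
            return;               /* a real service would log and decide what to do */
        int keep_going = handle_connection(client);
        close(client);
        if (!keep_going)
            return;
    }
}

/* Scaffolding for trying the sketch: a TCP listener on 127.0.0.1 with a
 * kernel-chosen port. */
int open_loopback_listener(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    struct sockaddr_in sin = {0};
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;             /* 0 = let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```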
None of this sounds like what you're talking about (one giant process that's
all on one CPU), and I don't know why you keep bringing it up. It's
nothing like what we're running on these machines.
Bron.