From: Robin Holt <holt@sgi.com>
To: Avi Kivity <avi@redhat.com>
Cc: linuxppc-dev@ozlabs.org, Greg KH <greg@kroah.com>,
linux-kernel@vger.kernel.org,
Dave Hansen <dave@linux.vnet.ibm.com>,
linux-mm@kvack.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
Date: Tue, 28 Sep 2010 10:12:18 -0500
Message-ID: <20100928151218.GJ14068@sgi.com>
In-Reply-To: <4CA1E338.6070201@redhat.com>
On Tue, Sep 28, 2010 at 02:44:40PM +0200, Avi Kivity wrote:
> On 09/27/2010 09:09 PM, Nathan Fontenot wrote:
> >This set of patches decouples the concept that a single memory
> >section corresponds to a single directory in
> >/sys/devices/system/memory/. On systems
> >with large amounts of memory (1+ TB) there are performance issues
> >related to creating the large number of sysfs directories. For
> >a powerpc machine with 1 TB of memory we are creating 63,000+
> >directories. This is resulting in boot times of around 45-50
> >minutes for systems with 1 TB of memory and 8 hours for systems
> >with 2 TB of memory. With this patch set applied I am now seeing
> >boot times of 5 minutes or less.
> >
> >The root of this issue is in sysfs directory creation. Every time
> >a directory is created a string compare is done against all sibling
> >directories to ensure we do not create duplicates. The list of
> >directory nodes in sysfs is kept as an unsorted list which results
> >in this being an exponentially longer operation as the number of
> >directories are created.
> >
> >The solution provided by this patch set is to allow a single
> >directory in sysfs to span multiple memory sections. This is
> >controlled by an optional architecturally defined function
> >memory_block_size_bytes(). The default definition of this
> >routine returns a memory block size equal to the memory section
> >size. This keeps the layout of the sysfs memory directories,
> >as seen by userspace, the same as it is today.
> >
>
> Why not update sysfs directory creation to be fast, for example by
> using an rbtree instead of a linked list? This fixes an
> implementation problem in the kernel instead of working around it
> and creating a new ABI.
Because the old ABI creates 129,000+ entries inside
/sys/devices/system/memory with their associated links from
/sys/devices/system/node/node*/ back to those directory entries.
Thankfully things like rpm, hald, and other miscellaneous commands scan
that information. On our 8 TB test machine, hald runs continuously
following boot for nearly an hour mostly scanning useless information
from /sys/.
Robin
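
The entry counts discussed above follow from simple division: one directory per memory block, so the count is total memory divided by block size. Below is a toy calculation, not kernel code. The 16 MiB section size and the 256 MiB block size are assumptions chosen for illustration; neither value is stated in this thread, and memory_directories() is a hypothetical helper.

```python
# Toy arithmetic only. The sizes below are illustrative assumptions,
# not values taken from the patch set.
TIB = 1 << 40
MIB = 1 << 20

def memory_directories(total_bytes, block_size_bytes):
    """Number of memoryN directories if each one covers block_size_bytes."""
    return -(-total_bytes // block_size_bytes)  # ceiling division

SECTION_SIZE = 16 * MIB    # assumed memory section size
LARGER_BLOCK = 256 * MIB   # hypothetical memory_block_size_bytes() result

print(memory_directories(1 * TIB, SECTION_SIZE))   # 65536
print(memory_directories(1 * TIB, LARGER_BLOCK))   # 4096
```

Under these assumptions a block that spans 16 sections cuts the number of directories, and therefore the number of entries that userspace tools have to scan, by the same factor of 16.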
>
> New ABIs mean old tools won't work, and new tools need to understand
> both ABIs.
>
> --
> error compiling committee.c: too many arguments to function
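
For reference, the cost of the duplicate-name check described in the quoted cover letter can be modelled outside the kernel. The sketch below is a toy model in Python, not sysfs code: it only counts name comparisons, uses a sorted list with binary search as a stand-in for the ordered lookup an rbtree would give, and both function names are made up for this example. With an unsorted list, creating n siblings costs n(n-1)/2 comparisons in total, so the work grows with the square of the directory count.

```python
def compares_unsorted(names):
    """Create each name after scanning every existing sibling for a duplicate."""
    siblings, compares = [], 0
    for name in names:
        for existing in siblings:
            compares += 1
            if existing == name:
                raise ValueError("duplicate name: " + name)
        siblings.append(name)
    return compares

def compares_sorted(names):
    """Same duplicate check, but siblings are kept sorted and binary-searched."""
    siblings, compares = [], 0
    for name in names:
        lo, hi = 0, len(siblings)
        while lo < hi:
            mid = (lo + hi) // 2
            compares += 1
            if siblings[mid] < name:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(siblings):
            compares += 1
            if siblings[lo] == name:
                raise ValueError("duplicate name: " + name)
        # list.insert shifts elements; a real tree would avoid that cost.
        siblings.insert(lo, name)
    return compares

names = ["memory%d" % i for i in range(1000)]
print(compares_unsorted(names))  # 499500, i.e. 1000 * 999 / 2
print(compares_sorted(names))    # on the order of n * log2(n)
```

The model supports both sides of the exchange: an ordered structure removes the quadratic creation cost, but it does nothing to reduce how many entries userspace still has to walk.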