From: Avi Kivity <avi@redhat.com>
To: Nathan Fontenot <nfont@austin.ibm.com>
Cc: linuxppc-dev@ozlabs.org, Greg KH <greg@kroah.com>,
linux-kernel@vger.kernel.org,
Dave Hansen <dave@linux.vnet.ibm.com>,
linux-mm@kvack.org,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
Date: Tue, 28 Sep 2010 14:44:40 +0200
Message-ID: <4CA1E338.6070201@redhat.com>
In-Reply-To: <4CA0EBEB.1030204@austin.ibm.com>
On 09/27/2010 09:09 PM, Nathan Fontenot wrote:
> This set of patches decouples the concept that a single memory
> section corresponds to a single directory in
> /sys/devices/system/memory/. On systems
> with large amounts of memory (1+ TB) there are performance issues
> related to creating the large number of sysfs directories. For
> a powerpc machine with 1 TB of memory we are creating 63,000+
> directories. This results in boot times of around 45-50
> minutes for systems with 1 TB of memory and 8 hours for systems
> with 2 TB of memory. With this patch set applied I am now seeing
> boot times of 5 minutes or less.
>
> The root of this issue is in sysfs directory creation. Every time
> a directory is created, a string compare is done against all sibling
> directories to ensure we do not create duplicates. The list of
> directory nodes in sysfs is kept as an unsorted list, so each
> creation takes longer as more directories exist, and the total
> cost grows quadratically with the number of directories.
>
> The solution provided by this patch set is to allow a single
> directory in sysfs to span multiple memory sections. This is
> controlled by an optional architecturally defined function,
> memory_block_size_bytes(). The default definition of this
> routine returns a memory block size equal to the memory section
> size, so the layout of the sysfs memory directories, as seen by
> userspace, remains the same as it is today.
>
Why not update sysfs directory creation to be fast, for example by using
an rbtree instead of a linked list? That would fix an implementation
problem in the kernel instead of working around it and creating a new ABI.
New ABIs mean old tools won't work, and new tools need to understand
both ABIs.
--
error compiling committee.c: too many arguments to function
Thread overview:
2010-09-27 19:09 [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections Nathan Fontenot
2010-09-27 19:21 ` [PATCH 1/8] v2 Move find_memory_block() routine Nathan Fontenot
2010-09-27 19:22 ` [PATCH 2/8] v2 Add section count to memory_block struct Nathan Fontenot
2010-09-28 9:31 ` Robin Holt
2010-09-28 18:14 ` Nathan Fontenot
2010-09-27 19:23 ` [PATCH 3/8] v2 Add mutex for adding/removing memory blocks Nathan Fontenot
2010-09-27 19:25 ` [PATCH 4/8] v2 Allow memory block to span multiple memory sections Nathan Fontenot
2010-09-27 23:55 ` Dave Hansen
2010-09-28 18:06 ` Nathan Fontenot
2010-09-28 12:48 ` Robin Holt
2010-09-28 18:20 ` Nathan Fontenot
2010-09-27 19:26 ` [PATCH 5/8] v2 Add end_phys_index file Nathan Fontenot
2010-09-27 19:27 ` [PATCH 6/8] v2 Update node sysfs code Nathan Fontenot
2010-09-28 9:29 ` Robin Holt
2010-09-28 15:21 ` Dave Hansen
2010-09-27 19:28 ` [PATCH 7/8] v2 Define memory_block_size_bytes() for powerpc/pseries Nathan Fontenot
2010-09-27 19:28 ` [PATCH 8/8] v2 Update memory hotplug documentation Nathan Fontenot
2010-09-28 12:45 ` Avi Kivity
2010-09-28 18:18 ` Nathan Fontenot
2010-09-28 12:38 ` [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections Robin Holt
2010-09-28 18:17 ` Nathan Fontenot
2010-09-29 19:28 ` Robin Holt
2010-09-30 15:17 ` Nathan Fontenot
2010-09-30 16:39 ` Robin Holt
2010-09-28 12:44 ` Avi Kivity [this message]
2010-09-28 15:12 ` Robin Holt
2010-09-28 16:34 ` Avi Kivity
2010-09-29 2:50 ` Greg KH
2010-09-29 8:32 ` Avi Kivity
2010-09-29 12:37 ` Greg KH
2010-09-29 13:39 ` Kay Sievers
2010-10-03 7:52 ` Avi Kivity
2010-09-28 15:17 ` Dave Hansen