From: Nathan Fontenot <nfont@austin.ibm.com>
To: balbir@linux.vnet.ibm.com
Cc: linuxppc-dev@ozlabs.org, Greg KH <greg@kroah.com>,
	linux-kernel@vger.kernel.org,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	linux-mm@kvack.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 0/8] De-couple sysfs memory directories from memory sections
Date: Fri, 24 Sep 2010 09:35:35 -0500
Message-ID: <4C9CB737.9000903@austin.ibm.com>
In-Reply-To: <20100923184002.GM3952@balbir.in.ibm.com>

On 09/23/2010 01:40 PM, Balbir Singh wrote:
> * Nathan Fontenot <nfont@austin.ibm.com> [2010-09-22 09:15:43]:
> 
>> This set of patches decouples the concept that a single memory
>> section corresponds to a single directory in 
>> /sys/devices/system/memory/.  On systems
>> with large amounts of memory (1+ TB) there are performance issues
>> related to creating the large number of sysfs directories.  For
>> a powerpc machine with 1 TB of memory we are creating 63,000+
>> directories.  This is resulting in boot times of around 45-50
>> minutes for systems with 1 TB of memory and 8 hours for systems
>> with 2 TB of memory.  With this patch set applied I am now seeing
>> boot times of 5 minutes or less.
>>
>> The root of this issue is in sysfs directory creation. Every time
>> a directory is created a string compare is done against all sibling
>> directories to ensure we do not create duplicates.  The list of
>> directory nodes in sysfs is kept as an unsorted list, so each
>> creation takes longer as the number of sibling directories grows
>> (linear per directory, roughly quadratic overall).
>>
>> The solution provided by this patch set is to allow a single
>> directory in sysfs to span multiple memory sections.  This is
>> controlled by an optional architecturally defined function
>> memory_block_size_bytes().  The default definition of this
>> routine returns a memory block size equal to the memory section
>> size. This keeps the layout of sysfs memory directories, as seen
>> from userspace, the same as it is today.
>>
>> For architectures that define their own version of this routine,
>> as is done for powerpc in this patchset, the view in userspace
>> would change such that each memoryXXX directory would span
>> multiple memory sections.  The number of sections spanned would
>> depend on the value reported by memory_block_size_bytes().
>>
>> In both cases a new file 'end_phys_index' is created in each
>> memoryXXX directory.  This file will contain the physical id
>> of the last memory section covered by the sysfs directory.  For
>> the default case, the value in 'end_phys_index' will be the same
>> as in the existing 'phys_index' file.
>>
> 
> What does this mean for memory hotplug or hotunplug? 
> 

Memory hotplug will function on a memory block size basis.  For
architectures that do not define their own memory_block_size_bytes()
routine, they will get the default size and everything will work
the same as it does today.
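
To make the default case concrete, here is a minimal userspace sketch
(not the code from the patch set) of an overridable default. The 16 MB
section size, the MIN_MEMORY_BLOCK_SIZE name, and the
sections_per_memory_block() helper are assumptions for illustration;
the weak attribute is a GCC/Clang extension that lets another object
file supply its own definition.

```c
#include <stdint.h>

/* Assumed for illustration: 16 MB memory sections. */
#define SECTION_SIZE_BITS     24
#define MIN_MEMORY_BLOCK_SIZE (1ULL << SECTION_SIZE_BITS)

/*
 * Default definition: a memory block is exactly one memory section.
 * Marked weak so that an architecture-specific file could provide
 * its own memory_block_size_bytes() and take precedence at link time.
 */
__attribute__((weak)) uint64_t memory_block_size_bytes(void)
{
    return MIN_MEMORY_BLOCK_SIZE;
}

/* Hypothetical helper: how many sections one memoryXXX directory spans. */
uint64_t sections_per_memory_block(void)
{
    return memory_block_size_bytes() / MIN_MEMORY_BLOCK_SIZE;
}
```

With no override linked in, sections_per_memory_block() evaluates to 1,
which corresponds to the layout userspace sees today.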

For architectures that define their own memory_block_size_bytes()
routine and have multiple memory sections per memory block, hotplug
operations will add or remove all of the memory sections in the
memory block.
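
As a rough illustration of the arithmetic (again a sketch, not code
from the patch set), the helpers below compute how many sections a
block covers, which section ids would show up in 'phys_index' and
'end_phys_index' for the block containing a given section, and how
many memoryXXX directories result. The 16 MB section size and all
function and struct names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only; the real value is arch-specific. */
#define SECTION_SIZE_BYTES (16ULL << 20)   /* 16 MB memory section */

struct block_range {
    uint64_t phys_index;      /* first section id covered by the block */
    uint64_t end_phys_index;  /* last section id covered by the block */
};

/* Number of memory sections covered by one memory block. */
uint64_t sections_per_block(uint64_t block_size_bytes)
{
    /* A block must be a whole, non-zero number of sections. */
    assert(block_size_bytes >= SECTION_SIZE_BYTES);
    assert(block_size_bytes % SECTION_SIZE_BYTES == 0);
    return block_size_bytes / SECTION_SIZE_BYTES;
}

/* Section ids spanned by the block that contains section 'section_nr'. */
struct block_range block_for_section(uint64_t section_nr,
                                     uint64_t block_size_bytes)
{
    uint64_t n = sections_per_block(block_size_bytes);
    struct block_range r;

    r.phys_index = (section_nr / n) * n;      /* round down to block start */
    r.end_phys_index = r.phys_index + n - 1;  /* inclusive last section */
    return r;
}

/* Number of memoryXXX directories needed to cover 'total_bytes'. */
uint64_t nr_memory_dirs(uint64_t total_bytes, uint64_t block_size_bytes)
{
    return (total_bytes + block_size_bytes - 1) / block_size_bytes;
}
```

With one section per block, block_for_section() returns the same value
for both fields, matching the default case described above. With a
hypothetical 256 MB block (16 sections), a hotplug operation on the
block containing section 37 would cover sections 32 through 47.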

-Nathan


Thread overview:
2010-09-22 14:15 [PATCH 0/8] De-couple sysfs memory directories from memory sections Nathan Fontenot
2010-09-22 14:28 ` [PATCH 1/8] Move find_memory_block() routine Nathan Fontenot
2010-09-22 14:29 ` [PATCH 2/8] Update memory block struct to have start and end phys index Nathan Fontenot
2010-09-22 14:30 ` [PATCH 3/8] Add section count to memory_block struct Nathan Fontenot
2010-09-22 14:32 ` [PATCH 4/8] Add mutex for adding/removing memory blocks Nathan Fontenot
2010-09-22 14:33 ` [PATCH 5/8] Allow a memory block to span multiple memory sections Nathan Fontenot
2010-09-22 14:34 ` [PATCH 6/8] Update node sysfs code Nathan Fontenot
2010-09-22 14:35 ` [PATCH 7/8] Define memory_block_size_bytes() for powerpc/pseries Nathan Fontenot
2010-09-22 14:36 ` [PATCH 8/8] Update memory hotplug documentation Nathan Fontenot
2010-09-22 15:20 ` [PATCH 0/8] De-couple sysfs memory directories from memory sections Dave Hansen
2010-09-22 18:40   ` Nathan Fontenot
2010-09-22 18:58     ` Dave Hansen
2010-09-23 18:40 ` Balbir Singh
2010-09-24 14:35   ` Nathan Fontenot [this message]
