From: Eric Dumazet <dada1@cosmosbay.com>
To: clameter@sgi.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, dgc@sgi.com,
Mel Gorman <mel@csn.ul.ie>
Subject: Re: [RFC 0/3] Slab Defrag / Slab Targeted Reclaim and general Slab API changes
Date: Sat, 05 May 2007 07:07:04 +0200
Message-ID: <463C10F8.4040803@cosmosbay.com>
In-Reply-To: <20070504221555.642061626@sgi.com>
clameter@sgi.com wrote:
> I originally intended this for the 2.6.23 development cycle but since there
> is an aggressive push for SLUB I thought that we may want to introduce this earlier.
> Note that this covers new locking approaches that we may need to talk
> over before going any further.
>
> This is an RFC for patches that do major changes to the way that slab
> allocations are handled in order to introduce some more advanced features
> and in order to get rid of some things that are no longer used or awkward.
>
> A. Add Slab fragmentation
>
> On kmem_cache_shrink SLUB will not only sort the partial slabs by object
> number but attempt to free objects out of partial slabs that have a low
> number of objects. Doing so increases the object density in the remaining
> partial slabs and frees up memory. Ideally kmem_cache_shrink would be
> able to completely defrag the partial list so that only one partial
> slab is left over. But it is advantageous to have slabs with a few free
> objects since that speeds up kfree. Also going to the extreme on this one
> would mean that the reclaimable slabs would have to be able to move objects
> in a reliable way. So we just free objects in slabs with a low population ratio
> and tolerate it if an attempt to move an object fails.
Nice idea.
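A rough userspace sketch of the selection step described above, to make the idea concrete. The struct, the function names and the min_percent threshold are all hypothetical; the RFC text only says that partial slabs are sorted by object count and that sparsely populated ones are emptied where possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a partial slab: only the in-use object count. */
struct partial_slab {
    unsigned int inuse;   /* objects currently allocated in this slab */
};

/* qsort comparator: fullest slabs first, so sparse slabs end up at the tail. */
static int by_inuse_desc(const void *a, const void *b)
{
    const struct partial_slab *x = a, *y = b;

    return (x->inuse < y->inuse) - (x->inuse > y->inuse);
}

/*
 * Sort the partial list by object count and return how many slabs at the
 * tail fall below min_percent population. Those are the candidates whose
 * objects one would try to free or move; a failed move is simply tolerated.
 * min_percent is an assumed tunable, not a value taken from the RFC.
 */
static size_t pick_sparse_slabs(struct partial_slab *slabs, size_t n,
                                unsigned int objs_per_slab,
                                unsigned int min_percent)
{
    size_t sparse = 0;

    assert(objs_per_slab > 0);
    qsort(slabs, n, sizeof(*slabs), by_inuse_desc);
    for (size_t i = n; i > 0; i--) {
        if (slabs[i - 1].inuse * 100 >= min_percent * objs_per_slab)
            break;
        sparse++;
    }
    return sparse;
}
```

Keeping the fuller slabs at the head of the list means that slabs with only a few free objects stay around, which matches the remark that they speed up kfree.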
>
> B. Targeted Reclaim
>
> Mainly to support antifragmentation / defragmentation methods. The slab adds
> a new function kmem_cache_vacate(struct page *) which can be used to request
> that a page be cleared of all objects. This makes it possible to reduce the
> size of the RECLAIMABLE fragmentation area and move slabs into the MOVABLE
> area enhancing the capabilities of antifragmentation significantly.
>
> C. Introduces a slab_ops structure that allows a slab user to provide
> operations on slabs.
Could you please make it const?
>
> This replaces the current constructor / destructor scheme. It is necessary
> in order to support additional methods needed to support targeted reclaim
> and slab defragmentation. A slab supporting targeted reclaim and
> slab defragmentation must support the following additional methods:
>
> 1. get_reference(void *)
> Get a reference on a particular slab object.
>
> 2. kick_object(void *)
> Kick an object off a slab. The object is either reclaimed
> (easiest) or a new object is alloced using kmem_cache_alloc()
> and then the object is moved to the new location.
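To illustrate the const request, here is a minimal userspace sketch of what a const ops table could look like. The return types, the ctor member, and every demo_* and vacate_one name are assumptions; the RFC text only names get_reference(void *) and kick_object(void *).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical shape of the ops table. Return types and the ctor member
 * are assumptions; only the two method names come from the RFC text.
 */
struct slab_ops {
    void (*ctor)(void *obj);
    int  (*get_reference)(void *obj);   /* assumed: 0 on success */
    void (*kick_object)(void *obj);
};

struct demo_obj {
    int refcount;
    int kicked;
};

static int demo_get_reference(void *p)
{
    struct demo_obj *o = p;

    o->refcount++;
    return 0;
}

static void demo_kick_object(void *p)
{
    struct demo_obj *o = p;

    o->kicked = 1;      /* a real user would reclaim or move the object */
    o->refcount--;
}

/* const: the table can live in read-only data and is never written. */
static const struct slab_ops demo_ops = {
    .get_reference = demo_get_reference,
    .kick_object   = demo_kick_object,
};

/* The allocator side only ever needs a pointer to const. */
static int vacate_one(const struct slab_ops *ops, void *obj)
{
    assert(ops != NULL);
    if (!ops->get_reference || !ops->kick_object)
        return -1;              /* cache does not support reclaim */
    if (ops->get_reference(obj) != 0)
        return -1;
    ops->kick_object(obj);
    return 0;
}
```

With the table declared const, a slab user cannot change its callbacks after cache creation, and the compiler is free to place it in a read-only section.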
>
> D. Slab creation is no longer done using kmem_cache_create
>
> kmem_cache_create is not a clean API since it has only 2 call backs for
> constructor and destructor, does not allow the specification of a slab ops
> structure. Parameters are confusing.
>
> F.e. It is possible to specify alignment information in the alignment
> field and in addition in the flags field (SLAB_HWCACHE_ALIGN). The semantics
> of SLAB_HWCACHE_ALIGN are fuzzy because it only aligns object if
> larger than 1/2 cache line.
>
> All of this is really not necessary since the compiler knows how to align
> structures and we should use this information instead of having the user
> specify an alignment. I would like to get rid of SLAB_HWCACHE_ALIGN
> and kmem_cache_create. Instead one would use the following macros (that
> then result in a call to __kmem_cache_create).
Hmm, the problem is that the compiler sometimes doesn't know the target
processor's cache line alignment.
Adding ____cacheline_aligned to 'struct ...' definitions might be overkill if
you compile a generic kernel and happen to boot a Pentium III with it.
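A small sketch of the difference, assuming the cache line size is passed in as a value discovered at boot rather than baked into the struct. pick_align() and its parameters are hypothetical; the halving loop is only modeled on the "aligns only if larger than 1/2 cache line" behaviour described above, not copied from any allocator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: choose an object alignment at cache-creation time.
 *   size          - object size in bytes
 *   compile_align - alignment the compiler gave the struct
 *   runtime_line  - cache line size of the CPU we actually booted on
 *   hwcache_align - nonzero if the caller asked for cache line alignment
 */
static size_t pick_align(size_t size, size_t compile_align,
                         size_t runtime_line, int hwcache_align)
{
    size_t align = compile_align;

    assert(size > 0 && runtime_line > 0);
    if (hwcache_align) {
        size_t ralign = runtime_line;

        /* Small objects: halve the target while the object fits in half of it. */
        while (ralign > 1 && size <= ralign / 2)
            ralign /= 2;
        if (ralign > align)
            align = ralign;
    }
    return align;
}
```

For a 100-byte object, pick_align(100, 8, 64, 1) yields 64 and pick_align(100, 8, 32, 1) yields 32, so the same binary adapts to the machine it runs on. A struct marked with a compile-time cache line attribute would keep the larger alignment everywhere.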
>
> KMEM_CACHE(<struct-name>, flags)
>
> The macro will determine the slab name from the struct name and use that for
> /sys/slab, will use the size of the struct for slab size and the alignment
> of the structure for alignment. This means one will be able to set slab
> object alignment by specifying the usual alignment options for static
> allocations when defining the structure.
>
> Since the name is derived from the struct name it will be much easier to
> find the source code for slabs listed in /sys/slab.
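For readers following along, a userspace sketch of how such a macro can derive everything from the struct. CACHE_DESC, struct cache_desc, cache_desc_ok() and struct demo_inode are hypothetical stand-ins; the proposed KMEM_CACHE would call __kmem_cache_create instead of building a descriptor. The aligned attribute and __alignof__ are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: stands in for the arguments a creation call would get. */
struct cache_desc {
    const char *name;
    size_t size;
    size_t align;
    unsigned long flags;
};

/* Name, size and alignment are all derived from the struct itself. */
#define CACHE_DESC(type, fl)                        \
    ((struct cache_desc){                           \
        .name  = #type,                             \
        .size  = sizeof(struct type),               \
        .align = __alignof__(struct type),          \
        .flags = (fl),                              \
    })

/* Alignment is requested on the type, as for a static allocation. */
struct demo_inode {
    long ino;
    char state;
} __attribute__((aligned(32)));

/* Sanity check: non-empty name, power-of-two alignment that divides the size. */
static int cache_desc_ok(const struct cache_desc *d)
{
    assert(d != NULL);
    return strlen(d->name) > 0 &&
           d->align != 0 && (d->align & (d->align - 1)) == 0 &&
           d->size % d->align == 0;
}
```

Here CACHE_DESC(demo_inode, 0) produces the name "demo_inode" with size and alignment both 32. The alignment is fixed when the struct is compiled, which is exactly the limitation raised in the comment above about generic kernels.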
>
> An additional macro is provided if the slab also supports slab operations.
>
> KMEM_CACHE_OPS(<struct-name>, flags, slab_ops)
>
> It is likely that this macro will be rarely used.
>
> E. kmem_cache_create() SLAB_HWCACHE_ALIGN legacy interface
>
> In order to avoid having to modify all slab creation calls throughout
> the kernel we will provide a kmem_cache_create emulation. That function
> is the only call that will still understand SLAB_HWCACHE_ALIGN. If that
> parameter is specified then it will set up the proper alignment (the slab
> allocators never see that flag).
>
> If constructor or destructor are specified then we will allocate a slab_ops
> structure and populate it with the values specified. Note that this will
> cause a memory leak if the slab is disposed of later. If you need disposable
> slabs then the new API must be used.
>
> F. Remove destructor support from all slab allocators?
>
> I am only aware of two call sites left after all the changes that are
> scheduled to go into 2.6.22-rc1 have been merged. These are in FRV and sh
> arch code. The one in FRV will go away if they switch to quicklists like
> i386. Sh contains another use but a single user is no justification for keeping
> destructors around.
>
>
>
G. Being able to track the number of pages in a kmem_cache
If you look at fs/buffer.c, you'll notice bh_accounting and recalc_bh_state(),
which might be overkill for large SMP configurations when the real concern is
only to keep the bh's from exceeding 10% of LOWMEM.
Adding a callback in slab_ops to track the total number of pages in use by a
given kmem_cache would be good.
Same thing for fs/file_table.c: the nr_file logic
(percpu_counter_dec()/percpu_counter_inc() for each file open/close) could be
simplified if we could just count the pages in use by the filp_cachep kmem_cache.
The get_nr_files() thing is not worth the pain.
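A minimal sketch of the kind of accounting meant here, assuming the allocator invokes a hook whenever a slab page is added to or removed from a cache. All names and the 4096-byte page size are hypothetical; nothing like this exists in the posted patches.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed page size */

/* Hypothetical per-cache accounting, touched only when a slab page comes or goes. */
struct cache_pages {
    unsigned long nr_pages;
    unsigned long objs_per_page;
};

static void slab_page_added(struct cache_pages *c)
{
    c->nr_pages++;
}

static void slab_page_removed(struct cache_pages *c)
{
    assert(c->nr_pages > 0);
    c->nr_pages--;
}

/* Upper bound on live objects, with no per-object counter required. */
static unsigned long max_objects(const struct cache_pages *c)
{
    return c->nr_pages * c->objs_per_page;
}

/* Nonzero when the cache occupies more than percent of lowmem_bytes. */
static int over_limit(const struct cache_pages *c,
                      unsigned long lowmem_bytes, unsigned int percent)
{
    return c->nr_pages * DEMO_PAGE_SIZE > lowmem_bytes / 100 * percent;
}
```

The counter changes once per slab page instead of once per object alloc/free, so the hot paths stay free of shared or per-cpu counter updates. The price is that the limit is enforced on pages, so the object count is only an upper bound.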