* [RFC 00/10] implement alternative and much simpler id allocator
@ 2016-12-08 1:22 Rasmus Villemoes
2016-12-08 1:23 ` [RFC 06/10] block: use tida as small " Rasmus Villemoes
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Rasmus Villemoes @ 2016-12-08 1:22 UTC (permalink / raw)
To: Tejun Heo, Andrew Morton
Cc: linux-kernel, Lai Jiangshan, Jens Axboe, Greg Kroah-Hartman,
Rasmus Villemoes, linux-block, dri-devel
TL;DR: these patches save 250 KB of memory, with more low-hanging
fruit ready to pick.
While browsing through the lib/idr.c code, I noticed that the code at
the end of ida_get_new_above() probably doesn't work as intended: Most
users of ida use it via ida_simple_get(), and that starts by
unconditionally calling ida_pre_get(), ensuring that ida->idr has
8==MAX_IDR_FREE idr_layers in its free list id_free. In the common
case, none (or at most one) of these get used during
ida_get_new_above(), and we only free one, leaving at least 6 (usually
7) idr_layers in the free list.
The comment suggests that the idea is that subsequent allocations
would get rid of these one by one, but this doesn't work due to the
unconditional filling done by ida_simple_get - and callers who do
their own locking presumably call ida_pre_get themselves, with the
same effect.
To confirm my suspicion, and to see how many idas are actually in use
and how much they're utilized, I hacked together a /proc/ida file to
provide some stats on the currently active idas. Some examples of
lines from that on my laptop:
file:line gets puts diff layers id_free_cnt
block/blk-core.c:49 27 0 27 1 7
drivers/base/platform.c:35 1 0 1 1 6
drivers/gpu/drm/drm_crtc.c:201 0 0 0 0 0
drivers/gpu/drm/drm_crtc.c:201 0 0 0 0 0
...
drivers/gpu/drm/drm_crtc.c:201 1 0 1 1 6
drivers/gpu/drm/drm_crtc.c:201 2 0 2 1 7
drivers/gpu/drm/drm_crtc.c:5700 5 0 5 1 7
...
drivers/scsi/hosts.c:45 6 0 6 1 7
drivers/scsi/sd.c:123 1 0 1 1 6
fs/devpts/inode.c:386 57 40 17 1 7
fs/kernfs/dir.c:918 10 0 10 1 7
fs/kernfs/dir.c:918 11 0 11 1 7
...
fs/kernfs/dir.c:918 24799 1609 23190 1 7
...
fs/super.c:882 49 5 44 1 7
kernel/workqueue.c:3198 10 8 2 1 7
kernel/workqueue.c:3198 11 9 2 1 7
kernel/workqueue.c:3198 2 0 2 1 7
kernel/workqueue.c:3198 2 0 2 1 7
kernel/workqueue.c:3198 2 0 2 1 7
kernel/workqueue.c:3198 94 91 3 1 7
file:line is the location of the ida_init or DEFINE_IDA, gets and puts
are the lifetime number of allocations/frees - their difference is
thus the current number of allocated ids. layers and id_free_cnt are
the members of ida->idr.
As expected, there are 0, 6 or 7 idr_layers cached and unused in each
ida, depending on whether there have been 0, 1 or more allocations
done. Add the idr_layer which is in use, and we see that e.g. the
workqueue instances use more than 6*8*sizeof(struct idr_layer) ~ 100K
of memory, or almost 1K per allocated id.
Patches 1 and 2 are minor optimization opportunities, while patch 3 is
an attempt at making the 'free the extra idr_layers one at a time'
actually work. But it's not a very good solution, since it doesn't
help the users who never do more than a handful of allocations, nor
does it help those who call ida_pre_get/ida_get_new
directly. Moreover, even if we somehow had perfect heuristics and got
rid of all excess idr_layers and ida_bitmap (another 128 bytes we
usually have hanging around), the minimum overhead of sizeof(struct
idr_layer)+sizeof(struct ida_bitmap) ~ 2200 bytes is quite a lot for
the many ida users who never allocate more than 100 ids.
So instead/in addition, I suggest we introduce a much simpler
allocator based on a dynamically growing bitmap. Patches 4-10
introduce struct tida and has a few examples of replacing ida with
tida. [Yeah, tida is not a good name, and probably _get and _put are also
bad.]
I've booted a 4.8.12 kernel with these backported, and as expected
they save around 250KB of memory. There's more to gain, but this is
just an RFC.
Rasmus Villemoes (10):
lib/idr.c: reused free bitmaps are already clear
lib/idr.c: delete useless condition
lib/idr.c: only fill ida->idr when needed
lib/tida.c: a very simple integer id allocator
kernel/workqueue.c: replace id allocator ida with tida
block: use tida as small id allocator
drivers/base/platform.c: use simpler id allocator
lib/tida.c: introduce tida_get_above
drm: use simpler id allocator
fs/devpts: use tida for id allocation
block/blk-core.c | 6 +--
block/blk-sysfs.c | 2 +-
block/blk.h | 4 +-
drivers/base/platform.c | 10 ++--
drivers/gpu/drm/drm_connector.c | 21 ++++----
drivers/gpu/drm/drm_crtc.c | 4 +-
fs/devpts/inode.c | 28 ++++-------
include/drm/drm_crtc.h | 3 +-
include/linux/tida.h | 32 ++++++++++++
kernel/workqueue.c | 14 +++---
lib/Makefile | 2 +-
lib/idr.c | 13 +++--
lib/tida.c | 107 ++++++++++++++++++++++++++++++++++++++++
13 files changed, 188 insertions(+), 58 deletions(-)
create mode 100644 include/linux/tida.h
create mode 100644 lib/tida.c
--
2.1.4
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC 06/10] block: use tida as small id allocator
2016-12-08 1:22 [RFC 00/10] implement alternative and much simpler id allocator Rasmus Villemoes
@ 2016-12-08 1:23 ` Rasmus Villemoes
2016-12-08 3:56 ` Jens Axboe
2016-12-09 13:49 ` [RFC 00/10] implement alternative and much simpler " Tejun Heo
2016-12-09 22:01 ` Andrew Morton
2 siblings, 1 reply; 16+ messages in thread
From: Rasmus Villemoes @ 2016-12-08 1:23 UTC (permalink / raw)
To: Tejun Heo, Andrew Morton
Cc: linux-kernel, Lai Jiangshan, Jens Axboe, Greg Kroah-Hartman,
Rasmus Villemoes, linux-block
A struct ida ends up costing > 16 KB of runtime memory, which is quite
a lot for something which on my laptop as of this writing has handed
out 27 ids in its lifetime. So use the simpler and lighter-weight
struct tida.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
block/blk-core.c | 6 +++---
block/blk-sysfs.c | 2 +-
block/blk.h | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c0740dc0..6919a519d797 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -46,7 +46,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete);
EXPORT_TRACEPOINT_SYMBOL_GPL(block_split);
EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug);
-DEFINE_IDA(blk_queue_ida);
+DEFINE_TIDA(blk_queue_ida);
/*
* For the allocated request tables
@@ -699,7 +699,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
if (!q)
return NULL;
- q->id = ida_simple_get(&blk_queue_ida, 0, 0, gfp_mask);
+ q->id = tida_get(&blk_queue_ida, gfp_mask);
if (q->id < 0)
goto fail_q;
@@ -771,7 +771,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
fail_split:
bioset_free(q->bio_split);
fail_id:
- ida_simple_remove(&blk_queue_ida, q->id);
+ tida_put(&blk_queue_ida, q->id);
fail_q:
kmem_cache_free(blk_requestq_cachep, q);
return NULL;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c5439a..5a04dd6cb20d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -652,7 +652,7 @@ static void blk_release_queue(struct kobject *kobj)
if (q->bio_split)
bioset_free(q->bio_split);
- ida_simple_remove(&blk_queue_ida, q->id);
+ tida_put(&blk_queue_ida, q->id);
call_rcu(&q->rcu_head, blk_free_queue_rcu);
}
diff --git a/block/blk.h b/block/blk.h
index 74444c49078f..a0a606efed91 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -1,7 +1,7 @@
#ifndef BLK_INTERNAL_H
#define BLK_INTERNAL_H
-#include <linux/idr.h>
+#include <linux/tida.h>
#include <linux/blk-mq.h>
#include "blk-mq.h"
@@ -34,7 +34,7 @@ struct blk_flush_queue {
extern struct kmem_cache *blk_requestq_cachep;
extern struct kmem_cache *request_cachep;
extern struct kobj_type blk_queue_ktype;
-extern struct ida blk_queue_ida;
+extern struct tida blk_queue_ida;
static inline struct blk_flush_queue *blk_get_flush_queue(
struct request_queue *q, struct blk_mq_ctx *ctx)
--
2.1.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC 06/10] block: use tida as small id allocator
2016-12-08 1:23 ` [RFC 06/10] block: use tida as small " Rasmus Villemoes
@ 2016-12-08 3:56 ` Jens Axboe
2016-12-08 11:02 ` Greg Kroah-Hartman
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2016-12-08 3:56 UTC (permalink / raw)
To: Rasmus Villemoes, Tejun Heo, Andrew Morton
Cc: linux-kernel, Lai Jiangshan, Greg Kroah-Hartman, linux-block
On 12/07/2016 06:23 PM, Rasmus Villemoes wrote:
> A struct ida ends up costing > 16 KB of runtime memory, which is quite
> a lot for something which on my laptop as of this writing has handed
> out 27 ids in its lifetime. So use the simpler and lighter-weight
> struct tida.
I'm worried that your example of your laptop isn't an all encompassing
test case. How well does the simplified ida allocator work for tens of
thousands of disks, at scan time? SCSI is notorious for setting up and
tearing down a ton of queues at probe time.
Unless we have more testing than 'it works on my laptop and saves 16k',
I'm not super intereted in the patch.
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 06/10] block: use tida as small id allocator
2016-12-08 3:56 ` Jens Axboe
@ 2016-12-08 11:02 ` Greg Kroah-Hartman
0 siblings, 0 replies; 16+ messages in thread
From: Greg Kroah-Hartman @ 2016-12-08 11:02 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Jens Axboe, Tejun Heo, Andrew Morton, linux-kernel, Lai Jiangshan,
linux-block
On Wed, Dec 07, 2016 at 08:56:27PM -0700, Jens Axboe wrote:
> On 12/07/2016 06:23 PM, Rasmus Villemoes wrote:
> > A struct ida ends up costing > 16 KB of runtime memory, which is quite
> > a lot for something which on my laptop as of this writing has handed
> > out 27 ids in its lifetime. So use the simpler and lighter-weight
> > struct tida.
>
> I'm worried that your example of your laptop isn't an all encompassing
> test case. How well does the simplified ida allocator work for tens of
> thousands of disks, at scan time? SCSI is notorious for setting up and
> tearing down a ton of queues at probe time.
>
> Unless we have more testing than 'it works on my laptop and saves 16k',
> I'm not super intereted in the patch.
Rasmus, you can create 30k virtual scsi devices on your laptop to test
if this really does work or saves any real time or memory. I'd
recommend doing that if you want to get patches like this ever accepted.
good luck!
greg k-h
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-08 1:22 [RFC 00/10] implement alternative and much simpler id allocator Rasmus Villemoes
2016-12-08 1:23 ` [RFC 06/10] block: use tida as small " Rasmus Villemoes
@ 2016-12-09 13:49 ` Tejun Heo
2016-12-09 22:01 ` Andrew Morton
2 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2016-12-09 13:49 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Andrew Morton, linux-kernel, Lai Jiangshan, Jens Axboe,
Greg Kroah-Hartman, linux-block, dri-devel
Hello, Rasmus.
On Thu, Dec 08, 2016 at 02:22:55AM +0100, Rasmus Villemoes wrote:
> TL;DR: these patches save 250 KB of memory, with more low-hanging
> fruit ready to pick.
>
> While browsing through the lib/idr.c code, I noticed that the code at
> the end of ida_get_new_above() probably doesn't work as intended: Most
> users of ida use it via ida_simple_get(), and that starts by
> unconditionally calling ida_pre_get(), ensuring that ida->idr has
> 8==MAX_IDR_FREE idr_layers in its free list id_free. In the common
> case, none (or at most one) of these get used during
> ida_get_new_above(), and we only free one, leaving at least 6 (usually
> 7) idr_layers in the free list.
So, if you take a look at idr_layer_alloc(), there are two alternative
preloading mechanisms. The one based on per-idr freelist -
get_from_free_list() - and the one using per-cpu preload cache. idr
currently uses the new percpu path and ida uses the old path. There
isn't anything fundamental to this difference. It's just that we
introduced the new perpcu path to idr and haven't converted ida over
to it yet.
> Patches 1 and 2 are minor optimization opportunities, while patch 3 is
> an attempt at making the 'free the extra idr_layers one at a time'
> actually work. But it's not a very good solution, since it doesn't
> help the users who never do more than a handful of allocations, nor
> does it help those who call ida_pre_get/ida_get_new
> directly. Moreover, even if we somehow had perfect heuristics and got
> rid of all excess idr_layers and ida_bitmap (another 128 bytes we
> usually have hanging around), the minimum overhead of sizeof(struct
> idr_layer)+sizeof(struct ida_bitmap) ~ 2200 bytes is quite a lot for
> the many ida users who never allocate more than 100 ids.
>
> So instead/in addition, I suggest we introduce a much simpler
> allocator based on a dynamically growing bitmap. Patches 4-10
> introduce struct tida and has a few examples of replacing ida with
> tida. [Yeah, tida is not a good name, and probably _get and _put are also
> bad.]
So, instead of creating something new, it'd be a lot better to
implement the same per-cpu preload behavior for ida so that caching is
per-cpu instead of per-ida.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-08 1:22 [RFC 00/10] implement alternative and much simpler id allocator Rasmus Villemoes
2016-12-08 1:23 ` [RFC 06/10] block: use tida as small " Rasmus Villemoes
2016-12-09 13:49 ` [RFC 00/10] implement alternative and much simpler " Tejun Heo
@ 2016-12-09 22:01 ` Andrew Morton
2016-12-12 17:09 ` Tejun Heo
2016-12-16 19:14 ` Matthew Wilcox
2 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2016-12-09 22:01 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Tejun Heo, linux-kernel, Lai Jiangshan, Jens Axboe,
Greg Kroah-Hartman, linux-block, dri-devel, Matthew Wilcox
On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
> TL;DR: these patches save 250 KB of memory, with more low-hanging
> fruit ready to pick.
>
> While browsing through the lib/idr.c code, I noticed that the code at
> the end of ida_get_new_above() probably doesn't work as intended: Most
> users of ida use it via ida_simple_get(), and that starts by
> unconditionally calling ida_pre_get(), ensuring that ida->idr has
> 8==MAX_IDR_FREE idr_layers in its free list id_free. In the common
> case, none (or at most one) of these get used during
> ida_get_new_above(), and we only free one, leaving at least 6 (usually
> 7) idr_layers in the free list.
Please be aware of
http://ozlabs.org/~akpm/mmots/broken-out/reimplement-idr-and-ida-using-the-radix-tree.patch
http://lkml.kernel.org/r/1480369871-5271-68-git-send-email-mawilcox@linuxonhyperv.com
I expect we'll be merging patches 1-32 of that series into 4.10-rc1 and
the above patch (#33) into 4.11-rc1.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-09 22:01 ` Andrew Morton
@ 2016-12-12 17:09 ` Tejun Heo
2016-12-12 17:35 ` Matthew Wilcox
2016-12-16 19:14 ` Matthew Wilcox
1 sibling, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2016-12-12 17:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Rasmus Villemoes, linux-kernel, Lai Jiangshan, Jens Axboe,
Greg Kroah-Hartman, linux-block, dri-devel, Matthew Wilcox
Hello,
On Fri, Dec 09, 2016 at 02:01:40PM -0800, Andrew Morton wrote:
> On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
>
> > TL;DR: these patches save 250 KB of memory, with more low-hanging
> > fruit ready to pick.
> >
> > While browsing through the lib/idr.c code, I noticed that the code at
> > the end of ida_get_new_above() probably doesn't work as intended: Most
> > users of ida use it via ida_simple_get(), and that starts by
> > unconditionally calling ida_pre_get(), ensuring that ida->idr has
> > 8==MAX_IDR_FREE idr_layers in its free list id_free. In the common
> > case, none (or at most one) of these get used during
> > ida_get_new_above(), and we only free one, leaving at least 6 (usually
> > 7) idr_layers in the free list.
>
> Please be aware of
>
> http://ozlabs.org/~akpm/mmots/broken-out/reimplement-idr-and-ida-using-the-radix-tree.patch
> http://lkml.kernel.org/r/1480369871-5271-68-git-send-email-mawilcox@linuxonhyperv.com
>
> I expect we'll be merging patches 1-32 of that series into 4.10-rc1 and
> the above patch (#33) into 4.11-rc1.
Ah, yeah, great to see the silly implementation being replaced the
radix tree. ida_pre_get() looks suspicious tho. idr_preload()
immedicately being followed by idr_preload_end() probably is broken.
Maybe what we need is moving ida to idr like preload interface and
then convert it to radix based interface? ida currently assumes
per-ida preloading.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-12 17:09 ` Tejun Heo
@ 2016-12-12 17:35 ` Matthew Wilcox
2016-12-12 18:05 ` Tejun Heo
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2016-12-12 17:35 UTC (permalink / raw)
To: Tejun Heo, Andrew Morton
Cc: Rasmus Villemoes, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, kent.overstreet@gmail.com
RnJvbTogVGVqdW4gSGVvIFttYWlsdG86aHRlanVuQGdtYWlsLmNvbV0gT24gQmVoYWxmIE9mIFRl
anVuIEhlbw0KPiBBaCwgeWVhaCwgZ3JlYXQgdG8gc2VlIHRoZSBzaWxseSBpbXBsZW1lbnRhdGlv
biBiZWluZyByZXBsYWNlZCB0aGUNCj4gcmFkaXggdHJlZS4gIGlkYV9wcmVfZ2V0KCkgbG9va3Mg
c3VzcGljaW91cyB0aG8uICBpZHJfcHJlbG9hZCgpDQo+IGltbWVkaWNhdGVseSBiZWluZyBmb2xs
b3dlZCBieSBpZHJfcHJlbG9hZF9lbmQoKSBwcm9iYWJseSBpcyBicm9rZW4uDQo+IE1heWJlIHdo
YXQgd2UgbmVlZCBpcyBtb3ZpbmcgaWRhIHRvIGlkciBsaWtlIHByZWxvYWQgaW50ZXJmYWNlIGFu
ZA0KPiB0aGVuIGNvbnZlcnQgaXQgdG8gcmFkaXggYmFzZWQgaW50ZXJmYWNlPyAgaWRhIGN1cnJl
bnRseSBhc3N1bWVzDQo+IHBlci1pZGEgcHJlbG9hZGluZy4NCg0KSGV5IFRlanVuISAgR3JlYXQg
dG8gaGF2ZSB5b3VyIGNvbW1lbnRzIG9uIHRoaXMgcmVpbXBsZW1lbnRhdGlvbi4NCg0KSSBrbm93
IHRoZSBwcmVsb2FkIGZvbGxvd2VkIGJ5IHByZWxvYWRfZW5kIGxvb2tzIHdyb25nLiAgSSBkb24n
dCB0aGluayBpdCdzIGJyb2tlbiB0aG91Z2guICBJZiB3ZSBnZXQgcHJlZW1wdGVkLCB0aGVuIHRo
ZSB3b3JzdCBzaXR1YXRpb24gaXMgdGhhdCB3ZSdsbCBlbmQgdXAgd2l0aCB0aGUgbWVtb3J5IHdl
IHByZWFsbG9jYXRlZCBiZWluZyBhbGxvY2F0ZWQgdG8gc29tZWJvZHkgZWxzZS4gIFRoZW4gdGhl
IGF0dGVtcHQgdG8gYWxsb2NhdGUgbWVtb3J5IGNhbiBmYWlsLCBhbmQgd2UnbGwgcmV0dXJuIC1F
QUdBSU4sIGF0IHdoaWNoIHBvaW50IGFsbCBjYWxsZXJzIGFyZSBzdXBwb3NlZCB0byByZXR1cm4g
dG8gdGhlIHByZV9nZXQoKSBzdGFnZS4gIENlcnRhaW5seSB0aGF0J3Mgd2hhdCBpZGFfc2ltcGxl
X2dldCgpIGRvZXMuDQoNCkhtbSAuLi4gbG9va2luZyBhdCB0aGUgaW1wbGVtZW50YXRpb24gYWdh
aW4sIHdlIG1pZ2h0IHJldHVybiAtRU5PTUVNIHdoZW4gd2Ugc2hvdWxkIHJldHVybiAtRUFHQUlO
LiAgTGV0IG1lIGZpeCB0aGF0IChhbmQgdGhlIHRlc3Qgc3VpdGUgLi4uKQ0KDQpJJ2QgZGVmaW5p
dGVseSBiZSBvcGVuIHRvIGNoYW5naW5nIHRoZSBJREEgQVBJLiAgSSBrbm93IEtlbnQgaGFkIHNv
bWUgdGhvdWdodHMgb24gdGhhdCBpbmNsdWRpbmcgc3BsaXR0aW5nIHRoZSBzaW1wbGUgbG9jayBp
bnRvIGEgcGVyLUlEQSBsb2NrLiAgSGlzIGxhc3Qgd29yayBvbiBpdCB3YXMgaGVyZSwgSSBiZWxp
ZXZlOg0KDQpodHRwczovL2V2aWxwaWVwaXJhdGUub3JnL2dpdC9saW51eC1iY2FjaGUuZ2l0L2xv
Zy8/aD1pZHINCg0K
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-12 17:35 ` Matthew Wilcox
@ 2016-12-12 18:05 ` Tejun Heo
0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2016-12-12 18:05 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Rasmus Villemoes, linux-kernel@vger.kernel.org,
Lai Jiangshan, Jens Axboe, Greg Kroah-Hartman,
linux-block@vger.kernel.org, dri-devel@lists.freedesktop.org,
kent.overstreet@gmail.com
Hello, Matthew.
On Mon, Dec 12, 2016 at 05:35:17PM +0000, Matthew Wilcox wrote:
> I know the preload followed by preload_end looks wrong. I don't
> think it's broken though. If we get preempted, then the worst
> situation is that we'll end up with the memory we preallocated being
> allocated to somebody else. Then the attempt to allocate memory can
> fail, and we'll return -EAGAIN, at which point all callers are
> supposed to return to the pre_get() stage. Certainly that's what
> ida_simple_get() does.
Ah, right, ida_pre_get() doesn't have any protection against other
task allocating inbetween pre_get and the actual allocation, so it
should retry on failure. Yeah, then the proposed preloading wouldn't
be wrong. It'd be nice to explain what's going on tho.
> I'd definitely be open to changing the IDA API. I know Kent had
> some thoughts on that including splitting the simple lock into a
> per-IDA lock. His last work on it was here, I believe:
>
> https://evilpiepirate.org/git/linux-bcache.git/log/?h=idr
Yeah, that was a big re-write, but for now I think it'd be nice to
replace ida's pre_get mechanism with something similar to idr's
preload so that they're more consistent. There aren't that many
direct users of ida_pre_get(), so it shouldn't be too difficult to
change.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-09 22:01 ` Andrew Morton
2016-12-12 17:09 ` Tejun Heo
@ 2016-12-16 19:14 ` Matthew Wilcox
2016-12-16 20:32 ` Rasmus Villemoes
1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2016-12-16 19:14 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Tejun Heo, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, Andrew Morton
RnJvbTogQW5kcmV3IE1vcnRvbiBbbWFpbHRvOmFrcG1AbGludXgtZm91bmRhdGlvbi5vcmddDQo+
IE9uIFRodSwgIDggRGVjIDIwMTYgMDI6MjI6NTUgKzAxMDAgUmFzbXVzIFZpbGxlbW9lcw0KPiA8
bGludXhAcmFzbXVzdmlsbGVtb2VzLmRrPiB3cm90ZToNCj4gPiBUTDtEUjogdGhlc2UgcGF0Y2hl
cyBzYXZlIDI1MCBLQiBvZiBtZW1vcnksIHdpdGggbW9yZSBsb3ctaGFuZ2luZw0KPiA+IGZydWl0
IHJlYWR5IHRvIHBpY2suDQo+ID4NCj4gPiBXaGlsZSBicm93c2luZyB0aHJvdWdoIHRoZSBsaWIv
aWRyLmMgY29kZSwgSSBub3RpY2VkIHRoYXQgdGhlIGNvZGUgYXQNCj4gPiB0aGUgZW5kIG9mIGlk
YV9nZXRfbmV3X2Fib3ZlKCkgcHJvYmFibHkgZG9lc24ndCB3b3JrIGFzIGludGVuZGVkOiBNb3N0
DQo+ID4gdXNlcnMgb2YgaWRhIHVzZSBpdCB2aWEgaWRhX3NpbXBsZV9nZXQoKSwgYW5kIHRoYXQg
c3RhcnRzIGJ5DQo+ID4gdW5jb25kaXRpb25hbGx5IGNhbGxpbmcgaWRhX3ByZV9nZXQoKSwgZW5z
dXJpbmcgdGhhdCBpZGEtPmlkciBoYXMNCj4gPiA4PT1NQVhfSURSX0ZSRUUgaWRyX2xheWVycyBp
biBpdHMgZnJlZSBsaXN0IGlkX2ZyZWUuIEluIHRoZSBjb21tb24NCj4gPiBjYXNlLCBub25lIChv
ciBhdCBtb3N0IG9uZSkgb2YgdGhlc2UgZ2V0IHVzZWQgZHVyaW5nDQo+ID4gaWRhX2dldF9uZXdf
YWJvdmUoKSwgYW5kIHdlIG9ubHkgZnJlZSBvbmUsIGxlYXZpbmcgYXQgbGVhc3QgNiAodXN1YWxs
eQ0KPiA+IDcpIGlkcl9sYXllcnMgaW4gdGhlIGZyZWUgbGlzdC4NCj4gDQo+IEkgZXhwZWN0IHdl
J2xsIGJlIG1lcmdpbmcgcGF0Y2hlcyAxLTMyIG9mIHRoYXQgc2VyaWVzIGludG8gNC4xMC1yYzEg
YW5kDQo+IHRoZSBhYm92ZSBwYXRjaCAoIzMzKSBpbnRvIDQuMTEtcmMxLg0KDQpIaSBSYXNtdXMs
DQoNClRoYW5rcyBmb3IgeW91ciB3b3JrIG9uIHRoaXM7IHlvdSd2ZSByZWFsbHkgcHV0IHNvbWUg
ZWZmb3J0IGludG8gcHJvdmluZyB5b3VyIHdvcmsgaGFzIHZhbHVlLiAgTXkgbW90aXZhdGlvbiB3
YXMgcHVyZWx5IGFlc3RoZXRpYywgYnV0IHlvdSd2ZSBnb3Qgc29tZSBnZW51aW5lIHNhdmluZ3Mg
aGVyZSAoYWRtaXR0ZWRseSBpdCdzIGFib3V0IGEgcXVhcnRlciBvZiBhIGNlbnQncyB3b3J0aCBv
ZiBtZW1vcnkgd2l0aCBEUkFNIHNlbGxpbmcgZm9yICQxMC9HQikuICBOZXZlcnRoZWxlc3MsIHRo
YXQgYWRkcyB1cCBvdmVyIGEgYmlsbGlvbiBkZXZpY2VzLCBhbmQgdGhlcmUgYXJlIHN0aWxsIHBl
b3BsZSB0cnlpbmcgdG8gZml0IExpbnV4IGludG8gNE1CIGVtYmVkZGVkIGRldmljZXMuDQoNCkkg
dGhpbmsgbXkgcmVpbXBsZW1lbnRhdGlvbiBvZiB0aGUgSURBIG9uIHRvcCBvZiB0aGUgcmFkaXgg
dHJlZSBpcyBjbG9zZSBlbm91Z2ggdG8geW91ciB0SURBIGluIG1lbW9yeSBjb25zdW1wdGlvbiB0
aGF0IGl0IGRvZXNuJ3Qgd2FycmFudCBhIG5ldyBkYXRhIHN0cnVjdHVyZS4NCg0KT24gYSA2NC1i
aXQgbWFjaGluZSwgeW91ciB0SURBIHJvb3QgaXMgMjQgYnl0ZXM7IG15IG5ldyBJREEgcm9vdCBp
cyAxNiBieXRlcy4gIElmIHlvdSBhbGxvY2F0ZSBvbmx5IG9uZSBlbnRyeSwgeW91J2xsIGFsbG9j
YXRlIDggYnl0ZXMuICBUaGFua3MgdG8gdGhlIHNsYWIgYWxsb2NhdG9yLCB0aGF0IGdldHMgcm91
bmRlZCB1cCB0byAzMiBieXRlcy4gIEkgYWxsb2NhdGUgdGhlIGZ1bGwgMTI4IGJ5dGUgbGVhZiwg
YnV0IEkgc3RvcmUgdGhlIHBvaW50ZXIgdG8gaXQgaW4gdGhlIHJvb3QgKHVubGlrZSB0aGUgSURS
LCB0aGUgcmFkaXggdHJlZSBkb2Vzbid0IG5lZWQgdG8gYWxsb2NhdGUgYSBsYXllciBmb3IgYSBz
aW5nbGUgZW50cnkpLiAgU28gdElEQSB3aW5zIG9uIG1lbW9yeSBjb25zdW1wdGlvbiBiZXR3ZWVu
IDEgYW5kIDUxMSBJRHMsIGFuZCBuZXdJREEgaXMgc2xpZ2h0bHkgYWhlYWQgYmV0d2VlbiA1MTIg
YW5kIDEwMjMgSURzLiAgQWJvdmUgMTAyNCBJRHMsIEkgYWxsb2NhdGUgYSBsYXllciAoNTc2IGJ5
dGVzKSwgYW5kIGEgc2Vjb25kIGxlYWYgKDgzMiBieXRlcyB0b3RhbCksIHdoaWxlIHlvdSBqdXN0
IGRvdWJsZSB0byAyNTYgYnl0ZXMuICBJIHRoaW5rIHRJREEncyBtZW1vcnkgY29uc3VtcHRpb24g
dGhlbiBzdGF5cyBhaGVhZCBvZiBuZXcgSURBLiAgQnV0IHBlcmZvcm1hbmNlIG9mICdhbGxvY2F0
ZSBuZXcgSUQnIHNob3VsZCBiZSBiZXR0ZXIgZm9yIG5ld0lEQSB0aGFuIHRJREEgYXMgbmV3SURB
IGNhbiBza2lwIG92ZXIgYWxsIHRoZSBjYWNoZWxpbmVzIG9mIGZ1bGwgYml0bWFwcy4NCg0KWWVz
dGVyZGF5LCBJIGZvdW5kIGEgbmV3IHByb2JsZW0gd2l0aCB0aGUgSURBIGFsbG9jYXRvciB0aGF0
IHlvdSBoYWRuJ3QgbWVudGlvbmVkIC0tIGFib3V0IGhhbGYgb2YgdGhlIHVzZXJzIG9mIHRoZSBJ
REEgZGF0YSBzdHJ1Y3R1cmUgbmV2ZXIgY2FsbCBkZXN0cm95X2lkYSgpLiAgV2hpY2ggbWVhbnMg
dGhhdCB0aGV5J3JlIGxlYWtpbmcgdGhlIHByZWxvYWRlZCBiaXRtYXAuICBJIGhhdmUgYSBwYXRj
aCB3aGljaCBtb3ZlcyB0aGUgcHJlbG9hZGVkIElEQSBiaXRtYXAgZnJvbSBiZWluZyBzdG9yZWQg
aW4gdGhlIElEQSB0byBiZWluZyBzdG9yZWQgaW4gYSBwZXJjcHUgdmFyaWFibGUuICBZb3UgY2Fu
IGZpbmQgaXQgaGVyZTogaHR0cDovL2dpdC5pbmZyYWRlYWQub3JnL3VzZXJzL3dpbGx5L2xpbnV4
LWRheC5naXQvc2hvcnRsb2cvcmVmcy9oZWFkcy9pZHItMjAxNi0xMi0xNiBJJ2Qgd2VsY29tZSBt
b3JlIHRlc3RpbmcgYW5kIGNvZGUgcmV2aWV3Lg0K
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-16 19:14 ` Matthew Wilcox
@ 2016-12-16 20:32 ` Rasmus Villemoes
2016-12-16 21:09 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Rasmus Villemoes @ 2016-12-16 20:32 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Tejun Heo, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, Andrew Morton
On Fri, Dec 16 2016, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> From: Andrew Morton [mailto:akpm@linux-foundation.org]
>> On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes
>> <linux@rasmusvillemoes.dk> wrote:
>> > TL;DR: these patches save 250 KB of memory, with more low-hanging
>> > fruit ready to pick.
>> >
>> > While browsing through the lib/idr.c code, I noticed that the code at
>> > the end of ida_get_new_above() probably doesn't work as intended: Most
>> > users of ida use it via ida_simple_get(), and that starts by
>> > unconditionally calling ida_pre_get(), ensuring that ida->idr has
>> > 8==MAX_IDR_FREE idr_layers in its free list id_free. In the common
>> > case, none (or at most one) of these get used during
>> > ida_get_new_above(), and we only free one, leaving at least 6 (usually
>> > 7) idr_layers in the free list.
>>
>> I expect we'll be merging patches 1-32 of that series into 4.10-rc1 and
>> the above patch (#33) into 4.11-rc1.
>
> Hi Rasmus,
>
> Thanks for your work on this; you've really put some effort into
> proving your work has value. My motivation was purely aesthetic, but
> you've got some genuine savings here (admittedly it's about a quarter
> of a cent's worth of memory with DRAM selling for $10/GB).
> Nevertheless, that adds up over a billion devices, and there are still
> people trying to fit Linux into 4MB embedded devices.
>
Yeah, my main motivation was embedded devices which don't have the
luxury of measuring their RAM in GB. E.g., it's crazy that the
watchdog_ida effectively use more memory than the .text of the watchdog
subsystem, and similarly for the kthread workers, etc., etc.. I didn't
mean for my patches to go in as is, more to provoke some discussion. I
wasn't aware of your reimplementation, but it seems that may make the
problem go away.
> I think my reimplementation of the IDA on top of the radix tree is
> close enough to your tIDA in memory consumption that it doesn't
> warrant a new data structure.
>
> On a 64-bit machine, your tIDA root is 24 bytes; my new IDA root is 16
> bytes. If you allocate only one entry, you'll allocate 8 bytes.
> Thanks to the slab allocator, that gets rounded up to 32 bytes. I
> allocate the full 128 byte leaf, but I store the pointer to it in the
> root (unlike the IDR, the radix tree doesn't need to allocate a layer
> for a single entry). So tIDA wins on memory consumption between 1 and
> 511 IDs, and newIDA is slightly ahead between 512 and 1023 IDs.
This sounds good. I think there may still be a lot of users that never
allocate more than a handful of IDAs, making a 128 byte allocation still
somewhat excessive. One thing I considered was (exactly as it's done for
file descriptor tables) to embed a single word in the struct ida and
use that initially; I haven't looked closely at newIDA, so I don't know
how easy that would be or if its worth the complexity.
Rasmus
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-16 20:32 ` Rasmus Villemoes
@ 2016-12-16 21:09 ` Matthew Wilcox
2016-12-17 13:28 ` Matthew Wilcox
2016-12-18 2:42 ` Matthew Wilcox
2 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-12-16 21:09 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Tejun Heo, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, Andrew Morton
RnJvbTogUmFzbXVzIFZpbGxlbW9lcyBbbWFpbHRvOmxpbnV4QHJhc211c3ZpbGxlbW9lcy5ka10N
Cj4gT24gRnJpLCBEZWMgMTYgMjAxNiwgTWF0dGhldyBXaWxjb3ggPG1hd2lsY294QG1pY3Jvc29m
dC5jb20+IHdyb3RlOg0KPiA+IFRoYW5rcyBmb3IgeW91ciB3b3JrIG9uIHRoaXM7IHlvdSd2ZSBy
ZWFsbHkgcHV0IHNvbWUgZWZmb3J0IGludG8NCj4gPiBwcm92aW5nIHlvdXIgd29yayBoYXMgdmFs
dWUuICBNeSBtb3RpdmF0aW9uIHdhcyBwdXJlbHkgYWVzdGhldGljLCBidXQNCj4gPiB5b3UndmUg
Z290IHNvbWUgZ2VudWluZSBzYXZpbmdzIGhlcmUgKGFkbWl0dGVkbHkgaXQncyBhYm91dCBhIHF1
YXJ0ZXINCj4gPiBvZiBhIGNlbnQncyB3b3J0aCBvZiBtZW1vcnkgd2l0aCBEUkFNIHNlbGxpbmcg
Zm9yICQxMC9HQikuDQo+ID4gTmV2ZXJ0aGVsZXNzLCB0aGF0IGFkZHMgdXAgb3ZlciBhIGJpbGxp
b24gZGV2aWNlcywgYW5kIHRoZXJlIGFyZSBzdGlsbA0KPiA+IHBlb3BsZSB0cnlpbmcgdG8gZml0
IExpbnV4IGludG8gNE1CIGVtYmVkZGVkIGRldmljZXMuDQo+ID4NCj4gDQo+IFllYWgsIG15IG1h
aW4gbW90aXZhdGlvbiB3YXMgZW1iZWRkZWQgZGV2aWNlcyB3aGljaCBkb24ndCBoYXZlIHRoZQ0K
PiBsdXh1cnkgb2YgbWVhc3VyaW5nIHRoZWlyIFJBTSBpbiBHQi4gRS5nLiwgaXQncyBjcmF6eSB0
aGF0IHRoZQ0KPiB3YXRjaGRvZ19pZGEgZWZmZWN0aXZlbHkgdXNlIG1vcmUgbWVtb3J5IHRoYW4g
dGhlIC50ZXh0IG9mIHRoZSB3YXRjaGRvZw0KPiBzdWJzeXN0ZW0sIGFuZCBzaW1pbGFybHkgZm9y
IHRoZSBrdGhyZWFkIHdvcmtlcnMsIGV0Yy4sIGV0Yy4uIEkgZGlkbid0DQo+IG1lYW4gZm9yIG15
IHBhdGNoZXMgdG8gZ28gaW4gYXMgaXMsIG1vcmUgdG8gcHJvdm9rZSBzb21lIGRpc2N1c3Npb24u
IEkNCj4gd2Fzbid0IGF3YXJlIG9mIHlvdXIgcmVpbXBsZW1lbnRhdGlvbiwgYnV0IGl0IHNlZW1z
IHRoYXQgbWF5IG1ha2UgdGhlDQo+IHByb2JsZW0gZ28gYXdheS4NCg0KSXQgY2VydGFpbmx5IHNo
cmlua3MgdGhlIHByb2JsZW0gZG93biB0byBhIHNpemUgd2hlcmUgaXQgbWF5IG5vdCBiZSB3b3J0
aCBpbnRyb2R1Y2luZyBhbm90aGVyIGltcGxlbWVudGF0aW9uLg0KDQo+ID4gT24gYSA2NC1iaXQg
bWFjaGluZSwgeW91ciB0SURBIHJvb3QgaXMgMjQgYnl0ZXM7IG15IG5ldyBJREEgcm9vdCBpcyAx
Ng0KPiA+IGJ5dGVzLiAgSWYgeW91IGFsbG9jYXRlIG9ubHkgb25lIGVudHJ5LCB5b3UnbGwgYWxs
b2NhdGUgOCBieXRlcy4NCj4gPiBUaGFua3MgdG8gdGhlIHNsYWIgYWxsb2NhdG9yLCB0aGF0IGdl
dHMgcm91bmRlZCB1cCB0byAzMiBieXRlcy4gIEkNCj4gPiBhbGxvY2F0ZSB0aGUgZnVsbCAxMjgg
Ynl0ZSBsZWFmLCBidXQgSSBzdG9yZSB0aGUgcG9pbnRlciB0byBpdCBpbiB0aGUNCj4gPiByb290
ICh1bmxpa2UgdGhlIElEUiwgdGhlIHJhZGl4IHRyZWUgZG9lc24ndCBuZWVkIHRvIGFsbG9jYXRl
IGEgbGF5ZXINCj4gPiBmb3IgYSBzaW5nbGUgZW50cnkpLiAgU28gdElEQSB3aW5zIG9uIG1lbW9y
eSBjb25zdW1wdGlvbiBiZXR3ZWVuIDEgYW5kDQo+ID4gNTExIElEcywgYW5kIG5ld0lEQSBpcyBz
bGlnaHRseSBhaGVhZCBiZXR3ZWVuIDUxMiBhbmQgMTAyMyBJRHMuDQo+IA0KPiBUaGlzIHNvdW5k
cyBnb29kLiBJIHRoaW5rIHRoZXJlIG1heSBzdGlsbCBiZSBhIGxvdCBvZiB1c2VycyB0aGF0IG5l
dmVyDQo+IGFsbG9jYXRlIG1vcmUgdGhhbiBhIGhhbmRmdWwgb2YgSURBcywgbWFraW5nIGEgMTI4
IGJ5dGUgYWxsb2NhdGlvbiBzdGlsbA0KPiBzb21ld2hhdCBleGNlc3NpdmUuIE9uZSB0aGluZyBJ
IGNvbnNpZGVyZWQgd2FzIChleGFjdGx5IGFzIGl0J3MgZG9uZSBmb3INCj4gZmlsZSBkZXNjcmlw
dG9yIHRhYmxlcykgdG8gZW1iZWQgYSBzaW5nbGUgd29yZCBpbiB0aGUgc3RydWN0IGlkYSBhbmQN
Cj4gdXNlIHRoYXQgaW5pdGlhbGx5OyBJIGhhdmVuJ3QgbG9va2VkIGNsb3NlbHkgYXQgbmV3SURB
LCBzbyBJIGRvbid0IGtub3cNCj4gaG93IGVhc3kgdGhhdCB3b3VsZCBiZSBvciBpZiBpdHMgd29y
dGggdGhlIGNvbXBsZXhpdHkuDQoNCkhlaCwgSSB3YXMgdGhpbmtpbmcgYWJvdXQgdGhhdCB0b28u
ICBUaGUgcmFkaXggdHJlZSBzdXBwb3J0cyAiZXhjZXB0aW9uYWwgZW50cmllcyIgd2hpY2ggaGF2
ZSB0aGUgYm90dG9tIGJpdCBzZXQuICBPbiBhIDY0LWJpdCBtYWNoaW5lLCB3ZSBjb3VsZCB1c2Ug
NjIgb2YgdGhlIGJpdHMgaW4gdGhlIHJhZGl4IHRyZWUgcm9vdCB0byBzdG9yZSB0aGUgSUQgYml0
bWFwLiAgSSdtIGEgbGl0dGxlIHdhcnkgb2YgdGhlIHBvdGVudGlhbCBjb21wbGV4aXR5LCBidXQg
d2Ugc2hvdWxkIHRyeSBpdCBvdXQuDQoNCkRpZCB5b3UgY29tZSB1cCB3aXRoIGFueSBmdW4gdGVz
dHMgdGhhdCBjb3VsZCBiZSBhZGRlZCB0byB0aGUgdGVzdC1zdWl0ZT8gIEl0IGZlZWxzIGEgbGl0
dGxlIHNsZW5kZXIgcmlnaHQgbm93Lg0K
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-16 20:32 ` Rasmus Villemoes
2016-12-16 21:09 ` Matthew Wilcox
@ 2016-12-17 13:28 ` Matthew Wilcox
2016-12-22 23:46 ` Rasmus Villemoes
2016-12-18 2:42 ` Matthew Wilcox
2 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2016-12-17 13:28 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Tejun Heo, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, Andrew Morton
RnJvbTogTWF0dGhldyBXaWxjb3gNCj4gRnJvbTogUmFzbXVzIFZpbGxlbW9lcyBbbWFpbHRvOmxp
bnV4QHJhc211c3ZpbGxlbW9lcy5ka10NCj4gPiBUaGlzIHNvdW5kcyBnb29kLiBJIHRoaW5rIHRo
ZXJlIG1heSBzdGlsbCBiZSBhIGxvdCBvZiB1c2VycyB0aGF0IG5ldmVyDQo+ID4gYWxsb2NhdGUg
bW9yZSB0aGFuIGEgaGFuZGZ1bCBvZiBJREFzLCBtYWtpbmcgYSAxMjggYnl0ZSBhbGxvY2F0aW9u
IHN0aWxsDQo+ID4gc29tZXdoYXQgZXhjZXNzaXZlLiBPbmUgdGhpbmcgSSBjb25zaWRlcmVkIHdh
cyAoZXhhY3RseSBhcyBpdCdzIGRvbmUgZm9yDQo+ID4gZmlsZSBkZXNjcmlwdG9yIHRhYmxlcykg
dG8gZW1iZWQgYSBzaW5nbGUgd29yZCBpbiB0aGUgc3RydWN0IGlkYSBhbmQNCj4gPiB1c2UgdGhh
dCBpbml0aWFsbHk7IEkgaGF2ZW4ndCBsb29rZWQgY2xvc2VseSBhdCBuZXdJREEsIHNvIEkgZG9u
J3Qga25vdw0KPiA+IGhvdyBlYXN5IHRoYXQgd291bGQgYmUgb3IgaWYgaXRzIHdvcnRoIHRoZSBj
b21wbGV4aXR5Lg0KPiANCj4gSGVoLCBJIHdhcyB0aGlua2luZyBhYm91dCB0aGF0IHRvby4gIFRo
ZSByYWRpeCB0cmVlIHN1cHBvcnRzICJleGNlcHRpb25hbA0KPiBlbnRyaWVzIiB3aGljaCBoYXZl
IHRoZSBib3R0b20gYml0IHNldC4gIE9uIGEgNjQtYml0IG1hY2hpbmUsIHdlIGNvdWxkIHVzZSA2
Mg0KPiBvZiB0aGUgYml0cyBpbiB0aGUgcmFkaXggdHJlZSByb290IHRvIHN0b3JlIHRoZSBJRCBi
aXRtYXAuICBJJ20gYSBsaXR0bGUgd2FyeSBvZiB0aGUNCj4gcG90ZW50aWFsIGNvbXBsZXhpdHks
IGJ1dCB3ZSBzaG91bGQgdHJ5IGl0IG91dC4NCg0KVGVzdCBwYXRjaCBoZXJlOiBodHRwOi8vZ2l0
LmluZnJhZGVhZC5vcmcvdXNlcnMvd2lsbHkvbGludXgtZGF4LmdpdC9zaG9ydGxvZy9yZWZzL2hl
YWRzL2lkci0yMDE2LTEyLTE2DQpJdCBwYXNzZXMgdGhlIHRlc3Qgc3VpdGUgLi4uIHdoaWNoIEkg
YWN0dWFsbHkgaGFkIHRvIGFkanVzdCBiZWNhdXNlIGl0IG5vdyBzdWNjZWVkcyBpbiBjYXNlcyB3
aGVyZSBpdCBoYWRuJ3QgKGFsbG9jYXRpbmcgSUQgMCB3aXRob3V0IHByZWFsbG9jYXRpbmcpLCBh
bmQgaXQgd2lsbCBub3cgZmFpbCBpbiBjYXNlcyB3aGVyZSBpdCBoYWRuJ3QgcHJldmlvdXNseSAo
YXNzdW1pbmcgYSBzaW5nbGUgcHJlYWxsb2NhdGlvbiB3b3VsZCBiZSBlbm91Z2gpLiAgVGhlcmUg
c2hvdWxkbid0IGJlIGFueSBleGFtcGxlcyBvZiB0aGF0IGluIHRoZSBrZXJuZWwgcHJvcGVyOyBp
dCB3YXMgc2ltcGx5IG1lIGJlaW5nIGxhenkgd2hlbiBJIHdyb3RlIHRoZSB0ZXN0IHN1aXRlLg0K
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-16 20:32 ` Rasmus Villemoes
2016-12-16 21:09 ` Matthew Wilcox
2016-12-17 13:28 ` Matthew Wilcox
@ 2016-12-18 2:42 ` Matthew Wilcox
2 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-12-18 2:42 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Tejun Heo, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, Andrew Morton
RnJvbTogTWF0dGhldyBXaWxjb3gNCj4gRnJvbTogTWF0dGhldyBXaWxjb3gNCj4gPiBIZWgsIEkg
d2FzIHRoaW5raW5nIGFib3V0IHRoYXQgdG9vLiAgVGhlIHJhZGl4IHRyZWUgc3VwcG9ydHMgImV4
Y2VwdGlvbmFsDQo+ID4gZW50cmllcyIgd2hpY2ggaGF2ZSB0aGUgYm90dG9tIGJpdCBzZXQuICBP
biBhIDY0LWJpdCBtYWNoaW5lLCB3ZSBjb3VsZCB1c2UNCj4gNjINCj4gPiBvZiB0aGUgYml0cyBp
biB0aGUgcmFkaXggdHJlZSByb290IHRvIHN0b3JlIHRoZSBJRCBiaXRtYXAuICBJJ20gYSBsaXR0
bGUgd2FyeSBvZg0KPiB0aGUNCj4gPiBwb3RlbnRpYWwgY29tcGxleGl0eSwgYnV0IHdlIHNob3Vs
ZCB0cnkgaXQgb3V0Lg0KPiANCj4gVGVzdCBwYXRjaCBoZXJlOiBodHRwOi8vZ2l0LmluZnJhZGVh
ZC5vcmcvdXNlcnMvd2lsbHkvbGludXgtDQo+IGRheC5naXQvc2hvcnRsb2cvcmVmcy9oZWFkcy9p
ZHItMjAxNi0xMi0xNg0KPiBJdCBwYXNzZXMgdGhlIHRlc3Qgc3VpdGUgLi4uIHdoaWNoIEkgYWN0
dWFsbHkgaGFkIHRvIGFkanVzdCBiZWNhdXNlIGl0IG5vdyBzdWNjZWVkcw0KPiBpbiBjYXNlcyB3
aGVyZSBpdCBoYWRuJ3QgKGFsbG9jYXRpbmcgSUQgMCB3aXRob3V0IHByZWFsbG9jYXRpbmcpLCBh
bmQgaXQgd2lsbCBub3cNCj4gZmFpbCBpbiBjYXNlcyB3aGVyZSBpdCBoYWRuJ3QgcHJldmlvdXNs
eSAoYXNzdW1pbmcgYSBzaW5nbGUgcHJlYWxsb2NhdGlvbiB3b3VsZA0KPiBiZSBlbm91Z2gpLiAg
VGhlcmUgc2hvdWxkbid0IGJlIGFueSBleGFtcGxlcyBvZiB0aGF0IGluIHRoZSBrZXJuZWwgcHJv
cGVyOyBpdA0KPiB3YXMgc2ltcGx5IG1lIGJlaW5nIGxhenkgd2hlbiBJIHdyb3RlIHRoZSB0ZXN0
IHN1aXRlLg0KDQpGb3VuZCBhIGJ1ZywgY29tbWl0dGVkIGEgZml4IChhbmQgYW5vdGhlciB0ZXN0
IGNhc2UpLiAgSXQgbm93IG5vIGxvbmdlciByZXR1cm5zIC1FQUdBSU4gaW4gc2l0dWF0aW9ucyB3
aGVyZSBpdCB3b3VsZG4ndCBoYXZlLCBzbyBJJ3ZlIHJldmVydGVkIHRoYXQgYml0IG9mIHRoZSB0
ZXN0IHN1aXRlIGNoYW5nZS4gIFN0aWxsIHN1Y2NlZWRzIHdoZW4gaXQgd291bGRuJ3QgaGF2ZSBi
ZWZvcmUsIGJ1dCBJIGZlZWwgbm8gcHJlc3N1cmUgdG8gY2hhbmdlIHRoYXQgOy0pDQo=
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-17 13:28 ` Matthew Wilcox
@ 2016-12-22 23:46 ` Rasmus Villemoes
2016-12-23 17:03 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: Rasmus Villemoes @ 2016-12-22 23:46 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Tejun Heo, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, Andrew Morton
On Sat, Dec 17 2016, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> From: Matthew Wilcox
>> From: Rasmus Villemoes [mailto:linux@rasmusvillemoes.dk]
>> > This sounds good. I think there may still be a lot of users that never
>> > allocate more than a handful of IDAs, making a 128 byte allocation still
>> > somewhat excessive. One thing I considered was (exactly as it's done for
>> > file descriptor tables) to embed a single word in the struct ida and
>> > use that initially; I haven't looked closely at newIDA, so I don't know
>> > how easy that would be or if its worth the complexity.
>>
>> Heh, I was thinking about that too. The radix tree supports "exceptional
>> entries" which have the bottom bit set. On a 64-bit machine, we could use 62
>> of the bits in the radix tree root to store the ID bitmap. I'm a little wary of the
>> potential complexity, but we should try it out.
>
> Test patch here: http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/idr-2016-12-16
> It passes the test suite ... which I actually had to adjust because it
> now succeeds in cases where it hadn't (allocating ID 0 without
> preallocating), and it will now fail in cases where it hadn't
> previously (assuming a single preallocation would be enough). There
> shouldn't be any examples of that in the kernel proper; it was simply
> me being lazy when I wrote the test suite.
Nice work! A few random comments/questions:
- It does add some complexity, but I think a few comments would make it
more digestable.
- Hm, maybe I'm confused, and I certainly don't understand how the whole
radix tree works. But do you use every leaf node as an exceptional
entry initially, to allocate the first 62 ids from that level? This
code
if ((bit + RADIX_TREE_EXCEPTIONAL_SHIFT) <
BITS_PER_LONG) {
bit += RADIX_TREE_EXCEPTIONAL_SHIFT;
radix_tree_iter_replace(root, &iter, slot,
(void *)((1UL << bit) |
RADIX_TREE_EXCEPTIONAL_ENTRY));
*id = new;
return 0;
}
operates on bit, which is the offset from index*IDA_BITMAP_BITS, and
it seems to create an exceptional entry somewhere down the tree
(which may of course be the root).
But we don't seem to allocate another bit from that exceptional entry
ever unless it happened to sit at index 0; the code
unsigned long tmp = (unsigned long)bitmap;
if (start < BITS_PER_LONG) {
unsigned tbit = find_next_zero_bit(&tmp,
BITS_PER_LONG, start);
if (tbit < BITS_PER_LONG) {
tmp |= 1UL << tbit;
rcu_assign_pointer(*slot, (void *)tmp);
*id = new + tbit -
RADIX_TREE_EXCEPTIONAL_SHIFT;
return 0;
}
}
is only used for small values of start (though it does seem to
account for a non-zero value of new == iter.index * IDA_BITMAP_BITS).
- In the micro-optimization department, I think we should avoid
find_next_zero_bit on a single word, and instead use __ffs. Something
like (assuming we can use bit instead of start)
if (bit + RADIX_TREE_EXCEPTIONAL_SHIFT < BITS_PER_LONG) {
tmp = (~(unsigned long)bitmap) >> (bit + RADIX_TREE_EXCEPTIONAL_SHIFT);
if (tmp) {
tbit = __ffs(tmp) + bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
tmp = (unsigned long)bitmap | (1UL << tbit);
rcu_assign_pointer(*slot, (void *)tmp);
*id = new + tbit - RADIX_TREE_EXCEPTIONAL_SHIFT;
return 0;
}
}
Rasmus
^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFC 00/10] implement alternative and much simpler id allocator
2016-12-22 23:46 ` Rasmus Villemoes
@ 2016-12-23 17:03 ` Matthew Wilcox
0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2016-12-23 17:03 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Tejun Heo, linux-kernel@vger.kernel.org, Lai Jiangshan,
Jens Axboe, Greg Kroah-Hartman, linux-block@vger.kernel.org,
dri-devel@lists.freedesktop.org, Andrew Morton
RnJvbTogUmFzbXVzIFZpbGxlbW9lcyBbbWFpbHRvOmxpbnV4QHJhc211c3ZpbGxlbW9lcy5ka10N
Cj4gTmljZSB3b3JrISBBIGZldyByYW5kb20gY29tbWVudHMvcXVlc3Rpb25zOg0KPiANCj4gLSBJ
dCBkb2VzIGFkZCBzb21lIGNvbXBsZXhpdHksIGJ1dCBJIHRoaW5rIGEgZmV3IGNvbW1lbnRzIHdv
dWxkIG1ha2UgaXQNCj4gICBtb3JlIGRpZ2VzdGFibGUuDQoNCkknbSBvcGVuIHRvIGFkZGluZyBz
b21lIGNvbW1lbnRzIC4uLiBJIG5lZWQgc29tZSB0aW1lIGJldHdlZW4gd3JpdGluZyB0aGUgY29k
ZSBhbmQgd3JpdGluZyB0aGUgY29tbWVudHMgdG8gYmUgc3VyZSB3aGF0IGNvbW1lbnRzIGFyZSB1
c2VmdWwuDQoNCj4gLSBIbSwgbWF5YmUgSSdtIGNvbmZ1c2VkLCBhbmQgSSBjZXJ0YWlubHkgZG9u
J3QgdW5kZXJzdGFuZCBob3cgdGhlIHdob2xlDQo+ICAgcmFkaXggdHJlZSB3b3Jrcy4gQnV0IGRv
IHlvdSB1c2UgZXZlcnkgbGVhZiBub2RlIGFzIGFuIGV4Y2VwdGlvbmFsDQo+ICAgZW50cnkgaW5p
dGlhbGx5LCB0byBhbGxvY2F0ZSB0aGUgZmlyc3QgNjIgaWRzIGZyb20gdGhhdCBsZXZlbD8gVGhp
cw0KPiAgIGNvZGUNCg0KSSBkbyEgIEFuZCB0aGF0IHF1ZXN0aW9uIHRlbGxzIG1lIG9uZSB1c2Vm
dWwgY29tbWVudCB0byBhZGQhDQoNCj4gCWlmICgoYml0ICsgUkFESVhfVFJFRV9FWENFUFRJT05B
TF9TSElGVCkgPA0KPiAJCQkJCUJJVFNfUEVSX0xPTkcpIHsNCj4gCQliaXQgKz0gUkFESVhfVFJF
RV9FWENFUFRJT05BTF9TSElGVDsNCj4gCQlyYWRpeF90cmVlX2l0ZXJfcmVwbGFjZShyb290LCAm
aXRlciwgc2xvdCwNCj4gCQkJCSh2b2lkICopKCgxVUwgPDwgYml0KSB8DQo+IAkJCQlSQURJWF9U
UkVFX0VYQ0VQVElPTkFMX0VOVFJZKSk7DQo+IAkJKmlkID0gbmV3Ow0KPiAJCXJldHVybiAwOw0K
PiAJfQ0KPiANCj4gICAgb3BlcmF0ZXMgb24gYml0LCB3aGljaCBpcyB0aGUgb2Zmc2V0IGZyb20g
aW5kZXgqSURBX0JJVE1BUF9CSVRTLCBhbmQNCj4gICAgaXQgc2VlbXMgdG8gY3JlYXRlIGFuIGV4
Y2VwdGlvbmFsIGVudHJ5IHNvbWV3aGVyZSBkb3duIHRoZSB0cmVlDQo+ICAgICh3aGljaCBtYXkg
b2YgY291cnNlIGJlIHRoZSByb290KS4NCj4gDQo+ICAgIEJ1dCB3ZSBkb24ndCBzZWVtIHRvIGFs
bG9jYXRlIGFub3RoZXIgYml0IGZyb20gdGhhdCBleGNlcHRpb25hbCBlbnRyeQ0KPiAgICBldmVy
IHVubGVzcyBpdCBoYXBwZW5lZCB0byBzaXQgYXQgaW5kZXggMDsgdGhlIGNvZGUNCj4gDQo+IAl1
bnNpZ25lZCBsb25nIHRtcCA9ICh1bnNpZ25lZCBsb25nKWJpdG1hcDsNCj4gCWlmIChzdGFydCA8
IEJJVFNfUEVSX0xPTkcpIHsNCj4gCQl1bnNpZ25lZCB0Yml0ID0gZmluZF9uZXh0X3plcm9fYml0
KCZ0bXAsDQo+IAkJCQkJQklUU19QRVJfTE9ORywgc3RhcnQpOw0KPiAJCWlmICh0Yml0IDwgQklU
U19QRVJfTE9ORykgew0KPiAJCQl0bXAgfD0gMVVMIDw8IHRiaXQ7DQo+IAkJCXJjdV9hc3NpZ25f
cG9pbnRlcigqc2xvdCwgKHZvaWQgKil0bXApOw0KPiAJCQkqaWQgPSBuZXcgKyB0Yml0IC0NCj4g
CQkJCVJBRElYX1RSRUVfRVhDRVBUSU9OQUxfU0hJRlQ7DQo+IAkJCXJldHVybiAwOw0KPiAJCX0N
Cj4gCX0NCj4gDQo+ICAgIGlzIG9ubHkgdXNlZCBmb3Igc21hbGwgdmFsdWVzIG9mIHN0YXJ0ICh0
aG91Z2ggaXQgZG9lcyBzZWVtIHRvDQo+ICAgIGFjY291bnQgZm9yIGEgbm9uLXplcm8gdmFsdWUg
b2YgbmV3ID09IGl0ZXIuaW5kZXggKiBJREFfQklUTUFQX0JJVFMpLg0KDQpBaGguICBZb3UncmUg
cmVhZGluZyB0aGUgY29kZSB3cm9uZywgYW5kIEkgd3JvdGUgdGhlIGNvZGUgd3JvbmdseSB0b28s
IHNvIGl0J3MgY2xlYXJseSBvdmVybHkgY29tcGxleC4gIEkgd2FzIHRlc3Rpbmcgd2l0aCAnc3Rh
cnQnIHNldCB0byAwLCBhbGxvY2F0aW5nIE4gZW50cmllcywgYW5kIHRoZW4gZnJlZWluZyB0aGVt
LiAgSWYgSSdkIHNldCBzdGFydCB0byAxMDI0IGFuZCBhbGxvY2F0ZWQgdHdvIGVudHJpZXMsIEkn
ZCd2ZSBzZWVuIHRoZSBmYWlsdXJlLg0KDQpQbGVhc2Ugc2VlIHRoZSB0b3AgY29tbWl0IGhlcmUg
KCJJbXByb3ZlIElEQSBleGNlcHRpb25hbCBlbnRyeSBoYW5kbGluZyIpOiBodHRwOi8vZ2l0Lmlu
ZnJhZGVhZC5vcmcvdXNlcnMvd2lsbHkvbGludXgtZGF4LmdpdC9zaG9ydGxvZy9yZWZzL2hlYWRz
L2lkci0yMDE2LTEyLTIwDQoNCg0KPiAtIEluIHRoZSBtaWNyby1vcHRpbWl6YXRpb24gZGVwYXJ0
bWVudCwgSSB0aGluayB3ZSBzaG91bGQgYXZvaWQNCj4gICBmaW5kX25leHRfemVyb19iaXQgb24g
YSBzaW5nbGUgd29yZCwgYW5kIGluc3RlYWQgdXNlIF9fZmZzLiBTb21ldGhpbmcNCj4gICBsaWtl
IChhc3N1bWluZyB3ZSBjYW4gdXNlIGJpdCBpbnN0ZWFkIG9mIHN0YXJ0KQ0KPiANCj4gICBpZiAo
Yml0ICsgUkFESVhfVFJFRV9FWENFUFRJT05BTF9TSElGVCA8IEJJVFNfUEVSX0xPTkcpIHsNCj4g
ICAgIHRtcCA9ICh+KHVuc2lnbmVkIGxvbmcpYml0bWFwKSA+PiAoYml0ICsgUkFESVhfVFJFRV9F
WENFUFRJT05BTF9TSElGVCk7DQo+ICAgICBpZiAodG1wKSB7DQo+ICAgICAgIHRiaXQgPSBfX2Zm
cyh0bXApICsgYml0ICsgUkFESVhfVFJFRV9FWENFUFRJT05BTF9TSElGVDsNCj4gICAgICAgdG1w
ID0gKHVuc2lnbmVkIGxvbmcpYml0bWFwIHwgKDFVTCA8PCB0Yml0KTsNCj4gICAgICAgcmN1X2Fz
c2lnbl9wb2ludGVyKCpzbG90LCAodm9pZCAqKXRtcCk7DQo+ICAgICAgICppZCA9IG5ldyArIHRi
aXQgLSBSQURJWF9UUkVFX0VYQ0VQVElPTkFMX1NISUZUOw0KPiAgICAgICByZXR1cm4gMDsNCj4g
ICAgIH0NCj4gICB9DQoNCkknbSBjZXJ0YWlubHkgb3BlbiB0byBtaWNyb29wdGltaXNhdGlvbnMs
IGJ1dCBJJ2xsIGhhdmUgdG8gdGhpbmsgYWJvdXQgdGhpcyBvbmUgZm9yIGEgYml0Lg0K
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-12-23 17:03 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-08 1:22 [RFC 00/10] implement alternative and much simpler id allocator Rasmus Villemoes
2016-12-08 1:23 ` [RFC 06/10] block: use tida as small " Rasmus Villemoes
2016-12-08 3:56 ` Jens Axboe
2016-12-08 11:02 ` Greg Kroah-Hartman
2016-12-09 13:49 ` [RFC 00/10] implement alternative and much simpler " Tejun Heo
2016-12-09 22:01 ` Andrew Morton
2016-12-12 17:09 ` Tejun Heo
2016-12-12 17:35 ` Matthew Wilcox
2016-12-12 18:05 ` Tejun Heo
2016-12-16 19:14 ` Matthew Wilcox
2016-12-16 20:32 ` Rasmus Villemoes
2016-12-16 21:09 ` Matthew Wilcox
2016-12-17 13:28 ` Matthew Wilcox
2016-12-22 23:46 ` Rasmus Villemoes
2016-12-23 17:03 ` Matthew Wilcox
2016-12-18 2:42 ` Matthew Wilcox
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).