* [RFC][PATCH v2 0/3] mm/zsmalloc: increase objects density and reduce memory wastage @ 2016-02-21 13:27 UTC
From: Sergey Senozhatsky
To: Minchan Kim
Cc: Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

	RFC

Huge classes are evil. zsmalloc knows the watermark after which classes
are considered to be ->huge -- every object stored consumes the entire
zspage (which consists of a single order-0 page). zram, however, has its
own statically defined watermark for `bad' compression and stores every
object larger than this watermark as a PAGE_SIZE object, IOW, in a ->huge
class; this results in increased memory consumption and memory wastage.
And zram's `bad' watermark is much lower than zsmalloc's. Apart from
that, `bad' compressions are not so rare: in some of my tests 41% of
writes are `bad' compressions.

This patch set inverts this 'huge class watermark' enforcement: it's
zsmalloc that knows better, not zram. It also reduces the number of huge
classes, which saves some memory. Since we request fewer pages for
objects larger than 3072 bytes, zsmalloc in some cases should behave more
nicely when the system is getting low on free pages.

An object's location is encoded as
<PFN, OBJ_INDEX_BITS | OBJ_ALLOCATED_TAG | HANDLE_PIN_BIT>, so mostly we
have enough bits in OBJ_INDEX_BITS to increase ZS_MAX_ZSPAGE_ORDER and
keep all of the classes. This is not true, however, on PAE/LPAE and
PAGE_SHIFT 16 systems, so we need to preserve the existing
ZS_MAX_ZSPAGE_ORDER 2 limit there.

Please see patch 0003 for some numbers.

Thanks to Joonsoo Kim for valuable questions and opinions.

v2:
-- keep ZS_MAX_PAGES_PER_ZSPAGE a power of two (Joonsoo)
-- satisfy the ZS_MIN_ALLOC_SIZE alignment requirement
-- do not change ZS_MAX_PAGES_PER_ZSPAGE on PAE/LPAE and on
   PAGE_SHIFT 16 systems (Joonsoo)

Sergey Senozhatsky (3):
  mm/zsmalloc: introduce zs_get_huge_class_size_watermark()
  zram: use zs_get_huge_class_size_watermark()
  mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE

 drivers/block/zram/zram_drv.c |  2 +-
 drivers/block/zram/zram_drv.h |  6 ------
 include/linux/zsmalloc.h      |  2 ++
 mm/zsmalloc.c                 | 43 ++++++++++++++++++++++++++++++++++++-------
 4 files changed, 39 insertions(+), 14 deletions(-)

-- 
2.7.1
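A quick sanity check of the bit budget described in the cover letter. The sketch
below is not from the patch set; it assumes the object encoding above with a single
tag bit carved out of the index portion (HANDLE_PIN_BIT living in the handle itself
in this reading) and ZS_MIN_ALLOC_SIZE = max(32, (pages_per_zspage << PAGE_SHIFT) >>
OBJ_INDEX_BITS), which is how mm/zsmalloc.c of this period reads; the
MAX_PHYSMEM_BITS = 36 value for 32-bit PAE is taken from patch 0003. Treat the exact
constants as assumptions.

/*
 * Sketch: recompute zsmalloc's object-index bit budget for two configs
 * and see what raising ZS_MAX_ZSPAGE_ORDER does to ZS_MIN_ALLOC_SIZE.
 */
#include <stdio.h>

static void bit_budget(const char *cfg, long bits_per_long,
		       long max_physmem_bits, long page_shift, long order)
{
	long pfn_bits = max_physmem_bits - page_shift;
	long obj_index_bits = bits_per_long - pfn_bits - 1;	/* 1 tag bit */
	long min_alloc = ((1L << order) << page_shift) >> obj_index_bits;

	if (min_alloc < 32)
		min_alloc = 32;		/* ZS_MIN_ALLOC_SIZE floor */
	printf("%-8s order %ld: OBJ_INDEX_BITS %ld, ZS_MIN_ALLOC_SIZE %ld\n",
	       cfg, order, obj_index_bits, min_alloc);
}

int main(void)
{
	bit_budget("x86_64",  64, 64, 12, 2);	/* min alloc stays at 32    */
	bit_budget("x86_64",  64, 64, 12, 4);	/* still 32: order 4 is OK  */
	bit_budget("x86 PAE", 32, 36, 12, 2);	/* only 7 index bits: 128   */
	bit_budget("x86 PAE", 32, 36, 12, 4);	/* would jump to 512        */
	return 0;
}

With only 7 index bits on PAE, going to order 4 would push the minimum allocation
size from 128 to 512 bytes and drop the small classes, which is why the series keeps
ZS_MAX_ZSPAGE_ORDER 2 there.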
* [RFC][PATCH v2 1/3] mm/zsmalloc: introduce zs_get_huge_class_size_watermark() @ 2016-02-21 13:27 UTC
From: Sergey Senozhatsky
To: Minchan Kim
Cc: Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>

zsmalloc knows the watermark after which classes are considered to be
->huge -- every object stored consumes the entire zspage (which consists
of a single order-0 page). On an x86_64, PAGE_SHIFT 12 box, the first
non-huge class size is 3264, so for sizes of 3264 bytes and below objects
share page(-s) and thus minimize memory wastage.

zram, however, has its own statically defined watermark for `bad'
compression, "3 * PAGE_SIZE / 4 = 3072", and stores every object larger
than this watermark (3072) as a PAGE_SIZE object, IOW, in a ->huge class;
this results in increased memory consumption and memory wastage. (With a
small exception: the 3264 bytes class. zs_malloc() adds ZS_HANDLE_SIZE to
the object's size, so some objects can pass 3072 bytes and
get_size_class_index(size) will return the 3264 bytes size class.)

Introduce a zs_get_huge_class_size_watermark() function which tells the
size of the first non-huge class; zram now can store objects in ->huge
classes only when those objects have sizes greater than
huge_class_size_watermark.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/zsmalloc.h |  2 ++
 mm/zsmalloc.c            | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 34eb160..45dcb51 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -55,4 +55,6 @@ unsigned long zs_get_total_pages(struct zs_pool *pool);
 unsigned long zs_compact(struct zs_pool *pool);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
+
+int zs_get_huge_class_size_watermark(void);
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 43e4cbc..e7f10bd 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -188,6 +188,11 @@ static struct dentry *zs_stat_root;
 static int zs_size_classes;
 
 /*
+ * All classes above this class_size are huge classes
+ */
+static int huge_class_size_watermark;
+
+/*
  * We assign a page to ZS_ALMOST_EMPTY fullness group when:
  * n <= N / f, where
  * n = number of allocated objects
@@ -1241,6 +1246,12 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_get_total_pages);
 
+int zs_get_huge_class_size_watermark(void)
+{
+	return huge_class_size_watermark;
+}
+EXPORT_SYMBOL_GPL(zs_get_huge_class_size_watermark);
+
 /**
  * zs_map_object - get address of allocated object from handle.
  * @pool: pool from which the object was allocated
@@ -1942,10 +1953,13 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
 		if (pages_per_zspage == 1 &&
 			get_maxobj_per_zspage(size, pages_per_zspage) == 1)
 			class->huge = true;
+
 		spin_lock_init(&class->lock);
 		pool->size_class[i] = class;
 
 		prev_class = class;
+		if (!class->huge && !huge_class_size_watermark)
+			huge_class_size_watermark = size - ZS_HANDLE_SIZE;
 	}
 
 	pool->flags = flags;
-- 
2.7.1
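A worked example of the `size - ZS_HANDLE_SIZE` line above: the exported watermark
is expressed in payload bytes, because zs_malloc() adds room for the handle to
whatever size the caller passes. The numbers below assume x86_64 with PAGE_SHIFT 12,
where the first non-huge class is 3264 bytes and ZS_HANDLE_SIZE is
sizeof(unsigned long) == 8, as stated in the commit message; this is a sketch, not
the kernel code.

#include <stdio.h>

int main(void)
{
	const unsigned int zs_handle_size = 8;	/* sizeof(unsigned long) on x86_64 */
	const unsigned int first_non_huge_class = 3264;
	/* what zs_get_huge_class_size_watermark() would report: the largest
	 * payload that still fits a non-huge class once the handle is added */
	const unsigned int watermark = first_non_huge_class - zs_handle_size;
	unsigned int clen;

	for (clen = 3254; clen <= 3258; clen++)
		printf("compressed size %u -> %s (%u bytes incl. handle)\n",
		       clen,
		       clen <= watermark ? "3264-byte class" : "huge (PAGE_SIZE)",
		       clen + zs_handle_size);
	return 0;
}

A 3256-byte buffer plus the 8-byte handle still fits the 3264-byte class; one byte
more and the next class up is already a huge one, so the object ends up consuming a
whole page.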
* [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-21 13:27 [RFC][PATCH v2 0/3] mm/zsmalloc: increase objects density and reduce memory wastage Sergey Senozhatsky 2016-02-21 13:27 ` [RFC][PATCH v2 1/3] mm/zsmalloc: introduce zs_get_huge_class_size_watermark() Sergey Senozhatsky @ 2016-02-21 13:27 ` Sergey Senozhatsky 2016-02-22 0:04 ` Minchan Kim 2016-02-21 13:27 ` [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE Sergey Senozhatsky 2 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-21 13:27 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> zram should stop enforcing its own 'bad' object size watermark, and start using zs_get_huge_class_size_watermark(). zsmalloc really knows better. Drop `max_zpage_size' and use zs_get_huge_class_size_watermark() instead. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- drivers/block/zram/zram_drv.c | 2 +- drivers/block/zram/zram_drv.h | 6 ------ 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 46055db..2621564 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -714,7 +714,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, goto out; } src = zstrm->buffer; - if (unlikely(clen > max_zpage_size)) { + if (unlikely(clen > zs_get_huge_class_size_watermark())) { clen = PAGE_SIZE; if (is_partial_io(bvec)) src = uncmem; diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 8e92339..8879161 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -23,12 +23,6 @@ /*-- Configurable parameters */ /* - * Pages that compress to size greater than this are stored - * uncompressed in memory. - */ -static const size_t max_zpage_size = PAGE_SIZE / 4 * 3; - -/* * NOTE: max_zpage_size must be less than or equal to: * ZS_MAX_ALLOC_SIZE. Otherwise, zs_malloc() would * always return failure. -- 2.7.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-21 13:27 ` [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() Sergey Senozhatsky @ 2016-02-22 0:04 ` Minchan Kim 2016-02-22 0:40 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-22 0:04 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky On Sun, Feb 21, 2016 at 10:27:53PM +0900, Sergey Senozhatsky wrote: > From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> > > zram should stop enforcing its own 'bad' object size watermark, > and start using zs_get_huge_class_size_watermark(). zsmalloc > really knows better. > > Drop `max_zpage_size' and use zs_get_huge_class_size_watermark() > instead. max_zpage_size was there since zram's grandpa(ie, ramzswap). AFAIR, at that time, it works to forward incompressible (e.g, PAGE_SIZE/2) page to backing swap if it presents. If it doesn't have any backing swap and it's incompressbile (e.g, PAGE_SIZE*3/4), it stores it as uncompressed page to avoid *decompress* overhead later. And Nitin want to make it as tunable parameter. I agree the approach because I don't want to make coupling between zram and allocator as far as possible. If huge class is pain, it's allocator problem, not zram stuff. I think we should try to remove such problem in zsmalloc layer, firstly. Having said that, I agree your claim that uncompressible pages are pain. I want to handle the problem as multiple-swap apparoach. Now, zram is very popular and I expect we will use multiple swap(i.e., zram swap + eMMC swap) shortly. For that case, we could forward uncompressible page to the eMMC swap with simple tweaking of swap subsystem if zram returns error once it found it's incompressible page. For that, we should introduce new knob in zram layer like Nitin did and make it configurable so we could solve the problem of single zram-swap system as well as multiple swap system. > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > --- > drivers/block/zram/zram_drv.c | 2 +- > drivers/block/zram/zram_drv.h | 6 ------ > 2 files changed, 1 insertion(+), 7 deletions(-) > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > index 46055db..2621564 100644 > --- a/drivers/block/zram/zram_drv.c > +++ b/drivers/block/zram/zram_drv.c > @@ -714,7 +714,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, > goto out; > } > src = zstrm->buffer; > - if (unlikely(clen > max_zpage_size)) { > + if (unlikely(clen > zs_get_huge_class_size_watermark())) { > clen = PAGE_SIZE; > if (is_partial_io(bvec)) > src = uncmem; > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h > index 8e92339..8879161 100644 > --- a/drivers/block/zram/zram_drv.h > +++ b/drivers/block/zram/zram_drv.h > @@ -23,12 +23,6 @@ > /*-- Configurable parameters */ > > /* > - * Pages that compress to size greater than this are stored > - * uncompressed in memory. > - */ > -static const size_t max_zpage_size = PAGE_SIZE / 4 * 3; > - > -/* > * NOTE: max_zpage_size must be less than or equal to: > * ZS_MAX_ALLOC_SIZE. Otherwise, zs_malloc() would > * always return failure. > -- > 2.7.1 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 0:04 ` Minchan Kim @ 2016-02-22 0:40 ` Sergey Senozhatsky 2016-02-22 1:27 ` Minchan Kim 0 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 0:40 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky On (02/22/16 09:04), Minchan Kim wrote: [..] > max_zpage_size was there since zram's grandpa(ie, ramzswap). > AFAIR, at that time, it works to forward incompressible > (e.g, PAGE_SIZE/2) page to backing swap if it presents. > If it doesn't have any backing swap and it's incompressbile > (e.g, PAGE_SIZE*3/4), it stores it as uncompressed page > to avoid *decompress* overhead later. "PAGE_SIZE * 3 / 4" introduces a bigger memory overhead than decompression of 3K bytes later. > And Nitin want to make it as tunable parameter. I agree the > approach because I don't want to make coupling between zram > and allocator as far as possible. > > If huge class is pain they are. > it's allocator problem, not zram stuff. the allocator's problems start at the point where zram begins to have opinion on what should be stored as ->huge object and what should not. it's not up to zram to enforce this. > I think we should try to remove such problem in zsmalloc layer, > firstly. zram asks to store a PAGE_SIZE sized object, what zsmalloc can possible do about this? > Having said that, I agree your claim that uncompressible pages > are pain. I want to handle the problem as multiple-swap apparoach. zram is not just for swapping. as simple as that. and enforcing a multi-swap approach on folks who use zram for swap doesn't look right to me. > For that, we should introduce new knob in zram layer like Nitin > did and make it configurable so we could solve the problem of > single zram-swap system as well as multiple swap system. a 'bad compression' watermark knob? isn't it an absolutely low level thing no one ever should see? -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 0:40 ` Sergey Senozhatsky @ 2016-02-22 1:27 ` Minchan Kim 2016-02-22 1:59 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-22 1:27 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Mon, Feb 22, 2016 at 09:40:47AM +0900, Sergey Senozhatsky wrote: > On (02/22/16 09:04), Minchan Kim wrote: > [..] > > max_zpage_size was there since zram's grandpa(ie, ramzswap). > > AFAIR, at that time, it works to forward incompressible > > (e.g, PAGE_SIZE/2) page to backing swap if it presents. > > If it doesn't have any backing swap and it's incompressbile > > (e.g, PAGE_SIZE*3/4), it stores it as uncompressed page > > to avoid *decompress* overhead later. > > "PAGE_SIZE * 3 / 4" introduces a bigger memory overhead than > decompression of 3K bytes later. > > > And Nitin want to make it as tunable parameter. I agree the > > approach because I don't want to make coupling between zram > > and allocator as far as possible. > > > > If huge class is pain > > they are. > > > it's allocator problem, not zram stuff. > > the allocator's problems start at the point where zram begins to have > opinion on what should be stored as ->huge object and what should not. > it's not up to zram to enforce this. > > > > I think we should try to remove such problem in zsmalloc layer, > > firstly. > > zram asks to store a PAGE_SIZE sized object, what zsmalloc can > possible do about this? zsmalloc can increase ZS_MAX_ZSPAGE_ORDER or can save metadata in the extra space. In fact, I tried interlink approach long time ago. For example, class-A -> class-B A = x, B = (4096 - y) >= x The problem was class->B zspage consumes memory although there is no object in the zspage because class-A object in the extra space of class-B pin the class-B zspage. I prefer your ZS_MAX_ZSPAGE_ORDER increaing approach but as I told in that thread, we should prepare dynamic creating of sub-page in zspage. > > > > Having said that, I agree your claim that uncompressible pages > > are pain. I want to handle the problem as multiple-swap apparoach. > > zram is not just for swapping. as simple as that. Yes, I mean if we have backing storage, we could mitigate the problem like the mentioned approach. Otherwise, we should solve it in allocator itself and you suggested the idea and I commented first step. What's the problem, now? > > > and enforcing a multi-swap approach on folks who use zram for swap > doesn't look right to me. Ditto. > > > > For that, we should introduce new knob in zram layer like Nitin > > did and make it configurable so we could solve the problem of > > single zram-swap system as well as multiple swap system. > > a 'bad compression' watermark knob? isn't it an absolutely low level > thing no one ever should see? It's a knob to determine that how to handle incompressible page in zram layer. For example, admin can tune it to 2048. It means if we have backing store and compressed ratio is under 50%, admin want to pass the page into swap storage. If the system is no backed store, it means admin want to avoid decompress overhead if the ratio is smaller. I don't think it's a low level thing. > > -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 1:27 ` Minchan Kim @ 2016-02-22 1:59 ` Sergey Senozhatsky 2016-02-22 2:05 ` Sergey Senozhatsky 2016-02-22 2:57 ` Minchan Kim 0 siblings, 2 replies; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 1:59 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 10:27), Minchan Kim wrote: [..] > > zram asks to store a PAGE_SIZE sized object, what zsmalloc can > > possible do about this? > > zsmalloc can increase ZS_MAX_ZSPAGE_ORDER or can save metadata in > the extra space. In fact, I tried interlink approach long time ago. > For example, class-A -> class-B > > A = x, B = (4096 - y) >= x > > The problem was class->B zspage consumes memory although there is > no object in the zspage because class-A object in the extra space > of class-B pin the class-B zspage. I thought about it too -- utilizing 'unused space' to store there smaller objects. and I think it potentially has more problems. compaction (and everything) seem to be much simpler when we have only objects of size-X in class_size X. > I prefer your ZS_MAX_ZSPAGE_ORDER increaing approach but as I told > in that thread, we should prepare dynamic creating of sub-page > in zspage. I agree that in general dynamic class page allocation sounds interesting enough. > > > Having said that, I agree your claim that uncompressible pages > > > are pain. I want to handle the problem as multiple-swap apparoach. > > > > zram is not just for swapping. as simple as that. > > Yes, I mean if we have backing storage, we could mitigate the problem > like the mentioned approach. Otherwise, we should solve it in allocator > itself and you suggested the idea and I commented first step. > What's the problem, now? well, I didn't say I have problems. so you want a backing device that will keep only 'bad compression' objects and use zsmalloc to keep there only 'good compression' objects? IOW, no huge classes in zsmalloc at all? well, that can work out. it's a bit strange though that to solve zram-zsmalloc issues we would ask someone to create a additional device. it looks (at least for now) that we can address those issues in zram-zsmalloc entirely; w/o user intervention or a 3rd party device. > > > For that, we should introduce new knob in zram layer like Nitin > > > did and make it configurable so we could solve the problem of > > > single zram-swap system as well as multiple swap system. > > > > a 'bad compression' watermark knob? isn't it an absolutely low level > > thing no one ever should see? > > It's a knob to determine that how to handle incompressible page > in zram layer. For example, admin can tune it to 2048. It means > if we have backing store and compressed ratio is under 50%, > admin want to pass the page into swap storage. If the system > is no backed store, it means admin want to avoid decompress > overhead if the ratio is smaller. I see your point. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 1:59 ` Sergey Senozhatsky @ 2016-02-22 2:05 ` Sergey Senozhatsky 2016-02-22 2:57 ` Minchan Kim 1 sibling, 0 replies; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 2:05 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Minchan Kim, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 10:59), Sergey Senozhatsky wrote: [..] > > > > Having said that, I agree your claim that uncompressible pages > > > > are pain. I want to handle the problem as multiple-swap apparoach. > > > > > > zram is not just for swapping. as simple as that. > > > > Yes, I mean if we have backing storage, we could mitigate the problem > > like the mentioned approach. Otherwise, we should solve it in allocator > > itself and you suggested the idea and I commented first step. > > What's the problem, now? > > well, I didn't say I have problems. > so you want a backing device that will keep only 'bad compression' > objects and use zsmalloc to keep there only 'good compression' objects? > IOW, no huge classes in zsmalloc at all? hm, in the worst case we can have _for example_ 80+% of writes to be 'bad compression'. that turns zsmalloc into a 3rd wheel, and makes it almost unneeded. hm, may be it's better for now to fix zsmalloc-zram pair. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 1:59 ` Sergey Senozhatsky 2016-02-22 2:05 ` Sergey Senozhatsky @ 2016-02-22 2:57 ` Minchan Kim 2016-02-22 3:54 ` Sergey Senozhatsky 1 sibling, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-22 2:57 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Mon, Feb 22, 2016 at 10:59:12AM +0900, Sergey Senozhatsky wrote: > On (02/22/16 10:27), Minchan Kim wrote: > [..] > > > zram asks to store a PAGE_SIZE sized object, what zsmalloc can > > > possible do about this? > > > > zsmalloc can increase ZS_MAX_ZSPAGE_ORDER or can save metadata in > > the extra space. In fact, I tried interlink approach long time ago. > > For example, class-A -> class-B > > > > A = x, B = (4096 - y) >= x > > > > The problem was class->B zspage consumes memory although there is > > no object in the zspage because class-A object in the extra space > > of class-B pin the class-B zspage. > > I thought about it too -- utilizing 'unused space' to store there > smaller objects. and I think it potentially has more problems. > compaction (and everything) seem to be much simpler when we have only > objects of size-X in class_size X. > > > I prefer your ZS_MAX_ZSPAGE_ORDER increaing approach but as I told > > in that thread, we should prepare dynamic creating of sub-page > > in zspage. > > I agree that in general dynamic class page allocation sounds > interesting enough. > > > > > Having said that, I agree your claim that uncompressible pages > > > > are pain. I want to handle the problem as multiple-swap apparoach. > > > > > > zram is not just for swapping. as simple as that. > > > > Yes, I mean if we have backing storage, we could mitigate the problem > > like the mentioned approach. Otherwise, we should solve it in allocator > > itself and you suggested the idea and I commented first step. > > What's the problem, now? > > well, I didn't say I have problems. > so you want a backing device that will keep only 'bad compression' > objects and use zsmalloc to keep there only 'good compression' objects? > IOW, no huge classes in zsmalloc at all? well, that can work out. it's > a bit strange though that to solve zram-zsmalloc issues we would ask > someone to create a additional device. it looks (at least for now) that > we can address those issues in zram-zsmalloc entirely; w/o user > intervention or a 3rd party device. Agree. That's what I want. zram shouldn't be aware of allocator's internal implementation. IOW, zsmalloc should handle it without exposing any internal limitation. Backing device issue is orthogonal but what I said about thing was it could solve the issue too without exposing zsmalloc's limitation to the zram. Let's summary my points in here. Let's make zsmalloc smarter to reduce wasted space. One of option is dynamic page creation which I agreed. Before the feature, we should test how memory footprint is bigger without the feature if we increase ZS_MAX_ZSPAGE_ORDER. If it's not big, we could go with your patch easily without adding more complex stuff(i.e, dynamic page creation). Please, check max_used_pages rather than mem_used_total for seeing memory footprint at the some moment and test very fragmented scenario (creating files and free part of files) rather than just full coping. If memory footprint is high, we can decide to go dynamic page creation. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 2:57 ` Minchan Kim @ 2016-02-22 3:54 ` Sergey Senozhatsky 2016-02-22 4:54 ` Minchan Kim 0 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 3:54 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 11:57), Minchan Kim wrote: [..] > > > Yes, I mean if we have backing storage, we could mitigate the problem > > > like the mentioned approach. Otherwise, we should solve it in allocator > > > itself and you suggested the idea and I commented first step. > > > What's the problem, now? > > > > well, I didn't say I have problems. > > so you want a backing device that will keep only 'bad compression' > > objects and use zsmalloc to keep there only 'good compression' objects? > > IOW, no huge classes in zsmalloc at all? well, that can work out. it's > > a bit strange though that to solve zram-zsmalloc issues we would ask > > someone to create a additional device. it looks (at least for now) that > > we can address those issues in zram-zsmalloc entirely; w/o user > > intervention or a 3rd party device. > > Agree. That's what I want. zram shouldn't be aware of allocator's > internal implementation. IOW, zsmalloc should handle it without > exposing any internal limitation. well, at the same time zram must not dictate what to do. zram simply spoils zsmalloc; it does not offer guaranteed good compression, and it does not let zsmalloc to do it's job. zram has only excuses to be the way it is. the existing zram->zsmalloc dependency looks worse than zsmalloc->zram to me. > Backing device issue is orthogonal but what I said about thing > was it could solve the issue too without exposing zsmalloc's > limitation to the zram. well, backing device would not reduce the amount of pages we request. and that's the priority issue, especially if we are talking about embedded system with a low free pages capability. we would just move huge objects from zsmalloc to backing device. other than that we would still request 1000 (for example) pages to store 1000 objects. it's zsmalloc's "page sharing" that permits us to request less than 1000 pages to store 1000 objects. so yes, I agree, increasing ZS_MAX_ZSPAGE_ORDER and do more tests is the step #1 to take. > Let's summary my points in here. > > Let's make zsmalloc smarter to reduce wasted space. One of option is > dynamic page creation which I agreed. > > Before the feature, we should test how memory footprint is bigger > without the feature if we increase ZS_MAX_ZSPAGE_ORDER. > If it's not big, we could go with your patch easily without adding > more complex stuff(i.e, dynamic page creation). yes, agree. alloc_zspage()/init_zspage() and friends must be the last thing to touch. only if increased ZS_MAX_ZSPAGE_ORDER will turn out not to be good enough. > Please, check max_used_pages rather than mem_used_total for seeing > memory footprint at the some moment and test very fragmented scenario > (creating files and free part of files) rather than just full coping. sure, more tests will follow. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
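To attach numbers to the "page sharing" point above, take the 3408-byte class from
patch 3/3: per the class stats posted in this thread it packs 6 objects into a
5-page zspage once ZS_MAX_ZSPAGE_ORDER is raised. A sketch, assuming those figures:

#include <stdio.h>

int main(void)
{
	const unsigned int nr_objs = 1000;		/* 3408-byte objects   */
	const unsigned int objs_per_zspage = 6;		/* per 5-page zspage   */
	const unsigned int pages_per_zspage = 5;
	unsigned int zspages = (nr_objs + objs_per_zspage - 1) / objs_per_zspage;

	printf("as huge objects : %u order-0 pages\n", nr_objs);
	printf("as shared zspage: %u order-0 pages (%u zspages)\n",
	       zspages * pages_per_zspage, zspages);
	return 0;
}

Storing 1000 such objects takes roughly 835 order-0 pages instead of 1000, which is
the kind of saving the watermark change is after.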
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 3:54 ` Sergey Senozhatsky @ 2016-02-22 4:54 ` Minchan Kim 2016-02-22 5:05 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-22 4:54 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Mon, Feb 22, 2016 at 12:54:48PM +0900, Sergey Senozhatsky wrote: > On (02/22/16 11:57), Minchan Kim wrote: > [..] > > > > Yes, I mean if we have backing storage, we could mitigate the problem > > > > like the mentioned approach. Otherwise, we should solve it in allocator > > > > itself and you suggested the idea and I commented first step. > > > > What's the problem, now? > > > > > > well, I didn't say I have problems. > > > so you want a backing device that will keep only 'bad compression' > > > objects and use zsmalloc to keep there only 'good compression' objects? > > > IOW, no huge classes in zsmalloc at all? well, that can work out. it's > > > a bit strange though that to solve zram-zsmalloc issues we would ask > > > someone to create a additional device. it looks (at least for now) that > > > we can address those issues in zram-zsmalloc entirely; w/o user > > > intervention or a 3rd party device. > > > > Agree. That's what I want. zram shouldn't be aware of allocator's > > internal implementation. IOW, zsmalloc should handle it without > > exposing any internal limitation. > > well, at the same time zram must not dictate what to do. zram simply spoils > zsmalloc; it does not offer guaranteed good compression, and it does not let > zsmalloc to do it's job. zram has only excuses to be the way it is. > the existing zram->zsmalloc dependency looks worse than zsmalloc->zram to me. I don't get it why you think it's zram->zsmalloc dependency. I already explained. Here it goes, again. Long time ago, zram(i.e, ramzswap) can fallback incompressible page to backed device if it presents and the size was PAGE_SIZE / 2. IOW, if compress ratio is bad than 50%, zram passes the page to backed storage to make memory efficiency. If zram doesn't have backed storage and compress ratio under 25%(ie, short of memory saving) it store pages as uncompressible for avoiding additional *decompress* overhead. Of course, it's arguable whether memory efficiency VS. CPU consumption so we should handle it as another topic. What I want to say in here is it's not dependency between zram and zsmalloc but it was a zram policy for a long time. If it's not good, we can fix it. > > > Backing device issue is orthogonal but what I said about thing > > was it could solve the issue too without exposing zsmalloc's > > limitation to the zram. > > well, backing device would not reduce the amount of pages we request. > and that's the priority issue, especially if we are talking about > embedded system with a low free pages capability. we would just move huge > objects from zsmalloc to backing device. other than that we would still > request 1000 (for example) pages to store 1000 objects. it's zsmalloc's > "page sharing" that permits us to request less than 1000 pages to store > 1000 objects. > > so yes, I agree, increasing ZS_MAX_ZSPAGE_ORDER and do more tests is > the step #1 to take. > > > Let's summary my points in here. > > > > Let's make zsmalloc smarter to reduce wasted space. One of option is > > dynamic page creation which I agreed. 
> > > > Before the feature, we should test how memory footprint is bigger > > without the feature if we increase ZS_MAX_ZSPAGE_ORDER. > > If it's not big, we could go with your patch easily without adding > > more complex stuff(i.e, dynamic page creation). > > yes, agree. alloc_zspage()/init_zspage() and friends must be the last > thing to touch. only if increased ZS_MAX_ZSPAGE_ORDER will turn out not > to be good enough. > > > Please, check max_used_pages rather than mem_used_total for seeing > > memory footprint at the some moment and test very fragmented scenario > > (creating files and free part of files) rather than just full coping. > > sure, more tests will follow. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() 2016-02-22 4:54 ` Minchan Kim @ 2016-02-22 5:05 ` Sergey Senozhatsky 0 siblings, 0 replies; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 5:05 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 13:54), Minchan Kim wrote: [..] > > well, at the same time zram must not dictate what to do. zram simply spoils > > zsmalloc; it does not offer guaranteed good compression, and it does not let > > zsmalloc to do it's job. zram has only excuses to be the way it is. > > the existing zram->zsmalloc dependency looks worse than zsmalloc->zram to me. > > I don't get it why you think it's zram->zsmalloc dependency. clearly 'dependency' was simply a wrong word to use, 'enforcement' or 'policy' are better choices here. but you got my point. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-21 13:27 [RFC][PATCH v2 0/3] mm/zsmalloc: increase objects density and reduce memory wastage Sergey Senozhatsky 2016-02-21 13:27 ` [RFC][PATCH v2 1/3] mm/zsmalloc: introduce zs_get_huge_class_size_watermark() Sergey Senozhatsky 2016-02-21 13:27 ` [RFC][PATCH v2 2/3] zram: use zs_get_huge_class_size_watermark() Sergey Senozhatsky @ 2016-02-21 13:27 ` Sergey Senozhatsky 2016-02-22 0:25 ` Minchan Kim 2 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-21 13:27 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> The existing limit of max 4 pages per zspage sets a tight limit on ->huge classes, which results in increased memory consumption. On x86_64, PAGE_SHIFT 12, ->huge class_size range is 3280-4096. The problem with ->huge classes is that in most of the cases they waste memory, because each ->huge zspage has only one order-0 page and can store only one object. For instance, we store 3408 bytes objects as PAGE_SIZE objects, while in fact each of those objects has 4096 - 3408 = 688 bytes of spare space, so we need to store 5 objects to have enough spare space to save the 6th objects with out requesting a new order-0 page. In general, turning a ->huge class into a normal will save PAGE_SIZE bytes every time "PAGE_SIZE/(PAGE_SIZE - CLASS_SIZE)"-th object is stored. The maximum number of order-0 pages in zspages is limited by ZS_MAX_ZSPAGE_ORDER (zspage can consist of up to 1<<ZS_MAX_ZSPAGE_ORDER pages). Increasing ZS_MAX_ZSPAGE_ORDER permits us to have less ->huge classes, because some of them now can form a 'normal' zspage consisting of several order-0 pages. We can't increase ZS_MAX_ZSPAGE_ORDER on every platform: 32-bit PAE/LPAE and PAGE_SHIFT 16 kernels don't have enough bits left in OBJ_INDEX_BITS. Other than that, we can increase ZS_MAX_ZSPAGE_ORDER to 4. This will change the ->huge classes range (on PAGE_SHIFT 12 systems) from 3280-4096 to 3856-4096. This will increase density and reduce memory wastage/usage. TESTS (ZS_MAX_ZSPAGE_ORDER 4) ============================= showing only bottom of /sys/kernel/debug/zsmalloc/zram0/classes class size almost_full almost_empty obj_allocated obj_used pages_used ======================================================================== 1) compile glibc -j8 BASE ... 168 2720 0 14 4500 4479 3000 190 3072 0 15 3016 2986 2262 202 3264 2 2 70 61 56 254 4096 0 0 40213 40213 40213 Total 63 247 155676 153957 74955 PATCHED ... 191 3088 1 1 130 116 100 192 3104 1 1 119 103 91 194 3136 1 1 260 254 200 197 3184 0 3 522 503 406 199 3216 2 3 350 320 275 200 3232 0 2 114 93 90 202 3264 2 2 210 202 168 206 3328 1 5 464 418 377 207 3344 1 2 121 108 99 208 3360 0 3 153 119 126 211 3408 2 4 360 341 300 212 3424 1 2 133 112 112 214 3456 0 2 182 170 154 217 3504 0 4 217 200 186 219 3536 0 3 135 108 117 222 3584 0 3 144 132 126 223 3600 1 1 51 35 45 225 3632 1 2 108 99 96 228 3680 0 2 140 129 126 230 3712 0 3 110 94 100 232 3744 1 2 132 113 121 234 3776 1 2 143 128 132 235 3792 0 3 112 81 104 236 3808 0 2 75 62 70 238 3840 0 2 112 91 105 254 4096 0 0 36112 36112 36112 Total 127 228 158342 154050 73884 == Consumed 74955-73884 = 1071 less order-0 pages. 2) copy linux-next directory (with object files, 2.5G) BASE ... 
190 3072 0 1 9092 9091 6819 202 3264 0 0 240 240 192 254 4096 0 0 360304 360304 360304 Total 34 83 687545 686443 480962 PATCHED ... 191 3088 0 1 455 449 350 192 3104 1 0 425 421 325 194 3136 1 0 936 935 720 197 3184 0 1 1539 1532 1197 199 3216 0 1 1148 1142 902 200 3232 0 1 570 560 450 202 3264 1 0 1245 1244 996 206 3328 0 1 2896 2887 2353 207 3344 0 0 825 825 675 208 3360 0 1 850 845 700 211 3408 0 1 2694 2692 2245 212 3424 0 1 931 922 784 214 3456 1 0 1924 1923 1628 217 3504 0 0 2968 2968 2544 219 3536 0 1 2220 2209 1924 222 3584 0 1 3120 3114 2730 223 3600 0 1 1088 1081 960 225 3632 0 1 2133 2130 1896 228 3680 0 1 3340 3334 3006 230 3712 0 1 2035 2025 1850 232 3744 0 1 1980 1972 1815 234 3776 0 1 2015 2009 1860 235 3792 0 1 1022 1013 949 236 3808 1 0 960 958 896 238 3840 0 0 1968 1968 1845 254 4096 0 0 319370 319370 319370 Total 71 137 687877 684436 471265 Consumed 480962 - 471265 = 9697 less order-0 pages. 3) Run a test script (storing text files of various sizes, binary files of various sizes) cat /sys/block/zram0/mm_stat column 3 is zs_get_total_pages() << PAGE_SHIFT BASE 614477824 425627436 436678656 0 436678656 539608 0 1 614526976 425709397 436813824 0 436813824 539580 0 1 614502400 425694649 436719616 0 436719616 539585 0 1 614510592 425658934 436723712 0 436723712 539583 0 1 614477824 425685915 436740096 0 436740096 539589 0 1 PATCHED 614543360 387655040 395124736 0 395124736 539577 0 1 614445056 387667599 395206656 0 395206656 539614 0 1 614477824 387686121 395059200 0 395059200 539589 0 1 614461440 387748115 395075584 0 395075584 539592 0 1 614486016 387670405 395022336 0 395022336 539588 0 1 == Consumed around 39MB less memory. P.S. on x86_64, minimum LZO compressed buffer size seems to be around 44 bytes. zsmalloc adds ZS_HANDLE_SIZE (sizeof(unsigned long)) to the object's size in zs_malloc(). Thus, 32 bytes and 48 bytes classes are unreachable by LZO on x86_64 PAGE_SHIFT 12 platforms. LZ4, however, seems to have a minimum compressed buffer size around 26 bytes. So, once again, on x86_64, 32 bytes class is unreachable, but we need to keep 48 bytes size class. In he worst case, in theory, if we ever run out of bits in OBJ_INDEX_BITS we can drop 32 bytes and (well, with some consideration) 48 bytes classes, IOW, do ZS_MIN_ALLOC_SIZE << 1. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- mm/zsmalloc.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index e7f10bd..ab9ed8f 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -73,13 +73,6 @@ */ #define ZS_ALIGN 8 -/* - * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single) - * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N. - */ -#define ZS_MAX_ZSPAGE_ORDER 2 -#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) - #define ZS_HANDLE_SIZE (sizeof(unsigned long)) /* @@ -96,6 +89,7 @@ #ifndef MAX_PHYSMEM_BITS #ifdef CONFIG_HIGHMEM64G #define MAX_PHYSMEM_BITS 36 +#define ZS_MAX_ZSPAGE_ORDER 2 #else /* !CONFIG_HIGHMEM64G */ /* * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just @@ -104,9 +98,30 @@ #define MAX_PHYSMEM_BITS BITS_PER_LONG #endif #endif + #define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) /* + * We don't have enough bits in OBJ_INDEX_BITS on HIGHMEM64G and + * PAGE_SHIFT 16 systems to have huge ZS_MAX_ZSPAGE_ORDER there. + * This will significantly increase ZS_MIN_ALLOC_SIZE and drop a + * number of important (frequently used in general) size classes. 
+ */ +#if PAGE_SHIFT > 14 +#define ZS_MAX_ZSPAGE_ORDER 2 +#endif + +#ifndef ZS_MAX_ZSPAGE_ORDER +#define ZS_MAX_ZSPAGE_ORDER 4 +#endif + +/* + * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single) + * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N. + */ +#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) + +/* * Memory for allocating for handle keeps object position by * encoding <page, obj_idx> and the encoded value has a room * in least bit(ie, look at obj_to_location). -- 2.7.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
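To make the "every PAGE_SIZE/(PAGE_SIZE - CLASS_SIZE)-th object saves a page"
arithmetic concrete, the sketch below picks a pages-per-zspage value the way
zsmalloc roughly does (best usage ratio within the allowed maximum; an approximation
of its pages-per-zspage selection, not a copy of the kernel code) and compares the
3408-byte class under the old and new limits.

#include <stdio.h>

#define PAGE_SIZE 4096

/* pick the zspage size (in pages) with the best usage ratio -- approximate */
static int pages_per_zspage(int class_size, int max_pages)
{
	int i, best = 1, best_usedpc = 0;

	for (i = 1; i <= max_pages; i++) {
		int zspage_size = i * PAGE_SIZE;
		int waste = zspage_size % class_size;
		int usedpc = (zspage_size - waste) * 100 / zspage_size;

		if (usedpc > best_usedpc) {
			best_usedpc = usedpc;
			best = i;
		}
	}
	return best;
}

static void report(int class_size, int max_pages)
{
	int pages = pages_per_zspage(class_size, max_pages);
	int objs = pages * PAGE_SIZE / class_size;

	printf("class %d, up to %2d pages: %d page(s)/zspage, %d object(s), %d bytes wasted\n",
	       class_size, max_pages, pages, objs,
	       pages * PAGE_SIZE - objs * class_size);
}

int main(void)
{
	report(3408, 4);	/* ZS_MAX_ZSPAGE_ORDER 2: stays a huge class    */
	report(3408, 16);	/* ZS_MAX_ZSPAGE_ORDER 4: 6 objects per 5 pages */
	return 0;
}

With the old limit a 3408-byte object occupies a full page (688 bytes wasted each
time); with the new limit the class packs 6 objects into 5 pages, so roughly every
6th object comes for free, matching the 4096/688 ~= 6 figure in the commit message.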
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-21 13:27 ` [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE Sergey Senozhatsky @ 2016-02-22 0:25 ` Minchan Kim 2016-02-22 0:47 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-22 0:25 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky On Sun, Feb 21, 2016 at 10:27:54PM +0900, Sergey Senozhatsky wrote: > From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> > > The existing limit of max 4 pages per zspage sets a tight limit > on ->huge classes, which results in increased memory consumption. > > On x86_64, PAGE_SHIFT 12, ->huge class_size range is 3280-4096. > The problem with ->huge classes is that in most of the cases they > waste memory, because each ->huge zspage has only one order-0 page > and can store only one object. > > For instance, we store 3408 bytes objects as PAGE_SIZE objects, > while in fact each of those objects has 4096 - 3408 = 688 bytes > of spare space, so we need to store 5 objects to have enough spare > space to save the 6th objects with out requesting a new order-0 page. > In general, turning a ->huge class into a normal will save PAGE_SIZE > bytes every time "PAGE_SIZE/(PAGE_SIZE - CLASS_SIZE)"-th object is > stored. > > The maximum number of order-0 pages in zspages is limited by > ZS_MAX_ZSPAGE_ORDER (zspage can consist of up to 1<<ZS_MAX_ZSPAGE_ORDER > pages). Increasing ZS_MAX_ZSPAGE_ORDER permits us to have less ->huge > classes, because some of them now can form a 'normal' zspage consisting > of several order-0 pages. > > We can't increase ZS_MAX_ZSPAGE_ORDER on every platform: 32-bit > PAE/LPAE and PAGE_SHIFT 16 kernels don't have enough bits left in > OBJ_INDEX_BITS. Other than that, we can increase ZS_MAX_ZSPAGE_ORDER > to 4. This will change the ->huge classes range (on PAGE_SHIFT 12 > systems) from 3280-4096 to 3856-4096. This will increase density > and reduce memory wastage/usage. I tempted it several times with same reason you pointed out. But my worry was that if we increase ZS_MAX_ZSPAGE_ORDER, zram can consume more memory because we need several pages chain to populate just a object. Even, at that time, we didn't have compaction scheme so fragmentation of object in zspage is huge pain to waste memory. Now, we have compaction facility so fragment of object might not be a severe problem but still painful to allocate 16 pages to store 3408 byte. So, if we want to increase ZS_MAX_ZSPAGE_ORDER, first of all, we should prepare dynamic creating of sub-page of zspage, I think and more smart compaction to minimize wasted memory. > > TESTS (ZS_MAX_ZSPAGE_ORDER 4) > ============================= > > showing only bottom of /sys/kernel/debug/zsmalloc/zram0/classes > > class size almost_full almost_empty obj_allocated obj_used pages_used > ======================================================================== > > 1) compile glibc -j8 > > BASE > ... > 168 2720 0 14 4500 4479 3000 > 190 3072 0 15 3016 2986 2262 > 202 3264 2 2 70 61 56 > 254 4096 0 0 40213 40213 40213 > > Total 63 247 155676 153957 74955 > > PATCHED > ... 
> 191 3088 1 1 130 116 100 > 192 3104 1 1 119 103 91 > 194 3136 1 1 260 254 200 > 197 3184 0 3 522 503 406 > 199 3216 2 3 350 320 275 > 200 3232 0 2 114 93 90 > 202 3264 2 2 210 202 168 > 206 3328 1 5 464 418 377 > 207 3344 1 2 121 108 99 > 208 3360 0 3 153 119 126 > 211 3408 2 4 360 341 300 > 212 3424 1 2 133 112 112 > 214 3456 0 2 182 170 154 > 217 3504 0 4 217 200 186 > 219 3536 0 3 135 108 117 > 222 3584 0 3 144 132 126 > 223 3600 1 1 51 35 45 > 225 3632 1 2 108 99 96 > 228 3680 0 2 140 129 126 > 230 3712 0 3 110 94 100 > 232 3744 1 2 132 113 121 > 234 3776 1 2 143 128 132 > 235 3792 0 3 112 81 104 > 236 3808 0 2 75 62 70 > 238 3840 0 2 112 91 105 > 254 4096 0 0 36112 36112 36112 > > Total 127 228 158342 154050 73884 > > == Consumed 74955-73884 = 1071 less order-0 pages. > > 2) copy linux-next directory (with object files, 2.5G) > > BASE > ... > 190 3072 0 1 9092 9091 6819 > 202 3264 0 0 240 240 192 > 254 4096 0 0 360304 360304 360304 > > Total 34 83 687545 686443 480962 > > PATCHED > ... > 191 3088 0 1 455 449 350 > 192 3104 1 0 425 421 325 > 194 3136 1 0 936 935 720 > 197 3184 0 1 1539 1532 1197 > 199 3216 0 1 1148 1142 902 > 200 3232 0 1 570 560 450 > 202 3264 1 0 1245 1244 996 > 206 3328 0 1 2896 2887 2353 > 207 3344 0 0 825 825 675 > 208 3360 0 1 850 845 700 > 211 3408 0 1 2694 2692 2245 > 212 3424 0 1 931 922 784 > 214 3456 1 0 1924 1923 1628 > 217 3504 0 0 2968 2968 2544 > 219 3536 0 1 2220 2209 1924 > 222 3584 0 1 3120 3114 2730 > 223 3600 0 1 1088 1081 960 > 225 3632 0 1 2133 2130 1896 > 228 3680 0 1 3340 3334 3006 > 230 3712 0 1 2035 2025 1850 > 232 3744 0 1 1980 1972 1815 > 234 3776 0 1 2015 2009 1860 > 235 3792 0 1 1022 1013 949 > 236 3808 1 0 960 958 896 > 238 3840 0 0 1968 1968 1845 > 254 4096 0 0 319370 319370 319370 > > Total 71 137 687877 684436 471265 > > Consumed 480962 - 471265 = 9697 less order-0 pages. > > 3) Run a test script (storing text files of various sizes, binary files > of various sizes) > > cat /sys/block/zram0/mm_stat column 3 is zs_get_total_pages() << PAGE_SHIFT > > BASE > 614477824 425627436 436678656 0 436678656 539608 0 1 > 614526976 425709397 436813824 0 436813824 539580 0 1 > 614502400 425694649 436719616 0 436719616 539585 0 1 > 614510592 425658934 436723712 0 436723712 539583 0 1 > 614477824 425685915 436740096 0 436740096 539589 0 1 > > PATCHED > 614543360 387655040 395124736 0 395124736 539577 0 1 > 614445056 387667599 395206656 0 395206656 539614 0 1 > 614477824 387686121 395059200 0 395059200 539589 0 1 > 614461440 387748115 395075584 0 395075584 539592 0 1 > 614486016 387670405 395022336 0 395022336 539588 0 1 > > == Consumed around 39MB less memory. > > P.S. on x86_64, minimum LZO compressed buffer size seems to be around 44 > bytes. zsmalloc adds ZS_HANDLE_SIZE (sizeof(unsigned long)) to the object's > size in zs_malloc(). Thus, 32 bytes and 48 bytes classes are unreachable by > LZO on x86_64 PAGE_SHIFT 12 platforms. LZ4, however, seems to have a minimum > compressed buffer size around 26 bytes. So, once again, on x86_64, 32 bytes > class is unreachable, but we need to keep 48 bytes size class. In he worst > case, in theory, if we ever run out of bits in OBJ_INDEX_BITS we can drop 32 > bytes and (well, with some consideration) 48 bytes classes, IOW, do > ZS_MIN_ALLOC_SIZE << 1. 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > --- > mm/zsmalloc.c | 29 ++++++++++++++++++++++------- > 1 file changed, 22 insertions(+), 7 deletions(-) > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index e7f10bd..ab9ed8f 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -73,13 +73,6 @@ > */ > #define ZS_ALIGN 8 > > -/* > - * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single) > - * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N. > - */ > -#define ZS_MAX_ZSPAGE_ORDER 2 > -#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) > - > #define ZS_HANDLE_SIZE (sizeof(unsigned long)) > > /* > @@ -96,6 +89,7 @@ > #ifndef MAX_PHYSMEM_BITS > #ifdef CONFIG_HIGHMEM64G > #define MAX_PHYSMEM_BITS 36 > +#define ZS_MAX_ZSPAGE_ORDER 2 > #else /* !CONFIG_HIGHMEM64G */ > /* > * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just > @@ -104,9 +98,30 @@ > #define MAX_PHYSMEM_BITS BITS_PER_LONG > #endif > #endif > + > #define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) > > /* > + * We don't have enough bits in OBJ_INDEX_BITS on HIGHMEM64G and > + * PAGE_SHIFT 16 systems to have huge ZS_MAX_ZSPAGE_ORDER there. > + * This will significantly increase ZS_MIN_ALLOC_SIZE and drop a > + * number of important (frequently used in general) size classes. > + */ > +#if PAGE_SHIFT > 14 > +#define ZS_MAX_ZSPAGE_ORDER 2 > +#endif > + > +#ifndef ZS_MAX_ZSPAGE_ORDER > +#define ZS_MAX_ZSPAGE_ORDER 4 > +#endif > + > +/* > + * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single) > + * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N. > + */ > +#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) > + > +/* > * Memory for allocating for handle keeps object position by > * encoding <page, obj_idx> and the encoded value has a room > * in least bit(ie, look at obj_to_location). > -- > 2.7.1 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 0:25 ` Minchan Kim @ 2016-02-22 0:47 ` Sergey Senozhatsky 2016-02-22 1:34 ` Minchan Kim 0 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 0:47 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel, Sergey Senozhatsky On (02/22/16 09:25), Minchan Kim wrote: [..] > I tempted it several times with same reason you pointed out. > But my worry was that if we increase ZS_MAX_ZSPAGE_ORDER, zram can > consume more memory because we need several pages chain to populate > just a object. Even, at that time, we didn't have compaction scheme > so fragmentation of object in zspage is huge pain to waste memory. well, the thing is -- we end up requesting less pages after all, so zsmalloc has better chances to survive. for example, gcc5 compilation test BASE 168 2720 0 1 115833 115831 77222 2 190 3072 0 1 109708 109707 82281 3 202 3264 0 5 1910 1895 1528 4 254 4096 0 0 380174 380174 380174 1 Total 44 285 1621495 1618234 891703 PATCHED 192 3104 1 0 3740 3737 2860 13 194 3136 0 1 7215 7208 5550 10 197 3184 1 0 11151 11150 8673 7 199 3216 0 1 9310 9304 7315 11 200 3232 0 1 4731 4717 3735 15 202 3264 0 1 8400 8396 6720 4 206 3328 0 1 22064 22051 17927 13 207 3344 0 1 4884 4877 3996 9 208 3360 0 1 4420 4415 3640 14 211 3408 0 1 11250 11246 9375 5 212 3424 1 0 3344 3343 2816 16 214 3456 0 2 7345 7329 6215 11 217 3504 0 1 10801 10797 9258 6 219 3536 0 1 5295 5289 4589 13 222 3584 0 0 6008 6008 5257 7 223 3600 0 1 1530 1518 1350 15 225 3632 0 1 3519 3514 3128 8 228 3680 0 1 3990 3985 3591 9 230 3712 0 2 2167 2151 1970 10 232 3744 1 2 1848 1835 1694 11 234 3776 0 2 1404 1384 1296 12 235 3792 0 2 672 654 624 13 236 3808 1 2 615 592 574 14 238 3840 1 2 1120 1098 1050 15 254 4096 0 0 241824 241824 241824 1 Total 129 489 1627756 1618193 850147 that's 891703 - 850147 = 41556 less pages. or 162MB less memory used. 41556 less pages means that zsmalloc had 41556 less chances to fail. > Now, we have compaction facility so fragment of object might not > be a severe problem but still painful to allocate 16 pages to store > 3408 byte. So, if we want to increase ZS_MAX_ZSPAGE_ORDER, > first of all, we should prepare dynamic creating of sub-page of > zspage, I think and more smart compaction to minimize wasted memory. well, I agree, but given that we allocate less pages, do we really want to introduce this complexity at this point? -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 0:47 ` Sergey Senozhatsky @ 2016-02-22 1:34 ` Minchan Kim 2016-02-22 2:01 ` Sergey Senozhatsky 2016-02-22 2:24 ` Sergey Senozhatsky 0 siblings, 2 replies; 26+ messages in thread From: Minchan Kim @ 2016-02-22 1:34 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Mon, Feb 22, 2016 at 09:47:58AM +0900, Sergey Senozhatsky wrote: > On (02/22/16 09:25), Minchan Kim wrote: > [..] > > I tempted it several times with same reason you pointed out. > > But my worry was that if we increase ZS_MAX_ZSPAGE_ORDER, zram can > > consume more memory because we need several pages chain to populate > > just a object. Even, at that time, we didn't have compaction scheme > > so fragmentation of object in zspage is huge pain to waste memory. > > well, the thing is -- we end up requesting less pages after all, so > zsmalloc has better chances to survive. for example, gcc5 compilation test Indeed. I saw your test result. > > BASE > > 168 2720 0 1 115833 115831 77222 2 > 190 3072 0 1 109708 109707 82281 3 > 202 3264 0 5 1910 1895 1528 4 > 254 4096 0 0 380174 380174 380174 1 > > Total 44 285 1621495 1618234 891703 > > > PATCHED > > 192 3104 1 0 3740 3737 2860 13 > 194 3136 0 1 7215 7208 5550 10 > 197 3184 1 0 11151 11150 8673 7 > 199 3216 0 1 9310 9304 7315 11 > 200 3232 0 1 4731 4717 3735 15 > 202 3264 0 1 8400 8396 6720 4 > 206 3328 0 1 22064 22051 17927 13 > 207 3344 0 1 4884 4877 3996 9 > 208 3360 0 1 4420 4415 3640 14 > 211 3408 0 1 11250 11246 9375 5 > 212 3424 1 0 3344 3343 2816 16 > 214 3456 0 2 7345 7329 6215 11 > 217 3504 0 1 10801 10797 9258 6 > 219 3536 0 1 5295 5289 4589 13 > 222 3584 0 0 6008 6008 5257 7 > 223 3600 0 1 1530 1518 1350 15 > 225 3632 0 1 3519 3514 3128 8 > 228 3680 0 1 3990 3985 3591 9 > 230 3712 0 2 2167 2151 1970 10 > 232 3744 1 2 1848 1835 1694 11 > 234 3776 0 2 1404 1384 1296 12 > 235 3792 0 2 672 654 624 13 > 236 3808 1 2 615 592 574 14 > 238 3840 1 2 1120 1098 1050 15 > 254 4096 0 0 241824 241824 241824 1 > > Total 129 489 1627756 1618193 850147 > > > that's 891703 - 850147 = 41556 less pages. or 162MB less memory used. > 41556 less pages means that zsmalloc had 41556 less chances to fail. Let's think swap-case which is more important for zram now. As you know, most of usecase are swap in embedded world. Do we really need 16 pages allocator for just less PAGE_SIZE objet at the moment which is really heavy memory pressure? > > > > Now, we have compaction facility so fragment of object might not > > be a severe problem but still painful to allocate 16 pages to store > > 3408 byte. So, if we want to increase ZS_MAX_ZSPAGE_ORDER, > > first of all, we should prepare dynamic creating of sub-page of > > zspage, I think and more smart compaction to minimize wasted memory. > > well, I agree, but given that we allocate less pages, do we really want to > introduce this complexity at this point? I agree with you. Before dynamic subpage chaining feature, we need lots of testing in heavy memory pressure with zram-swap. However, I think the feature itself is good to have in the future. :) > > -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 1:34 ` Minchan Kim @ 2016-02-22 2:01 ` Sergey Senozhatsky 2016-02-22 2:34 ` Minchan Kim 2016-02-22 2:24 ` Sergey Senozhatsky 1 sibling, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 2:01 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 10:34), Minchan Kim wrote: [..] > > > I tempted it several times with same reason you pointed out. > > > But my worry was that if we increase ZS_MAX_ZSPAGE_ORDER, zram can > > > consume more memory because we need several pages chain to populate > > > just a object. Even, at that time, we didn't have compaction scheme > > > so fragmentation of object in zspage is huge pain to waste memory. > > > > well, the thing is -- we end up requesting less pages after all, so > > zsmalloc has better chances to survive. for example, gcc5 compilation test > > Indeed. I saw your test result. [..] > > Total 129 489 1627756 1618193 850147 > > > > > > that's 891703 - 850147 = 41556 less pages. or 162MB less memory used. > > 41556 less pages means that zsmalloc had 41556 less chances to fail. > > > Let's think swap-case which is more important for zram now. As you know, > most of usecase are swap in embedded world. > Do we really need 16 pages allocator for just less PAGE_SIZE objet > at the moment which is really heavy memory pressure? I'll take a look at dynamic class page addition. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 2:01 ` Sergey Senozhatsky @ 2016-02-22 2:34 ` Minchan Kim 2016-02-22 3:59 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-22 2:34 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Mon, Feb 22, 2016 at 11:01:13AM +0900, Sergey Senozhatsky wrote: > On (02/22/16 10:34), Minchan Kim wrote: > [..] > > > > I tempted it several times with same reason you pointed out. > > > > But my worry was that if we increase ZS_MAX_ZSPAGE_ORDER, zram can > > > > consume more memory because we need several pages chain to populate > > > > just a object. Even, at that time, we didn't have compaction scheme > > > > so fragmentation of object in zspage is huge pain to waste memory. > > > > > > well, the thing is -- we end up requesting less pages after all, so > > > zsmalloc has better chances to survive. for example, gcc5 compilation test > > > > Indeed. I saw your test result. > > > [..] > > > Total 129 489 1627756 1618193 850147 > > > > > > > > > that's 891703 - 850147 = 41556 less pages. or 162MB less memory used. > > > 41556 less pages means that zsmalloc had 41556 less chances to fail. > > > > > > Let's think swap-case which is more important for zram now. As you know, > > most of usecase are swap in embedded world. > > Do we really need 16 pages allocator for just less PAGE_SIZE objet > > at the moment which is really heavy memory pressure? > > I'll take a look at dynamic class page addition. Thanks, Sergey. Just a note: I am preparing zsmalloc migration now and almost done so I hope I can send it within two weeks. In there, I changed a lot of things in zsmalloc, page chaining, struct page fields usecases and locking scheme and so on. The zsmalloc fragment/migration is really painful now so we should solve it first so I hope you help to review that and let's go further dynamic chaining after that, please. :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 2:34 ` Minchan Kim @ 2016-02-22 3:59 ` Sergey Senozhatsky 2016-02-22 4:41 ` Minchan Kim 0 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 3:59 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 11:34), Minchan Kim wrote: [..] > > I'll take a look at dynamic class page addition. > > Thanks, Sergey. > > Just a note: > > I am preparing zsmalloc migration now and almost done so I hope > I can send it within two weeks. In there, I changed a lot of > things in zsmalloc, page chaining, struct page fields usecases > and locking scheme and so on. The zsmalloc fragment/migration > is really painful now so we should solve it first so I hope > you help to review that and let's go further dynamic chaining > after that, please. :) oh, sure. so let's keep dynamic page allocation out of sight for now. I'll do more tests with the increase ORDER and if it's OK then hopefully we can just merge it, it's quite simple and shouldn't interfere with any of the changes you are about to introduce. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 3:59 ` Sergey Senozhatsky @ 2016-02-22 4:41 ` Minchan Kim 2016-02-22 10:43 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-22 4:41 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Mon, Feb 22, 2016 at 12:59:54PM +0900, Sergey Senozhatsky wrote: > On (02/22/16 11:34), Minchan Kim wrote: > [..] > > > I'll take a look at dynamic class page addition. > > > > Thanks, Sergey. > > > > Just a note: > > > > I am preparing zsmalloc migration now and almost done so I hope > > I can send it within two weeks. In there, I changed a lot of > > things in zsmalloc, page chaining, struct page fields usecases > > and locking scheme and so on. The zsmalloc fragment/migration > > is really painful now so we should solve it first so I hope > > you help to review that and let's go further dynamic chaining > > after that, please. :) > > oh, sure. > > so let's keep dynamic page allocation out of sight for now. > I'll do more tests with the increase ORDER and if it's OK then > hopefully we can just merge it, it's quite simple and shouldn't > interfere with any of the changes you are about to introduce. Thanks. And as another idea, we could try fallback approach that we couldn't meet nr_pages to minimize wastage so let's fallback to order-0 page like as-is. It will enhance, at least than now with small-amount of code compared to dynmaic page allocation. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 4:41 ` Minchan Kim @ 2016-02-22 10:43 ` Sergey Senozhatsky 2016-02-23 8:25 ` Minchan Kim 0 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 10:43 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 13:41), Minchan Kim wrote: [..] > > oh, sure. > > > > so let's keep dynamic page allocation out of sight for now. > > I'll do more tests with the increase ORDER and if it's OK then > > hopefully we can just merge it, it's quite simple and shouldn't > > interfere with any of the changes you are about to introduce. > > Thanks. > > And as another idea, we could try fallback approach that > we couldn't meet nr_pages to minimize wastage so let's fallback > to order-0 page like as-is. It will enhance, at least than now > with small-amount of code compared to dynmaic page allocation. speaking of fallback, with bigger ZS_MAX_ZSPAGE_ORDER 'normal' classes also become bigger. PATCHED 6 128 0 1 96 78 3 1 7 144 0 1 256 104 9 9 8 160 0 1 128 80 5 5 9 176 0 1 256 78 11 11 10 192 1 1 128 99 6 3 11 208 0 1 256 52 13 13 12 224 1 1 512 472 28 7 13 240 0 1 256 70 15 15 14 256 1 1 64 49 4 1 15 272 0 1 60 48 4 1 BASE 6 128 0 1 96 83 3 1 7 144 0 1 170 113 6 3 8 160 0 1 102 72 4 2 9 176 1 0 93 75 4 4 10 192 0 1 128 104 6 3 11 208 1 1 78 52 4 2 12 224 1 1 511 475 28 4 13 240 1 1 85 73 5 1 14 256 1 1 64 53 4 1 15 272 1 0 45 43 3 1 _techically_, zsmalloc is correct. for instance, in 11 pages we can store 4096 * 11 / 176 == 256 objects. 256 * 176 == 45056, which is 4096 * 11. so if zspage for class_size 176 will contain 11 order-0 pages, we can count on 0 bytes of unused space once zspage will become ZS_FULL. but it's ugly, because I think this will introduce bigger internal fragmentation, which, in some cases, can be handled by compaction, but I'd prefer to touch only ->huge classes and keep the existing behaviour for normal classes. so I'm currently thinking of doing something like this #define ZS_MAX_ZSPAGE_ORDER 2 #define ZS_MAX_HUGE_ZSPAGE_ORDER 4 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) #define ZS_MAX_PAGES_PER_HUGE_ZSPAGE (_AC(1, UL) << ZS_MAX_HUGE_ZSPAGE_ORDER) so, normal classes have ORDER of 2. huge classes, however, as a fallback, can grow up to ZS_MAX_HUGE_ZSPAGE_ORDER pages. extend only ->huge classes: pages == 1 && get_maxobj_per_zspage(class_size, pages) == 1. like this: static int __get_pages_per_zspage(int class_size, int max_pages) { int i, max_usedpc = 0; /* zspage order which gives maximum used size per KB */ int max_usedpc_order = 1; for (i = 1; i <= max_pages; i++) { int zspage_size; int waste, usedpc; zspage_size = i * PAGE_SIZE; waste = zspage_size % class_size; usedpc = (zspage_size - waste) * 100 / zspage_size; if (usedpc > max_usedpc) { max_usedpc = usedpc; max_usedpc_order = i; } } return max_usedpc_order; } static int get_pages_per_zspage(int class_size) { /* normal class first */ int pages = __get_pages_per_zspage(class_size, ZS_MAX_PAGES_PER_ZSPAGE); /* test if the class is ->huge and try to turn it into a normal one */ if (pages == 1 && get_maxobj_per_zspage(class_size, pages) == 1) { pages = __get_pages_per_zspage(class_size, ZS_MAX_PAGES_PER_HUGE_ZSPAGE); } return pages; } -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
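For what it's worth, the effect of this fallback on the former huge classes can be eyeballed with a throwaway userspace sketch. It is not the posted patch: PAGE_SIZE 4096, the 16-byte class step and the simple pages * PAGE_SIZE / size definition of get_maxobj_per_zspage() are all assumptions here, and the selection logic is just the code from the message above copied in:

/*
 * Hypothetical userspace sketch: re-implements only the class sizing
 * proposal above to show which ex-huge classes would become normal.
 */
#include <stdio.h>

#define PAGE_SIZE			4096
#define ZS_MAX_ZSPAGE_ORDER		2
#define ZS_MAX_HUGE_ZSPAGE_ORDER	4
#define ZS_MAX_PAGES_PER_ZSPAGE		(1UL << ZS_MAX_ZSPAGE_ORDER)
#define ZS_MAX_PAGES_PER_HUGE_ZSPAGE	(1UL << ZS_MAX_HUGE_ZSPAGE_ORDER)

static int get_maxobj_per_zspage(int size, int pages)
{
	return pages * PAGE_SIZE / size;
}

static int __get_pages_per_zspage(int class_size, int max_pages)
{
	int i, max_usedpc = 0;
	/* zspage order which gives maximum used size per KB */
	int max_usedpc_order = 1;

	for (i = 1; i <= max_pages; i++) {
		int zspage_size = i * PAGE_SIZE;
		int waste = zspage_size % class_size;
		int usedpc = (zspage_size - waste) * 100 / zspage_size;

		if (usedpc > max_usedpc) {
			max_usedpc = usedpc;
			max_usedpc_order = i;
		}
	}

	return max_usedpc_order;
}

static int get_pages_per_zspage(int class_size)
{
	/* normal class first */
	int pages = __get_pages_per_zspage(class_size,
			ZS_MAX_PAGES_PER_ZSPAGE);

	/* ->huge class: retry with the bigger limit */
	if (pages == 1 && get_maxobj_per_zspage(class_size, pages) == 1)
		pages = __get_pages_per_zspage(class_size,
				ZS_MAX_PAGES_PER_HUGE_ZSPAGE);

	return pages;
}

int main(void)
{
	int size;

	for (size = 3072; size <= 4096; size += 16) {
		int pages = get_pages_per_zspage(size);

		printf("class %4d: %2d page(s) per zspage%s\n", size, pages,
			get_maxobj_per_zspage(size, pages) == 1 ?
				"  (still ->huge)" : "");
	}
	return 0;
}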
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 10:43 ` Sergey Senozhatsky @ 2016-02-23 8:25 ` Minchan Kim 2016-02-23 10:35 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-23 8:25 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Mon, Feb 22, 2016 at 07:43:25PM +0900, Sergey Senozhatsky wrote: > On (02/22/16 13:41), Minchan Kim wrote: > [..] > > > oh, sure. > > > > > > so let's keep dynamic page allocation out of sight for now. > > > I'll do more tests with the increase ORDER and if it's OK then > > > hopefully we can just merge it, it's quite simple and shouldn't > > > interfere with any of the changes you are about to introduce. > > > > Thanks. > > > > And as another idea, we could try fallback approach that > > we couldn't meet nr_pages to minimize wastage so let's fallback > > to order-0 page like as-is. It will enhance, at least than now > > with small-amount of code compared to dynmaic page allocation. > > > speaking of fallback, > with bigger ZS_MAX_ZSPAGE_ORDER 'normal' classes also become bigger. > > PATCHED > > 6 128 0 1 96 78 3 1 > 7 144 0 1 256 104 9 9 > 8 160 0 1 128 80 5 5 > 9 176 0 1 256 78 11 11 > 10 192 1 1 128 99 6 3 > 11 208 0 1 256 52 13 13 > 12 224 1 1 512 472 28 7 > 13 240 0 1 256 70 15 15 > 14 256 1 1 64 49 4 1 > 15 272 0 1 60 48 4 1 > > > BASE > > 6 128 0 1 96 83 3 1 > 7 144 0 1 170 113 6 3 > 8 160 0 1 102 72 4 2 > 9 176 1 0 93 75 4 4 > 10 192 0 1 128 104 6 3 > 11 208 1 1 78 52 4 2 > 12 224 1 1 511 475 28 4 > 13 240 1 1 85 73 5 1 > 14 256 1 1 64 53 4 1 > 15 272 1 0 45 43 3 1 > > > _techically_, zsmalloc is correct. > for instance, in 11 pages we can store 4096 * 11 / 176 == 256 objects. > 256 * 176 == 45056, which is 4096 * 11. so if zspage for class_size 176 will contain 11 > order-0 pages, we can count on 0 bytes of unused space once zspage will become ZS_FULL. > > but it's ugly, because I think this will introduce bigger internal fragmentation, which, > in some cases, can be handled by compaction, but I'd prefer to touch only ->huge classes > and keep the existing behaviour for normal classes. > > so I'm currently thinking of doing something like this > > #define ZS_MAX_ZSPAGE_ORDER 2 > #define ZS_MAX_HUGE_ZSPAGE_ORDER 4 > #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) > #define ZS_MAX_PAGES_PER_HUGE_ZSPAGE (_AC(1, UL) << ZS_MAX_HUGE_ZSPAGE_ORDER) > > > so, normal classes have ORDER of 2. huge classes, however, as a fallback, can grow > up to ZS_MAX_HUGE_ZSPAGE_ORDER pages. > > > extend only ->huge classes: pages == 1 && get_maxobj_per_zspage(class_size, pages) == 1. 
> > like this: > > static int __get_pages_per_zspage(int class_size, int max_pages) > { > int i, max_usedpc = 0; > /* zspage order which gives maximum used size per KB */ > int max_usedpc_order = 1; > > for (i = 1; i <= max_pages; i++) { > int zspage_size; > int waste, usedpc; > > zspage_size = i * PAGE_SIZE; > waste = zspage_size % class_size; > usedpc = (zspage_size - waste) * 100 / zspage_size; > > if (usedpc > max_usedpc) { > max_usedpc = usedpc; > max_usedpc_order = i; > } > } > > return max_usedpc_order; > } > > static int get_pages_per_zspage(int class_size) > { > /* normal class first */ > int pages = __get_pages_per_zspage(class_size, > ZS_MAX_PAGES_PER_ZSPAGE); > > /* test if the class is ->huge and try to turn it into a normal one */ > if (pages == 1 && > get_maxobj_per_zspage(class_size, pages) == 1) { > pages = __get_pages_per_zspage(class_size, > ZS_MAX_PAGES_PER_HUGE_ZSPAGE); > } > > return pages; > } > That sounds like a plan but at a first glance, my worry is we might need some special handling related to objs_per_zspage and pages_per_zspage because currently, we have assumed all of zspages in a class has same number of subpages so it might make it ugly. Hmm, at least, I need to check code how it makes ugly. If you think it's not trouble, please send a patch. As well, please write down why order-4 for MAX_ZSPAGES is best if you resend it as formal patch. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-23 8:25 ` Minchan Kim @ 2016-02-23 10:35 ` Sergey Senozhatsky 2016-02-23 16:05 ` Minchan Kim 0 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-23 10:35 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/23/16 17:25), Minchan Kim wrote: [..] > > That sounds like a plan but at a first glance, my worry is we might need > some special handling related to objs_per_zspage and pages_per_zspage > because currently, we have assumed all of zspages in a class has same > number of subpages so it might make it ugly. I did some further testing, and something has showed up that I want to discuss before we go with ORDER4 (here and later ORDER4 stands for `#define ZS_MAX_HUGE_ZSPAGE_ORDER 4' for simplicity). /* * for testing purposes I have extended zsmalloc pool stats with zs_can_compact() value. * see below */ And the thing is -- quite huge internal class fragmentation. These are the 'normal' classes, not affected by ORDER modification in any way: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact 107 1744 1 23 196 76 84 3 51 111 1808 0 0 63 63 28 4 0 126 2048 0 160 568 408 284 1 80 144 2336 52 620 8631 5747 4932 4 1648 151 2448 123 406 10090 8736 6054 3 810 168 2720 0 512 15738 14926 10492 2 540 190 3072 0 2 136 130 102 3 3 so I've been thinking about using some sort of watermaks (well, zsmalloc is an allocator after all, allocators love watermarks :-)). we can't defeat this fragmentation, we never know in advance which of the pages will be modified or we the size class those pages will land after compression. but we know stats for every class -- zs_can_compact(), obj_allocated/obj_used, etc. so we can start class compaction if we detect that internal fragmentation is too high (e.g. 30+% of class pages can be compacted). on the other hand, we always can wait for the shrinker to come in and do the job for us, but that can take some time. what's your opinion on this? The test. 1) create 2G zram, ext4, lzo, device 2) create 1G of text files, 1G of binary files -- the last part is tricky. binary files in general already imply some sort of compression, so the chances that binary files will just pressure 4096 class are very high. in my test I use vmscan.c as a text file, and vmlinux as a binary file: seems to fit perfect, it warm ups all of the "ex-huge" classes on my system: 202 3264 1 0 17820 17819 14256 4 0 206 3328 0 1 10096 10087 8203 13 0 207 3344 0 1 3212 3206 2628 9 0 208 3360 0 1 1785 1779 1470 14 0 211 3408 0 0 10662 10662 8885 5 0 212 3424 0 1 1881 1876 1584 16 0 214 3456 0 1 5174 5170 4378 11 0 217 3504 0 0 6181 6181 5298 6 0 219 3536 0 1 4410 4406 3822 13 0 222 3584 0 1 5224 5220 4571 7 0 223 3600 0 1 952 946 840 15 0 225 3632 1 0 1638 1636 1456 8 0 228 3680 0 1 1410 1403 1269 9 0 230 3712 1 0 462 461 420 10 0 232 3744 0 1 528 519 484 11 0 234 3776 0 1 559 554 516 12 0 235 3792 0 1 70 57 65 13 0 236 3808 1 0 105 104 98 14 0 238 3840 0 1 176 166 165 15 0 254 4096 0 0 1944 1944 1944 1 0 3) MAIN-test: for j in {2..10}; do create_test_files truncate_bin_files $j truncate_text_files $j remove_test_files done so it creates text and binary files, truncates them, removes, and does the whole thing again. the truncation is 1/2, 1/3 ... 1/10 of then original file size. the order of file modifications is preserved across all of the tests. 
4) SUB-test (gzipped files pressure 4096 class mostly, but I decided to keep it) `gzip -9' all text files create file copy for every gzipped file "cp FOO.gz FOO", so `gzip -d' later has to overwrite FOO file content `gzip -d' all text files 5) goto 1 I'll just post a shorter version of the results (two columns from zram's mm_stat: total_used_mem / max_used_mem) #1 BASE ORDER4 INITIAL STATE 1016832000 / 1016832000 968470528 / 968470528 TRUNCATE BIN 1/2 715878400 / 1017081856 744165376 / 968691712 TRUNCATE TEXT 1/2 388759552 / 1017081856 417140736 / 968691712 REMOVE FILES 6467584 / 1017081856 6754304 / 968691712 * see below #2 INITIAL STATE 1021116416 / 1021116416 972718080 / 972718080 TRUNCATE BIN 1/3 683802624 / 1021378560 683589632 / 972955648 TRUNCATE TEXT 1/3 244162560 / 1021378560 244170752 / 972955648 REMOVE FILES 12943360 / 1021378560 11587584 / 972955648 #3 INITIAL STATE 1023041536 / 1023041536 974557184 / 974557184 TRUNCATE BIN 1/4 685211648 / 1023049728 685113344 / 974581760 TRUNCATE TEXT 1/4 189755392 / 1023049728 189194240 / 974581760 REMOVE FILES 14589952 / 1023049728 13537280 / 974581760 #4 INITIAL STATE 1023139840 / 1023139840 974815232 / 974815232 TRUNCATE BIN 1/5 685199360 / 1023143936 686104576 / 974823424 TRUNCATE TEXT 1/5 156557312 / 1023143936 156545024 / 974823424 REMOVE FILES 14704640 / 1023143936 14594048 / 974823424 #COMPRESS/DECOMPRESS test INITIAL STATE 1022980096 / 1023135744 974516224 / 974749696 COMPRESS TEXT 1120362496 / 1124478976 1072607232 / 1076731904 DECOMPRESS TEXT 1024786432 / 1124478976 976502784 / 1076731904 Test #1 suffers from fragmentation, the pool stats for that test are: 100 1632 1 6 95 73 38 2 8 107 1744 0 18 154 60 66 3 39 111 1808 0 1 36 33 16 4 0 126 2048 0 41 208 167 104 1 20 144 2336 52 588 28637 26079 16364 4 1460 151 2448 113 396 37705 36391 22623 3 786 168 2720 0 525 69378 68561 46252 2 544 190 3072 0 123 1476 1222 1107 3 189 202 3264 25 97 1995 1685 1596 4 248 206 3328 11 119 2144 786 1742 13 1092 207 3344 0 91 1001 259 819 9 603 208 3360 0 69 1173 157 966 14 826 211 3408 20 114 1758 1320 1465 5 365 212 3424 0 63 1197 169 1008 16 864 214 3456 5 97 1326 506 1122 11 693 217 3504 27 109 1232 737 1056 6 420 219 3536 0 92 1380 383 1196 13 858 222 3584 4 131 1168 573 1022 7 518 223 3600 0 37 629 70 555 15 480 225 3632 0 99 891 377 792 8 456 228 3680 0 31 310 59 279 9 225 230 3712 0 0 0 0 0 10 0 232 3744 0 28 336 68 308 11 242 234 3776 0 14 182 28 168 12 132 Note that all of the classes (for example the leader is 2336) are significantly fragmented. With ORDER4 we have more classes that just join the "let's fragment party" and add up to the numbers. So, dynamic page allocation is good, but we also would need a dynamic page release. And it sounds to me that class watermark is a much simpler thing to do. Even if we abandon the idea of having ORDER4, the class fragmentation would not go away. > As well, please write down why order-4 for MAX_ZSPAGES is best > if you resend it as formal patch. sure, if it will ever be a formal patch then I'll put more effort into documenting. ** The stat patch: we have only numbers of FULL and ALMOST_EMPTY classes, but they don't tell us how badly the class is fragmented internally. so the /sys/kernel/debug/zsmalloc/zram0/classes output now looks as follows: class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact [..] 12 224 0 2 146 5 8 4 4 13 240 0 0 0 0 0 1 0 14 256 1 13 1840 1672 115 1 10 15 272 0 0 0 0 0 1 0 [..] 
49 816 0 3 745 735 149 1 2 51 848 3 4 361 306 76 4 8 52 864 12 14 378 268 81 3 21 54 896 1 12 117 57 26 2 12 57 944 0 0 0 0 0 3 0 [..] Total 26 131 12709 10994 1071 134 for example, class-896 is heavily fragmented -- it occupies 26 pages, 12 can be freed by compaction. does it look to you good enough to be committed on its own (off the series)? ====8<====8<==== From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Subject: [PATCH] mm/zsmalloc: add can_compact to pool stat --- mm/zsmalloc.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 43e4cbc..046d364 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -494,6 +494,8 @@ static void __exit zs_stat_exit(void) debugfs_remove_recursive(zs_stat_root); } +static unsigned long zs_can_compact(struct size_class *class); + static int zs_stats_size_show(struct seq_file *s, void *v) { int i; @@ -501,14 +503,15 @@ static int zs_stats_size_show(struct seq_file *s, void *v) struct size_class *class; int objs_per_zspage; unsigned long class_almost_full, class_almost_empty; - unsigned long obj_allocated, obj_used, pages_used; + unsigned long obj_allocated, obj_used, pages_used, compact; unsigned long total_class_almost_full = 0, total_class_almost_empty = 0; unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0; + unsigned long total_compact = 0; - seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s\n", + seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s %7s\n", "class", "size", "almost_full", "almost_empty", "obj_allocated", "obj_used", "pages_used", - "pages_per_zspage"); + "pages_per_zspage", "compact"); for (i = 0; i < zs_size_classes; i++) { class = pool->size_class[i]; @@ -521,6 +524,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v) class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY); obj_allocated = zs_stat_get(class, OBJ_ALLOCATED); obj_used = zs_stat_get(class, OBJ_USED); + compact = zs_can_compact(class); spin_unlock(&class->lock); objs_per_zspage = get_maxobj_per_zspage(class->size, @@ -528,23 +532,25 @@ static int zs_stats_size_show(struct seq_file *s, void *v) pages_used = obj_allocated / objs_per_zspage * class->pages_per_zspage; - seq_printf(s, " %5u %5u %11lu %12lu %13lu %10lu %10lu %16d\n", + seq_printf(s, " %5u %5u %11lu %12lu %13lu" + " %10lu %10lu %16d %7lu\n", i, class->size, class_almost_full, class_almost_empty, obj_allocated, obj_used, pages_used, - class->pages_per_zspage); + class->pages_per_zspage, compact); total_class_almost_full += class_almost_full; total_class_almost_empty += class_almost_empty; total_objs += obj_allocated; total_used_objs += obj_used; total_pages += pages_used; + total_compact += compact; } seq_puts(s, "\n"); - seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu\n", + seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu %16s %7lu\n", "Total", "", total_class_almost_full, total_class_almost_empty, total_objs, - total_used_objs, total_pages); + total_used_objs, total_pages, "", total_compact); return 0; } -- 2.7.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 26+ messages in thread
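One possible shape for such a watermark, sketched against the zsmalloc internals discussed in this thread (zs_can_compact() returning the number of freeable pages, zs_stat_get(), get_maxobj_per_zspage() and a per-class __zs_compact() are assumed to be available; zs_maybe_compact_class() and the 30% threshold are made up for illustration, and class->lock handling is omitted):

/*
 * Sketch only, not part of the posted series: trigger per-class
 * compaction once the freeable share of a class' pages crosses a
 * watermark.  The threshold is an example value.
 */
#define CLASS_COMPACT_WATERMARK		30	/* percent */

static bool class_watermark_exceeded(struct size_class *class)
{
	unsigned long pages_used, can_free;

	/* same pages_used math as in the stats patch above */
	pages_used = zs_stat_get(class, OBJ_ALLOCATED) /
		get_maxobj_per_zspage(class->size, class->pages_per_zspage) *
		class->pages_per_zspage;
	if (!pages_used)
		return false;

	can_free = zs_can_compact(class);
	return can_free * 100 / pages_used >= CLASS_COMPACT_WATERMARK;
}

/* e.g. called at the end of zs_free() for the class we just freed from */
static void zs_maybe_compact_class(struct zs_pool *pool,
				   struct size_class *class)
{
	if (class_watermark_exceeded(class))
		__zs_compact(pool, class);
}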
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-23 10:35 ` Sergey Senozhatsky @ 2016-02-23 16:05 ` Minchan Kim 2016-02-27 6:31 ` Sergey Senozhatsky 0 siblings, 1 reply; 26+ messages in thread From: Minchan Kim @ 2016-02-23 16:05 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On Tue, Feb 23, 2016 at 07:35:27PM +0900, Sergey Senozhatsky wrote: > On (02/23/16 17:25), Minchan Kim wrote: > [..] > > > > That sounds like a plan but at a first glance, my worry is we might need > > some special handling related to objs_per_zspage and pages_per_zspage > > because currently, we have assumed all of zspages in a class has same > > number of subpages so it might make it ugly. > > I did some further testing, and something has showed up that I want > to discuss before we go with ORDER4 (here and later ORDER4 stands for > `#define ZS_MAX_HUGE_ZSPAGE_ORDER 4' for simplicity). > > /* > * for testing purposes I have extended zsmalloc pool stats with zs_can_compact() value. > * see below > */ > > And the thing is -- quite huge internal class fragmentation. These are the 'normal' > classes, not affected by ORDER modification in any way: > > class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact > 107 1744 1 23 196 76 84 3 51 > 111 1808 0 0 63 63 28 4 0 > 126 2048 0 160 568 408 284 1 80 > 144 2336 52 620 8631 5747 4932 4 1648 > 151 2448 123 406 10090 8736 6054 3 810 > 168 2720 0 512 15738 14926 10492 2 540 > 190 3072 0 2 136 130 102 3 3 > > > so I've been thinking about using some sort of watermaks (well, zsmalloc is an allocator > after all, allocators love watermarks :-)). we can't defeat this fragmentation, we never > know in advance which of the pages will be modified or we the size class those pages will > land after compression. but we know stats for every class -- zs_can_compact(), > obj_allocated/obj_used, etc. so we can start class compaction if we detect that internal > fragmentation is too high (e.g. 30+% of class pages can be compacted). AFAIRC, we discussed about that when I introduced compaction. Namely, per-class compaction. I love it and just wanted to do after soft landing of compaction. So, it's good time to introduce it. ;-) > > on the other hand, we always can wait for the shrinker to come in and do the job for us, > but that can take some time. Sure, with the feature, we can remove shrinker itself, I think. > > what's your opinion on this? I will be very happy. > > > > The test. > > 1) create 2G zram, ext4, lzo, device > 2) create 1G of text files, 1G of binary files -- the last part is tricky. binary files > in general already imply some sort of compression, so the chances that binary files > will just pressure 4096 class are very high. 
in my test I use vmscan.c as a text file, > and vmlinux as a binary file: seems to fit perfect, it warm ups all of the "ex-huge" > classes on my system: > > 202 3264 1 0 17820 17819 14256 4 0 > 206 3328 0 1 10096 10087 8203 13 0 > 207 3344 0 1 3212 3206 2628 9 0 > 208 3360 0 1 1785 1779 1470 14 0 > 211 3408 0 0 10662 10662 8885 5 0 > 212 3424 0 1 1881 1876 1584 16 0 > 214 3456 0 1 5174 5170 4378 11 0 > 217 3504 0 0 6181 6181 5298 6 0 > 219 3536 0 1 4410 4406 3822 13 0 > 222 3584 0 1 5224 5220 4571 7 0 > 223 3600 0 1 952 946 840 15 0 > 225 3632 1 0 1638 1636 1456 8 0 > 228 3680 0 1 1410 1403 1269 9 0 > 230 3712 1 0 462 461 420 10 0 > 232 3744 0 1 528 519 484 11 0 > 234 3776 0 1 559 554 516 12 0 > 235 3792 0 1 70 57 65 13 0 > 236 3808 1 0 105 104 98 14 0 > 238 3840 0 1 176 166 165 15 0 > 254 4096 0 0 1944 1944 1944 1 0 > > > 3) MAIN-test: > for j in {2..10}; do > create_test_files > truncate_bin_files $j > truncate_text_files $j > remove_test_files > done > > so it creates text and binary files, truncates them, removes, and does the whole thing again. > the truncation is 1/2, 1/3 ... 1/10 of then original file size. > the order of file modifications is preserved across all of the tests. > > 4) SUB-test (gzipped files pressure 4096 class mostly, but I decided to keep it) > `gzip -9' all text files > create file copy for every gzipped file "cp FOO.gz FOO", so `gzip -d' later has to overwrite FOO file content > `gzip -d' all text files > > 5) goto 1 > > > > I'll just post a shorter version of the results > (two columns from zram's mm_stat: total_used_mem / max_used_mem) > > #1 BASE ORDER4 > INITIAL STATE 1016832000 / 1016832000 968470528 / 968470528 > TRUNCATE BIN 1/2 715878400 / 1017081856 744165376 / 968691712 > TRUNCATE TEXT 1/2 388759552 / 1017081856 417140736 / 968691712 > REMOVE FILES 6467584 / 1017081856 6754304 / 968691712 > > * see below > > > #2 > INITIAL STATE 1021116416 / 1021116416 972718080 / 972718080 > TRUNCATE BIN 1/3 683802624 / 1021378560 683589632 / 972955648 > TRUNCATE TEXT 1/3 244162560 / 1021378560 244170752 / 972955648 > REMOVE FILES 12943360 / 1021378560 11587584 / 972955648 > > #3 > INITIAL STATE 1023041536 / 1023041536 974557184 / 974557184 > TRUNCATE BIN 1/4 685211648 / 1023049728 685113344 / 974581760 > TRUNCATE TEXT 1/4 189755392 / 1023049728 189194240 / 974581760 > REMOVE FILES 14589952 / 1023049728 13537280 / 974581760 > > #4 > INITIAL STATE 1023139840 / 1023139840 974815232 / 974815232 > TRUNCATE BIN 1/5 685199360 / 1023143936 686104576 / 974823424 > TRUNCATE TEXT 1/5 156557312 / 1023143936 156545024 / 974823424 > REMOVE FILES 14704640 / 1023143936 14594048 / 974823424 > > > #COMPRESS/DECOMPRESS test > INITIAL STATE 1022980096 / 1023135744 974516224 / 974749696 > COMPRESS TEXT 1120362496 / 1124478976 1072607232 / 1076731904 > DECOMPRESS TEXT 1024786432 / 1124478976 976502784 / 1076731904 > > > Test #1 suffers from fragmentation, the pool stats for that test are: > > 100 1632 1 6 95 73 38 2 8 > 107 1744 0 18 154 60 66 3 39 > 111 1808 0 1 36 33 16 4 0 > 126 2048 0 41 208 167 104 1 20 > 144 2336 52 588 28637 26079 16364 4 1460 > 151 2448 113 396 37705 36391 22623 3 786 > 168 2720 0 525 69378 68561 46252 2 544 > 190 3072 0 123 1476 1222 1107 3 189 > 202 3264 25 97 1995 1685 1596 4 248 > 206 3328 11 119 2144 786 1742 13 1092 > 207 3344 0 91 1001 259 819 9 603 > 208 3360 0 69 1173 157 966 14 826 > 211 3408 20 114 1758 1320 1465 5 365 > 212 3424 0 63 1197 169 1008 16 864 > 214 3456 5 97 1326 506 1122 11 693 > 217 3504 27 109 1232 737 1056 6 420 > 219 3536 0 92 1380 383 
1196 13 858 > 222 3584 4 131 1168 573 1022 7 518 > 223 3600 0 37 629 70 555 15 480 > 225 3632 0 99 891 377 792 8 456 > 228 3680 0 31 310 59 279 9 225 > 230 3712 0 0 0 0 0 10 0 > 232 3744 0 28 336 68 308 11 242 > 234 3776 0 14 182 28 168 12 132 > > > Note that all of the classes (for example the leader is 2336) are significantly > fragmented. With ORDER4 we have more classes that just join the "let's fragment > party" and add up to the numbers. > > > > So, dynamic page allocation is good, but we also would need a dynamic page > release. And it sounds to me that class watermark is a much simpler thing > to do. > > Even if we abandon the idea of having ORDER4, the class fragmentation would > not go away. True. > > > > > As well, please write down why order-4 for MAX_ZSPAGES is best > > if you resend it as formal patch. > > sure, if it will ever be a formal patch then I'll put more effort into documenting. > > > > > ** The stat patch: > > we have only numbers of FULL and ALMOST_EMPTY classes, but they don't tell > us how badly the class is fragmented internally. > > so the /sys/kernel/debug/zsmalloc/zram0/classes output now looks as follows: > > class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact > [..] > 12 224 0 2 146 5 8 4 4 > 13 240 0 0 0 0 0 1 0 > 14 256 1 13 1840 1672 115 1 10 > 15 272 0 0 0 0 0 1 0 > [..] > 49 816 0 3 745 735 149 1 2 > 51 848 3 4 361 306 76 4 8 > 52 864 12 14 378 268 81 3 21 > 54 896 1 12 117 57 26 2 12 > 57 944 0 0 0 0 0 3 0 > [..] > Total 26 131 12709 10994 1071 134 > > > for example, class-896 is heavily fragmented -- it occupies 26 pages, 12 can be > freed by compaction. > > > does it look to you good enough to be committed on its own (off the series)? I think it's good to have. Firstly, I thought we can get the information by existing stats with simple math on userspace but changed my mind because we could change the implementation sometime so such simple math might not be perfect in future and even, we can expose it easily so yes, let's do it. Thanks! 
> > ====8<====8<==== > > From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > Subject: [PATCH] mm/zsmalloc: add can_compact to pool stat > > --- > mm/zsmalloc.c | 20 +++++++++++++------- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 43e4cbc..046d364 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -494,6 +494,8 @@ static void __exit zs_stat_exit(void) > debugfs_remove_recursive(zs_stat_root); > } > > +static unsigned long zs_can_compact(struct size_class *class); > + > static int zs_stats_size_show(struct seq_file *s, void *v) > { > int i; > @@ -501,14 +503,15 @@ static int zs_stats_size_show(struct seq_file *s, void *v) > struct size_class *class; > int objs_per_zspage; > unsigned long class_almost_full, class_almost_empty; > - unsigned long obj_allocated, obj_used, pages_used; > + unsigned long obj_allocated, obj_used, pages_used, compact; > unsigned long total_class_almost_full = 0, total_class_almost_empty = 0; > unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0; > + unsigned long total_compact = 0; > > - seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s\n", > + seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s %7s\n", > "class", "size", "almost_full", "almost_empty", > "obj_allocated", "obj_used", "pages_used", > - "pages_per_zspage"); > + "pages_per_zspage", "compact"); > > for (i = 0; i < zs_size_classes; i++) { > class = pool->size_class[i]; > @@ -521,6 +524,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v) > class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY); > obj_allocated = zs_stat_get(class, OBJ_ALLOCATED); > obj_used = zs_stat_get(class, OBJ_USED); > + compact = zs_can_compact(class); > spin_unlock(&class->lock); > > objs_per_zspage = get_maxobj_per_zspage(class->size, > @@ -528,23 +532,25 @@ static int zs_stats_size_show(struct seq_file *s, void *v) > pages_used = obj_allocated / objs_per_zspage * > class->pages_per_zspage; > > - seq_printf(s, " %5u %5u %11lu %12lu %13lu %10lu %10lu %16d\n", > + seq_printf(s, " %5u %5u %11lu %12lu %13lu" > + " %10lu %10lu %16d %7lu\n", > i, class->size, class_almost_full, class_almost_empty, > obj_allocated, obj_used, pages_used, > - class->pages_per_zspage); > + class->pages_per_zspage, compact); > > total_class_almost_full += class_almost_full; > total_class_almost_empty += class_almost_empty; > total_objs += obj_allocated; > total_used_objs += obj_used; > total_pages += pages_used; > + total_compact += compact; > } > > seq_puts(s, "\n"); > - seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu\n", > + seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu %16s %7lu\n", > "Total", "", total_class_almost_full, > total_class_almost_empty, total_objs, > - total_used_objs, total_pages); > + total_used_objs, total_pages, "", total_compact); > > return 0; > } > -- > 2.7.1 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-23 16:05 ` Minchan Kim @ 2016-02-27 6:31 ` Sergey Senozhatsky 0 siblings, 0 replies; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-27 6:31 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel Hello Minchan, sorry for very long reply. On (02/24/16 01:05), Minchan Kim wrote: [..] > > And the thing is -- quite huge internal class fragmentation. These are the 'normal' > > classes, not affected by ORDER modification in any way: > > > > class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage compact > > 107 1744 1 23 196 76 84 3 51 > > 111 1808 0 0 63 63 28 4 0 > > 126 2048 0 160 568 408 284 1 80 > > 144 2336 52 620 8631 5747 4932 4 1648 > > 151 2448 123 406 10090 8736 6054 3 810 > > 168 2720 0 512 15738 14926 10492 2 540 > > 190 3072 0 2 136 130 102 3 3 > > > > > > so I've been thinking about using some sort of watermaks (well, zsmalloc is an allocator > > after all, allocators love watermarks :-)). we can't defeat this fragmentation, we never > > know in advance which of the pages will be modified or we the size class those pages will > > land after compression. but we know stats for every class -- zs_can_compact(), > > obj_allocated/obj_used, etc. so we can start class compaction if we detect that internal > > fragmentation is too high (e.g. 30+% of class pages can be compacted). > > AFAIRC, we discussed about that when I introduced compaction. > Namely, per-class compaction. > I love it and just wanted to do after soft landing of compaction. > So, it's good time to introduce it. ;-) ah, yeah, indeed. I vaguely recall this. my first 'auto-compaction' submission has had this "compact every class in zs_free()", which was a subject to 10+% performance penalty on some of the tests. but with watermarks this will be less dramatic, I think. > > > > on the other hand, we always can wait for the shrinker to come in and do the job for us, > > but that can take some time. > > Sure, with the feature, we can remove shrinker itself, I think. > > > > what's your opinion on this? > > I will be very happy. good, I'll take a look later, to avoid any conflicts with your re-work. [..] > > does it look to you good enough to be committed on its own (off the series)? > > I think it's good to have. Firstly, I thought we can get the information > by existing stats with simple math on userspace but changed my mind > because we could change the implementation sometime so such simple math > might not be perfect in future and even, we can expose it easily so yes, > let's do it. thanks! submitted. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE 2016-02-22 1:34 ` Minchan Kim 2016-02-22 2:01 ` Sergey Senozhatsky @ 2016-02-22 2:24 ` Sergey Senozhatsky 1 sibling, 0 replies; 26+ messages in thread From: Sergey Senozhatsky @ 2016-02-22 2:24 UTC (permalink / raw) To: Minchan Kim Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, Joonsoo Kim, linux-mm, linux-kernel On (02/22/16 10:34), Minchan Kim wrote: [..] > > > > that's 891703 - 850147 = 41556 less pages. or 162MB less memory used. > > 41556 less pages means that zsmalloc had 41556 less chances to fail. > > > Let's think swap-case which is more important for zram now. As you know, > most of usecase are swap in embedded world. > Do we really need 16 pages allocator for just less PAGE_SIZE objet > at the moment which is really heavy memory pressure? well, it's not about having less PAGE_SIZE sized objects, it's about allocating less pages in the first place; and to achieve this we need less PAGE_SIZE sized objects. in the existing scheme of things (current implementation) allocating up to 16 pages to end up using less pages looks quite ok. and not all of the huge classes request 16 pages to become a 'normal' class: 191 3088 1 0 3588 3586 2760 10 192 3104 1 0 3740 3737 2860 13 194 3136 0 1 7215 7208 5550 10 197 3184 1 0 11151 11150 8673 7 199 3216 0 1 9310 9304 7315 11 200 3232 0 1 4731 4717 3735 15 202 3264 0 1 8400 8396 6720 4 206 3328 0 1 22064 22051 17927 13 207 3344 0 1 4884 4877 3996 9 208 3360 0 1 4420 4415 3640 14 211 3408 0 1 11250 11246 9375 5 212 3424 1 0 3344 3343 2816 16 214 3456 0 2 7345 7329 6215 11 217 3504 0 1 10801 10797 9258 6 219 3536 0 1 5295 5289 4589 13 222 3584 0 0 6008 6008 5257 7 223 3600 0 1 1530 1518 1350 15 225 3632 0 1 3519 3514 3128 8 228 3680 0 1 3990 3985 3591 9 230 3712 0 2 2167 2151 1970 10 232 3744 1 2 1848 1835 1694 11 234 3776 0 2 1404 1384 1296 12 235 3792 0 2 672 654 624 13 236 3808 1 2 615 592 574 14 238 3840 1 2 1120 1098 1050 15 254 4096 0 0 241824 241824 241824 1 hm.... I just thought about it. do we have a big enough computation error in static int get_pages_per_zspage(int class_size) 777 zspage_size = i * PAGE_SIZE; 778 waste = zspage_size % class_size; 779 usedpc = (zspage_size - waste) * 100 / zspage_size; 780 781 if (usedpc > max_usedpc) { 782 max_usedpc = usedpc; 783 max_usedpc_order = i; 784 } to begin `misconfiguring' the classes? we cast `usedpc' to int, so we can miss the difference between 90% and 90.95% for example... hm, need to check it later. so, yes, dynamic page allocation sounds interesting. but should it be part of this patch set or we can introduce it later (I think we can do it later)? a good testing for now would be really valuable, hopefully you guys can help me here. depending on those tests we will have a better road map, I think. the test I've done (and will do more) demonstrate that we save pages. -ss -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
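If that truncation ever turns out to matter, a minimal way to probe it would be to carry two extra digits in the comparison, e.g. this untested variant of the selection loop (__get_pages_per_zspage_bp() is a made-up name; it also assumes zspage_size * 10000 fits in unsigned long, which holds for the configurations discussed here):

/*
 * Sketch (untested): same selection loop as get_pages_per_zspage(),
 * but comparing used space in 1/100ths of a percent, so that e.g.
 * 90.95% no longer truncates to the same value as 90.00%.  Whether
 * the extra precision actually changes any class on a real config is
 * exactly what would need to be checked.
 */
static int __get_pages_per_zspage_bp(int class_size, int max_pages)
{
	unsigned long max_usedbp = 0;
	int max_usedbp_order = 1;
	int i;

	for (i = 1; i <= max_pages; i++) {
		unsigned long zspage_size = i * PAGE_SIZE;
		unsigned long waste = zspage_size % class_size;
		unsigned long usedbp;

		/* basis points: two extra digits compared to usedpc */
		usedbp = (zspage_size - waste) * 10000 / zspage_size;
		if (usedbp > max_usedbp) {
			max_usedbp = usedbp;
			max_usedbp_order = i;
		}
	}

	return max_usedbp_order;
}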