Re: [PATCH v6 11/12] zsmalloc: page migration support

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [PATCH v6 11/12] zsmalloc: page migration support
Date: Tue, 24 May 2016 17:17:21 +0900	[thread overview]
Message-ID: <20160524081721.GC29094@bbox> (raw)
In-Reply-To: <20160524080511.GB496@swordfish>

On Tue, May 24, 2016 at 05:05:11PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (05/24/16 15:28), Minchan Kim wrote:
> [..]
> > Most important point to me is that it makes code *simple* at the cost of
> > addtional wasting memory. Now, every zspage lives in *a* list so we don't
> > need to check zspage groupness to use list_empty of zspage.
> > I'm not sure how you feel it makes code simple a lot.
> > However, while I implement page migration logic, the check with condition
> > that zspage's groupness is either almost_empty and almost_full is really
> > bogus and tricky to me so I should debug several time to find what's
> > wrong.
> > 
> > Compared to old, zsmalloc is complicated day by day so I want to weight
> > on *simple* for easy maintainance.
> > 
> > One more note:
> > Now, ZS_EMPTY is used as pool. Look at find_get_zspage. So adding
> > "empty" column in ZSMALLOC_STAT might be worth but I wanted to handle it
> > as another topic.
> > 
> > So if you don't feel strong the saving is really huge, I want to
> > go with this. And if we are adding more wasted memory in future,
> > let's handle it then.
> 
> oh, sure, all those micro-optimizations can be done later,
> off the series.
> 
> > About CONFIG_ZSMALLOC_STAT, It might be off-topic. Frankly speaking,
> > I have guided production team to enable it because when I profile the
> > overhead caused by ZSMALLOC_STAT, there is no performance lost
> > in real workload. However, the stat gives more detailed useful
> > information.
> 
> ok, agree.
> good to know that you use stats in production, by the way.
> 
> [..]
> > > > +	pos = (((class->objs_per_zspage * class->size) *
> > > > +		page_idx / class->pages_per_zspage) / class->size
> > > > +	      ) * class->size;
> > > 
> > > 
> > > something went wrong with the indentation here :)
> > > 
> > > so... it's
> > > 
> > > 	(((class->objs_per_zspage * class->size) * page_idx / class->pages_per_zspage) / class->size ) * class->size;
> > > 
> > > the last ' / class->size ) * class->size' can be dropped, I think.
> > 
> > You prove I didn't learn math.
> > Will drop it.
> 
> haha, no, that wasn't the point :) great job with the series!
> 
> [..]
> > > hm... zsmalloc is getting sooo complex now.
> > > 
> > > `system_wq' -- can we have problems here when the system is getting
> > > low on memory and workers are getting increasingly busy trying to
> > > allocate the memory for some other purposes?
> > > 
> > > _theoretically_ zsmalloc can stack a number of ready-to-release zspages,
> > > which won't be accessible to zsmalloc, nor will they be released. how likely
> > > is this? hm, can zsmalloc take zspages from that deferred release list when
> > > it wants to allocate a new zspage?
> > 
> > Done.
> 
> oh, good. that was a purely theoretical thing, and to continue with the
> theories, I assume that zs_malloc() will improve with this change. the
> sort of kind of problem with zs_malloc(), *I think*, is that we release
> the class ->lock after failed find_get_zspage():
> 
> 	handle = cache_alloc_handle(pool, gfp);
> 	if (!handle)
> 		return 0;
> 
> 	zspage = find_get_zspage(class);
> 	if (likely(zspage)) {
> 		obj = obj_malloc(class, zspage, handle);
> 		[..]
> 		spin_unlock(&class->lock);
> 
> 		return handle;
> 	}
> 
> 	spin_unlock(&class->lock);
> 
> 	zspage = alloc_zspage(pool, class, gfp);
> 	if (!zspage) {
> 		cache_free_handle(pool, handle);
> 		return 0;
> 	}
> 
> 	spin_lock(&class->lock);
> 	obj = obj_malloc(class, zspage, handle);
> 	[..]
> 	spin_unlock(&class->lock);
> 
> 
> _theoretically_, on a not-really-huge system, let's say 64 CPUs for
> example, we can have 64 write paths trying to store objects of size
> OBJ_SZ to a ZS_FULL class-OBJSZ. the write path (each of them) will
> fail on find_get_zspage(), unlock the class ->lock (so another write
> path will have its chance to fail on find_get_zspage()), alloc_zspage(),
> create a page chain, spin on class ->lock to add the new zspage to the
> class. so we can end up allocating up to 64 zspages, each of them will
> carry N PAGE_SIZE pages. those zspages, at least at the beginning, will
> store only one object per-zspage; which will blastoff the internal
> fragmentation and can cause more compaction/migration/etc later on. well,
> it's a bit pessimistic, but I think to _some extent_ this scenario is
> quite possible.
> 
> I assume that this "pick an already marked for release zspage" thing is
> happening as a fast path within the first class ->lock section, so the
> rest of concurrent write requests that are spinning on the class ->lock
> at the moment will see a zspage, instead of !find_get_zspage().

As well, we would reduce page alloc/free cost although it's not expensive
compared to comp overhead. :)

Thanks for giving the thought!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [PATCH v6 11/12] zsmalloc: page migration support
Date: Tue, 24 May 2016 17:17:21 +0900	[thread overview]
Message-ID: <20160524081721.GC29094@bbox> (raw)
In-Reply-To: <20160524080511.GB496@swordfish>

On Tue, May 24, 2016 at 05:05:11PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (05/24/16 15:28), Minchan Kim wrote:
> [..]
> > Most important point to me is that it makes code *simple* at the cost of
> > addtional wasting memory. Now, every zspage lives in *a* list so we don't
> > need to check zspage groupness to use list_empty of zspage.
> > I'm not sure how you feel it makes code simple a lot.
> > However, while I implement page migration logic, the check with condition
> > that zspage's groupness is either almost_empty and almost_full is really
> > bogus and tricky to me so I should debug several time to find what's
> > wrong.
> > 
> > Compared to old, zsmalloc is complicated day by day so I want to weight
> > on *simple* for easy maintainance.
> > 
> > One more note:
> > Now, ZS_EMPTY is used as pool. Look at find_get_zspage. So adding
> > "empty" column in ZSMALLOC_STAT might be worth but I wanted to handle it
> > as another topic.
> > 
> > So if you don't feel strong the saving is really huge, I want to
> > go with this. And if we are adding more wasted memory in future,
> > let's handle it then.
> 
> oh, sure, all those micro-optimizations can be done later,
> off the series.
> 
> > About CONFIG_ZSMALLOC_STAT, It might be off-topic. Frankly speaking,
> > I have guided production team to enable it because when I profile the
> > overhead caused by ZSMALLOC_STAT, there is no performance lost
> > in real workload. However, the stat gives more detailed useful
> > information.
> 
> ok, agree.
> good to know that you use stats in production, by the way.
> 
> [..]
> > > > +	pos = (((class->objs_per_zspage * class->size) *
> > > > +		page_idx / class->pages_per_zspage) / class->size
> > > > +	      ) * class->size;
> > > 
> > > 
> > > something went wrong with the indentation here :)
> > > 
> > > so... it's
> > > 
> > > 	(((class->objs_per_zspage * class->size) * page_idx / class->pages_per_zspage) / class->size ) * class->size;
> > > 
> > > the last ' / class->size ) * class->size' can be dropped, I think.
> > 
> > You prove I didn't learn math.
> > Will drop it.
> 
> haha, no, that wasn't the point :) great job with the series!
> 
> [..]
> > > hm... zsmalloc is getting sooo complex now.
> > > 
> > > `system_wq' -- can we have problems here when the system is getting
> > > low on memory and workers are getting increasingly busy trying to
> > > allocate the memory for some other purposes?
> > > 
> > > _theoretically_ zsmalloc can stack a number of ready-to-release zspages,
> > > which won't be accessible to zsmalloc, nor will they be released. how likely
> > > is this? hm, can zsmalloc take zspages from that deferred release list when
> > > it wants to allocate a new zspage?
> > 
> > Done.
> 
> oh, good. that was a purely theoretical thing, and to continue with the
> theories, I assume that zs_malloc() will improve with this change. the
> sort of kind of problem with zs_malloc(), *I think*, is that we release
> the class ->lock after failed find_get_zspage():
> 
> 	handle = cache_alloc_handle(pool, gfp);
> 	if (!handle)
> 		return 0;
> 
> 	zspage = find_get_zspage(class);
> 	if (likely(zspage)) {
> 		obj = obj_malloc(class, zspage, handle);
> 		[..]
> 		spin_unlock(&class->lock);
> 
> 		return handle;
> 	}
> 
> 	spin_unlock(&class->lock);
> 
> 	zspage = alloc_zspage(pool, class, gfp);
> 	if (!zspage) {
> 		cache_free_handle(pool, handle);
> 		return 0;
> 	}
> 
> 	spin_lock(&class->lock);
> 	obj = obj_malloc(class, zspage, handle);
> 	[..]
> 	spin_unlock(&class->lock);
> 
> 
> _theoretically_, on a not-really-huge system, let's say 64 CPUs for
> example, we can have 64 write paths trying to store objects of size
> OBJ_SZ to a ZS_FULL class-OBJSZ. the write path (each of them) will
> fail on find_get_zspage(), unlock the class ->lock (so another write
> path will have its chance to fail on find_get_zspage()), alloc_zspage(),
> create a page chain, spin on class ->lock to add the new zspage to the
> class. so we can end up allocating up to 64 zspages, each of them will
> carry N PAGE_SIZE pages. those zspages, at least at the beginning, will
> store only one object per-zspage; which will blastoff the internal
> fragmentation and can cause more compaction/migration/etc later on. well,
> it's a bit pessimistic, but I think to _some extent_ this scenario is
> quite possible.
> 
> I assume that this "pick an already marked for release zspage" thing is
> happening as a fast path within the first class ->lock section, so the
> rest of concurrent write requests that are spinning on the class ->lock
> at the moment will see a zspage, instead of !find_get_zspage().

As well, we would reduce page alloc/free cost although it's not expensive
compared to comp overhead. :)

Thanks for giving the thought!

next prev parent reply	other threads:[~2016-05-24  8:17 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-20 14:23 [PATCH v6 00/12] Support non-lru page migration Minchan Kim
2016-05-20 14:23 ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 01/12] mm: use put_page to free page instead of putback_lru_page Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 02/12] mm: migrate: support non-lru movable page migration Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-27 14:26   ` Vlastimil Babka
2016-05-27 14:26   ` Vlastimil Babka
2016-05-27 14:26     ` Vlastimil Babka
2016-05-27 14:26     ` Vlastimil Babka
2016-05-30  1:33     ` Minchan Kim
2016-05-30  1:33       ` Minchan Kim
2016-05-30  1:33       ` Minchan Kim
2016-05-30  9:01       ` Vlastimil Babka
2016-05-30  9:01         ` Vlastimil Babka
2016-05-30  9:01         ` Vlastimil Babka
2016-05-30  1:33     ` Minchan Kim
2016-05-30  1:39   ` PATCH v6v2 " Minchan Kim
2016-05-30  1:39     ` Minchan Kim
2016-05-30  1:39     ` Minchan Kim
2016-05-30  1:39     ` Minchan Kim
2016-05-30  9:36     ` Vlastimil Babka
2016-05-30  9:36     ` Vlastimil Babka
2016-05-30  9:36       ` Vlastimil Babka
2016-05-30 16:25       ` Minchan Kim
2016-05-30 16:25         ` Minchan Kim
2016-05-30 16:25         ` Minchan Kim
2016-05-31  7:51         ` Vlastimil Babka
2016-05-31  7:51         ` Vlastimil Babka
2016-05-31  7:51           ` Vlastimil Babka
2016-05-31  0:01     ` [PATCH v6v3 " Minchan Kim
2016-05-31  0:01       ` Minchan Kim
2016-05-31  0:01       ` Minchan Kim
2016-05-31  7:52       ` Vlastimil Babka
2016-05-31  7:52         ` Vlastimil Babka
2016-05-31 23:05         ` Minchan Kim
2016-05-31 23:05           ` Minchan Kim
2016-05-31 23:05         ` Minchan Kim
2016-05-31  7:52       ` Vlastimil Babka
2016-06-13  9:38       ` Anshuman Khandual
2016-06-13  9:38       ` Anshuman Khandual
2016-06-13  9:38         ` Anshuman Khandual
2016-06-15  2:32         ` Minchan Kim
2016-06-15  2:32           ` Minchan Kim
2016-06-15  6:45           ` Anshuman Khandual
2016-06-15  6:45             ` Anshuman Khandual
2016-06-15  6:45             ` Anshuman Khandual
2016-06-16  0:26             ` Minchan Kim
2016-06-16  0:26             ` Minchan Kim
2016-06-16  0:26               ` Minchan Kim
2016-06-16  0:26               ` Minchan Kim
2016-06-16  3:42               ` Anshuman Khandual
2016-06-16  3:42                 ` Anshuman Khandual
2016-06-16  3:42                 ` Anshuman Khandual
2016-06-16  5:37                 ` Minchan Kim
2016-06-16  5:37                   ` Minchan Kim
2016-06-27  5:51                   ` Anshuman Khandual
2016-06-27  5:51                     ` Anshuman Khandual
2016-06-28  6:39                     ` Minchan Kim
2016-06-28  6:39                     ` Minchan Kim
2016-06-28  6:39                       ` Minchan Kim
2016-06-30  5:56                       ` Anshuman Khandual
2016-06-30  5:56                       ` Anshuman Khandual
2016-06-30  5:56                         ` Anshuman Khandual
2016-06-30  6:18                         ` Minchan Kim
2016-06-30  6:18                         ` Minchan Kim
2016-06-30  6:18                           ` Minchan Kim
2016-06-30  6:18                           ` Minchan Kim
2016-06-27  5:51                   ` Anshuman Khandual
2016-06-16  5:37                 ` Minchan Kim
2016-06-15  2:32         ` Minchan Kim
2016-05-31  0:01     ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 " Minchan Kim
2016-05-20 14:23 ` [PATCH v6 03/12] mm: balloon: use general non-lru movable page feature Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-30 12:16   ` Vlastimil Babka
2016-05-30 12:16     ` Vlastimil Babka
2016-05-30 12:16   ` Vlastimil Babka
2016-05-20 14:23 ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 04/12] zsmalloc: keep max_object in size_class Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 05/12] zsmalloc: use bit_spin_lock Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 06/12] zsmalloc: use accessor Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 07/12] zsmalloc: factor page chain functionality out Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 08/12] zsmalloc: introduce zspage structure Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 09/12] zsmalloc: separate free_zspage from putback_zspage Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 10/12] zsmalloc: use freeobj for index Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 11/12] zsmalloc: page migration support Minchan Kim
2016-05-20 14:23   ` Minchan Kim
2016-05-24  5:28   ` Sergey Senozhatsky
2016-05-24  5:28     ` Sergey Senozhatsky
2016-05-24  6:28     ` Minchan Kim
2016-05-24  6:28       ` Minchan Kim
2016-05-24  8:05       ` Sergey Senozhatsky
2016-05-24  8:05         ` Sergey Senozhatsky
2016-05-24  8:17         ` Minchan Kim [this message]
2016-05-24  8:17           ` Minchan Kim
2016-05-25  5:14       ` Minchan Kim
2016-05-25  5:14         ` Minchan Kim
2016-05-25 15:23         ` Sergey Senozhatsky
2016-05-25 15:23           ` Sergey Senozhatsky
2016-05-26  0:32           ` Minchan Kim
2016-05-26  0:32             ` Minchan Kim
2016-05-26  0:59             ` Sergey Senozhatsky
2016-05-26  0:59               ` Sergey Senozhatsky
2016-05-26  4:37               ` Minchan Kim
2016-05-26  4:37                 ` Minchan Kim
2016-05-26 21:50   ` [PATCH v6r2 " Minchan Kim
2016-05-26 21:50     ` Minchan Kim
2016-05-20 14:23 ` [PATCH v6 12/12] zram: use __GFP_MOVABLE for memory allocation Minchan Kim
2016-05-20 14:23   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160524081721.GC29094@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.