linux-mm.kvack.org archive mirror
From: Weijie Yang <weijie.yang.kh@gmail.com>
To: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>,
	Minchan Kim <minchan@kernel.org>, Nitin Gupta <ngupta@vflare.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Bob Liu <bob.liu@oracle.com>, Hugh Dickins <hughd@google.com>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Weijie Yang <weijie.yang@samsung.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Linux-MM <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/4] mm: zpool: implement zsmalloc shrinking
Date: Sat, 26 Apr 2014 16:37:31 +0800	[thread overview]
Message-ID: <CAL1ERfMPcfyUeACnmZ2QF5WxJUQ2PaKbtRzis8sPbQsjnvf_GQ@mail.gmail.com> (raw)
In-Reply-To: <1397922764-1512-3-git-send-email-ddstreet@ieee.org>

On Sat, Apr 19, 2014 at 11:52 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> Add zs_shrink() and helper functions to zsmalloc.  Update zsmalloc
> zs_create_pool() creation function to include ops param that provides
> an evict() function for use during shrinking.  Update helper function
> fix_fullness_group() to always reinsert changed zspages even if the
> fullness group did not change, so they are updated in the fullness
> group lru.  Also update zram to use the new zsmalloc pool creation
> function but pass NULL as the ops param, since zram does not use
> pool shrinking.
>
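The ops wiring described above can be sketched in userspace C. This is a toy model, not the kernel code: the names demo_evict and zs_shrink_sketch are hypothetical, and the real zswap callback is registered through its pool-creation path.

```c
#include <assert.h>
#include <stddef.h>

struct zs_pool;

/* Mirrors the proposed zs_ops: the pool calls back into its user
 * (e.g. zswap) to evict one object, identified by its handle. */
struct zs_ops {
	int (*evict)(struct zs_pool *pool, unsigned long handle);
};

struct zs_pool {
	struct zs_ops *ops;	/* NULL for users that never shrink, e.g. zram */
	int evicted;		/* toy counter for this sketch only */
};

/* Hypothetical stand-in for zswap's evict callback. */
static int demo_evict(struct zs_pool *pool, unsigned long handle)
{
	(void)handle;
	pool->evicted++;
	return 0;
}

/* Shrinking refuses to run without an evict callback, as in the patch
 * (the kernel returns -EINVAL; the sketch returns -1). */
static int zs_shrink_sketch(struct zs_pool *pool, unsigned long handle)
{
	if (!pool->ops || !pool->ops->evict)
		return -1;
	return pool->ops->evict(pool, handle);
}
```

This is why zram can keep passing NULL: it simply never calls the shrink path, and the guard rejects it if it does.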

I have only reviewed the code, without testing it; however, I think this
patch is not acceptable as-is.

The biggest problem is that it calls zswap_writeback_entry() while holding
class->lock; zswap_writeback_entry() may sleep, so this is a bug. See below.

Patch 3/4 has a lot of #ifdefs; I don't think that is a good way to build
the abstraction.

What about just disabling zswap writeback reclaim when zsmalloc is used?
There is a long way to go in optimizing writeback reclaim (for both zswap
and zram); maybe a small, simple step forward is better.
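The locking problem can be illustrated with a userspace sketch. A flag stands in for class->lock, and the callback records whether it ever ran with the "lock" held; in the kernel, sleeping under a spinlock risks deadlock or a "scheduling while atomic" splat. The fixed shape shown is only one possibility, and a real fix must re-validate the zspage state after relocking.

```c
#include <assert.h>

/* Toy "lock" flag standing in for class->lock. */
static int lock_held;
static int slept_under_lock;

static void toy_lock(void)   { lock_held = 1; }
static void toy_unlock(void) { lock_held = 0; }

/* Stand-in for zswap_writeback_entry(): it may sleep. */
static int may_sleep_evict(void)
{
	if (lock_held)
		slept_under_lock = 1;	/* the bug this review points out */
	return 0;
}

/* Shape of the patch: evict called with the lock held. */
static void reclaim_buggy(void)
{
	toy_lock();
	may_sleep_evict();
	toy_unlock();
}

/* One possible shape of a fix: drop the lock around the callback,
 * then retake it (and re-check state) afterwards. */
static void reclaim_fixed(void)
{
	toy_lock();
	toy_unlock();
	may_sleep_evict();
	toy_lock();
	toy_unlock();
}
```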

Regards,

> Signed-off-by: Dan Streetman <ddstreet@ieee.org>
>
> ---
>
> To find all the used objects inside a zspage, I had to do a full scan
> of the zspage->freelist for each object, since there's no list of used
> objects, and no way to keep a list of used objects without allocating
> more memory for each zspage (that I could see).  Of course, by taking
> a byte (or really only a bit) out of each object's memory area to use
> as a flag, we could just check that instead of scanning ->freelist
> for each zspage object, but that would (slightly) reduce the available
> size of each zspage object.
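The freelist scan Dan describes can be sketched in userspace C, using plain array indices in place of the real encoded handles and the kmap_atomic walk. As the cover note says, this makes the used-object test O(free objects) per query, so a full scan of a zspage is quadratic in the worst case.

```c
#include <assert.h>

/* Toy zspage: objects 0..NOBJS-1, free ones chained through next[],
 * with the head of the chain in freelist (-1 terminates).  Mirrors
 * the obj_handle_is_free() walk in the patch, minus handle encoding
 * and page mapping. */
#define NOBJS 8

struct toy_zspage {
	int freelist;		/* index of first free object, -1 if none */
	int next[NOBJS];	/* next free object; valid only for free objects */
};

static int handle_is_free(const struct toy_zspage *zp, int handle)
{
	int obj = zp->freelist;

	while (obj != -1) {
		if (obj == handle)
			return 1;
		obj = zp->next[obj];
	}
	return 0;		/* not on the freelist => in use */
}
```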
>
>
>  drivers/block/zram/zram_drv.c |   2 +-
>  include/linux/zsmalloc.h      |   7 +-
>  mm/zsmalloc.c                 | 168 ++++++++++++++++++++++++++++++++++++++----
>  3 files changed, 160 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 9849b52..dacf343 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -249,7 +249,7 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
>                 goto free_meta;
>         }
>
> -       meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM);
> +       meta->mem_pool = zs_create_pool(GFP_NOIO | __GFP_HIGHMEM, NULL);
>         if (!meta->mem_pool) {
>                 pr_err("Error creating memory pool\n");
>                 goto free_table;
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index e44d634..a75ab6e 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -36,11 +36,16 @@ enum zs_mapmode {
>
>  struct zs_pool;
>
> -struct zs_pool *zs_create_pool(gfp_t flags);
> +struct zs_ops {
> +       int (*evict)(struct zs_pool *pool, unsigned long handle);
> +};
> +
> +struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops);
>  void zs_destroy_pool(struct zs_pool *pool);
>
>  unsigned long zs_malloc(struct zs_pool *pool, size_t size);
>  void zs_free(struct zs_pool *pool, unsigned long obj);
> +int zs_shrink(struct zs_pool *pool, size_t size);
>
>  void *zs_map_object(struct zs_pool *pool, unsigned long handle,
>                         enum zs_mapmode mm);
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 36b4591..b99bec0 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -219,6 +219,8 @@ struct zs_pool {
>         struct size_class size_class[ZS_SIZE_CLASSES];
>
>         gfp_t flags;    /* allocation flags used when growing pool */
> +
> +       struct zs_ops *ops;
>  };
>
>  /*
> @@ -389,16 +391,14 @@ static enum fullness_group fix_fullness_group(struct zs_pool *pool,
>         BUG_ON(!is_first_page(page));
>
>         get_zspage_mapping(page, &class_idx, &currfg);
> -       newfg = get_fullness_group(page);
> -       if (newfg == currfg)
> -               goto out;
> -
>         class = &pool->size_class[class_idx];
> +       newfg = get_fullness_group(page);
> +       /* Need to do this even if currfg == newfg, to update lru */
>         remove_zspage(page, class, currfg);
>         insert_zspage(page, class, newfg);
> -       set_zspage_mapping(page, class_idx, newfg);
> +       if (currfg != newfg)
> +               set_zspage_mapping(page, class_idx, newfg);
>
> -out:
>         return newfg;
>  }
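The intent of the hunk above (unconditional remove + reinsert so the zspage's LRU position is refreshed even when its fullness group is unchanged) can be sketched with a toy head-insert list:

```c
#include <assert.h>
#include <stddef.h>

/* Toy fullness-group list: insertion always places at the head, so
 * removing and re-inserting a page refreshes its LRU position even
 * when its group does not change. */
struct toy_page {
	int id;
	struct toy_page *next;
};

static void insert_head(struct toy_page **head, struct toy_page *p)
{
	p->next = *head;
	*head = p;
}

static void remove_page(struct toy_page **head, struct toy_page *p)
{
	struct toy_page **pp = head;

	while (*pp && *pp != p)
		pp = &(*pp)->next;
	if (*pp)
		*pp = p->next;
}

/* The patched behavior: reinsert unconditionally to touch the LRU. */
static void fix_fullness_sketch(struct toy_page **head, struct toy_page *p)
{
	remove_page(head, p);
	insert_head(head, p);
}
```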
>
> @@ -438,6 +438,36 @@ static int get_pages_per_zspage(int class_size)
>  }
>
>  /*
> + * To determine which class to use when shrinking, we find the
> + * first zspage class that is greater than the requested shrink
> + * size, and has at least one zspage.  This returns the class
> + * with the class lock held, or NULL.
> + */
> +static struct size_class *get_class_to_shrink(struct zs_pool *pool,
> +                       size_t size)
> +{
> +       struct size_class *class;
> +       int i;
> +       bool in_use, large_enough;
> +
> +       for (i = 0; i <= ZS_SIZE_CLASSES; i++) {
> +               class = &pool->size_class[i];
> +
> +               spin_lock(&class->lock);
> +
> +               in_use = class->pages_allocated > 0;
> +               large_enough = class->pages_per_zspage * PAGE_SIZE >= size;
> +
> +               if (in_use && large_enough)
> +                       return class;
> +
> +               spin_unlock(&class->lock);
> +       }
> +
> +       return NULL;
> +}
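The class-selection policy above, a first-fit scan for a populated class whose zspage covers the requested size, can be sketched without the locking (the real function returns with class->lock held, which matters for the bug discussed below):

```c
#include <assert.h>

#define PAGE_SZ 4096

/* Toy size class: only the two fields the scan looks at. */
struct toy_class {
	int pages_allocated;
	int pages_per_zspage;
};

/* First class that is in use and large enough, else -1 (-ENOENT-ish). */
static int class_to_shrink(const struct toy_class *c, int n, int size)
{
	int i;

	for (i = 0; i < n; i++)
		if (c[i].pages_allocated > 0 &&
		    c[i].pages_per_zspage * PAGE_SZ >= size)
			return i;
	return -1;
}
```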
> +
> +/*
>   * A single 'zspage' is composed of many system pages which are
>   * linked together using fields in struct page. This function finds
>   * the first/head page, given any component page of a zspage.
> @@ -508,6 +538,48 @@ static unsigned long obj_idx_to_offset(struct page *page,
>         return off + obj_idx * class_size;
>  }
>
> +static bool obj_handle_is_free(struct page *first_page,
> +                       struct size_class *class, unsigned long handle)
> +{
> +       unsigned long obj, idx, offset;
> +       struct page *page;
> +       struct link_free *link;
> +
> +       BUG_ON(!is_first_page(first_page));
> +
> +       obj = (unsigned long)first_page->freelist;
> +
> +       while (obj) {
> +               if (obj == handle)
> +                       return true;
> +
> +               obj_handle_to_location(obj, &page, &idx);
> +               offset = obj_idx_to_offset(page, idx, class->size);
> +
> +               link = (struct link_free *)kmap_atomic(page) +
> +                                       offset / sizeof(*link);
> +               obj = (unsigned long)link->next;
> +               kunmap_atomic(link);
> +       }
> +
> +       return false;
> +}
> +
> +static void obj_free(unsigned long obj, struct page *page, unsigned long offset)
> +{
> +       struct page *first_page = get_first_page(page);
> +       struct link_free *link;
> +
> +       /* Insert this object in containing zspage's freelist */
> +       link = (struct link_free *)((unsigned char *)kmap_atomic(page)
> +                                                       + offset);
> +       link->next = first_page->freelist;
> +       kunmap_atomic(link);
> +       first_page->freelist = (void *)obj;
> +
> +       first_page->inuse--;
> +}
> +
>  static void reset_page(struct page *page)
>  {
>         clear_bit(PG_private, &page->flags);
> @@ -651,6 +723,39 @@ cleanup:
>         return first_page;
>  }
>
> +static int reclaim_zspage(struct zs_pool *pool, struct size_class *class,
> +                       struct page *first_page)
> +{
> +       struct page *page = first_page;
> +       unsigned long offset = 0, handle, idx, objs;
> +       int ret = 0;
> +
> +       BUG_ON(!is_first_page(page));
> +       BUG_ON(!pool->ops);
> +       BUG_ON(!pool->ops->evict);
> +
> +       while (page) {
> +               objs = DIV_ROUND_UP(PAGE_SIZE - offset, class->size);
> +
> +               for (idx = 0; idx < objs; idx++) {
> +                       handle = (unsigned long)obj_location_to_handle(page,
> +                                                                       idx);
> +                       if (!obj_handle_is_free(first_page, class, handle))
> +                               ret = pool->ops->evict(pool, handle);

Here pool->ops->evict(), i.e. zswap_writeback_entry(), is called with
class->lock held; it may sleep.

> +                       if (ret)
> +                               return ret;
> +                       else
> +                               obj_free(handle, page, offset);
> +               }
> +
> +               page = get_next_page(page);
> +               if (page)
> +                       offset = page->index;
> +       }
> +
> +       return 0;
> +}
> +
>  static struct page *find_get_zspage(struct size_class *class)
>  {
>         int i;
> @@ -856,7 +961,7 @@ fail:
>   * On success, a pointer to the newly created pool is returned,
>   * otherwise NULL.
>   */
> -struct zs_pool *zs_create_pool(gfp_t flags)
> +struct zs_pool *zs_create_pool(gfp_t flags, struct zs_ops *ops)
>  {
>         int i, ovhd_size;
>         struct zs_pool *pool;
> @@ -883,6 +988,7 @@ struct zs_pool *zs_create_pool(gfp_t flags)
>         }
>
>         pool->flags = flags;
> +       pool->ops = ops;
>
>         return pool;
>  }
> @@ -968,7 +1074,6 @@ EXPORT_SYMBOL_GPL(zs_malloc);
>
>  void zs_free(struct zs_pool *pool, unsigned long obj)
>  {
> -       struct link_free *link;
>         struct page *first_page, *f_page;
>         unsigned long f_objidx, f_offset;
>
> @@ -988,14 +1093,8 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
>
>         spin_lock(&class->lock);
>
> -       /* Insert this object in containing zspage's freelist */
> -       link = (struct link_free *)((unsigned char *)kmap_atomic(f_page)
> -                                                       + f_offset);
> -       link->next = first_page->freelist;
> -       kunmap_atomic(link);
> -       first_page->freelist = (void *)obj;
> +       obj_free(obj, f_page, f_offset);
>
> -       first_page->inuse--;
>         fullness = fix_fullness_group(pool, first_page);
>
>         if (fullness == ZS_EMPTY)
> @@ -1008,6 +1107,45 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
>  }
>  EXPORT_SYMBOL_GPL(zs_free);
>
> +int zs_shrink(struct zs_pool *pool, size_t size)
> +{
> +       struct size_class *class;
> +       struct page *first_page;
> +       enum fullness_group fullness;
> +       int ret;
> +
> +       if (!pool->ops || !pool->ops->evict)
> +               return -EINVAL;
> +
> +       /* the class is returned locked */
> +       class = get_class_to_shrink(pool, size);
> +       if (!class)
> +               return -ENOENT;
> +
> +       first_page = find_get_zspage(class);
> +       if (!first_page) {
> +               spin_unlock(&class->lock);
> +               return -ENOENT;
> +       }
> +
> +       ret = reclaim_zspage(pool, class, first_page);
> +
> +       if (ret) {
> +               fullness = fix_fullness_group(pool, first_page);
> +
> +               if (fullness == ZS_EMPTY)
> +                       class->pages_allocated -= class->pages_per_zspage;
> +       }
> +
> +       spin_unlock(&class->lock);
> +
> +       if (!ret || fullness == ZS_EMPTY)
> +               free_zspage(first_page);
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(zs_shrink);
> +
>  /**
>   * zs_map_object - get address of allocated object from handle.
>   * @pool: pool from which the object was allocated
> --
> 1.8.3.1
>

