linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Glauber Costa <glommer@openvz.org>
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Thelen <gthelen@google.com>,
	kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v4 08/31] list: add a new LRU list type
Date: Tue, 30 Apr 2013 16:18:54 +0100	[thread overview]
Message-ID: <20130430151854.GH6415@suse.de> (raw)
In-Reply-To: <1367018367-11278-9-git-send-email-glommer@openvz.org>

On Sat, Apr 27, 2013 at 03:19:04AM +0400, Glauber Costa wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Several subsystems use the same construct for LRU lists - a list
> head, a spin lock and and item count. They also use exactly the same
> code for adding and removing items from the LRU. Create a generic
> type for these LRU lists.
> 
> This is the beginning of generic, node aware LRUs for shrinkers to
> work with.
> 
> [ glommer: enum defined constants for lru. Suggested by gthelen,
>   don't relock over retry ]
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Glauber Costa <glommer@openvz.org>
> Reviewed-by: Greg Thelen <gthelen@google.com>
> ---
>  include/linux/list_lru.h |  45 ++++++++++++++++++
>  lib/Makefile             |   2 +-
>  lib/list_lru.c           | 118 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 164 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/list_lru.h
>  create mode 100644 lib/list_lru.c
> 
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> new file mode 100644
> index 0000000..c0b796d
> --- /dev/null
> +++ b/include/linux/list_lru.h
> @@ -0,0 +1,45 @@
> +/*
> + * Copyright (c) 2010-2012 Red Hat, Inc. All rights reserved.
> + * Author: David Chinner
> + *
> + * Generic LRU infrastructure
> + */
> +#ifndef _LRU_LIST_H
> +#define _LRU_LIST_H
> +
> +#include <linux/list.h>
> +
> +enum lru_status {
> +	LRU_REMOVED,		/* item removed from list */
> +	LRU_ROTATE,		/* item referenced, give another pass */
> +	LRU_SKIP,		/* item cannot be locked, skip */
> +	LRU_RETRY,		/* item not freeable. May drop the lock
> +				   internally, but has to return locked. */
> +};
> +
> +struct list_lru {
> +	spinlock_t		lock;
> +	struct list_head	list;
> +	long			nr_items;
> +};

Is there ever a circumstance where nr_items is negative? If so, what
does that mean?

> +
> +int list_lru_init(struct list_lru *lru);

I was going to complain that this always returns 0 and should be void
but that changes later.

> +int list_lru_add(struct list_lru *lru, struct list_head *item);
> +int list_lru_del(struct list_lru *lru, struct list_head *item);
> +

However, these are bool and the return value determines if the item was
really added to the list or not. It fails if the item is already part of
a list and it would be very nice to have a comment explaining why it's
not a bug if this happens because it feels like it would be a lookup and
insertion race. Maybe it's clear later in the series why this is ok but
it's not very obvious at this point.

> +static inline long list_lru_count(struct list_lru *lru)
> +{
> +	return lru->nr_items;
> +}
> +
> +typedef enum lru_status
> +(*list_lru_walk_cb)(struct list_head *item, spinlock_t *lock, void *cb_arg);
> +
> +typedef void (*list_lru_dispose_cb)(struct list_head *dispose_list);
> +
> +long list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate,
> +		   void *cb_arg, long nr_to_walk);
> +

Is nr_to_walk ever negative?

> +long list_lru_dispose_all(struct list_lru *lru, list_lru_dispose_cb dispose);
> +
> +#endif /* _LRU_LIST_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index af79e8c..40a6d4a 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>  	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
>  	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
>  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> -	 earlycpio.o percpu-refcount.o
> +	 earlycpio.o percpu-refcount.o list_lru.o
>  
>  obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
>  lib-$(CONFIG_MMU) += ioremap.o
> diff --git a/lib/list_lru.c b/lib/list_lru.c
> new file mode 100644
> index 0000000..937ee87
> --- /dev/null
> +++ b/lib/list_lru.c
> @@ -0,0 +1,118 @@
> +/*
> + * Copyright (c) 2010-2012 Red Hat, Inc. All rights reserved.
> + * Author: David Chinner
> + *
> + * Generic LRU infrastructure
> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/list_lru.h>
> +
> +int
> +list_lru_add(
> +	struct list_lru	*lru,
> +	struct list_head *item)
> +{
> +	spin_lock(&lru->lock);
> +	if (list_empty(item)) {
> +		list_add_tail(item, &lru->list);
> +		lru->nr_items++;
> +		spin_unlock(&lru->lock);
> +		return 1;
> +	}
> +	spin_unlock(&lru->lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(list_lru_add);
> +
> +int
> +list_lru_del(
> +	struct list_lru	*lru,
> +	struct list_head *item)
> +{
> +	spin_lock(&lru->lock);
> +	if (!list_empty(item)) {
> +		list_del_init(item);
> +		lru->nr_items--;
> +		spin_unlock(&lru->lock);
> +		return 1;
> +	}
> +	spin_unlock(&lru->lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(list_lru_del);
> +
> +long
> +list_lru_walk(

It returns long but never actually returns a negative number.

> +	struct list_lru *lru,
> +	list_lru_walk_cb isolate,
> +	void		*cb_arg,
> +	long		nr_to_walk)
> +{
> +	struct list_head *item, *n;
> +	long removed = 0;
> +
> +	spin_lock(&lru->lock);
> +restart:
> +	list_for_each_safe(item, n, &lru->list) {
> +		enum lru_status ret;
> +
> +		if (nr_to_walk-- < 0)
> +			break;
> +
> +		ret = isolate(item, &lru->lock, cb_arg);
> +		switch (ret) {
> +		case LRU_REMOVED:
> +			lru->nr_items--;
> +			removed++;
> +			break;
> +		case LRU_ROTATE:
> +			list_move_tail(item, &lru->list);
> +			break;
> +		case LRU_SKIP:
> +			break;
> +		case LRU_RETRY:
> +			goto restart;

Are the two users of LRU_RETRY guaranteed to eventually make progress
or can this infinite loop? It feels like the behaviour of LRU_RETRY is
not very desirable. Once an object that returns LRU_RETRY is isolated on
the list then it looks like XFS can stall for 100 ticks (bit arbitrary)
each time it tries and maybe do this forever.

The inode.c user looks like it could race where some other process is
reallocating the buffers between each time this isolate callback tries to
isolate it. Granted, it may accidentally break out because the spinlock
is contended and it returns LRU_SKIP so maybe no one will hit the problem
but if it's ok with LRU_SKIP then maybe a LRU_RETRY_ONCE would also be
suitable and use that in fs/inode.c?

Either hypothetical situation would require that the list you are trying
to walk is very small but maybe memcg will hit that problem. If there is
a guarantee of forward progress then a comment would be nice.

> +		default:
> +			BUG();
> +		}
> +	}
> +	spin_unlock(&lru->lock);
> +	return removed;
> +}
> +EXPORT_SYMBOL_GPL(list_lru_walk);
> +
> +long
> +list_lru_dispose_all(
> +	struct list_lru *lru,
> +	list_lru_dispose_cb dispose)
> +{
> +	long disposed = 0;
> +	LIST_HEAD(dispose_list);
> +
> +	spin_lock(&lru->lock);
> +	while (!list_empty(&lru->list)) {
> +		list_splice_init(&lru->list, &dispose_list);
> +		disposed += lru->nr_items;
> +		lru->nr_items = 0;
> +		spin_unlock(&lru->lock);
> +
> +		dispose(&dispose_list);
> +
> +		spin_lock(&lru->lock);
> +	}
> +	spin_unlock(&lru->lock);
> +	return disposed;
> +}
> +
> +int
> +list_lru_init(
> +	struct list_lru	*lru)
> +{
> +	spin_lock_init(&lru->lock);
> +	INIT_LIST_HEAD(&lru->list);
> +	lru->nr_items = 0;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(list_lru_init);

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-04-30 15:18 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-26 23:18 [PATCH v4 00/31] kmemcg shrinkers Glauber Costa
2013-04-26 23:18 ` [PATCH v4 01/31] super: fix calculation of shrinkable objects for small numbers Glauber Costa
2013-04-30 13:03   ` Mel Gorman
2013-04-26 23:18 ` [PATCH v4 02/31] vmscan: take at least one pass with shrinkers Glauber Costa
2013-04-30 13:22   ` Mel Gorman
2013-04-30 13:31     ` Glauber Costa
2013-04-30 15:37       ` Mel Gorman
2013-05-07 13:35         ` Glauber Costa
2013-04-26 23:18 ` [PATCH v4 03/31] dcache: convert dentry_stat.nr_unused to per-cpu counters Glauber Costa
2013-04-30 13:37   ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 04/31] dentry: move to per-sb LRU locks Glauber Costa
2013-04-30 14:01   ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 05/31] dcache: remove dentries from LRU before putting on dispose list Glauber Costa
2013-04-30 14:14   ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 06/31] mm: new shrinker API Glauber Costa
2013-04-30 14:40   ` Mel Gorman
2013-04-30 15:03     ` Glauber Costa
2013-04-30 15:32       ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 07/31] shrinker: convert superblock shrinkers to new API Glauber Costa
2013-04-30 14:49   ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 08/31] list: add a new LRU list type Glauber Costa
2013-04-30 15:18   ` Mel Gorman [this message]
2013-04-30 16:01     ` Glauber Costa
2013-04-26 23:19 ` [PATCH v4 09/31] inode: convert inode lru list to generic lru list code Glauber Costa
2013-04-30 15:46   ` Mel Gorman
2013-05-07 13:47     ` Glauber Costa
2013-04-26 23:19 ` [PATCH v4 10/31] dcache: convert to use new lru list infrastructure Glauber Costa
2013-04-30 16:04   ` Mel Gorman
2013-04-30 16:13     ` Glauber Costa
2013-04-26 23:19 ` [PATCH v4 11/31] list_lru: per-node " Glauber Costa
2013-04-30 16:33   ` Mel Gorman
2013-04-30 21:44     ` Glauber Costa
2013-04-26 23:19 ` [PATCH v4 12/31] shrinker: add node awareness Glauber Costa
2013-04-30 16:35   ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 13/31] fs: convert inode and dentry shrinking to be node aware Glauber Costa
2013-04-30 17:39   ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 14/31] xfs: convert buftarg LRU to generic code Glauber Costa
2013-04-26 23:19 ` [PATCH v4 15/31] xfs: convert dquot cache lru to list_lru Glauber Costa
2013-04-26 23:19 ` [PATCH v4 16/31] fs: convert fs shrinkers to new scan/count API Glauber Costa
2013-04-26 23:19 ` [PATCH v4 17/31] drivers: convert shrinkers to new count/scan API Glauber Costa
2013-04-30 21:53   ` Mel Gorman
2013-04-30 22:00     ` Kent Overstreet
2013-05-02  9:37       ` Mel Gorman
2013-05-02 13:37         ` Glauber Costa
2013-05-01 15:26     ` Daniel Vetter
2013-05-02  9:31       ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 18/31] shrinker: convert remaining shrinkers to " Glauber Costa
2013-04-26 23:19 ` [PATCH v4 19/31] hugepage: convert huge zero page shrinker to new shrinker API Glauber Costa
2013-04-26 23:19 ` [PATCH v4 20/31] shrinker: Kill old ->shrink API Glauber Costa
2013-04-30 21:57   ` Mel Gorman
2013-04-26 23:19 ` [PATCH v4 21/31] vmscan: also shrink slab in memcg pressure Glauber Costa
2013-04-26 23:19 ` [PATCH v4 22/31] memcg,list_lru: duplicate LRUs upon kmemcg creation Glauber Costa
2013-04-26 23:19 ` [PATCH v4 23/31] lru: add an element to a memcg list Glauber Costa
2013-04-26 23:19 ` [PATCH v4 24/31] list_lru: per-memcg walks Glauber Costa
2013-04-26 23:19 ` [PATCH v4 25/31] memcg: per-memcg kmem shrinking Glauber Costa
2013-04-26 23:19 ` [PATCH v4 26/31] memcg: scan cache objects hierarchically Glauber Costa
2013-04-26 23:19 ` [PATCH v4 27/31] super: targeted memcg reclaim Glauber Costa
2013-04-26 23:19 ` [PATCH v4 28/31] memcg: move initialization to memcg creation Glauber Costa
2013-04-26 23:19 ` [PATCH v4 29/31] vmpressure: in-kernel notifications Glauber Costa
2013-04-26 23:19 ` [PATCH v4 30/31] memcg: reap dead memcgs upon global memory pressure Glauber Costa
2013-04-26 23:19 ` [PATCH v4 31/31] memcg: debugging facility to access dangling memcgs Glauber Costa
2013-04-30 22:47 ` [PATCH v4 00/31] kmemcg shrinkers Mel Gorman
2013-05-01  9:05   ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130430151854.GH6415@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=dchinner@redhat.com \
    --cc=glommer@openvz.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).