From: Dave Chinner <david@fromorbit.com>
To: Glauber Costa <glommer@parallels.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, xfs@oss.sgi.com,
	Greg Thelen <gthelen@google.com>, Ying Han <yinghan@google.com>,
	Suleiman Souhlal <suleiman@google.com>
Subject: Re: [PATCH 09/19] list_lru: per-node list infrastructure
Date: Fri, 18 Jan 2013 11:10:29 +1100	[thread overview]
Message-ID: <20130118001029.GK2498@dastard> (raw)
In-Reply-To: <50F84118.7030608@parallels.com>

On Thu, Jan 17, 2013 at 10:21:12AM -0800, Glauber Costa wrote:
> >> Deepest fears:
> >>
> >> 1) snakes.
> > 
> > Snakes are merely poisonous. Drop Bears are far more dangerous :P
> 
> fears are irrational anyway...
> 
> >> 2) It won't surprise you to know that I am adapting your work, which
> >> provides a very sane and helpful API, to memcg shrinking.
> >>
> >> The dumb and simple approach in there is to copy all lrus that are
> >> marked memcg aware at memcg creation time. The API is kept the same,
> >> but when you do something like list_lru_add(lru, obj), for instance, we
> >> derive the memcg context from obj and relay it to the right list.
> > 
> > At which point, you don't want the overhead of per-node lists.
> 
> This is one of the assumptions we may have to end up making here.

*nod*. Good to get it out in the open to see if we can work around
it....

> > This is a problem that superblock contexts don't care about - they
> > are global by their very nature. Hence I'm wondering if trying to
> > fit these two very different behaviours into the one LRU list is
> > the wrong approach.
> > 
> 
> I am not that much concerned about that, honestly. I like the API, and I
> like the fact that it allows me to have the subsystems use it
> transparently, just by referring to the "master" lru (the dentry, inode,
> etc). It reduces complexity to reuse the data structures, but that is
> not paramount.
> 
> However, a more flexible data structure in which we could select at
> least at creation time if we want per-node lists or not, would be quite
> helpful.

*nod*

> > Consider this: these patches give us a generic LRU list structure.
> > It currently uses a list_head in each object for indexing, and we
> > are talking about single LRU lists because of this limitation and
> > trying to build infrastructure that can support this indexing
> > mechanism.
> > 
> > I think that all of these problems go away if we replace the
> > list_head index in the object with a "struct lru_item" index. To
> > start with, it's just a s/list_head/lru_item/ changeover, but from
> > there we can expand.
> > 
> > What I'm getting at is that we want to have multiple axis of
> > tracking and reclaim, but we only have a single axis for tracking.
> > If the lru_item grew a second list_head called "memcg_lru", then
> > suddenly the memcg LRUs can be maintained separately to the global
> > (per-superblock) LRU. i.e.:
> > 
> > struct lru_item {
> > 	struct list_head global_list;
> > 	struct list_head memcg_list;
> > }
> > 
> 
> I may be misunderstanding you, but that is not how I see it. Your global
> list AFAIU, is more like a hook to keep the lists together. The actual
> accesses to it are controlled by a parent structure, like the
> super-block, which in turn embeds a shrinker.
> 
> So we get (in the sb case), from shrinker to sb, and from sb to dentry
> list (or inode). We never care about the global list head.
> 
> From this point on, we "entered" the LRU, but we still don't know which
> list to reclaim from: there is one list per node, and we need to figure
> out which is our target, based on the flags.
> 
> This list selection mechanism is where I am usually hooking memcg: and
> for the same way you are using an array - given a node, you want fast
> access to the underlying list - so am I. Given the memcg context, I want
> to get to the corresponding memcg list.
> 
> Now, in my earliest implementations, the memcg would still take me to a
> node-wide array, and an extra level would be required. We seem to agree
> that (at least as a starting point) getting rid of this extra level, so
> the memcg collapses all objects into the same list would provide decent
> behavior in most cases, while still keeping the footprint manageable. So
> that is what I am pursuing at the moment.

Ah, I think that maybe you misunderstood. There are two main
triggers for reclaim: global memory is short, or a memcg is short on
memory.

To find appropriate objects quickly for reclaim, we need objects on
appropriate lists. E.g. if we are doing global reclaim (e.g. from
kswapd) it means a node is short of memory and needs more. Hence
just walking a per-node list is the most efficient method of doing
this. Having to walk all the memcg LRU lists to find objects on a
specific node is not feasible. OTOH, if we are doing memcg specific
reclaim, the opposite is true.

So, if we have:

struct lru_list_head {
	struct list_head	head;
	spinlock_t		lock;
	u64			nr_items;
};

struct lru_list {
	struct lru_list_head	*per_node;
	int			numnodes;
	nodemask_t		active_nodes;
	void			*memcg_lists;	/* managed by memcg code */
	....
};

lru_list_init(struct lru_list *lru, bool per_node)
{
	int numnodes = 1;

	if (per_node)
		numnodes = nr_node_ids;
	lru->per_node = alloc(numnodes * sizeof(struct lru_list_head));
	....
}

And then each object uses:

struct lru_item {
	struct list_head global_list;
	struct list_head memcg_list;
};

and we end up with:

lru_add(struct lru_list *lru, struct lru_item *item)
{
	node_id = min(object_to_nid(item), lru->numnodes - 1);

	__lru_add(lru, node_id, &item->global_list);
	if (memcg) {
		memcg_lru = find_memcg_lru(lru->memcg_lists, memcg_id);
		__lru_add(memcg_lru, node_id, &item->memcg_list);
	}
}

Then when it comes to reclaim, the reclaim information passed to the
shrinker needs to indicate either the node to reclaim from or the
memcg_id (or both), so that reclaim can walk the appropriate list to
find objects to reclaim. Then we delete them from both lists and
reclaim the object....

And the memcg lists can instantiate new struct lru_list and index
them appropriately according to some <handwave> criteria. Then all
the generic LRU code cares about is that the memcg lookup returns
the correct struct lru_list for it to operate on...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

