Re: [PATCH 2/3] Add shrink_pagecache_parent

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Li Wang <liwang@ubuntukylin.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Zefan Li <lizefan@huawei.com>, Matthew Wilcox <matthew@wil.cx>,
	Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Subject: Re: [PATCH 2/3] Add shrink_pagecache_parent
Date: Tue, 7 Jan 2014 00:30:49 +1100	[thread overview]
Message-ID: <20140106133049.GB5145@destitution> (raw)
In-Reply-To: <20140102155534.9b0cd498209d835d0c93837e@linux-foundation.org>

On Thu, Jan 02, 2014 at 03:55:34PM -0800, Andrew Morton wrote:
> On Mon, 30 Dec 2013 21:45:17 +0800 Li Wang <liwang@ubuntukylin.com> wrote:
> 
> > Analogous to shrink_dcache_parent except that it collects inodes.
> > It is not very appropriate to be put in dcache.c, but d_walk can only
> > be invoked from here.
> 
> Please cc Dave Chinner on future revisions.  He be da man.
> 
> The overall intent of the patchset seems reasonable and I agree that it
> can't be efficiently done from userspace with the current kernel API. 
> We *could* do it from userspace by providing facilities for userspace to
> query the VFS caches: "is this pathname in the dentry cache" and "is
> this inode in the inode cache".
> 
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -1318,6 +1318,42 @@ void shrink_dcache_parent(struct dentry *parent)
> >  }
> >  EXPORT_SYMBOL(shrink_dcache_parent);
> >  
> > +static enum d_walk_ret gather_inode(void *data, struct dentry *dentry)
> > +{
> > +	struct list_head *list = data;
> > +	struct inode *inode = dentry->d_inode;
> > +
> > +	if ((inode == NULL) || ((!inode_owner_or_capable(inode)) &&
> > +				(!capable(CAP_SYS_ADMIN))))
> > +		goto out;
> > +	spin_lock(&inode->i_lock);
> > +	if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
> 
> It's unclear what rationale lies behind this particular group of tests.
> 
> > +		(inode->i_mapping->nrpages == 0) ||
> > +		(!list_empty(&inode->i_lru))) {
> 
> arg, the "Inode locking rules" at the top of fs/inode.c needs a
> refresh, I suspect.  It is too vague.

Yes, it probably does need work.

> Formally, inode->i_lru is protected by
> i_sb->s_inode_lru->node[nid].lock, not by ->i_lock.  I guess you can
> just do a list_lru_add() and that will atomically add the inode to your
> local list_lru if ->i_lru wasn't being used for anything else.

There is no such thing as a "local" list_lru. If you need to put an
object on a local list, then just use a local struct list_head.
That's how we do dispose lists for the objects being removed from
the LRU...

However, the only way you can check if the i_lru is not in use is to
hold the relevant LRU lock, and that's something that should not be
directly accessed - the internal locking of the LRU is private,
subject to change and as such is only accessible in th places that
it is explicitly exposed. i.e. the ->isolate callback.

> I *think* that your use of i_lock works OK, because code which fiddles
> with i_lru and s_inode_lru also takes i_lock.  However we need to
> decide which is the preferred and official lock.  ie: what is the
> design here??

THe LRU lock nests inside the i_lock. The i_lock does not provide
exclusive access to i_lru if the inode is on the LRU; LRU list
manipulations can modify i_lru (e.g. removing an adjacent inode in
the LRU list) without holding i_lock....

> However...  most inodes will be on an LRU list, won't they?  Doesn't
> this reuse of i_lru mean that many inodes will fail to be processed? 
> If so, we might need to add a new list_head to the inode, which will be
> problematic.

Yes, yes, and yes, adding a new list head to the struct inode for
such an uncommon corner case is a non-starter.

> Aside: inode_lru_isolate() fiddles directly with inode->i_lru without
> taking i_sb->s_inode_lru->node[nid].lock.  Why doesn't this make a
> concurrent s_inode_lru walker go oops??  Should we be using
> list_lru_del() in there? 

No, inode_lru_isoalte() is called with the lru lock held. The
specific list lock is passed as the lru_lock parameter, so it can be
droppped if a blocking operation needs to be done to prepare the
object for isolation.  So, calling list_lru_del() will deadlock on
the LRU lock.

> (which should have been called list_lru_del_init(), sigh).

That implies that removing the object from the LRU without
initialising the object being removed is a valid thing to do. It's
not - the lru_list code requires that an object not on an LRU is in
an intialised state so that list_empty() checks work correctly. i.e
list_lru_del(object); list_lru_add(object); needs to work, and that
is non-negotiable. So, no need for suffixes to define different
behaviours - there can be only one...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2014-01-08  8:01 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-30 13:45 [PATCH 0/3] Fadvise: Directory level page cache cleaning support Li Wang
2013-12-30 13:45 ` [PATCH 1/3] VFS: Add the declaration of shrink_pagecache_parent Li Wang
2013-12-30 13:45 ` [PATCH 2/3] Add shrink_pagecache_parent Li Wang
2014-01-02 23:55   ` Andrew Morton
2014-01-06 13:30     ` Dave Chinner [this message]
2014-01-08  2:06     ` Li Wang
2014-01-15  0:22       ` Dave Chinner
2013-12-30 13:45 ` [PATCH 3/3] Fadvise: Add the ability for directory level page cache cleaning Li Wang
2013-12-30 14:57 ` [PATCH 0/3] Fadvise: Directory level page cache cleaning support Matthew Wilcox
2013-12-30 19:18 ` Dave Hansen
2013-12-30 19:40   ` Andreas Dilger
2013-12-30 21:33     ` Dave Hansen
2014-01-02 12:44       ` Li Wang
2014-01-02 18:35         ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140106133049.GB5145@destitution \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liwang@ubuntukylin.com \
    --cc=lizefan@huawei.com \
    --cc=matthew@wil.cx \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yunchuanwen@ubuntukylin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox