All of lore.kernel.org
 help / color / mirror / Atom feed
From: Li Wang <liwang@ubuntukylin.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Zefan Li <lizefan@huawei.com>, Matthew Wilcox <matthew@wil.cx>,
	Yunchuan Wen <yunchuanwen@ubuntukylin.com>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 2/3] Add shrink_pagecache_parent
Date: Wed, 08 Jan 2014 10:06:31 +0800	[thread overview]
Message-ID: <52CCB2A7.2000300@ubuntukylin.com> (raw)
In-Reply-To: <20140102155534.9b0cd498209d835d0c93837e@linux-foundation.org>

Hi,

On 01/03/2014 07:55 AM, Andrew Morton wrote:
> On Mon, 30 Dec 2013 21:45:17 +0800 Li Wang <liwang@ubuntukylin.com> wrote:
>
>> Analogous to shrink_dcache_parent except that it collects inodes.
>> It is not very appropriate to be put in dcache.c, but d_walk can only
>> be invoked from here.
>
> Please cc Dave Chinner on future revisions.  He be da man.
>
> The overall intent of the patchset seems reasonable and I agree that it
> can't be efficiently done from userspace with the current kernel API.
> We *could* do it from userspace by providing facilities for userspace to
> query the VFS caches: "is this pathname in the dentry cache" and "is
> this inode in the inode cache".
>
Even we have these available, i am afraid it will still introduce
non-negligible overhead due to frequent system calls for a directory
  walking operation, especially under massive small file situations.

>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -1318,6 +1318,42 @@ void shrink_dcache_parent(struct dentry *parent)
>>   }
>>   EXPORT_SYMBOL(shrink_dcache_parent);
>>
>> +static enum d_walk_ret gather_inode(void *data, struct dentry *dentry)
>> +{
>> +	struct list_head *list = data;
>> +	struct inode *inode = dentry->d_inode;
>> +
>> +	if ((inode == NULL) || ((!inode_owner_or_capable(inode)) &&
>> +				(!capable(CAP_SYS_ADMIN))))
>> +		goto out;
>> +	spin_lock(&inode->i_lock);
>> +	if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
>
> It's unclear what rationale lies behind this particular group of tests.
>
>> +		(inode->i_mapping->nrpages == 0) ||
>> +		(!list_empty(&inode->i_lru))) {
>
> arg, the "Inode locking rules" at the top of fs/inode.c needs a
> refresh, I suspect.  It is too vague.
>
> Formally, inode->i_lru is protected by
> i_sb->s_inode_lru->node[nid].lock, not by ->i_lock.  I guess you can
> just do a list_lru_add() and that will atomically add the inode to your
> local list_lru if ->i_lru wasn't being used for anything else.
>
> I *think* that your use of i_lock works OK, because code which fiddles
> with i_lru and s_inode_lru also takes i_lock.  However we need to
> decide which is the preferred and official lock.  ie: what is the
> design here??
>
> However...  most inodes will be on an LRU list, won't they?  Doesn't
> this reuse of i_lru mean that many inodes will fail to be processed?
> If so, we might need to add a new list_head to the inode, which will be
> problematic.
>
As far as I know, fix me if i am wrong, only when inode has zero
reference count, it will be put into superblock lru list. For most
situations, there is at least a dentry refers to it, so it will not
be on any lru list.

>
> Aside: inode_lru_isolate() fiddles directly with inode->i_lru without
> taking i_sb->s_inode_lru->node[nid].lock.  Why doesn't this make a
> concurrent s_inode_lru walker go oops??  Should we be using
> list_lru_del() in there?  (which should have been called
> list_lru_del_init(), sigh).
>
It seems inode_lru_isolate() only called by prune_icache_sb() as
a callback function. Before calling it, the caller has hold
the lock.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Li Wang <liwang@ubuntukylin.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Zefan Li <lizefan@huawei.com>, Matthew Wilcox <matthew@wil.cx>,
	Yunchuan Wen <yunchuanwen@ubuntukylin.com>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 2/3] Add shrink_pagecache_parent
Date: Wed, 08 Jan 2014 10:06:31 +0800	[thread overview]
Message-ID: <52CCB2A7.2000300@ubuntukylin.com> (raw)
In-Reply-To: <20140102155534.9b0cd498209d835d0c93837e@linux-foundation.org>

Hi,

On 01/03/2014 07:55 AM, Andrew Morton wrote:
> On Mon, 30 Dec 2013 21:45:17 +0800 Li Wang <liwang@ubuntukylin.com> wrote:
>
>> Analogous to shrink_dcache_parent except that it collects inodes.
>> It is not very appropriate to be put in dcache.c, but d_walk can only
>> be invoked from here.
>
> Please cc Dave Chinner on future revisions.  He be da man.
>
> The overall intent of the patchset seems reasonable and I agree that it
> can't be efficiently done from userspace with the current kernel API.
> We *could* do it from userspace by providing facilities for userspace to
> query the VFS caches: "is this pathname in the dentry cache" and "is
> this inode in the inode cache".
>
Even we have these available, i am afraid it will still introduce
non-negligible overhead due to frequent system calls for a directory
  walking operation, especially under massive small file situations.

>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -1318,6 +1318,42 @@ void shrink_dcache_parent(struct dentry *parent)
>>   }
>>   EXPORT_SYMBOL(shrink_dcache_parent);
>>
>> +static enum d_walk_ret gather_inode(void *data, struct dentry *dentry)
>> +{
>> +	struct list_head *list = data;
>> +	struct inode *inode = dentry->d_inode;
>> +
>> +	if ((inode == NULL) || ((!inode_owner_or_capable(inode)) &&
>> +				(!capable(CAP_SYS_ADMIN))))
>> +		goto out;
>> +	spin_lock(&inode->i_lock);
>> +	if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
>
> It's unclear what rationale lies behind this particular group of tests.
>
>> +		(inode->i_mapping->nrpages == 0) ||
>> +		(!list_empty(&inode->i_lru))) {
>
> arg, the "Inode locking rules" at the top of fs/inode.c needs a
> refresh, I suspect.  It is too vague.
>
> Formally, inode->i_lru is protected by
> i_sb->s_inode_lru->node[nid].lock, not by ->i_lock.  I guess you can
> just do a list_lru_add() and that will atomically add the inode to your
> local list_lru if ->i_lru wasn't being used for anything else.
>
> I *think* that your use of i_lock works OK, because code which fiddles
> with i_lru and s_inode_lru also takes i_lock.  However we need to
> decide which is the preferred and official lock.  ie: what is the
> design here??
>
> However...  most inodes will be on an LRU list, won't they?  Doesn't
> this reuse of i_lru mean that many inodes will fail to be processed?
> If so, we might need to add a new list_head to the inode, which will be
> problematic.
>
As far as I know, fix me if i am wrong, only when inode has zero
reference count, it will be put into superblock lru list. For most
situations, there is at least a dentry refers to it, so it will not
be on any lru list.

>
> Aside: inode_lru_isolate() fiddles directly with inode->i_lru without
> taking i_sb->s_inode_lru->node[nid].lock.  Why doesn't this make a
> concurrent s_inode_lru walker go oops??  Should we be using
> list_lru_del() in there?  (which should have been called
> list_lru_del_init(), sigh).
>
It seems inode_lru_isolate() only called by prune_icache_sb() as
a callback function. Before calling it, the caller has hold
the lock.

  parent reply	other threads:[~2014-01-08  2:06 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-30 13:45 [PATCH 0/3] Fadvise: Directory level page cache cleaning support Li Wang
2013-12-30 13:45 ` Li Wang
2013-12-30 13:45 ` [PATCH 1/3] VFS: Add the declaration of shrink_pagecache_parent Li Wang
2013-12-30 13:45   ` Li Wang
2013-12-30 13:45 ` [PATCH 2/3] Add shrink_pagecache_parent Li Wang
2013-12-30 13:45   ` Li Wang
2014-01-02 23:55   ` Andrew Morton
2014-01-02 23:55     ` Andrew Morton
2014-01-06 13:30     ` Dave Chinner
2014-01-06 13:30       ` Dave Chinner
2014-01-08  2:06     ` Li Wang [this message]
2014-01-08  2:06       ` Li Wang
2014-01-15  0:22       ` Dave Chinner
2014-01-15  0:22         ` Dave Chinner
2013-12-30 13:45 ` [PATCH 3/3] Fadvise: Add the ability for directory level page cache cleaning Li Wang
2013-12-30 13:45   ` Li Wang
2013-12-30 14:57 ` [PATCH 0/3] Fadvise: Directory level page cache cleaning support Matthew Wilcox
2013-12-30 14:57   ` Matthew Wilcox
2013-12-30 19:18 ` Dave Hansen
2013-12-30 19:18   ` Dave Hansen
2013-12-30 19:40   ` Andreas Dilger
2013-12-30 19:40     ` Andreas Dilger
2013-12-30 21:33     ` Dave Hansen
2013-12-30 21:33       ` Dave Hansen
2014-01-02 12:44       ` Li Wang
2014-01-02 12:44         ` Li Wang
2014-01-02 18:35         ` Dave Hansen
2014-01-02 18:35           ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52CCB2A7.2000300@ubuntukylin.com \
    --to=liwang@ubuntukylin.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=matthew@wil.cx \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yunchuanwen@ubuntukylin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.