linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Imran Khan <imran.f.khan@oracle.com>
To: Ian Kent <raven@themaw.net>, Anders Roxell <anders.roxell@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>, Tejun Heo <tj@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Minchan Kim <minchan@kernel.org>,
	Eric Sandeen <sandeen@sandeen.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Rick Lindsley <ricklind@linux.vnet.ibm.com>,
	David Howells <dhowells@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Carlos Maiolino <cmaiolino@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>,
	elver@google.com
Subject: Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read
Date: Fri, 28 Jul 2023 11:06:12 +1000	[thread overview]
Message-ID: <120330e2-2b37-1d0c-de60-18ae66de573f@oracle.com> (raw)
In-Reply-To: <70d667af-661b-6f62-aa29-a3b8610feda6@themaw.net>

Hello Ian,

On 28/7/2023 10:16 am, Ian Kent wrote:
> On 28/7/23 08:00, Ian Kent wrote:
>> On 27/7/23 12:30, Imran Khan wrote:
>>> Hello Ian,
>>> Sorry for late reply. I was about to reply this week.
>>>
>>> On 27/7/2023 10:38 am, Ian Kent wrote:
>>>> On 20/7/23 10:03, Ian Kent wrote:
>>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
>>> [...]
>>>>> I do see a problem with recent changes.
>>>>>
>>>>> I'll send this off to Greg after I've done some testing (primarily just
>>>>> compile and function).
>>>>>
>>>>> Here's a patch which describes what I found.
>>>>>
>>>>> Comments are, of course, welcome, ;)
>>>> Anders I was hoping you would check if/what lockdep trace
>>>>
>>>> you get with this patch.
>>>>
>>>>
>>>> Imran, I was hoping you would comment on my change as it
>>>>
>>>> relates to the kernfs_iattr_rwsem changes.
>>>>
>>>>
>>>> Ian
>>>>
>>>>> kernfs: fix missing kernfs_iattr_rwsem locking
>>>>>
>>>>> From: Ian Kent <raven@themaw.net>
>>>>>
>>>>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>>>>
>>>>> The update of the kernfs directory node child count was also protected
>>>>> by the kernfs_rwsem and needs to be included in the change so that the
>>>>> child count (and so the inode n_link attribute) does not change while
>>>>> holding the rwsem for read.
>>>>>
>>> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and
>>> these are getting invoked while adding (kernfs_add_one),
>>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
>>> operations proceed under kernfs_rwsem and I see each invocation of
>>> kernfs_link/unlink_sibling during the above mentioned operations is happening
>>> under kernfs_rwsem.
>>> So the child count should still be protected by kernfs_rwsem and we should not
>>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.
>>
>> Yes, that's exactly what I intended (assuming you mean write lock in those cases)
>>
>> when I did it so now I wonder what I saw that lead to my patch, I'll need to look
>>
>> again ...
> 
> Ahh, I see why I thought this ...
> 
> It's the hunk:
> 
> @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap,
>         kn = inode->i_private;
>         root = kernfs_root(kn);
> 
> -       down_read(&root->kernfs_rwsem);
> +       down_read(&root->kernfs_iattr_rwsem);
>         kernfs_refresh_inode(kn, inode);
>         ret = generic_permission(&nop_mnt_idmap, inode, mask);
> -       up_read(&root->kernfs_rwsem);
> +       up_read(&root->kernfs_iattr_rwsem);
> 
>         return ret;
>  }
> 
> which takes away the kernfs_rwsem and introduces the possibility of
> 
> the change. It may be more instructive to add back taking the read
> 
> lock of kernfs_rwsem in .permission() than altering the sibling link
> 
> and unlink functions, I mean I even caught myself on it.
> 

Yes this was the block I referred to in my second comment [1]. However adding
back read lock of kernfs_rwsem in .permission() will again introduce the
bottleneck that I mentioned at [2]. So I think taking taking the locks in
link/unlink is the better option.
I understand having one lock to synchronize everything makes it easier
debug/development wise but sometimes (such as the case mentioned in [2]), it is
not optimum performance wise.
Thoughts ?

Thanks,
Imran

[1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@oracle.com/
[2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@oracle.com/
> 
> Ian
> 
>>
>>
>>>
>>> Kindly let me know your thoughts. I would still like to see new lockdep traces
>>> with this change.
>>
>> Indeed, I hope Anders can find time to get the trace.
>>
>>
>> Ian
>>
>>>
>>> Thanks,
>>> Imran
>>>
>>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>>>>> attributes)
>>>>>
>>>>> Signed-off-by: Ian Kent <raven@themaw.net>
>>>>> Cc: Anders Roxell <anders.roxell@linaro.org>
>>>>> Cc: Imran Khan <imran.f.khan@oracle.com>
>>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>>> Cc: Minchan Kim <minchan@kernel.org>
>>>>> Cc: Eric Sandeen <sandeen@sandeen.net>
>>>>> ---
>>>>>    fs/kernfs/dir.c |    4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>>>> index 45b6919903e6..6e84bb69602e 100644
>>>>> --- a/fs/kernfs/dir.c
>>>>> +++ b/fs/kernfs/dir.c
>>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>>>>> *kn)
>>>>>        rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>>>>          /* successfully added, account subdir number */
>>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>>>            kn->parent->dir.subdirs++;
>>>>>        kernfs_inc_rev(kn->parent);
>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>          return 0;
>>>>>    }
>>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>>>>> kernfs_node *kn)
>>>>>        if (RB_EMPTY_NODE(&kn->rb))
>>>>>            return false;
>>>>>    + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>        if (kernfs_type(kn) == KERNFS_DIR)
>>>>>            kn->parent->dir.subdirs--;
>>>>>        kernfs_inc_rev(kn->parent);
>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>          rb_erase(&kn->rb, &kn->parent->dir.children);
>>>>>        RB_CLEAR_NODE(&kn->rb);
>>>>>

  reply	other threads:[~2023-07-28  1:06 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-18  2:32 [PATCH 0/2] kernfs: remove i_lock usage that isn't needed Ian Kent
2022-10-18  2:32 ` [PATCH 1/2] kernfs: dont take i_lock on inode attr read Ian Kent
2022-10-24  8:50   ` Miklos Szeredi
2022-10-31 22:30   ` Tejun Heo
2022-12-21 13:34     ` Anders Roxell
2022-12-22 23:11       ` Ian Kent
2022-12-29  9:20         ` Arnd Bergmann
2022-12-29 13:07           ` Ian Kent
2023-01-23  3:11             ` Ian Kent
2023-07-18 19:00               ` Anders Roxell
2023-07-19  4:23                 ` Ian Kent
2023-07-20  2:03                   ` Ian Kent
2023-07-26 13:49                     ` Miklos Szeredi
2023-07-27  0:38                     ` Ian Kent
2023-07-27  4:30                       ` Imran Khan
2023-07-27  5:35                         ` Imran Khan
2023-07-28  0:00                         ` Ian Kent
2023-07-28  0:16                           ` Ian Kent
2023-07-28  1:06                             ` Imran Khan [this message]
2023-07-28  1:29                               ` Ian Kent
2022-10-18  2:32 ` [PATCH 2/2] kernfs: dont take i_lock on revalidate Ian Kent
2022-10-24  8:38   ` Miklos Szeredi
2022-10-31 22:31   ` Tejun Heo
2022-11-01  7:46   ` Amir Goldstein
2022-11-01  8:09     ` Ian Kent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=120330e2-2b37-1d0c-de60-18ae66de573f@oracle.com \
    --to=imran.f.khan@oracle.com \
    --cc=anders.roxell@linaro.org \
    --cc=arnd@arndb.de \
    --cc=cmaiolino@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=elver@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=minchan@kernel.org \
    --cc=raven@themaw.net \
    --cc=ricklind@linux.vnet.ibm.com \
    --cc=sandeen@sandeen.net \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).