* Does ceph_fill_inode() mishandle I_NEW?
@ 2025-03-13 10:17 David Howells
2025-03-13 19:14 ` slava
0 siblings, 1 reply; 7+ messages in thread
From: David Howells @ 2025-03-13 10:17 UTC (permalink / raw)
To: Viacheslav Dubeyko
Cc: dhowells, Alex Markuze, Xiubo Li, Ilya Dryomov, Christian Brauner,
ceph-devel, linux-fsdevel, linux-kernel
ceph_fill_inode() seems to be mishandling I_NEW. It only check I_NEW when
setting i_mode. It then goes on to clobber a bunch of things in the inode
struct and ceph_inode_info struct (granted in some cases it's overwriting with
the same thing), irrespective of whether the inode is already set up
(i.e. if I_NEW isn't set).
It looks like I_NEW has been interpreted as to indicating that the inode is
being created as a filesystem object (e.g. by mkdir) whereas it's actually
merely about allocation and initialisation of struct inode in memory.
David
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Does ceph_fill_inode() mishandle I_NEW?
2025-03-13 10:17 Does ceph_fill_inode() mishandle I_NEW? David Howells
@ 2025-03-13 19:14 ` slava
2025-03-13 20:47 ` David Howells
0 siblings, 1 reply; 7+ messages in thread
From: slava @ 2025-03-13 19:14 UTC (permalink / raw)
To: David Howells
Cc: Alex Markuze, Xiubo Li, Ilya Dryomov, Christian Brauner,
ceph-devel, linux-fsdevel, linux-kernel, Slava.Dubeyko
On Thu, 2025-03-13 at 10:17 +0000, David Howells wrote:
> ceph_fill_inode() seems to be mishandling I_NEW. It only check I_NEW
> when
> setting i_mode. It then goes on to clobber a bunch of things in the
> inode
> struct and ceph_inode_info struct (granted in some cases it's
> overwriting with
> the same thing), irrespective of whether the inode is already set up
> (i.e. if I_NEW isn't set).
>
> It looks like I_NEW has been interpreted as to indicating that the
> inode is
> being created as a filesystem object (e.g. by mkdir) whereas it's
> actually
> merely about allocation and initialisation of struct inode in memory.
>
What do you mean by mishandling? Do you imply that Ceph has to set up
the I_NEW somehow? Is it not VFS responsibility?
Thanks,
Slava.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Does ceph_fill_inode() mishandle I_NEW?
2025-03-13 19:14 ` slava
@ 2025-03-13 20:47 ` David Howells
2025-03-13 21:46 ` Viacheslav Dubeyko
2025-03-13 22:47 ` Jeff Layton
0 siblings, 2 replies; 7+ messages in thread
From: David Howells @ 2025-03-13 20:47 UTC (permalink / raw)
To: slava
Cc: dhowells, Alex Markuze, Xiubo Li, Ilya Dryomov, Jeff Layton,
Alexander Viro, Christian Brauner, ceph-devel, linux-fsdevel,
linux-kernel, Slava.Dubeyko
slava@dubeyko.com wrote:
> What do you mean by mishandling? Do you imply that Ceph has to set up
> the I_NEW somehow? Is it not VFS responsibility?
No - I mean that if I_NEW *isn't* set when the function is called,
ceph_fill_inode() will go and partially reinitialise the inode. Now, having
reviewed the code in more depth and talked to Jeff Layton about it, I think
that the non-I_NEW pass will only change pointers with some sort of locking
and will release the old target - though it may overwrite some pointers with
the same value without protection (i_fops for example).
That said, if it's possible for *two* processes to be going through that
function without I_NEW set, you can get places where both of them will try
freeing the old data and replacing it with new without any locking - but I
don't know if that can happen.
David
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Does ceph_fill_inode() mishandle I_NEW?
2025-03-13 20:47 ` David Howells
@ 2025-03-13 21:46 ` Viacheslav Dubeyko
2025-03-13 23:47 ` David Howells
2025-03-13 22:47 ` Jeff Layton
1 sibling, 1 reply; 7+ messages in thread
From: Viacheslav Dubeyko @ 2025-03-13 21:46 UTC (permalink / raw)
To: slava@dubeyko.com, David Howells
Cc: Xiubo Li, linux-fsdevel@vger.kernel.org,
ceph-devel@vger.kernel.org, brauner@kernel.org,
linux-kernel@vger.kernel.org, Alex Markuze, jlayton@kernel.org,
idryomov@gmail.com, viro@zeniv.linux.org.uk
On Thu, 2025-03-13 at 20:47 +0000, David Howells wrote:
> slava@dubeyko.com wrote:
>
> > What do you mean by mishandling? Do you imply that Ceph has to set up
> > the I_NEW somehow? Is it not VFS responsibility?
>
> No - I mean that if I_NEW *isn't* set when the function is called,
> ceph_fill_inode() will go and partially reinitialise the inode. Now, having
> reviewed the code in more depth and talked to Jeff Layton about it, I think
> that the non-I_NEW pass will only change pointers with some sort of locking
> and will release the old target - though it may overwrite some pointers with
> the same value without protection (i_fops for example).
>
> That said, if it's possible for *two* processes to be going through that
> function without I_NEW set, you can get places where both of them will try
> freeing the old data and replacing it with new without any locking - but I
> don't know if that can happen.
>
I see your point now.
As far as I can see, ceph_fill_inode() has comment: "Populate an inode based on
info from mds. May be called on new or existing inodes". It sounds to me that
particular CephFS kernel client could have obsolete state of inode compared with
MDS's state. And we need to "re-new" the existing inode with the actual state
that we received from MDS side. My vision is that we need to take into account
the distributed nature of Ceph and inode metadata can be updated from multiple
CephFS kernel client instances. Am I right here?
Thanks,
Slava.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Does ceph_fill_inode() mishandle I_NEW?
2025-03-13 21:46 ` Viacheslav Dubeyko
@ 2025-03-13 23:47 ` David Howells
0 siblings, 0 replies; 7+ messages in thread
From: David Howells @ 2025-03-13 23:47 UTC (permalink / raw)
To: Viacheslav Dubeyko
Cc: dhowells, slava@dubeyko.com, Xiubo Li,
linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
brauner@kernel.org, linux-kernel@vger.kernel.org, Alex Markuze,
jlayton@kernel.org, idryomov@gmail.com, viro@zeniv.linux.org.uk
Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> As far as I can see, ceph_fill_inode() has comment: "Populate an inode based
> on info from mds. May be called on new or existing inodes". It sounds to me
> that particular CephFS kernel client could have obsolete state of inode
> compared with MDS's state. And we need to "re-new" the existing inode with
> the actual state that we received from MDS side. My vision is that we need
> to take into account the distributed nature of Ceph and inode metadata can
> be updated from multiple CephFS kernel client instances. Am I right here?
As I mentioned in my reply to Jeff, I'm thinking of what happens in the event
that we have a file that has hard links in several directories in a situation
where several of those links are looked up simultaneously. Can we end up with
ceph_fill_inode() being run in parallel on several threads on the same inode?
Actually, the use of ci->i_ceph_lock looks like it should make it safe, now
that I look at it again.
David
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Does ceph_fill_inode() mishandle I_NEW?
2025-03-13 20:47 ` David Howells
2025-03-13 21:46 ` Viacheslav Dubeyko
@ 2025-03-13 22:47 ` Jeff Layton
2025-03-13 23:37 ` David Howells
1 sibling, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2025-03-13 22:47 UTC (permalink / raw)
To: David Howells, slava
Cc: Alex Markuze, Xiubo Li, Ilya Dryomov, Alexander Viro,
Christian Brauner, ceph-devel, linux-fsdevel, linux-kernel,
Slava.Dubeyko
On Thu, 2025-03-13 at 20:47 +0000, David Howells wrote:
> slava@dubeyko.com wrote:
>
> > What do you mean by mishandling? Do you imply that Ceph has to set up
> > the I_NEW somehow? Is it not VFS responsibility?
>
> No - I mean that if I_NEW *isn't* set when the function is called,
> ceph_fill_inode() will go and partially reinitialise the inode. Now, having
> reviewed the code in more depth and talked to Jeff Layton about it, I think
> that the non-I_NEW pass will only change pointers with some sort of locking
> and will release the old target - though it may overwrite some pointers with
> the same value without protection (i_fops for example).
>
> That said, if it's possible for *two* processes to be going through that
> function without I_NEW set, you can get places where both of them will try
> freeing the old data and replacing it with new without any locking - but I
> don't know if that can happen.
>
I don't think that can happen. An I_NEW inode hasn't been properly
hashed yet, so nothing should be able to find it until
unlock_new_inode() is called.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Does ceph_fill_inode() mishandle I_NEW?
2025-03-13 22:47 ` Jeff Layton
@ 2025-03-13 23:37 ` David Howells
0 siblings, 0 replies; 7+ messages in thread
From: David Howells @ 2025-03-13 23:37 UTC (permalink / raw)
To: Jeff Layton
Cc: dhowells, slava, Alex Markuze, Xiubo Li, Ilya Dryomov,
Alexander Viro, Christian Brauner, ceph-devel, linux-fsdevel,
linux-kernel, Slava.Dubeyko
Jeff Layton <jlayton@kernel.org> wrote:
> I don't think that can happen. An I_NEW inode hasn't been properly
> hashed yet, so nothing should be able to find it until
> unlock_new_inode() is called.
That's not where the issue lies. I'm talking about *after* I_NEW has been
cleared.
Imagine you have a file that has hard links in several directories. Can
simultaneous lookup on a number of those hard links result in you going
through ceph_fill_inode() a number of times in parallel?
David
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-03-13 23:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-13 10:17 Does ceph_fill_inode() mishandle I_NEW? David Howells
2025-03-13 19:14 ` slava
2025-03-13 20:47 ` David Howells
2025-03-13 21:46 ` Viacheslav Dubeyko
2025-03-13 23:47 ` David Howells
2025-03-13 22:47 ` Jeff Layton
2025-03-13 23:37 ` David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox