* [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
@ 2010-02-10 10:04 Al Viro
2010-02-10 10:12 ` Evgeniy Polyakov
From: Al Viro @ 2010-02-10 10:04 UTC (permalink / raw)
To: zbr; +Cc: linux-fsdevel, linux-kernel
a) pohmelfs_construct_path_string() will do interesting things if you
call it while chrooted into a jail with pohmelfs mounted deeper in that
jail. Try it.
b) just why do we care about the root of the chroot jail in pohmelfs_path_length()?
Not to mention anything else, current->fs->root/mnt may be changed under
you if you share current->fs with another thread, but even aside from that,
why does the filesystem care about the chroot of the caller at all?
What's going on there?
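[For illustration -- not the pohmelfs code: a minimal sketch of building a
name relative to the filesystem root by walking d_parent, in the spirit of
the non-exported dentry_path(), with no reference to current->fs at all.
The helper name is invented and locking is simplified to the global
dcache_lock of that era.]

	/* assumes <linux/dcache.h>, <linux/err.h>, <linux/string.h>, <linux/spinlock.h> */
	static char *sketch_fs_relative_path(struct dentry *dentry, char *buf, int buflen)
	{
		char *end = buf + buflen;

		if (buflen < 2)
			return ERR_PTR(-ENAMETOOLONG);

		*--end = '\0';
		spin_lock(&dcache_lock);
		while (!IS_ROOT(dentry)) {
			int len = dentry->d_name.len;

			if (end - buf < len + 1) {
				spin_unlock(&dcache_lock);
				return ERR_PTR(-ENAMETOOLONG);
			}
			end -= len;
			memcpy(end, dentry->d_name.name, len);
			*--end = '/';
			dentry = dentry->d_parent;
		}
		spin_unlock(&dcache_lock);

		if (*end == '\0')
			*--end = '/';	/* dentry was the fs root itself */
		return end;
	}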
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 10:04 [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs Al Viro
@ 2010-02-10 10:12 ` Evgeniy Polyakov
2010-02-10 10:24 ` Al Viro
From: Evgeniy Polyakov @ 2010-02-10 10:12 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 10:04:28AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> a) pohmelfs_construct_path_string() will do interesting things if you
> call it while chrooted into a jail with pohmelfs mounted deeper in that
> jail. Try it.
Should it walk up to the mountpoint?
> b) just why do we care about the root of the chroot jail in pohmelfs_path_length()?
> Not to mention anything else, current->fs->root/mnt may be changed under
> you if you share current->fs with another thread, but even aside from that,
> why does the filesystem care about the chroot of the caller at all?
>
> What's going on there?
It tries to construct a full path up to the mountpoint. Effectively it should
do something similar to the non-exported dentry_path(). There is a race between
getting the buffer size and filling it with the actual path, but we take care
of that by restarting if needed.
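[For illustration: a rough sketch of the measure/allocate/fill/retry pattern
described above; the sketch_* helpers are placeholders, not the actual
pohmelfs functions.]

	/* assumes <linux/fs.h>, <linux/slab.h>, <linux/errno.h> */
	static int sketch_get_path(struct inode *inode, char **pathp, int *lenp)
	{
		char *path;
		int len, err;

		for (;;) {
			len = sketch_path_length(inode);	/* step 1: measure */
			if (len < 0)
				return len;

			path = kmalloc(len, GFP_NOIO);
			if (!path)
				return -ENOMEM;

			err = sketch_fill_path(inode, path, len); /* step 2: fill */
			if (err != -ENAMETOOLONG)
				break;

			kfree(path);	/* the tree changed in between: retry */
		}

		if (err) {
			kfree(path);
			return err;
		}

		*pathp = path;
		*lenp = len;
		return 0;
	}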
--
Evgeniy Polyakov
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 10:12 ` Evgeniy Polyakov
@ 2010-02-10 10:24 ` Al Viro
2010-02-10 10:45 ` Evgeniy Polyakov
From: Al Viro @ 2010-02-10 10:24 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 01:12:46PM +0300, Evgeniy Polyakov wrote:
> On Wed, Feb 10, 2010 at 10:04:28AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > a) pohmelfs_construct_path_string() will do interesting things if you
> > call it while chrooted into a jail with pohmelfs mounted deeper in that
> > jail. Try it.
>
> Should it walk up to the mountpoint?
It will happily give you the path from the absolute root to the root of the
chroot jail + the path from the fs root to your dentry. Which is probably not
what you want.
> > b) just why do we care about the root of the chroot jail in pohmelfs_path_length()?
> > Not to mention anything else, current->fs->root/mnt may be changed under
> > you if you share current->fs with another thread, but even aside from that,
> > why does the filesystem care about the chroot of the caller at all?
> >
> > What's going on there?
>
> It tries to construct a full path up to the mountpoint. Effectively it should
> do something similar to the non-exported dentry_path(). There is a race between
> getting the buffer size and filling it with the actual path, but we take care
> of that by restarting if needed.
To mountpoint or to fs root? And what's going on with d_find_alias()?
AFAICS, you are doing that for regular files as well as directories,
and you do support link(2) in there, so dentry (and path) obtained from
that will be random.
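[Background for the point above: an inode with several hard links has several
dentries on its alias list, and d_find_alias() simply returns one of them with
a reference held -- a hedged sketch of why a path built from it is arbitrary:]

	/* assumes <linux/dcache.h>, <linux/fs.h> */
	static void sketch_alias_ambiguity(struct inode *inode)
	{
		/*
		 * After link("/mnt/p/a", "/mnt/q/b") both names alias the same
		 * inode.  d_find_alias() returns whichever alias it happens to
		 * find, so a path constructed from it may end in "p/a" or "q/b".
		 */
		struct dentry *alias = d_find_alias(inode);

		if (alias) {
			/* ... build a path from 'alias' here ... */
			dput(alias);
		}
	}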
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 10:24 ` Al Viro
@ 2010-02-10 10:45 ` Evgeniy Polyakov
2010-02-10 11:00 ` Al Viro
From: Evgeniy Polyakov @ 2010-02-10 10:45 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 10:24:22AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> On Wed, Feb 10, 2010 at 01:12:46PM +0300, Evgeniy Polyakov wrote:
> > On Wed, Feb 10, 2010 at 10:04:28AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > > a) pohmelfs_construct_path_string() will do interesting things if you
> > > call it while chrooted into a jail with pohmelfs mounted deeper in that
> > > jail. Try it.
> >
> > Should it walk up to the mountpoint?
>
> It will happily give you the path from the absolute root to the root of the
> chroot jail + the path from the fs root to your dentry. Which is probably not
> what you want.
I thought of always providing the path to the mountpoint instead of just
the root. Full path names are rather messy to work with, so I am considering
per-directory objects only for the next version, but the drawback is lookup time.
> > > b) just why do we care about the root of the chroot jail in pohmelfs_path_length()?
> > > Not to mention anything else, current->fs->root/mnt may be changed under
> > > you if you share current->fs with another thread, but even aside from that,
> > > why does the filesystem care about the chroot of the caller at all?
> > >
> > > What's going on there?
> >
> > It tries to construct a full path up to the mountpoint. Effectively it should
> > do something similar to the non-exported dentry_path(). There is a race between
> > getting the buffer size and filling it with the actual path, but we take care
> > of that by restarting if needed.
>
> To mountpoint or to fs root? And what's going on with d_find_alias()?
To root if it happened to be under the mountpoint.
> AFAICS, you are doing that for regular files as well as directories,
> and you do support link(2) in there, so dentry (and path) obtained from
> that will be random.
Not exactly random, but it can change.
Link support is rather subtle because of that, yes.

The plan was to add an external attribute or increase the inode size to
include the parent name, but when I coded that it was so messy with respect
to renames that it was dropped.
--
Evgeniy Polyakov
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 10:45 ` Evgeniy Polyakov
@ 2010-02-10 11:00 ` Al Viro
2010-02-10 11:11 ` Evgeniy Polyakov
From: Al Viro @ 2010-02-10 11:00 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 01:45:15PM +0300, Evgeniy Polyakov wrote:
> > To mountpoint or to fs root? And what's going on with d_find_alias()?
>
> To root if it happened to be under the mountpoint.
HUH? How the hell can the root of a filesystem be under the mountpoint of
that filesystem? What are you talking about?
> > AFAICS, you are doing that for regular files as well as directories,
> > and you do support link(2) in there, so dentry (and path) obtained from
> > that will be random.
>
> Not exactly random, but it can change.
> Link support is rather subtle because of that, yes.
>
> The plan was to add an external attribute or increase the inode size to
> include the parent name, but when I coded that it was so messy with respect
> to renames that it was dropped.
Why not use the dentries you've been given by VFS?
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 11:00 ` Al Viro
@ 2010-02-10 11:11 ` Evgeniy Polyakov
2010-02-10 11:59 ` Al Viro
From: Evgeniy Polyakov @ 2010-02-10 11:11 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 11:00:11AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > > To mountpoint or to fs root? And what's going on with d_find_alias()?
> >
> > To root if it happened to be under the mountpoint.
>
> HUH? How the hell can the root of a filesystem be under the mountpoint of
> that filesystem? What are you talking about?
Let me guess... Mmmm, it was in yesterday's newspaper, I remember.
Maybe when we chroot somewhere. I meant not the mounted fs root, but the
thread's root.
> > > AFAICS, you are doing that for regular files as well as directories,
> > > and you do support link(2) in there, so dentry (and path) obtained from
> > > that will be random.
> >
> > Not exactly random, but it can change.
> > Link support is rather subtle because of that, yes.
> >
> > The plan was to add an external attribute or increase the inode size to
> > include the parent name, but when I coded that it was so messy with respect
> > to renames that it was dropped.
>
> Why not use the dentries you've been given by VFS?
At writeback time we do not have the parents, so we must find a path somehow.
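[Context for the point above: by the time writeback runs, the VFS hands the
filesystem only the address_space and a writeback_control -- no dentry and no
parent. A simplified illustration of the shape of the entry point (circa
2.6.3x), not the pohmelfs implementation:]

	/* assumes <linux/fs.h>, <linux/writeback.h> */
	static int sketch_writepages(struct address_space *mapping,
				     struct writeback_control *wbc)
	{
		struct inode *inode = mapping->host;	/* an inode, but no name */

		/* ... push dirty pages of 'inode' to the server; any pathname
		 * has to be reconstructed from dentries/parents by hand ... */
		return 0;
	}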
--
Evgeniy Polyakov
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 11:11 ` Evgeniy Polyakov
@ 2010-02-10 11:59 ` Al Viro
2010-02-10 13:30 ` Evgeniy Polyakov
From: Al Viro @ 2010-02-10 11:59 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 02:11:15PM +0300, Evgeniy Polyakov wrote:
> On Wed, Feb 10, 2010 at 11:00:11AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > > > To mountpoint or to fs root? And what's going on with d_find_alias()?
> > >
> > > To root if it happened to be under the mountpoint.
> >
> > HUH? How the hell can the root of a filesystem be under the mountpoint of
> > that filesystem? What are you talking about?
>
> Let me guess... Mmmm, it was in yesterday's newspaper, I remember.
> Maybe when we chroot somewhere. I meant not the mounted fs root, but the
> thread's root.
Why would a filesystem give a damn about the chroot of the syscall originator
in the first place?
> > Why not use the dentries you've been given by VFS?
>
> At writeback time we do not have the parents, so we must find a path somehow.
Most of the places do have those just fine, and unlike writeback,
rename et al. really care which pathname is being dealt with...
BTW, what prevents writeback vs. rename races?
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 11:59 ` Al Viro
@ 2010-02-10 13:30 ` Evgeniy Polyakov
2010-02-10 21:02 ` Al Viro
From: Evgeniy Polyakov @ 2010-02-10 13:30 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 11:59:39AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > Let me guess... Mmmm, it was in yesterday's newspaper, I remember.
> > Maybe when we chroot somewhere. I meant not the mounted fs root, but the
> > thread's root.
>
> Why would a filesystem give a damn about the chroot of the syscall originator
> in the first place?
That's the point - it is not needed.
> > > Why not use the dentries you've been given by VFS?
> >
> > At writeback time we do not have the parents, so we must find a path somehow.
>
> Most of the places do have those just fine, and unlike writeback,
> rename et al. really care which pathname is being dealt with...
POHMELFS uses the writeback cache for metadata as well, so effectively most
of those operations are also postponed. I later turned that off, though.
> BTW, what prevents writeback vs. rename races?
There are proper locks for such operations.
--
Evgeniy Polyakov
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 13:30 ` Evgeniy Polyakov
@ 2010-02-10 21:02 ` Al Viro
2010-02-10 21:29 ` Evgeniy Polyakov
From: Al Viro @ 2010-02-10 21:02 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 04:30:07PM +0300, Evgeniy Polyakov wrote:
> > > > Why not use the dentries you've been given by VFS?
> > >
> > > At writeback time we do not have the parents, so we must find a path somehow.
> >
> > Most of the places do have those just fine, and unlike writeback,
> > rename et al. really care which pathname is being dealt with...
>
> POHMELFS uses the writeback cache for metadata as well, so effectively most
> of those operations are also postponed. I later turned that off, though.
>
> > BTW, what prevents writeback vs. rename races?
>
> There are proper locks for such operations.
Which would be... ? E.g. between writepages() and rename(). What serializes
your write_inode_create() wrt renames? IOW, how can the server decide that
data from writepages() should go to the same object regardless of the
rename?
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 21:02 ` Al Viro
@ 2010-02-10 21:29 ` Evgeniy Polyakov
2010-02-11 3:02 ` Al Viro
From: Evgeniy Polyakov @ 2010-02-10 21:29 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Wed, Feb 10, 2010 at 09:02:48PM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> Which would be... ? E.g. between writepages() and rename(). What serializes
> your write_inode_create() wrt renames? IOW, how can the server decide that
> data from writepages() should go to the same object regardless of the
> rename?
rename and some other metadata operations, as well as write itself,
request a remote lock (if not grabbed already); the acknowledgement forces
writeback to the old path.
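[For illustration: a rough sketch of the handshake being described; the
message names and sketch_* helpers are invented for the example and are not
the pohmelfs wire protocol.]

	/*
	 *   client A (holds the delegation)        client B (wants to rename)
	 *   -------------------------------        --------------------------
	 *                                          --> LOCK_REQUEST(object)
	 *   <-- server: please flush
	 *   write back dirty data/metadata
	 *   under the *old* path
	 *   --> flush acknowledgement
	 *                                          <-- LOCK_GRANTED
	 *                                          rename is sent to the server
	 */
	static int sketch_metadata_op(struct inode *inode)
	{
		int err;

		/* may force the flush above on whichever client holds the lock */
		err = sketch_grab_remote_lock(inode);
		if (err)
			return err;

		/* ... now send the rename/create/unlink request itself ... */
		return 0;
	}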
--
Evgeniy Polyakov
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-10 21:29 ` Evgeniy Polyakov
@ 2010-02-11 3:02 ` Al Viro
2010-02-11 15:08 ` Evgeniy Polyakov
From: Al Viro @ 2010-02-11 3:02 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: linux-fsdevel, linux-kernel
On Thu, Feb 11, 2010 at 12:29:33AM +0300, Evgeniy Polyakov wrote:
> On Wed, Feb 10, 2010 at 09:02:48PM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > Which would be... ? E.g. between writepages() and rename(). What serializes
> > your write_inode_create() wrt renames? IOW, how can the server decide that
> > data from writepages() should go to the same object regardless of the
> > rename?
>
> rename and some other metadata operations, as well as write itself,
> request a remote lock (if not grabbed already); the acknowledgement forces
> writeback to the old path.
Um. You do realize that d_move() happens with none of your locks held,
right? It's done in vfs_rename_{other,dir}() and the only things held
are s_vfs_rename_sem and i_mutex on the parents. How would your code in
writeback be able to distinguish

	rename() is done
	d_move() has happened, we see the new pathname in dcache

from

	rename() is done
	d_move() has not yet happened, we see the old pathname in dcache

and generate the right on-the-wire traffic in both cases? Note that here
the server has already seen the rename request; as far as the server and
client are concerned, the rename() is over.
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-11 3:02 ` Al Viro
@ 2010-02-11 15:08 ` Evgeniy Polyakov
2010-02-11 17:10 ` Al Viro
From: Evgeniy Polyakov @ 2010-02-11 15:08 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Thu, Feb 11, 2010 at 03:02:54AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> Um. You do realize that d_move() happens with none of your locks held,
> right? It's done in vfs_rename_{other,dir}() and the only things held
> are s_vfs_rename_sem and i_mutex on the parents. How would your code in
> writeback be able to distinguish
No, it happens with my lock held. It is not really a lock, but a kind of
IO delegation, i.e. it is not dropped when a rename or other protected
operation completes. Instead, another client sends a request to grab it and
the server asks the current holder to drop its cache, perform writeback or
whatever else is needed.
It can be a problem, though, if d_move() is called outside of a path
protected by the VFS dir operations like rename/create/unlink and so
on, i.e. on behalf of some entity in the kernel which decides to move
dentries on its own. In this case POHMELFS is not protected.
--
Evgeniy Polyakov
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-11 15:08 ` Evgeniy Polyakov
@ 2010-02-11 17:10 ` Al Viro
2010-02-11 19:13 ` Evgeniy Polyakov
From: Al Viro @ 2010-02-11 17:10 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: linux-fsdevel, linux-kernel
On Thu, Feb 11, 2010 at 06:08:05PM +0300, Evgeniy Polyakov wrote:
> On Thu, Feb 11, 2010 at 03:02:54AM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > Um. You do realize that d_move() happens with none of your locks held,
> > right? It's done in vfs_rename_{other,dir}() and the only things held
> > are s_vfs_rename_sem and i_mutex on the parents. How would your code in
> > writeback be able to distinguish
>
> No, it happens with my lock held. It is not really a lock, but a kind of
> IO delegation, i.e. it is not dropped when a rename or other protected
> operation completes. Instead, another client sends a request to grab it and
> the server asks the current holder to drop its cache, perform writeback or
> whatever else is needed.
And what if such a request comes between the return from ->rename() and the
call of d_move() that follows it?
> It can be a problem, though, if d_move() is called outside of a path
> protected by the VFS dir operations like rename/create/unlink and so
> on, i.e. on behalf of some entity in the kernel which decides to move
> dentries on its own. In this case POHMELFS is not protected.
Not an issue; there are no such fs-independent callers.
Fundamentally, how do you deal with MOESI when the mapping from strings you
are using as object IDs to actual objects can change as the result of
operations? What's more, operation on one object can change that mapping
for a huge number of other objects (rename() close to fs root changing
pathnames of all files anywhere in the subtree being moved).
* Re: [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs
2010-02-11 17:10 ` Al Viro
@ 2010-02-11 19:13 ` Evgeniy Polyakov
From: Evgeniy Polyakov @ 2010-02-11 19:13 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Thu, Feb 11, 2010 at 05:10:18PM +0000, Al Viro (viro@ZenIV.linux.org.uk) wrote:
> > No, it happens with my lock held. It is not really a lock, but a kind of
> > IO delegation, i.e. it is not dropped when a rename or other protected
> > operation completes. Instead, another client sends a request to grab it and
> > the server asks the current holder to drop its cache, perform writeback or
> > whatever else is needed.
>
> And what if such a request comes between the return from ->rename() and the
> call of d_move() that follows it?
We mark the inode dirty (including the parent) and it has to be written back,
so there will be no ack until writeback has completed; that in turn messes
with i_mutex, so it will be postponed until rename and d_move() have completed.
> > It can be a problem, though, if d_move() is called outside of a path
> > protected by the VFS dir operations like rename/create/unlink and so
> > on, i.e. on behalf of some entity in the kernel which decides to move
> > dentries on its own. In this case POHMELFS is not protected.
>
> Not an issue; there are no such fs-independent callers.
>
> Fundamentally, how do you deal with MOESI when the mapping from strings you
> are using as object IDs to actual objects can change as the result of
> operations? What's more, operation on one object can change that mapping
> for a huge number of other objects (rename() close to fs root changing
> pathnames of all files anywhere in the subtree being moved).
MOESI is actually crap in a clustered filesystem. It does not scale
well, since the number of messages grows exponentially with nodes and
interconnects. Plus, without PAXOS it does not provide the needed redundancy
level.
The basic idea currently is to delegate every operation and ask the server
whether it is allowed or not. Thus server-side locking must be serialized to
that level. I believe that i_mutex is enough there - it's actually the only
lock used, except in some helpers like d_path().
--
Evgeniy Polyakov
Thread overview: 14+ messages
2010-02-10 10:04 [WTF] ... is going on with current->fs->{root,mnt} accesses in pohmelfs Al Viro
2010-02-10 10:12 ` Evgeniy Polyakov
2010-02-10 10:24 ` Al Viro
2010-02-10 10:45 ` Evgeniy Polyakov
2010-02-10 11:00 ` Al Viro
2010-02-10 11:11 ` Evgeniy Polyakov
2010-02-10 11:59 ` Al Viro
2010-02-10 13:30 ` Evgeniy Polyakov
2010-02-10 21:02 ` Al Viro
2010-02-10 21:29 ` Evgeniy Polyakov
2010-02-11 3:02 ` Al Viro
2010-02-11 15:08 ` Evgeniy Polyakov
2010-02-11 17:10 ` Al Viro
2010-02-11 19:13 ` Evgeniy Polyakov