* NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-04-26 23:58 UTC
To: linux-fsdevel; +Cc: Trond Myklebust, J. Bruce Fields, Jeff Layton

I want to restart the discussion we had last July (!) about an NFS hard
read-only mount option. A common use case of union mounts is a cluster
with NFS mounted read-only root file systems, with a local fs union
mounted on top. Here's the last discussion we had:

http://kerneltrap.org/mailarchive/linux-fsdevel/2009/7/16/6211043/thread

We can assume a local mechanism that lets the server enforce the
read-only-ness of the file system on the local machine (the server can
increment sb->s_hard_readonly_users on the local fs and the VFS will
take care of the rest).

The main question is what to do on the client side when the server
changes its mind and wants to write to that file system. On the server
side, there's a clear synchronization point: sb->s_hard_readonly_users
needs to be decremented, and so we don't have to worry about a hard
readonly exported file system going read-write willy-nilly.

But the client has to cope with the sudden withdrawal of the read-only
guarantee. A lowest common denominator starting point is to treat it
as though the mount went away entirely, and force the client to
remount and/or reboot. I also have vague ideas about doing something
smart with stale file handles and generation numbers to avoid a
remount. This looks a little bit like the forced umount patches too,
where we could EIO any open file descriptors on the old file system.

How long would it take to implement the dumb "NFS server not
responding" version?

-VAL
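As background for the mechanism described above, here is a minimal sketch of
what a per-superblock hard read-only count could look like in the VFS. The
field sb->s_hard_readonly_users is the one proposed in the union mount
discussion; the lock, the helper names, and the exact semantics below are
assumptions made for illustration, not code from a posted patch.

#include <linux/fs.h>
#include <linux/spinlock.h>
#include <linux/errno.h>

/* Serializes changes to the (proposed) hard read-only count. */
static DEFINE_SPINLOCK(hard_ro_lock);

/*
 * Sketch of the server-side mechanism described above: an exporter
 * (e.g. nfsd) takes a reference that pins the superblock read-only.
 * s_hard_readonly_users is the field proposed in the union mount
 * discussion; these helpers are illustrative only.
 */
static int get_hard_readonly(struct super_block *sb)
{
	int err = 0;

	spin_lock(&hard_ro_lock);
	if (!(sb->s_flags & MS_RDONLY))
		err = -EBUSY;	/* fs is writable, cannot pin it */
	else
		sb->s_hard_readonly_users++;
	spin_unlock(&hard_ro_lock);
	return err;
}

static void put_hard_readonly(struct super_block *sb)
{
	spin_lock(&hard_ro_lock);
	sb->s_hard_readonly_users--;	/* the clear synchronization point */
	spin_unlock(&hard_ro_lock);
}

/* The VFS would refuse a remount read-write while any users remain: */
static int may_remount_rw(struct super_block *sb)
{
	return sb->s_hard_readonly_users ? -EBUSY : 0;
}

The key property is the synchronization point mentioned above: the export
cannot be flipped read-write until every hard read-only user has dropped its
reference.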
* Re: NFS hard read-only mount option - again
From: Jeff Layton @ 2010-04-27 11:51 UTC
To: Valerie Aurora; +Cc: linux-fsdevel, Trond Myklebust, J. Bruce Fields

On Mon, 26 Apr 2010 19:58:33 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> I want to restart the discussion we had last July (!) about an NFS
> hard read-only mount option. A common use case of union mounts is a
> cluster with NFS mounted read-only root file systems, with a local fs
> union mounted on top. Here's the last discussion we had:
>
> http://kerneltrap.org/mailarchive/linux-fsdevel/2009/7/16/6211043/thread
>
> We can assume a local mechanism that lets the server enforce the
> read-only-ness of the file system on the local machine (the server can
> increment sb->s_hard_readonly_users on the local fs and the VFS will
> take care of the rest).
>
> The main question is what to do on the client side when the server
> changes its mind and wants to write to that file system. On the
> server side, there's a clear synchronization point:
> sb->s_hard_readonly_users needs to be decremented, and so we don't
> have to worry about a hard readonly exported file system going
> read-write willy-nilly.
>
> But the client has to cope with the sudden withdrawal of the read-only
> guarantee. A lowest common denominator starting point is to treat it
> as though the mount went away entirely, and force the client to
> remount and/or reboot. I also have vague ideas about doing something
> smart with stale file handles and generation numbers to avoid a
> remount. This looks a little bit like the forced umount patches too,
> where we could EIO any open file descriptors on the old file system.
>
> How long would it take to implement the dumb "NFS server not
> responding" version?
>
> -VAL

Ok, so the problem is this:

You have a client with the aforementioned union mount (r/o NFS layer
with a local r/w layer on top). "Something" changes on the server and
you need a way to cope with the change?

What happens if you do nothing here and just expect the client to deal
with it? Obviously you have the potential for inconsistent data on the
clients until they remount, along with problems like -ESTALE errors, etc.

For the use case you describe, however, an admin would have to be insane
to think that they could safely change the filesystem while it was
online and serving out data to clients. If I had a cluster like you
describe, my upgrade plan would look like this:

1) update a copy of the master r/o filesystem offline
2) test it, test it, test it
3) shut down the clients
4) unexport the old filesystem, export the new one
5) bring the clients back up

...anything else would be playing with fire.

Unfortunately I haven't been keeping up with your patchset as well as I
probably should. What happens to the r/w layer when the r/o layer
changes? Does it become completely invalid and you have to rebuild it?
Or can it cope with a situation where the r/o filesystem is changed
while the r/w layer isn't mounted on top of it?

-- 
Jeff Layton <jlayton@redhat.com>
* Re: NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-04-28 20:07 UTC
To: Jeff Layton; +Cc: linux-fsdevel, Trond Myklebust, J. Bruce Fields

On Tue, Apr 27, 2010 at 07:51:59AM -0400, Jeff Layton wrote:
> On Mon, 26 Apr 2010 19:58:33 -0400
> Valerie Aurora <vaurora@redhat.com> wrote:
>
> > I want to restart the discussion we had last July (!) about an NFS
> > hard read-only mount option. A common use case of union mounts is a
> > cluster with NFS mounted read-only root file systems, with a local fs
> > union mounted on top. Here's the last discussion we had:
> >
> > http://kerneltrap.org/mailarchive/linux-fsdevel/2009/7/16/6211043/thread
> >
> > We can assume a local mechanism that lets the server enforce the
> > read-only-ness of the file system on the local machine (the server can
> > increment sb->s_hard_readonly_users on the local fs and the VFS will
> > take care of the rest).
> >
> > The main question is what to do on the client side when the server
> > changes its mind and wants to write to that file system. On the
> > server side, there's a clear synchronization point:
> > sb->s_hard_readonly_users needs to be decremented, and so we don't
> > have to worry about a hard readonly exported file system going
> > read-write willy-nilly.
> >
> > But the client has to cope with the sudden withdrawal of the read-only
> > guarantee. A lowest common denominator starting point is to treat it
> > as though the mount went away entirely, and force the client to
> > remount and/or reboot. I also have vague ideas about doing something
> > smart with stale file handles and generation numbers to avoid a
> > remount. This looks a little bit like the forced umount patches too,
> > where we could EIO any open file descriptors on the old file system.
> >
> > How long would it take to implement the dumb "NFS server not
> > responding" version?
> >
> > -VAL
>
> Ok, so the problem is this:
>
> You have a client with the aforementioned union mount (r/o NFS layer
> with a local r/w layer on top). "Something" changes on the server and
> you need a way to cope with the change?
>
> What happens if you do nothing here and just expect the client to deal
> with it? Obviously you have the potential for inconsistent data on the
> clients until they remount, along with problems like -ESTALE errors, etc.
>
> For the use case you describe, however, an admin would have to be insane
> to think that they could safely change the filesystem while it was
> online and serving out data to clients. If I had a cluster like you
> describe, my upgrade plan would look like this:
>
> 1) update a copy of the master r/o filesystem offline
> 2) test it, test it, test it
> 3) shut down the clients
> 4) unexport the old filesystem, export the new one
> 5) bring the clients back up
>
> ...anything else would be playing with fire.

Yes, you are totally correct, that's the only scenario that would
actually work. This feature is just detecting when someone tries to do
this without step 3).

> Unfortunately I haven't been keeping up with your patchset as well as I
> probably should. What happens to the r/w layer when the r/o layer
> changes? Does it become completely invalid and you have to rebuild it?
> Or can it cope with a situation where the r/o filesystem is changed
> while the r/w layer isn't mounted on top of it?

The short version is that we can't cope with the r/o file system being
changed while it's mounted as the bottom layer of a union mount on a
client. This assumption is what makes a non-panicking union mount
implementation possible.

What I need can be summarized in the distinction between the following
scenarios:

Scenario A: The NFS server reboots while a client has the file system
mounted as the r/o layer of a union mount. The server does not change
the exported file system at all and re-exports it as hard read-only.
This should work.

Scenario B: The NFS server reboots as in the above scenario, but
performs "touch /exports/client_root/a_file" before re-exporting the
file system as hard read-only. This is _not_ okay and in some form
will cause a panic on the client if the client doesn't detect it and
stop accessing the mount.

How to tell the difference between scenarios A and B?

Thanks,

-VAL
* Re: NFS hard read-only mount option - again
From: Jeff Layton @ 2010-04-28 20:34 UTC
To: Valerie Aurora; +Cc: linux-fsdevel, Trond Myklebust, J. Bruce Fields

On Wed, 28 Apr 2010 16:07:46 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

>
> What I need can be summarized in the distinction between the following
> scenarios:
>
> Scenario A: The NFS server reboots while a client has the file system
> mounted as the r/o layer of a union mount. The server does not change
> the exported file system at all and re-exports it as hard read-only.
> This should work.
>

Nitpick: This should be fine regardless of how it's exported. You
don't want the clients going bonkers just because someone pulled the
plug on the server accidentally. NFS was designed such that clients
really shouldn't be affected when the server reboots (aside from
stalling out on RPC calls while the server comes back up).

> Scenario B: The NFS server reboots as in the above scenario, but
> performs "touch /exports/client_root/a_file" before re-exporting the
> file system as hard read-only. This is _not_ okay and in some form
> will cause a panic on the client if the client doesn't detect it and
> stop accessing the mount.
>
> How to tell the difference between scenarios A and B?
>

I don't believe you can, at least not with standard NFS protocols. I
think the best you can do is detect these problems on an as-needed
basis. Anything that relies on server behavior won't be very robust.

In the above case, the mtime on /exports/client_root will (likely)
have changed. At that point you can try to handle the situation
without oopsing.

There are quite a few ways to screw up your NFS server too that don't
involve changing the underlying fs. Suppose the server is clustered
and someone screws up the fsid's such that you get ESTALE's when you
try to access the root inode? It would be best if union mounts could
cope with that sort of problem without oopsing (even if the
alternative is EIO's for everything that touches it).

-- 
Jeff Layton <jlayton@redhat.com>
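To make the "as-needed" check above concrete, here is a trivial userspace
sketch (not the in-kernel mechanism): compare the exported root's mtime
against a value saved when the union was set up. The mount point
/mnt/ro_root is an assumption for the example, and, as noted later in the
thread, coarse-grained server timestamps can make this kind of check miss
changes.

#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

/*
 * Userspace illustration only: returns nonzero if the exported root's
 * mtime differs from the value recorded earlier, i.e. the "likely
 * changed" signal described above.
 */
static int root_changed(const char *mountpoint, time_t saved_mtime)
{
	struct stat st;

	if (stat(mountpoint, &st) != 0)
		return 1;	/* can't stat it at all: treat as changed */
	return st.st_mtime != saved_mtime;
}

int main(void)
{
	const char *root = "/mnt/ro_root";	/* assumed mount point */
	struct stat st;

	if (stat(root, &st) != 0)
		return 1;

	/* ... later, after the server has rebooted and re-exported ... */
	if (root_changed(root, st.st_mtime))
		fprintf(stderr, "%s: lower layer changed on the server\n", root);
	return 0;
}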
* Re: NFS hard read-only mount option - again
From: J. Bruce Fields @ 2010-04-28 20:56 UTC
To: Jeff Layton; +Cc: Valerie Aurora, linux-fsdevel, Trond Myklebust

On Wed, Apr 28, 2010 at 04:34:47PM -0400, Jeff Layton wrote:
> On Wed, 28 Apr 2010 16:07:46 -0400
> Valerie Aurora <vaurora@redhat.com> wrote:
>
> >
> > What I need can be summarized in the distinction between the following
> > scenarios:
> >
> > Scenario A: The NFS server reboots while a client has the file system
> > mounted as the r/o layer of a union mount. The server does not change
> > the exported file system at all and re-exports it as hard read-only.
> > This should work.
> >
>
> Nitpick: This should be fine regardless of how it's exported. You
> don't want the clients going bonkers just because someone pulled the
> plug on the server accidentally. NFS was designed such that clients
> really shouldn't be affected when the server reboots (aside from
> stalling out on RPC calls while the server comes back up).
>
> > Scenario B: The NFS server reboots as in the above scenario, but
> > performs "touch /exports/client_root/a_file" before re-exporting the
> > file system as hard read-only. This is _not_ okay and in some form
> > will cause a panic on the client if the client doesn't detect it and
> > stop accessing the mount.
> >
> > How to tell the difference between scenarios A and B?
> >
>
> I don't believe you can, at least not with standard NFS protocols. I
> think the best you can do is detect these problems on an as-needed
> basis. Anything that relies on server behavior won't be very robust.

Yeah. Even if the server had a way to tell the client "this filesystem
will never ever change, I promise" (and actually I think 4.1 might have
something like that--see STATUS4_FIXED?)--there are still so many
opportunities for operator error, network problems, etc., that in
practice a client that panics in that situation probably isn't going to
be considered reliable or secure.

So the unionfs code has to be prepared to deal with the possibility. If
dealing with it fairly harshly is the simplest thing to do for now, I
agree, that sounds fine--but panicking sounds too harsh!

I'm not sure if we're answering your question.

--b.
* Re: NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-05-04 22:51 UTC
To: J. Bruce Fields; +Cc: Jeff Layton, linux-fsdevel, Trond Myklebust

On Wed, Apr 28, 2010 at 04:56:00PM -0400, J. Bruce Fields wrote:
> On Wed, Apr 28, 2010 at 04:34:47PM -0400, Jeff Layton wrote:
> > On Wed, 28 Apr 2010 16:07:46 -0400
> > Valerie Aurora <vaurora@redhat.com> wrote:
> >
> > >
> > > What I need can be summarized in the distinction between the following
> > > scenarios:
> > >
> > > Scenario A: The NFS server reboots while a client has the file system
> > > mounted as the r/o layer of a union mount. The server does not change
> > > the exported file system at all and re-exports it as hard read-only.
> > > This should work.
> > >
> >
> > Nitpick: This should be fine regardless of how it's exported. You
> > don't want the clients going bonkers just because someone pulled the
> > plug on the server accidentally. NFS was designed such that clients
> > really shouldn't be affected when the server reboots (aside from
> > stalling out on RPC calls while the server comes back up).
> >
> > > Scenario B: The NFS server reboots as in the above scenario, but
> > > performs "touch /exports/client_root/a_file" before re-exporting the
> > > file system as hard read-only. This is _not_ okay and in some form
> > > will cause a panic on the client if the client doesn't detect it and
> > > stop accessing the mount.
> > >
> > > How to tell the difference between scenarios A and B?
> > >
> >
> > I don't believe you can, at least not with standard NFS protocols. I
> > think the best you can do is detect these problems on an as-needed
> > basis. Anything that relies on server behavior won't be very robust.
>
> Yeah. Even if the server had a way to tell the client "this filesystem
> will never ever change, I promise" (and actually I think 4.1 might have
> something like that--see STATUS4_FIXED?)--there are still so many
> opportunities for operator error, network problems, etc., that in
> practice a client that panics in that situation probably isn't going to
> be considered reliable or secure.
>
> So the unionfs code has to be prepared to deal with the possibility. If
> dealing with it fairly harshly is the simplest thing to do for now, I
> agree, that sounds fine--but panicking sounds too harsh!
>
> I'm not sure if we're answering your question.

This is definitely going in the right direction, thank you. Mainly I'm
just really ignorant of actual NFS implementation. :)

Let's focus on detecting a write to a file or directory the client has
read and still has in cache. This would be the case of an NFS dentry
in cache on the client that is written on the server. So what is the
actual code path if the client has an NFS dentry in cache and it is
altered or goes away on the client? Can we hook in there and disable
the union mount? Is this a totally dumb idea?

-VAL
* Re: NFS hard read-only mount option - again
From: Jeff Layton @ 2010-05-05 7:22 UTC
To: Valerie Aurora; +Cc: J. Bruce Fields, linux-fsdevel, Trond Myklebust

On Tue, 4 May 2010 18:51:56 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> On Wed, Apr 28, 2010 at 04:56:00PM -0400, J. Bruce Fields wrote:
> > On Wed, Apr 28, 2010 at 04:34:47PM -0400, Jeff Layton wrote:
> > > On Wed, 28 Apr 2010 16:07:46 -0400
> > > Valerie Aurora <vaurora@redhat.com> wrote:
> > >
> > > >
> > > > What I need can be summarized in the distinction between the following
> > > > scenarios:
> > > >
> > > > Scenario A: The NFS server reboots while a client has the file system
> > > > mounted as the r/o layer of a union mount. The server does not change
> > > > the exported file system at all and re-exports it as hard read-only.
> > > > This should work.
> > > >
> > >
> > > Nitpick: This should be fine regardless of how it's exported. You
> > > don't want the clients going bonkers just because someone pulled the
> > > plug on the server accidentally. NFS was designed such that clients
> > > really shouldn't be affected when the server reboots (aside from
> > > stalling out on RPC calls while the server comes back up).
> > >
> > > > Scenario B: The NFS server reboots as in the above scenario, but
> > > > performs "touch /exports/client_root/a_file" before re-exporting the
> > > > file system as hard read-only. This is _not_ okay and in some form
> > > > will cause a panic on the client if the client doesn't detect it and
> > > > stop accessing the mount.
> > > >
> > > > How to tell the difference between scenarios A and B?
> > > >
> > >
> > > I don't believe you can, at least not with standard NFS protocols. I
> > > think the best you can do is detect these problems on an as-needed
> > > basis. Anything that relies on server behavior won't be very robust.
> >
> > Yeah. Even if the server had a way to tell the client "this filesystem
> > will never ever change, I promise" (and actually I think 4.1 might have
> > something like that--see STATUS4_FIXED?)--there are still so many
> > opportunities for operator error, network problems, etc., that in
> > practice a client that panics in that situation probably isn't going to
> > be considered reliable or secure.
> >
> > So the unionfs code has to be prepared to deal with the possibility. If
> > dealing with it fairly harshly is the simplest thing to do for now, I
> > agree, that sounds fine--but panicking sounds too harsh!
> >
> > I'm not sure if we're answering your question.
>
> This is definitely going in the right direction, thank you. Mainly I'm
> just really ignorant of actual NFS implementation. :)
>
> Let's focus on detecting a write to a file or directory the client has
> read and still has in cache. This would be the case of an NFS dentry
> in cache on the client that is written on the server. So what is the
> actual code path if the client has an NFS dentry in cache and it is
> altered or goes away on the client? Can we hook in there and disable
> the union mount? Is this a totally dumb idea?
>
> -VAL

Well...we typically can tell if an inode changed -- see
nfs_update_inode for most of that logic. Note that the methods we use
there are not perfect -- NFSv2/3 rely heavily on timestamps, and if
the server is using a filesystem with coarse-grained timestamps (e.g.
ext3) then it's possible for things to change and the client won't
notice (whee!)

Dentries don't really change like inodes do, but we do generally check
whether they are correct before trusting them. That's done via the
d_revalidate methods for NFS. Mostly that involves checking whether
the directory that contains it has changed since the dentry was
spawned.

That's probably where you'll want to place your hooks, but I wonder
whether it would be better to do that at a higher level -- in the
generic VFS. Whenever a d_revalidate op returns false, then you know
that something has happened.

-- 
Jeff Layton <jlayton@redhat.com>
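A rough sketch of the generic-VFS hook suggested above follows. It is only an
illustration of the idea under discussion: union_is_lower_layer() and
union_mark_dead() are hypothetical helpers (not existing kernel functions),
and failing further access with -EIO rather than panicking follows the
behavior suggested earlier in the thread.

#include <linux/fs.h>
#include <linux/dcache.h>
#include <linux/namei.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Hypothetical union mount helpers -- assumptions for illustration. */
extern bool union_is_lower_layer(struct dentry *dentry);
extern void union_mark_dead(struct super_block *sb);

/*
 * Illustrative wrapper around ->d_revalidate at the VFS level. If a
 * dentry in the supposedly-immutable lower layer of a union mount
 * fails revalidation, the read-only guarantee has been violated, so
 * shut the union down and fail further access instead of risking an
 * oops later.
 */
static int union_aware_revalidate(struct dentry *dentry, struct nameidata *nd)
{
	int status;

	if (!dentry->d_op || !dentry->d_op->d_revalidate)
		return 1;	/* nothing to check; assume valid */

	status = dentry->d_op->d_revalidate(dentry, nd);
	if (status <= 0 && union_is_lower_layer(dentry)) {
		/* Lower layer changed on the server: disable the union. */
		union_mark_dead(dentry->d_sb);
		return -EIO;
	}
	return status;
}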
* Re: NFS hard read-only mount option - again
From: Valerie Aurora @ 2010-05-06 15:01 UTC
To: Jeff Layton; +Cc: J. Bruce Fields, linux-fsdevel, Trond Myklebust

On Wed, May 05, 2010 at 09:22:30AM +0200, Jeff Layton wrote:
> On Tue, 4 May 2010 18:51:56 -0400
> Valerie Aurora <vaurora@redhat.com> wrote:
>
> >
> > This is definitely going in the right direction, thank you. Mainly I'm
> > just really ignorant of actual NFS implementation. :)
> >
> > Let's focus on detecting a write to a file or directory the client has
> > read and still has in cache. This would be the case of an NFS dentry
> > in cache on the client that is written on the server. So what is the
> > actual code path if the client has an NFS dentry in cache and it is
> > altered or goes away on the client? Can we hook in there and disable
> > the union mount? Is this a totally dumb idea?
> >
> > -VAL
>
> Well...we typically can tell if an inode changed -- see
> nfs_update_inode for most of that logic. Note that the methods we use
> there are not perfect -- NFSv2/3 rely heavily on timestamps, and if
> the server is using a filesystem with coarse-grained timestamps (e.g.
> ext3) then it's possible for things to change and the client won't
> notice (whee!)
>
> Dentries don't really change like inodes do, but we do generally check
> whether they are correct before trusting them. That's done via the
> d_revalidate methods for NFS. Mostly that involves checking whether
> the directory that contains it has changed since the dentry was
> spawned.
>
> That's probably where you'll want to place your hooks, but I wonder
> whether it would be better to do that at a higher level -- in the
> generic VFS. Whenever a d_revalidate op returns false, then you know
> that something has happened.

My gut feeling is that you are right and the VFS call of the
d_revalidate op is the right place to check this. My guess is that
->d_revalidate() should never fail in the lower/read-only layers of a
union mount, no matter what the file system is. Can you think of a
d_revalidate implementation that would fail for a reason other than a
write on the server?

Thanks,

-VAL