* [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags @ 2013-09-16 17:42 Aditya Kali 2013-09-17 2:40 ` Al Viro 0 siblings, 1 reply; 7+ messages in thread From: Aditya Kali @ 2013-09-16 17:42 UTC (permalink / raw) To: viro, linux-fsdevel, linux-kernel; +Cc: anatol, Aditya Kali During remount of a bind mount (mount -o remount,bind,ro,... /mnt/mntpt), we currently take down_write(&sb->s_umount). This causes the remount operation to get blocked behind writes occuring on device (possibly mounted somewhere else). We have observed that simply trying to change the bind-mount from read-write to read-only can take several seconds becuase writeback is in progress. Looking at the code it seems to me that we need s_umount lock only around the do_remount_sb() call. vfsmount_lock seems enough to protect the flag change on the mount. So this patch fixes the locking so that changing of flags can happen outside the down_write(&sb->s_umount). I wanted to get comments if I am violating any assumption around this code. Another thing that I was curious about was if we need the {lock|unlock}_mount(path) around this code. Please advise. Signed-off-by: Aditya Kali <adityakali@google.com> --- fs/namespace.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index da5c494..4b9c839 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1838,20 +1838,21 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (err) return err; - down_write(&sb->s_umount); if (flags & MS_BIND) err = change_mount_flags(path->mnt, flags); else if (!capable(CAP_SYS_ADMIN)) err = -EPERM; - else + else { + down_write(&sb->s_umount); err = do_remount_sb(sb, flags, data, 0); + up_write(&sb->s_umount); + } if (!err) { br_write_lock(&vfsmount_lock); mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; mnt->mnt.mnt_flags = mnt_flags; br_write_unlock(&vfsmount_lock); } - up_write(&sb->s_umount); if (!err) { br_write_lock(&vfsmount_lock); touch_mnt_namespace(mnt->mnt_ns); -- 1.8.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags 2013-09-16 17:42 [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags Aditya Kali @ 2013-09-17 2:40 ` Al Viro 2013-09-19 20:13 ` Aditya Kali 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2013-09-17 2:40 UTC (permalink / raw) To: Aditya Kali; +Cc: linux-fsdevel, linux-kernel, anatol On Mon, Sep 16, 2013 at 10:42:30AM -0700, Aditya Kali wrote: > During remount of a bind mount (mount -o remount,bind,ro,... /mnt/mntpt), > we currently take down_write(&sb->s_umount). This causes the remount > operation to get blocked behind writes occuring on device (possibly > mounted somewhere else). We have observed that simply trying to change > the bind-mount from read-write to read-only can take several seconds > becuase writeback is in progress. Looking at the code it seems to me that > we need s_umount lock only around the do_remount_sb() call. > vfsmount_lock seems enough to protect the flag change on the mount. > So this patch fixes the locking so that changing of flags can happen > outside the down_write(&sb->s_umount). What's to prevent mount -o remount,ro /mnt and mount -o remount,rw,nodev /mnt racing and ending up with that sucker rw and without nodev? As for lock_mount... nope - we carefully do *not* hold namespace_sem over any kind of fs operations. Anything getting stuck while holding it will have really nasty consequences. So ->s_umount here is inelegant, but alternatives sucks worse... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags 2013-09-17 2:40 ` Al Viro @ 2013-09-19 20:13 ` Aditya Kali 2013-09-30 17:54 ` Aditya Kali 0 siblings, 1 reply; 7+ messages in thread From: Aditya Kali @ 2013-09-19 20:13 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel, anatol On 09/16/2013 07:40 PM, Al Viro wrote: > On Mon, Sep 16, 2013 at 10:42:30AM -0700, Aditya Kali wrote: >> During remount of a bind mount (mount -o remount,bind,ro,... /mnt/mntpt), >> we currently take down_write(&sb->s_umount). This causes the remount >> operation to get blocked behind writes occuring on device (possibly >> mounted somewhere else). We have observed that simply trying to change >> the bind-mount from read-write to read-only can take several seconds >> becuase writeback is in progress. Looking at the code it seems to me that >> we need s_umount lock only around the do_remount_sb() call. >> vfsmount_lock seems enough to protect the flag change on the mount. >> So this patch fixes the locking so that changing of flags can happen >> outside the down_write(&sb->s_umount). > > What's to prevent mount -o remount,ro /mnt and mount -o remount,rw,nodev /mnt > racing and ending up with that sucker rw and without nodev? Thanks for the reply! I see the problem in my patch. Please find the second attempt at this patch below. I have tried to keep the non-MS_BIND remount semantics same while moving the MS_BIND remount code outside of s_umount lock. Is it OK to not synchronize the non-MS_BIND do_remount_sb() call with change of mnt_flags in MS_BIND case? --- fs/namespace.c | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index da5c494..25c4faf 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -454,11 +454,13 @@ void mnt_drop_write_file(struct file *file) } EXPORT_SYMBOL(mnt_drop_write_file); +/* + * Must be called under br_write_lock(&vfsmount_lock); + */ static int mnt_make_readonly(struct mount *mnt) { int ret = 0; - br_write_lock(&vfsmount_lock); mnt->mnt.mnt_flags |= MNT_WRITE_HOLD; /* * After storing MNT_WRITE_HOLD, we'll read the counters. This store @@ -492,15 +494,15 @@ static int mnt_make_readonly(struct mount *mnt) */ smp_wmb(); mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD; - br_write_unlock(&vfsmount_lock); return ret; } +/* + * Must be called under br_write_lock(&vfsmount_lock); + */ static void __mnt_unmake_readonly(struct mount *mnt) { - br_write_lock(&vfsmount_lock); mnt->mnt.mnt_flags &= ~MNT_READONLY; - br_write_unlock(&vfsmount_lock); } int sb_prepare_remount_readonly(struct super_block *sb) @@ -1838,20 +1840,27 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (err) return err; - down_write(&sb->s_umount); - if (flags & MS_BIND) + if (flags & MS_BIND) { + br_write_lock(&vfsmount_lock); err = change_mount_flags(path->mnt, flags); - else if (!capable(CAP_SYS_ADMIN)) + if (!err) { + mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; + mnt->mnt.mnt_flags = mnt_flags; + } + br_write_unlock(&vfsmount_lock); + } else if (!capable(CAP_SYS_ADMIN)) err = -EPERM; - else + else { + down_write(&sb->s_umount); err = do_remount_sb(sb, flags, data, 0); - if (!err) { - br_write_lock(&vfsmount_lock); - mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; - mnt->mnt.mnt_flags = mnt_flags; - br_write_unlock(&vfsmount_lock); + if (!err) { + br_write_lock(&vfsmount_lock); + mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; + mnt->mnt.mnt_flags = mnt_flags; + br_write_unlock(&vfsmount_lock); + } + up_write(&sb->s_umount); } - up_write(&sb->s_umount); if (!err) { br_write_lock(&vfsmount_lock); touch_mnt_namespace(mnt->mnt_ns); -- 1.8.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags 2013-09-19 20:13 ` Aditya Kali @ 2013-09-30 17:54 ` Aditya Kali 2013-09-30 18:13 ` Aditya Kali 0 siblings, 1 reply; 7+ messages in thread From: Aditya Kali @ 2013-09-30 17:54 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel, Anatol Pomazau Hi Al and other fs-developers, Please let me know what you think about this patch. Thanks, -- Aditya On Thu, Sep 19, 2013 at 1:13 PM, Aditya Kali <adityakali@google.com> wrote: > > > On 09/16/2013 07:40 PM, Al Viro wrote: >> >> On Mon, Sep 16, 2013 at 10:42:30AM -0700, Aditya Kali wrote: >>> >>> During remount of a bind mount (mount -o remount,bind,ro,... /mnt/mntpt), >>> we currently take down_write(&sb->s_umount). This causes the remount >>> operation to get blocked behind writes occuring on device (possibly >>> mounted somewhere else). We have observed that simply trying to change >>> the bind-mount from read-write to read-only can take several seconds >>> becuase writeback is in progress. Looking at the code it seems to me that >>> we need s_umount lock only around the do_remount_sb() call. >>> vfsmount_lock seems enough to protect the flag change on the mount. >>> So this patch fixes the locking so that changing of flags can happen >>> outside the down_write(&sb->s_umount). >> >> >> What's to prevent mount -o remount,ro /mnt and mount -o remount,rw,nodev >> /mnt >> racing and ending up with that sucker rw and without nodev? > > > Thanks for the reply! I see the problem in my patch. Please find the second > attempt at this patch below. I have tried to keep the non-MS_BIND remount > semantics same while moving the MS_BIND remount code outside of s_umount > lock. Is it OK to not synchronize the non-MS_BIND do_remount_sb() call with > change of mnt_flags in MS_BIND case? > > > --- > fs/namespace.c | 37 +++++++++++++++++++++++-------------- > 1 file changed, 23 insertions(+), 14 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index da5c494..25c4faf 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > > @@ -454,11 +454,13 @@ void mnt_drop_write_file(struct file *file) > } > EXPORT_SYMBOL(mnt_drop_write_file); > > +/* > + * Must be called under br_write_lock(&vfsmount_lock); > + */ > static int mnt_make_readonly(struct mount *mnt) > { > int ret = 0; > > - br_write_lock(&vfsmount_lock); > mnt->mnt.mnt_flags |= MNT_WRITE_HOLD; > /* > * After storing MNT_WRITE_HOLD, we'll read the counters. This store > @@ -492,15 +494,15 @@ static int mnt_make_readonly(struct mount *mnt) > */ > smp_wmb(); > mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD; > - br_write_unlock(&vfsmount_lock); > return ret; > } > > +/* > + * Must be called under br_write_lock(&vfsmount_lock); > + */ > static void __mnt_unmake_readonly(struct mount *mnt) > { > - br_write_lock(&vfsmount_lock); > mnt->mnt.mnt_flags &= ~MNT_READONLY; > - br_write_unlock(&vfsmount_lock); > } > > int sb_prepare_remount_readonly(struct super_block *sb) > @@ -1838,20 +1840,27 @@ static int do_remount(struct path *path, int flags, > int mnt_flags, > > if (err) > return err; > > - down_write(&sb->s_umount); > - if (flags & MS_BIND) > + if (flags & MS_BIND) { > + br_write_lock(&vfsmount_lock); > err = change_mount_flags(path->mnt, flags); > - else if (!capable(CAP_SYS_ADMIN)) > + if (!err) { > + mnt_flags |= mnt->mnt.mnt_flags & > MNT_PROPAGATION_MASK; > + mnt->mnt.mnt_flags = mnt_flags; > + } > + br_write_unlock(&vfsmount_lock); > + } else if (!capable(CAP_SYS_ADMIN)) > > err = -EPERM; > - else > + else { > + down_write(&sb->s_umount); > err = do_remount_sb(sb, flags, data, 0); > - if (!err) { > - br_write_lock(&vfsmount_lock); > - mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; > - mnt->mnt.mnt_flags = mnt_flags; > - br_write_unlock(&vfsmount_lock); > + if (!err) { > + br_write_lock(&vfsmount_lock); > + mnt_flags |= mnt->mnt.mnt_flags & > MNT_PROPAGATION_MASK; > + mnt->mnt.mnt_flags = mnt_flags; > + br_write_unlock(&vfsmount_lock); > + } > + up_write(&sb->s_umount); > } > - up_write(&sb->s_umount); > if (!err) { > br_write_lock(&vfsmount_lock); > touch_mnt_namespace(mnt->mnt_ns); > -- > 1.8.4 > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags 2013-09-30 17:54 ` Aditya Kali @ 2013-09-30 18:13 ` Aditya Kali 2013-09-30 20:03 ` Al Viro 0 siblings, 1 reply; 7+ messages in thread From: Aditya Kali @ 2013-09-30 18:13 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, linux-kernel, Anatol Pomazau, Theodore Ts'o, tj, axboe +Ted Ts'o, Tejun Heo, Jens Axboe On 09/30/2013 10:54 AM, Aditya Kali wrote: > Hi Al and other fs-developers, > > Please let me know what you think about this patch. > > Thanks, > > On Thu, Sep 19, 2013 at 1:13 PM, Aditya Kali <adityakali@google.com> wrote: >> >> >> On 09/16/2013 07:40 PM, Al Viro wrote: >>> >>> On Mon, Sep 16, 2013 at 10:42:30AM -0700, Aditya Kali wrote: >>>> >>>> During remount of a bind mount (mount -o remount,bind,ro,... /mnt/mntpt), >>>> we currently take down_write(&sb->s_umount). This causes the remount >>>> operation to get blocked behind writes occuring on device (possibly >>>> mounted somewhere else). We have observed that simply trying to change >>>> the bind-mount from read-write to read-only can take several seconds >>>> becuase writeback is in progress. Looking at the code it seems to me that >>>> we need s_umount lock only around the do_remount_sb() call. >>>> vfsmount_lock seems enough to protect the flag change on the mount. >>>> So this patch fixes the locking so that changing of flags can happen >>>> outside the down_write(&sb->s_umount). >>> >>> >>> What's to prevent mount -o remount,ro /mnt and mount -o remount,rw,nodev >>> /mnt >>> racing and ending up with that sucker rw and without nodev? >> >> >> Thanks for the reply! I see the problem in my patch. Please find the second >> attempt at this patch below. I have tried to keep the non-MS_BIND remount >> semantics same while moving the MS_BIND remount code outside of s_umount >> lock. Is it OK to not synchronize the non-MS_BIND do_remount_sb() call with >> change of mnt_flags in MS_BIND case? >> --- fs/namespace.c | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index da5c494..25c4faf 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -454,11 +454,13 @@ void mnt_drop_write_file(struct file *file) } EXPORT_SYMBOL(mnt_drop_write_file); +/* + * Must be called under br_write_lock(&vfsmount_lock); + */ static int mnt_make_readonly(struct mount *mnt) { int ret = 0; - br_write_lock(&vfsmount_lock); mnt->mnt.mnt_flags |= MNT_WRITE_HOLD; /* * After storing MNT_WRITE_HOLD, we'll read the counters. This store @@ -492,15 +494,15 @@ static int mnt_make_readonly(struct mount *mnt) */ smp_wmb(); mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD; - br_write_unlock(&vfsmount_lock); return ret; } +/* + * Must be called under br_write_lock(&vfsmount_lock); + */ static void __mnt_unmake_readonly(struct mount *mnt) { - br_write_lock(&vfsmount_lock); mnt->mnt.mnt_flags &= ~MNT_READONLY; - br_write_unlock(&vfsmount_lock); } int sb_prepare_remount_readonly(struct super_block *sb) @@ -1838,20 +1840,27 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (err) return err; - down_write(&sb->s_umount); - if (flags & MS_BIND) + if (flags & MS_BIND) { + br_write_lock(&vfsmount_lock); err = change_mount_flags(path->mnt, flags); - else if (!capable(CAP_SYS_ADMIN)) + if (!err) { + mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; + mnt->mnt.mnt_flags = mnt_flags; + } + br_write_unlock(&vfsmount_lock); + } else if (!capable(CAP_SYS_ADMIN)) err = -EPERM; - else + else { + down_write(&sb->s_umount); err = do_remount_sb(sb, flags, data, 0); - if (!err) { - br_write_lock(&vfsmount_lock); - mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; - mnt->mnt.mnt_flags = mnt_flags; - br_write_unlock(&vfsmount_lock); + if (!err) { + br_write_lock(&vfsmount_lock); + mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK; + mnt->mnt.mnt_flags = mnt_flags; + br_write_unlock(&vfsmount_lock); + } + up_write(&sb->s_umount); } - up_write(&sb->s_umount); if (!err) { br_write_lock(&vfsmount_lock); touch_mnt_namespace(mnt->mnt_ns); -- 1.8.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags 2013-09-30 18:13 ` Aditya Kali @ 2013-09-30 20:03 ` Al Viro 2013-09-30 21:44 ` Aditya Kali 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2013-09-30 20:03 UTC (permalink / raw) To: Aditya Kali Cc: linux-fsdevel, linux-kernel, Anatol Pomazau, Theodore Ts'o, tj, axboe On Mon, Sep 30, 2013 at 11:13:23AM -0700, Aditya Kali wrote: > +Ted Ts'o, Tejun Heo, Jens Axboe > > > On 09/30/2013 10:54 AM, Aditya Kali wrote: > >Hi Al and other fs-developers, > > > >Please let me know what you think about this patch. Don't top-post, please... What prevents a race between MS_BIND remounts and plain ones? Used to be serialized on ->s_umount, but... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags 2013-09-30 20:03 ` Al Viro @ 2013-09-30 21:44 ` Aditya Kali 0 siblings, 0 replies; 7+ messages in thread From: Aditya Kali @ 2013-09-30 21:44 UTC (permalink / raw) To: Al Viro Cc: linux-fsdevel, linux-kernel, Anatol Pomazau, Theodore Ts'o, tj, axboe On Mon, Sep 30, 2013 at 1:03 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Mon, Sep 30, 2013 at 11:13:23AM -0700, Aditya Kali wrote: >> +Ted Ts'o, Tejun Heo, Jens Axboe >> >> >> On 09/30/2013 10:54 AM, Aditya Kali wrote: >> >Hi Al and other fs-developers, >> > >> >Please let me know what you think about this patch. > > Don't top-post, please... What prevents a race between MS_BIND remounts > and plain ones? Used to be serialized on ->s_umount, but... The rational here is that MS_BIND remounts need to act on the bind mount-point only (struct vfsmount *) and not on the underlying superblock (struct super_block *). So, intuitively, it should not be necessary to take a superblock level lock for the MS_BIND case. As for the modification of mnt_flags on vfsmount, there is already vfsmount_lock taken in both MS_BIND and non-MS_BIND case to prevent the race. Following is the example that demonstrates the problem that I am trying to address with this patch: (1) /dev/sda is mounted at /mnt/sda (2) /mnt/sda/users/user1 is bind mounted at /home/user1/ ; /mnt/sda/users/user2 is bind mounted at /home/user2/ (3) user1 is doing buffered writes in /home/user1/logs/ (4) admin tries to make /home/user2/bin/ read-only using a bind-mounts: $ mount --bind /home/user2/bin/ /home/user2/bin/ # this is fast $ mount --bind -o remount,ro /home/user2/bin/ # this blocks behind any writeback happening on sda (because of sb->s_umount write lock) Please advise. Thanks, -- Aditya ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-09-30 21:44 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-16 17:42 [RFC] vfs: avoid sb->s_umount lock while changing bind-mount flags Aditya Kali 2013-09-17 2:40 ` Al Viro 2013-09-19 20:13 ` Aditya Kali 2013-09-30 17:54 ` Aditya Kali 2013-09-30 18:13 ` Aditya Kali 2013-09-30 20:03 ` Al Viro 2013-09-30 21:44 ` Aditya Kali
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).