From: Miklos Szeredi <miklos@szeredi.hu>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Karel Zak <kzak@redhat.com>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2] proc/mounts: add cursor
Date: Thu, 9 Apr 2020 21:36:35 +0200
Message-ID: <CAJfpegtZ3T+1bN-pg-vmVvWZs-7chDWxBr0T+j4x_Lt4x0T8MQ@mail.gmail.com>
In-Reply-To: <20200409183008.GG23230@ZenIV.linux.org.uk>

On Thu, Apr 9, 2020 at 8:30 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Apr 09, 2020 at 05:54:46PM +0100, Al Viro wrote:
> > On Thu, Apr 09, 2020 at 05:50:48PM +0100, Al Viro wrote:
> > > On Thu, Apr 09, 2020 at 04:16:19PM +0200, Miklos Szeredi wrote:
> > > > Solve this by adding a cursor entry for each open instance.  Taking the
> > > > global namespace_sem for write seems excessive, since we are only dealing
> > > > with a per-namespace list.  Instead add a per-namespace spinlock and use
> > > > that together with namespace_sem taken for read to protect against
> > > > concurrent modification of the mount list.  This may reduce parallelism of
> > > > is_local_mountpoint(), but it's hardly a big contention point.  We could
> > > > also use RCU freeing of cursors to make traversal not need additional
> > > > locks, if that turns out to be necessary.
> > >
> > > Umm...  That can do more than reduction of parallelism - longer lists take
> > > longer to scan and moving cursors dirties cachelines in a bunch of struct
> > > mount instances.  And I'm not convinced that your locking in m_next() is
> > > correct.
> > >
> > > What's to stop umount_tree() from removing the next entry from the list
> > > just as your m_next() tries to move the cursor?  I don't see any common
> > > locks for those two...
> >
> > Ah, you still have namespace_sem taken (shared) by m_start().  Nevermind
> > that one, then...  Let me get through mnt_list users and see if I can
> > catch anything.
>
> OK...  Locking is safe, but it's not obvious.  And your changes do make it
> scarier.   There are several kinds of lists that can be threaded through
> ->mnt_list and your code depends upon never having those suckers appear
> in e.g. anon namespace ->list.  It is true (AFAICS), but...

See analysis below.
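
For reference, the core of the cursor scheme under discussion looks
roughly like this (simplified sketch, not the patch hunks verbatim;
mnt_is_cursor(), ns->ns_lock and the cursor member of struct
proc_mounts are what the patch adds, other details are approximated):

/* find the next real mount after *p, skipping any readers' cursors */
static struct mount *mnt_list_next(struct mnt_namespace *ns,
				   struct list_head *p)
{
	struct mount *mnt, *ret = NULL;

	/* caller (m_start/m_next) holds namespace_sem for read */
	spin_lock(&ns->ns_lock);
	list_for_each_continue(p, &ns->list) {
		mnt = list_entry(p, typeof(*mnt), mnt_list);
		if (!mnt_is_cursor(mnt)) {
			ret = mnt;
			break;
		}
	}
	spin_unlock(&ns->ns_lock);

	return ret;
}

static void *m_start(struct seq_file *m, loff_t *pos)
{
	struct proc_mounts *p = m->private;
	struct list_head *prev;

	down_read(&namespace_sem);
	if (!*pos) {
		prev = &p->ns->list;		/* first read: start at the head */
	} else {
		prev = &p->cursor.mnt_list;	/* continue from our own cursor */
		if (list_empty(prev))		/* cursor was unhooked at EOF */
			return NULL;
	}
	return mnt_list_next(p->ns, prev);
}

static void *m_next(struct seq_file *m, void *v, loff_t *pos)
{
	struct proc_mounts *p = m->private;
	struct mount *mnt = v;

	++*pos;
	return mnt_list_next(p->ns, &mnt->mnt_list);
}

static void m_stop(struct seq_file *m, void *v)
{
	struct proc_mounts *p = m->private;
	struct mount *mnt = v;

	/* park the cursor just before the next entry to show, or unhook at EOF */
	spin_lock(&p->ns->ns_lock);
	if (mnt)
		list_move_tail(&p->cursor.mnt_list, &mnt->mnt_list);
	else
		list_del_init(&p->cursor.mnt_list);
	spin_unlock(&p->ns->ns_lock);
	up_read(&namespace_sem);
}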

> Another fun question is ns->mounts rules - it used to be "the number of
> entries in ns->list", now it's "the number of non-cursor entries there".
> Incidentally, we might have a problem with that logic wrt count_mount().

Nope, count_mount() iterates through the mount tree, not through
mnt_ns->list, and cursors are never attached to the tree, so the count
is unaffected.
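
(For illustration only, that kind of tree walk looks roughly like the
sketch below; next_mnt() is the existing mount-tree iterator in
fs/namespace.c, count_tree() is just a made-up name for the sketch:)

static unsigned int count_tree(struct mount *mnt)
{
	struct mount *p;
	unsigned int n = 0;

	/* follows the parent/child topology, never ns->list */
	for (p = mnt; p; p = next_mnt(p, mnt))
		n++;

	return n;	/* cursors only hang off ns->list, so they can't be counted */
}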

> Sigh...  The damn thing has grown much too convoluted over years ;-/
>
> I'm still not happy with that patch; at the very least it needs a lot more
> detailed analysis to go along with it.

Functions touching mnt_list:

In pnode.c:

umount_one:
umount_list:
propagate_umount: both of the above are called (indirectly) from this one.
Its only caller is umount_tree(), which is reached through several
different call paths, but namespace_sem is taken for write in every
one of them:

do_move_mount
  attach_recursive_mnt
    umount_tree

do_loopback
  graft_tree
    attach_recursive_mnt
      umount_tree

do_new_mount_fc
  do_add_mount
    graft_tree
      attach_recursive_mnt
        umount_tree

finish_automount
  do_add_mount
    graft_tree
      attach_recursive_mnt
        umount_tree

do_umount
  shrink_submounts
    umount_tree

namespace.c:

__is_local_mountpoint: takes namespace_sem for read

commit_tree: has namespace_sem for write (only caller being
attach_recursive_mnt, see above for call paths).

m_start:
m_next:
m_show: all have namespace_sem for read

umount_tree: all callers have namespace_sem for write (see above for call paths)

do_umount: has namespace_sem for write

copy_tree: all members are newly allocated

iterate_mounts: operates on private copy built by collect_mounts()

open_detached_copy: takes namespace_sem for write

copy_mnt_ns: takes namespace_sem for write

mount_subtree: adds onto a newly allocated mnt_namespace

sys_fsmount: ditto

init_mount_tree: ditto

mnt_already_visible: takes namespace_sem for read

The patch adds ns_lock locking to all the places that take
namespace_sem only for read.  So mutual exclusion still holds
everywhere: those taking namespace_sem for write obviously exclude
everyone else, and those taking namespace_sem for read exclude each
other because they also take ns_lock.
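
Schematically, the resulting rules are (sketch of the pattern, not
actual hunks from the patch; ns_lock and mnt_is_cursor() are what the
patch adds):

	/* writers: namespace_sem for write excludes everybody, no ns_lock needed */
	down_write(&namespace_sem);
	list_del_init(&mnt->mnt_list);	/* e.g. umount_tree() pulling entries off */
	up_write(&namespace_sem);

	/*
	 * readers: namespace_sem for read excludes writers but not other
	 * readers; ns_lock serializes the readers against each other, since
	 * each of them may be moving its own cursor on ns->list
	 */
	down_read(&namespace_sem);
	spin_lock(&ns->ns_lock);
	list_for_each_entry(mnt, &ns->list, mnt_list) {
		if (mnt_is_cursor(mnt))
			continue;	/* somebody else's cursor */
		/* ... look at mnt ... */
	}
	spin_unlock(&ns->ns_lock);
	up_read(&namespace_sem);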

Thanks,
Miklos

Thread overview: 7+ messages
2020-04-09 14:16 [PATCH v2] proc/mounts: add cursor Miklos Szeredi
2020-04-09 16:22 ` Aurélien Aptel
2020-04-09 16:50 ` Al Viro
2020-04-09 16:54   ` Al Viro
2020-04-09 18:30     ` Al Viro
2020-04-09 19:36       ` Miklos Szeredi [this message]
2020-04-09 18:45 ` Matthew Wilcox
