Re: [RFC] breakage in sysfs_readdir() and s_instances abuse in sysfs

From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	netdev@vger.kernel.org,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: [RFC] breakage in sysfs_readdir() and s_instances abuse in sysfs
Date: Mon, 06 Jun 2011 12:03:52 -0700
Message-ID: <m1k4cy6auf.fsf@fess.ebiederm.org>
In-Reply-To: <20110604001518.GT11521@ZenIV.linux.org.uk> (Al Viro's message of "Sat, 4 Jun 2011 01:15:19 +0100")

Al Viro <viro@ZenIV.linux.org.uk> writes:

> What I'm planning to do (for unrelated reasons - ubifs needs it) is to add
> an analog of iterate_supers() that would go over the superblocks of given
> type and call a function on them.  I would like to convert sysfs_exit_ns()
> to it and kill the last abuser of s_instances (and one of the last sb_lock
> ones), but that really depends on what kind of locking is needed for
> readdir() and friends - as it is, the damn thing looks *wrong*.

Answering your primary question first: what locking is needed in
sysfs_exit_ns()?

Wrapping my head around this code again, to the best of my memory
the intent was as follows.

For consistency, "info->type[ns]" is an atomic value (void *), so
it can be safely read and written without relying on locks.  The
assumption is that things like sysfs_readdir() can read it and get
a valid value without taking any locks.

sysfs_mutex is needed in things like sysfs_lookup if we want the
value to not change.
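
As a rough userspace illustration of the distinction above (this is
not the sysfs code; struct sb_info, tag_mutex, and the tag_*() helpers
are all hypothetical names, and C11 atomics plus a pthread mutex stand
in for the kernel primitives), a pointer-sized tag can be peeked at
without a lock, while a reader that needs the value to stay put holds
the same mutex the writer takes:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-superblock namespace tag. */
struct sb_info {
	_Atomic(const void *) ns_tag;
};

/* Hypothetical stand-in for sysfs_mutex. */
static pthread_mutex_t tag_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Lock-free read: returns the old or the new pointer, never a torn one.
 * The value may already be stale by the time the caller looks at it. */
static const void *tag_peek(struct sb_info *info)
{
	assert(info != NULL);
	return atomic_load(&info->ns_tag);
}

/* Writer: clear the tag if it still refers to ns. Returns 1 if cleared. */
static int tag_clear_if(struct sb_info *info, const void *ns)
{
	int cleared = 0;

	assert(info != NULL);
	pthread_mutex_lock(&tag_mutex);
	if (atomic_load(&info->ns_tag) == ns) {
		atomic_store(&info->ns_tag, (const void *)NULL);
		cleared = 1;
	}
	pthread_mutex_unlock(&tag_mutex);
	return cleared;
}

/* Reader that needs a stable value: the tag cannot change while the
 * mutex is held, so work that depends on it belongs inside the lock. */
static int tag_matches_stable(struct sb_info *info, const void *ns)
{
	int match;

	assert(info != NULL);
	pthread_mutex_lock(&tag_mutex);
	match = (atomic_load(&info->ns_tag) == ns);
	/* ... work that relies on the tag not changing would go here ... */
	pthread_mutex_unlock(&tag_mutex);
	return match;
}
```

A caller using only tag_peek() gets a valid pointer but no guarantee
that it is still current, which is the kind of window a small race
can slip through.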

There is indeed a small race in sysfs_readdir.

As for sysfs_lookup(), it looks like my code to handle untagged members in
directories where everything else is tagged, such as
"/sys/class/net/bonding_masters", introduced an overloading of what NULL
means in the context of sysfs_readdir() and sysfs_lookup().  ns == NULL can
either mean that we have type == KOBJ_NS_TYPE_NONE, or it can mean that
the namespace has gone away beneath us.  It looks like I need to fix
that.
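
To make the ambiguity concrete, here is a small sketch of one way the
two meanings of NULL could be kept apart.  It is only an illustration
under assumptions: enum tag_state, struct ns_tag, tag_as_pointer() and
entry_visible() are hypothetical names, and the visibility rule is a
guess at reasonable behaviour, not what sysfs does or what the eventual
fix looks like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical explicit state instead of a bare pointer. */
enum tag_state {
	TAG_NONE,	/* mount is not tagged at all */
	TAG_LIVE,	/* mount is tagged with a live namespace */
	TAG_DEAD	/* the namespace has gone away */
};

struct ns_tag {
	enum tag_state state;
	const void *ns;		/* meaningful only when state == TAG_LIVE */
};

/* The ambiguous form: both TAG_NONE and TAG_DEAD collapse to NULL. */
static const void *tag_as_pointer(const struct ns_tag *t)
{
	assert(t != NULL);
	return t->state == TAG_LIVE ? t->ns : NULL;
}

/* Assumed rule: an untagged entry (entry_ns == NULL) is shown unless
 * the mount's namespace is dead; a tagged entry is shown only to a
 * live mount carrying the same tag. */
static int entry_visible(const struct ns_tag *mount, const void *entry_ns)
{
	assert(mount != NULL);
	if (entry_ns == NULL)
		return mount->state != TAG_DEAD;
	return mount->state == TAG_LIVE && mount->ns == entry_ns;
}
```

With only the pointer from tag_as_pointer(), a caller cannot tell the
"untagged" and "namespace gone" cases apart; with the state it can.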

To sysfs_exit_ns() we have the call chain:
cleanup_net()
  ops_exit_list()
     net_kobj_ns_exit()
        sysfs_exit_ns()

This gives the locking order needed for that call path:
net_mutex()
   rtnl_lock()
   sysfs_mutex()
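
A minimal userspace sketch of what a fixed nesting order like this
means in practice (pthread mutexes stand in for the kernel locks;
outer_lock, middle_lock, inner_lock, with_all_locks() and bump() are
hypothetical names, not kernel symbols): every path that needs more
than one of the locks takes them outermost first and drops them in
reverse, so two paths can never wait on each other in a cycle.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins for the three locks, outermost first. */
static pthread_mutex_t outer_lock  = PTHREAD_MUTEX_INITIALIZER; /* net_mutex */
static pthread_mutex_t middle_lock = PTHREAD_MUTEX_INITIALIZER; /* rtnl */
static pthread_mutex_t inner_lock  = PTHREAD_MUTEX_INITIALIZER; /* sysfs_mutex */

typedef void (*locked_fn)(void *arg);

/* Run fn with all three locks held, acquired in the documented order
 * and released in the reverse order. */
static void with_all_locks(locked_fn fn, void *arg)
{
	assert(fn != NULL);
	pthread_mutex_lock(&outer_lock);
	pthread_mutex_lock(&middle_lock);
	pthread_mutex_lock(&inner_lock);

	fn(arg);

	pthread_mutex_unlock(&inner_lock);
	pthread_mutex_unlock(&middle_lock);
	pthread_mutex_unlock(&outer_lock);
}

/* Example callback: increment the int that arg points to. */
static void bump(void *arg)
{
	++*(int *)arg;
}
```

For example, with_all_locks(bump, &counter) increments counter with
all three locks held.  A path that took inner_lock first and then
wanted outer_lock would break the ordering.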

Now, somewhere I was also careful that mount did not cause problems
with sysfs_exit_ns(), but I forget where.

You were asking about kobj_ns_current():
kobj_ns_current()
  net_current_ns()
     current->nsproxy->net_ns

And current has a reference on its network namespace.

Other pieces of information that should be helpful to know:
- All sysfs directory entries for a network namespace should be
  removed from sysfs by the time sysfs_exit_ns is called.

Al, hopefully that is enough to get you going, and I will see what I can
do with the rest of the sysfs ugliness.

Eric
