netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Stephen Hemminger <shemminger@osdl.org>
Cc: David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org, akpm@osdl.org, josht@us.ibm.com
Subject: Re: [PATCH 4/5] net: socket family using RCU
Date: Thu, 10 Aug 2006 17:46:39 -0700	[thread overview]
Message-ID: <20060811004639.GO1298@us.ibm.com> (raw)
In-Reply-To: <20060810132827.481e8a6b@localhost.localdomain>

On Thu, Aug 10, 2006 at 01:28:27PM -0700, Stephen Hemminger wrote:
> >
> >On Wed, Aug 09, 2006 at 11:31:42AM -0700, Stephen Hemminger wrote:
> >> Replace the gross custom locking done in socket code for net_family[]
> >> with simple RCU usage. Some reordering necessary to avoid sleep
> >> issues with sock_alloc.
> >
> >Definitely a good use of RCU from a read-intensive standpoint -- does
> >anyone other than Linux-kernel networking developers change the elements
> >of the net_family[] array except at boot and shutdown?  ;-)
> >
> >Some comments included below.  Looks good, but one question about
> >things like atalk_create() being able to sleep and a place or two
> >where a comment would be good.
> >
> 
> ...
> 
> >>  
> >> +	/*
> >> +	 *	Allocate the socket and allow the family to set things up. if
> >> +	 *	the protocol is 0, the family is instructed to select an appropriate
> >> +	 *	default.
> >> +	 */
> >> +	sock = sock_alloc();
> >> +	if (!sock) {
> >> +		printk(KERN_WARNING "socket: no more sockets\n");
> >> +		return -ENFILE;	/* Not exactly a match, but its the
> >> +				   closest posix thing */
> >> +	}
> >> +
> >> +	sock->type = type;
> >> +
> >>  #if defined(CONFIG_KMOD)
> >>  	/* Attempt to load a protocol module if the find failed.
> >>  	 *
> >> @@ -1166,70 +1138,59 @@
> >>  	 * requested real, full-featured networking support upon configuration.
> >>  	 * Otherwise module support will break!
> >>  	 */
> >> -	if (net_families[family] == NULL) {
> >> +	if (net_families[family] == NULL)
> >>  		request_module("net-pf-%d", family);
> >
> >OK, I'll bite...
> >
> >What happens if the module is not present?  Or is this what the
> >"Otherwise module support will break" comment is getting at?
> 
> request_module loads the module (and blocks). One would
> expect that the module loaded would set net_families[] via
> sock_register, so later reference would succeed. Comment is
> historical since intention was to make base socket code itself modular
> which never was done, and is probably a bad idea to even consider.
> 
> If module is not present, then net_families[] will still be NULL.

Got it!

> >Also, this reference to "net_families[family]" is done without
> >rcu_dereference() and without any clear update-side lock.  This
> >just happens to be OK, since we are only testing for NULL, but
> >should at least have a comment.
> 
> >> -	}
> >>  #endif
> >>  
> >> -	net_family_read_lock();
> >> -	if (net_families[family] == NULL) {
> >> -		err = -EAFNOSUPPORT;
> >> -		goto out;
> >> -	}
> >> -
> >> -/*
> >> - *	Allocate the socket and allow the family to set things up. if
> >> - *	the protocol is 0, the family is instructed to select an appropriate
> >> - *	default.
> >> - */
> >> -
> >> -	if (!(sock = sock_alloc())) {
> >> -		printk(KERN_WARNING "socket: no more sockets\n");
> >> -		err = -ENFILE;	/* Not exactly a match, but its the
> >> -				   closest posix thing */
> >> -		goto out;
> >> -	}
> >> -
> >> -	sock->type = type;
> >> +	rcu_read_lock();
> >> +	pf = rcu_dereference(net_families[family]);
> >
> >OK, so the elements of the net_families array are protected by RCU.
> >All references should either be under rcu_read_lock() and accessed
> >via rcu_dereference() or under the update-side lock, whatever that
> >might be.
> >
> 
> Yes, the net_family_lock

Good.

> >>  
> >> -/*
> >> +/**
> >> + *	sock_unregister - remove a protocol handler
> >> + *	@family: protocol family to remove
> >> + *
> >>   *	This function is called by a protocol handler that wants to
> >>   *	remove its address family, and have it unlinked from the
> >> - *	SOCKET module.
> >> + *	new socket creation.
> >> + *
> >> + *	If protocol handler is a module, then it can use module reference
> >> + *	counts to protect against new references. If protocol handler is not
> >> + *	a module then it needs to provide its own protection in
> >> + *	the ops->create routine.
> >>   */
> >> -
> >>  int sock_unregister(int family)
> >>  {
> >>  	if (family < 0 || family >= NPROTO)
> >> -		return -1;
> >> +		return -EINVAL;
> >>  
> >> -	net_family_write_lock();
> >> +	spin_lock(&net_family_lock);
> >>  	net_families[family] = NULL;
> >
> >And this one is covered by net_families_lock, so we are set, since this
> >is the last one.
> >
> >> -	net_family_write_unlock();
> >> +	spin_unlock(&net_family_lock);
> >> +
> >> +	synchronize_rcu();
> >
> >OK, and the caller is presumably going to free up whatever needs to be
> >freed.
> >
> >Or, if nothing need be freed, beyond this point, we know that all
> >non-sleeping code paths through any of the net_protocol_family
> >functions have completed.
> >
> >(So, are all of the functions non-sleeping, or do we care?  The
> >definition of net_protocol_family in include/linux/net.h doesn't say
> >that they need to be non-sleeping...)
> >
> >atalk_create() can potentially sleep in the following line of code:
> >
> >	sk = sk_alloc(PF_APPLETALK, GFP_KERNEL, &ddp_proto, 1);
> 
> The module reference counts are used to prevent that. Since
> appletalk module can't be unloaded until there are no more appletalk
> sockets open (ie ref count of appletalk module is zero). To prevent
> new references there is a call to try_module_get() before the
> net_families[family]->create() call. This happens inside
> rcu_read_lock.
> 
> >What prevents atalk_create() running concurrently with sock_unregister()?
> 
> Module ref counts. I didn't want to start ref counting the net_families
> because that would cause another atomic op, and would penalize the normal
> case of non-modular IPV4 (and non-moduler IPV6).

OK.

> >(One possible reason is that ->create is only called in __sock_create(),
> >and that there is something preventing sock_unregister() from being called
> >before __sock_create() returns -- but I must defer to people who understand
> >networking better than do I.)
> 
> That is why I added a comment to sock_unregister() describing how for the
> normal case of protocol as module, this is safe. But if protocol wants
> to call sock_unregister() other times it needs to do it's own ref counting.
> 
> There are a couple of calls to sock_unregister() from module init routines,
> but these are in benign error unwind cases. The code in these cases
> could be reorganized to do sock_register() last during initilization,
> but it hardly matters.

Fair enough!

							Thanx, Paul

  reply	other threads:[~2006-08-11  0:45 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-09 18:31 [PATCH 0/5] net socket family patches Stephen Hemminger
2006-08-09 18:31 ` [PATCH 1/5] sock_create bad error return Stephen Hemminger
2006-08-10  3:47   ` David Miller
2006-08-10 17:06     ` Stephen Hemminger
2006-08-09 18:31 ` [PATCH 2/5] socket: code style cleanup Stephen Hemminger
2006-08-10  3:49   ` David Miller
2006-08-10  8:19   ` Andrew Morton
2006-08-10  8:55     ` David Miller
2006-08-09 18:31 ` [PATCH 3/5] net: drop unused elements from net_proto_family Stephen Hemminger
2006-08-10  3:50   ` David Miller
2006-08-09 18:31 ` [PATCH 4/5] net: socket family using RCU Stephen Hemminger
2006-08-10  4:00   ` David Miller
2006-08-10 18:36   ` Paul E. McKenney
2006-08-10 20:28     ` Stephen Hemminger
2006-08-11  0:46       ` Paul E. McKenney [this message]
2006-08-09 18:31 ` [PATCH 5/5] sock_register interface changes Stephen Hemminger
2006-08-10  4:03   ` David Miller
2006-08-10  0:06 ` [PATCH 0/5] net socket family patches David Miller
2006-08-10  5:36   ` Alexey Toptygin
2006-08-10 18:55     ` Randy.Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060811004639.GO1298@us.ibm.com \
    --to=paulmck@us.ibm.com \
    --cc=akpm@osdl.org \
    --cc=davem@davemloft.net \
    --cc=josht@us.ibm.com \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).