From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Stephen Hemminger <shemminger@osdl.org>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, akpm@osdl.org, josht@us.ibm.com
Subject: Re: [PATCH 4/5] net: socket family using RCU
Date: Thu, 10 Aug 2006 17:46:39 -0700 [thread overview]
Message-ID: <20060811004639.GO1298@us.ibm.com> (raw)
In-Reply-To: <20060810132827.481e8a6b@localhost.localdomain>
On Thu, Aug 10, 2006 at 01:28:27PM -0700, Stephen Hemminger wrote:
> >
> >On Wed, Aug 09, 2006 at 11:31:42AM -0700, Stephen Hemminger wrote:
> >> Replace the gross custom locking done in socket code for net_family[]
> >> with simple RCU usage. Some reordering necessary to avoid sleep
> >> issues with sock_alloc.
> >
> >Definitely a good use of RCU from a read-intensive standpoint -- does
> >anyone other than Linux-kernel networking developers change the elements
> >of the net_family[] array except at boot and shutdown? ;-)
> >
> >Some comments included below. Looks good, but one question about
> >things like atalk_create() being able to sleep and a place or two
> >where a comment would be good.
> >
>
> ...
>
> >>
> >> + /*
> >> + * Allocate the socket and allow the family to set things up. if
> >> + * the protocol is 0, the family is instructed to select an appropriate
> >> + * default.
> >> + */
> >> + sock = sock_alloc();
> >> + if (!sock) {
> >> + printk(KERN_WARNING "socket: no more sockets\n");
> >> + return -ENFILE; /* Not exactly a match, but its the
> >> + closest posix thing */
> >> + }
> >> +
> >> + sock->type = type;
> >> +
> >> #if defined(CONFIG_KMOD)
> >> /* Attempt to load a protocol module if the find failed.
> >> *
> >> @@ -1166,70 +1138,59 @@
> >> * requested real, full-featured networking support upon configuration.
> >> * Otherwise module support will break!
> >> */
> >> - if (net_families[family] == NULL) {
> >> + if (net_families[family] == NULL)
> >> request_module("net-pf-%d", family);
> >
> >OK, I'll bite...
> >
> >What happens if the module is not present? Or is this what the
> >"Otherwise module support will break" comment is getting at?
>
> request_module loads the module (and blocks). One would
> expect that the module loaded would set net_families[] via
> sock_register, so later reference would succeed. Comment is
> historical since intention was to make base socket code itself modular
> which never was done, and is probably a bad idea to even consider.
>
> If module is not present, then net_families[] will still be NULL.
Got it!
> >Also, this reference to "net_families[family]" is done without
> >rcu_dereference() and without any clear update-side lock. This
> >just happens to be OK, since we are only testing for NULL, but
> >should at least have a comment.
>
> >> - }
> >> #endif
> >>
> >> - net_family_read_lock();
> >> - if (net_families[family] == NULL) {
> >> - err = -EAFNOSUPPORT;
> >> - goto out;
> >> - }
> >> -
> >> -/*
> >> - * Allocate the socket and allow the family to set things up. if
> >> - * the protocol is 0, the family is instructed to select an appropriate
> >> - * default.
> >> - */
> >> -
> >> - if (!(sock = sock_alloc())) {
> >> - printk(KERN_WARNING "socket: no more sockets\n");
> >> - err = -ENFILE; /* Not exactly a match, but its the
> >> - closest posix thing */
> >> - goto out;
> >> - }
> >> -
> >> - sock->type = type;
> >> + rcu_read_lock();
> >> + pf = rcu_dereference(net_families[family]);
> >
> >OK, so the elements of the net_families array are protected by RCU.
> >All references should either be under rcu_read_lock() and accessed
> >via rcu_dereference() or under the update-side lock, whatever that
> >might be.
> >
>
> Yes, the net_family_lock
Good.
> >>
> >> -/*
> >> +/**
> >> + * sock_unregister - remove a protocol handler
> >> + * @family: protocol family to remove
> >> + *
> >> * This function is called by a protocol handler that wants to
> >> * remove its address family, and have it unlinked from the
> >> - * SOCKET module.
> >> + * new socket creation.
> >> + *
> >> + * If protocol handler is a module, then it can use module reference
> >> + * counts to protect against new references. If protocol handler is not
> >> + * a module then it needs to provide its own protection in
> >> + * the ops->create routine.
> >> */
> >> -
> >> int sock_unregister(int family)
> >> {
> >> if (family < 0 || family >= NPROTO)
> >> - return -1;
> >> + return -EINVAL;
> >>
> >> - net_family_write_lock();
> >> + spin_lock(&net_family_lock);
> >> net_families[family] = NULL;
> >
> >And this one is covered by net_families_lock, so we are set, since this
> >is the last one.
> >
> >> - net_family_write_unlock();
> >> + spin_unlock(&net_family_lock);
> >> +
> >> + synchronize_rcu();
> >
> >OK, and the caller is presumably going to free up whatever needs to be
> >freed.
> >
> >Or, if nothing need be freed, beyond this point, we know that all
> >non-sleeping code paths through any of the net_protocol_family
> >functions have completed.
> >
> >(So, are all of the functions non-sleeping, or do we care? The
> >definition of net_protocol_family in include/linux/net.h doesn't say
> >that they need to be non-sleeping...)
> >
> >atalk_create() can potentially sleep in the following line of code:
> >
> > sk = sk_alloc(PF_APPLETALK, GFP_KERNEL, &ddp_proto, 1);
>
> The module reference counts are used to prevent that. Since
> appletalk module can't be unloaded until there are no more appletalk
> sockets open (ie ref count of appletalk module is zero). To prevent
> new references there is a call to try_module_get() before the
> net_families[family]->create() call. This happens inside
> rcu_read_lock.
>
> >What prevents atalk_create() running concurrently with sock_unregister()?
>
> Module ref counts. I didn't want to start ref counting the net_families
> because that would cause another atomic op, and would penalize the normal
> case of non-modular IPV4 (and non-moduler IPV6).
OK.
> >(One possible reason is that ->create is only called in __sock_create(),
> >and that there is something preventing sock_unregister() from being called
> >before __sock_create() returns -- but I must defer to people who understand
> >networking better than do I.)
>
> That is why I added a comment to sock_unregister() describing how for the
> normal case of protocol as module, this is safe. But if protocol wants
> to call sock_unregister() other times it needs to do it's own ref counting.
>
> There are a couple of calls to sock_unregister() from module init routines,
> but these are in benign error unwind cases. The code in these cases
> could be reorganized to do sock_register() last during initilization,
> but it hardly matters.
Fair enough!
Thanx, Paul
next prev parent reply other threads:[~2006-08-11 0:45 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-09 18:31 [PATCH 0/5] net socket family patches Stephen Hemminger
2006-08-09 18:31 ` [PATCH 1/5] sock_create bad error return Stephen Hemminger
2006-08-10 3:47 ` David Miller
2006-08-10 17:06 ` Stephen Hemminger
2006-08-09 18:31 ` [PATCH 2/5] socket: code style cleanup Stephen Hemminger
2006-08-10 3:49 ` David Miller
2006-08-10 8:19 ` Andrew Morton
2006-08-10 8:55 ` David Miller
2006-08-09 18:31 ` [PATCH 3/5] net: drop unused elements from net_proto_family Stephen Hemminger
2006-08-10 3:50 ` David Miller
2006-08-09 18:31 ` [PATCH 4/5] net: socket family using RCU Stephen Hemminger
2006-08-10 4:00 ` David Miller
2006-08-10 18:36 ` Paul E. McKenney
2006-08-10 20:28 ` Stephen Hemminger
2006-08-11 0:46 ` Paul E. McKenney [this message]
2006-08-09 18:31 ` [PATCH 5/5] sock_register interface changes Stephen Hemminger
2006-08-10 4:03 ` David Miller
2006-08-10 0:06 ` [PATCH 0/5] net socket family patches David Miller
2006-08-10 5:36 ` Alexey Toptygin
2006-08-10 18:55 ` Randy.Dunlap
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060811004639.GO1298@us.ibm.com \
--to=paulmck@us.ibm.com \
--cc=akpm@osdl.org \
--cc=davem@davemloft.net \
--cc=josht@us.ibm.com \
--cc=netdev@vger.kernel.org \
--cc=shemminger@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).