Re: [patch net-next V5] net: introduce ethernet teaming device

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jiri Pirko <jpirko@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
	bhutchings@solarflare.com, shemminger@vyatta.com,
	fubar@us.ibm.com, andy@greyhouse.net, tgraf@infradead.org,
	ebiederm@xmission.com, mirqus@gmail.com, kaber@trash.net,
	greearb@candelatech.com, jesse@nicira.com, fbl@redhat.com,
	benjamin.poirier@gmail.com, jzupka@redhat.com
Subject: Re: [patch net-next V5] net: introduce ethernet teaming device
Date: Wed, 26 Oct 2011 10:49:57 +0200	[thread overview]
Message-ID: <20111026084946.GA2030@minipsycho.redhat.com> (raw)
In-Reply-To: <1319549255.10883.16.camel@edumazet-laptop>

Tue, Oct 25, 2011 at 03:27:35PM CEST, eric.dumazet@gmail.com wrote:
>Le mardi 25 octobre 2011 à 15:03 +0200, Jiri Pirko a écrit :
>> This patch introduces new network device called team. It supposes to be
>> very fast, simple, userspace-driven alternative to existing bonding
>> driver.
>> 
>> Userspace library called libteam with couple of demo apps is available
>> here:
>> https://github.com/jpirko/libteam
>> Note it's still in its dipers atm.
>> 
>> team<->libteam use generic netlink for communication. That and rtnl
>> suppose to be the only way to configure team device, no sysfs etc.
>> 
>> Python binding basis for libteam was recently introduced (some need
>> still need to be done on it though). Daemon providing arpmon/miimon
>> active-backup functionality will be introduced shortly.
>> All what's necessary is already implemented in kernel team driver.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>> 
>> v4->v5:
>> 	- team_change_mtu() uses team->lock while travesing though port
>> 	  list
>> 	- mac address changes are moved completely to jurisdiction of
>> 	  userspace daemon. This way the daemon can do FOM1, FOM2 and
>> 	  possibly other weird things with mac addresses.
>> 	  Only round-robin mode sets up all ports to bond's address then
>> 	  enslaved.
>> 	- Extended Kconfig text
>
>
>Following is not called under rcu, but rtnl or team spinlock held.
>
>Therefore, team_get_port_by_index_rcu() is not the right thing.

Yes, this is bug, I missed this.

>
>+static void __reconstruct_port_hlist(struct team *team, int rm_index)
>+{
>+       int i;
>+       struct team_port *port;
>+
>+       for (i = rm_index + 1; i < team->port_count; i++) {
>+               port = team_get_port_by_index_rcu(team, i);
>+               hlist_del_rcu(&port->hlist);
>+               port->index--;
>+               hlist_add_head_rcu(&port->hlist,
>+                                  team_port_index_hash(team, port->index));
>+       }
>+}
>+
>
>In fact, I claim most of your rcu_read_lock() in management side are
>bogus and obfuscate code.
>
>RCU is an exact science, not a commodity.
>
>When RCU is used, rcu_read_lock()/rcu_read_unlock() are used by readers,
>not managers :
>
>They should use a spin/mutex(rtnl usually in network land)
>and normal reads, no need for rcu_something
>
>Only writes must take care of concurrent readers (aka rcu_assign_pointer())
>
>For example:
>
>team_nl_fill_port_list_get_changed() should not use
>	list_for_each_entry_rcu(port, &team->port_list, list) {
>but a regular
>	list_for_each_entry(port, &team->port_list, list) {

Nod. This I missed as well.

>
>
>team_nl_team_get() should not play with RCU either.
>	It can use __dev_get_by_index() instead of dev_get_by_index_rcu()

Not true. RTNL is not held here.


>
>Comment in front of team_nl_team_get() is bogus as well :
>
>/*
> * Netlink cmd functions should be locked by following two functions.
> * To ensure team_uninit would not be called in between, hold rcu_read_lock
> * all the time.
> */
>
>How can holding rcu_read_lock() can prevent another cpu doing whatever he wants ?
>
>It seems you believe rcu_read_lock() is a read_lock(), but it isnt.

I'm aware. But in this particular case, holding rcu_read_lock
effectively does the thing. Because team_uninit is called from
rollback_registered_many() after synchronize_net is called. The thing is
that holding rcu_read_lock ensures that team->dev does not disappear
until team_nl_team_put() is called.

>
>Using right API is essential to get appropriate LOCKDEP semantic and
>code maintainability.
>
>
>

     prev parent reply	other threads:[~2011-10-26  8:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-25 13:03 [patch net-next V5] net: introduce ethernet teaming device Jiri Pirko
2011-10-25 13:27 ` Eric Dumazet
2011-10-26  8:49   ` Jiri Pirko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111026084946.GA2030@minipsycho.redhat.com \
    --to=jpirko@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=benjamin.poirier@gmail.com \
    --cc=bhutchings@solarflare.com \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=fbl@redhat.com \
    --cc=fubar@us.ibm.com \
    --cc=greearb@candelatech.com \
    --cc=jesse@nicira.com \
    --cc=jzupka@redhat.com \
    --cc=kaber@trash.net \
    --cc=mirqus@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@vyatta.com \
    --cc=tgraf@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).