Re: [patch net-next V5] net: introduce ethernet teaming device

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jiri Pirko <jpirko@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
	bhutchings@solarflare.com, shemminger@vyatta.com,
	fubar@us.ibm.com, andy@greyhouse.net, tgraf@infradead.org,
	ebiederm@xmission.com, mirqus@gmail.com, kaber@trash.net,
	greearb@candelatech.com, jesse@nicira.com, fbl@redhat.com,
	benjamin.poirier@gmail.com, jzupka@redhat.com
Subject: Re: [patch net-next V5] net: introduce ethernet teaming device
Date: Wed, 26 Oct 2011 10:49:57 +0200	[thread overview]
Message-ID: <20111026084946.GA2030@minipsycho.redhat.com> (raw)
In-Reply-To: <1319549255.10883.16.camel@edumazet-laptop>

Tue, Oct 25, 2011 at 03:27:35PM CEST, eric.dumazet@gmail.com wrote:
>Le mardi 25 octobre 2011 à 15:03 +0200, Jiri Pirko a écrit :
>> This patch introduces new network device called team. It supposes to be
>> very fast, simple, userspace-driven alternative to existing bonding
>> driver.
>> 
>> Userspace library called libteam with couple of demo apps is available
>> here:
>> https://github.com/jpirko/libteam
>> Note it's still in its dipers atm.
>> 
>> team<->libteam use generic netlink for communication. That and rtnl
>> suppose to be the only way to configure team device, no sysfs etc.
>> 
>> Python binding basis for libteam was recently introduced (some need
>> still need to be done on it though). Daemon providing arpmon/miimon
>> active-backup functionality will be introduced shortly.
>> All what's necessary is already implemented in kernel team driver.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>> 
>> v4->v5:
>> 	- team_change_mtu() uses team->lock while travesing though port
>> 	  list
>> 	- mac address changes are moved completely to jurisdiction of
>> 	  userspace daemon. This way the daemon can do FOM1, FOM2 and
>> 	  possibly other weird things with mac addresses.
>> 	  Only round-robin mode sets up all ports to bond's address then
>> 	  enslaved.
>> 	- Extended Kconfig text
>
>
>Following is not called under rcu, but rtnl or team spinlock held.
>
>Therefore, team_get_port_by_index_rcu() is not the right thing.

Yes, this is bug, I missed this.

>
>+static void __reconstruct_port_hlist(struct team *team, int rm_index)
>+{
>+       int i;
>+       struct team_port *port;
>+
>+       for (i = rm_index + 1; i < team->port_count; i++) {
>+               port = team_get_port_by_index_rcu(team, i);
>+               hlist_del_rcu(&port->hlist);
>+               port->index--;
>+               hlist_add_head_rcu(&port->hlist,
>+                                  team_port_index_hash(team, port->index));
>+       }
>+}
>+
>
>In fact, I claim most of your rcu_read_lock() in management side are
>bogus and obfuscate code.
>
>RCU is an exact science, not a commodity.
>
>When RCU is used, rcu_read_lock()/rcu_read_unlock() are used by readers,
>not managers :
>
>They should use a spin/mutex(rtnl usually in network land)
>and normal reads, no need for rcu_something
>
>Only writes must take care of concurrent readers (aka rcu_assign_pointer())
>
>For example:
>
>team_nl_fill_port_list_get_changed() should not use
>	list_for_each_entry_rcu(port, &team->port_list, list) {
>but a regular
>	list_for_each_entry(port, &team->port_list, list) {

Nod. This I missed as well.

>
>
>team_nl_team_get() should not play with RCU either.
>	It can use __dev_get_by_index() instead of dev_get_by_index_rcu()

Not true. RTNL is not held here.


>
>Comment in front of team_nl_team_get() is bogus as well :
>
>/*
> * Netlink cmd functions should be locked by following two functions.
> * To ensure team_uninit would not be called in between, hold rcu_read_lock
> * all the time.
> */
>
>How can holding rcu_read_lock() can prevent another cpu doing whatever he wants ?
>
>It seems you believe rcu_read_lock() is a read_lock(), but it isnt.

I'm aware. But in this particular case, holding rcu_read_lock
effectively does the thing. Because team_uninit is called from
rollback_registered_many() after synchronize_net is called. The thing is
that holding rcu_read_lock ensures that team->dev does not disappear
until team_nl_team_put() is called.

>
>Using right API is essential to get appropriate LOCKDEP semantic and
>code maintainability.
>
>
>

     prev parent reply	other threads:[~2011-10-26  8:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-25 13:03 [patch net-next V5] net: introduce ethernet teaming device Jiri Pirko
2011-10-25 13:27 ` Eric Dumazet
2011-10-26  8:49   ` Jiri Pirko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111026084946.GA2030@minipsycho.redhat.com \
    --to=jpirko@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=benjamin.poirier@gmail.com \
    --cc=bhutchings@solarflare.com \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=fbl@redhat.com \
    --cc=fubar@us.ibm.com \
    --cc=greearb@candelatech.com \
    --cc=jesse@nicira.com \
    --cc=jzupka@redhat.com \
    --cc=kaber@trash.net \
    --cc=mirqus@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@vyatta.com \
    --cc=tgraf@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.