From mboxrd@z Thu Jan 1 00:00:00 1970 From: Benjamin Poirier Subject: Re: [patch net-next V4] net: introduce ethernet teaming device Date: Mon, 24 Oct 2011 09:09:19 -0400 Message-ID: <20111024130918.GB24473@synalogic.ca> References: <1319444005-1281-1-git-send-email-jpirko@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: netdev@vger.kernel.org, davem@davemloft.net, eric.dumazet@gmail.com, bhutchings@solarflare.com, shemminger@vyatta.com, fubar@us.ibm.com, andy@greyhouse.net, tgraf@infradead.org, ebiederm@xmission.com, mirqus@gmail.com, kaber@trash.net, greearb@candelatech.com, jesse@nicira.com, fbl@redhat.com, jzupka@redhat.com, Dipankar Sarma , "Paul E. McKenney" To: Jiri Pirko Return-path: Received: from mail-gy0-f174.google.com ([209.85.160.174]:42745 "EHLO mail-gy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932318Ab1JXNJ1 (ORCPT ); Mon, 24 Oct 2011 09:09:27 -0400 Received: by gyb13 with SMTP id 13so5701789gyb.19 for ; Mon, 24 Oct 2011 06:09:27 -0700 (PDT) Content-Disposition: inline In-Reply-To: <1319444005-1281-1-git-send-email-jpirko@redhat.com> Sender: netdev-owner@vger.kernel.org List-ID: On 11/10/24 10:13, Jiri Pirko wrote: > This patch introduces new network device called team. It supposes to be > very fast, simple, userspace-driven alternative to existing bonding > driver. > > Userspace library called libteam with couple of demo apps is available > here: > https://github.com/jpirko/libteam > Note it's still in its dipers atm. > > team<->libteam use generic netlink for communication. That and rtnl > suppose to be the only way to configure team device, no sysfs etc. > > Python binding basis for libteam was recently introduced (some need > still need to be done on it though). Daemon providing arpmon/miimon > active-backup functionality will be introduced shortly. > All what's necessary is already implemented in kernel team driver. > > Signed-off-by: Jiri Pirko > > v3->v4: > - remove redundant synchronize_rcu from __team_change_mode() > - revert "set and clear of mode_ops happens per pointer, not per > byte" > - extend comment of function __team_change_mode() > > v2->v3: > - team_change_mtu() user rcu version of list traversal to unwind > - set and clear of mode_ops happens per pointer, not per byte > - port hashlist changed to be embedded into team structure > - error branch in team_port_enter() does cleanup now > - fixed rtln->rtnl > > v1->v2: > - modes are made as modules. Makes team more modular and > extendable. > - several commenters' nitpicks found on v1 were fixed > - several other bugs were fixed. > - note I ignored Eric's comment about roundrobin port selector > as Eric's way may be easily implemented as another mode (mode > "random") in future. > --- > Documentation/networking/team.txt | 2 + > MAINTAINERS | 7 + > drivers/net/Kconfig | 2 + > drivers/net/Makefile | 1 + > drivers/net/team/Kconfig | 38 + > drivers/net/team/Makefile | 7 + > drivers/net/team/team.c | 1573 +++++++++++++++++++++++++++++ > drivers/net/team/team_mode_activebackup.c | 152 +++ > drivers/net/team/team_mode_roundrobin.c | 107 ++ > include/linux/Kbuild | 1 + > include/linux/if.h | 1 + > include/linux/if_team.h | 231 +++++ > include/linux/rculist.h | 14 + I think you're missing some CC's for the modifications to this file. I've taken the liberty of adding Dipankar and Paul to the discussion. > 13 files changed, 2136 insertions(+), 0 deletions(-) > create mode 100644 Documentation/networking/team.txt > create mode 100644 drivers/net/team/Kconfig > create mode 100644 drivers/net/team/Makefile > create mode 100644 drivers/net/team/team.c > create mode 100644 drivers/net/team/team_mode_activebackup.c > create mode 100644 drivers/net/team/team_mode_roundrobin.c > create mode 100644 include/linux/if_team.h > [...] > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > new file mode 100644 > index 0000000..acfef4c > --- /dev/null > +++ b/drivers/net/team/team.c > + [...] > +static int team_change_mtu(struct net_device *dev, int new_mtu) > +{ > + struct team *team = netdev_priv(dev); > + struct team_port *port; > + int err; > + > + rcu_read_lock(); > + list_for_each_entry_rcu(port, &team->port_list, list) { > + err = dev_set_mtu(port->dev, new_mtu); > + if (err) { > + netdev_err(dev, "Device %s failed to change mtu", > + port->dev->name); > + goto unwind; > + } > + } > + rcu_read_unlock(); > + > + dev->mtu = new_mtu; > + > + return 0; > + > +unwind: > + list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list) > + dev_set_mtu(port->dev, dev->mtu); > + > + rcu_read_unlock(); > + return err; > +} > + > + [...] > diff --git a/include/linux/rculist.h b/include/linux/rculist.h > index d079290..7586b2c 100644 > --- a/include/linux/rculist.h > +++ b/include/linux/rculist.h > @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list, > pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) > > /** > + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point > + * @pos: the type * to use as a loop cursor. > + * @head: the head for your list. > + * @member: the name of the list_struct within the struct. > + * > + * Start to iterate over list of given type backwards, continuing after > + * the current position. > + */ > +#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \ > + for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \ > + &pos->member != (head); \ > + pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)) > + rcu lists can be modified while they are traversed with *_rcu() primitives. This benefit comes with the constraint that they may only be traversed forwards. This is implicit in the choice of *_rcu() list-traversal primitives: they only go forwards. You suggest to add a backwards rcu list-traversal primitive. But consider what happens in this sequence: CPU0 CPU1 list_for_each_entry_continue_reverse_rcu(...) pos = list_entry_rcu(pos->member.prev, typeof(*pos), member) list_del_rcu(&pos->member) { (&pos->member)->prev = LIST_POISON2 } pos = list_entry_rcu(pos->member.prev, typeof(*pos), member) = container_of(LIST_POISON2, typeof(*pos), member) do_something(*pos) BAM! Going back to the problem you're trying to solve in team_change_mtu(), I think you could either: 1) take team->lock instead of rcu_read_lock() throughout this particular function 2) save each deleted element in a separate list on the side in case it's necessary to roll back 3) remove the rcu double locking, rely on rtnl and add some ASSERT_RTNL() if desired. You've said that you don't want to rely on rtnl and you want to use separate locking but I fail to see what advantage that brings to balance out the extra complexity in code and execution? Please clarify this. Thanks, -Ben > +/** > * hlist_del_rcu - deletes entry from hash list without re-initialization > * @n: the element to delete from the hash list. > * > -- > 1.7.6 >