netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Pirko <jpirko@redhat.com>
To: Benjamin Poirier <benjamin.poirier@gmail.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
	eric.dumazet@gmail.com, bhutchings@solarflare.com,
	shemminger@vyatta.com, fubar@us.ibm.com, andy@greyhouse.net,
	tgraf@infradead.org, ebiederm@xmission.com, mirqus@gmail.com,
	kaber@trash.net, greearb@candelatech.com, jesse@nicira.com,
	fbl@redhat.com, jzupka@redhat.com,
	Dipankar Sarma <dipankar@in.ibm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [patch net-next V4] net: introduce ethernet teaming device
Date: Mon, 24 Oct 2011 15:50:43 +0200	[thread overview]
Message-ID: <20111024135042.GA8113@minipsycho.brq.redhat.com> (raw)
In-Reply-To: <20111024130918.GB24473@synalogic.ca>

Mon, Oct 24, 2011 at 03:09:19PM CEST, benjamin.poirier@gmail.com wrote:
>On 11/10/24 10:13, Jiri Pirko wrote:
>> This patch introduces new network device called team. It supposes to be
>> very fast, simple, userspace-driven alternative to existing bonding
>> driver.
>> 
>> Userspace library called libteam with couple of demo apps is available
>> here:
>> https://github.com/jpirko/libteam
>> Note it's still in its dipers atm.
>> 
>> team<->libteam use generic netlink for communication. That and rtnl
>> suppose to be the only way to configure team device, no sysfs etc.
>> 
>> Python binding basis for libteam was recently introduced (some need
>> still need to be done on it though). Daemon providing arpmon/miimon
>> active-backup functionality will be introduced shortly.
>> All what's necessary is already implemented in kernel team driver.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>> 
>> v3->v4:
>> 	- remove redundant synchronize_rcu from __team_change_mode()
>> 	- revert "set and clear of mode_ops happens per pointer, not per
>> 	  byte"
>> 	- extend comment of function __team_change_mode()
>> 
>> v2->v3:
>> 	- team_change_mtu() user rcu version of list traversal to unwind
>> 	- set and clear of mode_ops happens per pointer, not per byte
>> 	- port hashlist changed to be embedded into team structure
>> 	- error branch in team_port_enter() does cleanup now
>> 	- fixed rtln->rtnl
>> 
>> v1->v2:
>> 	- modes are made as modules. Makes team more modular and
>> 	  extendable.
>> 	- several commenters' nitpicks found on v1 were fixed
>> 	- several other bugs were fixed.
>> 	- note I ignored Eric's comment about roundrobin port selector
>> 	  as Eric's way may be easily implemented as another mode (mode
>> 	  "random") in future.
>> ---
>>  Documentation/networking/team.txt         |    2 +
>>  MAINTAINERS                               |    7 +
>>  drivers/net/Kconfig                       |    2 +
>>  drivers/net/Makefile                      |    1 +
>>  drivers/net/team/Kconfig                  |   38 +
>>  drivers/net/team/Makefile                 |    7 +
>>  drivers/net/team/team.c                   | 1573 +++++++++++++++++++++++++++++
>>  drivers/net/team/team_mode_activebackup.c |  152 +++
>>  drivers/net/team/team_mode_roundrobin.c   |  107 ++
>>  include/linux/Kbuild                      |    1 +
>>  include/linux/if.h                        |    1 +
>>  include/linux/if_team.h                   |  231 +++++
>>  include/linux/rculist.h                   |   14 +
>
>I think you're missing some CC's for the modifications to this file.
>I've taken the liberty of adding Dipankar and Paul to the discussion.
>
>>  13 files changed, 2136 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/networking/team.txt
>>  create mode 100644 drivers/net/team/Kconfig
>>  create mode 100644 drivers/net/team/Makefile
>>  create mode 100644 drivers/net/team/team.c
>>  create mode 100644 drivers/net/team/team_mode_activebackup.c
>>  create mode 100644 drivers/net/team/team_mode_roundrobin.c
>>  create mode 100644 include/linux/if_team.h
>> 
>
>[...]
>
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> new file mode 100644
>> index 0000000..acfef4c
>> --- /dev/null
>> +++ b/drivers/net/team/team.c
>> +
>[...]
>> +static int team_change_mtu(struct net_device *dev, int new_mtu)
>> +{
>> +	struct team *team = netdev_priv(dev);
>> +	struct team_port *port;
>> +	int err;
>> +
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(port, &team->port_list, list) {
>> +		err = dev_set_mtu(port->dev, new_mtu);
>> +		if (err) {
>> +			netdev_err(dev, "Device %s failed to change mtu",
>> +				   port->dev->name);
>> +			goto unwind;
>> +		}
>> +	}
>> +	rcu_read_unlock();
>> +
>> +	dev->mtu = new_mtu;
>> +
>> +	return 0;
>> +
>> +unwind:
>> +	list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
>> +		dev_set_mtu(port->dev, dev->mtu);
>> +
>> +	rcu_read_unlock();
>> +	return err;
>> +}
>> +
>> +
>
>[...]
>
>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>> index d079290..7586b2c 100644
>> --- a/include/linux/rculist.h
>> +++ b/include/linux/rculist.h
>> @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
>>  	     pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>  
>>  /**
>> + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
>> + * @pos:	the type * to use as a loop cursor.
>> + * @head:	the head for your list.
>> + * @member:	the name of the list_struct within the struct.
>> + *
>> + * Start to iterate over list of given type backwards, continuing after
>> + * the current position.
>> + */
>> +#define list_for_each_entry_continue_reverse_rcu(pos, head, member)	\
>> +	for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
>> +	     &pos->member != (head);	\
>> +	     pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
>> +
>
>rcu lists can be modified while they are traversed with *_rcu()
>primitives. This benefit comes with the constraint that they may only be
>traversed forwards. This is implicit in the choice of *_rcu()
>list-traversal primitives: they only go forwards.
>
>You suggest to add a backwards rcu list-traversal primitive. But
>consider what happens in this sequence:
>
>CPU0					CPU1
>list_for_each_entry_continue_reverse_rcu(...)
>pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
>					list_del_rcu(&pos->member)
>					{ (&pos->member)->prev = LIST_POISON2 }
>pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
>    = container_of(LIST_POISON2, typeof(*pos), member)
>do_something(*pos)
>    BAM!

Doh, you are right. :(

>
>Going back to the problem you're trying to solve in team_change_mtu(),
>I think you could either:
>1) take team->lock instead of rcu_read_lock() throughout this particular
>function
>2) save each deleted element in a separate list on the side in case it's
>necessary to roll back

I would go with this one.

>3) remove the rcu double locking, rely on rtnl and add some
>ASSERT_RTNL() if desired. You've said that you don't want to rely on
>rtnl and you want to use separate locking but I fail to see what
>advantage that brings to balance out the extra complexity in code and
>execution? Please clarify this.

I do not want to use rtnl. For example in gennetlink code rtnl is not
held so I need to depend on own lock.


>
>Thanks,
>-Ben
>
>> +/**
>>   * hlist_del_rcu - deletes entry from hash list without re-initialization
>>   * @n: the element to delete from the hash list.
>>   *
>> -- 
>> 1.7.6
>> 

  reply	other threads:[~2011-10-24 13:51 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-24  8:13 [patch net-next V4] net: introduce ethernet teaming device Jiri Pirko
2011-10-24 13:09 ` Benjamin Poirier
2011-10-24 13:50   ` Jiri Pirko [this message]
2011-10-24 14:11   ` Paul E. McKenney
2011-10-24 17:22 ` Michał Mirosław
2011-10-25  7:02   ` Jiri Pirko
2011-10-25 13:22   ` Flavio Leitner
2011-10-25  0:02 ` Andy Gospodarek
2011-10-25  6:42   ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111024135042.GA8113@minipsycho.brq.redhat.com \
    --to=jpirko@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=benjamin.poirier@gmail.com \
    --cc=bhutchings@solarflare.com \
    --cc=davem@davemloft.net \
    --cc=dipankar@in.ibm.com \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=fbl@redhat.com \
    --cc=fubar@us.ibm.com \
    --cc=greearb@candelatech.com \
    --cc=jesse@nicira.com \
    --cc=jzupka@redhat.com \
    --cc=kaber@trash.net \
    --cc=mirqus@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=shemminger@vyatta.com \
    --cc=tgraf@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).