From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nikolay Aleksandrov Subject: Re: [RFC net-next 1/5] bonding: 3ad: use curr_slave_lock instead of bond->lock Date: Sat, 06 Sep 2014 12:19:01 +0200 Message-ID: <540ADF95.3070606@redhat.com> References: <1409941011-5494-1-git-send-email-nikolay@redhat.com> <1409941011-5494-2-git-send-email-nikolay@redhat.com> <23382.1409949468@localhost.localdomain> <540A247A.60202@redhat.com> <27301.1409965688@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org, vfalico@gmail.com, andy@greyhouse.net, davem@davemloft.net To: Jay Vosburgh Return-path: Received: from mx1.redhat.com ([209.132.183.28]:33207 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750838AbaIFKTc (ORCPT ); Sat, 6 Sep 2014 06:19:32 -0400 In-Reply-To: <27301.1409965688@localhost.localdomain> Sender: netdev-owner@vger.kernel.org List-ID: On 09/06/2014 03:08 AM, Jay Vosburgh wrote: > Nikolay Aleksandrov wrote: > >> On 09/05/2014 10:37 PM, Jay Vosburgh wrote: >>> Nikolay Aleksandrov wrote: >>> >>>> In 3ad mode the only syncing needed by bond->lock is for the wq >>>> and the recv handler, so change them to use curr_slave_lock. >>>> There're no locking dependencies here as 3ad doesn't use >>>> curr_slave_lock at all. >>> >>> One subtle aspect of the 3ad locking is that it's not really >>> using the "read" property of the read lock with regard to the state >>> machine; it's largely using it as a spin lock, because there is at most >>> one reader and at most one writer in the full state machine code >>> (although there are multiple reader possibilities elsewhere). The code >>> would break if there actually were multiple read-lock holders in the >>> full state machine or aggregator selection logic simultaneously. >>> >>> Because the state machine and incoming LACPDU cases both acquire >>> the read lock for read, there is a separate per-port "state machine" >>> spin lock to protect only the per-port fields that LACPDU and periodic >>> state machine both touch. The incoming LACPDU case doesn't call into >>> the full state machine, only the RX processing which can't go into agg >>> selection, so this works. >>> >>> The agg selection can be entered via the unbind path or the >>> periodic state machine (and only these two paths), and relies on the >>> "one reader max" usage of the read lock to mutex the code paths that may >>> enter agg selection. >>> >>> I suspect that what 3ad may need is a spin lock, not a read >>> lock, because the multiple reader property isn't really being utilized; >>> the incoming LACPDU and periodic state machine both acquire the read >>> lock for read, but then acquire a second per-port spin lock. If the >>> "big" (bond->lock currently) lock is a spin lock, then the per-port >>> state machine lock is unnecessary, as the only purpose of the per-port >>> lock is to mutex the one case that does have multiple readers. >>> >>> In actual practice I doubt there are multiple simultaneous >>> readers very often; the periodic machine runs every 100 ms, but LACPDUs >>> arrive for each port either every second or every 30 seconds (depending >>> on admin configuration). >>> >>> Since contention on these locks is generally low, we're probably >>> better off in the long run with something simpler to understand. >>> >>> So, what I'm kind of saying here is that this patch isn't a bad >>> first step, but at least for the 3ad case, removal of the bond->lock >>> itself doesn't really simplify the locking as much as could be done. >>> >>> Thoughts? >>> >>> -J >>> >>> >> >> Hi Jay, >> That is a very good point, my main idea was to protect __bond_release_one >> and the machine handler otherwise I'd have removed it altogether. I know >> that this doesn't improve on the 3ad situation, I did it mostly to get rid >> of bond->lock first. Going with a spinlock certainly makes sense there as >> we don't spend much time inside and the contention is not high as you said >> and would simplify the 3ad code so I like it :-) >> I will include it in my bond-locking todo list and will post a follow-up >> once I've cleared the details up as I'm speaking from the top of my head >> right now, but first I'd like to clean the current lock use especially with >> regard to curr_slave_lock and bring it to the necessary minimum. In the >> long run I think that we either might be able to remove curr_slave_lock >> completely or at least reduce it to ~ 3 places and with the spinlock that >> you suggested here, we'll be definitely able to remove it from the 3ad code. > > I looked into curr_slave_lock some time ago, and as I recall > there's not much it really protects that's not also covered by RTNL, > since changes to the active slave are all happening under RTNL these > days. And that was before the RCU conversion; with the current code, > I'm not sure the curr_slave_lock is providing much mutexing beyond what > we get from RTNL. > Right, that was one of the main things that drove me to start this, after reviewing Mahesh's patches I realized most of the callers already had RTNL and curr_slave_lock looked mostly redundant, and bond->lock completely redundant. > There are a couple of special cases, like the TLB rebalance in > bond_alb_monitor, but that happens once every 10 seconds, and could just > grab RTNL for this bit: > > if (slave == rcu_access_pointer(bond->curr_active_slave)) { > SLAVE_TLB_INFO(slave).load = > bond_info->unbalanced_load / > BOND_TLB_REBALANCE_INTERVAL; > bond_info->unbalanced_load = 0; > > the "unbalanced_load," now that I'm looking at it, might already > have some race problems since it's now updated outside of bonding locks > in bond_do_alb_xmit. It'll probably race with multiple bond_do_alb_xmit > functions running simultaneously as well as the tx rebalance in > bond_alb_monitor. I think the worst that will happen is that the tx > traffic load is distributed suboptimally for 10 seconds. > Yes, I don't think this needs fixing, but we might revisit it once the rest of the pieces are in place. > The rlb_clear_slave case that acquires curr_slave_lock already > also has RTNL, so I'm not sure that removing the curr_slave_lock will > have any impact there, either. Many of the other curr_slave_lock > holders bounce the curr_slave_lock to call into net/core functions (set > promisc, change MAC, etc), so there's already reliance on RTNL. > Yes, rlb_clear_slave is not the problem, the problem if we remove curr_slave_lock was that both rlb_clear_slave and bond_alb_monitor can be transmitting packets at the same time or a mac swap might happen triggering packet transmission while bond_alb_monitor() is also transmitting i.e. pretty much everything that now gets curr_slave_lock for writing on the ALB side and transmits could transmit with bond_alb_monitor(), that's what I was trying to avoid. > Separately, it also might be possible to combine the various > per-mode special locks internally into a generic "mode lock" so the alb > and rlb hashtbl lock and the 802.3ad state machine lock could be a > single "bond->mode_lock" that mutexes whatever special sauce the active > mode needs protected. Not sure if that's worth the trouble or not, but > it seems plausible at first glance. > Yes, I had something similar in mind /actually called it sync_lock :)/, but I only went as far as to use it for a few places that needed syncing beyond RTNL after curr_slave_lock was removed, you've went one step further and this certainly looks doable. I'll keep it in mind. > -J > > --- > -Jay Vosburgh, jay.vosburgh@canonical.com >