From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Mahesh Bandewar <maheshb@google.com>,
Andy Gospodarek <andy@greyhouse.net>,
Veaceslav Falico <vfalico@gmail.com>,
David Miller <davem@davemloft.net>,
Maciej Zenczykowski <maze@google.com>,
netdev <netdev@vger.kernel.org>,
Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH net-next 0/4] bonding work-queues, try_rtnl() & notifications
Date: Thu, 12 Mar 2015 12:10:07 -0700 [thread overview]
Message-ID: <13968.1426187407@famine> (raw)
In-Reply-To: <5501A1B1.3030907@redhat.com>
Nikolay Aleksandrov <nikolay@redhat.com> wrote:
>On 12/03/15 15:21, Nikolay Aleksandrov wrote:
>> On 12/03/15 15:11, Eric Dumazet wrote:
>>> On Thu, 2015-03-12 at 13:09 +0100, Nikolay Aleksandrov wrote:
>>>> On 12/03/15 06:54, Mahesh Bandewar wrote:
>>>>> This patch series tries to address the issue discovered in various work-
>>>>> queues and the way these handlers deal with the RTNL. Especially for
>>>>> notification handling. If RTNL can not be acquired, these handlers ignore
>>>>> sending notifications and just re-arm the timer. This could be very
>>>>> problematic if the re-arm timer has larger value (e.g. in minutes).
>>>>>
>>>>>
>>>>> Mahesh Bandewar (4):
>>>>> bonding: Handle notifications during work-queue processing gracefully
>>>>> bonding: Do not ignore notifications for miimon-work-queue
>>>>> bonding: Do not ignore notifications for AD-work-queue
>>>>> bonding: Do not ignore notifications for ARP-work-queue
>>>>>
>>>>> drivers/net/bonding/bond_3ad.c | 22 ++++++++++++---
>>>>> drivers/net/bonding/bond_main.c | 62 ++++++++++++++++++++++++-----------------
>>>>> include/net/bonding.h | 22 +++++++++++++++
>>>>> 3 files changed, 77 insertions(+), 29 deletions(-)
>>>>>
>>>>
>>>> Hello Mahesh,
>>>> The current behaviour was chosen as a best-effort type because such change
>>>> could skew the monitoring/AD timers because you re-schedule every time you
>>>> cannot acquire rtnl and move their timers with 1 tick ahead which could lead
>>>> to false link drops, state machines running late and so on.
>>>> Currently the monitoring/AD functions have priority over the notifications as in
>>>> we might miss a notification but we try not to miss a monitoring/AD event, thus I'd
>>>> suggest to solve this in a different manner. If you'd like to have the best of both
>>>> worlds i.e. not miss a monitoring event and also not miss any notifications you should
>>>> use a different strategy.
>>>
>>>
>>> I think I disagree here.
>>>
>>> All rtnl_trylock() call sites must be very careful about what they are
>>> doing.
>>>
>>> bonding is not, and we should fix this.
>>>
>>> Mahesh fix seems very reasonable. If you need something else, please
>>> provide your own patch.
>>>
>>> When code is the following :
>>>
>>> if (!rtnl_trylock())
>>> return;
>>> call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, bond->dev);
>>> rtnl_unlock();
>>>
>>> Then, one must understand what happens if at this point rtnl_trylock()
>>> never succeeds.
>>>
>>> Like, what happens if you remove the whole thing.
>>>
>>> If the code is not needed, remove it, instead of having flaky behavior.
>>>
>>
>> I agree that it should be fixed and that would work, my only concern is that in
>> rare cases this might lead to skewing of the monitoring/AD timers because with
>> every failed attempt at obtaining rtnl it moves the associated timer with 1 tick
>> ahead, because when it successfully obtains it then the timer gets re-armed with the
>> full timeout. What I suggested has already been done actually by the slave array update
>> which uses a work item of its own because of the rtnl update, so we could just pull all
>> of the call sites that request rtnl in a single work item which checks some bits and
>> calls the appropriate notifications or re-schedules itself if necessary, that would keep
>> all the monitoring/AD timers closer to being correct.
>> I can see that this would be a very rare case so I don't have a strong feeling about
>> my argument, I just wanted to make sure it gets considered. If you and the others decide
>> it's not worth it, then so be it. Also pulling all rtnl call sites in a single place
>> looks cleaner to me instead of having the same logic in each work item's function.
>>
>> Cheers,
>> Nik
>>
>
>Well, of course that has its own problems of missed updates. Hrm.. Okay
>let's leave it this way. Never mind my argument about the timer
>skewing, consider only fixing the RCU problem.
I don't see an issue with how this changes the notification
processing. I was initially concerned with the 802.3ad state machine,
but the actual state machine processing is skipped for the "accelerated"
invocation that sends the notifier. I don't see that the state machine
itself will become skewed due to delays unless rtnl_trylock fails
continuously for an extended period (upwards of 2 seconds, I think, long
enough to risk the 3 second LACPDU timeout if LACP rate is set to fast).
When the "rtnl_trylock" business was added, the primary concern
was deadlock avoidance, not timely delivery of the notifications. The
deadlock still looks to be a concern; some paths come in with RTNL held
and acquire the bond->mode_lock, the various monitoring functions are
doing the opposite order.
I recall another suggestion a while back to modify the bonding
periodic functions to acquire RTNL for every pass (followed by the
mode_lock) to eliminate the trylock, but I don't see that as preferrable
due to the amount of spurious RTNL acquisitions.
That said, it would be better to have a more deterministic
system (e.g., moving the "needs RTNL" logic into its own work item
without conditional locking), but as an incremental change to what's
there now, I don't see a problem with this patch set.
-J
---
-Jay Vosburgh, jay.vosburgh@canonical.com
next prev parent reply other threads:[~2015-03-12 19:10 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-12 5:54 [PATCH net-next 0/4] bonding work-queues, try_rtnl() & notifications Mahesh Bandewar
2015-03-12 12:09 ` Nikolay Aleksandrov
2015-03-12 14:11 ` Eric Dumazet
2015-03-12 14:21 ` Nikolay Aleksandrov
2015-03-12 14:24 ` Nikolay Aleksandrov
2015-03-12 19:10 ` Jay Vosburgh [this message]
2015-03-12 19:20 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=13968.1426187407@famine \
--to=jay.vosburgh@canonical.com \
--cc=andy@greyhouse.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=eric.dumazet@gmail.com \
--cc=maheshb@google.com \
--cc=maze@google.com \
--cc=netdev@vger.kernel.org \
--cc=nikolay@redhat.com \
--cc=vfalico@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).