From: Jay Vosburgh <fubar@us.ibm.com>
To: Jarek Poplawski <jarkao2@gmail.com>
Cc: Laurent Chavey <chavey@google.com>,
Johannes Berg <johannes@sipsolutions.net>,
Mikhail Markine <markine@google.com>,
netdev@vger.kernel.org, bonding-devel@lists.sourceforge.net,
Petri Gynther <pgynther@google.com>,
David Miller <davem@davemloft.net>
Subject: Re: [Bonding-devel] [PATCH] bonding: cancel_delayed_work() -> cancel_delayed_work_sync()
Date: Thu, 17 Dec 2009 13:40:29 -0800 [thread overview]
Message-ID: <9587.1261086029@death.nxdomain.ibm.com> (raw)
In-Reply-To: <20091217205617.GB2578@ami.dom.local>
Jarek Poplawski <jarkao2@gmail.com> wrote:
>On Thu, Dec 17, 2009 at 11:37:42AM -0800, Jay Vosburgh wrote:
>> Laurent Chavey <chavey@google.com> wrote:
>>
>> >one instance that could be a problem
>> >
>> >__exit bonding_exit(void)
>> > bond_free_all()
>> > bond_work_cancel_all(bond);
>> > unregister_netdevice(bond_dev)
>> >
>> >could the above result in an invalid pointer when trying
>> >to use bond-> in one of the timer CB ?
>>
>> The bonding teardown logic was reworked in October, and there is
>> no longer a bond_free_all in the current mainline. What kernel are you
>> looking at?
>>
>> The bond_close function will stop the various work items, and
>> the ndo_uninit (bond_uninit) will call bond_work_cancel_all as well.
>>
>> Actually, on looking at it (it being current mainline),
>> bond_uninit might need some kind of logic to wait and insure that all
>> timers have completed before returning. It comes from unregister, so
>> the next thing that happens after it returns is that the memory will be
>> freed (via netdev_run_todo, during rtnl_unlock, if I'm following it
>> correctly).
>>
>> The bond_uninit function is called under RTNL, though, so the
>> timer functions (bond_mii_monitor, et al) may need additional checks for
>> kill_timers to insure they don't attempt to acquire RTNL if a cancel is
>> pending.
>>
>> That's kind of tricky itself, since the lock ordering requires
>> RTNL to be acquired first, so there's no way for bond_mii_monitor (et
>> al) to check for kill_timers prior to already having RTNL (because the
>> function acquires RTNL conditionally, only if needed; to do that, it
>> unlocks the bond lock, then acquires RTNL, then re-locks the bond lock).
>>
>> So, the lock dance to acquire RTNL in bond_mii_monitor (et al)
>> would need some trickery, perhaps a rtnl_trylock loop, that checks
>> kill_timers each time the trylock fails, e.g.,
>>
>> if (bond_miimon_inspect(bond)) {
>> read_unlock(&bond->lock);
>> while (!rtnl_trylock) {
>> read_lock(&bond->lock);
>> if (bond->kill_timers)
>> goto out;
>> read_unlock(&bond->lock);
>> /* msleep ? */
>> }
>>
>> bond_miimon_commit(bond);
>> [...]
>>
>> So, with the above (and similar changes to the other delayed
>> work functions, and a big honkin' comment somewhere to explain this), I
>> suspect that bond_work_cancel_all could use the _sync variant to cancel
>> the work, as long as kill_timers is set before the cancel_sync is
>> called.
>>
>> Am I missing anything? Does this seem rational?
>
>It seems OK to me ...if there is nothing better ;-) But such endless
>loops are tricky (they omit lockdep, plus can hide some hidden
>dependancies between different tasks, even in the future). If it's
>possible we could consider a limited loop with re-arming on failure;
>then cancel_delayed_work_sync() (with its standard logic) could be
>used everywhere, and kill_timers might be useless too (if there is no
>re-arming between different works).
A less evil alternative would be to punt and reschedule the work
if the rtnl_trylock failed, e.g.,
if (bond_miimon_inspect(bond)) {
read_unlock(&bond->lock);
if (!rtnl_trylock()) {
queue_work(...);
return;
}
read_lock(&bond->lock);
bond_miimon_commit(bond);
[...]
I'm not sure what the usual contention level on rtnl is (and,
therefore, how often this will punt for the normal case that's not the
race we're trying to avoid here).
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
next prev parent reply other threads:[~2009-12-17 21:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-17 0:28 [PATCH] bonding: cancel_delayed_work() -> cancel_delayed_work_sync() Mikhail Markine
2009-12-17 7:49 ` Jarek Poplawski
2009-12-17 13:36 ` Jarek Poplawski
2009-12-17 14:30 ` Johannes Berg
2009-12-17 16:12 ` [Bonding-devel] " Jay Vosburgh
2009-12-17 18:40 ` Jarek Poplawski
2009-12-17 18:49 ` Laurent Chavey
2009-12-17 19:37 ` Jay Vosburgh
2009-12-17 20:56 ` Jarek Poplawski
2009-12-17 21:16 ` Jarek Poplawski
2009-12-17 21:40 ` Jay Vosburgh [this message]
2009-12-17 21:58 ` Jarek Poplawski
2009-12-17 22:33 ` Jarek Poplawski
2009-12-17 21:25 ` Laurent Chavey
2009-12-17 21:31 ` Mikhail Markine
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9587.1261086029@death.nxdomain.ibm.com \
--to=fubar@us.ibm.com \
--cc=bonding-devel@lists.sourceforge.net \
--cc=chavey@google.com \
--cc=davem@davemloft.net \
--cc=jarkao2@gmail.com \
--cc=johannes@sipsolutions.net \
--cc=markine@google.com \
--cc=netdev@vger.kernel.org \
--cc=pgynther@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.