From: Jay Vosburgh <fubar@us.ibm.com>
To: Andy Gospodarek <andy@greyhouse.net>
Cc: Krzysztof Oledzki <olel@ans.pl>,
Herbert Xu <herbert@gondor.apana.org.au>,
Andrew Morton <akpm@linux-foundation.org>,
bugme-daemon@bugzilla.kernel.org,
shemminger@linux-foundation.org, davem@davemloft.net,
netdev@vger.kernel.org
Subject: Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)
Date: Mon, 07 Jan 2008 12:40:25 -0800 [thread overview]
Message-ID: <3528.1199738425@death> (raw)
In-Reply-To: <20080107202626.GC8728@gospo.usersys.redhat.com>
Andy Gospodarek <andy@greyhouse.net> wrote:
>On Mon, Jan 07, 2008 at 06:57:25PM +0100, Krzysztof Oledzki wrote:
>>
>>
>> On Wed, 19 Dec 2007, Andy Gospodarek wrote:
>>
>> >On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
>> >>
>> >>
>> >>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>> >>
>> >>>On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
>> >>>>
>> >>>>
>> >>>>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>> >>>>
>> >>>>>On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>On Wed, 12 Dec 2007, Jay Vosburgh wrote:
>> >>>>>>
>> >>>>>>>Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> >>>>>>>
>> >>>>>>>>>diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>> >>>>>>>>>drivers/net/bonding/bond_sysfs.c
>> >>>>>>>>>--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>> >>>>>>>>>+++ a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>>>@@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
>> >>>>>>>>>out:
>> >>>>>>>>> write_unlock_bh(&bond->lock);
>> >>>>>>>>>
>> >>>>>>>>>- rtnl_unlock();
>> >>>>>>>>>-
>> >>>>>>>>
>> >>>>>>>>Looking at the changeset that added this perhaps the intention
>> >>>>>>>>is to hold the lock? If so we should add an rtnl_lock to the start
>> >>>>>>>>of the function.
>> >>>>>>>
>> >>>>>>> Yes, this function needs to hold locks, and more than just
>> >>>>>>>what's there now. I believe the following should be correct; I
>> >>>>>>>haven't
>> >>>>>>>tested it, though (I'm supposedly on vacation right now).
>> >>>>>>>
>> >>>>>>> The following change should be correct for the
>> >>>>>>>bonding_store_primary case discussed in this thread, and also
>> >>>>>>>corrects
>> >>>>>>>the bonding_store_active case which performs similar functions.
>> >>>>>>>
>> >>>>>>> The bond_change_active_slave and bond_select_active_slave
>> >>>>>>>functions both require rtnl, bond->lock for read and curr_slave_lock
>> >>>>>>>for
>> >>>>>>>write_bh, and no other locks. This is so that the lower level
>> >>>>>>>mode-specific functions can release locks down to just rtnl in order
>> >>>>>>>to
>> >>>>>>>call, e.g., dev_set_mac_address with the locks it expects (rtnl
>> >>>>>>>only).
>> >>>>>>>
>> >>>>>>>Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
>> >>>>>>>
>> >>>>>>>diff --git a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>b/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>index 11b76b3..28a2d80 100644
>> >>>>>>>--- a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>+++ b/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
>> >>>>>>>device
>> >>>>>>>*d,
>> >>>>>>> struct slave *slave;
>> >>>>>>> struct bonding *bond = to_bond(d);
>> >>>>>>>
>> >>>>>>>- write_lock_bh(&bond->lock);
>> >>>>>>>+ rtnl_lock();
>> >>>>>>>+ read_lock(&bond->lock);
>> >>>>>>>+ write_lock_bh(&bond->curr_slave_lock);
>> >>>>>>>+
>> >>>>>>> if (!USES_PRIMARY(bond->params.mode)) {
>> >>>>>>> printk(KERN_INFO DRV_NAME
>> >>>>>>> ": %s: Unable to set primary slave; %s is in
>> >>>>>>> mode
>> >>>>>>> %d\n",
>> >>>>>>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
>> >>>>>>>device
>> >>>>>>>*d,
>> >>>>>>> }
>> >>>>>>> }
>> >>>>>>>out:
>> >>>>>>>- write_unlock_bh(&bond->lock);
>> >>>>>>>-
>> >>>>>>>+ write_unlock_bh(&bond->curr_slave_lock);
>> >>>>>>>+ read_unlock(&bond->lock);
>> >>>>>>> rtnl_unlock();
>> >>>>>>>
>> >>>>>>> return count;
>> >>>>>>>@@ -1190,7 +1193,8 @@ static ssize_t
>> >>>>>>>bonding_store_active_slave(struct
>> >>>>>>>device *d,
>> >>>>>>> struct bonding *bond = to_bond(d);
>> >>>>>>>
>> >>>>>>> rtnl_lock();
>> >>>>>>>- write_lock_bh(&bond->lock);
>> >>>>>>>+ read_lock(&bond->lock);
>> >>>>>>>+ write_lock_bh(&bond->curr_slave_lock);
>> >>>>>>>
>> >>>>>>> if (!USES_PRIMARY(bond->params.mode)) {
>> >>>>>>> printk(KERN_INFO DRV_NAME
>> >>>>>>>@@ -1247,7 +1251,8 @@ static ssize_t
>> >>>>>>>bonding_store_active_slave(struct
>> >>>>>>>device *d,
>> >>>>>>> }
>> >>>>>>> }
>> >>>>>>>out:
>> >>>>>>>- write_unlock_bh(&bond->lock);
>> >>>>>>>+ write_unlock_bh(&bond->curr_slave_lock);
>> >>>>>>>+ read_unlock(&bond->lock);
>> >>>>>>> rtnl_unlock();
>> >>>>>>>
>> >>>>>>> return count;
>> >>>>>>
>> >>>>>>Vanilla 2.6.24-rc5 plus this patch:
>> >>>>>>
>> >>>>>>=========================================================
>> >>>>>>[ INFO: possible irq lock inversion dependency detected ]
>> >>>>>>2.6.24-rc5 #1
>> >>>>>>---------------------------------------------------------
>> >>>>>>events/0/9 just changed the state of lock:
>> >>>>>>(&mc->mca_lock){-+..}, at: [<c0411c7a>]
>> >>>>>>mld_ifc_timer_expire+0x130/0x1fb
>> >>>>>>but this lock took another, soft-read-irq-unsafe lock in the past:
>> >>>>>>(&bond->lock){-.--}
>> >>>>>>
>> >>>>>>and interrupts could create inverse lock ordering between them.
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>Grrr, I should have seen that -- sorry. Try your luck with this
>> >>>>>instead:
>> >>>><CUT>
>> >>>>
>> >>>>No luck.
>> >>>>
>> >>>
>> >>>
>> >>>I'm guessing if we go back to using a write-lock for bond->lock this
>> >>>will go back to working again, but I'm not totally convinced since there
>> >>>are plenty of places where we used a read-lock with it.
>> >>
>> >>Should I check this patch or rather, based on a future discussion, wait
>> >>for another version?
>> >>
>> >>>
>> >>>diff --git a/drivers/net/bonding/bond_sysfs.c
>> >>>b/drivers/net/bonding/bond_sysfs.c
>> >>>index 11b76b3..635b857 100644
>> >>>--- a/drivers/net/bonding/bond_sysfs.c
>> >>>+++ b/drivers/net/bonding/bond_sysfs.c
>> >>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
>> >>>*d,
>> >>> struct slave *slave;
>> >>> struct bonding *bond = to_bond(d);
>> >>>
>> >>>+ rtnl_lock();
>> >>> write_lock_bh(&bond->lock);
>> >>>+ write_lock_bh(&bond->curr_slave_lock);
>> >>>+
>> >>> if (!USES_PRIMARY(bond->params.mode)) {
>> >>> printk(KERN_INFO DRV_NAME
>> >>> ": %s: Unable to set primary slave; %s is in mode
>> >>> %d\n",
>> >>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
>> >>>*d,
>> >>> }
>> >>> }
>> >>>out:
>> >>>+ write_unlock_bh(&bond->curr_slave_lock);
>> >>> write_unlock_bh(&bond->lock);
>> >>>-
>> >>> rtnl_unlock();
>> >>>
>> >>> return count;
>> >>>@@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
>> >>>device *d,
>> >>>
>> >>> rtnl_lock();
>> >>> write_lock_bh(&bond->lock);
>> >>>+ write_lock_bh(&bond->curr_slave_lock);
>> >>>
>> >>> if (!USES_PRIMARY(bond->params.mode)) {
>> >>> printk(KERN_INFO DRV_NAME
>> >>>@@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
>> >>>device *d,
>> >>> }
>> >>> }
>> >>>out:
>> >>>+ write_unlock_bh(&bond->curr_slave_lock);
>> >>> write_unlock_bh(&bond->lock);
>> >>> rtnl_unlock();
>> >>>
>> >>
>> >>
>> >>Best regards,
>> >>
>> >> Krzysztof Olędzki
>> >
>> >For now, I prefer Jay's original patch -- with the read_locks (rather
>> >than read/write_lock_bh) and the added rtnl_lock. There is still a
>> >lockdep issue that we need to sort-out, but this patch is needed first.
>>
>> This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any
>> chances to cure it before 2.6.24-final?
>>
>> Best regards,
>>
>> Krzysztof Olędzki
>
>Krzysztof,
>
>I doubt the lockdep issue will be fixed, but the patch Jay posted and I
>acked needs to be included in 2.6.24.
I'm (finally) back from vacation and am working on the lock
problem right now; there are a couple of other changes that need to go
in (in addition to what was posted previously). One is a spurious RTNL
warning, the other is a similar 'wrong lock' type of problem that arises
during module unload.
I should have a patch set for this posted in a couple of hours.
>I played around with the locking when setting the multicast list and I
>can make the lockdep issue go away, but I need to be sure that it's OK
>to switch it to a read-lock from a write-lock (and I don't really think
>it is).
I haven't looked at the lockdep problem yet. If you want to be
brave and post your working patch for the lockdep thing, I might be able
to crush your hopes that it's ok.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
prev parent reply other threads:[~2008-01-07 20:40 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <bug-9543-10286@http.bugzilla.kernel.org/>
2007-12-11 11:46 ` [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055) Andrew Morton
2007-12-11 15:04 ` Krzysztof Oledzki
2007-12-11 20:30 ` Andrew Morton
2007-12-12 13:31 ` Herbert Xu
2007-12-12 17:46 ` Jay Vosburgh
2007-12-12 19:07 ` Andy Gospodarek
2007-12-14 16:14 ` Krzysztof Oledzki
2007-12-14 18:26 ` Andy Gospodarek
2007-12-14 18:57 ` Krzysztof Oledzki
2007-12-14 22:03 ` Andy Gospodarek
2007-12-14 22:11 ` Krzysztof Oledzki
2007-12-14 22:27 ` Andy Gospodarek
2007-12-18 19:52 ` Krzysztof Oledzki
2007-12-14 22:47 ` Andy Gospodarek
2007-12-15 4:10 ` Herbert Xu
2007-12-15 15:09 ` Andy Gospodarek
2007-12-16 2:27 ` Herbert Xu
2007-12-16 3:17 ` Andy Gospodarek
2007-12-16 3:23 ` Herbert Xu
2007-12-18 19:53 ` Krzysztof Oledzki
2007-12-19 14:42 ` Andy Gospodarek
2008-01-07 17:57 ` Krzysztof Oledzki
2008-01-07 20:26 ` Andy Gospodarek
2008-01-07 20:40 ` Jay Vosburgh [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3528.1199738425@death \
--to=fubar@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=andy@greyhouse.net \
--cc=bugme-daemon@bugzilla.kernel.org \
--cc=davem@davemloft.net \
--cc=herbert@gondor.apana.org.au \
--cc=netdev@vger.kernel.org \
--cc=olel@ans.pl \
--cc=shemminger@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).