All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jay Vosburgh <fubar@us.ibm.com>
To: Andy Gospodarek <andy@greyhouse.net>
Cc: Krzysztof Oledzki <olel@ans.pl>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	bugme-daemon@bugzilla.kernel.org,
	shemminger@linux-foundation.org, davem@davemloft.net,
	netdev@vger.kernel.org
Subject: Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)
Date: Mon, 07 Jan 2008 12:40:25 -0800	[thread overview]
Message-ID: <3528.1199738425@death> (raw)
In-Reply-To: <20080107202626.GC8728@gospo.usersys.redhat.com>

Andy Gospodarek <andy@greyhouse.net> wrote:

>On Mon, Jan 07, 2008 at 06:57:25PM +0100, Krzysztof Oledzki wrote:
>> 
>> 
>> On Wed, 19 Dec 2007, Andy Gospodarek wrote:
>> 
>> >On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
>> >>
>> >>
>> >>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>> >>
>> >>>On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
>> >>>>
>> >>>>
>> >>>>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>> >>>>
>> >>>>>On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>On Wed, 12 Dec 2007, Jay Vosburgh wrote:
>> >>>>>>
>> >>>>>>>Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> >>>>>>>
>> >>>>>>>>>diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>> >>>>>>>>>drivers/net/bonding/bond_sysfs.c
>> >>>>>>>>>--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>> >>>>>>>>>+++ a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>>>@@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
>> >>>>>>>>>out:
>> >>>>>>>>>    write_unlock_bh(&bond->lock);
>> >>>>>>>>>
>> >>>>>>>>>-       rtnl_unlock();
>> >>>>>>>>>-
>> >>>>>>>>
>> >>>>>>>>Looking at the changeset that added this perhaps the intention
>> >>>>>>>>is to hold the lock? If so we should add an rtnl_lock to the start
>> >>>>>>>>of the function.
>> >>>>>>>
>> >>>>>>>	Yes, this function needs to hold locks, and more than just
>> >>>>>>>what's there now.  I believe the following should be correct; I 
>> >>>>>>>haven't
>> >>>>>>>tested it, though (I'm supposedly on vacation right now).
>> >>>>>>>
>> >>>>>>>	The following change should be correct for the
>> >>>>>>>bonding_store_primary case discussed in this thread, and also 
>> >>>>>>>corrects
>> >>>>>>>the bonding_store_active case which performs similar functions.
>> >>>>>>>
>> >>>>>>>	The bond_change_active_slave and bond_select_active_slave
>> >>>>>>>functions both require rtnl, bond->lock for read and curr_slave_lock
>> >>>>>>>for
>> >>>>>>>write_bh, and no other locks.  This is so that the lower level
>> >>>>>>>mode-specific functions can release locks down to just rtnl in order 
>> >>>>>>>to
>> >>>>>>>call, e.g., dev_set_mac_address with the locks it expects (rtnl 
>> >>>>>>>only).
>> >>>>>>>
>> >>>>>>>Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
>> >>>>>>>
>> >>>>>>>diff --git a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>b/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>index 11b76b3..28a2d80 100644
>> >>>>>>>--- a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>+++ b/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
>> >>>>>>>device
>> >>>>>>>*d,
>> >>>>>>>	struct slave *slave;
>> >>>>>>>	struct bonding *bond = to_bond(d);
>> >>>>>>>
>> >>>>>>>-	write_lock_bh(&bond->lock);
>> >>>>>>>+	rtnl_lock();
>> >>>>>>>+	read_lock(&bond->lock);
>> >>>>>>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>>>>>+
>> >>>>>>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>>>>>		printk(KERN_INFO DRV_NAME
>> >>>>>>>		       ": %s: Unable to set primary slave; %s is in 
>> >>>>>>>		       mode
>> >>>>>>>		       %d\n",
>> >>>>>>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
>> >>>>>>>device
>> >>>>>>>*d,
>> >>>>>>>		}
>> >>>>>>>	}
>> >>>>>>>out:
>> >>>>>>>-	write_unlock_bh(&bond->lock);
>> >>>>>>>-
>> >>>>>>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>>>>>+	read_unlock(&bond->lock);
>> >>>>>>>	rtnl_unlock();
>> >>>>>>>
>> >>>>>>>	return count;
>> >>>>>>>@@ -1190,7 +1193,8 @@ static ssize_t 
>> >>>>>>>bonding_store_active_slave(struct
>> >>>>>>>device *d,
>> >>>>>>>	struct bonding *bond = to_bond(d);
>> >>>>>>>
>> >>>>>>>	rtnl_lock();
>> >>>>>>>-	write_lock_bh(&bond->lock);
>> >>>>>>>+	read_lock(&bond->lock);
>> >>>>>>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>>>>>
>> >>>>>>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>>>>>		printk(KERN_INFO DRV_NAME
>> >>>>>>>@@ -1247,7 +1251,8 @@ static ssize_t 
>> >>>>>>>bonding_store_active_slave(struct
>> >>>>>>>device *d,
>> >>>>>>>		}
>> >>>>>>>	}
>> >>>>>>>out:
>> >>>>>>>-	write_unlock_bh(&bond->lock);
>> >>>>>>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>>>>>+	read_unlock(&bond->lock);
>> >>>>>>>	rtnl_unlock();
>> >>>>>>>
>> >>>>>>>	return count;
>> >>>>>>
>> >>>>>>Vanilla 2.6.24-rc5 plus this patch:
>> >>>>>>
>> >>>>>>=========================================================
>> >>>>>>[ INFO: possible irq lock inversion dependency detected ]
>> >>>>>>2.6.24-rc5 #1
>> >>>>>>---------------------------------------------------------
>> >>>>>>events/0/9 just changed the state of lock:
>> >>>>>>(&mc->mca_lock){-+..}, at: [<c0411c7a>] 
>> >>>>>>mld_ifc_timer_expire+0x130/0x1fb
>> >>>>>>but this lock took another, soft-read-irq-unsafe lock in the past:
>> >>>>>>(&bond->lock){-.--}
>> >>>>>>
>> >>>>>>and interrupts could create inverse lock ordering between them.
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>Grrr, I should have seen that -- sorry.  Try your luck with this 
>> >>>>>instead:
>> >>>><CUT>
>> >>>>
>> >>>>No luck.
>> >>>>
>> >>>
>> >>>
>> >>>I'm guessing if we go back to using a write-lock for bond->lock this
>> >>>will go back to working again, but I'm not totally convinced since there
>> >>>are plenty of places where we used a read-lock with it.
>> >>
>> >>Should I check this patch or rather, based on a future discussion, wait
>> >>for another version?
>> >>
>> >>>
>> >>>diff --git a/drivers/net/bonding/bond_sysfs.c
>> >>>b/drivers/net/bonding/bond_sysfs.c
>> >>>index 11b76b3..635b857 100644
>> >>>--- a/drivers/net/bonding/bond_sysfs.c
>> >>>+++ b/drivers/net/bonding/bond_sysfs.c
>> >>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
>> >>>*d,
>> >>>	struct slave *slave;
>> >>>	struct bonding *bond = to_bond(d);
>> >>>
>> >>>+	rtnl_lock();
>> >>>	write_lock_bh(&bond->lock);
>> >>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>+
>> >>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>		printk(KERN_INFO DRV_NAME
>> >>>		       ": %s: Unable to set primary slave; %s is in mode
>> >>>		       %d\n",
>> >>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
>> >>>*d,
>> >>>		}
>> >>>	}
>> >>>out:
>> >>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>	write_unlock_bh(&bond->lock);
>> >>>-
>> >>>	rtnl_unlock();
>> >>>
>> >>>	return count;
>> >>>@@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
>> >>>device *d,
>> >>>
>> >>>	rtnl_lock();
>> >>>	write_lock_bh(&bond->lock);
>> >>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>
>> >>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>		printk(KERN_INFO DRV_NAME
>> >>>@@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
>> >>>device *d,
>> >>>		}
>> >>>	}
>> >>>out:
>> >>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>	write_unlock_bh(&bond->lock);
>> >>>	rtnl_unlock();
>> >>>
>> >>
>> >>
>> >>Best regards,
>> >>
>> >>					Krzysztof Olędzki
>> >
>> >For now, I prefer Jay's original patch -- with the read_locks (rather
>> >than read/write_lock_bh) and the added rtnl_lock.  There is still a
>> >lockdep issue that we need to sort-out, but this patch is needed first.
>> 
>> This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any 
>> chances to cure it before 2.6.24-final?
>> 
>> Best regards,
>> 
>> 				Krzysztof Olędzki
>
>Krzysztof,
>
>I doubt the lockdep issue will be fixed, but the patch Jay posted and I
>acked needs to be included in 2.6.24.

	I'm (finally) back from vacation and am working on the lock
problem right now; there are a couple of other changes that need to go
in (in addition to what was posted previously).  One is a spurious RTNL
warning, the other is a similar 'wrong lock' type of problem that arises
during module unload.

	I should have a patch set for this posted in a couple of hours.

>I played around with the locking when setting the multicast list and I
>can make the lockdep issue go away, but I need to be sure that it's OK
>to switch it to a read-lock from a write-lock (and I don't really think
>it is).

	I haven't looked at the lockdep problem yet.  If you want to be
brave and post your working patch for the lockdep thing, I might be able
to crush your hopes that it's ok.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

      reply	other threads:[~2008-01-07 20:40 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <bug-9543-10286@http.bugzilla.kernel.org/>
2007-12-11 11:46 ` [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055) Andrew Morton
2007-12-11 15:04   ` Krzysztof Oledzki
2007-12-11 20:30     ` Andrew Morton
2007-12-12 13:31   ` Herbert Xu
2007-12-12 17:46     ` Jay Vosburgh
2007-12-12 19:07       ` Andy Gospodarek
2007-12-14 16:14       ` Krzysztof Oledzki
2007-12-14 18:26         ` Andy Gospodarek
2007-12-14 18:57           ` Krzysztof Oledzki
2007-12-14 22:03             ` Andy Gospodarek
2007-12-14 22:11               ` Krzysztof Oledzki
2007-12-14 22:27                 ` Andy Gospodarek
2007-12-18 19:52               ` Krzysztof Oledzki
2007-12-14 22:47             ` Andy Gospodarek
2007-12-15  4:10               ` Herbert Xu
2007-12-15 15:09                 ` Andy Gospodarek
2007-12-16  2:27                   ` Herbert Xu
2007-12-16  3:17                     ` Andy Gospodarek
2007-12-16  3:23                       ` Herbert Xu
2007-12-18 19:53               ` Krzysztof Oledzki
2007-12-19 14:42                 ` Andy Gospodarek
2008-01-07 17:57                   ` Krzysztof Oledzki
2008-01-07 20:26                     ` Andy Gospodarek
2008-01-07 20:40                       ` Jay Vosburgh [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3528.1199738425@death \
    --to=fubar@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=andy@greyhouse.net \
    --cc=bugme-daemon@bugzilla.kernel.org \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=netdev@vger.kernel.org \
    --cc=olel@ans.pl \
    --cc=shemminger@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.