* [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
@ 2013-10-24 3:08 Ding Tianhong
2013-10-24 9:35 ` Veaceslav Falico
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Ding Tianhong @ 2013-10-24 3:08 UTC (permalink / raw)
To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
Nikolay Aleksandrov, Veaceslav Falico, Netdev
Hi:
The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
for monitor function, maybe it is a better way, I will wait to accept any relay for it.
Thanks for the Veaceslav Falico opinion.
v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
Ding Tianhong (5):
bonding: remove bond read lock for bond_mii_monitor()
bonding: remove bond read lock for bond_alb_monitor()
bonding: remove bond read lock for bond_loadbalance_arp_mon()
bonding: remove bond read lock for bond_activebackup_arp_mon()
bonding: remove bond read lock for bond_3ad_state_machine_handler()
drivers/net/bonding/bond_3ad.c | 9 ++--
drivers/net/bonding/bond_alb.c | 20 ++------
drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
3 files changed, 40 insertions(+), 89 deletions(-)
--
1.8.2.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-24 3:08 [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding Ding Tianhong
@ 2013-10-24 9:35 ` Veaceslav Falico
2013-10-27 20:37 ` David Miller
2013-10-27 22:53 ` Veaceslav Falico
2 siblings, 0 replies; 12+ messages in thread
From: Veaceslav Falico @ 2013-10-24 9:35 UTC (permalink / raw)
To: Ding Tianhong
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
Nikolay Aleksandrov, Netdev
On Thu, Oct 24, 2013 at 11:08:35AM +0800, Ding Tianhong wrote:
>Hi:
>
>The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
>which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
>but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
>in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
>
>now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
>way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
>to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
>for monitor function, maybe it is a better way, I will wait to accept any relay for it.
>
>Thanks for the Veaceslav Falico opinion.
No problem.
Nacked-by: Veaceslav Falico <vfalico@redhat.com>
Now, please, make some normal changelogs, or convince David that I'm an
idiot.
>
>v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
>
>Ding Tianhong (5):
> bonding: remove bond read lock for bond_mii_monitor()
> bonding: remove bond read lock for bond_alb_monitor()
> bonding: remove bond read lock for bond_loadbalance_arp_mon()
> bonding: remove bond read lock for bond_activebackup_arp_mon()
> bonding: remove bond read lock for bond_3ad_state_machine_handler()
>
> drivers/net/bonding/bond_3ad.c | 9 ++--
> drivers/net/bonding/bond_alb.c | 20 ++------
> drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
> 3 files changed, 40 insertions(+), 89 deletions(-)
>
>--
>1.8.2.1
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-24 3:08 [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding Ding Tianhong
2013-10-24 9:35 ` Veaceslav Falico
@ 2013-10-27 20:37 ` David Miller
2013-10-27 21:10 ` Veaceslav Falico
2013-10-27 22:53 ` Veaceslav Falico
2 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2013-10-27 20:37 UTC (permalink / raw)
To: dingtianhong; +Cc: fubar, andy, nikolay, vfalico, netdev
From: Ding Tianhong <dingtianhong@huawei.com>
Date: Thu, 24 Oct 2013 11:08:35 +0800
> The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
> which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
> but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
> in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
>
> now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
> way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
> to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
> for monitor function, maybe it is a better way, I will wait to accept any relay for it.
>
> Thanks for the Veaceslav Falico opinion.
>
> v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
Series applied, thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-27 20:37 ` David Miller
@ 2013-10-27 21:10 ` Veaceslav Falico
2013-10-27 21:44 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Veaceslav Falico @ 2013-10-27 21:10 UTC (permalink / raw)
To: David Miller; +Cc: dingtianhong, fubar, andy, nikolay, netdev
On Sun, Oct 27, 2013 at 04:37:12PM -0400, David Miller wrote:
>From: Ding Tianhong <dingtianhong@huawei.com>
>Date: Thu, 24 Oct 2013 11:08:35 +0800
>
>> The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
>> which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
>> but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
>> in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
>>
>> now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
>> way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
>> to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
>> for monitor function, maybe it is a better way, I will wait to accept any relay for it.
>>
>> Thanks for the Veaceslav Falico opinion.
>>
>> v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
>
>Series applied, thanks.
David,
Either I've missed something, or I don't understand something. I've
explicitly Nacked it - http://www.spinics.net/lists/netdev/msg254998.html .
All the changelogs for the patches are *the same*, and, while they try to
explain what's done overall, the don't explain what's done per-patch, why
it's done and why is it safe to move those locks around.
And, while the code changes are small, they touch really sensitive stuff -
locking, and not only bond->lock, but also rtnl. Without proper changelogs
it's really hard to review them per-patch, that's why I've asked for them
several times.
Anyway, care to explain what did I miss?
Thank you!
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-27 21:10 ` Veaceslav Falico
@ 2013-10-27 21:44 ` David Miller
2013-10-27 22:10 ` Veaceslav Falico
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2013-10-27 21:44 UTC (permalink / raw)
To: vfalico; +Cc: dingtianhong, fubar, andy, nikolay, netdev
From: Veaceslav Falico <vfalico@redhat.com>
Date: Sun, 27 Oct 2013 22:10:48 +0100
> All the changelogs for the patches are *the same*, and, while they try
> to
> explain what's done overall, the don't explain what's done per-patch,
> why
> it's done and why is it safe to move those locks around.
He did say so, he listed in fact three alternative ways to fix the
locking problem and then explciitly stated which of the three he
choose.
I would have preferred that he did all of this in the initial 0/N
patch posting, but I can't defer forever.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-27 21:44 ` David Miller
@ 2013-10-27 22:10 ` Veaceslav Falico
2013-10-28 4:00 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Veaceslav Falico @ 2013-10-27 22:10 UTC (permalink / raw)
To: David Miller; +Cc: dingtianhong, fubar, andy, nikolay, netdev
On Sun, Oct 27, 2013 at 05:44:58PM -0400, David Miller wrote:
>From: Veaceslav Falico <vfalico@redhat.com>
>Date: Sun, 27 Oct 2013 22:10:48 +0100
>
>> All the changelogs for the patches are *the same*, and, while they try
>> to
>> explain what's done overall, the don't explain what's done per-patch,
>> why
>> it's done and why is it safe to move those locks around.
>
>He did say so, he listed in fact three alternative ways to fix the
>locking problem and then explciitly stated which of the three he
>choose.
He just rephrased me - http://www.spinics.net/lists/netdev/msg254618.html .
And still the patches didn't say how he did it and why is it safe/good to
do it the way he did it. That's, basically, code without commit messages,
which touches really sensitive parts. As I've said in the above link, it's
really hard to review them this way.
>
>I would have preferred that he did all of this in the initial 0/N
>patch posting, but I can't defer forever.
Maybe I'm too picky. Anyway - understood, thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-24 3:08 [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding Ding Tianhong
2013-10-24 9:35 ` Veaceslav Falico
2013-10-27 20:37 ` David Miller
@ 2013-10-27 22:53 ` Veaceslav Falico
2013-10-28 1:15 ` Ding Tianhong
2013-10-28 4:01 ` David Miller
2 siblings, 2 replies; 12+ messages in thread
From: Veaceslav Falico @ 2013-10-27 22:53 UTC (permalink / raw)
To: Ding Tianhong
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
Nikolay Aleksandrov, Netdev
On Thu, Oct 24, 2013 at 11:08:35AM +0800, Ding Tianhong wrote:
>Hi:
>
>The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
>which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
>but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
>in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
>
>now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
>way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
>to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
>for monitor function, maybe it is a better way, I will wait to accept any relay for it.
>
>Thanks for the Veaceslav Falico opinion.
>
>v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
>
>Ding Tianhong (5):
> bonding: remove bond read lock for bond_mii_monitor()
> bonding: remove bond read lock for bond_alb_monitor()
> bonding: remove bond read lock for bond_loadbalance_arp_mon()
> bonding: remove bond read lock for bond_activebackup_arp_mon()
This patch introduces a regression by boot-test with active backup mode:
bond_activebackup_arp_mon() is already not holding the bond->lock, however
it might call bond_change_active_slave(), which does (in case of new_active):
912 write_unlock_bh(&bond->curr_slave_lock);
913 read_unlock(&bond->lock);
914
915 call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, bond->dev);
916 if (should_notify_peers)
917 call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
918 bond->dev);
919
920 read_lock(&bond->lock);
921 write_lock_bh(&bond->curr_slave_lock);
so it drops the bond->lock (which wasn't taken previously), and then takes
it (without anyone dropping it afterwards).
I don't know how to fix it - cause a lot of other callers already take it,
and we can't just drop them (we'd race), and we can't remove it here (cause
we can't call notifiers while atomic).
Which begs the question - was this patchset tested at all?
[ 21.796823] =====================================
[ 21.796823] [ BUG: bad unlock balance detected! ]
[ 21.796823] 3.12.0-rc6+ #305 Tainted: G I
[ 21.796823] -------------------------------------
[ 21.796823] kworker/u8:5/59 is trying to release lock (&bond->lock) at:
[ 21.796823] [<ffffffffa00b6c38>] bond_change_active_slave+0x2c8/0x390 [bonding]
[ 21.796823] but there are no more locks to release!
[ 21.796823]
[ 21.796823] other info that might help us debug this:
[ 21.796823] 3 locks held by kworker/u8:5/59:
[ 21.796823] #0: (%s#4){.+.+..}, at: [<ffffffff810cfeb9>] process_one_work+0x189/0x580
[ 21.796823] #1: ((&(&bond->arp_work)->work)){+.+...}, at: [<ffffffff810cfeb9>] process_one_work+0x189/0x580
[ 21.796823] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff8169ea05>] rtnl_trylock+0x15/0x20
[ 21.796823]
[ 21.796823] stack backtrace:
[ 21.796823] CPU: 0 PID: 59 Comm: kworker/u8:5 Tainted: G I 3.12.0-rc6+ #305
[ 21.796823] Hardware name: Hewlett-Packard HP xw4600 Workstation/0AA0h, BIOS 786F3 v01.15 08/28/2008
[ 21.796823] Workqueue: bond0 bond_activebackup_arp_mon [bonding]
[ 21.796823] ffffffffa00b6c38 ffff880079ecdae8 ffffffff817aa048 0000000000000002
[ 21.796823] ffff880079ec4b40 ffff880079ecdb18 ffffffff81129af9 00000000001d5400
[ 21.796823] ffff880079ec4b40 ffff880078a36c88 ffff880079ec5440 ffff880079ecdba8
[ 21.796823] Call Trace:
[ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
[ 21.796823] [<ffffffff817aa048>] dump_stack+0x59/0x81
[ 21.796823] [<ffffffff81129af9>] print_unlock_imbalance_bug+0xf9/0x100
[ 21.796823] [<ffffffff8112d67f>] lock_release_non_nested+0x26f/0x3f0
[ 21.796823] [<ffffffff810f3aa8>] ? sched_clock_cpu+0xb8/0x120
[ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
[ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
[ 21.796823] [<ffffffff8112d892>] __lock_release+0x92/0x1b0
[ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
[ 21.796823] [<ffffffff8112da0b>] lock_release+0x5b/0x130
[ 21.796823] [<ffffffff817b0553>] _raw_read_unlock+0x23/0x50
[ 21.796823] [<ffffffffa00b6c38>] bond_change_active_slave+0x2c8/0x390 [bonding]
[ 21.796823] [<ffffffffa00b6df7>] bond_select_active_slave+0xf7/0x1d0 [bonding]
[ 21.796823] [<ffffffffa00b7006>] bond_ab_arp_commit+0x136/0x200 [bonding]
[ 21.796823] [<ffffffffa00b9dd8>] bond_activebackup_arp_mon+0xc8/0xd0 [bonding]
[ 21.796823] [<ffffffff810cff2a>] process_one_work+0x1fa/0x580
[ 21.796823] [<ffffffff810cfeb9>] ? process_one_work+0x189/0x580
[ 21.796823] [<ffffffff810d231f>] worker_thread+0x11f/0x3a0
[ 21.796823] [<ffffffff810d2200>] ? manage_workers+0x170/0x170
[ 21.796823] [<ffffffff810dbdfe>] kthread+0xee/0x100
[ 21.796823] [<ffffffff8112d93b>] ? __lock_release+0x13b/0x1b0
[ 21.796823] [<ffffffff810dbd10>] ? __init_kthread_worker+0x70/0x70
[ 21.796823] [<ffffffff817ba3ec>] ret_from_fork+0x7c/0xb0
[ 21.796823] [<ffffffff810dbd10>] ? __init_kthread_worker+0x70/0x70
> bonding: remove bond read lock for bond_3ad_state_machine_handler()
>
> drivers/net/bonding/bond_3ad.c | 9 ++--
> drivers/net/bonding/bond_alb.c | 20 ++------
> drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
> 3 files changed, 40 insertions(+), 89 deletions(-)
>
>--
>1.8.2.1
>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-27 22:53 ` Veaceslav Falico
@ 2013-10-28 1:15 ` Ding Tianhong
2013-10-28 1:34 ` Veaceslav Falico
2013-10-28 4:01 ` David Miller
1 sibling, 1 reply; 12+ messages in thread
From: Ding Tianhong @ 2013-10-28 1:15 UTC (permalink / raw)
To: Veaceslav Falico
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
Nikolay Aleksandrov, Netdev
On 2013/10/28 6:53, Veaceslav Falico wrote:
> On Thu, Oct 24, 2013 at 11:08:35AM +0800, Ding Tianhong wrote:
>> Hi:
>>
>> The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
>> which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
>> but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
>> in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
>>
>> now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
>> way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
>> to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
>> for monitor function, maybe it is a better way, I will wait to accept any relay for it.
>>
>> Thanks for the Veaceslav Falico opinion.
>>
>> v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
>>
>> Ding Tianhong (5):
>> bonding: remove bond read lock for bond_mii_monitor()
>> bonding: remove bond read lock for bond_alb_monitor()
>> bonding: remove bond read lock for bond_loadbalance_arp_mon()
>> bonding: remove bond read lock for bond_activebackup_arp_mon()
>
> This patch introduces a regression by boot-test with active backup mode:
>
> bond_activebackup_arp_mon() is already not holding the bond->lock, however
> it might call bond_change_active_slave(), which does (in case of new_active):
>
> 912 write_unlock_bh(&bond->curr_slave_lock);
> 913 read_unlock(&bond->lock);
> 914 915 call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, bond->dev);
> 916 if (should_notify_peers)
> 917 call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
> 918 bond->dev);
> 919 920 read_lock(&bond->lock);
> 921 write_lock_bh(&bond->curr_slave_lock);
>
> so it drops the bond->lock (which wasn't taken previously), and then takes
> it (without anyone dropping it afterwards).
>
> I don't know how to fix it - cause a lot of other callers already take it,
> and we can't just drop them (we'd race), and we can't remove it here (cause
> we can't call notifiers while atomic).
>
> Which begs the question - was this patchset tested at all?
>
> [ 21.796823] =====================================
> [ 21.796823] [ BUG: bad unlock balance detected! ]
> [ 21.796823] 3.12.0-rc6+ #305 Tainted: G I [ 21.796823] -------------------------------------
> [ 21.796823] kworker/u8:5/59 is trying to release lock (&bond->lock) at:
> [ 21.796823] [<ffffffffa00b6c38>] bond_change_active_slave+0x2c8/0x390 [bonding]
> [ 21.796823] but there are no more locks to release!
> [ 21.796823] [ 21.796823] other info that might help us debug this:
> [ 21.796823] 3 locks held by kworker/u8:5/59:
> [ 21.796823] #0: (%s#4){.+.+..}, at: [<ffffffff810cfeb9>] process_one_work+0x189/0x580
> [ 21.796823] #1: ((&(&bond->arp_work)->work)){+.+...}, at: [<ffffffff810cfeb9>] process_one_work+0x189/0x580
> [ 21.796823] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff8169ea05>] rtnl_trylock+0x15/0x20
> [ 21.796823] [ 21.796823] stack backtrace:
> [ 21.796823] CPU: 0 PID: 59 Comm: kworker/u8:5 Tainted: G I 3.12.0-rc6+ #305
> [ 21.796823] Hardware name: Hewlett-Packard HP xw4600 Workstation/0AA0h, BIOS 786F3 v01.15 08/28/2008
> [ 21.796823] Workqueue: bond0 bond_activebackup_arp_mon [bonding]
> [ 21.796823] ffffffffa00b6c38 ffff880079ecdae8 ffffffff817aa048 0000000000000002
> [ 21.796823] ffff880079ec4b40 ffff880079ecdb18 ffffffff81129af9 00000000001d5400
> [ 21.796823] ffff880079ec4b40 ffff880078a36c88 ffff880079ec5440 ffff880079ecdba8
> [ 21.796823] Call Trace:
> [ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [ 21.796823] [<ffffffff817aa048>] dump_stack+0x59/0x81
> [ 21.796823] [<ffffffff81129af9>] print_unlock_imbalance_bug+0xf9/0x100
> [ 21.796823] [<ffffffff8112d67f>] lock_release_non_nested+0x26f/0x3f0
> [ 21.796823] [<ffffffff810f3aa8>] ? sched_clock_cpu+0xb8/0x120
> [ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [ 21.796823] [<ffffffff8112d892>] __lock_release+0x92/0x1b0
> [ 21.796823] [<ffffffffa00b6c38>] ? bond_change_active_slave+0x2c8/0x390 [bonding]
> [ 21.796823] [<ffffffff8112da0b>] lock_release+0x5b/0x130
> [ 21.796823] [<ffffffff817b0553>] _raw_read_unlock+0x23/0x50
> [ 21.796823] [<ffffffffa00b6c38>] bond_change_active_slave+0x2c8/0x390 [bonding]
> [ 21.796823] [<ffffffffa00b6df7>] bond_select_active_slave+0xf7/0x1d0 [bonding]
> [ 21.796823] [<ffffffffa00b7006>] bond_ab_arp_commit+0x136/0x200 [bonding]
> [ 21.796823] [<ffffffffa00b9dd8>] bond_activebackup_arp_mon+0xc8/0xd0 [bonding]
> [ 21.796823] [<ffffffff810cff2a>] process_one_work+0x1fa/0x580
> [ 21.796823] [<ffffffff810cfeb9>] ? process_one_work+0x189/0x580
> [ 21.796823] [<ffffffff810d231f>] worker_thread+0x11f/0x3a0
> [ 21.796823] [<ffffffff810d2200>] ? manage_workers+0x170/0x170
> [ 21.796823] [<ffffffff810dbdfe>] kthread+0xee/0x100
> [ 21.796823] [<ffffffff8112d93b>] ? __lock_release+0x13b/0x1b0
> [ 21.796823] [<ffffffff810dbd10>] ? __init_kthread_worker+0x70/0x70
> [ 21.796823] [<ffffffff817ba3ec>] ret_from_fork+0x7c/0xb0
> [ 21.796823] [<ffffffff810dbd10>] ? __init_kthread_worker+0x70/0x70
>
>
>> bonding: remove bond read lock for bond_3ad_state_machine_handler()
>>
>> drivers/net/bonding/bond_3ad.c | 9 ++--
>> drivers/net/bonding/bond_alb.c | 20 ++------
>> drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
>> 3 files changed, 40 insertions(+), 89 deletions(-)
>>
>> --
>> 1.8.2.1
>>
>>
Hi David:
yes, exactly I miss it and make a mistake, the bond_select_active_slave is still have the protect problem and
need to be processed, I miss it, sorry, I will send a patch to fix the bug soon.
Hi Veaceslav:
sorry about the commit, I will pay more attention to the commit and test, thanks for your advise and report the bug,
I have to admin that I was too careless.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> .
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-28 1:15 ` Ding Tianhong
@ 2013-10-28 1:34 ` Veaceslav Falico
2013-10-28 3:02 ` Ding Tianhong
0 siblings, 1 reply; 12+ messages in thread
From: Veaceslav Falico @ 2013-10-28 1:34 UTC (permalink / raw)
To: Ding Tianhong
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
Nikolay Aleksandrov, Netdev
On Mon, Oct 28, 2013 at 09:15:52AM +0800, Ding Tianhong wrote:
>On 2013/10/28 6:53, Veaceslav Falico wrote:
>> On Thu, Oct 24, 2013 at 11:08:35AM +0800, Ding Tianhong wrote:
>>> Hi:
>>>
>>> The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
>>> which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
>>> but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
>>> in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
>>>
>>> now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
>>> way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
>>> to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
>>> for monitor function, maybe it is a better way, I will wait to accept any relay for it.
>>>
>>> Thanks for the Veaceslav Falico opinion.
>>>
>>> v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
>>>
>>> Ding Tianhong (5):
>>> bonding: remove bond read lock for bond_mii_monitor()
>>> bonding: remove bond read lock for bond_alb_monitor()
>>> bonding: remove bond read lock for bond_loadbalance_arp_mon()
>>> bonding: remove bond read lock for bond_activebackup_arp_mon()
>>
>> This patch introduces a regression by boot-test with active backup mode:
>>
>> bond_activebackup_arp_mon() is already not holding the bond->lock, however
>> it might call bond_change_active_slave(), which does (in case of new_active):
>>
>> 912 write_unlock_bh(&bond->curr_slave_lock);
>> 913 read_unlock(&bond->lock);
>> 914 915 call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, bond->dev);
>> 916 if (should_notify_peers)
>> 917 call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
>> 918 bond->dev);
>> 919 920 read_lock(&bond->lock);
>> 921 write_lock_bh(&bond->curr_slave_lock);
>>
>> so it drops the bond->lock (which wasn't taken previously), and then takes
>> it (without anyone dropping it afterwards).
>>
>> I don't know how to fix it - cause a lot of other callers already take it,
>> and we can't just drop them (we'd race), and we can't remove it here (cause
>> we can't call notifiers while atomic).
>>
>> Which begs the question - was this patchset tested at all?
>>
>> [ 21.796823] =====================================
>> [ 21.796823] [ BUG: bad unlock balance detected! ]
... snip ...
>
>Hi David:
>yes, exactly I miss it and make a mistake, the bond_select_active_slave is still have the protect problem and
>need to be processed, I miss it, sorry, I will send a patch to fix the bug soon.
>
>Hi Veaceslav:
>sorry about the commit, I will pay more attention to the commit and test, thanks for your advise and report the bug,
>I have to admin that I was too careless.
I'll ask you once again, even though it seems that my NACK doesn't block
the patchset - try writing commit messages that actually describe why and
how you do it.
It's, actually, not only for the reviewers - it's also really good for you
- cause while writing the commit log you also understand a lot more what
are you doing, and might spot some corner cases (like this one).
Sorry for being negative, however it costs me *much* more time to review
patches without proper commit messages. I've done it once, twice, several
times more - but that's it, I refuse to spend my time on your skipped
homework.
It might also help to split patches into really small steps - as in - do
only one thing at a time, and describe it. This will help evade bugs *a
lot*. It also helps people who'll bisect it, bugfix it and review it -
because every patch will be a small, well-documented change, instead of a
chunk of code with a description 'lets remove bond->lock'.
Thank you and hope that helps.
>
>
>
>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> .
>>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-28 1:34 ` Veaceslav Falico
@ 2013-10-28 3:02 ` Ding Tianhong
0 siblings, 0 replies; 12+ messages in thread
From: Ding Tianhong @ 2013-10-28 3:02 UTC (permalink / raw)
To: Veaceslav Falico
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
Nikolay Aleksandrov, Netdev
On 2013/10/28 9:34, Veaceslav Falico wrote:
> On Mon, Oct 28, 2013 at 09:15:52AM +0800, Ding Tianhong wrote:
>> On 2013/10/28 6:53, Veaceslav Falico wrote:
>>> On Thu, Oct 24, 2013 at 11:08:35AM +0800, Ding Tianhong wrote:
>>>> Hi:
>>>>
>>>> The slave list will add and del by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
>>>> which will call call_netdevice_notifiers(), even it is safe to call it in write bond lock now,
>>>> but we can't sure that whether it is safe later, because other drivers may deal NETDEV_CHANGEUPPER
>>>> in sleep way, so I didn't admit move the bond_upper_dev_unlink() in write bond lock.
>>>>
>>>> now the bond_for_each_slave only protect by rtnl_lock(), maybe use bond_for_each_slave_rcu is a good
>>>> way to protect slave list for bond, but as a system slow path, it is no need to transform bond_for_each_slave()
>>>> to bond_for_each_slave_rcu() in slow path, so in the patchset, I will remove the unused read bond lock
>>>> for monitor function, maybe it is a better way, I will wait to accept any relay for it.
>>>>
>>>> Thanks for the Veaceslav Falico opinion.
>>>>
>>>> v2: add and modify commit for patchset and patch, it will be the first step for the whole patchset.
>>>>
>>>> Ding Tianhong (5):
>>>> bonding: remove bond read lock for bond_mii_monitor()
>>>> bonding: remove bond read lock for bond_alb_monitor()
>>>> bonding: remove bond read lock for bond_loadbalance_arp_mon()
>>>> bonding: remove bond read lock for bond_activebackup_arp_mon()
>>>
>>> This patch introduces a regression by boot-test with active backup mode:
>>>
>>> bond_activebackup_arp_mon() is already not holding the bond->lock, however
>>> it might call bond_change_active_slave(), which does (in case of new_active):
>>>
>>> 912 write_unlock_bh(&bond->curr_slave_lock);
>>> 913 read_unlock(&bond->lock);
>>> 914 915 call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, bond->dev);
>>> 916 if (should_notify_peers)
>>> 917 call_netdevice_notifiers(NETDEV_NOTIFY_PEERS,
>>> 918 bond->dev);
>>> 919 920 read_lock(&bond->lock);
>>> 921 write_lock_bh(&bond->curr_slave_lock);
>>>
>>> so it drops the bond->lock (which wasn't taken previously), and then takes
>>> it (without anyone dropping it afterwards).
>>>
>>> I don't know how to fix it - cause a lot of other callers already take it,
>>> and we can't just drop them (we'd race), and we can't remove it here (cause
>>> we can't call notifiers while atomic).
>>>
>>> Which begs the question - was this patchset tested at all?
>>>
>>> [ 21.796823] =====================================
>>> [ 21.796823] [ BUG: bad unlock balance detected! ]
> ... snip ...
>>
>> Hi David:
>> yes, exactly I miss it and make a mistake, the bond_select_active_slave is still have the protect problem and
>> need to be processed, I miss it, sorry, I will send a patch to fix the bug soon.
>>
>> Hi Veaceslav:
>> sorry about the commit, I will pay more attention to the commit and test, thanks for your advise and report the bug,
>> I have to admin that I was too careless.
>
> I'll ask you once again, even though it seems that my NACK doesn't block
> the patchset - try writing commit messages that actually describe why and
> how you do it.
>
> It's, actually, not only for the reviewers - it's also really good for you
> - cause while writing the commit log you also understand a lot more what
> are you doing, and might spot some corner cases (like this one).
>
> Sorry for being negative, however it costs me *much* more time to review
> patches without proper commit messages. I've done it once, twice, several
> times more - but that's it, I refuse to spend my time on your skipped
> homework.
>
> It might also help to split patches into really small steps - as in - do
> only one thing at a time, and describe it. This will help evade bugs *a
> lot*. It also helps people who'll bisect it, bugfix it and review it -
> because every patch will be a small, well-documented change, instead of a
> chunk of code with a description 'lets remove bond->lock'.
>
> Thank you and hope that helps.
>
sincere receiving all opinions.
thanks.
Regards.
Ding
>>
>>
>>
>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> .
>>>
>>
>>
>
> .
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-27 22:10 ` Veaceslav Falico
@ 2013-10-28 4:00 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2013-10-28 4:00 UTC (permalink / raw)
To: vfalico; +Cc: dingtianhong, fubar, andy, nikolay, netdev
From: Veaceslav Falico <vfalico@redhat.com>
Date: Sun, 27 Oct 2013 23:10:27 +0100
> On Sun, Oct 27, 2013 at 05:44:58PM -0400, David Miller wrote:
>>I would have preferred that he did all of this in the initial 0/N
>>patch posting, but I can't defer forever.
>
> Maybe I'm too picky. Anyway - understood, thanks.
I'm happy to revert his changes if you find real problems with them,
but I'd rather you two work more closely together in the future so I
don't have to break the deadlock like I did this time.
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
2013-10-27 22:53 ` Veaceslav Falico
2013-10-28 1:15 ` Ding Tianhong
@ 2013-10-28 4:01 ` David Miller
1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2013-10-28 4:01 UTC (permalink / raw)
To: vfalico; +Cc: dingtianhong, fubar, andy, nikolay, netdev
From: Veaceslav Falico <vfalico@redhat.com>
Date: Sun, 27 Oct 2013 23:53:17 +0100
> This patch introduces a regression by boot-test with active backup
> mode:
Ok I'm reverting.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-10-28 4:01 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-24 3:08 [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding Ding Tianhong
2013-10-24 9:35 ` Veaceslav Falico
2013-10-27 20:37 ` David Miller
2013-10-27 21:10 ` Veaceslav Falico
2013-10-27 21:44 ` David Miller
2013-10-27 22:10 ` Veaceslav Falico
2013-10-28 4:00 ` David Miller
2013-10-27 22:53 ` Veaceslav Falico
2013-10-28 1:15 ` Ding Tianhong
2013-10-28 1:34 ` Veaceslav Falico
2013-10-28 3:02 ` Ding Tianhong
2013-10-28 4:01 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).