* [PATCH 0/2] bonding: couple of bug fixes @ 2013-06-06 11:55 nikolay 2013-06-06 11:55 ` [PATCH 1/2] bonding: reset master mac on first enslave failure nikolay ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: nikolay @ 2013-06-06 11:55 UTC (permalink / raw) To: netdev; +Cc: andy, fubar, davem From: Nikolay Aleksandrov <nikolay@redhat.com> Hello, Patch 01 resets the master's mac if the first enslave fails and the slave's mac was set to the master's prior. Patch 02 fixes a type bug of igmp_retrans (which is the counter of igmp_resend param and was silently not working if the value was >127) so it can go up to 255 (as per documentation). It also fixes two tricky race conditions which were hidden because of the previous bug. Best regards, Nikolay Aleksandrov Nikolay Aleksandrov (2): bonding: reset master mac on first enslave failure bonding: fix igmp_retrans type and two related races drivers/net/bonding/bond_main.c | 19 +++++++++++++++---- drivers/net/bonding/bonding.h | 2 +- 2 files changed, 16 insertions(+), 5 deletions(-) -- 1.8.1.4 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] bonding: reset master mac on first enslave failure 2013-06-06 11:55 [PATCH 0/2] bonding: couple of bug fixes nikolay @ 2013-06-06 11:55 ` nikolay 2013-06-06 11:55 ` [PATCH 2/2] bonding: fix igmp_retrans type and two related races nikolay 2013-06-11 9:45 ` [PATCH 0/2] bonding: couple of bug fixes David Miller 2 siblings, 0 replies; 8+ messages in thread From: nikolay @ 2013-06-06 11:55 UTC (permalink / raw) To: netdev; +Cc: andy, fubar, davem From: Nikolay Aleksandrov <nikolay@redhat.com> If the bond device is supposed to get the first slave's MAC address and the first enslavement fails then we need to reset the master's MAC otherwise it will stay the same as the failed slave device. We do it after err_undo_flags since that is the first place where the MAC can be changed and we check if it should've been the first slave and if the bond's MAC was set to it because that err place is used by multiple locations prior to changing the master's MAC address. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> --- drivers/net/bonding/bond_main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 29b846c..473633a 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1957,6 +1957,10 @@ err_free: err_undo_flags: bond_compute_features(bond); + /* Enslave of first slave has failed and we need to fix master's mac */ + if (bond->slave_cnt == 0 && + ether_addr_equal(bond_dev->dev_addr, slave_dev->dev_addr)) + eth_hw_addr_random(bond_dev); return res; } -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] bonding: fix igmp_retrans type and two related races 2013-06-06 11:55 [PATCH 0/2] bonding: couple of bug fixes nikolay 2013-06-06 11:55 ` [PATCH 1/2] bonding: reset master mac on first enslave failure nikolay @ 2013-06-06 11:55 ` nikolay 2013-06-07 1:00 ` Jay Vosburgh 2013-06-11 9:45 ` [PATCH 0/2] bonding: couple of bug fixes David Miller 2 siblings, 1 reply; 8+ messages in thread From: nikolay @ 2013-06-06 11:55 UTC (permalink / raw) To: netdev; +Cc: andy, fubar, davem From: Nikolay Aleksandrov <nikolay@redhat.com> First the type of igmp_retrans (which is the actual counter of igmp_resend parameter) is changed to u8 to be able to store values up to 255 (as per documentation). There are two races that were hidden there and which are easy to trigger after the previous fix, the first is between bond_resend_igmp_join_requests and bond_change_active_slave where igmp_retrans is set and can be altered by the periodic. The second race condition is between multiple running instances of the periodic (upon execution it can be scheduled again for immediate execution which can cause the counter to go < 0 which in the unsigned case leads to unnecessary igmp retransmissions). Since in bond_change_active_slave bond->lock is held for reading and curr_slave_lock for writing, we use curr_slave_lock for mutual exclusion. We can't drop them as there're cases where RTNL is not held when bond_change_active_slave is called. RCU is unlocked in bond_resend_igmp_join_requests before getting curr_slave_lock since we don't need it there and it's pointless to delay. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> --- drivers/net/bonding/bond_main.c | 15 +++++++++++---- drivers/net/bonding/bonding.h | 2 +- 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 473633a..02d9ae7 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -764,8 +764,8 @@ static void bond_resend_igmp_join_requests(struct bonding *bond) struct net_device *bond_dev, *vlan_dev, *upper_dev; struct vlan_entry *vlan; - rcu_read_lock(); read_lock(&bond->lock); + rcu_read_lock(); bond_dev = bond->dev; @@ -787,12 +787,19 @@ static void bond_resend_igmp_join_requests(struct bonding *bond) if (vlan_dev) __bond_resend_igmp_join_requests(vlan_dev); } + rcu_read_unlock(); - if (--bond->igmp_retrans > 0) + /* We use curr_slave_lock to protect against concurrent access to + * igmp_retrans from multiple running instances of this function and + * bond_change_active_slave + */ + write_lock_bh(&bond->curr_slave_lock); + if (bond->igmp_retrans > 1) { + bond->igmp_retrans--; queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5); - + } + write_unlock_bh(&bond->curr_slave_lock); read_unlock(&bond->lock); - rcu_read_unlock(); } static void bond_resend_igmp_join_requests_delayed(struct work_struct *work) diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index 2baec24..f989e15 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -225,7 +225,7 @@ struct bonding { rwlock_t curr_slave_lock; u8 send_peer_notif; s8 setup_by_slave; - s8 igmp_retrans; + u8 igmp_retrans; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; char proc_file_name[IFNAMSIZ]; -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] bonding: fix igmp_retrans type and two related races 2013-06-06 11:55 ` [PATCH 2/2] bonding: fix igmp_retrans type and two related races nikolay @ 2013-06-07 1:00 ` Jay Vosburgh 2013-06-07 9:37 ` Nikolay Aleksandrov 0 siblings, 1 reply; 8+ messages in thread From: Jay Vosburgh @ 2013-06-07 1:00 UTC (permalink / raw) To: nikolay; +Cc: netdev, andy, davem nikolay@redhat.com wrote: >From: Nikolay Aleksandrov <nikolay@redhat.com> > >First the type of igmp_retrans (which is the actual counter of >igmp_resend parameter) is changed to u8 to be able to store values up >to 255 (as per documentation). There are two races that were hidden >there and which are easy to trigger after the previous fix, the first is >between bond_resend_igmp_join_requests and bond_change_active_slave >where igmp_retrans is set and can be altered by the periodic. The second >race condition is between multiple running instances of the periodic >(upon execution it can be scheduled again for immediate execution which >can cause the counter to go < 0 which in the unsigned case leads to >unnecessary igmp retransmissions). >Since in bond_change_active_slave bond->lock is held for reading and >curr_slave_lock for writing, we use curr_slave_lock for mutual >exclusion. We can't drop them as there're cases where RTNL is not held >when bond_change_active_slave is called. RCU is unlocked in >bond_resend_igmp_join_requests before getting curr_slave_lock since we >don't need it there and it's pointless to delay. My first thought is that it would be much simpler to change the limit in the documentation and code from 255 to 127 and be done with it. I'm skeptical that anybody uses values for igmp_retrans even as high as 10, much less 100 (which would take 20 seconds to complete at 5 per second). That said, this is technically correct, although I have one question, below. >Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> >--- > drivers/net/bonding/bond_main.c | 15 +++++++++++---- > drivers/net/bonding/bonding.h | 2 +- > 2 files changed, 12 insertions(+), 5 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 473633a..02d9ae7 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -764,8 +764,8 @@ static void bond_resend_igmp_join_requests(struct bonding *bond) > struct net_device *bond_dev, *vlan_dev, *upper_dev; > struct vlan_entry *vlan; > >- rcu_read_lock(); > read_lock(&bond->lock); >+ rcu_read_lock(); > > bond_dev = bond->dev; > >@@ -787,12 +787,19 @@ static void bond_resend_igmp_join_requests(struct bonding *bond) > if (vlan_dev) > __bond_resend_igmp_join_requests(vlan_dev); > } >+ rcu_read_unlock(); > >- if (--bond->igmp_retrans > 0) >+ /* We use curr_slave_lock to protect against concurrent access to >+ * igmp_retrans from multiple running instances of this function and >+ * bond_change_active_slave >+ */ >+ write_lock_bh(&bond->curr_slave_lock); >+ if (bond->igmp_retrans > 1) { >+ bond->igmp_retrans--; > queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5); Why split out the -- from the comparison? -J >- >+ } >+ write_unlock_bh(&bond->curr_slave_lock); > read_unlock(&bond->lock); >- rcu_read_unlock(); > } > > static void bond_resend_igmp_join_requests_delayed(struct work_struct *work) >diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h >index 2baec24..f989e15 100644 >--- a/drivers/net/bonding/bonding.h >+++ b/drivers/net/bonding/bonding.h >@@ -225,7 +225,7 @@ struct bonding { > rwlock_t curr_slave_lock; > u8 send_peer_notif; > s8 setup_by_slave; >- s8 igmp_retrans; >+ u8 igmp_retrans; > #ifdef CONFIG_PROC_FS > struct proc_dir_entry *proc_entry; > char proc_file_name[IFNAMSIZ]; >-- >1.8.1.4 > --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] bonding: fix igmp_retrans type and two related races 2013-06-07 1:00 ` Jay Vosburgh @ 2013-06-07 9:37 ` Nikolay Aleksandrov 0 siblings, 0 replies; 8+ messages in thread From: Nikolay Aleksandrov @ 2013-06-07 9:37 UTC (permalink / raw) To: Jay Vosburgh; +Cc: netdev, andy, davem On 07/06/13 03:00, Jay Vosburgh wrote: > nikolay@redhat.com wrote: > >> From: Nikolay Aleksandrov <nikolay@redhat.com> >> >> First the type of igmp_retrans (which is the actual counter of >> igmp_resend parameter) is changed to u8 to be able to store values up >> to 255 (as per documentation). There are two races that were hidden >> there and which are easy to trigger after the previous fix, the first is >> between bond_resend_igmp_join_requests and bond_change_active_slave >> where igmp_retrans is set and can be altered by the periodic. The second >> race condition is between multiple running instances of the periodic >> (upon execution it can be scheduled again for immediate execution which >> can cause the counter to go < 0 which in the unsigned case leads to >> unnecessary igmp retransmissions). >> Since in bond_change_active_slave bond->lock is held for reading and >> curr_slave_lock for writing, we use curr_slave_lock for mutual >> exclusion. We can't drop them as there're cases where RTNL is not held >> when bond_change_active_slave is called. RCU is unlocked in >> bond_resend_igmp_join_requests before getting curr_slave_lock since we >> don't need it there and it's pointless to delay. > > My first thought is that it would be much simpler to change the > limit in the documentation and code from 255 to 127 and be done with it. > I'm skeptical that anybody uses values for igmp_retrans even as high as > 10, much less 100 (which would take 20 seconds to complete at 5 per > second). > > That said, this is technically correct, although I have one > question, below. > Yes, I was thinking the same thing at first and even discussed it with Andy. Although the race between bond_resend_igmp_join_requests and bond_change_active_slave will still be valid. >> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> >> --- >> drivers/net/bonding/bond_main.c | 15 +++++++++++---- >> drivers/net/bonding/bonding.h | 2 +- >> 2 files changed, 12 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >> index 473633a..02d9ae7 100644 >> --- a/drivers/net/bonding/bond_main.c >> +++ b/drivers/net/bonding/bond_main.c >> @@ -764,8 +764,8 @@ static void bond_resend_igmp_join_requests(struct bonding *bond) >> struct net_device *bond_dev, *vlan_dev, *upper_dev; >> struct vlan_entry *vlan; >> >> - rcu_read_lock(); >> read_lock(&bond->lock); >> + rcu_read_lock(); >> >> bond_dev = bond->dev; >> >> @@ -787,12 +787,19 @@ static void bond_resend_igmp_join_requests(struct bonding *bond) >> if (vlan_dev) >> __bond_resend_igmp_join_requests(vlan_dev); >> } >> + rcu_read_unlock(); >> >> - if (--bond->igmp_retrans > 0) >> + /* We use curr_slave_lock to protect against concurrent access to >> + * igmp_retrans from multiple running instances of this function and >> + * bond_change_active_slave >> + */ >> + write_lock_bh(&bond->curr_slave_lock); >> + if (bond->igmp_retrans > 1) { >> + bond->igmp_retrans--; >> queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5); > > Why split out the -- from the comparison? > > -J This one was very tricky, because we can have more than 2 instances running concurrently and if we unconditionally decrement the value it can still drop < 0. Example with 3 instances running and igmp_retrans == 1 (with check bond->igmp_retrans-- > 1): f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0 f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255 f3 does the unnecessary retransmissions. I also have an interesting solution with cmpxchg without curr_slave_lock but this is more straightforward and since this is not a fast path I think it's preferrable. Nik >> - >> + } >> + write_unlock_bh(&bond->curr_slave_lock); >> read_unlock(&bond->lock); >> - rcu_read_unlock(); >> } >> >> static void bond_resend_igmp_join_requests_delayed(struct work_struct *work) >> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h >> index 2baec24..f989e15 100644 >> --- a/drivers/net/bonding/bonding.h >> +++ b/drivers/net/bonding/bonding.h >> @@ -225,7 +225,7 @@ struct bonding { >> rwlock_t curr_slave_lock; >> u8 send_peer_notif; >> s8 setup_by_slave; >> - s8 igmp_retrans; >> + u8 igmp_retrans; >> #ifdef CONFIG_PROC_FS >> struct proc_dir_entry *proc_entry; >> char proc_file_name[IFNAMSIZ]; >> -- >> 1.8.1.4 >> > > --- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] bonding: couple of bug fixes 2013-06-06 11:55 [PATCH 0/2] bonding: couple of bug fixes nikolay 2013-06-06 11:55 ` [PATCH 1/2] bonding: reset master mac on first enslave failure nikolay 2013-06-06 11:55 ` [PATCH 2/2] bonding: fix igmp_retrans type and two related races nikolay @ 2013-06-11 9:45 ` David Miller 2013-06-11 16:42 ` Jay Vosburgh 2 siblings, 1 reply; 8+ messages in thread From: David Miller @ 2013-06-11 9:45 UTC (permalink / raw) To: nikolay; +Cc: netdev, andy, fubar From: nikolay@redhat.com Date: Thu, 6 Jun 2013 13:55:00 +0200 > From: Nikolay Aleksandrov <nikolay@redhat.com> > > Hello, > Patch 01 resets the master's mac if the first enslave fails and the slave's > mac was set to the master's prior. > Patch 02 fixes a type bug of igmp_retrans (which is the counter of igmp_resend > param and was silently not working if the value was >127) so it can go up to 255 > (as per documentation). It also fixes two tricky race conditions which were > hidden because of the previous bug. There was some minor back and forth between Jay and Nikolay on this set, but I'd like to see some ACKs before applying this stuff. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] bonding: couple of bug fixes 2013-06-11 9:45 ` [PATCH 0/2] bonding: couple of bug fixes David Miller @ 2013-06-11 16:42 ` Jay Vosburgh 2013-06-11 16:50 ` Nikolay Aleksandrov 0 siblings, 1 reply; 8+ messages in thread From: Jay Vosburgh @ 2013-06-11 16:42 UTC (permalink / raw) To: David Miller; +Cc: nikolay, netdev, andy David Miller <davem@davemloft.net> wrote: >From: nikolay@redhat.com >Date: Thu, 6 Jun 2013 13:55:00 +0200 > >> From: Nikolay Aleksandrov <nikolay@redhat.com> >> >> Hello, >> Patch 01 resets the master's mac if the first enslave fails and the slave's >> mac was set to the master's prior. >> Patch 02 fixes a type bug of igmp_retrans (which is the counter of igmp_resend >> param and was silently not working if the value was >127) so it can go up to 255 >> (as per documentation). It also fixes two tricky race conditions which were >> hidden because of the previous bug. > >There was some minor back and forth between Jay and Nikolay on this set, >but I'd like to see some ACKs before applying this stuff. I'm fine with the code changes, although I think a description of the three way race that Nikolay described in his last email should be in the log message, perhaps along with a brief description of what conditions would trigger the problem. With the above caveat: Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] bonding: couple of bug fixes 2013-06-11 16:42 ` Jay Vosburgh @ 2013-06-11 16:50 ` Nikolay Aleksandrov 0 siblings, 0 replies; 8+ messages in thread From: Nikolay Aleksandrov @ 2013-06-11 16:50 UTC (permalink / raw) To: Jay Vosburgh; +Cc: David Miller, netdev, andy On 11/06/13 18:42, Jay Vosburgh wrote: > David Miller <davem@davemloft.net> wrote: > >> From: nikolay@redhat.com >> Date: Thu, 6 Jun 2013 13:55:00 +0200 >> >>> From: Nikolay Aleksandrov <nikolay@redhat.com> >>> >>> Hello, >>> Patch 01 resets the master's mac if the first enslave fails and the slave's >>> mac was set to the master's prior. >>> Patch 02 fixes a type bug of igmp_retrans (which is the counter of igmp_resend >>> param and was silently not working if the value was >127) so it can go up to 255 >>> (as per documentation). It also fixes two tricky race conditions which were >>> hidden because of the previous bug. >> >> There was some minor back and forth between Jay and Nikolay on this set, >> but I'd like to see some ACKs before applying this stuff. > > I'm fine with the code changes, although I think a description > of the three way race that Nikolay described in his last email should be > in the log message, perhaps along with a brief description of what > conditions would trigger the problem. > > With the above caveat: > > Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> > > -J > > --- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Thanks for the review Jay. Dave how would you like me to handle this ? Should I resubmit a v2 with updated commit message or just a reply with new commit message will suffice ? Nik ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-06-11 17:05 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-06-06 11:55 [PATCH 0/2] bonding: couple of bug fixes nikolay 2013-06-06 11:55 ` [PATCH 1/2] bonding: reset master mac on first enslave failure nikolay 2013-06-06 11:55 ` [PATCH 2/2] bonding: fix igmp_retrans type and two related races nikolay 2013-06-07 1:00 ` Jay Vosburgh 2013-06-07 9:37 ` Nikolay Aleksandrov 2013-06-11 9:45 ` [PATCH 0/2] bonding: couple of bug fixes David Miller 2013-06-11 16:42 ` Jay Vosburgh 2013-06-11 16:50 ` Nikolay Aleksandrov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).