netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Aleksandrov <nikolay@redhat.com>
To: Jay Vosburgh <fubar@us.ibm.com>
Cc: netdev@vger.kernel.org, andy@greyhouse.net, davem@davemloft.net
Subject: Re: [PATCH 2/2] bonding: fix igmp_retrans type and two related races
Date: Fri, 07 Jun 2013 11:37:30 +0200	[thread overview]
Message-ID: <51B1A9DA.9020600@redhat.com> (raw)
In-Reply-To: <4475.1370566850@death.nxdomain>

On 07/06/13 03:00, Jay Vosburgh wrote:
> nikolay@redhat.com wrote:
> 
>> From: Nikolay Aleksandrov <nikolay@redhat.com>
>>
>> First the type of igmp_retrans (which is the actual counter of
>> igmp_resend parameter) is changed to u8 to be able to store values up
>> to 255 (as per documentation). There are two races that were hidden
>> there and which are easy to trigger after the previous fix, the first is
>> between bond_resend_igmp_join_requests and bond_change_active_slave
>> where igmp_retrans is set and can be altered by the periodic. The second
>> race condition is between multiple running instances of the periodic
>> (upon execution it can be scheduled again for immediate execution which
>> can cause the counter to go < 0 which in the unsigned case leads to
>> unnecessary igmp retransmissions).
>> Since in bond_change_active_slave bond->lock is held for reading and
>> curr_slave_lock for writing, we use curr_slave_lock for mutual
>> exclusion. We can't drop them as there're cases where RTNL is not held
>> when bond_change_active_slave is called. RCU is unlocked in
>> bond_resend_igmp_join_requests before getting curr_slave_lock since we
>> don't need it there and it's pointless to delay.
> 
> 	My first thought is that it would be much simpler to change the
> limit in the documentation and code from 255 to 127 and be done with it.
> I'm skeptical that anybody uses values for igmp_retrans even as high as
> 10, much less 100 (which would take 20 seconds to complete at 5 per
> second).
> 
> 	That said, this is technically correct, although I have one
> question, below.
> 
Yes, I was thinking the same thing at first and even discussed it with
Andy. Although the race between bond_resend_igmp_join_requests and
bond_change_active_slave will still be valid.

>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> ---
>> drivers/net/bonding/bond_main.c | 15 +++++++++++----
>> drivers/net/bonding/bonding.h   |  2 +-
>> 2 files changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 473633a..02d9ae7 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -764,8 +764,8 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
>> 	struct net_device *bond_dev, *vlan_dev, *upper_dev;
>> 	struct vlan_entry *vlan;
>>
>> -	rcu_read_lock();
>> 	read_lock(&bond->lock);
>> +	rcu_read_lock();
>>
>> 	bond_dev = bond->dev;
>>
>> @@ -787,12 +787,19 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
>> 		if (vlan_dev)
>> 			__bond_resend_igmp_join_requests(vlan_dev);
>> 	}
>> +	rcu_read_unlock();
>>
>> -	if (--bond->igmp_retrans > 0)
>> +	/* We use curr_slave_lock to protect against concurrent access to
>> +	 * igmp_retrans from multiple running instances of this function and
>> +	 * bond_change_active_slave
>> +	 */
>> +	write_lock_bh(&bond->curr_slave_lock);
>> +	if (bond->igmp_retrans > 1) {
>> +		bond->igmp_retrans--;
>> 		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
> 
> 	Why split out the -- from the comparison?
> 
> 	-J
This one was very tricky, because we can have more than 2 instances
running concurrently and if we unconditionally decrement the value it
can still drop < 0. Example with 3 instances running and igmp_retrans ==
1 (with check bond->igmp_retrans-- > 1):
f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255
f3 does the unnecessary retransmissions.

I also have an interesting solution with cmpxchg without curr_slave_lock
but this is more straightforward and since this is not a fast path I
think it's preferrable.

Nik
>> -
>> +	}
>> +	write_unlock_bh(&bond->curr_slave_lock);
>> 	read_unlock(&bond->lock);
>> -	rcu_read_unlock();
>> }
>>
>> static void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
>> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>> index 2baec24..f989e15 100644
>> --- a/drivers/net/bonding/bonding.h
>> +++ b/drivers/net/bonding/bonding.h
>> @@ -225,7 +225,7 @@ struct bonding {
>> 	rwlock_t curr_slave_lock;
>> 	u8	 send_peer_notif;
>> 	s8	 setup_by_slave;
>> -	s8       igmp_retrans;
>> +	u8       igmp_retrans;
>> #ifdef CONFIG_PROC_FS
>> 	struct   proc_dir_entry *proc_entry;
>> 	char     proc_file_name[IFNAMSIZ];
>> -- 
>> 1.8.1.4
>>
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

  reply	other threads:[~2013-06-07  9:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-06 11:55 [PATCH 0/2] bonding: couple of bug fixes nikolay
2013-06-06 11:55 ` [PATCH 1/2] bonding: reset master mac on first enslave failure nikolay
2013-06-06 11:55 ` [PATCH 2/2] bonding: fix igmp_retrans type and two related races nikolay
2013-06-07  1:00   ` Jay Vosburgh
2013-06-07  9:37     ` Nikolay Aleksandrov [this message]
2013-06-11  9:45 ` [PATCH 0/2] bonding: couple of bug fixes David Miller
2013-06-11 16:42   ` Jay Vosburgh
2013-06-11 16:50     ` Nikolay Aleksandrov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51B1A9DA.9020600@redhat.com \
    --to=nikolay@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=fubar@us.ibm.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).