From: Nikolay Aleksandrov <nikolay@redhat.com>
To: Jay Vosburgh <fubar@us.ibm.com>
Cc: netdev@vger.kernel.org, andy@greyhouse.net, davem@davemloft.net
Subject: Re: [PATCH 2/2] bonding: fix igmp_retrans type and two related races
Date: Fri, 07 Jun 2013 11:37:30 +0200
Message-ID: <51B1A9DA.9020600@redhat.com>
In-Reply-To: <4475.1370566850@death.nxdomain>

On 07/06/13 03:00, Jay Vosburgh wrote:
> nikolay@redhat.com wrote:
> 
>> From: Nikolay Aleksandrov <nikolay@redhat.com>
>>
>> First the type of igmp_retrans (which is the actual counter of
>> igmp_resend parameter) is changed to u8 to be able to store values up
>> to 255 (as per documentation). There are two races that were hidden
>> there and which are easy to trigger after the previous fix, the first is
>> between bond_resend_igmp_join_requests and bond_change_active_slave
>> where igmp_retrans is set and can be altered by the periodic. The second
>> race condition is between multiple running instances of the periodic
>> (upon execution it can be scheduled again for immediate execution which
>> can cause the counter to go < 0 which in the unsigned case leads to
>> unnecessary igmp retransmissions).
>> Since in bond_change_active_slave bond->lock is held for reading and
>> curr_slave_lock for writing, we use curr_slave_lock for mutual
>> exclusion. We can't drop them as there're cases where RTNL is not held
>> when bond_change_active_slave is called. RCU is unlocked in
>> bond_resend_igmp_join_requests before getting curr_slave_lock since we
>> don't need it there and it's pointless to delay.
> 
> 	My first thought is that it would be much simpler to change the
> limit in the documentation and code from 255 to 127 and be done with it.
> I'm skeptical that anybody uses values for igmp_retrans even as high as
> 10, much less 100 (which would take 20 seconds to complete at 5 per
> second).
> 
> 	That said, this is technically correct, although I have one
> question, below.
> 
Yes, I was thinking the same thing at first and even discussed it with
Andy. However, lowering the limit would not fix the race between
bond_resend_igmp_join_requests and bond_change_active_slave, which
would still exist.

>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> ---
>> drivers/net/bonding/bond_main.c | 15 +++++++++++----
>> drivers/net/bonding/bonding.h   |  2 +-
>> 2 files changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 473633a..02d9ae7 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -764,8 +764,8 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
>> 	struct net_device *bond_dev, *vlan_dev, *upper_dev;
>> 	struct vlan_entry *vlan;
>>
>> -	rcu_read_lock();
>> 	read_lock(&bond->lock);
>> +	rcu_read_lock();
>>
>> 	bond_dev = bond->dev;
>>
>> @@ -787,12 +787,19 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
>> 		if (vlan_dev)
>> 			__bond_resend_igmp_join_requests(vlan_dev);
>> 	}
>> +	rcu_read_unlock();
>>
>> -	if (--bond->igmp_retrans > 0)
>> +	/* We use curr_slave_lock to protect against concurrent access to
>> +	 * igmp_retrans from multiple running instances of this function and
>> +	 * bond_change_active_slave
>> +	 */
>> +	write_lock_bh(&bond->curr_slave_lock);
>> +	if (bond->igmp_retrans > 1) {
>> +		bond->igmp_retrans--;
>> 		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
> 
> 	Why split out the -- from the comparison?
> 
> 	-J
This one was very tricky: more than 2 instances can be running
concurrently, and if we decrement the value unconditionally it can
still drop < 0. Example with 3 instances running and igmp_retrans == 1
(with the check written as bond->igmp_retrans-- > 1):
f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255
f3 sees 255 > 1, re-schedules and does the unnecessary retransmissions.
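
The three-instance example above can be replayed in userspace. This is
only a rough sketch, not code from the patch: the global igmp_retrans
and the helpers resend_unconditional(), resend_guarded() and
replay_three_instances() are made-up names, and the three instances are
run one after another rather than truly concurrently, which is the
simplest ordering that shows the wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for bond->igmp_retrans, u8 as in the patch. */
static uint8_t igmp_retrans;

/* Check with the decrement folded in: the counter drops on every call,
 * whether or not the work would be re-queued. Returns 1 on re-queue.
 */
static int resend_unconditional(void)
{
	return igmp_retrans-- > 1;
}

/* Check as in the patch: decrement only when re-queueing. */
static int resend_guarded(void)
{
	if (igmp_retrans > 1) {
		igmp_retrans--;
		return 1;
	}
	return 0;
}

/* Replays f1, f2 and f3 in order, starting from igmp_retrans == 1. */
static void replay_three_instances(void)
{
	int i;

	igmp_retrans = 1;
	assert(resend_unconditional() == 0 && igmp_retrans == 0);   /* f1 */
	assert(resend_unconditional() == 0 && igmp_retrans == 255); /* f2 wraps */
	assert(resend_unconditional() == 1 && igmp_retrans == 254); /* f3 re-queues */

	igmp_retrans = 1;
	for (i = 0; i < 3; i++)
		assert(resend_guarded() == 0 && igmp_retrans == 1);
}
```

With the guarded form the counter never goes below 1, so no ordering of
the instances can wrap it; the lock is still needed so that the check
and the decrement are not interleaved with another writer.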

I also have an interesting solution with cmpxchg that does not need
curr_slave_lock, but this one is more straightforward and, since this
is not a fast path, I think it's preferable.
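
For readers wondering what a lock-free variant might look like, here is
one possible shape of a "decrement only while > 1" loop. It is a guess
for illustration, not the cmpxchg solution mentioned above (that code is
not shown in this mail). It uses C11 atomics in userspace as an analogue
of the kernel's cmpxchg(), and dec_if_above_one() and
three_callers_from_one() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically decrement *ctr only while it is > 1.
 * Returns true if this caller did the decrement and so would re-queue.
 */
static bool dec_if_above_one(_Atomic uint8_t *ctr)
{
	uint8_t old = atomic_load(ctr);

	while (old > 1) {
		/* On failure, old is refreshed with the current value,
		 * so the > 1 test is repeated against up-to-date data.
		 */
		if (atomic_compare_exchange_weak(ctr, &old,
						 (uint8_t)(old - 1)))
			return true;
	}
	return false;
}

/* Three callers starting from 1: none decrements, none re-queues. */
static void three_callers_from_one(void)
{
	_Atomic uint8_t retrans = 1;
	int i;

	for (i = 0; i < 3; i++)
		assert(!dec_if_above_one(&retrans));
	assert(atomic_load(&retrans) == 1);
}
```

The compare-and-swap only succeeds if nobody changed the counter since
it was read, so a concurrent store simply causes a retry with the new
value.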

Nik
>> -
>> +	}
>> +	write_unlock_bh(&bond->curr_slave_lock);
>> 	read_unlock(&bond->lock);
>> -	rcu_read_unlock();
>> }
>>
>> static void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
>> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>> index 2baec24..f989e15 100644
>> --- a/drivers/net/bonding/bonding.h
>> +++ b/drivers/net/bonding/bonding.h
>> @@ -225,7 +225,7 @@ struct bonding {
>> 	rwlock_t curr_slave_lock;
>> 	u8	 send_peer_notif;
>> 	s8	 setup_by_slave;
>> -	s8       igmp_retrans;
>> +	u8       igmp_retrans;
>> #ifdef CONFIG_PROC_FS
>> 	struct   proc_dir_entry *proc_entry;
>> 	char     proc_file_name[IFNAMSIZ];
>> -- 
>> 1.8.1.4
>>
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 
