netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] bonding: fix div by zero while enslaving and transmitting
@ 2014-09-12 12:22 Nikolay Aleksandrov
  2014-09-12 13:09 ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2014-09-12 12:22 UTC (permalink / raw)
  To: netdev
  Cc: Nikolay Aleksandrov, Jiri Pirko, Andy Gospodarek, Jay Vosburgh,
	Veaceslav Falico

The problem is that the slave is first linked and slave_cnt is
incremented afterwards leading to a div by zero in the modes that use it
as a modulus. What happens is that in bond_start_xmit() bond_has_slaves()
is used to evaluate further transmission and it becomes true after the
slave is linked in, but when slave_cnt is used in the xmit path it is
still 0. The problem was introduced in commit:
5378c2e6ea236d ("bonding: move bond-specific init after enslave happens")
when the slave_cnt increment was moved after the linking.

Call trace (took it out of net-next kernel, but it's the same with net):
[46934.330038] divide error: 0000 [#1] SMP
[46934.330041] Modules linked in: bonding(O) 9p fscache
snd_hda_codec_generic crct10dif_pclmul
[46934.330041] bond0: Enslaving eth1 as an active interface with an up
link
[46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
pata_acpi floppy [last unloaded: bonding]
[46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
3.17.0-rc4+ #27
[46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
ffff88005b728000
[46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
bond_start_xmit+0x1c3/0x450 [bonding]
[46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
[46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
000000000000002a
[46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
ffff88004b077940
[46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
ffff88004a83e000
[46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
ffff88004b3f0500
[46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
ffff88004b077940
[46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
knlGS:0000000000000000
[46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
00000000000406e0
[46934.330069] Stack:
[46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
ffff88004b3f0500
[46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
ffff88005b72b8a0
[46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
ffff88005b302000
[46934.330073] Call Trace:
[46934.330077]  [<ffffffff811e6169>] ?
__kmalloc_node_track_caller+0x119/0x300
[46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
[46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
[46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
[46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
[46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
[46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
[46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
[46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
[46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
[46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
[46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
[46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
[46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
[46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
[46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
[46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
[46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
[46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
[46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
[46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
[46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
[46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
[46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
[46934.330115] bond0: Adding slave eth2
[46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
[46934.330118]  [<ffffffff81603248>] ?
move_addr_to_kernel.part.20+0x28/0x80
[46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
[46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
[46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
[46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
[46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
[46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
[46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
[46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
[46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
[46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
[bonding]
[46934.330146]  RSP <ffff88005b72b7f8>

CC: Jiri Pirko <jiri@resnulli.us>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
Fixes: 5378c2e6ea236d ("bonding: move bond-specific init after enslave happens")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 57912ee231cb..10ad434ea184 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1552,6 +1552,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		goto err_detach;
 	}
 
+	/* Increment slave_cnt before linking in the slave so we won't end up in
+	 * bond_start_xmit with bond_has_slaves() true and slave_cnt == 0.
+	 */
+	bond->slave_cnt++;
 	res = bond_master_upper_dev_link(bond_dev, slave_dev, new_slave);
 	if (res) {
 		netdev_dbg(bond_dev, "Error %d calling bond_master_upper_dev_link\n", res);
@@ -1564,7 +1568,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		goto err_upper_unlink;
 	}
 
-	bond->slave_cnt++;
 	bond_compute_features(bond);
 	bond_set_carrier(bond);
 
@@ -1590,6 +1593,7 @@ err_upper_unlink:
 
 err_unregister:
 	netdev_rx_handler_unregister(slave_dev);
+	bond->slave_cnt--;
 
 err_detach:
 	if (!bond_uses_primary(bond))
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 12:22 [PATCH net] bonding: fix div by zero while enslaving and transmitting Nikolay Aleksandrov
@ 2014-09-12 13:09 ` Eric Dumazet
  2014-09-12 13:27   ` Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2014-09-12 13:09 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Jiri Pirko, Andy Gospodarek, Jay Vosburgh,
	Veaceslav Falico

On Fri, 2014-09-12 at 14:22 +0200, Nikolay Aleksandrov wrote:
...
> 
> CC: Jiri Pirko <jiri@resnulli.us>
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: Jay Vosburgh <j.vosburgh@gmail.com>
> CC: Veaceslav Falico <vfalico@gmail.com>
> Fixes: 5378c2e6ea236d ("bonding: move bond-specific init after enslave happens")
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
>  drivers/net/bonding/bond_main.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 57912ee231cb..10ad434ea184 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1552,6 +1552,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>  		goto err_detach;
>  	}
>  
> +	/* Increment slave_cnt before linking in the slave so we won't end up in
> +	 * bond_start_xmit with bond_has_slaves() true and slave_cnt == 0.
> +	 */
> +	bond->slave_cnt++;

It looks like explicit barriers are missing.

#define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) 

So your increment into slave_cnt must be committed into memory before
any change to slave_list. But you need to check how removal of a slave
is handled.

Now I wonder why bond_has_slaves(bond) is not a test against
bond->slave_cnt

Note that even if this would be the case, bond xmit seems racy :

if (bond_has_slaves(bond))
	ret = __bond_start_xmit(skb, dev);

As slave_cnt could change (and eventually reach 0) between the two
places.

My feeling is that RCU conversion is not properly done in this driver.

Either bond->slave_cnt should be read _once_ for the whole duration of
bond_start_xmit() call, _OR_, be stored in a real Read Copy structure,
so that struct->slave_cnt _cannot_ change during bond_start_xmit()

>  	res = bond_master_upper_dev_link(bond_dev, slave_dev, new_slave);
>  	if (res) {
>  		netdev_dbg(bond_dev, "Error %d calling bond_master_upper_dev_link\n", res);
> @@ -1564,7 +1568,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>  		goto err_upper_unlink;
>  	}
>  
> -	bond->slave_cnt++;
>  	bond_compute_features(bond);
>  	bond_set_carrier(bond);
>  
> @@ -1590,6 +1593,7 @@ err_upper_unlink:
>  
>  err_unregister:
>  	netdev_rx_handler_unregister(slave_dev);
> +	bond->slave_cnt--;
>  
>  err_detach:
>  	if (!bond_uses_primary(bond))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 13:09 ` Eric Dumazet
@ 2014-09-12 13:27   ` Nikolay Aleksandrov
  2014-09-12 13:33     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2014-09-12 13:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Jiri Pirko, Andy Gospodarek, Jay Vosburgh,
	Veaceslav Falico

On 09/12/2014 03:09 PM, Eric Dumazet wrote:
> On Fri, 2014-09-12 at 14:22 +0200, Nikolay Aleksandrov wrote:
> ...
>>
>> CC: Jiri Pirko <jiri@resnulli.us>
>> CC: Andy Gospodarek <andy@greyhouse.net>
>> CC: Jay Vosburgh <j.vosburgh@gmail.com>
>> CC: Veaceslav Falico <vfalico@gmail.com>
>> Fixes: 5378c2e6ea236d ("bonding: move bond-specific init after enslave happens")
>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> ---
>>  drivers/net/bonding/bond_main.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 57912ee231cb..10ad434ea184 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1552,6 +1552,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>  		goto err_detach;
>>  	}
>>  
>> +	/* Increment slave_cnt before linking in the slave so we won't end up in
>> +	 * bond_start_xmit with bond_has_slaves() true and slave_cnt == 0.
>> +	 */
>> +	bond->slave_cnt++;
> 
> It looks like explicit barriers are missing.
> 
> #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) 
> 
> So your increment into slave_cnt must be committed into memory before
> any change to slave_list. But you need to check how removal of a slave
> is handled.
> 
That is handled by decrementing slave_cnt after executing synchronize_rcu()
after unlinking the last slave thus making the list empty and all xmitters
entering will see bond_has_slaves() as empty before they see slave_cnt as 0.
In every other case the worst that could happen is that a few packets will
see wrong slave_cnt, but that is not a problem since we walk the list to
find the slave with the id.

> Now I wonder why bond_has_slaves(bond) is not a test against
> bond->slave_cnt
> 
It used to be once, I don't remember the reason it's not anymore.

> Note that even if this would be the case, bond xmit seems racy :
> 
> if (bond_has_slaves(bond))
> 	ret = __bond_start_xmit(skb, dev);
> 
Yes, true but we make sure it doesn't see slave_cnt as 0 with
bond_has_slaves() evaluating to true.

> As slave_cnt could change (and eventually reach 0) between the two
> places.
This shouldn't be possible because of the synchronize_rcu() after unlinking
the slave. slave_cnt is decremented only after that so every reader will
see the list empty before they see slave_cnt as 0.

> 
> My feeling is that RCU conversion is not properly done in this driver.
> 
> Either bond->slave_cnt should be read _once_ for the whole duration of
> bond_start_xmit() call, _OR_, be stored in a real Read Copy structure,
> so that struct->slave_cnt _cannot_ change during bond_start_xmit()
> 
>>  	res = bond_master_upper_dev_link(bond_dev, slave_dev, new_slave);
>>  	if (res) {
>>  		netdev_dbg(bond_dev, "Error %d calling bond_master_upper_dev_link\n", res);
>> @@ -1564,7 +1568,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>  		goto err_upper_unlink;
>>  	}
>>  
>> -	bond->slave_cnt++;
>>  	bond_compute_features(bond);
>>  	bond_set_carrier(bond);
>>  
>> @@ -1590,6 +1593,7 @@ err_upper_unlink:
>>  
>>  err_unregister:
>>  	netdev_rx_handler_unregister(slave_dev);
>> +	bond->slave_cnt--;
>>  
>>  err_detach:
>>  	if (!bond_uses_primary(bond))
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 13:27   ` Nikolay Aleksandrov
@ 2014-09-12 13:33     ` Nikolay Aleksandrov
  2014-09-12 14:45       ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2014-09-12 13:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Jiri Pirko, Andy Gospodarek, Jay Vosburgh,
	Veaceslav Falico

On 09/12/2014 03:27 PM, Nikolay Aleksandrov wrote:
> On 09/12/2014 03:09 PM, Eric Dumazet wrote:
>> On Fri, 2014-09-12 at 14:22 +0200, Nikolay Aleksandrov wrote:
>> ...
>>>
>>> CC: Jiri Pirko <jiri@resnulli.us>
>>> CC: Andy Gospodarek <andy@greyhouse.net>
>>> CC: Jay Vosburgh <j.vosburgh@gmail.com>
>>> CC: Veaceslav Falico <vfalico@gmail.com>
>>> Fixes: 5378c2e6ea236d ("bonding: move bond-specific init after enslave happens")
>>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>>> ---
>>>  drivers/net/bonding/bond_main.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 57912ee231cb..10ad434ea184 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1552,6 +1552,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>>  		goto err_detach;
>>>  	}
>>>  
>>> +	/* Increment slave_cnt before linking in the slave so we won't end up in
>>> +	 * bond_start_xmit with bond_has_slaves() true and slave_cnt == 0.
>>> +	 */
>>> +	bond->slave_cnt++;
>>
>> It looks like explicit barriers are missing.
>>
One more thing, netdev_master_upper_dev_link_private() which is called
after the increment uses list_add_rcu() (i.e. rcu_assign_pointer) to insert
the slave, so there's a barrier there to ensure this is visible before the
slave is linked.

>> #define bond_has_slaves(bond) !list_empty(bond_slave_list(bond)) 
>>
>> So your increment into slave_cnt must be committed into memory before
>> any change to slave_list. But you need to check how removal of a slave
>> is handled.
>>
> That is handled by decrementing slave_cnt after executing synchronize_rcu()
> after unlinking the last slave thus making the list empty and all xmitters
> entering will see bond_has_slaves() as empty before they see slave_cnt as 0.
> In every other case the worst that could happen is that a few packets will
> see wrong slave_cnt, but that is not a problem since we walk the list to
> find the slave with the id.
> 
>> Now I wonder why bond_has_slaves(bond) is not a test against
>> bond->slave_cnt
>>
> It used to be once, I don't remember the reason it's not anymore.
> 
>> Note that even if this would be the case, bond xmit seems racy :
>>
>> if (bond_has_slaves(bond))
>> 	ret = __bond_start_xmit(skb, dev);
>>
> Yes, true but we make sure it doesn't see slave_cnt as 0 with
> bond_has_slaves() evaluating to true.
> 
>> As slave_cnt could change (and eventually reach 0) between the two
>> places.
> This shouldn't be possible because of the synchronize_rcu() after unlinking
> the slave. slave_cnt is decremented only after that so every reader will
> see the list empty before they see slave_cnt as 0.
> 
>>
>> My feeling is that RCU conversion is not properly done in this driver.
>>
>> Either bond->slave_cnt should be read _once_ for the whole duration of
>> bond_start_xmit() call, _OR_, be stored in a real Read Copy structure,
>> so that struct->slave_cnt _cannot_ change during bond_start_xmit()
>>
>>>  	res = bond_master_upper_dev_link(bond_dev, slave_dev, new_slave);
>>>  	if (res) {
>>>  		netdev_dbg(bond_dev, "Error %d calling bond_master_upper_dev_link\n", res);
>>> @@ -1564,7 +1568,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>>  		goto err_upper_unlink;
>>>  	}
>>>  
>>> -	bond->slave_cnt++;
>>>  	bond_compute_features(bond);
>>>  	bond_set_carrier(bond);
>>>  
>>> @@ -1590,6 +1593,7 @@ err_upper_unlink:
>>>  
>>>  err_unregister:
>>>  	netdev_rx_handler_unregister(slave_dev);
>>> +	bond->slave_cnt--;
>>>  
>>>  err_detach:
>>>  	if (!bond_uses_primary(bond))
>>
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 13:33     ` Nikolay Aleksandrov
@ 2014-09-12 14:45       ` Eric Dumazet
  2014-09-12 14:55         ` Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2014-09-12 14:45 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Jiri Pirko, Andy Gospodarek, Jay Vosburgh,
	Veaceslav Falico

On Fri, 2014-09-12 at 15:33 +0200, Nikolay Aleksandrov wrote:

> One more thing, netdev_master_upper_dev_link_private() which is called
> after the increment uses list_add_rcu() (i.e. rcu_assign_pointer) to insert
> the slave, so there's a barrier there to ensure this is visible before the
> slave is linked.

You missed my point.

You fixed the writer side, without adding barriers on the read side.

Without looking at the code, just reading your patch, I spot a problem.

Following code is fundamentally broken :

rcu_read_lock();

if (bond->list) {
    x = y % bond->slave;  // bug : cpu could fetch bond->slave before  bond->list

rcu_read_unlock();


Because it needs a read barrier, since the writer does after your patch:

	update bond->slave
	smp_wmb();
	update bond->list

I repeat : adding few rcu_read_lock() / rcu_read_unlock() /
rcu_assign_pointer() calls is not enough to guarantee code is not
broken.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 14:45       ` Eric Dumazet
@ 2014-09-12 14:55         ` Nikolay Aleksandrov
  2014-09-12 15:38           ` [PATCH net v2] " Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2014-09-12 14:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Jiri Pirko, Andy Gospodarek, Jay Vosburgh,
	Veaceslav Falico

On 09/12/2014 04:45 PM, Eric Dumazet wrote:
> On Fri, 2014-09-12 at 15:33 +0200, Nikolay Aleksandrov wrote:
> 
>> One more thing, netdev_master_upper_dev_link_private() which is called
>> after the increment uses list_add_rcu() (i.e. rcu_assign_pointer) to insert
>> the slave, so there's a barrier there to ensure this is visible before the
>> slave is linked.
> 
> You missed my point.
> 
> You fixed the writer side, without adding barriers on the read side.
> 
> Without looking at the code, just reading your patch, I spot a problem.
> 
> Following code is fundamentally broken :
> 
> rcu_read_lock();
> 
> if (bond->list) {
>     x = y % bond->slave;  // bug : cpu could fetch bond->slave before  bond->list
> 
> rcu_read_unlock();
> 
> 
Ah yes, you're absolutely correct.
Then get the value of slave_cnt in a local variable using ACCESS_ONCE() + a
check for != 0 would be a better solution than adding a read barrier, what
do you think ?
Since it's not a problem to see slave_cnt > 0 when there're no slaves, but
the other way around is problematic.

> Because it needs a read barrier, since the writer does after your patch:
> 
> 	update bond->slave
> 	smp_wmb();
> 	update bond->list
> 
> I repeat : adding few rcu_read_lock() / rcu_read_unlock() /
> rcu_assign_pointer() calls is not enough to guarantee code is not
> broken.
> 
Yes, I see now.

Thanks,
 Nik
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net v2] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 14:55         ` Nikolay Aleksandrov
@ 2014-09-12 15:38           ` Nikolay Aleksandrov
  2014-09-13 21:17             ` David Miller
  2014-09-17  6:15             ` Ding Tianhong
  0 siblings, 2 replies; 11+ messages in thread
From: Nikolay Aleksandrov @ 2014-09-12 15:38 UTC (permalink / raw)
  To: netdev
  Cc: Nikolay Aleksandrov, Eric Dumazet, Andy Gospodarek, Jay Vosburgh,
	Veaceslav Falico

The problem is that the slave is first linked and slave_cnt is
incremented afterwards leading to a div by zero in the modes that use it
as a modulus. What happens is that in bond_start_xmit()
bond_has_slaves() is used to evaluate further transmission and it becomes
true after the slave is linked in, but when slave_cnt is used in the xmit
path it is still 0, so fetch it once and transmit based on that. Since
it is used only in round-robin and XOR modes, the fix is only for them.
Thanks to Eric Dumazet for pointing out the fault in my first try to fix
this.

Call trace (took it out of net-next kernel, but it's the same with net):
[46934.330038] divide error: 0000 [#1] SMP
[46934.330041] Modules linked in: bonding(O) 9p fscache
snd_hda_codec_generic crct10dif_pclmul
[46934.330041] bond0: Enslaving eth1 as an active interface with an up
link
[46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
pata_acpi floppy [last unloaded: bonding]
[46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
3.17.0-rc4+ #27
[46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
ffff88005b728000
[46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
bond_start_xmit+0x1c3/0x450 [bonding]
[46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
[46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
000000000000002a
[46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
ffff88004b077940
[46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
ffff88004a83e000
[46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
ffff88004b3f0500
[46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
ffff88004b077940
[46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
knlGS:0000000000000000
[46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
00000000000406e0
[46934.330069] Stack:
[46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
ffff88004b3f0500
[46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
ffff88005b72b8a0
[46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
ffff88005b302000
[46934.330073] Call Trace:
[46934.330077]  [<ffffffff811e6169>] ?
__kmalloc_node_track_caller+0x119/0x300
[46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
[46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
[46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
[46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
[46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
[46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
[46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
[46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
[46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
[46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
[46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
[46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
[46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
[46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
[46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
[46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
[46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
[46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
[46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
[46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
[46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
[46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
[46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
[46934.330115] bond0: Adding slave eth2
[46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
[46934.330118]  [<ffffffff81603248>] ?
move_addr_to_kernel.part.20+0x28/0x80
[46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
[46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
[46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
[46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
[46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
[46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
[46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
[46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
[46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
[46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
[bonding]
[46934.330146]  RSP <ffff88005b72b7f8>

CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
Fixes: 278b208375 ("bonding: initial RCU conversion")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v2: Based on Eric's feedback change the fix to fetch the value once in a
    local variable in the affected modes and to act based on that.

I believe we don't need to move the increment now as it doesn't really
matter if it'll be before the wmb or after.

 drivers/net/bonding/bond_main.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 57912ee231cb..798ae69fb63c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3659,8 +3659,14 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
 		else
 			bond_xmit_slave_id(bond, skb, 0);
 	} else {
-		slave_id = bond_rr_gen_slave_id(bond);
-		bond_xmit_slave_id(bond, skb, slave_id % bond->slave_cnt);
+		int slave_cnt = ACCESS_ONCE(bond->slave_cnt);
+
+		if (likely(slave_cnt)) {
+			slave_id = bond_rr_gen_slave_id(bond);
+			bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
+		} else {
+			dev_kfree_skb_any(skb);
+		}
 	}
 
 	return NETDEV_TX_OK;
@@ -3691,8 +3697,13 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
 static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+	int slave_cnt = ACCESS_ONCE(bond->slave_cnt);
 
-	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb) % bond->slave_cnt);
+	if (likely(slave_cnt))
+		bond_xmit_slave_id(bond, skb,
+				   bond_xmit_hash(bond, skb) % slave_cnt);
+	else
+		dev_kfree_skb_any(skb);
 
 	return NETDEV_TX_OK;
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 15:38           ` [PATCH net v2] " Nikolay Aleksandrov
@ 2014-09-13 21:17             ` David Miller
  2014-09-17  6:15             ` Ding Tianhong
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2014-09-13 21:17 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, eric.dumazet, andy, j.vosburgh, vfalico

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Fri, 12 Sep 2014 17:38:18 +0200

> The problem is that the slave is first linked and slave_cnt is
> incremented afterwards leading to a div by zero in the modes that use it
> as a modulus. What happens is that in bond_start_xmit()
> bond_has_slaves() is used to evaluate further transmission and it becomes
> true after the slave is linked in, but when slave_cnt is used in the xmit
> path it is still 0, so fetch it once and transmit based on that. Since
> it is used only in round-robin and XOR modes, the fix is only for them.
> Thanks to Eric Dumazet for pointing out the fault in my first try to fix
> this.
  ...
> Fixes: 278b208375 ("bonding: initial RCU conversion")
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> v2: Based on Eric's feedback change the fix to fetch the value once in a
>     local variable in the affected modes and to act based on that.

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2] bonding: fix div by zero while enslaving and transmitting
  2014-09-12 15:38           ` [PATCH net v2] " Nikolay Aleksandrov
  2014-09-13 21:17             ` David Miller
@ 2014-09-17  6:15             ` Ding Tianhong
  2014-09-17 11:08               ` Nikolay Aleksandrov
  1 sibling, 1 reply; 11+ messages in thread
From: Ding Tianhong @ 2014-09-17  6:15 UTC (permalink / raw)
  To: Nikolay Aleksandrov, netdev
  Cc: Eric Dumazet, Andy Gospodarek, Jay Vosburgh, Veaceslav Falico

On 2014/9/12 23:38, Nikolay Aleksandrov wrote:
> The problem is that the slave is first linked and slave_cnt is
> incremented afterwards leading to a div by zero in the modes that use it
> as a modulus. What happens is that in bond_start_xmit()
> bond_has_slaves() is used to evaluate further transmission and it becomes
> true after the slave is linked in, but when slave_cnt is used in the xmit
> path it is still 0, so fetch it once and transmit based on that. Since
> it is used only in round-robin and XOR modes, the fix is only for them.
> Thanks to Eric Dumazet for pointing out the fault in my first try to fix
> this.
> 

Hi, I think no need to add more checks in the xmit fast path, why not add a barrier to make
sure the slave_cnt inc to 1 before access it.

+	/* Increment slave_cnt before linking in the slave so we won't end up in
+	 * bond_start_xmit with bond_has_slaves() true and slave_cnt == 0.
+	 */
+	bond->slave_cnt++;
+	wmb();

I think it looks more efficiency, sorry for reply so late.

Regards
Ding


> Call trace (took it out of net-next kernel, but it's the same with net):
> [46934.330038] divide error: 0000 [#1] SMP
> [46934.330041] Modules linked in: bonding(O) 9p fscache
> snd_hda_codec_generic crct10dif_pclmul
> [46934.330041] bond0: Enslaving eth1 as an active interface with an up
> link
> [46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
> ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
> serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
> snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
> virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
> pata_acpi floppy [last unloaded: bonding]
> [46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
> 3.17.0-rc4+ #27
> [46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
> ffff88005b728000
> [46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
> bond_start_xmit+0x1c3/0x450 [bonding]
> [46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
> [46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
> 000000000000002a
> [46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
> ffff88004b077940
> [46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
> ffff88004a83e000
> [46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
> ffff88004b3f0500
> [46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
> ffff88004b077940
> [46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
> knlGS:0000000000000000
> [46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
> 00000000000406e0
> [46934.330069] Stack:
> [46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
> ffff88004b3f0500
> [46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
> ffff88005b72b8a0
> [46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
> ffff88005b302000
> [46934.330073] Call Trace:
> [46934.330077]  [<ffffffff811e6169>] ?
> __kmalloc_node_track_caller+0x119/0x300
> [46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
> [46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
> [46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
> [46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
> [46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
> [46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
> [46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
> [46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
> [46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
> [46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
> [46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
> [46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
> [46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
> [46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
> [46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
> [46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
> [46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
> [46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
> [46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
> [46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
> [46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
> [46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
> [46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
> [46934.330115] bond0: Adding slave eth2
> [46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
> [46934.330118]  [<ffffffff81603248>] ?
> move_addr_to_kernel.part.20+0x28/0x80
> [46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
> [46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
> [46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
> [46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
> [46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
> [46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
> [46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
> [46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
> [46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
> 1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
> 89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
> [46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
> [bonding]
> [46934.330146]  RSP <ffff88005b72b7f8>
> 
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: Jay Vosburgh <j.vosburgh@gmail.com>
> CC: Veaceslav Falico <vfalico@gmail.com>
> Fixes: 278b208375 ("bonding: initial RCU conversion")
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> v2: Based on Eric's feedback change the fix to fetch the value once in a
>     local variable in the affected modes and to act based on that.
> 
> I believe we don't need to move the increment now as it doesn't really
> matter if it'll be before the wmb or after.
> 
>  drivers/net/bonding/bond_main.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 57912ee231cb..798ae69fb63c 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3659,8 +3659,14 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
>  		else
>  			bond_xmit_slave_id(bond, skb, 0);
>  	} else {
> -		slave_id = bond_rr_gen_slave_id(bond);
> -		bond_xmit_slave_id(bond, skb, slave_id % bond->slave_cnt);
> +		int slave_cnt = ACCESS_ONCE(bond->slave_cnt);
> +
> +		if (likely(slave_cnt)) {
> +			slave_id = bond_rr_gen_slave_id(bond);
> +			bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
> +		} else {
> +			dev_kfree_skb_any(skb);
> +		}
>  	}
>  
>  	return NETDEV_TX_OK;
> @@ -3691,8 +3697,13 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
>  static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
>  {
>  	struct bonding *bond = netdev_priv(bond_dev);
> +	int slave_cnt = ACCESS_ONCE(bond->slave_cnt);
>  
> -	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb) % bond->slave_cnt);
> +	if (likely(slave_cnt))
> +		bond_xmit_slave_id(bond, skb,
> +				   bond_xmit_hash(bond, skb) % slave_cnt);
> +	else
> +		dev_kfree_skb_any(skb);
>  
>  	return NETDEV_TX_OK;
>  }
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2] bonding: fix div by zero while enslaving and transmitting
  2014-09-17  6:15             ` Ding Tianhong
@ 2014-09-17 11:08               ` Nikolay Aleksandrov
  2014-09-18 10:59                 ` Ding Tianhong
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2014-09-17 11:08 UTC (permalink / raw)
  To: Ding Tianhong, netdev
  Cc: Eric Dumazet, Andy Gospodarek, Jay Vosburgh, Veaceslav Falico

On 17/09/14 08:15, Ding Tianhong wrote:
> On 2014/9/12 23:38, Nikolay Aleksandrov wrote:
>> The problem is that the slave is first linked and slave_cnt is
>> incremented afterwards leading to a div by zero in the modes that use it
>> as a modulus. What happens is that in bond_start_xmit()
>> bond_has_slaves() is used to evaluate further transmission and it becomes
>> true after the slave is linked in, but when slave_cnt is used in the xmit
>> path it is still 0, so fetch it once and transmit based on that. Since
>> it is used only in round-robin and XOR modes, the fix is only for them.
>> Thanks to Eric Dumazet for pointing out the fault in my first try to fix
>> this.
>>
>
> Hi, I think no need to add more checks in the xmit fast path, why not add a barrier to make
> sure the slave_cnt inc to 1 before access it.
>
> +	/* Increment slave_cnt before linking in the slave so we won't end up in
> +	 * bond_start_xmit with bond_has_slaves() true and slave_cnt == 0.
> +	 */
> +	bond->slave_cnt++;
> +	wmb();
>
> I think it looks more efficiency, sorry for reply so late.
>
> Regards
> Ding
>
>

Hi Ding,
You should re-read Eric's comment to my first fix. In my first attempt I moved 
the increment before the slave linking which does rcu_assign_pointer() which 
implies a full memory barrier, IIRC. The issue is that this fixes the writer 
side and makes sure the increment is visible before linking the slave, but I 
missed that on the reader side (bond_start_xmit()) we don't have any barriers, 
so the CPU is free to do whatever it likes with the access to slave_cnt F.e. it 
can fetch it before the slave list.
Now, this fix shouldn't be felt much performance-wise since the likely() hint 
will be correct 99% of the time because the situation where slave_cnt is not in 
sync is only in a very short period of time while enslaving and releasing 
slaves. If you'd like to further remove this one check - you could. You can 
fetch slave_cnt only once in bond_start_xmit() and use that as a check for 
further transmitting instead of empty slave list but you must pass down the 
fetched value to the xmitting functions, that is you should not re-fetch it, so 
it'd probably require you to add additional parameter to all modes' xmit 
functions so you can pass it down from bond_start_xmit(). Since only 2 modes 
actually use slave_cnt I don't think that is necessary.
In any case net should be merged with net-next first.

Cheers,
  Nik

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2] bonding: fix div by zero while enslaving and transmitting
  2014-09-17 11:08               ` Nikolay Aleksandrov
@ 2014-09-18 10:59                 ` Ding Tianhong
  0 siblings, 0 replies; 11+ messages in thread
From: Ding Tianhong @ 2014-09-18 10:59 UTC (permalink / raw)
  To: Nikolay Aleksandrov, netdev
  Cc: Eric Dumazet, Andy Gospodarek, Jay Vosburgh, Veaceslav Falico

On 2014/9/17 19:08, Nikolay Aleksandrov wrote:
> On 17/09/14 08:15, Ding Tianhong wrote:
>> On 2014/9/12 23:38, Nikolay Aleksandrov wrote:
>>> The problem is that the slave is first linked and slave_cnt is
>>> incremented afterwards leading to a div by zero in the modes that use it
>>> as a modulus. What happens is that in bond_start_xmit()
>>> bond_has_slaves() is used to evaluate further transmission and it becomes
>>> true after the slave is linked in, but when slave_cnt is used in the xmit
>>> path it is still 0, so fetch it once and transmit based on that. Since
>>> it is used only in round-robin and XOR modes, the fix is only for them.
>>> Thanks to Eric Dumazet for pointing out the fault in my first try to fix
>>> this.
>>>
>>
>> Hi, I think no need to add more checks in the xmit fast path, why not add a barrier to make
>> sure the slave_cnt inc to 1 before access it.
>>
>> +    /* Increment slave_cnt before linking in the slave so we won't end up in
>> +     * bond_start_xmit with bond_has_slaves() true and slave_cnt == 0.
>> +     */
>> +    bond->slave_cnt++;
>> +    wmb();
>>
>> I think it looks more efficiency, sorry for reply so late.
>>
>> Regards
>> Ding
>>
>>
> 
> Hi Ding,
> You should re-read Eric's comment to my first fix. In my first attempt I moved the increment before the slave linking which does rcu_assign_pointer() which implies a full memory barrier, IIRC. The issue is that this fixes the writer side and makes sure the increment is visible before linking the slave, but I missed that on the reader side (bond_start_xmit()) we don't have any barriers, so the CPU is free to do whatever it likes with the access to slave_cnt F.e. it can fetch it before the slave list.
> Now, this fix shouldn't be felt much performance-wise since the likely() hint will be correct 99% of the time because the situation where slave_cnt is not in sync is only in a very short period of time while enslaving and releasing slaves. If you'd like to further remove this one check - you could. You can fetch slave_cnt only once in bond_start_xmit() and use that as a check for further transmitting instead of empty slave list but you must pass down the fetched value to the xmitting functions, that is you should not re-fetch it, so it'd probably require you to add additional parameter to all modes' xmit functions so you can pass it down from bond_start_xmit(). Since only 2 modes actually use slave_cnt I don't think that is necessary.
> In any case net should be merged with net-next first.
> 
> Cheers,
>  Nik
> 

Hi Nik:

Thanks for your explanation, I got it, I need to think more about it, thanks.

Ding
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-09-18 10:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-12 12:22 [PATCH net] bonding: fix div by zero while enslaving and transmitting Nikolay Aleksandrov
2014-09-12 13:09 ` Eric Dumazet
2014-09-12 13:27   ` Nikolay Aleksandrov
2014-09-12 13:33     ` Nikolay Aleksandrov
2014-09-12 14:45       ` Eric Dumazet
2014-09-12 14:55         ` Nikolay Aleksandrov
2014-09-12 15:38           ` [PATCH net v2] " Nikolay Aleksandrov
2014-09-13 21:17             ` David Miller
2014-09-17  6:15             ` Ding Tianhong
2014-09-17 11:08               ` Nikolay Aleksandrov
2014-09-18 10:59                 ` Ding Tianhong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).