netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Regression in bonding driver - devices without set_mac
@ 2015-11-04  3:49 Toby Corkindale
  2015-11-04  7:45 ` Toby Corkindale
  2015-11-06 22:46 ` Jay Vosburgh
  0 siblings, 2 replies; 6+ messages in thread
From: Toby Corkindale @ 2015-11-04  3:49 UTC (permalink / raw)
  To: netdev

Hi,
I've subscribed to this list to try and sort out a regression in the
bonding network driver.
I originally reported this in Dec 2014, as:
https://bugzilla.kernel.org/show_bug.cgi?id=89161

The bug is still present in the 4.2 Linux kernel.

If you look at the code, here:
https://github.com/torvalds/linux/commit/f54424412b6b2f64cae4d7c39d981ca14ce0052c

Then you see that the intention is for devices without set_mac support
to be supported. Although in older code they DID work, and that
ability died sometime near this commit. And has never returned since.

The code path in current kernels does allow devices through that block
of code, but it fails somewhere else -- the devices are not
successfully added as slaves, but the only error printed is the
warning about not supporting MAC address setting.

Is there anything I can do to try and sort this regression out, with you?

I can explain how to set up two virtual machines with PPPoE client and
server in order to test, if you like? Or I can try out patches and
give feedback.

Cheers
Toby

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Regression in bonding driver - devices without set_mac
  2015-11-04  3:49 Regression in bonding driver - devices without set_mac Toby Corkindale
@ 2015-11-04  7:45 ` Toby Corkindale
  2015-11-05 23:12   ` Toby Corkindale
  2015-11-06 22:46 ` Jay Vosburgh
  1 sibling, 1 reply; 6+ messages in thread
From: Toby Corkindale @ 2015-11-04  7:45 UTC (permalink / raw)
  To: netdev

On 4 November 2015 at 14:49, Toby Corkindale <tjc@wintrmute.net> wrote:
> Hi,
> I've subscribed to this list to try and sort out a regression in the
> bonding network driver.
> I originally reported this in Dec 2014, as:
> https://bugzilla.kernel.org/show_bug.cgi?id=89161
>
> The bug is still present in the 4.2 Linux kernel.
>
> If you look at the code, here:
> https://github.com/torvalds/linux/commit/f54424412b6b2f64cae4d7c39d981ca14ce0052c
>
> Then you see that the intention is for devices without set_mac support
> to be supported. Although in older code they DID work, and that
> ability died sometime near this commit. And has never returned since.
>
> The code path in current kernels does allow devices through that block
> of code, but it fails somewhere else -- the devices are not
> successfully added as slaves, but the only error printed is the
> warning about not supporting MAC address setting.
>
> Is there anything I can do to try and sort this regression out, with you?
>
> I can explain how to set up two virtual machines with PPPoE client and
> server in order to test, if you like? Or I can try out patches and
> give feedback.
>
> Cheers
> Toby


The following patch fixes the regression, for me, although I suspect
it's not suitable for direct application.
But maybe knowing that it fixes the issue, helps you create something
appropriate?

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 771a449..3f44875 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1460,8 +1460,8 @@ int bond_enslave(struct net_device *bond_dev,
struct net_device *slave_dev
)
       addr.sa_family = slave_dev->type;
       res = dev_set_mac_address(slave_dev, &addr);
       if (res) {
-           netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
-           goto err_restore_mtu;
+           netdev_warn(bond_dev, "Error %d calling set_mac_address\n", res);
       }
   }

diff --git a/drivers/net/bonding/bond_netlink.c
b/drivers/net/bonding/bond_netlink.c
index db760e8..e4ffc94 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -49,9 +49,8 @@ static int bond_fill_slave_info(struct sk_buff *skb,
           slave->link_failure_count))
       goto nla_put_failure;

-   if (nla_put(skb, IFLA_BOND_SLAVE_PERM_HWADDR,
-           slave_dev->addr_len, slave->perm_hwaddr))
-       goto nla_put_failure;
+   nla_put(skb, IFLA_BOND_SLAVE_PERM_HWADDR, slave_dev->addr_len,
slave->perm_hwaddr);

   if (nla_put_u16(skb, IFLA_BOND_SLAVE_QUEUE_ID, slave->queue_id))
       goto nla_put_failure;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Regression in bonding driver - devices without set_mac
  2015-11-04  7:45 ` Toby Corkindale
@ 2015-11-05 23:12   ` Toby Corkindale
  0 siblings, 0 replies; 6+ messages in thread
From: Toby Corkindale @ 2015-11-05 23:12 UTC (permalink / raw)
  To: netdev

OK, wow, this list is super high traffic. I'm going to unsubscribe now.

Feel free to get in contact with me directly (not via the list) if you like.

It'd be lovely if someone wants to fix the regression I reported,
especially as it looks pretty simple to achieve. (Just don't silently fail
if you can't get a MAC address for the netlink info).

Bye,
Toby

On 4 November 2015 at 18:45, Toby Corkindale <tjc@wintrmute.net> wrote:
>
> On 4 November 2015 at 14:49, Toby Corkindale <tjc@wintrmute.net> wrote:
> > Hi,
> > I've subscribed to this list to try and sort out a regression in the
> > bonding network driver.
> > I originally reported this in Dec 2014, as:
> > https://bugzilla.kernel.org/show_bug.cgi?id=89161
> >
> > The bug is still present in the 4.2 Linux kernel.
> >
> > If you look at the code, here:
> > https://github.com/torvalds/linux/commit/f54424412b6b2f64cae4d7c39d981ca14ce0052c
> >
> > Then you see that the intention is for devices without set_mac support
> > to be supported. Although in older code they DID work, and that
> > ability died sometime near this commit. And has never returned since.
> >
> > The code path in current kernels does allow devices through that block
> > of code, but it fails somewhere else -- the devices are not
> > successfully added as slaves, but the only error printed is the
> > warning about not supporting MAC address setting.
> >
> > Is there anything I can do to try and sort this regression out, with you?
> >
> > I can explain how to set up two virtual machines with PPPoE client and
> > server in order to test, if you like? Or I can try out patches and
> > give feedback.
> >
> > Cheers
> > Toby
>
>
> The following patch fixes the regression, for me, although I suspect
> it's not suitable for direct application.
> But maybe knowing that it fixes the issue, helps you create something
> appropriate?
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 771a449..3f44875 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1460,8 +1460,8 @@ int bond_enslave(struct net_device *bond_dev,
> struct net_device *slave_dev
> )
>        addr.sa_family = slave_dev->type;
>        res = dev_set_mac_address(slave_dev, &addr);
>        if (res) {
> -           netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
> -           goto err_restore_mtu;
> +           netdev_warn(bond_dev, "Error %d calling set_mac_address\n", res);
>        }
>    }
>
> diff --git a/drivers/net/bonding/bond_netlink.c
> b/drivers/net/bonding/bond_netlink.c
> index db760e8..e4ffc94 100644
> --- a/drivers/net/bonding/bond_netlink.c
> +++ b/drivers/net/bonding/bond_netlink.c
> @@ -49,9 +49,8 @@ static int bond_fill_slave_info(struct sk_buff *skb,
>            slave->link_failure_count))
>        goto nla_put_failure;
>
> -   if (nla_put(skb, IFLA_BOND_SLAVE_PERM_HWADDR,
> -           slave_dev->addr_len, slave->perm_hwaddr))
> -       goto nla_put_failure;
> +   nla_put(skb, IFLA_BOND_SLAVE_PERM_HWADDR, slave_dev->addr_len,
> slave->perm_hwaddr);
>
>    if (nla_put_u16(skb, IFLA_BOND_SLAVE_QUEUE_ID, slave->queue_id))
>        goto nla_put_failure;




-- 
Turning and turning in the widening gyre
The falcon cannot hear the falconer
Things fall apart; the center cannot hold
Mere anarchy is loosed upon the world

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Regression in bonding driver - devices without set_mac
  2015-11-04  3:49 Regression in bonding driver - devices without set_mac Toby Corkindale
  2015-11-04  7:45 ` Toby Corkindale
@ 2015-11-06 22:46 ` Jay Vosburgh
       [not found]   ` <CABEgq96o2dXi06rLh6xmLGYVzh0-fVPs0rpjv8X3NTsk_=8+JQ@mail.gmail.com>
  1 sibling, 1 reply; 6+ messages in thread
From: Jay Vosburgh @ 2015-11-06 22:46 UTC (permalink / raw)
  To: Toby Corkindale; +Cc: netdev, Veaceslav Falico

Toby Corkindale <tjc@wintrmute.net> wrote:

>I originally reported this in Dec 2014, as:
>https://bugzilla.kernel.org/show_bug.cgi?id=89161
>
>The bug is still present in the 4.2 Linux kernel.

	I had some time to look into this today, and there is a related
code path that panics the kernel, so I'm working on a patch to resolve
things, and will be posting that shortly.

>If you look at the code, here:
>https://github.com/torvalds/linux/commit/f54424412b6b2f64cae4d7c39d981ca14ce0052c
>
>Then you see that the intention is for devices without set_mac support
>to be supported. Although in older code they DID work, and that
>ability died sometime near this commit. And has never returned since.

	Unfortunately, I believe the referenced commit is in error.  The
change log states:

	[...] It's wrong because we
    only require ndo_set_mac_address in case bonding is in active-backup mode
    and FOM isn't FOM_ACTIVE.

	This assertion is incorrect; ndo_set_mac_address is necessary
for all modes with the only exception being active-backup with
fail_over_mac enabled.

	All of the load balance modes (everything except active-backup)
will change the MAC of the slave devices at some point, and thus require
ndo_set_mac_address support.

	PPP devices should function in active-backup mode, and should
enable fail_over_mac automatically (if the PPP device is the first
slave) if f_o_m is not already enabled.

	The prior discussion on this patch can be found here:

http://www.spinics.net/lists/netdev/msg289311.html

>The code path in current kernels does allow devices through that block
>of code, but it fails somewhere else -- the devices are not
>successfully added as slaves, but the only error printed is the
>warning about not supporting MAC address setting.

	What is happening in the current code is that, for load balance
modes, e.g., balance-rr, the test for ndo_set_mac_address support
passes, but, later, bond_enslave attempts to actually set the MAC of the
new slave, and this fails (because there is no ndo_set_mac).

>Is there anything I can do to try and sort this regression out, with you?

	As I said, I do not believe this is a regression.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Regression in bonding driver - devices without set_mac
       [not found]   ` <CABEgq96o2dXi06rLh6xmLGYVzh0-fVPs0rpjv8X3NTsk_=8+JQ@mail.gmail.com>
@ 2015-11-07  2:31     ` Jay Vosburgh
  2015-11-07  6:05       ` Toby Corkindale
  0 siblings, 1 reply; 6+ messages in thread
From: Jay Vosburgh @ 2015-11-07  2:31 UTC (permalink / raw)
  To: Toby Corkindale; +Cc: netdev, Veaceslav Falico

Toby Corkindale <tjc@wintrmute.net> wrote:
[...]
>The thing is, I've been running PPP devices in balance-rr mode for
>quite a while. It worked fine, and I'd like to continue having that
>functionality.  These changes to the kernel have broken that
>functionality, by requiring the MAC address support unconditionally.
>
>I guess using PPP devices is a special case, but it would be great if
>that special case could continue to be supported.

	Could you describe your configuration / use case a bit more?

	The nominal reason for the MAC change in balance-rr or
balance-xor modes is that those modes are designed to interoperate with
Etherchannel (static link aggregation) type of systems, which expect all
ports participating in the aggregation to use the same MAC address.

	Relaxing the restriction is likely a matter of causing
fail_over_mac to be obeyed in addtional modes other than active-backup,
which is actually what's catching the ppp case right now.  The check for
the set_mac function passes, but later, the test for "should the slave's
MAC be changed" is

        if (!bond->params.fail_over_mac ||
            BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
		[ change the slave MAC ]
	}

	so this block is entered for anything other than active-backup
mode with fail_over_mac enabled.  This block was changed for

commit 00503b6f702eaf23e7257d6287da72805d7d014c
Author: dingtianhong <dingtianhong@huawei.com>
Date:   Sat Jan 25 13:00:29 2014 +0800

    bonding: fail_over_mac should only affect AB mode at enslave and removal pro
cessing

	So that's probably when you saw the behavior change.

	Are you able to test a patch?  I believe the following would
change the behavior, although some of the release logic may not be
correct (I've not tested this).

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b4351caf8e01..5f3632544d52 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1451,8 +1451,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	 */
 	ether_addr_copy(new_slave->perm_hwaddr, slave_dev->dev_addr);
 
-	if (!bond->params.fail_over_mac ||
-	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
+	if (!bond->params.fail_over_mac) {
 		/* Set slave to master's mac address.  The application already
 		 * set the master's mac address to that of the first slave
 		 */
@@ -1725,8 +1724,7 @@ err_close:
 	dev_close(slave_dev);
 
 err_restore_mac:
-	if (!bond->params.fail_over_mac ||
-	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
+	if (!bond->params.fail_over_mac) {
 		/* XXX TODO - fom follow mode needs to change master's
 		 * MAC if this slave's MAC is in use by the bond, or at
 		 * least print a warning.
@@ -1823,8 +1821,7 @@ static int __bond_release_one(struct net_device *bond_dev,
 
 	RCU_INIT_POINTER(bond->current_arp_slave, NULL);
 
-	if (!all && (!bond->params.fail_over_mac ||
-		     BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) {
+	if (!all && (!bond->params.fail_over_mac)) {
 		if (ether_addr_equal_64bits(bond_dev->dev_addr, slave->perm_hwaddr) &&
 		    bond_has_slaves(bond))
 			netdev_warn(bond_dev, "the permanent HWaddr of %s - %pM - is still in use by %s - set the HWaddr of %s to a different address to avoid conflicts\n",
@@ -1905,8 +1902,7 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* close slave before restoring its mac address */
 	dev_close(slave_dev);
 
-	if (bond->params.fail_over_mac != BOND_FOM_ACTIVE ||
-	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
+	if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) {
 		/* restore original ("permanent") mac address */
 		ether_addr_copy(addr.sa_data, slave->perm_hwaddr);
 		addr.sa_family = slave_dev->type;


	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Regression in bonding driver - devices without set_mac
  2015-11-07  2:31     ` Jay Vosburgh
@ 2015-11-07  6:05       ` Toby Corkindale
  0 siblings, 0 replies; 6+ messages in thread
From: Toby Corkindale @ 2015-11-07  6:05 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, Veaceslav Falico

On 7 November 2015 at 13:31, Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Toby Corkindale <tjc@wintrmute.net> wrote:
> [...]
> >The thing is, I've been running PPP devices in balance-rr mode for
> >quite a while. It worked fine, and I'd like to continue having that
> >functionality.  These changes to the kernel have broken that
> >functionality, by requiring the MAC address support unconditionally.
> >
> >I guess using PPP devices is a special case, but it would be great if
> >that special case could continue to be supported.
>
>         Could you describe your configuration / use case a bit more?

Hi Jay,
Certainly, I can.

Some internet service providers allow you to purchase multiple DSL
services, which are bonded together. This provides performance equal
to the total aggregate bandwidth, allowing for high-speed connections
in areas that are only serviced by basic ADSL. To use these services,
you need multiple ADSL modems, and the bonding network driver
aggregates them for receiving data, and for sending data back
upstream.

It works quite nicely, and avoids the need to purchase specialised
multi-port ADSL modems, or dedicated Cisco devices.


>         The nominal reason for the MAC change in balance-rr or
> balance-xor modes is that those modes are designed to interoperate with
> Etherchannel (static link aggregation) type of systems, which expect all
> ports participating in the aggregation to use the same MAC address.
>
>         Relaxing the restriction is likely a matter of causing
> fail_over_mac to be obeyed in addtional modes other than active-backup,
> which is actually what's catching the ppp case right now.  The check for
> the set_mac function passes, but later, the test for "should the slave's
> MAC be changed" is
>
>         if (!bond->params.fail_over_mac ||
>             BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
>                 [ change the slave MAC ]
>         }
>
>         so this block is entered for anything other than active-backup
> mode with fail_over_mac enabled.  This block was changed for
>
> commit 00503b6f702eaf23e7257d6287da72805d7d014c
> Author: dingtianhong <dingtianhong@huawei.com>
> Date:   Sat Jan 25 13:00:29 2014 +0800
>
>     bonding: fail_over_mac should only affect AB mode at enslave and removal pro
> cessing
>
>         So that's probably when you saw the behavior change.
>
>         Are you able to test a patch?  I believe the following would
> change the behavior, although some of the release logic may not be
> correct (I've not tested this).

I don't know if you saw one of my later posts on the topic, but I
submitted a patch. There's a couple of spots that try to fetch the MAC
address to put in a netlink info struct, and which will fail out if
that is not successful.
You will need to incorporate that into your own patch before it will work.
You can find my patch in the earlier post under this thread, or else
at the Bugzilla ticket here:
https://bugzilla.kernel.org/show_bug.cgi?id=89161

It's the middle of the weekend here in Australia, but I can test out
patches tomorrow evening (24 hours from now).

Cheers
Toby

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-11-07  6:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-04  3:49 Regression in bonding driver - devices without set_mac Toby Corkindale
2015-11-04  7:45 ` Toby Corkindale
2015-11-05 23:12   ` Toby Corkindale
2015-11-06 22:46 ` Jay Vosburgh
     [not found]   ` <CABEgq96o2dXi06rLh6xmLGYVzh0-fVPs0rpjv8X3NTsk_=8+JQ@mail.gmail.com>
2015-11-07  2:31     ` Jay Vosburgh
2015-11-07  6:05       ` Toby Corkindale

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).