* Re: Issue with LACP mode in linux bonding driver
       [not found] <CADAe=+LvNgHV+crN_+E4xB+Pcz=KSvZpK5ADKzYFjHL2gGJq2Q@mail.gmail.com>
From: Jay Vosburgh @ 2015-06-26 2:15 UTC
To: Ajith Adapa; +Cc: vfalico, gospo, netdev

Ajith Adapa <adapa.ajith@gmail.com> wrote:

>Hi,
>
>Sorry for direct mail. Since the question is more specific about
>supporting LACP standard I decided to communicate directly with the
>MAINTAINERS. My issue is related to multiaggregation support in LACP.

I saw your message this morning, but didn't have an opportunity
to look into it today.

>Linux Flavour: Centos7.
>Setup topology: Back to back connected Linux server and a L2 switch
>with 2 interfaces eth0 and eth1 (on both sides).
>
>On Switch I have mapped eth0 to po1 and eth1 to po2. On Linux server I
>have created a single bond interface with both interfaces eth0 and
>eth1.
>
>On switch both po1 and po2 have the same system-id but the Actor key is
>different i.e. on PO1 it is 16385 and on PO2 it is 32768. As per the
>information available regarding bond0 on Linux server which is given below
>Active aggregator ID is 1 which is mapped to eth0.
>
>But we have observed that eth1 on Linux server is also sending LACPDUs
>with Collecting/Distributing bit set as 1. This results in the single
>bond interface on Linux server being split into multiple port-channels
>on Switch, causing duplication of frames on Linux server.

I'd suggest enabling dynamic_debug for the bonding driver and
observing the state machine activity within the 802.3ad code. This is
described in Documentation/dynamic-debug-howto.txt, which is part of the
kernel source; off the top of my head, I think you'll need something
like:

    echo 'module bonding =p' > /sys/kernel/debug/dynamic_debug/control

This should put the bonding LACP state machine activity into the
system log.

If a port on a non-active aggregator is actually in collecting /
distributing state, that is probably bad, as I'd only expect that to be
true for ports in the active aggregator.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

* Re: Issue with LACP mode in linux bonding driver
From: Ajith Adapa @ 2015-06-26 13:57 UTC
To: Jay Vosburgh; +Cc: vfalico, gospo, netdev

On 26 June 2015 at 07:45, Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
> echo 'module bonding =p' > /sys/kernel/debug/dynamic_debug/control

Hi,

thanks for the reply.

Linux server (enp0s8)(bond0) ======== (po1)(xe1) switch
Linux server (enp0s9)(bond0) ======== (po2)(xe2) switch

I have tried the steps mentioned in the previous mail, but I can only
see RX state machine related logs, as shown below:

[14775.575048] bonding: Received LACPDU on port 2
[14775.575051] bonding: Rx Machine: Port=2, Last State=6, Curr State=6
[14775.638060] bonding: Received LACPDU on port 1
[14775.638063] bonding: Rx Machine: Port=1, Last State=6, Curr State=6
[14775.650975] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14775.750280] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14775.850771] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14775.950817] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.052255] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.155483] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.259759] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.360020] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.462303] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.560375] bonding: Received LACPDU on port 2
[14776.560378] bonding: Rx Machine: Port=2, Last State=6, Curr State=6
[14776.562899] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.622476] bonding: Received LACPDU on port 1
[14776.622478] bonding: Rx Machine: Port=1, Last State=6, Curr State=6
[14776.665981] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.767769] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.867544] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14776.967442] bonding: Periodic Machine: Port=1, Last State=3, Curr State=4
[14776.967510] bonding: Sent LACPDU on port 1
[14776.967524] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.068075] bonding: Periodic Machine: Port=1, Last State=4, Curr State=3
[14777.068080] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.169102] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.270426] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.371119] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.473184] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.542800] bonding: Received LACPDU on port 2
[14777.542837] bonding: Rx Machine: Port=2, Last State=6, Curr State=6
[14777.574911] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.604569] bonding: Received LACPDU on port 1
[14777.604599] bonding: Rx Machine: Port=1, Last State=6, Curr State=6
[14777.674451] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.774171] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.875429] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14777.976074] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.077979] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.179497] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.280577] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.382005] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.484380] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.528073] bonding: Received LACPDU on port 2
[14778.528076] bonding: Rx Machine: Port=2, Last State=6, Curr State=6
[14778.582553] bonding: Periodic Machine: Port=2, Last State=3, Curr State=4
[14778.582641] bonding: Sent LACPDU on port 2
[14778.586482] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.588507] bonding: Received LACPDU on port 1
[14778.588509] bonding: Rx Machine: Port=1, Last State=6, Curr State=6
[14778.682574] bonding: Periodic Machine: Port=2, Last State=4, Curr State=3
[14778.686117] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.788289] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.890620] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14778.993106] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.093547] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.195574] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.296085] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.398869] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.499408] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.527556] bonding: Received LACPDU on port 2
[14779.527559] bonding: Rx Machine: Port=2, Last State=6, Curr State=6
[14779.587923] bonding: Received LACPDU on port 1
[14779.587927] bonding: Rx Machine: Port=1, Last State=6, Curr State=6
[14779.599856] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.700372] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.800274] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14779.900126] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.000140] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.100073] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.202081] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.304725] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.404988] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.505805] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.521825] bonding: Received LACPDU on port 2
[14780.521829] bonding: Rx Machine: Port=2, Last State=6, Curr State=6
[14780.580985] bonding: Received LACPDU on port 1
[14780.580989] bonding: Rx Machine: Port=1, Last State=6, Curr State=6
[14780.606091] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.706416] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.807086] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14780.907126] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14781.008729] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14781.109174] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14781.210017] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14781.312609] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14781.413590] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14781.513075] bonding: bond_should_notify_peers: bond bond0 slave enp0s8
[14781.519042] bonding: Received LACPDU on port 2
[14781.519045] bonding: Rx Machine: Port=2, Last State=6, Curr State=6
[14781.577600] bonding: Received LACPDU on port 1
[14781.577628] bonding: Rx Machine: Port=1, Last State=6, Curr State=6

I am also attaching the tcpdump output captured on the back-to-back
links below, since the LACPDUs generated by the Linux server have the
Collecting and Distributing bits set.

LACPDU generated by switch

19:15:33.941161 08:00:27:81:1e:a1 (oui Unknown) > 01:80:c2:00:00:02 (oui Unknown), ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System 08:00:27:46:4d:1d (oui Unknown), System Priority 32768, Key 2, Port 4, Port Priority 32768
          State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing]
        Partner Information TLV (0x02), length 20
          System 08:00:27:18:ae:4b (oui Unknown), System Priority 65535, Key 17, Port 2, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Collector Information TLV (0x03), length 16
          Max Delay 5
        Terminator TLV (0x00), length 0

LACPDU generated by linux server

19:15:34.718987 08:00:27:76:35:a2 (oui Unknown) > 01:80:c2:00:00:02 (oui Unknown), ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System 08:00:27:18:ae:4b (oui Unknown), System Priority 65535, Key 17, Port 2, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Partner Information TLV (0x02), length 20
          System 08:00:27:46:4d:1d (oui Unknown), System Priority 32768, Key 2, Port 4, Port Priority 32768
          State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing]
        Collector Information TLV (0x03), length 16
          Max Delay 0
        Terminator TLV (0x00), length 0

===================================================

LACPDU generated by switch

19:20:08.246164 08:00:27:81:50:43 (oui Unknown) > 01:80:c2:00:00:02 (oui Unknown), ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System 08:00:27:46:4d:1d (oui Unknown), System Priority 32768, Key 1, Port 3, Port Priority 32768
          State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing]
        Partner Information TLV (0x02), length 20
          System 08:00:27:18:ae:4b (oui Unknown), System Priority 65535, Key 17, Port 1, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Collector Information TLV (0x03), length 16
          Max Delay 5
        Terminator TLV (0x00), length 0

LACPDU generated by linux server

19:20:08.611534 08:00:27:18:ae:4b (oui Unknown) > 01:80:c2:00:00:02 (oui Unknown), ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System 08:00:27:18:ae:4b (oui Unknown), System Priority 65535, Key 17, Port 1, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Partner Information TLV (0x02), length 20
          System 08:00:27:46:4d:1d (oui Unknown), System Priority 32768, Key 1, Port 3, Port Priority 32768
          State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing]
        Collector Information TLV (0x03), length 16
          Max Delay 0
        Terminator TLV (0x00), length 0

Regards,
Ajith

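As an aid for reading the "State Flags" lines in the captures above, here is a minimal Python sketch that maps flag names to bits of the LACP state octet. The bit positions follow the actor/partner state field defined by IEEE 802.3ad. The helper names (decode_state, encode_state, is_collecting_distributing) are hypothetical and introduced only for illustration; they are not part of tcpdump or the bonding driver. "Defaulted" and "Expired" are the standard's names for the two high bits, which do not appear in these captures.

```python
# LACP actor/partner state octet, lowest bit first (IEEE 802.3ad).
# Assumption: flag names match the labels printed in the captures above.
LACP_STATE_BITS = [
    ("Activity", 0x01),
    ("Timeout", 0x02),
    ("Aggregation", 0x04),
    ("Synchronization", 0x08),
    ("Collecting", 0x10),
    ("Distributing", 0x20),
    ("Defaulted", 0x40),
    ("Expired", 0x80),
]


def decode_state(octet):
    """Return the names of the flags set in a state octet, lowest bit first."""
    return [name for name, mask in LACP_STATE_BITS if octet & mask]


def encode_state(names):
    """Build a state octet from a list of flag names."""
    masks = dict(LACP_STATE_BITS)
    value = 0
    for name in names:
        value |= masks[name]
    return value


def is_collecting_distributing(octet):
    """True when both the Collecting and Distributing bits are set."""
    return (octet & 0x30) == 0x30


# Actor flags as reported by the Linux server and by the switch.
linux_actor = encode_state(
    ["Activity", "Timeout", "Aggregation", "Synchronization",
     "Collecting", "Distributing"])
switch_actor = encode_state(
    ["Activity", "Aggregation", "Synchronization",
     "Collecting", "Distributing"])

print(hex(linux_actor), is_collecting_distributing(linux_actor))
print(hex(switch_actor), is_collecting_distributing(switch_actor))
```

Under these assumptions the Linux server's actor flags encode to 0x3f and the switch's to 0x3d. Both have Collecting and Distributing set on both links, which is the symptom reported in this thread.
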
* Re: Issue with LACP mode in linux bonding driver
From: Jay Vosburgh @ 2015-06-26 21:19 UTC
To: Ajith Adapa; +Cc: vfalico, gospo, netdev

Ajith Adapa <adapa.ajith@gmail.com> wrote:

>On 26 June 2015 at 07:45, Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>> echo 'module bonding =p' > /sys/kernel/debug/dynamic_debug/control
>Hi,
>
>thanks for the reply.

I tried this out a bit here, and could reproduce the problem on
3.13, but not on 4.0.0. A bit of checking suggests that this problem is
fixed by the following commit:

commit 63b46242f707849a1df10b70e026281bfa40e849
Author: Wilson Kok <wkok@cumulusnetworks.com>
Date:   Mon Jan 26 01:16:59 2015 -0500

    bonding: fix incorrect lacp mux state when agg not active

which looks to have first appeared in the 4.0 kernel. I did a
quick backport of that to 3.13 (leaving out the style and pr_debug
changes), and it appears to resolve the problem.

Ajith: can you test this patch? If this resolves the problem
for you, we can request this patch for -stable to get it into the older
kernels.

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index dc0c56a..8c62f90 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -890,19 +890,23 @@ static void ad_mux_machine(struct port *port)
 		case AD_MUX_ATTACHED:
 			// check also if agg_select_timer expired(so the edable port will take place only after this timer)
 			if ((port->sm_vars & AD_PORT_SELECTED) && (port->partner_oper.port_state & AD_STATE_SYNCHRONIZATION) && !__check_agg_selection_timer(port)) {
-				port->sm_mux_state = AD_MUX_COLLECTING_DISTRIBUTING;// next state
+				if (port->aggregator->is_active)
+					port->sm_mux_state = AD_MUX_COLLECTING_DISTRIBUTING;// next state
 			} else if (!(port->sm_vars & AD_PORT_SELECTED) || (port->sm_vars & AD_PORT_STANDBY)) {	  // if UNSELECTED or STANDBY
 				port->sm_vars &= ~AD_PORT_READY_N;
 				// in order to withhold the selection logic to check all ports READY_N value
 				// every callback cycle to update ready variable, we check READY_N and update READY here
 				__set_agg_ports_ready(port->aggregator, __agg_ports_are_ready(port->aggregator));
 				port->sm_mux_state = AD_MUX_DETACHED;// next state
+			} else if (port->aggregator->is_active) {
+				port->actor_oper_port_state |=
+				    AD_STATE_SYNCHRONIZATION;
 			}
 			break;
 		case AD_MUX_COLLECTING_DISTRIBUTING:
 			if (!(port->sm_vars & AD_PORT_SELECTED) || (port->sm_vars & AD_PORT_STANDBY) ||
-			    !(port->partner_oper.port_state & AD_STATE_SYNCHRONIZATION)
-			   ) {
+			    !(port->partner_oper.port_state & AD_STATE_SYNCHRONIZATION) ||
+			    !(port->actor_oper_port_state & AD_STATE_SYNCHRONIZATION)) {
 				port->sm_mux_state = AD_MUX_ATTACHED;// next state
 
 			} else {
@@ -941,7 +945,12 @@ static void ad_mux_machine(struct port *port)
 			break;
 		case AD_MUX_ATTACHED:
 			__attach_bond_to_agg(port);
-			port->actor_oper_port_state |= AD_STATE_SYNCHRONIZATION;
+			if (port->aggregator->is_active)
+				port->actor_oper_port_state |=
+				    AD_STATE_SYNCHRONIZATION;
+			else
+				port->actor_oper_port_state &=
+				    ~AD_STATE_SYNCHRONIZATION;
 			port->actor_oper_port_state &= ~AD_STATE_COLLECTING;
 			port->actor_oper_port_state &= ~AD_STATE_DISTRIBUTING;
 			ad_disable_collecting_distributing(port);
@@ -950,6 +959,7 @@ static void ad_mux_machine(struct port *port)
 		case AD_MUX_COLLECTING_DISTRIBUTING:
 			port->actor_oper_port_state |= AD_STATE_COLLECTING;
 			port->actor_oper_port_state |= AD_STATE_DISTRIBUTING;
+			port->actor_oper_port_state |= AD_STATE_SYNCHRONIZATION;
 			ad_enable_collecting_distributing(port);
 			port->ntt = true;
 			break;
@@ -1350,6 +1360,9 @@ static void ad_port_selection_logic(struct port *port)
 
 	aggregator = __get_first_agg(port);
 	ad_agg_selection_logic(aggregator);
+
+	if (!port->aggregator->is_active)
+		port->actor_oper_port_state &= ~AD_STATE_SYNCHRONIZATION;
 }
 
 /*

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

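The effect of the backport on the ATTACHED and COLLECTING_DISTRIBUTING cases of the mux machine can be sketched as a simplified Python model. This is an illustration under stated assumptions, not the driver code: the Port class, its field names, and next_mux_state are hypothetical; the aggregator selection timer is ignored; and the updates to actor_oper_port_state are reduced to a single actor_sync flag that the caller sets.

```python
from dataclasses import dataclass
from enum import Enum


class Mux(Enum):
    # Mirrors the AD_MUX_* state names used in bond_3ad.c.
    DETACHED = 1
    WAITING = 2
    ATTACHED = 3
    COLLECTING_DISTRIBUTING = 4


@dataclass
class Port:
    # Hypothetical, simplified stand-in for the driver's struct port.
    selected: bool            # AD_PORT_SELECTED
    standby: bool             # AD_PORT_STANDBY
    partner_sync: bool        # partner reports AD_STATE_SYNCHRONIZATION
    agg_active: bool          # port->aggregator->is_active
    actor_sync: bool = False  # actor's own AD_STATE_SYNCHRONIZATION bit
    state: Mux = Mux.ATTACHED


def next_mux_state(port, patched=True):
    """Return the next mux state; patched=False models the pre-fix logic."""
    if port.state is Mux.ATTACHED:
        if port.selected and port.partner_sync:
            # The fix: only ports on the active aggregator may advance.
            if not patched or port.agg_active:
                return Mux.COLLECTING_DISTRIBUTING
            return Mux.ATTACHED
        if not port.selected or port.standby:
            return Mux.DETACHED
        return Mux.ATTACHED
    if port.state is Mux.COLLECTING_DISTRIBUTING:
        fall_back = (not port.selected or port.standby
                     or not port.partner_sync)
        if patched:
            # The fix also leaves this state when the actor is out of sync.
            fall_back = fall_back or not port.actor_sync
        return Mux.ATTACHED if fall_back else Mux.COLLECTING_DISTRIBUTING
    return port.state


# A port attached to an aggregator that is not the active one.
inactive = Port(selected=True, standby=False, partner_sync=True,
                agg_active=False)
print(next_mux_state(inactive, patched=False))
print(next_mux_state(inactive, patched=True))
```

In this model, a selected port on a non-active aggregator advances to COLLECTING_DISTRIBUTING without the fix and stays in ATTACHED with it, which corresponds to the behaviour reported for the second slave in this thread.
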
* Re: Issue with LACP mode in linux bonding driver
From: Jonathan Toppins @ 2015-06-26 23:57 UTC
To: Jay Vosburgh, Ajith Adapa; +Cc: vfalico, gospo, netdev

On 6/26/15 5:19 PM, Jay Vosburgh wrote:
> Ajith Adapa <adapa.ajith@gmail.com> wrote:
>
>> On 26 June 2015 at 07:45, Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>>> echo 'module bonding =p' > /sys/kernel/debug/dynamic_debug/control
>> Hi,
>>
>> thanks for the reply.
>
> I tried this out a bit here, and could reproduce the problem on
> 3.13, but not on 4.0.0. A bit of checking suggests that this problem is
> fixed by the following commit:
>
> commit 63b46242f707849a1df10b70e026281bfa40e849
> Author: Wilson Kok <wkok@cumulusnetworks.com>
> Date: Mon Jan 26 01:16:59 2015 -0500
>
> bonding: fix incorrect lacp mux state when agg not active
>

Yes this should be picked up by the stable trees, apologies if I dropped
the ball on cc'ing stable@. We (cumulus) are running a version of this
patch in 3.2 code, so all stable trees should receive it.

-Jon

* Re: Issue with LACP mode in linux bonding driver
From: Ajith Adapa @ 2015-06-27 2:27 UTC
To: Jonathan Toppins; +Cc: Jay Vosburgh, vfalico, gospo, netdev

Hi,

Thanks for the update.

I agree with Jonathan that this change has to be picked up by the stable
releases. A configuration like this might end up creating huge
duplication of frames for end nodes.

Regards,
Ajith

On 27 June 2015 at 05:27, Jonathan Toppins <jtoppins@cumulusnetworks.com> wrote:
> On 6/26/15 5:19 PM, Jay Vosburgh wrote:
>>
>> Ajith Adapa <adapa.ajith@gmail.com> wrote:
>>
>>> On 26 June 2015 at 07:45, Jay Vosburgh <jay.vosburgh@canonical.com>
>>> wrote:
>>>>
>>>> echo 'module bonding =p' > /sys/kernel/debug/dynamic_debug/control
>>>
>>> Hi,
>>>
>>> thanks for the reply.
>>
>>
>> I tried this out a bit here, and could reproduce the problem on
>> 3.13, but not on 4.0.0. A bit of checking suggests that this problem is
>> fixed by the following commit:
>>
>> commit 63b46242f707849a1df10b70e026281bfa40e849
>> Author: Wilson Kok <wkok@cumulusnetworks.com>
>> Date: Mon Jan 26 01:16:59 2015 -0500
>>
>> bonding: fix incorrect lacp mux state when agg not active
>>
>
> Yes this should be picked up by the stable trees, apologies if I dropped the
> ball on cc'ing stable@. We (cumulus) are running a version of this patch in
> 3.2 code, so all stable trees should receive it.
>
> -Jon

* Please backport 63b46242f707849 [was: Issue with LACP mode in linux bonding driver]
From: Jonathan Toppins @ 2015-07-02 21:29 UTC
To: David S. Miller; +Cc: Ajith Adapa, Jay Vosburgh, vfalico, gospo, netdev

Please back port the following to the LTS trees v3.2, v3.4, v3.10,
v3.12, v3.14, and v3.18

commit 63b46242f707849a1df10b70e026281bfa40e849
Author: Wilson Kok <wkok@cumulusnetworks.com>
Date:   Mon Jan 26 01:16:59 2015 -0500

    bonding: fix incorrect lacp mux state when agg not active

Sending this to verify the above patch is being backported to the trees
listed. Have verified this patch does not exist in them currently. The
patch fixes LACP mux machine state changes which if incorrect can cause
a partner switch to send duplicate frames to a Linux host. This patch
has been verified to fix a problem seen on a v3.10 system, and Cumulus
is shipping this patch on kernel versions as old as v3.2.

Thank you,
-Jon