Netdev List
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: 孙守鑫 <sunshouxin@chinatelecom.cn>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net,
	davem@davemloft.net, kuba@kernel.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	huyd12@chinatelecom.cn
Subject: Re: [PATCH V2] net: bonding: Add support for IPV6 ns/na
Date: Mon, 20 Dec 2021 07:05:10 -0800	[thread overview]
Message-ID: <5db962ec-4953-c25a-5f9e-0f18003e37b3@gmail.com> (raw)
In-Reply-To: <eb4c1dfe-d7c6-3011-e550-7498ccd023d7@chinatelecom.cn>


On 12/19/21 5:57 PM, 孙守鑫 wrote:
>
> 在 2021/12/14 16:05, Eric Dumazet 写道:
>>
>> On 12/10/21 5:08 AM, Sun Shouxin wrote:
>>> Since ipv6 neighbor solicitation and advertisement messages
>>> isn't handled gracefully in bonding6 driver, we can see packet
>>> drop due to inconsistency bewteen mac address in the option
>>> message and source MAC .
>>>
>>> Another examples is ipv6 neighbor solicitation and advertisement
>>> messages from VM via tap attached to host brighe, the src mac
>>> mighe be changed through balance-alb mode, but it is not synced
>>> with Link-layer address in the option message.
>>>
>>> The patch implements bond6's tx handle for ipv6 neighbor
>>> solicitation and advertisement messages.
>>>
>>>             Border-Leaf
>>>             /        \
>>>                /          \
>>>             Tunnel1    Tunnel2
>>>              /              \
>>>                 /                \
>>>           Leaf-1--Tunnel3--Leaf-2
>>>             \                /
>>>              \              /
>>>               \            /
>>>                \          /
>>>                NIC1    NIC2
>>>             \      /
>>>             server
>>>
>>> We can see in our lab the Border-Leaf receives occasionally
>>> a NA packet which is assigned to NIC1 mac in ND/NS option
>>> message, but actaully send out via NIC2 mac due to tx-alb,
>>> as a result, it will cause inconsistency between MAC table
>>> and ND Table in Border-Leaf, i.e, NIC1 = Tunnel2 in ND table
>>> and  NIC1 = Tunnel1 in mac table.
>>>
>>> And then, Border-Leaf starts to forward packet destinated
>>> to the Server, it will only check the ND table entry in some
>>> switch to encapsulate the destination MAC of the message as
>>> NIC1 MAC, and then send it out from Tunnel2 by ND table.
>>> Then, Leaf-2 receives the packet, it notices the destination
>>> MAC of message is NIC1 MAC and should forword it to Tunne1
>>> by Tunnel3.
>>>
>>> However, this traffic forward will be failure due to split
>>> horizon of VxLAN tunnels.
>>>
>>> Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
>>> Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
>>> ---
>>>   drivers/net/bonding/bond_alb.c | 131 
>>> +++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 131 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/bond_alb.c 
>>> b/drivers/net/bonding/bond_alb.c
>>> index 533e476..afa386b 100644
>>> --- a/drivers/net/bonding/bond_alb.c
>>> +++ b/drivers/net/bonding/bond_alb.c
>>> @@ -22,6 +22,7 @@
>>>   #include <asm/byteorder.h>
>>>   #include <net/bonding.h>
>>>   #include <net/bond_alb.h>
>>> +#include <net/ndisc.h>
>>>     static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
>>>       0x33, 0x33, 0x00, 0x00, 0x00, 0x01
>>> @@ -1269,6 +1270,119 @@ static int alb_set_mac_address(struct 
>>> bonding *bond, void *addr)
>>>       return res;
>>>   }
>>>   +/*determine if the packet is NA or NS*/
>>> +static bool alb_determine_nd(struct icmp6hdr *hdr)
>>> +{
>>> +    if (hdr->icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT ||
>>> +        hdr->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) {
>>> +        return true;
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +static void alb_change_nd_option(struct sk_buff *skb, void *data)
>>> +{
>>> +    struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb);
>>> +    struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)msg->opt;
>>> +    struct net_device *dev = skb->dev;
>>> +    struct icmp6hdr *icmp6h = icmp6_hdr(skb);
>>> +    struct ipv6hdr *ip6hdr = ipv6_hdr(skb);
>>> +    u8 *lladdr = NULL;
>>> +    u32 ndoptlen = skb_tail_pointer(skb) - 
>>> (skb_transport_header(skb) +
>>> +                offsetof(struct nd_msg, opt));
>>> +
>>> +    while (ndoptlen) {
>>> +        int l;
>>> +
>>> +        switch (nd_opt->nd_opt_type) {
>>> +        case ND_OPT_SOURCE_LL_ADDR:
>>> +        case ND_OPT_TARGET_LL_ADDR:
>>> +        lladdr = ndisc_opt_addr_data(nd_opt, dev);
>>> +        break;
>>> +
>>> +        default:
>>> +        lladdr = NULL;
>>> +        break;
>>> +        }
>>> +
>>> +        l = nd_opt->nd_opt_len << 3;
>>> +
>>> +        if (ndoptlen < l || l == 0)
>>> +            return;
>>> +
>>> +        if (lladdr) {
>>> +            memcpy(lladdr, data, dev->addr_len);
>>
>> I am not sure it is allowed to change skb content without
>>
>> making sure skb ->head is private.
>>
>> (Think of tcpdump -i slaveX : we want to see the packet content 
>> before your change)
>>
>> I would think skb_cow_head() or something similar is needed.
>>
>> This is tricky of course, since all cached pointers (icmp6h, ip6hdr, 
>> msg, nd_opt)
>>
>> would need to be fetched again, since skb->head/data might be changed
>>
>> by skb_cow_head().
> The tcpdump should show the last packet which sent off from NIC in the 
> end.
> could you light me up specific conditions?


I think I have been clear.

You can not modify skb->head unless it is allowed to.

tcpdump on the slave must show the exact packet being received,

before your modifications in bonding driver.



>>
>>
>>
>>
>>
>>> +            icmp6h->icmp6_cksum = 0;
>>> +
>>> +            icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
>>> +                                  &ip6hdr->daddr,
>>> +                        ntohs(ip6hdr->payload_len),
>>> +                        IPPROTO_ICMPV6,
>>> +                        csum_partial(icmp6h,
>>> + ntohs(ip6hdr->payload_len), 0));
>>> +        }
>>> +        ndoptlen -= l;
>>> +        nd_opt = ((void *)nd_opt) + l;
>>> +    }
>>> +}
>>> +
>>> +static u8 *alb_get_lladdr(struct sk_buff *skb)
>>> +{
>>> +    struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb);
>>> +    struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)msg->opt;
>>> +    struct net_device *dev = skb->dev;
>>> +    u8 *lladdr = NULL;
>>> +    u32 ndoptlen = skb_tail_pointer(skb) - 
>>> (skb_transport_header(skb) +
>>> +                offsetof(struct nd_msg, opt));
>>> +
>>> +    while (ndoptlen) {
>>> +        int l;
>>> +
>>> +        switch (nd_opt->nd_opt_type) {
>>> +        case ND_OPT_SOURCE_LL_ADDR:
>>> +        case ND_OPT_TARGET_LL_ADDR:
>>> +            lladdr = ndisc_opt_addr_data(nd_opt, dev);
>>> +            break;
>>> +
>>> +        default:
>>> +            break;
>>> +        }
>>> +
>>> +        l = nd_opt->nd_opt_len << 3;
>>> +
>>> +        if (ndoptlen < l || l == 0)
>>> +            return lladdr;
>>
>>                          return NULL ?
>>
>>                     (or risk out-of-bound access ?)
> Thanks your comment, I'll adjust it and send out V4 soon.
>>
>>> +
>>> +        if (lladdr)
>>> +            return lladdr;
>>> +
>>> +        ndoptlen -= l;
>>> +        nd_opt = ((void *)nd_opt) + l;
>>> +    }
>>> +
>>> +    return lladdr;
>>> +}
>>> +
>>> +static void alb_set_nd_option(struct sk_buff *skb, struct bonding 
>>> *bond,
>>> +                  struct slave *tx_slave)
>>> +{
>>> +    struct ipv6hdr *ip6hdr;
>>> +    struct icmp6hdr *hdr = NULL;
>>> +
>>> +    if (skb->protocol == htons(ETH_P_IPV6)) {
>>> +        if (tx_slave && tx_slave !=
>>> +            rcu_access_pointer(bond->curr_active_slave)) {
>>> +            ip6hdr = ipv6_hdr(skb);
>>> +            if (ip6hdr->nexthdr == IPPROTO_ICMPV6) {
>>> +                hdr = icmp6_hdr(skb);
>>> +                if (alb_determine_nd(hdr))
>>> +                    alb_change_nd_option(skb, 
>>> tx_slave->dev->dev_addr);
>>> +            }
>>> +        }
>>> +    }
>>> +}
>>> +
>>>   /************************ exported alb functions 
>>> ************************/
>>>     int bond_alb_initialize(struct bonding *bond, int rlb_enabled)
>>> @@ -1415,6 +1529,7 @@ struct slave *bond_xmit_alb_slave_get(struct 
>>> bonding *bond,
>>>       }
>>>       case ETH_P_IPV6: {
>>>           const struct ipv6hdr *ip6hdr;
>>> +        struct icmp6hdr *hdr = NULL;
>>>             /* IPv6 doesn't really use broadcast mac address, but leave
>>>            * that here just in case.
>>> @@ -1446,6 +1561,21 @@ struct slave *bond_xmit_alb_slave_get(struct 
>>> bonding *bond,
>>>               break;
>>>           }
>>>   +        if (ip6hdr->nexthdr == IPPROTO_ICMPV6) {
>>> +            hdr = icmp6_hdr(skb);
>>> +            if (alb_determine_nd(hdr)) {
>>> +                u8 *lladdr = NULL;
>>> +
>>> +                lladdr = alb_get_lladdr(skb);
>>> +                if (lladdr) {
>>> +                    if (!bond_slave_has_mac_rx(bond, lladdr)) {
>>> +                        do_tx_balance = false;
>>> +                        break;
>>> +                    }
>>> +                }
>>> +            }
>>> +        }
>>> +
>>>           hash_start = (char *)&ip6hdr->daddr;
>>>           hash_size = sizeof(ip6hdr->daddr);
>>>           break;
>>> @@ -1489,6 +1619,7 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, 
>>> struct net_device *bond_dev)
>>>       struct slave *tx_slave = NULL;
>>>         tx_slave = bond_xmit_alb_slave_get(bond, skb);
>>> +    alb_set_nd_option(skb, bond, tx_slave);
>>>       return bond_do_alb_xmit(skb, bond, tx_slave);
>>>   }

      reply	other threads:[~2021-12-20 15:25 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10 13:08 [PATCH V2] net: bonding: Add support for IPV6 ns/na Sun Shouxin
2021-12-10 18:03 ` kernel test robot
2021-12-10 18:13 ` kernel test robot
2021-12-14  1:02 ` 回复: " huyd12
2021-12-14  1:06   ` Jakub Kicinski
2021-12-14  8:05 ` Eric Dumazet
2021-12-20  1:57   ` 孙守鑫
2021-12-20 15:05     ` Eric Dumazet [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5db962ec-4953-c25a-5f9e-0f18003e37b3@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=huyd12@chinatelecom.cn \
    --cc=j.vosburgh@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sunshouxin@chinatelecom.cn \
    --cc=vfalico@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox