Netdev List
* Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev
From: Zhu, Lingshan @ 2022-08-16  9:08 UTC
  To: Si-Wei Liu, jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar
In-Reply-To: <9b6292f3-9bd5-ecd8-5e42-cd5d12f036e7@oracle.com>



On 8/16/2022 3:58 PM, Si-Wei Liu wrote:
>
>
> On 8/15/2022 6:58 PM, Zhu, Lingshan wrote:
>>
>>
>> On 8/16/2022 7:32 AM, Si-Wei Liu wrote:
>>>
>>>
>>> On 8/15/2022 2:26 AM, Zhu Lingshan wrote:
>>>> Some fields of virtio-net device config space are
>>>> conditional on the feature bits, the spec says:
>>>>
>>>> "The mac address field always exists
>>>> (though is only valid if VIRTIO_NET_F_MAC is set)"
>>>>
>>>> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ
>>>> or VIRTIO_NET_F_RSS is set"
>>>>
>>>> "mtu only exists if VIRTIO_NET_F_MTU is set"
>>>>
>>>> so we should read MTU, MAC and MQ in the device config
>>>> space only when these feature bits are offered.
>>>>
>>>> For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are
>>>> not set, the virtio device should have
>>>> one queue pair as default value, so when userspace queries the
>>>> number of queue pairs,
>>>> it should return mq=1 rather than zero.
>>>>
>>>> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read
>>>> MTU from the device config space.
>>>> RFC894 <A Standard for the Transmission of IP Datagrams over 
>>>> Ethernet Networks>
>>>> says:"The minimum length of the data field of a packet sent over an
>>>> Ethernet is 1500 octets, thus the maximum length of an IP datagram
>>>> sent over an Ethernet is 1500 octets.  Implementations are encouraged
>>>> to support full-length packets"
>>> Note there's a typo in the above: "The *maximum* length of the data 
>>> field of a packet sent over an Ethernet is 1500 octets ..." and the 
>>> RFC was written in 1984.
>> the spec RFC894 says it is 1500, see
>> https://www.rfc-editor.org/rfc/rfc894.txt
>>>
>>> Apparently that is no longer true with the introduction of Jumbo 
>>> size frame later in the 2000s. I'm not sure what is the point of 
>>> mention this ancient RFC. It doesn't say default MTU of any Ethernet 
>>> NIC/switch should be 1500 in either  case.
>> This could be a larger number for sure, we are trying to find out the 
>> min value for Ethernet here, to support 1500 octets, MTU should be 
>> 1500 at least, so I assume 1500 should be the default value for MTU
>>>
>>>>
>>>> virtio spec says:"The virtio network device is a virtual ethernet 
>>>> card",
>>> Right,
>>>> so the default MTU value should be 1500 for virtio-net.
>>> ... but it doesn't say the default is 1500. At least, not in 
>>> explicit way. Why it can't be 1492 or even lower? In practice, if 
>>> the network backend has a MTU higher than 1500, there's nothing 
>>> wrong for guest to configure default MTU more than 1500.
>> same as above
>>>
>>>>
>>>> For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set,
>>>> the configuration space mac entry indicates the “physical” address
>>>> of the network card, otherwise the driver would typically
>>>> generate a random local MAC address." So there is no
>>>> default MAC address if VIRTIO_NET_F_MAC not set.
>>>>
>>>> This commit introduces functions vdpa_dev_net_mtu_config_fill()
>>>> and vdpa_dev_net_mac_config_fill() to fill MTU and MAC.
>>>> It also fixes vdpa_dev_net_mq_config_fill() to report correct
>>>> MQ when _F_MQ is not present.
>>>>
>>>> These functions should check device features rather than driver
>>>> features, and struct vdpa_device is not needed as a parameter.
>>>>
>>>> The test & userspace tool output:
>>>>
>>>> Feature bits VIRTIO_NET_F_MTU, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ
>>>> and VIRTIO_NET_F_MAC can be masked out by hardcoding.
>>>>
>>>> However, it is challenging to "disable" the related fields
>>>> in the HW device config space, so let's just assume the values
>>>> are meaningless if the feature bits are not set.
>>>>
>>>> Before this change, when feature bits for RSS, MQ, MTU and MAC
>>>> are not set, iproute2 output:
>>>> $vdpa vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false mtu 
>>>> 1500
>>>>    negotiated_features
>>>>
>>>> without this commit, function vdpa_dev_net_config_fill()
>>>> reads all config space fields unconditionally, so let's
>>>> assume the MAC and MTU are meaningless, and it checks
>>>> MQ with driver_features, so we don't see max_vq_pairs.
>>>>
>>>> After applying this commit, when feature bits for
>>>> MQ, RSS, MAC and MTU are not set, iproute2 output:
>>>> $vdpa dev config show vdpa0
>>>> vdpa0: link up link_announce false max_vq_pairs 1 mtu 1500
>>>>    negotiated_features
>>>>
>>>> As explained above:
>>>> There is no MAC here, because VIRTIO_NET_F_MAC is not set,
>>>> and there is no default value for MAC. It shows
>>>> max_vq_pairs = 1 because even without the MQ feature,
>>>> a functional virtio-net must have one queue pair.
>>>> mtu = 1500 is the default value, as Ethernet
>>>> requires.
>>>>
>>>> This commit also adds supplementary comments for
>>>> __virtio16_to_cpu(true, xxx) operations in
>>>> vdpa_dev_net_config_fill() and vdpa_fill_stats_rec()
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> ---
>>>>   drivers/vdpa/vdpa.c | 60 
>>>> +++++++++++++++++++++++++++++++++++----------
>>>>   1 file changed, 47 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>> index efb55a06e961..a74660b98979 100644
>>>> --- a/drivers/vdpa/vdpa.c
>>>> +++ b/drivers/vdpa/vdpa.c
>>>> @@ -801,19 +801,44 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct 
>>>> sk_buff *msg, struct netlink_callba
>>>>       return msg->len;
>>>>   }
>>>>   -static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
>>>> -                       struct sk_buff *msg, u64 features,
>>>> +static int vdpa_dev_net_mq_config_fill(struct sk_buff *msg, u64 
>>>> features,
>>>>                          const struct virtio_net_config *config)
>>>>   {
>>>>       u16 val_u16;
>>>>   -    if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
>>>> -        return 0;
>>>> +    if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0 &&
>>>> +        (features & BIT_ULL(VIRTIO_NET_F_RSS)) == 0)
>>>> +        val_u16 = 1;
>>>> +    else
>>>> +        val_u16 = __virtio16_to_cpu(true, 
>>>> config->max_virtqueue_pairs);
>>>>   -    val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
>>>>       return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, val_u16);
>>>>   }
>>>>   +static int vdpa_dev_net_mtu_config_fill(struct sk_buff *msg, u64 
>>>> features,
>>>> +                    const struct virtio_net_config *config)
>>>> +{
>>>> +    u16 val_u16;
>>>> +
>>>> +    if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
>>>> +        val_u16 = 1500;
>>> As said, there's no virtio spec defined value for MTU. Please leave 
>>> this field out if feature VIRTIO_NET_F_MTU is not negotiated.
>> same as above
>>>> +    else
>>>> +        val_u16 = __virtio16_to_cpu(true, config->mtu);
>>>> +
>>>> +    return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16);
>>>> +}
>>>> +
>>>> +static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64 
>>>> features,
>>>> +                    const struct virtio_net_config *config)
>>>> +{
>>>> +    if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) == 0)
>>>> +        return 0;
>>>> +    else
>>>> +        return  nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>> +                sizeof(config->mac), config->mac);
>>>> +}
>>>> +
>>>> +
>>>>   static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, 
>>>> struct sk_buff *msg)
>>>>   {
>>>>       struct virtio_net_config config = {};
>>>> @@ -822,18 +847,16 @@ static int vdpa_dev_net_config_fill(struct 
>>>> vdpa_device *vdev, struct sk_buff *ms
>>>>         vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>>>   -    if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, 
>>>> sizeof(config.mac),
>>>> -            config.mac))
>>>> -        return -EMSGSIZE;
>>>> +    /*
>>>> +     * Assume little endian for now, userspace can tweak this for
>>>> +     * legacy guest support.
>>> You can leave it as a TODO for kernel (vdpa core limitation), but 
>>> AFAIK there's nothing userspace needs to do to infer the endianness. 
>>> IMHO it's the kernel's job to provide an abstraction rather than 
>>> rely on userspace guessing it.
>> we have discussed it in another thread, and this comment is suggested 
>> by MST.
> Can you provide the context or link? It shouldn't work like this, 
> otherwise it is breaking uABI. E.g. how will a legacy/BE supporting 
> kernel/device be backward compatible with older vdpa tool (which has 
> knowledge of this endianness implication/assumption from day one)?
https://www.spinics.net/lists/netdev/msg837114.html

The challenge is that the status field is virtio16, not le16, so 
le16_to_cpu(xxx) is wrong anyway. However, we cannot tell whether it is 
a LE or BE device from struct vdpa_device, so for most cases, we assume 
it is LE, and leave this comment.
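
To make the virtio16 versus le16 distinction concrete, here is a minimal
userspace sketch. The helper names (load_le16, load_be16,
sketch_virtio16_to_cpu) are hypothetical; they only model the idea behind
the kernel's __virtio16_to_cpu(little_endian, val), where the caller has to
state the byte order because a modern device is little endian while a
legacy device uses the guest's native endianness. This is an illustration
under those assumptions, not the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: decode a 16-bit field from its raw bytes. */
uint16_t load_le16(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | (raw[1] << 8));
}

uint16_t load_be16(const uint8_t raw[2])
{
	return (uint16_t)((raw[0] << 8) | raw[1]);
}

/*
 * Sketch of the virtio16 idea: the byte order is an input, not a given.
 * Passing little_endian = true mirrors the "assume LE" choice discussed
 * in this thread.
 */
uint16_t sketch_virtio16_to_cpu(bool little_endian, const uint8_t raw[2])
{
	return little_endian ? load_le16(raw) : load_be16(raw);
}
```

With the raw bytes dc 05, the little-endian reading gives 1500 while the
big-endian reading gives 0xdc05, so a wrong guess produces a different
value without any error.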

Thanks
>
> -Siwei
>
>>>
>>>> +     */
>>>> +    val_u16 = __virtio16_to_cpu(true, config.status);
>>>>         val_u16 = __virtio16_to_cpu(true, config.status);
>>>>       if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
>>>>           return -EMSGSIZE;
>>>>   -    val_u16 = __virtio16_to_cpu(true, config.mtu);
>>>> -    if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
>>>> -        return -EMSGSIZE;
>>>> -
>>>>       features_driver = vdev->config->get_driver_features(vdev);
>>>>       if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, 
>>>> features_driver,
>>>>                     VDPA_ATTR_PAD))
>>>> @@ -846,7 +869,13 @@ static int vdpa_dev_net_config_fill(struct 
>>>> vdpa_device *vdev, struct sk_buff *ms
>>>>                     VDPA_ATTR_PAD))
>>>>           return -EMSGSIZE;
>>>>   -    return vdpa_dev_net_mq_config_fill(vdev, msg, 
>>>> features_driver, &config);
>>>> +    if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
>>>> +        return -EMSGSIZE;
>>>> +
>>>> +    if (vdpa_dev_net_mtu_config_fill(msg, features_device, &config))
>>>> +        return -EMSGSIZE;
>>>> +
>>>> +    return vdpa_dev_net_mq_config_fill(msg, features_device, 
>>>> &config);
>>>>   }
>>>>     static int
>>>> @@ -914,6 +943,11 @@ static int vdpa_fill_stats_rec(struct 
>>>> vdpa_device *vdev, struct sk_buff *msg,
>>>>       }
>>>>       vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>>>   +    /*
>>>> +     * Assume little endian for now, userspace can tweak this for
>>>> +     * legacy guest support.
>>>> +     */
>>>> +
>>> Ditto.
>> same as above
>>
>> Thanks
>>>
>>> Thanks,
>>> -Siwei
>>>>       max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs);
>>>>       if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp))
>>>>           return -EMSGSIZE;
>>>
>>
>
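
The conditional-read rule argued for in the quoted patch description can be
sketched outside the kernel as follows. This is a simplified model, not the
code in drivers/vdpa/vdpa.c: struct net_cfg_view, report_net_cfg() and the
SKETCH_ macros are hypothetical names, the bit positions (MTU 3, MAC 5,
MQ 22, RSS 60) are the ones the virtio-net spec assigns as far as I know,
and the MTU fallback of 1500 is the value proposed in the patch, which the
review in this thread disputes.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BIT(n)  (1ULL << (n))
#define SKETCH_F_MTU   3
#define SKETCH_F_MAC   5
#define SKETCH_F_MQ    22
#define SKETCH_F_RSS   60

/* What would be reported to userspace in this model. */
struct net_cfg_view {
	bool     has_mac;       /* MAC is reported only if _F_MAC is offered */
	uint16_t max_vq_pairs;
	uint16_t mtu;
};

struct net_cfg_view report_net_cfg(uint64_t device_features,
				   uint16_t cfg_max_vq_pairs,
				   uint16_t cfg_mtu)
{
	struct net_cfg_view v;

	v.has_mac = (device_features & SKETCH_BIT(SKETCH_F_MAC)) != 0;

	/* Without MQ and RSS the field does not exist: one queue pair. */
	if (device_features &
	    (SKETCH_BIT(SKETCH_F_MQ) | SKETCH_BIT(SKETCH_F_RSS)))
		v.max_vq_pairs = cfg_max_vq_pairs;
	else
		v.max_vq_pairs = 1;

	/* Without _F_MTU the field does not exist; 1500 is the disputed default. */
	if (device_features & SKETCH_BIT(SKETCH_F_MTU))
		v.mtu = cfg_mtu;
	else
		v.mtu = 1500;

	return v;
}
```

Calling report_net_cfg(0, 8, 9000) ignores both config values and yields no
MAC, one queue pair and MTU 1500, which matches the "after" tool output
quoted in the patch description.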



* Re: [PATCH V5 4/6] vDPA: !FEATURES_OK should not block querying device config space
From: Zhu, Lingshan @ 2022-08-16  8:46 UTC
  To: Michael S. Tsirkin
  Cc: Si-Wei Liu, virtualization, netdev, kvm, parav, xieyongji,
	gautam.dawar, jasowang
In-Reply-To: <20220816044007-mutt-send-email-mst@kernel.org>



On 8/16/2022 4:41 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 16, 2022 at 04:29:04PM +0800, Zhu, Lingshan wrote:
>>
>> On 8/16/2022 3:41 PM, Si-Wei Liu wrote:
>>
>>      Hi Michael,
>>
>>      I just noticed this patch got pulled to linux-next prematurely without
>>      getting consensus on code review, am not sure why. Hope it was just an
>>      oversight.
>>
>>      Unfortunately this introduced functionality regression to at least two
>>      cases so far as I see:
>>
>>      1. (bogus) VDPA_ATTR_DEV_NEGOTIATED_FEATURES are inadvertently exposed and
>>      displayed in "vdpa dev config show" before feature negotiation is done.
>>      Noted the corresponding features name shown in vdpa tool is called
>>      "negotiated_features" rather than "driver_features". I see in no way the
>>      intended change of the patch should break this user level expectation
>>      regardless of any spec requirement. Do you agree on this point?
>>
>> I will post a patch for iproute2, doing:
>> 1) if iproute2 does not get driver_features from the kernel, then don't show
>> negotiated features in the command output
>> 2) process and decode the device features.
>>
>>
>>      2. There was also another implicit assumption that is broken by this patch.
>>      There could be a vdpa tool query of config via vdpa_dev_net_config_fill()->
>>      vdpa_get_config_unlocked() that races with the first vdpa_set_features()
>>      call from VMM e.g. QEMU. Since the S_FEATURES_OK blocking condition is
>>      removed, if the vdpa tool query occurs earlier than the first
>>      set_driver_features() call from VMM, the following code will treat the
>>      guest as legacy and then trigger an erroneous vdpa_set_features_unlocked
>>      (... , 0) call to the vdpa driver:
>>
>>       374         /*
>>       375          * Config accesses aren't supposed to trigger before features
>>      are set.
>>       376          * If it does happen we assume a legacy guest.
>>       377          */
>>       378         if (!vdev->features_valid)
>>       379                 vdpa_set_features_unlocked(vdev, 0);
>>       380         ops->get_config(vdev, offset, buf, len);
>>
>>      Depending on vendor driver's implementation, L380 may either return invalid
>>      config data (or invalid endianness if on BE) or only config fields that are
>>      valid in legacy layout. What's more severe is that, vdpa tool query in
>>      theory shouldn't affect feature negotiation at all by making confusing
>>      calls to the device, but now it is possible with the patch. Fixing this
>>      would require more delicate work on the other paths involving the cf_lock
>>      reader/write semaphore.
>>
>>      Not sure what you plan to do next, post the fixes for both issues and get
>>      the community review? Or simply revert the patch in question? Let us know.
>>
>> The spec says:
>> The device MUST allow reading of any device-specific configuration field before
>> FEATURES_OK is set by
>> the driver. This includes fields which are conditional on feature bits, as long
>> as those feature bits are offered
>> by the device.
>>
>> so whether or not FEATURES_OK is set, reading the device config space
>> should not be blocked.
>> vdpa_get_config_unlocked() will read the features, I don't know why it has a
>> comment:
>>          /*
>>           * Config accesses aren't supposed to trigger before features are set.
>>           * If it does happen we assume a legacy guest.
>>           */
>>
>> This conflicts with the spec.
> Yea well. On the other hand the spec also calls for features to be
> used to detect legacy versus modern driver.
> This part of the spec needs work generally.
So from what I see, there is no race condition: if feature negotiation
is not done, just assume the driver features are all zero, then return
the device config space contents.
It can do this even without this comment.
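
For reference, the quoted snippet from vdpa_get_config_unlocked() can be
modeled in a few lines to show what a config read does when no features
have been set yet. This is a toy userspace model under assumptions:
struct toy_vdev, toy_set_features() and toy_get_config() are hypothetical
names, and I am assuming that the real vdpa_set_features_unlocked() marks
the features as valid and forwards the value to the vendor driver.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the parts of struct vdpa_device used here. */
struct toy_vdev {
	bool     features_valid;
	uint64_t driver_features;
	unsigned set_features_calls;  /* counts calls reaching the "driver" */
	uint8_t  cfg[8];              /* fake device config space */
};

void toy_set_features(struct toy_vdev *vdev, uint64_t features)
{
	vdev->features_valid = true;
	vdev->driver_features = features;
	vdev->set_features_calls++;
}

/* Mirrors the quoted logic: no features yet means "assume legacy". */
void toy_get_config(struct toy_vdev *vdev, unsigned offset,
		    void *buf, unsigned len)
{
	if (!vdev->features_valid)
		toy_set_features(vdev, 0);
	memcpy(buf, vdev->cfg + offset, len);
}
```

In this model the first read before negotiation returns the config bytes,
but it also leaves features_valid set and records one set-features call
with value 0. Whether that matters on real hardware depends on the vendor
driver.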

Please correct me if I have misunderstood this.

Thanks
Zhu Lingshan
>
>
>> vdpa_get_config_unlocked() checks vdev->features_valid, if not valid, it will
>> set the driver_features to 0, I think this intends to prevent reading random
>> driver_features. This function does not hold any locks, and didn't change
>> anything.
>>
>> So what is the race?
>>
>> Thanks
>>
>>
>>
>>      Thanks,
>>      -Siwei
>>
>>
>>      On 8/12/2022 3:44 AM, Zhu Lingshan wrote:
>>
>>          Users may want to query the config space of a vDPA device,
>>          to choose an appropriate one for a certain guest. This means the
>>          users need to read the config space before FEATURES_OK, and
>>          the existence of config space contents does not depend on
>>          FEATURES_OK.
>>
>>          The spec says:
>>          The device MUST allow reading of any device-specific configuration
>>          field before FEATURES_OK is set by the driver. This includes
>>          fields which are conditional on feature bits, as long as those
>>          feature bits are offered by the device.
>>
>>          Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>          ---
>>            drivers/vdpa/vdpa.c | 8 --------
>>            1 file changed, 8 deletions(-)
>>
>>          diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>          index 6eb3d972d802..bf312d9c59ab 100644
>>          --- a/drivers/vdpa/vdpa.c
>>          +++ b/drivers/vdpa/vdpa.c
>>          @@ -855,17 +855,9 @@ vdpa_dev_config_fill(struct vdpa_device *vdev,
>>          struct sk_buff *msg, u32 portid,
>>            {
>>                u32 device_id;
>>                void *hdr;
>>          -    u8 status;
>>                int err;
>>                  down_read(&vdev->cf_lock);
>>          -    status = vdev->config->get_status(vdev);
>>          -    if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
>>          -        NL_SET_ERR_MSG_MOD(extack, "Features negotiation not
>>          completed");
>>          -        err = -EAGAIN;
>>          -        goto out;
>>          -    }
>>          -
>>                hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags,
>>                          VDPA_CMD_DEV_CONFIG_GET);
>>                if (!hdr) {
>>
>>
>>
>>



* [syzbot] upstream boot error: general protection fault in nl80211_put_iface_combinations
From: syzbot @ 2022-08-16  8:37 UTC
  To: davem, edumazet, johannes, kuba, linux-kernel, linux-wireless,
	netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    568035b01cfb Linux 6.0-rc1
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=145d8a47080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=126b81cc3ce4f07e
dashboard link: https://syzkaller.appspot.com/bug?extid=684d4ca200fda0b2141e
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+684d4ca200fda0b2141e@syzkaller.appspotmail.com

usbcore: registered new interface driver nfcmrvl
Loading iSCSI transport class v2.0-870.
scsi host0: Virtio SCSI HBA
st: Version 20160209, fixed bufsize 32768, s/g segs 256
Rounding down aligned max_sectors from 4294967295 to 4294967288
db_root: cannot open: /etc/target
slram: not enough parameters.
ftl_cs: FTL header not found.
wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
eql: Equalizer2002: Simon Janes (simon@ncm.com) and David S. Miller (davem@redhat.com)
MACsec IEEE 802.1AE
tun: Universal TUN/TAP device driver, 1.6
vcan: Virtual CAN interface driver
vxcan: Virtual CAN Tunnel driver
slcan: serial line CAN interface driver
CAN device driver interface
usbcore: registered new interface driver usb_8dev
usbcore: registered new interface driver ems_usb
usbcore: registered new interface driver gs_usb
usbcore: registered new interface driver kvaser_usb
usbcore: registered new interface driver mcba_usb
usbcore: registered new interface driver peak_usb
e100: Intel(R) PRO/100 Network Driver
e100: Copyright(c) 1999-2006 Intel Corporation
e1000: Intel(R) PRO/1000 Network Driver
e1000: Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
mkiss: AX.25 Multikiss, Hans Albas PE1AYX
AX.25: 6pack driver, Revision: 0.3.0
AX.25: bpqether driver version 004
PPP generic driver version 2.4.2
PPP BSD Compression module registered
PPP Deflate Compression module registered
PPP MPPE Compression module registered
NET: Registered PF_PPPOX protocol family
PPTP driver version 0.8.5
SLIP: version 0.8.4-NET3.019-NEWTTY (dynamic channels, max=256) (6 bit encapsulation enabled).
CSLIP: code copyright 1989 Regents of the University of California.
SLIP linefill/keepalive option.
hdlc: HDLC support module revision 1.22
LAPB Ethernet driver version 0.02
usbcore: registered new interface driver ath9k_htc
usbcore: registered new interface driver carl9170
usbcore: registered new interface driver ath6kl_usb
usbcore: registered new interface driver ar5523
usbcore: registered new interface driver ath10k_usb
usbcore: registered new interface driver rndis_wlan
mac80211_hwsim: initializing netlink
general protection fault, probably for non-canonical address 0xffff000000000000: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
RIP: 0010:nl80211_put_iface_combinations+0x19d/0x4b0 net/wireless/nl80211.c:1632
Code: 00 00 e8 a6 5b 2d fd 48 85 ed 0f 84 d4 00 00 00 e8 98 5b 2d fd 49 8b 06 ba 04 00 00 00 48 89 df 48 8d 4c 24 2c be 01 00 00 00 <42> 0f b7 04 28 89 44 24 2c e8 d5 81 3d fe 31 ff 41 89 c7 89 c6 e8
RSP: 0000:ffffc90000273a50 EFLAGS: 00010293
RAX: ffff000000000000 RBX: ffff888102235800 RCX: ffffc90000273a7c
RDX: 0000000000000004 RSI: 0000000000000001 RDI: ffff888102235800
RBP: ffff88810283494c R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000002f8b8 R12: 0000000000000001
R13: 0000000000000000 R14: ffff888106d14c88 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 0000000005a29000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 nl80211_send_wiphy+0x9b4/0x4170 net/wireless/nl80211.c:2648
 nl80211_notify_wiphy+0x8f/0x140 net/wireless/nl80211.c:17164
 wiphy_register+0x112f/0x1400 net/wireless/core.c:942
 ieee80211_register_hw+0x11c9/0x1590 net/mac80211/main.c:1379
 mac80211_hwsim_new_radio+0xc3f/0x1520 drivers/net/wireless/mac80211_hwsim.c:4129
 init_mac80211_hwsim+0x43d/0x5ae drivers/net/wireless/mac80211_hwsim.c:5379
 do_one_initcall+0x5e/0x2e0 init/main.c:1296
 do_initcall_level init/main.c:1369 [inline]
 do_initcalls init/main.c:1385 [inline]
 do_basic_setup init/main.c:1404 [inline]
 kernel_init_freeable+0x255/0x2cf init/main.c:1611
 kernel_init+0x1a/0x1c0 init/main.c:1500
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:nl80211_put_iface_combinations+0x19d/0x4b0 net/wireless/nl80211.c:1632
Code: 00 00 e8 a6 5b 2d fd 48 85 ed 0f 84 d4 00 00 00 e8 98 5b 2d fd 49 8b 06 ba 04 00 00 00 48 89 df 48 8d 4c 24 2c be 01 00 00 00 <42> 0f b7 04 28 89 44 24 2c e8 d5 81 3d fe 31 ff 41 89 c7 89 c6 e8
RSP: 0000:ffffc90000273a50 EFLAGS: 00010293
RAX: ffff000000000000 RBX: ffff888102235800 RCX: ffffc90000273a7c
RDX: 0000000000000004 RSI: 0000000000000001 RDI: ffff888102235800
RBP: ffff88810283494c R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000002f8b8 R12: 0000000000000001
R13: 0000000000000000 R14: ffff888106d14c88 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 0000000005a29000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
   0:	00 00                	add    %al,(%rax)
   2:	e8 a6 5b 2d fd       	callq  0xfd2d5bad
   7:	48 85 ed             	test   %rbp,%rbp
   a:	0f 84 d4 00 00 00    	je     0xe4
  10:	e8 98 5b 2d fd       	callq  0xfd2d5bad
  15:	49 8b 06             	mov    (%r14),%rax
  18:	ba 04 00 00 00       	mov    $0x4,%edx
  1d:	48 89 df             	mov    %rbx,%rdi
  20:	48 8d 4c 24 2c       	lea    0x2c(%rsp),%rcx
  25:	be 01 00 00 00       	mov    $0x1,%esi
* 2a:	42 0f b7 04 28       	movzwl (%rax,%r13,1),%eax <-- trapping instruction
  2f:	89 44 24 2c          	mov    %eax,0x2c(%rsp)
  33:	e8 d5 81 3d fe       	callq  0xfe3d820d
  38:	31 ff                	xor    %edi,%edi
  3a:	41 89 c7             	mov    %eax,%r15d
  3d:	89 c6                	mov    %eax,%esi
  3f:	e8                   	.byte 0xe8
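
For readers unfamiliar with the "non-canonical address" wording in the
report above, the check can be sketched as below. This assumes x86-64 with
48-bit virtual addresses (4-level paging), and is_canonical_48() is a
hypothetical helper name, not a kernel function. An address is canonical
when bits 63..47 are all equal.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63..47 of addr are all zeros or all ones. */
bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;   /* the 17 uppermost bits */

	return top == 0 || top == 0x1ffff;
}
```

RAX in the register dump holds 0xffff000000000000, which has bits 63..48
set but bit 47 clear, so it fails this check. The trapping instruction
dereferences (%rax,%r13,1) with R13 equal to zero, which is consistent with
the fault reported.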


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* [PATCH xfrm-next v2 2/6] xfrm: allow state full offload mode
From: Leon Romanovsky @ 2022-08-16  8:59 UTC
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S . Miller, Herbert Xu, netdev, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660639789.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Allow users to configure xfrm states with full offload mode.
Full mode must be requested for both policy and state, which
means we must not implement a fallback.

We explicitly return an error if the requested full mode can't
be configured.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../inline_crypto/ch_ipsec/chcr_ipsec.c       |  4 ++++
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    |  5 ++++
 drivers/net/ethernet/intel/ixgbevf/ipsec.c    |  5 ++++
 .../mellanox/mlx5/core/en_accel/ipsec.c       |  4 ++++
 drivers/net/netdevsim/ipsec.c                 |  5 ++++
 net/xfrm/xfrm_device.c                        | 24 +++++++++++++++----
 6 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
index 585590520076..ca21794281d6 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
@@ -283,6 +283,10 @@ static int ch_ipsec_xfrm_add_state(struct xfrm_state *x)
 		pr_debug("Cannot offload xfrm states with geniv other than seqiv\n");
 		return -EINVAL;
 	}
+	if (x->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		pr_debug("Unsupported xfrm offload\n");
+		return -EINVAL;
+	}
 
 	sa_entry = kzalloc(sizeof(*sa_entry), GFP_KERNEL);
 	if (!sa_entry) {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 774de63dd93a..53a969e34883 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -585,6 +585,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 		return -EINVAL;
 	}
 
+	if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_err(dev, "Unsupported ipsec offload type\n");
+		return -EINVAL;
+	}
+
 	if (xs->xso.dir == XFRM_DEV_OFFLOAD_IN) {
 		struct rx_sa rsa;
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 9984ebc62d78..c1cf540d162a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -280,6 +280,11 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs)
 		return -EINVAL;
 	}
 
+	if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_err(dev, "Unsupported ipsec offload type\n");
+		return -EINVAL;
+	}
+
 	if (xs->xso.dir == XFRM_DEV_OFFLOAD_IN) {
 		struct rx_sa rsa;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 2a8fd7020622..c182b640b80d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -256,6 +256,10 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
 		netdev_info(netdev, "Cannot offload xfrm states with geniv other than seqiv\n");
 		return -EINVAL;
 	}
+	if (x->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_info(netdev, "Unsupported xfrm offload type\n");
+		return -EINVAL;
+	}
 	return 0;
 }
 
diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c
index 386336a38f34..b93baf5c8bee 100644
--- a/drivers/net/netdevsim/ipsec.c
+++ b/drivers/net/netdevsim/ipsec.c
@@ -149,6 +149,11 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs)
 		return -EINVAL;
 	}
 
+	if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_err(dev, "Unsupported ipsec offload type\n");
+		return -EINVAL;
+	}
+
 	/* find the first unused index */
 	ret = nsim_ipsec_find_empty_idx(ipsec);
 	if (ret < 0) {
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 6d1124eb1ec8..5b04e5cdca64 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -215,6 +215,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	struct xfrm_dev_offload *xso = &x->xso;
 	xfrm_address_t *saddr;
 	xfrm_address_t *daddr;
+	bool is_full_offload;
 
 	if (!x->type_offload)
 		return -EINVAL;
@@ -223,9 +224,11 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	if (x->encap || x->tfcpad)
 		return -EINVAL;
 
-	if (xuo->flags & ~(XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND))
+	if (xuo->flags &
+	    ~(XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND | XFRM_OFFLOAD_FULL))
 		return -EINVAL;
 
+	is_full_offload = xuo->flags & XFRM_OFFLOAD_FULL;
 	dev = dev_get_by_index(net, xuo->ifindex);
 	if (!dev) {
 		if (!(xuo->flags & XFRM_OFFLOAD_INBOUND)) {
@@ -240,7 +243,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 					x->props.family,
 					xfrm_smark_get(0, x));
 		if (IS_ERR(dst))
-			return 0;
+			return (is_full_offload) ? -EINVAL : 0;
 
 		dev = dst->dev;
 
@@ -251,7 +254,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	if (!dev->xfrmdev_ops || !dev->xfrmdev_ops->xdo_dev_state_add) {
 		xso->dev = NULL;
 		dev_put(dev);
-		return 0;
+		return (is_full_offload) ? -EINVAL : 0;
 	}
 
 	if (x->props.flags & XFRM_STATE_ESN &&
@@ -270,7 +273,10 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	else
 		xso->dir = XFRM_DEV_OFFLOAD_OUT;
 
-	xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
+	if (is_full_offload)
+		xso->type = XFRM_DEV_OFFLOAD_FULL;
+	else
+		xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
 
 	err = dev->xfrmdev_ops->xdo_dev_state_add(x);
 	if (err) {
@@ -280,7 +286,15 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 		netdev_put(dev, &xso->dev_tracker);
 		xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 
-		if (err != -EOPNOTSUPP)
+		/* User explicitly requested full offload mode and configured
+		 * policy in addition to the XFRM state. So be civil to users,
+		 * and return an error instead of taking fallback path.
+		 *
+		 * This WARN_ON() can be seen as a documentation for driver
+		 * authors to do not return -EOPNOTSUPP in full offload mode.
+		 */
+		WARN_ON(err == -EOPNOTSUPP && is_full_offload);
+		if (err != -EOPNOTSUPP || is_full_offload)
 			return err;
 	}
 
-- 
2.37.2
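
The error-handling rule this patch adds at the end of xfrm_dev_state_add()
can be summarized with a small standalone sketch. The function name
sketch_state_add_result() is hypothetical, and the sketch only models the
return-value decision after the driver callback has run, not the earlier
checks. It assumes the function otherwise continues to its normal
"return 0" path: in crypto mode -EOPNOTSUPP still falls back to software,
while in full offload mode every driver error is returned to the user.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * driver_err:      value returned by the driver's state-add callback
 * is_full_offload: true if the user requested full offload mode
 */
int sketch_state_add_result(int driver_err, bool is_full_offload)
{
	if (driver_err == 0)
		return 0;

	/* Full offload never falls back; other errors are always fatal. */
	if (driver_err != -EOPNOTSUPP || is_full_offload)
		return driver_err;

	return 0;	/* crypto mode: fall back to the software path */
}
```

So sketch_state_add_result(-EOPNOTSUPP, false) returns 0, while the same
error with is_full_offload set is passed back unchanged.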



* [PATCH] MAINTAINERS: Update email of Neil Armstrong
From: Neil Armstrong @ 2022-08-16  9:56 UTC
  To: linux-kernel, devicetree, linux-arm-kernel, linux-amlogic,
	dri-devel, linux-i2c, linux-media, netdev, linux-phy,
	linux-crypto, linux-serial, linux-spi, linux-usb, linux-watchdog
  Cc: Neil Armstrong

From: Neil Armstrong <neil.armstrong@linaro.org>

My professional e-mail will change and the BayLibre one will
bounce after mid-September 2022.

This updates the MAINTAINERS file, the YAML bindings and adds an
entry in the .mailmap file.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 .mailmap                                      |  1 +
 .../amlogic/amlogic,meson-gx-ao-secure.yaml   |  2 +-
 .../display/amlogic,meson-dw-hdmi.yaml        |  2 +-
 .../bindings/display/amlogic,meson-vpu.yaml   |  2 +-
 .../display/bridge/analogix,anx7814.yaml      |  2 +-
 .../bindings/display/bridge/ite,it66121.yaml  |  2 +-
 .../display/panel/sgd,gktw70sdae4se.yaml      |  2 +-
 .../bindings/i2c/amlogic,meson6-i2c.yaml      |  2 +-
 .../mailbox/amlogic,meson-gxbb-mhu.yaml       |  2 +-
 .../bindings/media/amlogic,axg-ge2d.yaml      |  2 +-
 .../bindings/media/amlogic,gx-vdec.yaml       |  2 +-
 .../media/amlogic,meson-gx-ao-cec.yaml        |  2 +-
 .../devicetree/bindings/mfd/khadas,mcu.yaml   |  2 +-
 .../bindings/net/amlogic,meson-dwmac.yaml     |  2 +-
 .../bindings/phy/amlogic,axg-mipi-dphy.yaml   |  2 +-
 .../phy/amlogic,meson-g12a-usb2-phy.yaml      |  2 +-
 .../phy/amlogic,meson-g12a-usb3-pcie-phy.yaml |  2 +-
 .../bindings/power/amlogic,meson-ee-pwrc.yaml |  2 +-
 .../bindings/reset/amlogic,meson-reset.yaml   |  2 +-
 .../bindings/rng/amlogic,meson-rng.yaml       |  2 +-
 .../bindings/serial/amlogic,meson-uart.yaml   |  2 +-
 .../bindings/soc/amlogic/amlogic,canvas.yaml  |  2 +-
 .../bindings/spi/amlogic,meson-gx-spicc.yaml  |  2 +-
 .../bindings/spi/amlogic,meson6-spifc.yaml    |  2 +-
 .../usb/amlogic,meson-g12a-usb-ctrl.yaml      |  2 +-
 .../watchdog/amlogic,meson-gxbb-wdt.yaml      |  2 +-
 MAINTAINERS                                   | 20 +++++++++----------
 27 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/.mailmap b/.mailmap
index 2ed1cf869175..04fb67be9b0b 100644
--- a/.mailmap
+++ b/.mailmap
@@ -303,6 +303,7 @@ Morten Welinder <welinder@troll.com>
 Mythri P K <mythripk@ti.com>
 Nadia Yvette Chambers <nyc@holomorphy.com> William Lee Irwin III <wli@holomorphy.com>
 Nathan Chancellor <nathan@kernel.org> <natechancellor@gmail.com>
+Neil Armstrong <neil.armstrong@linaro.org> <narmstrong@baylibre.com>
 Nguyen Anh Quynh <aquynh@gmail.com>
 Nicholas Piggin <npiggin@gmail.com> <npiggen@suse.de>
 Nicholas Piggin <npiggin@gmail.com> <npiggin@kernel.dk>
diff --git a/Documentation/devicetree/bindings/arm/amlogic/amlogic,meson-gx-ao-secure.yaml b/Documentation/devicetree/bindings/arm/amlogic/amlogic,meson-gx-ao-secure.yaml
index 6cc74523ebfd..1748f1605cc7 100644
--- a/Documentation/devicetree/bindings/arm/amlogic/amlogic,meson-gx-ao-secure.yaml
+++ b/Documentation/devicetree/bindings/arm/amlogic/amlogic,meson-gx-ao-secure.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson Firmware registers Interface
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   The Meson SoCs have a register bank with status and data shared with the
diff --git a/Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml b/Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml
index 2e208d2fc98f..7cdffdb131ac 100644
--- a/Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml
+++ b/Documentation/devicetree/bindings/display/amlogic,meson-dw-hdmi.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic specific extensions to the Synopsys Designware HDMI Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 allOf:
   - $ref: /schemas/sound/name-prefix.yaml#
diff --git a/Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml b/Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml
index 047fd69e0377..6655a93b1874 100644
--- a/Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml
+++ b/Documentation/devicetree/bindings/display/amlogic,meson-vpu.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson Display Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   The Amlogic Meson Display controller is composed of several components
diff --git a/Documentation/devicetree/bindings/display/bridge/analogix,anx7814.yaml b/Documentation/devicetree/bindings/display/bridge/analogix,anx7814.yaml
index bce96b5b0db0..4a5e5d9d6f90 100644
--- a/Documentation/devicetree/bindings/display/bridge/analogix,anx7814.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/analogix,anx7814.yaml
@@ -8,7 +8,7 @@ title: Analogix ANX7814 SlimPort (Full-HD Transmitter)
 
 maintainers:
   - Andrzej Hajda <andrzej.hajda@intel.com>
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
   - Robert Foss <robert.foss@linaro.org>
 
 properties:
diff --git a/Documentation/devicetree/bindings/display/bridge/ite,it66121.yaml b/Documentation/devicetree/bindings/display/bridge/ite,it66121.yaml
index c6e81f532215..1b2185be92cd 100644
--- a/Documentation/devicetree/bindings/display/bridge/ite,it66121.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/ite,it66121.yaml
@@ -8,7 +8,7 @@ title: ITE it66121 HDMI bridge Device Tree Bindings
 
 maintainers:
   - Phong LE <ple@baylibre.com>
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   The IT66121 is a high-performance and low-power single channel HDMI
diff --git a/Documentation/devicetree/bindings/display/panel/sgd,gktw70sdae4se.yaml b/Documentation/devicetree/bindings/display/panel/sgd,gktw70sdae4se.yaml
index 44e02decdf3a..2e75e3738ff0 100644
--- a/Documentation/devicetree/bindings/display/panel/sgd,gktw70sdae4se.yaml
+++ b/Documentation/devicetree/bindings/display/panel/sgd,gktw70sdae4se.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Solomon Goldentek Display GKTW70SDAE4SE 7" WVGA LVDS Display Panel
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
   - Thierry Reding <thierry.reding@gmail.com>
 
 allOf:
diff --git a/Documentation/devicetree/bindings/i2c/amlogic,meson6-i2c.yaml b/Documentation/devicetree/bindings/i2c/amlogic,meson6-i2c.yaml
index 6ecb0270d88d..199a354ccb97 100644
--- a/Documentation/devicetree/bindings/i2c/amlogic,meson6-i2c.yaml
+++ b/Documentation/devicetree/bindings/i2c/amlogic,meson6-i2c.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson I2C Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
   - Beniamino Galvani <b.galvani@gmail.com>
 
 allOf:
diff --git a/Documentation/devicetree/bindings/mailbox/amlogic,meson-gxbb-mhu.yaml b/Documentation/devicetree/bindings/mailbox/amlogic,meson-gxbb-mhu.yaml
index ea06976fbbc7..dfd26b998189 100644
--- a/Documentation/devicetree/bindings/mailbox/amlogic,meson-gxbb-mhu.yaml
+++ b/Documentation/devicetree/bindings/mailbox/amlogic,meson-gxbb-mhu.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson Message-Handling-Unit Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   The Amlogic's Meson SoCs Message-Handling-Unit (MHU) is a mailbox controller
diff --git a/Documentation/devicetree/bindings/media/amlogic,axg-ge2d.yaml b/Documentation/devicetree/bindings/media/amlogic,axg-ge2d.yaml
index bee93bd84771..e551be5e680e 100644
--- a/Documentation/devicetree/bindings/media/amlogic,axg-ge2d.yaml
+++ b/Documentation/devicetree/bindings/media/amlogic,axg-ge2d.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic GE2D Acceleration Unit
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 properties:
   compatible:
diff --git a/Documentation/devicetree/bindings/media/amlogic,gx-vdec.yaml b/Documentation/devicetree/bindings/media/amlogic,gx-vdec.yaml
index 5044c4bb94e0..b827edabcafa 100644
--- a/Documentation/devicetree/bindings/media/amlogic,gx-vdec.yaml
+++ b/Documentation/devicetree/bindings/media/amlogic,gx-vdec.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Video Decoder
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
   - Maxime Jourdan <mjourdan@baylibre.com>
 
 description: |
diff --git a/Documentation/devicetree/bindings/media/amlogic,meson-gx-ao-cec.yaml b/Documentation/devicetree/bindings/media/amlogic,meson-gx-ao-cec.yaml
index d93aea6a0258..8d844f4312d1 100644
--- a/Documentation/devicetree/bindings/media/amlogic,meson-gx-ao-cec.yaml
+++ b/Documentation/devicetree/bindings/media/amlogic,meson-gx-ao-cec.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson AO-CEC Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   The Amlogic Meson AO-CEC module is present is Amlogic SoCs and its purpose is
diff --git a/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml b/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
index a3b976f101e8..5750cc06e923 100644
--- a/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
+++ b/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Khadas on-board Microcontroller Device Tree Bindings
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   Khadas embeds a microcontroller on their VIM and Edge boards adding some
diff --git a/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
index 608e1d62bed5..ddd5a073c3a8 100644
--- a/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson DWMAC Ethernet controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
   - Martin Blumenstingl <martin.blumenstingl@googlemail.com>
 
 # We need a select here so we don't match all nodes with 'snps,dwmac'
diff --git a/Documentation/devicetree/bindings/phy/amlogic,axg-mipi-dphy.yaml b/Documentation/devicetree/bindings/phy/amlogic,axg-mipi-dphy.yaml
index be485f500887..5eddaed3d853 100644
--- a/Documentation/devicetree/bindings/phy/amlogic,axg-mipi-dphy.yaml
+++ b/Documentation/devicetree/bindings/phy/amlogic,axg-mipi-dphy.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic AXG MIPI D-PHY
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 properties:
   compatible:
diff --git a/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb2-phy.yaml b/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb2-phy.yaml
index 399ebde45409..f3a5fbabbbb5 100644
--- a/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb2-phy.yaml
+++ b/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb2-phy.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic G12A USB2 PHY
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 properties:
   compatible:
diff --git a/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb3-pcie-phy.yaml b/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb3-pcie-phy.yaml
index 453c083cf44c..868b4e6fde71 100644
--- a/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb3-pcie-phy.yaml
+++ b/Documentation/devicetree/bindings/phy/amlogic,meson-g12a-usb3-pcie-phy.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic G12A USB3 + PCIE Combo PHY
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 properties:
   compatible:
diff --git a/Documentation/devicetree/bindings/power/amlogic,meson-ee-pwrc.yaml b/Documentation/devicetree/bindings/power/amlogic,meson-ee-pwrc.yaml
index f005abac7079..683c191c4921 100644
--- a/Documentation/devicetree/bindings/power/amlogic,meson-ee-pwrc.yaml
+++ b/Documentation/devicetree/bindings/power/amlogic,meson-ee-pwrc.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson Everything-Else Power Domains
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |+
   The Everything-Else Power Domains node should be the child of a syscon
diff --git a/Documentation/devicetree/bindings/reset/amlogic,meson-reset.yaml b/Documentation/devicetree/bindings/reset/amlogic,meson-reset.yaml
index 494a454928ce..98db2aa74dc8 100644
--- a/Documentation/devicetree/bindings/reset/amlogic,meson-reset.yaml
+++ b/Documentation/devicetree/bindings/reset/amlogic,meson-reset.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson SoC Reset Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 properties:
   compatible:
diff --git a/Documentation/devicetree/bindings/rng/amlogic,meson-rng.yaml b/Documentation/devicetree/bindings/rng/amlogic,meson-rng.yaml
index 444be32a8a29..09c6c906b1f9 100644
--- a/Documentation/devicetree/bindings/rng/amlogic,meson-rng.yaml
+++ b/Documentation/devicetree/bindings/rng/amlogic,meson-rng.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson Random number generator
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 properties:
   compatible:
diff --git a/Documentation/devicetree/bindings/serial/amlogic,meson-uart.yaml b/Documentation/devicetree/bindings/serial/amlogic,meson-uart.yaml
index 72e8868db3e0..7822705ad16c 100644
--- a/Documentation/devicetree/bindings/serial/amlogic,meson-uart.yaml
+++ b/Documentation/devicetree/bindings/serial/amlogic,meson-uart.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson SoC UART Serial Interface
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   The Amlogic Meson SoC UART Serial Interface is present on a large range
diff --git a/Documentation/devicetree/bindings/soc/amlogic/amlogic,canvas.yaml b/Documentation/devicetree/bindings/soc/amlogic/amlogic,canvas.yaml
index 17db87cb9dab..c3c599096353 100644
--- a/Documentation/devicetree/bindings/soc/amlogic/amlogic,canvas.yaml
+++ b/Documentation/devicetree/bindings/soc/amlogic/amlogic,canvas.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Canvas Video Lookup Table
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
   - Maxime Jourdan <mjourdan@baylibre.com>
 
 description: |
diff --git a/Documentation/devicetree/bindings/spi/amlogic,meson-gx-spicc.yaml b/Documentation/devicetree/bindings/spi/amlogic,meson-gx-spicc.yaml
index 50de0da42c13..0c10f7678178 100644
--- a/Documentation/devicetree/bindings/spi/amlogic,meson-gx-spicc.yaml
+++ b/Documentation/devicetree/bindings/spi/amlogic,meson-gx-spicc.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson SPI Communication Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 allOf:
   - $ref: "spi-controller.yaml#"
diff --git a/Documentation/devicetree/bindings/spi/amlogic,meson6-spifc.yaml b/Documentation/devicetree/bindings/spi/amlogic,meson6-spifc.yaml
index 8a9d526d06eb..ac3b2ec300ac 100644
--- a/Documentation/devicetree/bindings/spi/amlogic,meson6-spifc.yaml
+++ b/Documentation/devicetree/bindings/spi/amlogic,meson6-spifc.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson SPI Flash Controller
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 allOf:
   - $ref: "spi-controller.yaml#"
diff --git a/Documentation/devicetree/bindings/usb/amlogic,meson-g12a-usb-ctrl.yaml b/Documentation/devicetree/bindings/usb/amlogic,meson-g12a-usb-ctrl.yaml
index e349fa5de606..daf2a859418d 100644
--- a/Documentation/devicetree/bindings/usb/amlogic,meson-g12a-usb-ctrl.yaml
+++ b/Documentation/devicetree/bindings/usb/amlogic,meson-g12a-usb-ctrl.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Amlogic Meson G12A DWC3 USB SoC Controller Glue
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 description: |
   The Amlogic G12A embeds a DWC3 USB IP Core configured for USB2 and USB3
diff --git a/Documentation/devicetree/bindings/watchdog/amlogic,meson-gxbb-wdt.yaml b/Documentation/devicetree/bindings/watchdog/amlogic,meson-gxbb-wdt.yaml
index c7459cf70e30..497d60408ea0 100644
--- a/Documentation/devicetree/bindings/watchdog/amlogic,meson-gxbb-wdt.yaml
+++ b/Documentation/devicetree/bindings/watchdog/amlogic,meson-gxbb-wdt.yaml
@@ -8,7 +8,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: Meson GXBB SoCs Watchdog timer
 
 maintainers:
-  - Neil Armstrong <narmstrong@baylibre.com>
+  - Neil Armstrong <neil.armstrong@linaro.org>
 
 allOf:
   - $ref: watchdog.yaml#
diff --git a/MAINTAINERS b/MAINTAINERS
index 66bffb24a348..dd319665232f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1769,7 +1769,7 @@ N:	sun[x456789]i
 N:	sun50i
 
 ARM/Amlogic Meson SoC CLOCK FRAMEWORK
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 M:	Jerome Brunet <jbrunet@baylibre.com>
 L:	linux-amlogic@lists.infradead.org
 S:	Maintained
@@ -1794,7 +1794,7 @@ F:	Documentation/devicetree/bindings/sound/amlogic*
 F:	sound/soc/meson/
 
 ARM/Amlogic Meson SoC support
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 M:	Kevin Hilman <khilman@baylibre.com>
 R:	Jerome Brunet <jbrunet@baylibre.com>
 R:	Martin Blumenstingl <martin.blumenstingl@googlemail.com>
@@ -2489,7 +2489,7 @@ W:	http://www.digriz.org.uk/ts78xx/kernel
 F:	arch/arm/mach-orion5x/ts78xx-*
 
 ARM/OXNAS platform support
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 L:	linux-oxnas@groups.io (moderated for non-subscribers)
 S:	Maintained
@@ -6618,7 +6618,7 @@ F:	Documentation/devicetree/bindings/display/allwinner*
 F:	drivers/gpu/drm/sun4i/
 
 DRM DRIVERS FOR AMLOGIC SOCS
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 L:	dri-devel@lists.freedesktop.org
 L:	linux-amlogic@lists.infradead.org
 S:	Supported
@@ -6640,7 +6640,7 @@ F:	drivers/gpu/drm/atmel-hlcdc/
 
 DRM DRIVERS FOR BRIDGE CHIPS
 M:	Andrzej Hajda <andrzej.hajda@intel.com>
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 M:	Robert Foss <robert.foss@linaro.org>
 R:	Laurent Pinchart <Laurent.pinchart@ideasonboard.com>
 R:	Jonas Karlman <jonas@kwiboo.se>
@@ -10575,7 +10575,7 @@ F:	drivers/media/tuners/it913x*
 
 ITE IT66121 HDMI BRIDGE DRIVER
 M:	Phong LE <ple@baylibre.com>
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 S:	Maintained
 T:	git git://anongit.freedesktop.org/drm/drm-misc
 F:	Documentation/devicetree/bindings/display/bridge/ite,it66121.yaml
@@ -11081,7 +11081,7 @@ F:	kernel/debug/
 F:	kernel/module/kdb.c
 
 KHADAS MCU MFD DRIVER
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 L:	linux-amlogic@lists.infradead.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
@@ -12951,7 +12951,7 @@ S:	Maintained
 F:	drivers/watchdog/menz69_wdt.c
 
 MESON AO CEC DRIVER FOR AMLOGIC SOCS
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 L:	linux-media@vger.kernel.org
 L:	linux-amlogic@lists.infradead.org
 S:	Supported
@@ -12962,7 +12962,7 @@ F:	drivers/media/cec/platform/meson/ao-cec-g12a.c
 F:	drivers/media/cec/platform/meson/ao-cec.c
 
 MESON GE2D DRIVER FOR AMLOGIC SOCS
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 L:	linux-media@vger.kernel.org
 L:	linux-amlogic@lists.infradead.org
 S:	Supported
@@ -12978,7 +12978,7 @@ F:	Documentation/devicetree/bindings/mtd/amlogic,meson-nand.txt
 F:	drivers/mtd/nand/raw/meson_*
 
 MESON VIDEO DECODER DRIVER FOR AMLOGIC SOCS
-M:	Neil Armstrong <narmstrong@baylibre.com>
+M:	Neil Armstrong <neil.armstrong@linaro.org>
 L:	linux-media@vger.kernel.org
 L:	linux-amlogic@lists.infradead.org
 S:	Supported
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH bpf-next] xdp: report rx queue index in xdp_frame
From: Daniel Borkmann @ 2022-08-16  9:54 UTC (permalink / raw)
  To: Lorenzo Bianconi, Lorenzo Bianconi
  Cc: bpf, ast, andrii, netdev, davem, kuba, edumazet, pabeni, hawk,
	john.fastabend
In-Reply-To: <YvtnOloObaUxpR1O@lore-desk>

On 8/16/22 11:45 AM, Lorenzo Bianconi wrote:
>> Report rx queue index in xdp_frame according to the xdp_buff xdp_rxq_info
>> pointer. xdp_frame queue_index is currently used in cpumap code to covert
>> the xdp_frame into a xdp_buff.
>> xdp_frame size is not increased adding queue_index since an alignment padding
>> in the structure is used to insert queue_index field.
>>
>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>> ---
>>   include/net/xdp.h   | 2 ++
>>   kernel/bpf/cpumap.c | 2 +-
>>   2 files changed, 3 insertions(+), 1 deletion(-)
> 
> 
> Hi Alexei and Daniel,
> 
> this patch is marked as 'new, archived' in patchwork.
> Do I need to rebase and repost it?

Yes, please rebase and resend. Perhaps also improve the commit description
a bit in terms of what it fixes; as written above, it is a bit terse about
what effect it has.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH 1/2] dt-bindings: vertexcom-mse102x: Update email address
From: Krzysztof Kozlowski @ 2022-08-16 10:24 UTC (permalink / raw)
  To: Stefan Wahren, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski
  Cc: netdev, devicetree
In-Reply-To: <20220815080626.9688-1-stefan.wahren@i2se.com>

On 15/08/2022 11:06, Stefan Wahren wrote:
> in-tech smart charging is now chargebyte. So update the email address
> accordingly.
> 
> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>

Yet you used a third email address... which is fine, just a bit confusing.
Since in-tech.com still works and you might (or might not) be a different
Stefan, it's difficult to judge...

Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH bpf-next] xdp: report rx queue index in xdp_frame
From: Lorenzo Bianconi @ 2022-08-16  9:56 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Lorenzo Bianconi, bpf, ast, andrii, netdev, davem, kuba, edumazet,
	pabeni, hawk, john.fastabend
In-Reply-To: <4e717cbe-17a2-dbae-d557-0b29eaa28dae@iogearbox.net>

[-- Attachment #1: Type: text/plain, Size: 996 bytes --]

> On 8/16/22 11:45 AM, Lorenzo Bianconi wrote:
> > > Report rx queue index in xdp_frame according to the xdp_buff xdp_rxq_info
> > > pointer. xdp_frame queue_index is currently used in cpumap code to covert
> > > the xdp_frame into a xdp_buff.
> > > xdp_frame size is not increased adding queue_index since an alignment padding
> > > in the structure is used to insert queue_index field.
> > > 
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > ---
> > >   include/net/xdp.h   | 2 ++
> > >   kernel/bpf/cpumap.c | 2 +-
> > >   2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > 
> > Hi Alexei and Daniel,
> > 
> > this patch is marked as 'new, archived' in patchwork.
> > Do I need to rebase and repost it?
> 
> Yes, please rebase and resend. Perhaps also improve the commit description
> a bit in terms of what it fixes, it's a bit terse to the reader above on
> what effect it has.

ack thx, will do.

Regards,
Lorenzo

> 
> Thanks,
> Daniel

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* [PATCH] stmmac: pci: Add LS7A support for dwmac-loongson
From: Feiyang Chen @ 2022-08-16 10:25 UTC (permalink / raw)
  To: peppe.cavallaro, alexandre.torgue, joabreu
  Cc: Feiyang Chen, zhangqing, chenhuacai, chris.chenfeiyang, netdev,
	loongarch

Currently, dwmac-loongson only supports LS2K in the "probed with PCI and
configured with DT" manner. Add LS7A support, on which the devices
are fully PCI (non-DT).

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
---
 .../ethernet/stmicro/stmmac/dwmac-loongson.c  | 175 ++++++++++++------
 1 file changed, 122 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
index 017dbbda0c1c..50748f047e85 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
@@ -9,14 +9,22 @@
 #include <linux/of_irq.h>
 #include "stmmac.h"
 
-static int loongson_default_data(struct plat_stmmacenet_data *plat)
+struct stmmac_pci_info {
+	int (*setup)(struct pci_dev *pdev, struct plat_stmmacenet_data *plat);
+};
+
+static void common_default_data(struct pci_dev *pdev,
+				struct plat_stmmacenet_data *plat)
 {
+	plat->bus_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
+	plat->interface = PHY_INTERFACE_MODE_GMII;
+
 	plat->clk_csr = 2;	/* clk_csr_i = 20-35MHz & MDC = clk_csr_i/16 */
 	plat->has_gmac = 1;
 	plat->force_sf_dma_mode = 1;
 
 	/* Set default value for multicast hash bins */
-	plat->multicast_filter_bins = HASH_TABLE_SIZE;
+	plat->multicast_filter_bins = 256;
 
 	/* Set default value for unicast filter entries */
 	plat->unicast_filter_entries = 1;
@@ -35,32 +43,79 @@ static int loongson_default_data(struct plat_stmmacenet_data *plat)
 	/* Disable RX queues routing by default */
 	plat->rx_queues_cfg[0].pkt_route = 0x0;
 
-	/* Default to phy auto-detection */
-	plat->phy_addr = -1;
-
 	plat->dma_cfg->pbl = 32;
 	plat->dma_cfg->pblx8 = true;
 
-	plat->multicast_filter_bins = 256;
+	plat->clk_ref_rate = 125000000;
+	plat->clk_ptp_rate = 125000000;
+}
+
+static int loongson_gmac_data(struct pci_dev *pdev,
+			      struct plat_stmmacenet_data *plat)
+{
+	common_default_data(pdev, plat);
+
+	plat->mdio_bus_data->phy_mask = 0;
+
+	plat->phy_addr = -1;
+	plat->phy_interface = PHY_INTERFACE_MODE_RGMII_ID;
+
+	return 0;
+}
+
+static struct stmmac_pci_info loongson_gmac_pci_info = {
+	.setup = loongson_gmac_data,
+};
+
+static void loongson_gnet_fix_speed(void *priv, unsigned int speed)
+{
+	struct net_device *ndev = (struct net_device *)(*(unsigned long *)priv);
+	struct stmmac_priv *ptr = netdev_priv(ndev);
+
+	if (speed == SPEED_1000) {
+		if (readl(ptr->ioaddr + MAC_CTRL_REG) & (1 << 15) /* PS */) {
+			/* reset phy */
+			phy_set_bits(ndev->phydev, 0 /*MII_BMCR*/,
+				     0x200 /*BMCR_ANRESTART*/);
+		}
+	}
+}
+
+static int loongson_gnet_data(struct pci_dev *pdev,
+			      struct plat_stmmacenet_data *plat)
+{
+	common_default_data(pdev, plat);
+
+	plat->mdio_bus_data->phy_mask = 0xfffffffb;
+
+	plat->phy_addr = 2;
+	plat->phy_interface = PHY_INTERFACE_MODE_GMII;
+
+	/* GNET 1000M speed needs a workaround */
+	plat->fix_mac_speed = loongson_gnet_fix_speed;
+
+	/* Get netdev pointer address */
+	plat->bsp_priv = &(pdev->dev.driver_data);
+
 	return 0;
 }
 
-static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+static struct stmmac_pci_info loongson_gnet_pci_info = {
+	.setup = loongson_gnet_data,
+};
+
+static int loongson_dwmac_probe(struct pci_dev *pdev,
+				const struct pci_device_id *id)
 {
 	struct plat_stmmacenet_data *plat;
+	struct stmmac_pci_info *info;
 	struct stmmac_resources res;
 	struct device_node *np;
-	int ret, i, phy_mode;
+	int ret, i, bus_id, phy_mode;
 	bool mdio = false;
 
 	np = dev_of_node(&pdev->dev);
-
-	if (!np) {
-		pr_info("dwmac_loongson_pci: No OF node\n");
-		return -ENODEV;
-	}
-
-	if (!of_device_is_compatible(np, "loongson, pci-gmac")) {
+	if (np && !of_device_is_compatible(np, "loongson, pci-gmac")) {
 		pr_info("dwmac_loongson_pci: Incompatible OF node\n");
 		return -ENODEV;
 	}
@@ -74,14 +129,14 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id
 		mdio = true;
 	}
 
-	if (mdio) {
-		plat->mdio_bus_data = devm_kzalloc(&pdev->dev,
-						   sizeof(*plat->mdio_bus_data),
-						   GFP_KERNEL);
-		if (!plat->mdio_bus_data)
-			return -ENOMEM;
+	plat->mdio_bus_data = devm_kzalloc(&pdev->dev,
+					   sizeof(*plat->mdio_bus_data),
+					   GFP_KERNEL);
+	if (!plat->mdio_bus_data)
+		return -ENOMEM;
+
+	if (mdio)
 		plat->mdio_bus_data->needs_reset = true;
-	}
 
 	plat->dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*plat->dma_cfg), GFP_KERNEL);
 	if (!plat->dma_cfg)
@@ -104,42 +159,52 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id
 		break;
 	}
 
-	plat->bus_id = of_alias_get_id(np, "ethernet");
-	if (plat->bus_id < 0)
-		plat->bus_id = pci_dev_id(pdev);
-
-	phy_mode = device_get_phy_mode(&pdev->dev);
-	if (phy_mode < 0) {
-		dev_err(&pdev->dev, "phy_mode not found\n");
-		return phy_mode;
-	}
-
-	plat->phy_interface = phy_mode;
-	plat->interface = PHY_INTERFACE_MODE_GMII;
-
 	pci_set_master(pdev);
 
-	loongson_default_data(plat);
-	pci_enable_msi(pdev);
-	memset(&res, 0, sizeof(res));
-	res.addr = pcim_iomap_table(pdev)[0];
+	info = (struct stmmac_pci_info *)id->driver_data;
+	ret = info->setup(pdev, plat);
+	if (ret)
+		return ret;
 
-	res.irq = of_irq_get_byname(np, "macirq");
-	if (res.irq < 0) {
-		dev_err(&pdev->dev, "IRQ macirq not found\n");
-		ret = -ENODEV;
+	if (np) {
+		bus_id = of_alias_get_id(np, "ethernet");
+		if (bus_id >= 0)
+			plat->bus_id = bus_id;
+
+		phy_mode = device_get_phy_mode(&pdev->dev);
+		if (phy_mode < 0) {
+			dev_err(&pdev->dev, "phy_mode not found\n");
+			return phy_mode;
+		}
+		plat->phy_interface = phy_mode;
 	}
 
-	res.wol_irq = of_irq_get_byname(np, "eth_wake_irq");
-	if (res.wol_irq < 0) {
-		dev_info(&pdev->dev, "IRQ eth_wake_irq not found, using macirq\n");
-		res.wol_irq = res.irq;
-	}
+	pci_enable_msi(pdev);
 
-	res.lpi_irq = of_irq_get_byname(np, "eth_lpi");
-	if (res.lpi_irq < 0) {
-		dev_err(&pdev->dev, "IRQ eth_lpi not found\n");
-		ret = -ENODEV;
+	memset(&res, 0, sizeof(res));
+	res.addr = pcim_iomap_table(pdev)[0];
+	if (np) {
+		res.irq = of_irq_get_byname(np, "macirq");
+		if (res.irq < 0) {
+			dev_err(&pdev->dev, "IRQ macirq not found\n");
+			ret = -ENODEV;
+		}
+
+		res.wol_irq = of_irq_get_byname(np, "eth_wake_irq");
+		if (res.wol_irq < 0) {
+			dev_info(&pdev->dev,
+				 "IRQ eth_wake_irq not found, using macirq\n");
+			res.wol_irq = res.irq;
+		}
+
+		res.lpi_irq = of_irq_get_byname(np, "eth_lpi");
+		if (res.lpi_irq < 0) {
+			dev_err(&pdev->dev, "IRQ eth_lpi not found\n");
+			ret = -ENODEV;
+		}
+	} else {
+		res.irq = pdev->irq;
+		res.wol_irq = pdev->irq;
 	}
 
 	return stmmac_dvr_probe(&pdev->dev, plat, &res);
@@ -199,8 +264,12 @@ static int __maybe_unused loongson_dwmac_resume(struct device *dev)
 static SIMPLE_DEV_PM_OPS(loongson_dwmac_pm_ops, loongson_dwmac_suspend,
 			 loongson_dwmac_resume);
 
+#define PCI_DEVICE_ID_LOONGSON_GMAC	0x7a03
+#define PCI_DEVICE_ID_LOONGSON_GNET	0x7a13
+
 static const struct pci_device_id loongson_dwmac_id_table[] = {
-	{ PCI_VDEVICE(LOONGSON, 0x7a03) },
+	{ PCI_DEVICE_DATA(LOONGSON, GMAC, &loongson_gmac_pci_info) },
+	{ PCI_DEVICE_DATA(LOONGSON, GNET, &loongson_gnet_pci_info) },
 	{}
 };
 MODULE_DEVICE_TABLE(pci, loongson_dwmac_id_table);
-- 
2.31.1



* Re: [PATCH net 1/1] net_sched: cls_route: disallow handle of 0
From: Jamal Hadi Salim @ 2022-08-16 10:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, netdev, xiyou.wangcong, jiri, kuznet,
	cascardo, linux-distros, security, stephen, dsahern, gregkh
In-Reply-To: <20220815104434.052a53b4@kernel.org>

The earlier discussion was to let the other fix in to plug the CVE
hole (I had proposed disallowing handle of 0 in that discussion).

cheers,
jamal

On Mon, Aug 15, 2022 at 1:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 14 Aug 2022 11:27:58 +0000 Jamal Hadi Salim wrote:
> > Follows up on:
> > https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/
> >
> > handle of 0 implies from/to of universe realm which is not very
> > sensible.
>
> Heh, I was gonna say, but then you acked the other fix :)


* Re: [PATCH] net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()
From: Hawkins Jiawei @ 2022-08-16 10:34 UTC (permalink / raw)
  To: dhowells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh
  Cc: linux-kernel, netdev, yin31149, linux-kernel-mentees, bpf
In-Reply-To: <166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk>

On Tue, 16 Aug 2022 at 17:34, David Howells <dhowells@redhat.com> wrote:
>
> Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
> that checks to see if sk->sk_callback_lock() is held and use that here
> instead.
Hi, I wonder if we can make this more generic, since future code that
uses __rcu_dereference_sk_user_data_with_flags() may also hit this
bug.

To be more specific, maybe we can refactor
__rcu_dereference_sk_user_data_with_flags() to
__rcu_dereference_sk_user_data_with_flags_check(), like
rcu_dereference() and rcu_dereference_check(). Maybe:

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..cf123954eab9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -578,18 +578,27 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
 
 /**
- * __rcu_dereference_sk_user_data_with_flags - return the pointer
- * only if argument flags all has been set in sk_user_data. Otherwise
- * return NULL
+ * __rcu_dereference_sk_user_data_with_flags_check - return the pointer
+ * only if argument flags all has been set in sk_user_data, with debug
+ * checking. Otherwise return NULL
  *
- * @sk: socket
- * @flags: flag bits
+ * Do __rcu_dereference_sk_user_data_with_flags(), but check that the
+ * conditions under which the rcu dereference will take place are correct,
+ * which is a bit like rcu_dereference_check() and rcu_dereference().
+ *
+ * @sk		: socket
+ * @flags	: flag bits
+ * @condition	: the conditions under which the rcu dereference will
+ * take place
  */
 static inline void *
-__rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
-					  uintptr_t flags)
+__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
+						uintptr_t flags, bool condition)
 {
-	uintptr_t sk_user_data = (uintptr_t)rcu_dereference(__sk_user_data(sk));
+	uintptr_t sk_user_data;
+
+	sk_user_data = (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
+							condition);
 
 	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);
 
@@ -598,6 +607,8 @@ __rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
 	return NULL;
 }
 
+#define __rcu_dereference_sk_user_data_with_flags(sk, flags) \
+	__rcu_dereference_sk_user_data_with_flags_check(sk, flags, 0)
 #define rcu_dereference_sk_user_data(sk)				\
 	__rcu_dereference_sk_user_data_with_flags(sk, 0)
 #define __rcu_assign_sk_user_data_with_flags(sk, ptr, flags)		\

> +/**
> + * __locked_read_sk_user_data_with_flags - return the pointer
> + * only if argument flags all has been set in sk_user_data. Otherwise
> + * return NULL
> + *
> +               (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
> +                                                lockdep_is_held(&sk->sk_callback_lock));

> diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> index 85fa9dbfa8bf..82c61612f382 100644
> --- a/kernel/bpf/reuseport_array.c
> +++ b/kernel/bpf/reuseport_array.c
> @@ -24,7 +24,7 @@ void bpf_sk_reuseport_detach(struct sock *sk)
>         struct sock __rcu **socks;
> 
>         write_lock_bh(&sk->sk_callback_lock);
> -       socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
> +       socks = __locked_read_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
>         if (socks) {
>                 WRITE_ONCE(sk->sk_user_data, NULL);
>                 /*
Then, as you point out, we can pass the condition
lockdep_is_held(&sk->sk_callback_lock) to
__rcu_dereference_sk_user_data_with_flags_check() in order to
silence the suspicious RCU usage warning, as below:

diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 85fa9dbfa8bf..a772610987c5 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -24,7 +24,10 @@ void bpf_sk_reuseport_detach(struct sock *sk)
 	struct sock __rcu **socks;
 
 	write_lock_bh(&sk->sk_callback_lock);
-	socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
+	socks = __rcu_dereference_sk_user_data_with_flags_check(
+			sk, SK_USER_DATA_BPF,
+			lockdep_is_held(&sk->sk_callback_lock));
+
 	if (socks) {
 		WRITE_ONCE(sk->sk_user_data, NULL);
 		/*
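
For illustration only, here is a userspace sketch of the same idea: a
checked accessor that takes an extra condition, plus a wrapper macro that
passes "false" for callers relying on the read-side section alone. None of
these names exist in the kernel; deref_check(), in_read_section and
check_failures are hypothetical stand-ins for rcu_dereference_check(),
rcu_read_lock() and a lockdep splat, and the flag values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits kept in the low bits of the stored pointer. */
#define USER_DATA_NOCOPY  ((uintptr_t)1)
#define USER_DATA_BPF     ((uintptr_t)2)
#define USER_DATA_PTRMASK (~(USER_DATA_NOCOPY | USER_DATA_BPF))

static bool in_read_section;	/* models rcu_read_lock() being held */
static int check_failures;	/* models lockdep "suspicious usage" reports */

/* Stand-in for rcu_dereference_check(): complain unless the caller's
 * condition holds or we are inside a read-side section.
 */
static uintptr_t deref_check(const uintptr_t *slot, bool condition)
{
	if (!condition && !in_read_section)
		check_failures++;
	return *slot;
}

/* Return the untagged pointer only if every bit in flags is set. */
static void *user_data_with_flags_check(const uintptr_t *slot,
					uintptr_t flags, bool condition)
{
	uintptr_t v = deref_check(slot, condition);

	if ((v & flags) == flags)
		return (void *)(v & USER_DATA_PTRMASK);
	return NULL;
}

/* Unchecked variant: no extra condition, like passing 0 in the patch. */
#define user_data_with_flags(slot, flags) \
	user_data_with_flags_check(slot, flags, false)
```

A caller that holds the lock passes true (in the kernel, the lockdep
expression) and triggers no report, while the plain macro used outside a
read-side section is the case that gets flagged.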


* RE: [Intel-wired-lan] [PATCH v3] igb: Add lock to avoid data race
From: Jankowski, Konrad0 @ 2022-08-16 10:35 UTC (permalink / raw)
  To: Lin Ma, Brandeburg, Jesse, Nguyen, Anthony L, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org,
	john.fastabend@gmail.com, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org
In-Reply-To: <20220809073542.3390-1-linma@zju.edu.cn>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Lin Ma
> Sent: Tuesday, August 9, 2022 9:36 AM
> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org;
> john.fastabend@gmail.com; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> bpf@vger.kernel.org
> Cc: Lin Ma <linma@zju.edu.cn>
> Subject: [Intel-wired-lan] [PATCH v3] igb: Add lock to avoid data race
> 
> The commit c23d92b80e0b ("igb: Teardown SR-IOV before
> unregister_netdev()") places the unregister_netdev() call after the
> igb_disable_sriov() call to avoid functionality issue.
> 
> However, it introduces several race conditions when detaching a device.
> For example, when .remove() is called, the below interleaving leads to use-
> after-free.
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> Moreover, the igb_disable_sriov() also suffers from data race with the
> requests from VF driver.
> 
>  (FREE from device detaching)      |   (USE from requests)
> igb_remove                         |  igb_msix_other
>  igb_disable_sriov                 |   igb_msg_task
>   kfree(adapter->vf_data)          |    vf < adapter->vfs_allocated_count
>   adapter->vfs_allocated_count = 0 |
> 
> To this end, this commit first eliminates the data races from netdev core by
> using rtnl_lock (similar to commit 719479230893 ("dpaa2-eth: add MAC/PHY
> support through phylink")). It then adds a spinlock to eliminate races from
> driver requests (similar to commit 1e53834ce541
> ("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero")).
> 
> 
> Fixes: c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()")
> Signed-off-by: Lin Ma <linma@zju.edu.cn>
> ---
> V2 -> V3:  make the commit message much clearer
> V1 -> V2:  fix typo in title idb -> igb
> V0 -> V1:  change title from "Add rtnl_lock" to "Add lock"
>            add additional spinlock as suggested by Jakub, according to
>            1e53834ce541 ("ixgbe: Add locking to prevent panic when setting
>            sriov_numvfs to zero")
> 
>  drivers/net/ethernet/intel/igb/igb.h      |  2 ++
>  drivers/net/ethernet/intel/igb/igb_main.c | 12 +++++++++++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb.h
> b/drivers/net/ethernet/intel/igb/igb.h
> index 2d3daf022651..015b78144114 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -664,6 +664,8 @@ struct igb_adapter {

Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>


* [syzbot] upstream boot error: BUG: unable to handle kernel paging request in ieee80211_register_hw
From: syzbot @ 2022-08-16  8:40 UTC (permalink / raw)
  To: davem, edumazet, johannes, kuba, linux-kernel, linux-wireless,
	netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    4a9350597aff Merge tag 'sound-fix-6.0-rc1' of git://git.ke..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11515705080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=bc6716795f118372
dashboard link: https://syzkaller.appspot.com/bug?extid=655209e079e67502f2da
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: i386

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+655209e079e67502f2da@syzkaller.appspotmail.com

usbcore: registered new interface driver nfcmrvl
Loading iSCSI transport class v2.0-870.
scsi host0: Virtio SCSI HBA
st: Version 20160209, fixed bufsize 32768, s/g segs 256
Rounding down aligned max_sectors from 4294967295 to 4294967288
db_root: cannot open: /etc/target
slram: not enough parameters.
ftl_cs: FTL header not found.
wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
eql: Equalizer2002: Simon Janes (simon@ncm.com) and David S. Miller (davem@redhat.com)
MACsec IEEE 802.1AE
tun: Universal TUN/TAP device driver, 1.6
vcan: Virtual CAN interface driver
vxcan: Virtual CAN Tunnel driver
slcan: serial line CAN interface driver
CAN device driver interface
usbcore: registered new interface driver usb_8dev
usbcore: registered new interface driver ems_usb
usbcore: registered new interface driver gs_usb
usbcore: registered new interface driver kvaser_usb
usbcore: registered new interface driver mcba_usb
usbcore: registered new interface driver peak_usb
e100: Intel(R) PRO/100 Network Driver
e100: Copyright(c) 1999-2006 Intel Corporation
e1000: Intel(R) PRO/1000 Network Driver
e1000: Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
mkiss: AX.25 Multikiss, Hans Albas PE1AYX
AX.25: 6pack driver, Revision: 0.3.0
AX.25: bpqether driver version 004
PPP generic driver version 2.4.2
PPP BSD Compression module registered
PPP Deflate Compression module registered
PPP MPPE Compression module registered
NET: Registered PF_PPPOX protocol family
PPTP driver version 0.8.5
SLIP: version 0.8.4-NET3.019-NEWTTY (dynamic channels, max=256) (6 bit encapsulation enabled).
CSLIP: code copyright 1989 Regents of the University of California.
SLIP linefill/keepalive option.
hdlc: HDLC support module revision 1.22
LAPB Ethernet driver version 0.02
usbcore: registered new interface driver ath9k_htc
usbcore: registered new interface driver carl9170
usbcore: registered new interface driver ath6kl_usb
usbcore: registered new interface driver ar5523
usbcore: registered new interface driver ath10k_usb
usbcore: registered new interface driver rndis_wlan
mac80211_hwsim: initializing netlink
BUG: unable to handle page fault for address: ffffddffe0000001
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 11829067 P4D 11829067 PUD 0 
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-syzkaller-14090-g4a9350597aff #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
RIP: 0010:ieee80211_register_hw+0x2872/0x3eb0 net/mac80211/main.c:1069
Code: 89 e7 e8 31 3b b8 f8 45 39 ec 0f 84 0b 03 00 00 e8 e3 3e b8 f8 4c 8d 75 f4 48 89 e8 48 b9 00 00 00 00 00 fc ff df 48 c1 e8 03 <0f> b6 14 08 48 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 d6
RSP: 0000:ffffc90000067a40 EFLAGS: 00010a03
RAX: 1fffe1ffe0000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff8881401b8000 RSI: ffffffff88c3c82d RDI: 0000000000000005
RBP: ffff0fff0000000c R08: 0000000000000005 R09: 0000000000000000
R10: 000000000000076d R11: 0000000000000000 R12: 0000000000000000
R13: 000000000000076d R14: ffff0fff00000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffddffe0000001 CR3: 000000000bc8e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 mac80211_hwsim_new_radio+0x255f/0x4dd0 drivers/net/wireless/mac80211_hwsim.c:4129
 init_mac80211_hwsim+0x5aa/0x73b drivers/net/wireless/mac80211_hwsim.c:5379
 do_one_initcall+0xfe/0x650 init/main.c:1296
 do_initcall_level init/main.c:1369 [inline]
 do_initcalls init/main.c:1385 [inline]
 do_basic_setup init/main.c:1404 [inline]
 kernel_init_freeable+0x6b1/0x73a init/main.c:1611
 kernel_init+0x1a/0x1d0 init/main.c:1500
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>
Modules linked in:
CR2: ffffddffe0000001
---[ end trace 0000000000000000 ]---
RIP: 0010:ieee80211_register_hw+0x2872/0x3eb0 net/mac80211/main.c:1069
Code: 89 e7 e8 31 3b b8 f8 45 39 ec 0f 84 0b 03 00 00 e8 e3 3e b8 f8 4c 8d 75 f4 48 89 e8 48 b9 00 00 00 00 00 fc ff df 48 c1 e8 03 <0f> b6 14 08 48 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 d6
RSP: 0000:ffffc90000067a40 EFLAGS: 00010a03
RAX: 1fffe1ffe0000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff8881401b8000 RSI: ffffffff88c3c82d RDI: 0000000000000005
RBP: ffff0fff0000000c R08: 0000000000000005 R09: 0000000000000000
R10: 000000000000076d R11: 0000000000000000 R12: 0000000000000000
R13: 000000000000076d R14: ffff0fff00000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffddffe0000001 CR3: 000000000bc8e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
   0:	89 e7                	mov    %esp,%edi
   2:	e8 31 3b b8 f8       	callq  0xf8b83b38
   7:	45 39 ec             	cmp    %r13d,%r12d
   a:	0f 84 0b 03 00 00    	je     0x31b
  10:	e8 e3 3e b8 f8       	callq  0xf8b83ef8
  15:	4c 8d 75 f4          	lea    -0xc(%rbp),%r14
  19:	48 89 e8             	mov    %rbp,%rax
  1c:	48 b9 00 00 00 00 00 	movabs $0xdffffc0000000000,%rcx
  23:	fc ff df
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	0f b6 14 08          	movzbl (%rax,%rcx,1),%edx <-- trapping instruction
  2e:	48 89 e8             	mov    %rbp,%rax
  31:	83 e0 07             	and    $0x7,%eax
  34:	83 c0 03             	add    $0x3,%eax
  37:	38 d0                	cmp    %dl,%al
  39:	7c 08                	jl     0x43
  3b:	84 d2                	test   %dl,%dl
  3d:	0f                   	.byte 0xf
  3e:	85 d6                	test   %edx,%esi
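
A note on reading the dump above: the trapping instruction looks like the
KASAN shadow-memory lookup for the address held in RBP. On x86-64, KASAN
maps each 8 bytes of memory to one shadow byte at (addr >> 3) plus a fixed
offset, and that offset is the constant loaded into RCX. The sketch below
only redoes that arithmetic with the register values from this report; the
function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 KASAN layout: one shadow byte covers 8 bytes of memory. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL	/* value seen in RCX */

/* Hypothetical helper: address of the shadow byte that describes addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

Feeding in RBP (ffff0fff0000000c) gives 1fffe1ffe0000001 after the shift,
which matches RAX, and ffffddffe0000001 after adding the offset, which
matches CR2. So the fault is in the shadow lookup itself, which suggests
the pointer in RBP was already bad before the access was checked.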


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* [PATCH xfrm-next v2 5/6] xfrm: add RX datapath protection for IPsec full offload mode
From: Leon Romanovsky @ 2022-08-16  8:59 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S . Miller, Herbert Xu, netdev, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660639789.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Traffic received by a device with IPsec full offload enabled should be
forwarded to the stack only after decryption, with packet headers and
trailers removed.

Such packets are expected to be seen as normal (non-XFRM) ones, while
unsupported packets should be dropped by the HW.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/net/xfrm.h | 55 +++++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 587697eb1d31..b64853df7262 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1094,6 +1094,29 @@ xfrm_state_addr_cmp(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x, un
 	return !0;
 }
 
+#ifdef CONFIG_XFRM
+static inline struct xfrm_state *xfrm_input_state(struct sk_buff *skb)
+{
+	struct sec_path *sp = skb_sec_path(skb);
+
+	return sp->xvec[sp->len - 1];
+}
+#endif
+
+static inline struct xfrm_offload *xfrm_offload(struct sk_buff *skb)
+{
+#ifdef CONFIG_XFRM
+	struct sec_path *sp = skb_sec_path(skb);
+
+	if (!sp || !sp->olen || sp->len != sp->olen)
+		return NULL;
+
+	return &sp->ovec[sp->olen - 1];
+#else
+	return NULL;
+#endif
+}
+
 #ifdef CONFIG_XFRM
 int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb,
 			unsigned short family);
@@ -1125,6 +1148,15 @@ static inline int __xfrm_policy_check2(struct sock *sk, int dir,
 {
 	struct net *net = dev_net(skb->dev);
 	int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0);
+	struct xfrm_offload *xo = xfrm_offload(skb);
+	struct xfrm_state *x;
+
+	if (xo) {
+		x = xfrm_input_state(skb);
+		if (x->xso.type == XFRM_DEV_OFFLOAD_FULL)
+			return (xo->flags & CRYPTO_DONE) &&
+			       (xo->status & CRYPTO_SUCCESS);
+	}
 
 	if (sk && sk->sk_policy[XFRM_POLICY_IN])
 		return __xfrm_policy_check(sk, ndir, skb, family);
@@ -1860,29 +1892,6 @@ static inline void xfrm_states_delete(struct xfrm_state **states, int n)
 }
 #endif
 
-#ifdef CONFIG_XFRM
-static inline struct xfrm_state *xfrm_input_state(struct sk_buff *skb)
-{
-	struct sec_path *sp = skb_sec_path(skb);
-
-	return sp->xvec[sp->len - 1];
-}
-#endif
-
-static inline struct xfrm_offload *xfrm_offload(struct sk_buff *skb)
-{
-#ifdef CONFIG_XFRM
-	struct sec_path *sp = skb_sec_path(skb);
-
-	if (!sp || !sp->olen || sp->len != sp->olen)
-		return NULL;
-
-	return &sp->ovec[sp->olen - 1];
-#else
-	return NULL;
-#endif
-}
-
 void __init xfrm_dev_init(void);
 
 #ifdef CONFIG_XFRM_OFFLOAD
-- 
2.37.2
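
For readers following the RX check added to __xfrm_policy_check2(), a
standalone model of the decision may help. This is a sketch under
assumptions, not kernel code: struct rx_meta and full_offload_rx_verdict()
are hypothetical, and the bit values are illustrative (the real
definitions live in include/net/xfrm.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only. */
#define CRYPTO_DONE    (1u << 1)
#define CRYPTO_SUCCESS (1u << 0)

enum offload_type { OFFLOAD_UNSPECIFIED, OFFLOAD_CRYPTO, OFFLOAD_FULL };

/* Hypothetical flattened view of what the hunk reads from the skb. */
struct rx_meta {
	bool has_offload;	/* models xfrm_offload(skb) != NULL */
	enum offload_type type;	/* models x->xso.type */
	unsigned int flags;	/* models xo->flags */
	unsigned int status;	/* models xo->status */
};

/* 1: accept, 0: reject, -1: fall through to the regular policy lookup. */
static int full_offload_rx_verdict(const struct rx_meta *m)
{
	if (!m->has_offload || m->type != OFFLOAD_FULL)
		return -1;

	/* Full offload: trust the packet only if HW decrypted it successfully. */
	return (m->flags & CRYPTO_DONE) && (m->status & CRYPTO_SUCCESS);
}
```

The point of the model is the three-way outcome: only full-offload
packets short-circuit the policy lookup, and they pass only when both the
"done" flag and the "success" status are set.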



* [PATCH xfrm-next v2 3/6] xfrm: add an interface to offload policy
From: Leon Romanovsky @ 2022-08-16  8:59 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S . Miller, Herbert Xu, netdev, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660639789.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Extend the netlink interface to add and delete XFRM policies on the device.
This functionality is a first step toward implementing a full IPsec offload
solution.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/netdevice.h |  3 ++
 include/net/xfrm.h        | 42 ++++++++++++++++++++++++
 net/xfrm/xfrm_device.c    | 61 ++++++++++++++++++++++++++++++++++-
 net/xfrm/xfrm_policy.c    | 67 +++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c      | 17 ++++++++++
 5 files changed, 189 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1a3cb93c3dcc..401c52aeab0e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1012,6 +1012,9 @@ struct xfrmdev_ops {
 	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
 				       struct xfrm_state *x);
 	void	(*xdo_dev_state_advance_esn) (struct xfrm_state *x);
+	int	(*xdo_dev_policy_add) (struct xfrm_policy *x);
+	void	(*xdo_dev_policy_delete) (struct xfrm_policy *x);
+	void	(*xdo_dev_policy_free) (struct xfrm_policy *x);
 };
 #endif
 
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b4d487053dfd..587697eb1d31 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -129,6 +129,7 @@ struct xfrm_state_walk {
 enum {
 	XFRM_DEV_OFFLOAD_IN = 1,
 	XFRM_DEV_OFFLOAD_OUT,
+	XFRM_DEV_OFFLOAD_FWD,
 };
 
 enum {
@@ -534,6 +535,8 @@ struct xfrm_policy {
 	struct xfrm_tmpl       	xfrm_vec[XFRM_MAX_DEPTH];
 	struct hlist_node	bydst_inexact_list;
 	struct rcu_head		rcu;
+
+	struct xfrm_dev_offload xdo;
 };
 
 static inline struct net *xp_net(const struct xfrm_policy *xp)
@@ -1577,6 +1580,7 @@ struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq);
 int xfrm_state_delete(struct xfrm_state *x);
 int xfrm_state_flush(struct net *net, u8 proto, bool task_valid, bool sync);
 int xfrm_dev_state_flush(struct net *net, struct net_device *dev, bool task_valid);
+int xfrm_dev_policy_flush(struct net *net, struct net_device *dev, bool task_valid);
 void xfrm_sad_getinfo(struct net *net, struct xfrmk_sadinfo *si);
 void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si);
 u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
@@ -1887,6 +1891,8 @@ void xfrm_dev_backlog(struct softnet_data *sd);
 struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t features, bool *again);
 int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 		       struct xfrm_user_offload *xuo);
+int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
+		       struct xfrm_user_offload *xuo, u8 dir);
 bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
 
 static inline void xfrm_dev_state_advance_esn(struct xfrm_state *x)
@@ -1935,6 +1941,28 @@ static inline void xfrm_dev_state_free(struct xfrm_state *x)
 		netdev_put(dev, &xso->dev_tracker);
 	}
 }
+
+static inline void xfrm_dev_policy_delete(struct xfrm_policy *x)
+{
+	struct xfrm_dev_offload *xdo = &x->xdo;
+	struct net_device *dev = xdo->dev;
+
+	if (dev && dev->xfrmdev_ops && dev->xfrmdev_ops->xdo_dev_policy_delete)
+		dev->xfrmdev_ops->xdo_dev_policy_delete(x);
+}
+
+static inline void xfrm_dev_policy_free(struct xfrm_policy *x)
+{
+	struct xfrm_dev_offload *xdo = &x->xdo;
+	struct net_device *dev = xdo->dev;
+
+	if (dev && dev->xfrmdev_ops) {
+		if (dev->xfrmdev_ops->xdo_dev_policy_free)
+			dev->xfrmdev_ops->xdo_dev_policy_free(x);
+		xdo->dev = NULL;
+		netdev_put(dev, &xdo->dev_tracker);
+	}
+}
 #else
 static inline void xfrm_dev_resume(struct sk_buff *skb)
 {
@@ -1962,6 +1990,20 @@ static inline void xfrm_dev_state_free(struct xfrm_state *x)
 {
 }
 
+static inline int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
+				      struct xfrm_user_offload *xuo, u8 dir)
+{
+	return 0;
+}
+
+static inline void xfrm_dev_policy_delete(struct xfrm_policy *x)
+{
+}
+
+static inline void xfrm_dev_policy_free(struct xfrm_policy *x)
+{
+}
+
 static inline bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 {
 	return false;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 5b04e5cdca64..1cc482e9c87d 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -302,6 +302,63 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 }
 EXPORT_SYMBOL_GPL(xfrm_dev_state_add);
 
+int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
+			struct xfrm_user_offload *xuo, u8 dir)
+{
+	struct xfrm_dev_offload *xdo = &xp->xdo;
+	struct net_device *dev;
+	int err;
+
+	if (!xuo->flags || xuo->flags & ~XFRM_OFFLOAD_FULL)
+		/* We support only Full offload mode and it means
+		 * that user must set XFRM_OFFLOAD_FULL bit.
+		 */
+		return -EINVAL;
+
+	dev = dev_get_by_index(net, xuo->ifindex);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->xfrmdev_ops || !dev->xfrmdev_ops->xdo_dev_policy_add) {
+		xdo->dev = NULL;
+		dev_put(dev);
+		return -EINVAL;
+	}
+
+	xdo->dev = dev;
+	netdev_tracker_alloc(dev, &xdo->dev_tracker, GFP_ATOMIC);
+	xdo->real_dev = dev;
+	xdo->type = XFRM_DEV_OFFLOAD_FULL;
+	switch (dir) {
+	case XFRM_POLICY_IN:
+		xdo->dir = XFRM_DEV_OFFLOAD_IN;
+		break;
+	case XFRM_POLICY_OUT:
+		xdo->dir = XFRM_DEV_OFFLOAD_OUT;
+		break;
+	case XFRM_POLICY_FWD:
+		xdo->dir = XFRM_DEV_OFFLOAD_FWD;
+		break;
+	default:
+		xdo->dev = NULL;
+		dev_put(dev);
+		return -EINVAL;
+	}
+
+	err = dev->xfrmdev_ops->xdo_dev_policy_add(xp);
+	if (err) {
+		xdo->dev = NULL;
+		xdo->real_dev = NULL;
+		xdo->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
+		xdo->dir = 0;
+		netdev_put(dev, &xdo->dev_tracker);
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xfrm_dev_policy_add);
+
 bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 {
 	int mtu;
@@ -404,8 +461,10 @@ static int xfrm_api_check(struct net_device *dev)
 
 static int xfrm_dev_down(struct net_device *dev)
 {
-	if (dev->features & NETIF_F_HW_ESP)
+	if (dev->features & NETIF_F_HW_ESP) {
 		xfrm_dev_state_flush(dev_net(dev), dev, true);
+		xfrm_dev_policy_flush(dev_net(dev), dev, true);
+	}
 
 	return NOTIFY_DONE;
 }
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index f1a0bab920a5..3049fdcf8411 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -425,6 +425,7 @@ void xfrm_policy_destroy(struct xfrm_policy *policy)
 	if (del_timer(&policy->timer) || del_timer(&policy->polq.hold_timer))
 		BUG();
 
+	xfrm_dev_policy_free(policy);
 	call_rcu(&policy->rcu, xfrm_policy_destroy_rcu);
 }
 EXPORT_SYMBOL(xfrm_policy_destroy);
@@ -1769,12 +1770,40 @@ xfrm_policy_flush_secctx_check(struct net *net, u8 type, bool task_valid)
 	}
 	return err;
 }
+
+static inline int xfrm_dev_policy_flush_secctx_check(struct net *net,
+						     struct net_device *dev,
+						     bool task_valid)
+{
+	struct xfrm_policy *pol;
+	int err = 0;
+
+	list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) {
+		if (pol->walk.dead ||
+		    xfrm_policy_id2dir(pol->index) >= XFRM_POLICY_MAX ||
+		    pol->xdo.dev != dev)
+			continue;
+
+		err = security_xfrm_policy_delete(pol->security);
+		if (err) {
+			xfrm_audit_policy_delete(pol, 0, task_valid);
+			return err;
+		}
+	}
+	return err;
+}
 #else
 static inline int
 xfrm_policy_flush_secctx_check(struct net *net, u8 type, bool task_valid)
 {
 	return 0;
 }
+static inline int xfrm_dev_policy_flush_secctx_check(struct net *net,
+						     struct net_device *dev,
+						     bool task_valid)
+{
+	return 0;
+}
 #endif
 
 int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
@@ -1814,6 +1843,43 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
 }
 EXPORT_SYMBOL(xfrm_policy_flush);
 
+int xfrm_dev_policy_flush(struct net *net, struct net_device *dev, bool task_valid)
+{
+	int dir, err = 0, cnt = 0;
+	struct xfrm_policy *pol;
+
+	spin_lock_bh(&net->xfrm.xfrm_policy_lock);
+
+	err = xfrm_dev_policy_flush_secctx_check(net, dev, task_valid);
+	if (err)
+		goto out;
+
+again:
+	list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) {
+		dir = xfrm_policy_id2dir(pol->index);
+		if (pol->walk.dead ||
+		    dir >= XFRM_POLICY_MAX ||
+		    pol->xdo.dev != dev)
+			continue;
+
+		__xfrm_policy_unlink(pol, dir);
+		spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+		cnt++;
+		xfrm_audit_policy_delete(pol, 1, task_valid);
+		xfrm_policy_kill(pol);
+		spin_lock_bh(&net->xfrm.xfrm_policy_lock);
+		goto again;
+	}
+	if (cnt)
+		__xfrm_policy_inexact_flush(net);
+	else
+		err = -ESRCH;
+out:
+	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+	return err;
+}
+EXPORT_SYMBOL(xfrm_dev_policy_flush);
+
 int xfrm_policy_walk(struct net *net, struct xfrm_policy_walk *walk,
 		     int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
@@ -2246,6 +2312,7 @@ int xfrm_policy_delete(struct xfrm_policy *pol, int dir)
 	pol = __xfrm_policy_unlink(pol, dir);
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
 	if (pol) {
+		xfrm_dev_policy_delete(pol);
 		xfrm_policy_kill(pol);
 		return 0;
 	}
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 9c0aef815730..698ff84da6ba 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1747,6 +1747,14 @@ static struct xfrm_policy *xfrm_policy_construct(struct net *net, struct xfrm_us
 	if (attrs[XFRMA_IF_ID])
 		xp->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
 
+	/* configure the hardware if offload is requested */
+	if (attrs[XFRMA_OFFLOAD_DEV]) {
+		err = xfrm_dev_policy_add(net, xp,
+				nla_data(attrs[XFRMA_OFFLOAD_DEV]), p->dir);
+		if (err)
+			goto error;
+	}
+
 	return xp;
  error:
 	*errp = err;
@@ -1785,6 +1793,7 @@ static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 	xfrm_audit_policy_add(xp, err ? 0 : 1, true);
 
 	if (err) {
+		xfrm_dev_policy_delete(xp);
 		security_xfrm_policy_free(xp->security);
 		kfree(xp);
 		return err;
@@ -1897,6 +1906,8 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -3213,6 +3224,8 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -3331,6 +3344,8 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -3414,6 +3429,8 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err)
 		goto out_free_skb;
 
-- 
2.37.2



* Re: [PATCH net-next 01/10] net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending
From: Jan Karcher @ 2022-08-16  9:43 UTC (permalink / raw)
  To: D. Wythe, kgraul, wenjia; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
In-Reply-To: <075ff0be35660efac638448cdae7f7e7e04199d4.1660152975.git.alibuda@linux.alibaba.com>



On 10.08.2022 19:47, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch attempts to remove the locks named smc_client_lgr_pending and
> smc_server_lgr_pending, which aim to serialize the creation of link
> groups. However, once a link group already exists, those locks are
> meaningless; worse still, they force incoming connections to be
> queued one after the other.
> 
> Now, the creation of a link group is no longer decided by competition,
> but follows this strategy:
> 
> 1. Try to find a suitable link group. If that succeeds, the current
> connection is considered a NON first contact connection. Ends.
> 
> 2. Check the number of connections currently waiting for a suitable
> link group to be created. If it is not less than the number of link
> groups to be created multiplied by (SMC_RMBS_PER_LGR_MAX - 1), then
> increase the number of link groups to be created; the current connection
> is considered the first contact connection. Ends.
> 
> 3. Increase the number of connections currently waiting, and wait
> to be woken up.
> 
> 4. Decrease the number of connections currently waiting, goto 1.
> 
> We wake up the connection that was put to sleep in stage 3 through
> the SMC link state change event. Once the link moves out of the
> SMC_LNK_ACTIVATING state, decrease the number of link groups to
> be created, and then wake up at most (SMC_RMBS_PER_LGR_MAX - 1)
> connections.
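
A minimal userspace sketch of the accounting described in steps 1-4 above, 
assuming SMC_RMBS_PER_LGR_MAX is 255 as in net/smc/smc_core.h. The struct 
and function names are hypothetical and not part of the patch; locking and 
the actual sleep/wake-up are left out on purpose:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors SMC_RMBS_PER_LGR_MAX from net/smc/smc_core.h. */
#define RMBS_PER_LGR_MAX 255

/* Hypothetical model of the per-cluster counters (not the patch's struct). */
struct cluster_model {
	unsigned long lgrs_pending;	/* link groups currently being created */
	unsigned long conns_waiting;	/* connections sleeping in step 3 */
};

/* Step 2: returns true if the caller has to become a first contact,
 * false if it was queued as a waiter (step 3).
 */
static bool must_create_lgr(struct cluster_model *c)
{
	unsigned long capacity = c->lgrs_pending * (RMBS_PER_LGR_MAX - 1);

	if (c->conns_waiting >= capacity) {
		c->lgrs_pending++;
		return true;
	}
	c->conns_waiting++;
	return false;
}

/* Step 4: a waiter woke up (or timed out) and retries from step 1. */
static void waiter_woken(struct cluster_model *c)
{
	assert(c->conns_waiting > 0);
	c->conns_waiting--;
}

/* The link left the ACTIVATING state: one pending link group is done.
 * Returns the maximum number of waiters to wake up.
 */
static unsigned long lgr_ready(struct cluster_model *c)
{
	assert(c->lgrs_pending > 0);
	c->lgrs_pending--;
	return RMBS_PER_LGR_MAX - 1;
}
```

With this model, the first connection always becomes a first contact, and 
up to 254 followers wait for it instead of each creating a link group.
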
> 
> In the iplementation, we introduce the concept of lnk cluster, which is
> a collection of links with the same characteristics (see
> smcr_lnk_cluster_cmpfn() with more details), which makes it possible to
> wake up efficiently in the scenario of N v.s 1.
> 
> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
> ---
>   net/smc/af_smc.c   |  11 +-
>   net/smc/smc_core.c | 356 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   net/smc/smc_core.h |  48 ++++++++
>   net/smc/smc_llc.c  |   9 +-
>   4 files changed, 411 insertions(+), 13 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 79c1318..af4b0aa 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -1194,10 +1194,8 @@ static int smc_connect_rdma(struct smc_sock *smc,
>   	if (reason_code)
>   		return reason_code;
> 
> -	mutex_lock(&smc_client_lgr_pending);
>   	reason_code = smc_conn_create(smc, ini);
>   	if (reason_code) {
> -		mutex_unlock(&smc_client_lgr_pending);
>   		return reason_code;
>   	}
> 
> @@ -1289,7 +1287,6 @@ static int smc_connect_rdma(struct smc_sock *smc,
>   		if (reason_code)
>   			goto connect_abort;
>   	}
> -	mutex_unlock(&smc_client_lgr_pending);
> 
>   	smc_copy_sock_settings_to_clc(smc);
>   	smc->connect_nonblock = 0;
> @@ -1299,7 +1296,6 @@ static int smc_connect_rdma(struct smc_sock *smc,
>   	return 0;
>   connect_abort:
>   	smc_conn_abort(smc, ini->first_contact_local);
> -	mutex_unlock(&smc_client_lgr_pending);
>   	smc->connect_nonblock = 0;
> 
>   	return reason_code;


You are removing the locking mechanism from this function completely, 
which is fine because it is only called for an SMC-R connection.


> @@ -2377,7 +2373,8 @@ static void smc_listen_work(struct work_struct *work)
>   	if (rc)
>   		goto out_decl;
> 
> -	mutex_lock(&smc_server_lgr_pending);
> +	if (ini->is_smcd)
> +		mutex_lock(&smc_server_lgr_pending);
>   	smc_close_init(new_smc);
>   	smc_rx_init(new_smc);
>   	smc_tx_init(new_smc);
> @@ -2415,7 +2412,6 @@ static void smc_listen_work(struct work_struct *work)
>   					    ini->first_contact_local, ini);
>   		if (rc)
>   			goto out_unlock;
> -		mutex_unlock(&smc_server_lgr_pending);
>   	}
>   	smc_conn_save_peer_info(new_smc, cclc);
>   	smc_listen_out_connected(new_smc);
> @@ -2423,7 +2419,8 @@ static void smc_listen_work(struct work_struct *work)
>   	goto out_free;
> 
>   out_unlock:
> -	mutex_unlock(&smc_server_lgr_pending);
> +	if (ini->is_smcd)
> +		mutex_unlock(&smc_server_lgr_pending);


You want to remove the mutex lock for SMC-R, so you are only locking for 
an SMC-D connection. So far so good. I think you could also remove this 
unlock call, since it is only reached in the case of an SMC-R connection, 
which means it is obsolete:

l2398 ff. (with your patch on net-next)

     /* receive SMC Confirm CLC message */
     memset(buf, 0, sizeof(*buf));
     cclc = (struct smc_clc_msg_accept_confirm *)buf;
     rc = smc_clc_wait_msg(new_smc, cclc, sizeof(*buf),
                   SMC_CLC_CONFIRM, CLC_WAIT_TIME);
     if (rc) {
x        if (!ini->is_smcd)
x            goto out_unlock;
         goto out_decl;
     }

>   out_decl:
>   	smc_listen_decline(new_smc, rc, ini ? ini->first_contact_local : 0,
>   			   proposal_version);
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index ff49a11..a3338cc 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -46,6 +46,10 @@ struct smc_lgr_list smc_lgr_list = {	/* established link groups */
>   	.num = 0,
>   };
> 
> +struct smc_lgr_manager smc_lgr_manager = {
> +	.lock = __SPIN_LOCK_UNLOCKED(smc_lgr_manager.lock),
> +};
> +
>   static atomic_t lgr_cnt = ATOMIC_INIT(0); /* number of existing link groups */
>   static DECLARE_WAIT_QUEUE_HEAD(lgrs_deleted);
> 
> @@ -55,6 +59,282 @@ static void smc_buf_free(struct smc_link_group *lgr, bool is_rmb,
> 
>   static void smc_link_down_work(struct work_struct *work);
> 
> +/* SMC-R lnk cluster compare func
> + * All lnks that meet the description conditions of this function
> + * are logically aggregated, called lnk cluster.
> + * For the server side, lnk cluster is used to determine whether
> + * a new group needs to be created when processing new imcoming connections.
> + * For the client side, lnk cluster is used to determine whether
> + * to wait for link ready (in other words, first contact ready).
> + */
> +static int smcr_lnk_cluster_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> +	const struct smc_lnk_cluster_compare_arg *key = arg->key;
> +	const struct smc_lnk_cluster *lnkc = obj;
> +
> +	if (memcmp(key->peer_systemid, lnkc->peer_systemid, SMC_SYSTEMID_LEN))
> +		return 1;
> +
> +	if (memcmp(key->peer_gid, lnkc->peer_gid, SMC_GID_SIZE))
> +		return 1;
> +
> +	if ((key->role == SMC_SERV || key->clcqpn == lnkc->clcqpn) &&
> +	    (key->smcr_version == SMC_V2 ||
> +	    !memcmp(key->peer_mac, lnkc->peer_mac, ETH_ALEN)))
> +		return 0;
> +
> +	return 1;
> +}
> +
> +/* SMC-R lnk cluster hash func */
> +static u32 smcr_lnk_cluster_hashfn(const void *data, u32 len, u32 seed)
> +{
> +	const struct smc_lnk_cluster *lnkc = data;
> +
> +	return jhash2((u32 *)lnkc->peer_systemid, SMC_SYSTEMID_LEN / sizeof(u32), seed)
> +		+ (lnkc->role == SMC_SERV) ? 0 : lnkc->clcqpn;
> +}
> +
> +/* SMC-R lnk cluster compare arg hash func */
> +static u32 smcr_lnk_cluster_compare_arg_hashfn(const void *data, u32 len, u32 seed)
> +{
> +	const struct smc_lnk_cluster_compare_arg *key = data;
> +
> +	return jhash2((u32 *)key->peer_systemid, SMC_SYSTEMID_LEN / sizeof(u32), seed)
> +		+ (key->role == SMC_SERV) ? 0 : key->clcqpn;
> +}
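
One thing that may be worth double-checking in both hash functions: in C, 
'+' binds tighter than '?:', so the return expression selects between 0 and 
clcqpn instead of adding to the jhash2() result. A small sketch with 
stand-in names (the enum and both helpers are hypothetical; h stands for 
the jhash2() value):

```c
#include <stdint.h>

enum role { ROLE_CLNT, ROLE_SERV };	/* stand-ins for SMC_CLNT / SMC_SERV */

/* How "h + (role == ROLE_SERV) ? 0 : qpn" is parsed by the compiler:
 * the sum is used as the condition of the conditional operator.
 */
static uint32_t hash_as_parsed(uint32_t h, enum role role, uint32_t qpn)
{
	return (h + (role == ROLE_SERV)) ? 0 : qpn;
}

/* Presumably intended: add the qpn term (or 0 for servers) to the hash. */
static uint32_t hash_with_parens(uint32_t h, enum role role, uint32_t qpn)
{
	return h + ((role == ROLE_SERV) ? 0 : qpn);
}
```

Since both functions use the same expression, lookups stay consistent, but 
nearly every key would hash to 0 and end up in the same bucket.
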
> +
> +static const struct rhashtable_params smcr_lnk_cluster_rhl_params = {
> +	.head_offset = offsetof(struct smc_lnk_cluster, rnode),
> +	.key_len = sizeof(struct smc_lnk_cluster_compare_arg),
> +	.obj_cmpfn = smcr_lnk_cluster_cmpfn,
> +	.obj_hashfn = smcr_lnk_cluster_hashfn,
> +	.hashfn = smcr_lnk_cluster_compare_arg_hashfn,
> +	.automatic_shrinking = true,
> +};
> +
> +/* hold a reference for smc_lnk_cluster */
> +static inline void smc_lnk_cluster_hold(struct smc_lnk_cluster *lnkc)
> +{
> +	if (likely(lnkc))
> +		refcount_inc(&lnkc->ref);
> +}
> +
> +/* release a reference for smc_lnk_cluster */
> +static inline void smc_lnk_cluster_put(struct smc_lnk_cluster *lnkc)
> +{
> +	bool do_free = false;
> +
> +	if (!lnkc)
> +		return;
> +
> +	if (refcount_dec_not_one(&lnkc->ref))
> +		return;
> +
> +	spin_lock_bh(&smc_lgr_manager.lock);
> +	/* last ref */
> +	if (refcount_dec_and_test(&lnkc->ref)) {
> +		do_free = true;
> +		rhashtable_remove_fast(&smc_lgr_manager.lnk_cluster_maps, &lnkc->rnode,
> +				       smcr_lnk_cluster_rhl_params);
> +	}
> +	spin_unlock_bh(&smc_lgr_manager.lock);
> +	if (do_free)
> +		kfree(lnkc);
> +}
> +
> +/* Get or create smc_lnk_cluster by key
> + * This function will hold a reference of returned smc_lnk_cluster
> + * or create a new smc_lnk_cluster with the reference initialized to 1。
> + * caller MUST call smc_lnk_cluster_put after this.
> + */
> +static inline struct smc_lnk_cluster *
> +smcr_lnk_get_or_create_cluster(struct smc_lnk_cluster_compare_arg *key)
> +{
> +	struct smc_lnk_cluster *lnkc, *tmp_lnkc;
> +	bool busy_retry;
> +	int err;
> +
> +	/* serving a hardware or software interrupt, or preemption is disabled */
> +	busy_retry = !in_interrupt();
> +
> +	spin_lock_bh(&smc_lgr_manager.lock);
> +	lnkc = rhashtable_lookup_fast(&smc_lgr_manager.lnk_cluster_maps, key,
> +				      smcr_lnk_cluster_rhl_params);
> +	if (!lnkc) {
> +		lnkc = kzalloc(sizeof(*lnkc), GFP_ATOMIC);
> +		if (unlikely(!lnkc))
> +			goto fail;
> +
> +		/* init cluster */
> +		spin_lock_init(&lnkc->lock);
> +		lnkc->role = key->role;
> +		if (key->role == SMC_CLNT)
> +			lnkc->clcqpn = key->clcqpn;
> +		init_waitqueue_head(&lnkc->first_contact_waitqueue);
> +		memcpy(lnkc->peer_systemid, key->peer_systemid, SMC_SYSTEMID_LEN);
> +		memcpy(lnkc->peer_gid, key->peer_gid, SMC_GID_SIZE);
> +		memcpy(lnkc->peer_mac, key->peer_mac, ETH_ALEN);
> +		refcount_set(&lnkc->ref, 1);
> +
> +		do {
> +			err = rhashtable_insert_fast(&smc_lgr_manager.lnk_cluster_maps,
> +						     &lnkc->rnode, smcr_lnk_cluster_rhl_params);
> +
> +			/* success or fatal error */
> +			if (err != -EBUSY)
> +				break;
> +
> +			/* impossible in fact right now */
> +			if (unlikely(!busy_retry)) {
> +				pr_warn_ratelimited("smc: create lnk cluster in softirq\n");
> +				break;
> +			}
> +
> +			spin_unlock_bh(&smc_lgr_manager.lock);
> +			/* yeild */
> +			cond_resched();
> +			spin_lock_bh(&smc_lgr_manager.lock);
> +
> +			/* after spin_unlock_bh(), lnk_cluster_maps may be changed */
> +			tmp_lnkc = rhashtable_lookup_fast(&smc_lgr_manager.lnk_cluster_maps, key,
> +							  smcr_lnk_cluster_rhl_params);
> +
> +			if (unlikely(tmp_lnkc)) {
> +				pr_warn_ratelimited("smc: create cluster failed dues to duplicat key");
> +				kfree(lnkc);
> +				lnkc = NULL;
> +				goto fail;
> +			}
> +		} while (1);
> +
> +		if (unlikely(err)) {
> +			pr_warn_ratelimited("smc: rhashtable_insert_fast failed (%d)", err);
> +			kfree(lnkc);
> +			lnkc = NULL;
> +		}
> +	} else {
> +		smc_lnk_cluster_hold(lnkc);
> +	}
> +fail:
> +	spin_unlock_bh(&smc_lgr_manager.lock);
> +	return lnkc;
> +}
> +
> +/* Get or create a smc_lnk_cluster by lnk
> + * caller MUST call smc_lnk_cluster_put after this.
> + */
> +static inline struct smc_lnk_cluster *smcr_lnk_get_cluster(struct smc_link *lnk)
> +{
> +	struct smc_lnk_cluster_compare_arg key;
> +	struct smc_link_group *lgr;
> +
> +	lgr = lnk->lgr;
> +	if (!lgr || lgr->is_smcd)
> +		return NULL;
> +
> +	key.smcr_version = lgr->smc_version;
> +	key.peer_systemid = lgr->peer_systemid;
> +	key.peer_gid = lnk->peer_gid;
> +	key.peer_mac = lnk->peer_mac;
> +	key.role	 = lgr->role;
> +	if (key.role == SMC_CLNT)
> +		key.clcqpn = lnk->peer_qpn;
> +
> +	return smcr_lnk_get_or_create_cluster(&key);
> +}
> +
> +/* Get or create a smc_lnk_cluster by ini
> + * caller MUST call smc_lnk_cluster_put after this.
> + */
> +static inline struct smc_lnk_cluster *
> +smcr_lnk_get_cluster_by_ini(struct smc_init_info *ini, int role)
> +{
> +	struct smc_lnk_cluster_compare_arg key;
> +
> +	if (ini->is_smcd)
> +		return NULL;
> +
> +	key.smcr_version = ini->smcr_version;
> +	key.peer_systemid = ini->peer_systemid;
> +	key.peer_gid = ini->peer_gid;
> +	key.peer_mac = ini->peer_mac;
> +	key.role	= role;
> +	if (role == SMC_CLNT)
> +		key.clcqpn	= ini->ib_clcqpn;
> +
> +	return smcr_lnk_get_or_create_cluster(&key);
> +}
> +
> +/* callback when smc link state change */
> +void smcr_lnk_cluster_on_lnk_state(struct smc_link *lnk)
> +{
> +	struct smc_lnk_cluster *lnkc;
> +	int nr = 0;
> +
> +	/* barrier for lnk->state */
> +	smp_mb();
> +
> +	/* only first link can made connections block on
> +	 * first_contact_waitqueue
> +	 */
> +	if (lnk->link_idx != SMC_SINGLE_LINK)
> +		return;
> +
> +	/* state already seen  */
> +	if (lnk->state_record & SMC_LNK_STATE_BIT(lnk->state))
> +		return;
> +
> +	lnkc = smcr_lnk_get_cluster(lnk);
> +
> +	if (unlikely(!lnkc))
> +		return;
> +
> +	spin_lock_bh(&lnkc->lock);
> +
> +	/* all lnk state change should be
> +	 * 1. SMC_LNK_UNUSED -> SMC_LNK_TEAR_DWON (link init failed)

Should this really be DWON and not DOWN?

> +	 * 2. SMC_LNK_UNUSED -> SMC_LNK_ACTIVATING -> SMC_LNK_TEAR_DWON
> +	 * 3. SMC_LNK_UNUSED -> SMC_LNK_ACTIVATING -> SMC_LNK_INACTIVE -> SMC_LNK_TEAR_DWON
> +	 * 4. SMC_LNK_UNUSED -> SMC_LNK_ACTIVATING -> SMC_LNK_INACTIVE -> SMC_LNK_TEAR_DWON
> +	 * 5. SMC_LNK_UNUSED -> SMC_LNK_ATIVATING -> SMC_LNK_ACTIVE ->SMC_LNK_INACTIVE
> +	 * -> SMC_LNK_TEAR_DWON
> +	 */
> +	switch (lnk->state) {
> +	case SMC_LNK_ACTIVATING:
> +		/* It's safe to hold a reference without lock
> +		 * dues to the smcr_lnk_get_cluster already hold one
> +		 */
> +		smc_lnk_cluster_hold(lnkc);
> +		break;
> +	case SMC_LNK_TEAR_DWON:
> +		if (lnk->state_record & SMC_LNK_STATE_BIT(SMC_LNK_ACTIVATING))
> +			/* smc_lnk_cluster_hold in SMC_LNK_ACTIVATING */
> +			smc_lnk_cluster_put(lnkc);
> +		fallthrough;
> +	case SMC_LNK_ACTIVE:
> +	case SMC_LNK_INACTIVE:
> +		if (!(lnk->state_record &
> +			(SMC_LNK_STATE_BIT(SMC_LNK_ACTIVE)
> +			| SMC_LNK_STATE_BIT(SMC_LNK_INACTIVE)))) {
> +			lnkc->pending_capability -= (SMC_RMBS_PER_LGR_MAX - 1);
> +			/* TODO: wakeup just one to perfrom first contact
> +			 * if record state has no SMC_LNK_ACTIVE
> +			 */


There is still a TODO left in this patch.

> +			nr = SMC_RMBS_PER_LGR_MAX - 1;
> +		}
> +		break;
> +	case SMC_LNK_UNUSED:
> +		pr_warn_ratelimited("net/smc: invalid lnk state. ");
> +		break;
> +	}
> +	SMC_LNK_STATE_RECORD(lnk, lnk->state);
> +	spin_unlock_bh(&lnkc->lock);
> +	if (nr)
> +		wake_up_nr(&lnkc->first_contact_waitqueue, nr);
> +	smc_lnk_cluster_put(lnkc);	/* smc_lnk_cluster_hold in smcr_lnk_get_cluster */
> +}
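
For readers following the state_record handling above, a minimal sketch of 
the "state already seen" check, assuming the enum ordering from the 
smc_core.h hunk of this patch. The type and function names are hypothetical 
(with TEAR_DOWN spelled out), and no locking is modeled:

```c
#include <stdbool.h>

/* Same ordering as enum smc_link_state after this patch. */
enum lnk_state {
	LNK_UNUSED,
	LNK_INACTIVE,
	LNK_ACTIVATING,
	LNK_ACTIVE,
	LNK_TEAR_DOWN,
};

#define LNK_STATE_BIT(state)	(1 << (state))

/* Hypothetical stand-in for the two struct smc_link fields involved. */
struct lnk_model {
	enum lnk_state state;
	int state_record;	/* one bit per state already processed */
};

/* Returns true the first time a state is reported, false on repeats. */
static bool lnk_state_first_seen(struct lnk_model *lnk)
{
	if (lnk->state_record & LNK_STATE_BIT(lnk->state))
		return false;
	lnk->state_record |= LNK_STATE_BIT(lnk->state);
	return true;
}
```

Each state is therefore acted on at most once per link, no matter how many 
call sites report it.
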
> +
>   /* return head of link group list and its lock for a given link group */
>   static inline struct list_head *smc_lgr_list_head(struct smc_link_group *lgr,
>   						  spinlock_t **lgr_lock)
> @@ -651,8 +931,10 @@ static void smcr_lgr_link_deactivate_all(struct smc_link_group *lgr)
>   	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
>   		struct smc_link *lnk = &lgr->lnk[i];
> 
> -		if (smc_link_sendable(lnk))
> +		if (smc_link_sendable(lnk)) {
>   			lnk->state = SMC_LNK_INACTIVE;
> +			smcr_lnk_cluster_on_lnk_state(lnk);
> +		}
>   	}
>   	wake_up_all(&lgr->llc_msg_waiter);
>   	wake_up_all(&lgr->llc_flow_waiter);
> @@ -762,6 +1044,9 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
>   	atomic_set(&lnk->conn_cnt, 0);
>   	smc_llc_link_set_uid(lnk);
>   	INIT_WORK(&lnk->link_down_wrk, smc_link_down_work);
> +	lnk->peer_qpn = ini->ib_clcqpn;
> +	memcpy(lnk->peer_gid, ini->peer_gid, SMC_GID_SIZE);
> +	memcpy(lnk->peer_mac, ini->peer_mac, sizeof(lnk->peer_mac));
>   	if (!lnk->smcibdev->initialized) {
>   		rc = (int)smc_ib_setup_per_ibdev(lnk->smcibdev);
>   		if (rc)
> @@ -792,6 +1077,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
>   	if (rc)
>   		goto destroy_qp;
>   	lnk->state = SMC_LNK_ACTIVATING;
> +	smcr_lnk_cluster_on_lnk_state(lnk);
>   	return 0;
> 
>   destroy_qp:
> @@ -806,6 +1092,8 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
>   	smc_ibdev_cnt_dec(lnk);
>   	put_device(&lnk->smcibdev->ibdev->dev);
>   	smcibdev = lnk->smcibdev;
> +	lnk->state = SMC_LNK_TEAR_DWON;
> +	smcr_lnk_cluster_on_lnk_state(lnk);
>   	memset(lnk, 0, sizeof(struct smc_link));
>   	lnk->state = SMC_LNK_UNUSED;
>   	if (!atomic_dec_return(&smcibdev->lnk_cnt))
> @@ -1263,6 +1551,8 @@ void smcr_link_clear(struct smc_link *lnk, bool log)
>   	if (!lnk->lgr || lnk->clearing ||
>   	    lnk->state == SMC_LNK_UNUSED)
>   		return;
> +	lnk->state = SMC_LNK_TEAR_DWON;
> +	smcr_lnk_cluster_on_lnk_state(lnk);
>   	lnk->clearing = 1;
>   	lnk->peer_qpn = 0;
>   	smc_llc_link_clear(lnk, log);
> @@ -1712,6 +2002,7 @@ void smcr_link_down_cond(struct smc_link *lnk)
>   {
>   	if (smc_link_downing(&lnk->state)) {
>   		trace_smcr_link_down(lnk, __builtin_return_address(0));
> +		smcr_lnk_cluster_on_lnk_state(lnk);
>   		smcr_link_down(lnk);
>   	}
>   }
> @@ -1721,6 +2012,7 @@ void smcr_link_down_cond_sched(struct smc_link *lnk)
>   {
>   	if (smc_link_downing(&lnk->state)) {
>   		trace_smcr_link_down(lnk, __builtin_return_address(0));
> +		smcr_lnk_cluster_on_lnk_state(lnk);
>   		schedule_work(&lnk->link_down_wrk);
>   	}
>   }
> @@ -1850,11 +2142,13 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
>   {
>   	struct smc_connection *conn = &smc->conn;
>   	struct net *net = sock_net(&smc->sk);
> +	DECLARE_WAITQUEUE(wait, current);
> +	struct smc_lnk_cluster *lnkc = NULL;

lnkc is initialized to NULL.

>   	struct list_head *lgr_list;
>   	struct smc_link_group *lgr;
>   	enum smc_lgr_role role;
>   	spinlock_t *lgr_lock;
> -	int rc = 0;
> +	int rc = 0, timeo = CLC_WAIT_TIME;
> 
>   	lgr_list = ini->is_smcd ? &ini->ism_dev[ini->ism_selected]->lgr_list :
>   				  &smc_lgr_list.list;
> @@ -1862,12 +2156,26 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
>   				  &smc_lgr_list.lock;
>   	ini->first_contact_local = 1;
>   	role = smc->listen_smc ? SMC_SERV : SMC_CLNT;
> -	if (role == SMC_CLNT && ini->first_contact_peer)
> +
> +	if (!ini->is_smcd) {
> +		lnkc = smcr_lnk_get_cluster_by_ini(ini, role);

Here lnkc is only set if it is SMC-R.

> +		if (unlikely(!lnkc))
> +			return SMC_CLC_DECL_INTERR;
> +	}
> +
> +	if (role == SMC_CLNT && ini->first_contact_peer) {
> +		/* first_contact */
> +		spin_lock_bh(&lnkc->lock);

And here SMC-D dies because lnkc is still NULL. This kills our systems 
if we try to talk via SMC-D.

[  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
[  779.516391] Fault in home space mode while using kernel ASCE.
[  779.516395] AS:0000000069628007 R3:00000000ffbf0007 S:00000000ffbef800 P:000000000000003d
[  779.516431] Oops: 0004 ilc:2 [#1] SMP
[  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib ib_uverbs mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables n
[  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 5.19.0-13940-g22a46254655a #3
[  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)

[  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
[  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c (smc_conn_create+0x174/0x968 [smc])
[  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 0000000000000001 0000000000000000
[  779.522773]            000000008a4128a0 000003ff803f21aa 000000008e30d640 0000000086d72000
[  779.522776]            0000000086d72000 000000008a412803 000000008a412800 000000008e30d650
[  779.522779]            0000000080934200 0000000000000000 000003ff803cb954 00000380002dfa88
[  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg %r1,232(%r15)
[  779.522789]            000003ff803da894: a7180000            lhi %r1,0
[  779.522789]           #000003ff803da898: 582003ac            l %r2,940
[  779.522789]           >000003ff803da89c: ba123020            cs %r1,%r2,32(%r3)
[  779.522789]            000003ff803da8a0: ec1603be007e        cij %r1,0,6,000003ff803db01c
[  779.522789]            000003ff803da8a6: 4110b002            la %r1,2(%r11)
[  779.522789]            000003ff803da8aa: e310f0f00024        stg %r1,240(%r15)
[  779.522789]            000003ff803da8b0: e310f0c00004        lg %r1,192(%r15)
[  779.522870] Call Trace:
[  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
[  779.522884]  [<000003ff803cb954>] smc_find_ism_v2_device_serv+0x1b4/0x300 [smc]
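
One possible way out, sketched in userspace C under the assumption that 
SMC-D simply skips the cluster accounting. All names below are hypothetical 
stand-ins, and the locking only appears as a comment. The same guard would 
be needed around the unconditional spin_lock(&lnkc->lock) and 
spin_unlock(&lnkc->lock) further down in smc_conn_create():

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors SMC_RMBS_PER_LGR_MAX from net/smc/smc_core.h. */
#define RMBS_PER_LGR_MAX 255

/* Hypothetical stand-ins for struct smc_lnk_cluster / struct smc_init_info. */
struct cluster_model {
	unsigned long pending_capability;
};

struct ini_model {
	bool is_smcd;
};

/* Models smcr_lnk_get_cluster_by_ini(): SMC-D never gets a cluster. */
static struct cluster_model *get_cluster(const struct ini_model *ini,
					 struct cluster_model *smcr_cluster)
{
	return ini->is_smcd ? NULL : smcr_cluster;
}

/* First-contact accounting that tolerates lnkc == NULL.
 * In the kernel, the update would sit between spin_lock_bh(&lnkc->lock)
 * and spin_unlock_bh(&lnkc->lock).
 * Returns true if the cluster counter was touched.
 */
static bool account_first_contact(struct cluster_model *lnkc)
{
	if (!lnkc)	/* SMC-D: nothing to account, nothing to lock */
		return false;
	lnkc->pending_capability += RMBS_PER_LGR_MAX - 1;
	return true;
}
```

Checking ini->is_smcd instead of the pointer would work as well; the 
pointer check just keeps the guard next to the dereference.
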

> +		lnkc->pending_capability += (SMC_RMBS_PER_LGR_MAX - 1);
> +		spin_unlock_bh(&lnkc->lock);
>   		/* create new link group as well */
>   		goto create;
> +	}
> 
>   	/* determine if an existing link group can be reused */
>   	spin_lock_bh(lgr_lock);
> +	spin_lock(&lnkc->lock);
> +again:
>   	list_for_each_entry(lgr, lgr_list, list) {
>   		write_lock_bh(&lgr->conns_lock);
>   		if ((ini->is_smcd ?
> @@ -1894,9 +2202,33 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
>   		}
>   		write_unlock_bh(&lgr->conns_lock);
>   	}
> +	if (lnkc && ini->first_contact_local) {
> +		if (lnkc->pending_capability > lnkc->conns_pending) {
> +			lnkc->conns_pending++;
> +			add_wait_queue(&lnkc->first_contact_waitqueue, &wait);
> +			spin_unlock(&lnkc->lock);
> +			spin_unlock_bh(lgr_lock);
> +			set_current_state(TASK_INTERRUPTIBLE);
> +			/* need to wait at least once first contact done */
> +			timeo = schedule_timeout(timeo);
> +			set_current_state(TASK_RUNNING);
> +			remove_wait_queue(&lnkc->first_contact_waitqueue, &wait);
> +			spin_lock_bh(lgr_lock);
> +			spin_lock(&lnkc->lock);
> +
> +			lnkc->conns_pending--;
> +			if (timeo)
> +				goto again;
> +		}
> +		if (role == SMC_SERV) {
> +			/* first_contact */
> +			lnkc->pending_capability += (SMC_RMBS_PER_LGR_MAX - 1);
> +		}
> +	}
> +	spin_unlock(&lnkc->lock);
>   	spin_unlock_bh(lgr_lock);
>   	if (rc)
> -		return rc;
> +		goto out;
> 
>   	if (role == SMC_CLNT && !ini->first_contact_peer &&
>   	    ini->first_contact_local) {
> @@ -1904,7 +2236,8 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
>   		 * a new one
>   		 * send out_of_sync decline, reason synchr. error
>   		 */
> -		return SMC_CLC_DECL_SYNCERR;
> +		rc = SMC_CLC_DECL_SYNCERR;
> +		goto out;
>   	}
> 
>   create:
> @@ -1941,6 +2274,8 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
>   #endif
> 
>   out:
> +	/* smc_lnk_cluster_hold in smcr_lnk_get_or_create_cluster */
> +	smc_lnk_cluster_put(lnkc);
>   	return rc;
>   }
> 
> @@ -2599,12 +2934,23 @@ static int smc_core_reboot_event(struct notifier_block *this,
> 
>   int __init smc_core_init(void)
>   {
> +	/* init smc lnk cluster maps */
> +	rhashtable_init(&smc_lgr_manager.lnk_cluster_maps, &smcr_lnk_cluster_rhl_params);
>   	return register_reboot_notifier(&smc_reboot_notifier);
>   }
> 
> +static void smc_lnk_cluster_free_cb(void *ptr, void *arg)
> +{
> +	pr_warn("smc: smc lnk cluster refcnt leak.\n");
> +	kfree(ptr);
> +}
> +
>   /* Called (from smc_exit) when module is removed */
>   void smc_core_exit(void)
>   {
>   	unregister_reboot_notifier(&smc_reboot_notifier);
>   	smc_lgrs_shutdown();
> +	/* destroy smc lnk cluster maps */
> +	rhashtable_free_and_destroy(&smc_lgr_manager.lnk_cluster_maps, smc_lnk_cluster_free_cb,
> +				    NULL);
>   }
> diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
> index fe8b524..199f533 100644
> --- a/net/smc/smc_core.h
> +++ b/net/smc/smc_core.h
> @@ -15,6 +15,7 @@
>   #include <linux/atomic.h>
>   #include <linux/smc.h>
>   #include <linux/pci.h>
> +#include <linux/rhashtable.h>
>   #include <rdma/ib_verbs.h>
>   #include <net/genetlink.h>
> 
> @@ -29,18 +30,62 @@ struct smc_lgr_list {			/* list of link group definition */
>   	u32			num;	/* unique link group number */
>   };
> 
> +struct smc_lgr_manager {		/* manager for link group */
> +	struct rhashtable	lnk_cluster_maps;	/* maps of smc_lnk_cluster */
> +	spinlock_t		lock;	/* lock for lgr_cm_maps */
> +};
> +
> +struct smc_lnk_cluster {
> +	struct rhash_head	rnode;	/* node for rhashtable */
> +	struct wait_queue_head	first_contact_waitqueue;
> +					/* queue for non first contact to wait
> +					 * first contact to be established.
> +					 */
> +	spinlock_t		lock;	/* protection for link group */
> +	refcount_t		ref;	/* refcount for cluster */
> +	unsigned long		pending_capability;
> +					/* maximum pending number of connections that
> +					 * need wait first contact complete.
> +					 */
> +	unsigned long		conns_pending;
> +					/* connections that are waiting for first contact
> +					 * complete
> +					 */
> +	u8		peer_systemid[SMC_SYSTEMID_LEN];
> +	u8		peer_mac[ETH_ALEN];	/* = gid[8:10||13:15] */
> +	u8		peer_gid[SMC_GID_SIZE];	/* gid of peer*/
> +	int		clcqpn;
> +	int		role;
> +};
> +
>   enum smc_lgr_role {		/* possible roles of a link group */
>   	SMC_CLNT,	/* client */
>   	SMC_SERV	/* server */
>   };
> 
> +struct smc_lnk_cluster_compare_arg	/* key for smc_lnk_cluster */
> +{
> +	int	smcr_version;
> +	enum smc_lgr_role role;
> +	u8	*peer_systemid;
> +	u8	*peer_gid;
> +	u8	*peer_mac;
> +	int clcqpn;
> +};
> +
>   enum smc_link_state {			/* possible states of a link */
>   	SMC_LNK_UNUSED,		/* link is unused */
>   	SMC_LNK_INACTIVE,	/* link is inactive */
>   	SMC_LNK_ACTIVATING,	/* link is being activated */
>   	SMC_LNK_ACTIVE,		/* link is active */
> +	SMC_LNK_TEAR_DWON,	/* link is tear down */
>   };
> 
> +#define SMC_LNK_STATE_BIT(state)	(1 << (state))
> +
> +#define	SMC_LNK_STATE_RECORD(lnk, state)	\
> +	((lnk)->state_record |= SMC_LNK_STATE_BIT(state))
> +
>   #define SMC_WR_BUF_SIZE		48	/* size of work request buffer */
>   #define SMC_WR_BUF_V2_SIZE	8192	/* size of v2 work request buffer */
> 
> @@ -145,6 +190,7 @@ struct smc_link {
>   	int			ndev_ifidx; /* network device ifindex */
> 
>   	enum smc_link_state	state;		/* state of link */
> +	int			state_record;		/* record of previous state */
>   	struct delayed_work	llc_testlink_wrk; /* testlink worker */
>   	struct completion	llc_testlink_resp; /* wait for rx of testlink */
>   	int			llc_testlink_time; /* testlink interval */
> @@ -557,6 +603,8 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
>   int smcr_nl_get_link(struct sk_buff *skb, struct netlink_callback *cb);
>   int smcd_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb);
> 
> +void smcr_lnk_cluster_on_lnk_state(struct smc_link *lnk);
> +
>   static inline struct smc_link_group *smc_get_lgr(struct smc_link *link)
>   {
>   	return link->lgr;
> diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
> index 175026a..8134c15 100644
> --- a/net/smc/smc_llc.c
> +++ b/net/smc/smc_llc.c
> @@ -1099,6 +1099,7 @@ int smc_llc_cli_add_link(struct smc_link *link, struct smc_llc_qentry *qentry)
>   		goto out;
>   out_clear_lnk:
>   	lnk_new->state = SMC_LNK_INACTIVE;
> +	smcr_lnk_cluster_on_lnk_state(lnk_new);
>   	smcr_link_clear(lnk_new, false);
>   out_reject:
>   	smc_llc_cli_add_link_reject(qentry);
> @@ -1278,6 +1279,7 @@ static void smc_llc_delete_asym_link(struct smc_link_group *lgr)
>   		return; /* no asymmetric link */
>   	if (!smc_link_downing(&lnk_asym->state))
>   		return;
> +	smcr_lnk_cluster_on_lnk_state(lnk_asym);
>   	lnk_new = smc_switch_conns(lgr, lnk_asym, false);
>   	smc_wr_tx_wait_no_pending_sends(lnk_asym);
>   	if (!lnk_new)
> @@ -1492,6 +1494,7 @@ int smc_llc_srv_add_link(struct smc_link *link,
>   out_err:
>   	if (link_new) {
>   		link_new->state = SMC_LNK_INACTIVE;
> +		smcr_lnk_cluster_on_lnk_state(link_new);
>   		smcr_link_clear(link_new, false);
>   	}
>   out:
> @@ -1602,8 +1605,10 @@ static void smc_llc_process_cli_delete_link(struct smc_link_group *lgr)
>   	del_llc->reason = 0;
>   	smc_llc_send_message(lnk, &qentry->msg); /* response */
> 
> -	if (smc_link_downing(&lnk_del->state))
> +	if (smc_link_downing(&lnk_del->state)) {
> +		smcr_lnk_cluster_on_lnk_state(lnk);
>   		smc_switch_conns(lgr, lnk_del, false);
> +	}
>   	smcr_link_clear(lnk_del, true);
> 
>   	active_links = smc_llc_active_link_count(lgr);
> @@ -1676,6 +1681,7 @@ static void smc_llc_process_srv_delete_link(struct smc_link_group *lgr)
>   		goto out; /* asymmetric link already deleted */
> 
>   	if (smc_link_downing(&lnk_del->state)) {
> +		smcr_lnk_cluster_on_lnk_state(lnk);
>   		if (smc_switch_conns(lgr, lnk_del, false))
>   			smc_wr_tx_wait_no_pending_sends(lnk_del);
>   	}
> @@ -2167,6 +2173,7 @@ void smc_llc_link_active(struct smc_link *link)
>   		schedule_delayed_work(&link->llc_testlink_wrk,
>   				      link->llc_testlink_time);
>   	}
> +	smcr_lnk_cluster_on_lnk_state(link);
>   }
> 
>   /* called in worker context */


* Re: [PATCH bpf-next] xdp: report rx queue index in xdp_frame
From: Lorenzo Bianconi @ 2022-08-16  9:45 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, ast, daniel, andrii, netdev, davem, kuba, edumazet, pabeni,
	hawk, john.fastabend
In-Reply-To: <181f994e13c816116fa69a1e92c2f69e6330f749.1658746417.git.lorenzo@kernel.org>


> Report rx queue index in xdp_frame according to the xdp_buff xdp_rxq_info
> pointer. xdp_frame queue_index is currently used in cpumap code to covert
> the xdp_frame into a xdp_buff.
> xdp_frame size is not increased adding queue_index since an alignment padding
> in the structure is used to insert queue_index field.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h   | 2 ++
>  kernel/bpf/cpumap.c | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)


Hi Alexei and Daniel,

this patch is marked as 'new, archived' in patchwork.
Do I need to rebase and repost it?

Regards,
Lorenzo

> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 04c852c7a77f..3567866b0af5 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -172,6 +172,7 @@ struct xdp_frame {
>  	struct xdp_mem_info mem;
>  	struct net_device *dev_rx; /* used by cpumap */
>  	u32 flags; /* supported values defined in xdp_buff_flags */
> +	u32 queue_index;
>  };
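
The "size is not increased" claim from the commit message can be 
illustrated with a miniature struct. This is a sketch only: the two structs 
below are hypothetical and do not reproduce the real struct xdp_frame 
layout, and the equality assumes an LP64 target with 8-byte pointers:

```c
#include <stdint.h>

/* Hypothetical miniature of the tail of the struct before the patch. */
struct frame_before {
	void *dev_rx;
	uint32_t flags;
	/* 4 bytes of tail padding on LP64 to keep 8-byte alignment */
};

/* Same struct with the new member placed in the former padding. */
struct frame_after {
	void *dev_rx;
	uint32_t flags;
	uint32_t queue_index;
};
```

On such a target, sizeof(struct frame_before) and 
sizeof(struct frame_after) are both 16.
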
>  
>  static __always_inline bool xdp_frame_has_frags(struct xdp_frame *frame)
> @@ -301,6 +302,7 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
>  
>  	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
>  	xdp_frame->mem = xdp->rxq->mem;
> +	xdp_frame->queue_index = xdp->rxq->queue_index;
>  
>  	return xdp_frame;
>  }
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index f4860ac756cd..09a792d088b3 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -228,7 +228,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
>  
>  		rxq.dev = xdpf->dev_rx;
>  		rxq.mem = xdpf->mem;
> -		/* TODO: report queue_index to xdp_rxq_info */
> +		rxq.queue_index = xdpf->queue_index;
>  
>  		xdp_convert_frame_to_buff(xdpf, &xdp);
>  
> -- 
> 2.37.1
> 



* [PATCH xfrm-next 00/26] mlx5 IPsec full offload part
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel

From: Leon Romanovsky <leonro@nvidia.com>

Hi,

This is a supplementary part of the "Extend XFRM core to allow full offload configuration"
series https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com

The series starts with very basic cleanup, continues with code alignment, and
adds IPsec full offload logic to the mlx5 driver.

Thanks

Leon Romanovsky (25):
  net/mlx5: Delete esp_id field that is not used
  net/mlx5: Add HW definitions for IPsec full offload
  net/mlx5: Remove from FPGA IFC file not-needed definitions
  net/mlx5e: Advertise IPsec full offload support
  net/mlx5e: Store replay window in XFRM attributes
  net/mlx5e: Remove extra layers of defines
  net/mlx5e: Create symmetric IPsec RX and TX flow steering structs
  net/mlx5e: Use mlx5 print routines for low level IPsec code
  net/mlx5e: Remove accesses to priv for low level IPsec FS code
  net/mlx5e: Validate that IPsec full offload can handle packets
  net/mlx5e: Create Advanced Steering Operation object for IPsec
  net/mlx5e: Create hardware IPsec full offload objects
  net/mlx5e: Move IPsec flow table creation to separate function
  net/mlx5e: Refactor FTE setup code to be more clear
  net/mlx5e: Flatten the IPsec RX add rule path
  net/mlx5e: Make clear what IPsec rx_err does
  net/mlx5e: Group IPsec miss handles into separate struct
  net/mlx5e: Generalize creation of default IPsec miss group and rule
  net/mlx5e: Create IPsec policy offload tables
  net/mlx5e: Add XFRM policy offload logic
  net/mlx5e: Use same coding pattern for Rx and Tx flows
  net/mlx5e: Configure IPsec full offload flow steering
  net/mlx5e: Improve IPsec flow steering autogroup
  net/mlx5e: Skip IPsec encryption for TX path without matching policy
  net/mlx5e: Open mlx5 driver to accept IPsec full offload

Raed Salem (1):
  net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows

 .../net/ethernet/mellanox/mlx5/core/en/fs.h   |    3 +-
 .../mellanox/mlx5/core/en_accel/ipsec.c       |  209 +++-
 .../mellanox/mlx5/core/en_accel/ipsec.h       |   93 +-
 .../mellanox/mlx5/core/en_accel/ipsec_fs.c    | 1066 ++++++++++++-----
 .../mlx5/core/en_accel/ipsec_offload.c        |   81 +-
 .../mellanox/mlx5/core/en_accel/ipsec_stats.c |   52 +
 .../ethernet/mellanox/mlx5/core/en_stats.c    |    1 +
 .../ethernet/mellanox/mlx5/core/en_stats.h    |    1 +
 .../net/ethernet/mellanox/mlx5/core/fs_core.c |    6 +-
 include/linux/mlx5/fs.h                       |    5 +-
 include/linux/mlx5/mlx5_ifc.h                 |   71 +-
 include/linux/mlx5/mlx5_ifc_fpga.h            |   24 -
 12 files changed, 1223 insertions(+), 389 deletions(-)

-- 
2.37.2


^ permalink raw reply

* [PATCH xfrm-next 03/26] net/mlx5: Remove from FPGA IFC file not-needed definitions
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660641154.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Move the IP layout bits definitions close to the place that actually
uses them, and remove extra defines that are not in use.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h      | 16 ++++++++++++++++
 include/linux/mlx5/mlx5_ifc_fpga.h | 24 ------------------------
 2 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 57b4ae2dce07..c30036f7b517 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -479,6 +479,22 @@ struct mlx5_ifc_odp_per_transport_service_cap_bits {
 	u8         reserved_at_6[0x1a];
 };
 
+struct mlx5_ifc_ipv4_layout_bits {
+	u8         reserved_at_0[0x60];
+
+	u8         ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+	u8         ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+	u8         reserved_at_0[0x80];
+};
+
 struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
 	u8         smac_47_16[0x20];
 
diff --git a/include/linux/mlx5/mlx5_ifc_fpga.h b/include/linux/mlx5/mlx5_ifc_fpga.h
index 45c7c0d67635..0596472923ad 100644
--- a/include/linux/mlx5/mlx5_ifc_fpga.h
+++ b/include/linux/mlx5/mlx5_ifc_fpga.h
@@ -32,30 +32,6 @@
 #ifndef MLX5_IFC_FPGA_H
 #define MLX5_IFC_FPGA_H
 
-struct mlx5_ifc_ipv4_layout_bits {
-	u8         reserved_at_0[0x60];
-
-	u8         ipv4[0x20];
-};
-
-struct mlx5_ifc_ipv6_layout_bits {
-	u8         ipv6[16][0x8];
-};
-
-union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
-	struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
-	struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
-	u8         reserved_at_0[0x80];
-};
-
-enum {
-	MLX5_FPGA_CAP_SANDBOX_VENDOR_ID_MLNX = 0x2c9,
-};
-
-enum {
-	MLX5_FPGA_CAP_SANDBOX_PRODUCT_ID_IPSEC    = 0x2,
-};
-
 struct mlx5_ifc_fpga_shell_caps_bits {
 	u8         max_num_qps[0x10];
 	u8         reserved_at_10[0x8];
-- 
2.37.2


^ permalink raw reply related

* [PATCH xfrm-next 02/26] net/mlx5: Add HW definitions for IPsec full offload
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660641154.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Add all needed bits to support IPsec full offload mode.
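
For readers less familiar with the mlx5_ifc convention: every u8 array
element in these structs stands for one bit, so sizeof() of such a struct
gives its width in bits. The sketch below uses hypothetical stand-in
structs (misc2_tail_old and misc2_tail_new are not kernel names) to check
that carving ipsec_syndrome and ipsec_next_header out of reserved_at_1a0
keeps the overall width at 0x60 bits. It is an illustration only, not code
from this patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/* Hypothetical stand-in for the tail of fte_match_set_misc2 before the
 * patch: one reserved region of 0x60 bits starting at bit 0x1a0. */
struct misc2_tail_old {
	u8 reserved_at_1a0[0x60];
};

/* The same region after the patch: two 8-bit fields carved out of it. */
struct misc2_tail_new {
	u8 reserved_at_1a0[0x10];
	u8 ipsec_syndrome[0x8];
	u8 ipsec_next_header[0x8];

	u8 reserved_at_1c0[0x40];
};

/* One array element per bit, so sizeof() is a width in bits. */
static_assert(sizeof(struct misc2_tail_new) == sizeof(struct misc2_tail_old),
	      "splitting a reserved region must not change the layout width");

/* reserved_at_1c0 must start 0x20 bits after 0x1a0, i.e. at bit 0x1c0. */
static_assert(offsetof(struct misc2_tail_new, reserved_at_1c0) == 0x20,
	      "field name and bit offset must agree");

/* Width of the region in bits, as the kernel's size helpers would see it. */
size_t misc2_tail_bits(void)
{
	return sizeof(struct misc2_tail_new);
}
```

The same arithmetic applies to the flow_table_prop_layout hunk below, where
reserved_at_40[0x6] becomes 1 + 2 + 1 + 2 bits.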

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 55 +++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 4acd5610e96b..57b4ae2dce07 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -442,7 +442,10 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
 	u8         max_modify_header_actions[0x8];
 	u8         max_ft_level[0x8];
 
-	u8         reserved_at_40[0x6];
+	u8         reformat_add_esp_trasport[0x1];
+	u8         reserved_at_41[0x2];
+	u8         reformat_del_esp_trasport[0x1];
+	u8         reserved_at_44[0x2];
 	u8         execute_aso[0x1];
 	u8         reserved_at_47[0x19];
 
@@ -611,7 +614,11 @@ struct mlx5_ifc_fte_match_set_misc2_bits {
 
 	u8         metadata_reg_a[0x20];
 
-	u8         reserved_at_1a0[0x60];
+	u8         reserved_at_1a0[0x10];
+	u8         ipsec_syndrome[0x8];
+	u8         ipsec_next_header[0x8];
+
+	u8         reserved_at_1c0[0x40];
 };
 
 struct mlx5_ifc_fte_match_set_misc3_bits {
@@ -6314,6 +6321,9 @@ enum mlx5_reformat_ctx_type {
 	MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL = 0x2,
 	MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2 = 0x3,
 	MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL = 0x4,
+	MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV4 = 0x5,
+	MLX5_REFORMAT_TYPE_DEL_ESP_TRANSPORT = 0x8,
+	MLX5_REFORMAT_TYPE_ADD_ESP_TRANSPORT_OVER_IPV6 = 0xb,
 	MLX5_REFORMAT_TYPE_INSERT_HDR = 0xf,
 	MLX5_REFORMAT_TYPE_REMOVE_HDR = 0x10,
 };
@@ -11477,6 +11487,41 @@ enum {
 	MLX5_IPSEC_OBJECT_ICV_LEN_16B,
 };
 
+enum {
+	MLX5_IPSEC_ASO_REG_C_0_1 = 0x0,
+	MLX5_IPSEC_ASO_REG_C_2_3 = 0x1,
+	MLX5_IPSEC_ASO_REG_C_4_5 = 0x2,
+	MLX5_IPSEC_ASO_REG_C_6_7 = 0x3,
+};
+
+enum {
+	MLX5_IPSEC_ASO_MODE              = 0x0,
+	MLX5_IPSEC_ASO_REPLAY_PROTECTION = 0x1,
+	MLX5_IPSEC_ASO_INC_SN            = 0x2,
+};
+
+struct mlx5_ifc_ipsec_aso_bits {
+	u8         valid[0x1];
+	u8         reserved_at_201[0x1];
+	u8         mode[0x2];
+	u8         window_sz[0x2];
+	u8         soft_lft_arm[0x1];
+	u8         hard_lft_arm[0x1];
+	u8         remove_flow_enable[0x1];
+	u8         esn_event_arm[0x1];
+	u8         reserved_at_20a[0x16];
+
+	u8         remove_flow_pkt_cnt[0x20];
+
+	u8         remove_flow_soft_lft[0x20];
+
+	u8         reserved_at_260[0x80];
+
+	u8         mode_parameter[0x20];
+
+	u8         replay_protection_window[0x100];
+};
+
 struct mlx5_ifc_ipsec_obj_bits {
 	u8         modify_field_select[0x40];
 	u8         full_offload[0x1];
@@ -11498,7 +11543,11 @@ struct mlx5_ifc_ipsec_obj_bits {
 
 	u8         implicit_iv[0x40];
 
-	u8         reserved_at_100[0x700];
+	u8         reserved_at_100[0x8];
+	u8         ipsec_aso_access_pd[0x18];
+	u8         reserved_at_120[0xe0];
+
+	struct mlx5_ifc_ipsec_aso_bits ipsec_aso;
 };
 
 struct mlx5_ifc_create_ipsec_obj_in_bits {
-- 
2.37.2


^ permalink raw reply related

* [PATCH xfrm-next 01/26] net/mlx5: Delete esp_id field that is not used
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660641154.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

The esp_id field is not used in mlx5 code, hence we can delete it.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/mlx5/fs.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 8e73c377da2c..714a4c40c5d1 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -243,10 +243,7 @@ struct mlx5_flow_act {
 	u32 action;
 	struct mlx5_modify_hdr  *modify_hdr;
 	struct mlx5_pkt_reformat *pkt_reformat;
-	union {
-		u32 ipsec_obj_id;
-		uintptr_t esp_id;
-	};
+	u32 ipsec_obj_id;
 	u32 flags;
 	struct mlx5_fs_vlan vlan[MLX5_FS_VLAN_DEPTH];
 	struct ib_counters *counters;
-- 
2.37.2


^ permalink raw reply related

* [PATCH xfrm-next 05/26] net/mlx5e: Store replay window in XFRM attributes
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660641154.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

As preparation for a future extension of the IPsec hardware object to allow
configuration of full offload mode, store the replay window in the XFRM
attributes and extend the XFRM validator to check replay window values.
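
As a rough userspace sketch of the check added below: a full offload state
is rejected when a replay ESN state is present and its window is not one of
32, 64, 128 or 256. The function names and the has_replay_esn parameter are
hypothetical and only model the x->replay_esn pointer test; the real code
operates on struct xfrm_state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: window sizes accepted by the validator in this patch. */
bool sketch_replay_window_supported(uint32_t replay_window)
{
	switch (replay_window) {
	case 32:
	case 64:
	case 128:
	case 256:
		return true;
	default:
		return false;
	}
}

/* has_replay_esn models "x->replay_esn != NULL"; a state without replay
 * ESN data passes, matching the condition in the patch. */
int sketch_validate_replay_window(bool has_replay_esn, uint32_t replay_window)
{
	if (has_replay_esn && !sketch_replay_window_supported(replay_window))
		return -EINVAL;
	return 0;
}

/* Usage examples for the two helpers above. */
void sketch_replay_window_examples(void)
{
	assert(sketch_validate_replay_window(true, 64) == 0);
	assert(sketch_validate_replay_window(true, 100) == -EINVAL);
	assert(sketch_validate_replay_window(false, 100) == 0);
}
```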

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 12 ++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index c182b640b80d..e811f0d18b2a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -169,6 +169,7 @@ mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
 		attrs->esn = sa_entry->esn_state.esn;
 		if (sa_entry->esn_state.overlap)
 			attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP;
+		attrs->replay_window = x->replay_esn->replay_window;
 	}
 
 	/* action */
@@ -260,6 +261,17 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
 		netdev_info(netdev, "Unsupported xfrm offload type\n");
 		return -EINVAL;
 	}
+	if (x->xso.type == XFRM_DEV_OFFLOAD_FULL) {
+		if (x->replay_esn && x->replay_esn->replay_window != 32 &&
+		    x->replay_esn->replay_window != 64 &&
+		    x->replay_esn->replay_window != 128 &&
+		    x->replay_esn->replay_window != 256) {
+			netdev_info(netdev,
+				    "Unsupported replay window size %u\n",
+				    x->replay_esn->replay_window);
+			return -EINVAL;
+		}
+	}
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index feea909d76c6..de064c72b87d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -83,6 +83,7 @@ struct mlx5_accel_esp_xfrm_attrs {
 	} daddr;
 
 	u8 is_ipv6;
+	u32 replay_window;
 };
 
 enum mlx5_ipsec_cap {
-- 
2.37.2


^ permalink raw reply related

* [PATCH xfrm-next 06/26] net/mlx5e: Remove extra layers of defines
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660641154.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Instead of redefining XFRM core defines to the same values with an MLX5_*
prefix, cache the input values as is, making sure that the proper storage
objects are used.
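
A minimal userspace sketch of the idea, under stated assumptions: the
struct, the SKETCH_DIR_* values and the helper names are hypothetical
stand-ins, and AF_INET comes from the userspace socket header rather than
the kernel. It shows the direction and address family being cached as
received and interpreted only at the point of use, instead of being
translated into driver-private flags first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for the XFRM offload direction values. */
enum { SKETCH_DIR_IN = 1, SKETCH_DIR_OUT = 2 };

/* Models the new tail of the attrs struct: inputs are stored as is. */
struct sketch_attrs {
	uint8_t dir : 2;
	uint8_t esn_overlap : 1;
	uint8_t esn_trigger : 1;
	uint8_t family;
};

struct sketch_attrs sketch_build_attrs(unsigned int dir, int family,
				       bool trigger, bool overlap)
{
	struct sketch_attrs attrs = {0};

	assert(dir <= 3);                           /* must fit the 2-bit field */
	assert(family >= 0 && family <= UINT8_MAX); /* must fit the u8 field */
	attrs.dir = dir;
	attrs.family = (uint8_t)family;
	if (trigger) {
		/* ESN details are only recorded when ESN is triggered. */
		attrs.esn_trigger = true;
		attrs.esn_overlap = overlap;
	}
	return attrs;
}

/* Consumers compare against the original values directly. */
int sketch_ip_version(struct sketch_attrs attrs)
{
	return (attrs.family == AF_INET) ? 4 : 6;
}

bool sketch_is_tx(struct sketch_attrs attrs)
{
	return attrs.dir == SKETCH_DIR_OUT;
}
```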

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/ipsec.c       | 17 ++++-----------
 .../mellanox/mlx5/core/en_accel/ipsec.h       | 18 ++++------------
 .../mellanox/mlx5/core/en_accel/ipsec_fs.c    | 21 ++++++++++---------
 .../mlx5/core/en_accel/ipsec_offload.c        | 10 ++++-----
 4 files changed, 23 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index e811f0d18b2a..e4fe0249c5be 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -165,29 +165,20 @@ mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
 
 	/* esn */
 	if (sa_entry->esn_state.trigger) {
-		attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_TRIGGERED;
+		attrs->esn_trigger = true;
 		attrs->esn = sa_entry->esn_state.esn;
-		if (sa_entry->esn_state.overlap)
-			attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP;
+		attrs->esn_overlap = sa_entry->esn_state.overlap;
 		attrs->replay_window = x->replay_esn->replay_window;
 	}
 
-	/* action */
-	attrs->action = (x->xso.dir == XFRM_DEV_OFFLOAD_OUT) ?
-				MLX5_ACCEL_ESP_ACTION_ENCRYPT :
-				      MLX5_ACCEL_ESP_ACTION_DECRYPT;
-	/* flags */
-	attrs->flags |= (x->props.mode == XFRM_MODE_TRANSPORT) ?
-			MLX5_ACCEL_ESP_FLAGS_TRANSPORT :
-			MLX5_ACCEL_ESP_FLAGS_TUNNEL;
-
+	attrs->dir = x->xso.dir;
 	/* spi */
 	attrs->spi = be32_to_cpu(x->id.spi);
 
 	/* source , destination ips */
 	memcpy(&attrs->saddr, x->props.saddr.a6, sizeof(attrs->saddr));
 	memcpy(&attrs->daddr, x->id.daddr.a6, sizeof(attrs->daddr));
-	attrs->is_ipv6 = (x->props.family != AF_INET);
+	attrs->family = x->props.family;
 }
 
 static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index de064c72b87d..7cc091115b5d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -43,18 +43,6 @@
 #define MLX5E_IPSEC_SADB_RX_BITS 10
 #define MLX5E_IPSEC_ESN_SCOPE_MID 0x80000000L
 
-enum mlx5_accel_esp_flags {
-	MLX5_ACCEL_ESP_FLAGS_TUNNEL            = 0,    /* Default */
-	MLX5_ACCEL_ESP_FLAGS_TRANSPORT         = 1UL << 0,
-	MLX5_ACCEL_ESP_FLAGS_ESN_TRIGGERED     = 1UL << 1,
-	MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP = 1UL << 2,
-};
-
-enum mlx5_accel_esp_action {
-	MLX5_ACCEL_ESP_ACTION_DECRYPT,
-	MLX5_ACCEL_ESP_ACTION_ENCRYPT,
-};
-
 struct aes_gcm_keymat {
 	u64   seq_iv;
 
@@ -66,7 +54,6 @@ struct aes_gcm_keymat {
 };
 
 struct mlx5_accel_esp_xfrm_attrs {
-	enum mlx5_accel_esp_action action;
 	u32   esn;
 	u32   spi;
 	u32   flags;
@@ -82,7 +69,10 @@ struct mlx5_accel_esp_xfrm_attrs {
 		__be32 a6[4];
 	} daddr;
 
-	u8 is_ipv6;
+	u8 dir : 2;
+	u8 esn_overlap : 1;
+	u8 esn_trigger : 1;
+	u8 family;
 	u32 replay_window;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
index f8113fd23265..45501764a9bb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
@@ -338,7 +338,7 @@ static void setup_fte_common(struct mlx5_accel_esp_xfrm_attrs *attrs,
 			     struct mlx5_flow_spec *spec,
 			     struct mlx5_flow_act *flow_act)
 {
-	u8 ip_version = attrs->is_ipv6 ? 6 : 4;
+	u8 ip_version = (attrs->family == AF_INET) ? 4 : 6;
 
 	spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS | MLX5_MATCH_MISC_PARAMETERS;
 
@@ -407,7 +407,7 @@ static int rx_add_rule(struct mlx5e_priv *priv,
 	int err = 0;
 
 	accel_esp = priv->ipsec->rx_fs;
-	type = attrs->is_ipv6 ? ACCEL_FS_ESP6 : ACCEL_FS_ESP4;
+	type = (attrs->family == AF_INET) ? ACCEL_FS_ESP4 : ACCEL_FS_ESP6;
 	fs_prot = &accel_esp->fs_prot[type];
 
 	err = rx_ft_get(priv, type);
@@ -449,8 +449,8 @@ static int rx_add_rule(struct mlx5e_priv *priv,
 	rule = mlx5_add_flow_rules(fs_prot->ft, spec, &flow_act, &dest, 1);
 	if (IS_ERR(rule)) {
 		err = PTR_ERR(rule);
-		netdev_err(priv->netdev, "fail to add ipsec rule attrs->action=0x%x, err=%d\n",
-			   attrs->action, err);
+		netdev_err(priv->netdev, "fail to add RX ipsec rule err=%d\n",
+			   err);
 		goto out_err;
 	}
 
@@ -501,8 +501,8 @@ static int tx_add_rule(struct mlx5e_priv *priv,
 	rule = mlx5_add_flow_rules(priv->ipsec->tx_fs->ft, spec, &flow_act, NULL, 0);
 	if (IS_ERR(rule)) {
 		err = PTR_ERR(rule);
-		netdev_err(priv->netdev, "fail to add ipsec rule attrs->action=0x%x, err=%d\n",
-				sa_entry->attrs.action, err);
+		netdev_err(priv->netdev, "fail to add TX ipsec rule err=%d\n",
+			   err);
 		goto out;
 	}
 
@@ -518,7 +518,7 @@ static int tx_add_rule(struct mlx5e_priv *priv,
 int mlx5e_accel_ipsec_fs_add_rule(struct mlx5e_priv *priv,
 				  struct mlx5e_ipsec_sa_entry *sa_entry)
 {
-	if (sa_entry->attrs.action == MLX5_ACCEL_ESP_ACTION_ENCRYPT)
+	if (sa_entry->attrs.dir == XFRM_DEV_OFFLOAD_OUT)
 		return tx_add_rule(priv, sa_entry);
 
 	return rx_add_rule(priv, sa_entry);
@@ -529,17 +529,18 @@ void mlx5e_accel_ipsec_fs_del_rule(struct mlx5e_priv *priv,
 {
 	struct mlx5e_ipsec_rule *ipsec_rule = &sa_entry->ipsec_rule;
 	struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry);
+	enum accel_fs_esp_type type;
 
 	mlx5_del_flow_rules(ipsec_rule->rule);
 
-	if (sa_entry->attrs.action == MLX5_ACCEL_ESP_ACTION_ENCRYPT) {
+	if (sa_entry->attrs.dir == XFRM_DEV_OFFLOAD_OUT) {
 		tx_ft_put(priv);
 		return;
 	}
 
 	mlx5_modify_header_dealloc(mdev, ipsec_rule->set_modify_hdr);
-	rx_ft_put(priv,
-		  sa_entry->attrs.is_ipv6 ? ACCEL_FS_ESP6 : ACCEL_FS_ESP4);
+	type = (sa_entry->attrs.family == AF_INET) ? ACCEL_FS_ESP4 : ACCEL_FS_ESP6;
+	rx_ft_put(priv, type);
 }
 
 void mlx5e_accel_ipsec_fs_cleanup(struct mlx5e_ipsec *ipsec)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
index e93775eb40b7..1e586db009be 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
@@ -76,11 +76,10 @@ static int mlx5_create_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry)
 	salt_iv_p = MLX5_ADDR_OF(ipsec_obj, obj, implicit_iv);
 	memcpy(salt_iv_p, &aes_gcm->seq_iv, sizeof(aes_gcm->seq_iv));
 	/* esn */
-	if (attrs->flags & MLX5_ACCEL_ESP_FLAGS_ESN_TRIGGERED) {
+	if (attrs->esn_trigger) {
 		MLX5_SET(ipsec_obj, obj, esn_en, 1);
 		MLX5_SET(ipsec_obj, obj, esn_msb, attrs->esn);
-		if (attrs->flags & MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP)
-			MLX5_SET(ipsec_obj, obj, esn_overlap, 1);
+		MLX5_SET(ipsec_obj, obj, esn_overlap, attrs->esn_overlap);
 	}
 
 	MLX5_SET(ipsec_obj, obj, dekn, sa_entry->enc_key_id);
@@ -162,7 +161,7 @@ static int mlx5_modify_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry,
 	void *obj;
 	int err;
 
-	if (!(attrs->flags & MLX5_ACCEL_ESP_FLAGS_ESN_TRIGGERED))
+	if (!attrs->esn_trigger)
 		return 0;
 
 	general_obj_types = MLX5_CAP_GEN_64(mdev, general_obj_types);
@@ -193,8 +192,7 @@ static int mlx5_modify_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry,
 		   MLX5_MODIFY_IPSEC_BITMASK_ESN_OVERLAP |
 			   MLX5_MODIFY_IPSEC_BITMASK_ESN_MSB);
 	MLX5_SET(ipsec_obj, obj, esn_msb, attrs->esn);
-	if (attrs->flags & MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP)
-		MLX5_SET(ipsec_obj, obj, esn_overlap, 1);
+	MLX5_SET(ipsec_obj, obj, esn_overlap, attrs->esn_overlap);
 
 	/* general object fields set */
 	MLX5_SET(general_obj_in_cmd_hdr, in, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
-- 
2.37.2


^ permalink raw reply related

* [PATCH xfrm-next 04/26] net/mlx5e: Advertise IPsec full offload support
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660641154.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Add the capability checks needed to determine whether the device supports IPsec
full offload mode.
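
The condition added below can be read as a conjunction of independent
predicates. The following sketch models it in plain C with hypothetical
names (the SKETCH_* enum and the boolean parameters stand in for the
MLX5_CAP_* queries and the eswitch encap mode lookup); it is not the
driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the capability bits in enum mlx5_ipsec_cap. */
enum sketch_ipsec_cap {
	SKETCH_IPSEC_CAP_CRYPTO       = 1 << 0,
	SKETCH_IPSEC_CAP_ESN          = 1 << 1,
	SKETCH_IPSEC_CAP_FULL_OFFLOAD = 1 << 2,
};

static_assert((SKETCH_IPSEC_CAP_CRYPTO & SKETCH_IPSEC_CAP_ESN) == 0 &&
	      (SKETCH_IPSEC_CAP_ESN & SKETCH_IPSEC_CAP_FULL_OFFLOAD) == 0 &&
	      (SKETCH_IPSEC_CAP_CRYPTO & SKETCH_IPSEC_CAP_FULL_OFFLOAD) == 0,
	      "capability bits must not overlap");

/* Full offload is reported only when eswitch encap is off and every
 * required hardware capability is present. Each parameter models one
 * query from the patch. */
uint32_t sketch_full_offload_caps(bool esw_encap, bool ipsec_full_offload,
				  bool tx_add_esp_transport,
				  bool rx_del_esp_transport, bool rx_decap)
{
	uint32_t caps = 0;

	if (!esw_encap && ipsec_full_offload && tx_add_esp_transport &&
	    rx_del_esp_transport && rx_decap)
		caps |= SKETCH_IPSEC_CAP_FULL_OFFLOAD;

	return caps;
}
```

Under this model, a device that has every hardware bit still does not
advertise full offload while eswitch encap mode is enabled.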

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h   |  1 +
 .../mellanox/mlx5/core/en_accel/ipsec_offload.c        | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index 16bcceec16c4..feea909d76c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -88,6 +88,7 @@ struct mlx5_accel_esp_xfrm_attrs {
 enum mlx5_ipsec_cap {
 	MLX5_IPSEC_CAP_CRYPTO		= 1 << 0,
 	MLX5_IPSEC_CAP_ESN		= 1 << 1,
+	MLX5_IPSEC_CAP_FULL_OFFLOAD	= 1 << 2,
 };
 
 struct mlx5e_priv;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
index 792724ce7336..e93775eb40b7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
@@ -1,12 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /* Copyright (c) 2017, Mellanox Technologies inc. All rights reserved. */
 
+#include <linux/mlx5/eswitch.h>
 #include "mlx5_core.h"
 #include "ipsec.h"
 #include "lib/mlx5.h"
 
 u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev)
 {
+	bool esw_encap;
 	u32 caps = 0;
 
 	if (!MLX5_CAP_GEN(mdev, ipsec_offload))
@@ -31,6 +33,14 @@ u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev)
 	    MLX5_CAP_ETH(mdev, insert_trailer) && MLX5_CAP_ETH(mdev, swp))
 		caps |= MLX5_IPSEC_CAP_CRYPTO;
 
+	esw_encap = mlx5_eswitch_get_encap_mode(mdev) !=
+		    DEVLINK_ESWITCH_ENCAP_MODE_NONE;
+	if (!esw_encap && MLX5_CAP_IPSEC(mdev, ipsec_full_offload) &&
+	    MLX5_CAP_FLOWTABLE_NIC_TX(mdev, reformat_add_esp_trasport) &&
+	    MLX5_CAP_FLOWTABLE_NIC_RX(mdev, reformat_del_esp_trasport) &&
+	    MLX5_CAP_FLOWTABLE_NIC_RX(mdev, decap))
+		caps |= MLX5_IPSEC_CAP_FULL_OFFLOAD;
+
 	if (!caps)
 		return 0;
 
-- 
2.37.2


^ permalink raw reply related

* [PATCH xfrm-next 07/26] net/mlx5e: Create symmetric IPsec RX and TX flow steering structs
From: Leon Romanovsky @ 2022-08-16 10:37 UTC (permalink / raw)
  To: Steffen Klassert, David S . Miller, Jakub Kicinski,
	Saeed Mahameed
  Cc: Leon Romanovsky, Eric Dumazet, netdev, Paolo Abeni, Raed Salem,
	ipsec-devel
In-Reply-To: <cover.1660641154.git.leonro@nvidia.com>

From: Leon Romanovsky <leonro@nvidia.com>

Remove AF family obfuscation by creating symmetric structs for RX and
TX IPsec flow steering chains. This simplifies the low level IPsec FS
creation logic, as there is no need to dig into multiple levels of structs.
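
The shared piece of the new RX and TX structs is a lazily created,
reference-counted table. Below is a simplified single-threaded sketch of
that get/put pattern. All names are hypothetical, malloc() stands in for
flow table creation, and the mutex held by the real code is omitted. The
patch itself returns the rx/tx struct or an ERR_PTR() value, whereas this
sketch returns NULL on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Common part embedded in both directions, modeled on mlx5e_ipsec_ft. */
struct sketch_ft {
	void *sa;            /* stands in for struct mlx5_flow_table * */
	unsigned int refcnt; /* number of rules currently using the table */
};

/* Symmetric containers: RX and TX both embed the same sketch_ft. */
struct sketch_rx { struct sketch_ft ft; };
struct sketch_tx { struct sketch_ft ft; };

/* Hypothetical instances, playing the role of priv->ipsec->rx_ipv4 etc. */
struct sketch_rx rx_ipv4_chain;
struct sketch_rx rx_ipv6_chain;
struct sketch_tx tx_chain;

/* Take a reference; create the table on first use. NULL on failure. */
struct sketch_ft *sketch_ft_get(struct sketch_ft *ft)
{
	if (!ft->refcnt) {
		ft->sa = malloc(1);
		if (!ft->sa)
			return NULL;
	}
	ft->refcnt++;
	return ft;
}

/* Drop a reference; destroy the table when the last user goes away. */
void sketch_ft_put(struct sketch_ft *ft)
{
	assert(ft->refcnt > 0);
	if (--ft->refcnt)
		return;

	free(ft->sa);
	ft->sa = NULL;
}
```

Because both directions embed the same struct, one pair of helpers serves
rx_ipv4_chain.ft, rx_ipv6_chain.ft and tx_chain.ft alike in this sketch.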

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/ipsec.h       |   7 +-
 .../mellanox/mlx5/core/en_accel/ipsec_fs.c    | 280 ++++++++----------
 2 files changed, 132 insertions(+), 155 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index 7cc091115b5d..02c3e6334cdd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -94,7 +94,7 @@ struct mlx5e_ipsec_sw_stats {
 	atomic64_t ipsec_tx_drop_trailer;
 };
 
-struct mlx5e_accel_fs_esp;
+struct mlx5e_ipsec_rx;
 struct mlx5e_ipsec_tx;
 
 struct mlx5e_ipsec {
@@ -103,8 +103,9 @@ struct mlx5e_ipsec {
 	spinlock_t sadb_rx_lock; /* Protects sadb_rx */
 	struct mlx5e_ipsec_sw_stats sw_stats;
 	struct workqueue_struct *wq;
-	struct mlx5e_accel_fs_esp *rx_fs;
-	struct mlx5e_ipsec_tx *tx_fs;
+	struct mlx5e_ipsec_rx *rx_ipv4;
+	struct mlx5e_ipsec_rx *rx_ipv6;
+	struct mlx5e_ipsec_tx *tx;
 };
 
 struct mlx5e_ipsec_esn_state {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
index 45501764a9bb..fdc4517fa104 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
@@ -9,49 +9,40 @@
 
 #define NUM_IPSEC_FTE BIT(15)
 
-enum accel_fs_esp_type {
-	ACCEL_FS_ESP4,
-	ACCEL_FS_ESP6,
-	ACCEL_FS_ESP_NUM_TYPES,
-};
-
 struct mlx5e_ipsec_rx_err {
 	struct mlx5_flow_table *ft;
 	struct mlx5_flow_handle *rule;
 	struct mlx5_modify_hdr *copy_modify_hdr;
 };
 
-struct mlx5e_accel_fs_esp_prot {
-	struct mlx5_flow_table *ft;
+struct mlx5e_ipsec_ft {
+	struct mutex mutex; /* Protect changes to this struct */
+	struct mlx5_flow_table *sa;
+	u32 refcnt;
+};
+
+struct mlx5e_ipsec_rx {
+	struct mlx5e_ipsec_ft ft;
 	struct mlx5_flow_group *miss_group;
 	struct mlx5_flow_handle *miss_rule;
 	struct mlx5_flow_destination default_dest;
 	struct mlx5e_ipsec_rx_err rx_err;
-	u32 refcnt;
-	struct mutex prot_mutex; /* protect ESP4/ESP6 protocol */
-};
-
-struct mlx5e_accel_fs_esp {
-	struct mlx5e_accel_fs_esp_prot fs_prot[ACCEL_FS_ESP_NUM_TYPES];
 };
 
 struct mlx5e_ipsec_tx {
+	struct mlx5e_ipsec_ft ft;
 	struct mlx5_flow_namespace *ns;
-	struct mlx5_flow_table *ft;
-	struct mutex mutex; /* Protect IPsec TX steering */
-	u32 refcnt;
 };
 
 /* IPsec RX flow steering */
-static enum mlx5_traffic_types fs_esp2tt(enum accel_fs_esp_type i)
+static enum mlx5_traffic_types family2tt(u32 family)
 {
-	if (i == ACCEL_FS_ESP4)
+	if (family == AF_INET)
 		return MLX5_TT_IPV4_IPSEC_ESP;
 	return MLX5_TT_IPV6_IPSEC_ESP;
 }
 
-static int rx_err_add_rule(struct mlx5e_priv *priv,
-			   struct mlx5e_accel_fs_esp_prot *fs_prot,
+static int rx_err_add_rule(struct mlx5e_priv *priv, struct mlx5e_ipsec_rx *rx,
 			   struct mlx5e_ipsec_rx_err *rx_err)
 {
 	u8 action[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)] = {};
@@ -89,7 +80,7 @@ static int rx_err_add_rule(struct mlx5e_priv *priv,
 			  MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
 	flow_act.modify_hdr = modify_hdr;
 	fte = mlx5_add_flow_rules(rx_err->ft, spec, &flow_act,
-				  &fs_prot->default_dest, 1);
+				  &rx->default_dest, 1);
 	if (IS_ERR(fte)) {
 		err = PTR_ERR(fte);
 		netdev_err(priv->netdev, "fail to add ipsec rx err copy rule err=%d\n", err);
@@ -108,11 +99,10 @@ static int rx_err_add_rule(struct mlx5e_priv *priv,
 	return err;
 }
 
-static int rx_fs_create(struct mlx5e_priv *priv,
-			struct mlx5e_accel_fs_esp_prot *fs_prot)
+static int rx_fs_create(struct mlx5e_priv *priv, struct mlx5e_ipsec_rx *rx)
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
-	struct mlx5_flow_table *ft = fs_prot->ft;
+	struct mlx5_flow_table *ft = rx->ft.sa;
 	struct mlx5_flow_group *miss_group;
 	struct mlx5_flow_handle *miss_rule;
 	MLX5_DECLARE_FLOW_ACT(flow_act);
@@ -136,55 +126,44 @@ static int rx_fs_create(struct mlx5e_priv *priv,
 		netdev_err(priv->netdev, "fail to create ipsec rx miss_group err=%d\n", err);
 		goto out;
 	}
-	fs_prot->miss_group = miss_group;
+	rx->miss_group = miss_group;
 
 	/* Create miss rule */
-	miss_rule = mlx5_add_flow_rules(ft, spec, &flow_act, &fs_prot->default_dest, 1);
+	miss_rule =
+		mlx5_add_flow_rules(ft, spec, &flow_act, &rx->default_dest, 1);
 	if (IS_ERR(miss_rule)) {
-		mlx5_destroy_flow_group(fs_prot->miss_group);
+		mlx5_destroy_flow_group(rx->miss_group);
 		err = PTR_ERR(miss_rule);
 		netdev_err(priv->netdev, "fail to create ipsec rx miss_rule err=%d\n", err);
 		goto out;
 	}
-	fs_prot->miss_rule = miss_rule;
+	rx->miss_rule = miss_rule;
 out:
 	kvfree(flow_group_in);
 	kvfree(spec);
 	return err;
 }
 
-static void rx_destroy(struct mlx5e_priv *priv, enum accel_fs_esp_type type)
+static void rx_destroy(struct mlx5e_priv *priv, struct mlx5e_ipsec_rx *rx)
 {
-	struct mlx5e_accel_fs_esp_prot *fs_prot;
-	struct mlx5e_accel_fs_esp *accel_esp;
-
-	accel_esp = priv->ipsec->rx_fs;
+	mlx5_del_flow_rules(rx->miss_rule);
+	mlx5_destroy_flow_group(rx->miss_group);
+	mlx5_destroy_flow_table(rx->ft.sa);
 
-	/* The netdev unreg already happened, so all offloaded rule are already removed */
-	fs_prot = &accel_esp->fs_prot[type];
-
-	mlx5_del_flow_rules(fs_prot->miss_rule);
-	mlx5_destroy_flow_group(fs_prot->miss_group);
-	mlx5_destroy_flow_table(fs_prot->ft);
-
-	mlx5_del_flow_rules(fs_prot->rx_err.rule);
-	mlx5_modify_header_dealloc(priv->mdev, fs_prot->rx_err.copy_modify_hdr);
-	mlx5_destroy_flow_table(fs_prot->rx_err.ft);
+	mlx5_del_flow_rules(rx->rx_err.rule);
+	mlx5_modify_header_dealloc(priv->mdev, rx->rx_err.copy_modify_hdr);
+	mlx5_destroy_flow_table(rx->rx_err.ft);
 }
 
-static int rx_create(struct mlx5e_priv *priv, enum accel_fs_esp_type type)
+static int rx_create(struct mlx5e_priv *priv, struct mlx5e_ipsec_rx *rx,
+		     u32 family)
 {
 	struct mlx5_flow_table_attr ft_attr = {};
-	struct mlx5e_accel_fs_esp_prot *fs_prot;
-	struct mlx5e_accel_fs_esp *accel_esp;
 	struct mlx5_flow_table *ft;
 	int err;
 
-	accel_esp = priv->ipsec->rx_fs;
-	fs_prot = &accel_esp->fs_prot[type];
-
-	fs_prot->default_dest =
-		mlx5_ttc_get_default_dest(priv->fs->ttc, fs_esp2tt(type));
+	rx->default_dest =
+		mlx5_ttc_get_default_dest(priv->fs->ttc, family2tt(family));
 
 	ft_attr.max_fte = 1;
 	ft_attr.autogroup.max_num_groups = 1;
@@ -194,8 +173,8 @@ static int rx_create(struct mlx5e_priv *priv, enum accel_fs_esp_type type)
 	if (IS_ERR(ft))
 		return PTR_ERR(ft);
 
-	fs_prot->rx_err.ft = ft;
-	err = rx_err_add_rule(priv, fs_prot, &fs_prot->rx_err);
+	rx->rx_err.ft = ft;
+	err = rx_err_add_rule(priv, rx, &rx->rx_err);
 	if (err)
 		goto err_add;
 
@@ -210,74 +189,80 @@ static int rx_create(struct mlx5e_priv *priv, enum accel_fs_esp_type type)
 		err = PTR_ERR(ft);
 		goto err_fs_ft;
 	}
-	fs_prot->ft = ft;
+	rx->ft.sa = ft;
 
-	err = rx_fs_create(priv, fs_prot);
+	err = rx_fs_create(priv, rx);
 	if (err)
 		goto err_fs;
 
 	return 0;
 
 err_fs:
-	mlx5_destroy_flow_table(fs_prot->ft);
+	mlx5_destroy_flow_table(rx->ft.sa);
 err_fs_ft:
-	mlx5_del_flow_rules(fs_prot->rx_err.rule);
-	mlx5_modify_header_dealloc(priv->mdev, fs_prot->rx_err.copy_modify_hdr);
+	mlx5_del_flow_rules(rx->rx_err.rule);
+	mlx5_modify_header_dealloc(priv->mdev, rx->rx_err.copy_modify_hdr);
 err_add:
-	mlx5_destroy_flow_table(fs_prot->rx_err.ft);
+	mlx5_destroy_flow_table(rx->rx_err.ft);
 	return err;
 }
 
-static int rx_ft_get(struct mlx5e_priv *priv, enum accel_fs_esp_type type)
+static struct mlx5e_ipsec_rx *rx_ft_get(struct mlx5e_priv *priv, u32 family)
 {
-	struct mlx5e_accel_fs_esp_prot *fs_prot;
 	struct mlx5_flow_destination dest = {};
-	struct mlx5e_accel_fs_esp *accel_esp;
+	struct mlx5e_ipsec_rx *rx;
 	int err = 0;
 
-	accel_esp = priv->ipsec->rx_fs;
-	fs_prot = &accel_esp->fs_prot[type];
-	mutex_lock(&fs_prot->prot_mutex);
-	if (fs_prot->refcnt)
+	if (family == AF_INET)
+		rx = priv->ipsec->rx_ipv4;
+	else
+		rx = priv->ipsec->rx_ipv6;
+
+	mutex_lock(&rx->ft.mutex);
+	if (rx->ft.refcnt)
 		goto skip;
 
 	/* create FT */
-	err = rx_create(priv, type);
+	err = rx_create(priv, rx, family);
 	if (err)
 		goto out;
 
 	/* connect */
 	dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
-	dest.ft = fs_prot->ft;
-	mlx5_ttc_fwd_dest(priv->fs->ttc, fs_esp2tt(type), &dest);
+	dest.ft = rx->ft.sa;
+	mlx5_ttc_fwd_dest(priv->fs->ttc, family2tt(family), &dest);
 
 skip:
-	fs_prot->refcnt++;
+	rx->ft.refcnt++;
 out:
-	mutex_unlock(&fs_prot->prot_mutex);
-	return err;
+	mutex_unlock(&rx->ft.mutex);
+	if (err)
+		return ERR_PTR(err);
+	return rx;
 }
 
-static void rx_ft_put(struct mlx5e_priv *priv, enum accel_fs_esp_type type)
+static void rx_ft_put(struct mlx5e_priv *priv, u32 family)
 {
-	struct mlx5e_accel_fs_esp_prot *fs_prot;
-	struct mlx5e_accel_fs_esp *accel_esp;
-
-	accel_esp = priv->ipsec->rx_fs;
-	fs_prot = &accel_esp->fs_prot[type];
-	mutex_lock(&fs_prot->prot_mutex);
-	fs_prot->refcnt--;
-	if (fs_prot->refcnt)
+	struct mlx5e_ipsec_rx *rx;
+
+	if (family == AF_INET)
+		rx = priv->ipsec->rx_ipv4;
+	else
+		rx = priv->ipsec->rx_ipv6;
+
+	mutex_lock(&rx->ft.mutex);
+	rx->ft.refcnt--;
+	if (rx->ft.refcnt)
 		goto out;
 
 	/* disconnect */
-	mlx5_ttc_fwd_default_dest(priv->fs->ttc, fs_esp2tt(type));
+	mlx5_ttc_fwd_default_dest(priv->fs->ttc, family2tt(family));
 
 	/* remove FT */
-	rx_destroy(priv, type);
+	rx_destroy(priv, rx);
 
 out:
-	mutex_unlock(&fs_prot->prot_mutex);
+	mutex_unlock(&rx->ft.mutex);
 }
 
 /* IPsec TX flow steering */
@@ -290,47 +275,49 @@ static int tx_create(struct mlx5e_priv *priv)
 
 	ft_attr.max_fte = NUM_IPSEC_FTE;
 	ft_attr.autogroup.max_num_groups = 1;
-	ft = mlx5_create_auto_grouped_flow_table(ipsec->tx_fs->ns, &ft_attr);
+	ft = mlx5_create_auto_grouped_flow_table(ipsec->tx->ns, &ft_attr);
 	if (IS_ERR(ft)) {
 		err = PTR_ERR(ft);
 		netdev_err(priv->netdev, "fail to create ipsec tx ft err=%d\n", err);
 		return err;
 	}
-	ipsec->tx_fs->ft = ft;
+	ipsec->tx->ft.sa = ft;
 	return 0;
 }
 
-static int tx_ft_get(struct mlx5e_priv *priv)
+static struct mlx5e_ipsec_tx *tx_ft_get(struct mlx5e_priv *priv)
 {
-	struct mlx5e_ipsec_tx *tx_fs = priv->ipsec->tx_fs;
+	struct mlx5e_ipsec_tx *tx = priv->ipsec->tx;
 	int err = 0;
 
-	mutex_lock(&tx_fs->mutex);
-	if (tx_fs->refcnt)
+	mutex_lock(&tx->ft.mutex);
+	if (tx->ft.refcnt)
 		goto skip;
 
 	err = tx_create(priv);
 	if (err)
 		goto out;
 skip:
-	tx_fs->refcnt++;
+	tx->ft.refcnt++;
 out:
-	mutex_unlock(&tx_fs->mutex);
-	return err;
+	mutex_unlock(&tx->ft.mutex);
+	if (err)
+		return ERR_PTR(err);
+	return tx;
 }
 
 static void tx_ft_put(struct mlx5e_priv *priv)
 {
-	struct mlx5e_ipsec_tx *tx_fs = priv->ipsec->tx_fs;
+	struct mlx5e_ipsec_tx *tx = priv->ipsec->tx;
 
-	mutex_lock(&tx_fs->mutex);
-	tx_fs->refcnt--;
-	if (tx_fs->refcnt)
+	mutex_lock(&tx->ft.mutex);
+	tx->ft.refcnt--;
+	if (tx->ft.refcnt)
 		goto out;
 
-	mlx5_destroy_flow_table(tx_fs->ft);
+	mlx5_destroy_flow_table(tx->ft.sa);
 out:
-	mutex_unlock(&tx_fs->mutex);
+	mutex_unlock(&tx->ft.mutex);
 }
 
 static void setup_fte_common(struct mlx5_accel_esp_xfrm_attrs *attrs,
@@ -397,22 +384,16 @@ static int rx_add_rule(struct mlx5e_priv *priv,
 	struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs;
 	u32 ipsec_obj_id = sa_entry->ipsec_obj_id;
 	struct mlx5_modify_hdr *modify_hdr = NULL;
-	struct mlx5e_accel_fs_esp_prot *fs_prot;
 	struct mlx5_flow_destination dest = {};
-	struct mlx5e_accel_fs_esp *accel_esp;
 	struct mlx5_flow_act flow_act = {};
 	struct mlx5_flow_handle *rule;
-	enum accel_fs_esp_type type;
 	struct mlx5_flow_spec *spec;
+	struct mlx5e_ipsec_rx *rx;
 	int err = 0;
 
-	accel_esp = priv->ipsec->rx_fs;
-	type = (attrs->family == AF_INET) ? ACCEL_FS_ESP4 : ACCEL_FS_ESP6;
-	fs_prot = &accel_esp->fs_prot[type];
-
-	err = rx_ft_get(priv, type);
-	if (err)
-		return err;
+	rx = rx_ft_get(priv, attrs->family);
+	if (IS_ERR(rx))
+		return PTR_ERR(rx);
 
 	spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
 	if (!spec) {
@@ -445,8 +426,8 @@ static int rx_add_rule(struct mlx5e_priv *priv,
 			  MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
 	dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 	flow_act.modify_hdr = modify_hdr;
-	dest.ft = fs_prot->rx_err.ft;
-	rule = mlx5_add_flow_rules(fs_prot->ft, spec, &flow_act, &dest, 1);
+	dest.ft = rx->rx_err.ft;
+	rule = mlx5_add_flow_rules(rx->ft.sa, spec, &flow_act, &dest, 1);
 	if (IS_ERR(rule)) {
 		err = PTR_ERR(rule);
 		netdev_err(priv->netdev, "fail to add RX ipsec rule err=%d\n",
@@ -461,7 +442,7 @@ static int rx_add_rule(struct mlx5e_priv *priv,
 out_err:
 	if (modify_hdr)
 		mlx5_modify_header_dealloc(priv->mdev, modify_hdr);
-	rx_ft_put(priv, type);
+	rx_ft_put(priv, attrs->family);
 
 out:
 	kvfree(spec);
@@ -474,11 +455,12 @@ static int tx_add_rule(struct mlx5e_priv *priv,
 	struct mlx5_flow_act flow_act = {};
 	struct mlx5_flow_handle *rule;
 	struct mlx5_flow_spec *spec;
+	struct mlx5e_ipsec_tx *tx;
 	int err = 0;
 
-	err = tx_ft_get(priv);
-	if (err)
-		return err;
+	tx = tx_ft_get(priv);
+	if (IS_ERR(tx))
+		return PTR_ERR(tx);
 
 	spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
 	if (!spec) {
@@ -498,7 +480,7 @@ static int tx_add_rule(struct mlx5e_priv *priv,
 
 	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW |
 			  MLX5_FLOW_CONTEXT_ACTION_IPSEC_ENCRYPT;
-	rule = mlx5_add_flow_rules(priv->ipsec->tx_fs->ft, spec, &flow_act, NULL, 0);
+	rule = mlx5_add_flow_rules(tx->ft.sa, spec, &flow_act, NULL, 0);
 	if (IS_ERR(rule)) {
 		err = PTR_ERR(rule);
 		netdev_err(priv->netdev, "fail to add TX ipsec rule err=%d\n",
@@ -529,7 +511,6 @@ void mlx5e_accel_ipsec_fs_del_rule(struct mlx5e_priv *priv,
 {
 	struct mlx5e_ipsec_rule *ipsec_rule = &sa_entry->ipsec_rule;
 	struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry);
-	enum accel_fs_esp_type type;
 
 	mlx5_del_flow_rules(ipsec_rule->rule);
 
@@ -539,38 +520,30 @@ void mlx5e_accel_ipsec_fs_del_rule(struct mlx5e_priv *priv,
 	}
 
 	mlx5_modify_header_dealloc(mdev, ipsec_rule->set_modify_hdr);
-	type = (sa_entry->attrs.family == AF_INET) ? ACCEL_FS_ESP4 : ACCEL_FS_ESP6;
-	rx_ft_put(priv, type);
+	rx_ft_put(priv, sa_entry->attrs.family);
 }
 
 void mlx5e_accel_ipsec_fs_cleanup(struct mlx5e_ipsec *ipsec)
 {
-	struct mlx5e_accel_fs_esp_prot *fs_prot;
-	struct mlx5e_accel_fs_esp *accel_esp;
-	enum accel_fs_esp_type i;
-
-	if (!ipsec->rx_fs)
+	if (!ipsec->tx)
 		return;
 
-	mutex_destroy(&ipsec->tx_fs->mutex);
-	WARN_ON(ipsec->tx_fs->refcnt);
-	kfree(ipsec->tx_fs);
+	mutex_destroy(&ipsec->tx->ft.mutex);
+	WARN_ON(ipsec->tx->ft.refcnt);
+	kfree(ipsec->tx);
 
-	accel_esp = ipsec->rx_fs;
-	for (i = 0; i < ACCEL_FS_ESP_NUM_TYPES; i++) {
-		fs_prot = &accel_esp->fs_prot[i];
-		mutex_destroy(&fs_prot->prot_mutex);
-		WARN_ON(fs_prot->refcnt);
-	}
-	kfree(ipsec->rx_fs);
+	mutex_destroy(&ipsec->rx_ipv4->ft.mutex);
+	WARN_ON(ipsec->rx_ipv4->ft.refcnt);
+	kfree(ipsec->rx_ipv4);
+
+	mutex_destroy(&ipsec->rx_ipv6->ft.mutex);
+	WARN_ON(ipsec->rx_ipv6->ft.refcnt);
+	kfree(ipsec->rx_ipv6);
 }
 
 int mlx5e_accel_ipsec_fs_init(struct mlx5e_ipsec *ipsec)
 {
-	struct mlx5e_accel_fs_esp_prot *fs_prot;
-	struct mlx5e_accel_fs_esp *accel_esp;
 	struct mlx5_flow_namespace *ns;
-	enum accel_fs_esp_type i;
 	int err = -ENOMEM;
 
 	ns = mlx5_get_flow_namespace(ipsec->mdev,
@@ -578,26 +551,29 @@ int mlx5e_accel_ipsec_fs_init(struct mlx5e_ipsec *ipsec)
 	if (!ns)
 		return -EOPNOTSUPP;
 
-	ipsec->tx_fs = kzalloc(sizeof(*ipsec->tx_fs), GFP_KERNEL);
-	if (!ipsec->tx_fs)
+	ipsec->tx = kzalloc(sizeof(*ipsec->tx), GFP_KERNEL);
+	if (!ipsec->tx)
 		return -ENOMEM;
 
-	ipsec->rx_fs = kzalloc(sizeof(*ipsec->rx_fs), GFP_KERNEL);
-	if (!ipsec->rx_fs)
-		goto err_rx;
+	ipsec->rx_ipv4 = kzalloc(sizeof(*ipsec->rx_ipv4), GFP_KERNEL);
+	if (!ipsec->rx_ipv4)
+		goto err_rx_ipv4;
 
-	mutex_init(&ipsec->tx_fs->mutex);
-	ipsec->tx_fs->ns = ns;
+	ipsec->rx_ipv6 = kzalloc(sizeof(*ipsec->rx_ipv6), GFP_KERNEL);
+	if (!ipsec->rx_ipv6)
+		goto err_rx_ipv6;
 
-	accel_esp = ipsec->rx_fs;
-	for (i = 0; i < ACCEL_FS_ESP_NUM_TYPES; i++) {
-		fs_prot = &accel_esp->fs_prot[i];
-		mutex_init(&fs_prot->prot_mutex);
-	}
+	mutex_init(&ipsec->tx->ft.mutex);
+	mutex_init(&ipsec->rx_ipv4->ft.mutex);
+	mutex_init(&ipsec->rx_ipv6->ft.mutex);
+	ipsec->tx->ns = ns;
 
 	return 0;
 
-err_rx:
-	kfree(ipsec->tx_fs);
+err_rx_ipv6:
+	kfree(ipsec->rx_ipv4);
+err_rx_ipv4:
+	kfree(ipsec->tx);
+	ipsec->tx = NULL;
 	return err;
 }
-- 
2.37.2
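
For readers skimming the diff, the recurring change in rx_ft_get() and tx_ft_get() is that the getters now hand back the refcounted object itself (or an encoded error) instead of an int. Below is a minimal userspace sketch of that shape, not the driver code. Everything named demo_* is hypothetical, err_ptr()/is_err()/ptr_err() are simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), a pthread mutex stands in for the kernel mutex, and a malloc'ed buffer stands in for the flow table.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Hypothetical resource; "table" stands in for the flow table. */
struct demo_res {
	pthread_mutex_t mutex;
	unsigned int refcnt;
	void *table;
	int fail_create;	/* illustration hook: force creation to fail */
};

static int demo_create(struct demo_res *res)
{
	if (res->fail_create)
		return -ENOMEM;
	res->table = malloc(16);
	return res->table ? 0 : -ENOMEM;
}

/* First caller creates the table; later callers only bump the refcount. */
static struct demo_res *demo_get(struct demo_res *res)
{
	int err = 0;

	pthread_mutex_lock(&res->mutex);
	if (res->refcnt)
		goto skip;

	err = demo_create(res);
	if (err)
		goto out;
skip:
	res->refcnt++;
out:
	pthread_mutex_unlock(&res->mutex);
	if (err)
		return err_ptr(err);
	return res;
}

/* Last caller to drop its reference destroys the table. */
static void demo_put(struct demo_res *res)
{
	pthread_mutex_lock(&res->mutex);
	assert(res->refcnt > 0);
	res->refcnt--;
	if (res->refcnt)
		goto out;

	free(res->table);
	res->table = NULL;
out:
	pthread_mutex_unlock(&res->mutex);
}
```

A caller would check the result with is_err() and convert it back with ptr_err(), in the same way rx_add_rule() and tx_add_rule() use IS_ERR() and PTR_ERR() in the patch. Returning the object lets the caller use it directly rather than looking it up a second time.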

