public inbox for kvm@vger.kernel.org
From: Si-Wei Liu <si-wei.liu@oracle.com>
To: "Zhu, Lingshan" <lingshan.zhu@intel.com>,
	jasowang@redhat.com, mst@redhat.com
Cc: virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, kvm@vger.kernel.org, parav@nvidia.com,
	xieyongji@bytedance.com, gautam.dawar@amd.com
Subject: Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev
Date: Tue, 16 Aug 2022 16:14:59 -0700
Message-ID: <892b39d6-85f8-bff5-030d-e21288975572@oracle.com>
In-Reply-To: <22e0236f-b556-c6a8-0043-b39b02928fd6@intel.com>



On 8/16/2022 2:08 AM, Zhu, Lingshan wrote:
>
>
> On 8/16/2022 3:58 PM, Si-Wei Liu wrote:
>>
>>
>> On 8/15/2022 6:58 PM, Zhu, Lingshan wrote:
>>>
>>>
>>> On 8/16/2022 7:32 AM, Si-Wei Liu wrote:
>>>>
>>>>
>>>> On 8/15/2022 2:26 AM, Zhu Lingshan wrote:
>>>>> Some fields of virtio-net device config space are
>>>>> conditional on the feature bits, the spec says:
>>>>>
>>>>> "The mac address field always exists
>>>>> (though is only valid if VIRTIO_NET_F_MAC is set)"
>>>>>
>>>>> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ
>>>>> or VIRTIO_NET_F_RSS is set"
>>>>>
>>>>> "mtu only exists if VIRTIO_NET_F_MTU is set"
>>>>>
>>>>> so we should read MTU, MAC and MQ in the device config
>>>>> space only when these feature bits are offered.
>>>>>
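For illustration, the gating rules quoted above can be sketched as a small standalone C function. This is a userspace sketch, not code from drivers/vdpa/vdpa.c: the enum and function names are hypothetical, and the bit numbers are the virtio-net feature bits defined by the virtio spec (MTU=3, MAC=5, MQ=22, RSS=60).

```c
#include <stdbool.h>
#include <stdint.h>

/* virtio-net feature bit numbers, as defined by the virtio spec */
#define VIRTIO_NET_F_MTU  3
#define VIRTIO_NET_F_MAC  5
#define VIRTIO_NET_F_MQ   22
#define VIRTIO_NET_F_RSS  60

/* Hypothetical identifiers for the conditional config-space fields */
enum net_cfg_field { NET_CFG_MAC, NET_CFG_MTU, NET_CFG_MAX_VQP };

bool has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1ULL;
}

/*
 * Should this field be read from config space and reported, given the
 * *device* feature bits?
 */
bool net_cfg_field_valid(uint64_t features, enum net_cfg_field f)
{
	switch (f) {
	case NET_CFG_MAC:
		/* mac always exists, but is only valid with F_MAC */
		return has_feature(features, VIRTIO_NET_F_MAC);
	case NET_CFG_MTU:
		/* mtu only exists with F_MTU */
		return has_feature(features, VIRTIO_NET_F_MTU);
	case NET_CFG_MAX_VQP:
		/* max_virtqueue_pairs exists with F_MQ or F_RSS */
		return has_feature(features, VIRTIO_NET_F_MQ) ||
		       has_feature(features, VIRTIO_NET_F_RSS);
	}
	return false;
}
```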
>>>>> For MQ, if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS is
>>>>> set, the virtio device should have one queue pair by default,
>>>>> so when userspace queries the number of queue pairs,
>>>>> it should return mq=1 rather than zero.
>>>>>
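A minimal sketch of this MQ fallback, assuming a modern (VIRTIO_F_VERSION_1) device so that max_virtqueue_pairs is stored little-endian in config space; effective_max_vq_pairs() is a hypothetical name, not a vdpa core function.

```c
#include <stdint.h>

#define VIRTIO_NET_F_MQ   22
#define VIRTIO_NET_F_RSS  60

/*
 * Number of queue pairs to report. raw_le points at the 2-byte
 * max_virtqueue_pairs field, assumed little-endian.
 */
uint16_t effective_max_vq_pairs(uint64_t device_features, const uint8_t raw_le[2])
{
	uint64_t mq_or_rss = (1ULL << VIRTIO_NET_F_MQ) | (1ULL << VIRTIO_NET_F_RSS);

	/* Field does not exist: a working device still has one queue pair. */
	if (!(device_features & mq_or_rss))
		return 1;

	/* Assemble the little-endian value byte by byte. */
	return (uint16_t)(raw_le[0] | (raw_le[1] << 8));
}
```

With neither MQ nor RSS offered it reports 1 regardless of what the raw field holds, which mirrors the val_u16 = 1 branch in the patch below.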
>>>>> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read
>>>>> MTU from the device config space.
>>>>> RFC894 <A Standard for the Transmission of IP Datagrams over Ethernet Networks>
>>>>> says:"The minimum length of the data field of a packet sent over an
>>>>> Ethernet is 1500 octets, thus the maximum length of an IP datagram
>>>>> sent over an Ethernet is 1500 octets.  Implementations are encouraged
>>>>> to support full-length packets"
>>>> Note there's a typo in the above: "The *maximum* length of the data
>>>> field of a packet sent over an Ethernet is 1500 octets ...". And the
>>>> RFC was written in 1984.
>>> The spec RFC894 says it is 1500, see
>>> https://www.rfc-editor.org/rfc/rfc894.txt
>>>>
>>>> Apparently that is no longer true with the introduction of jumbo
>>>> frames later in the 2000s. I'm not sure what the point of mentioning
>>>> this ancient RFC is. It doesn't say the default MTU of any
>>>> Ethernet NIC/switch should be 1500 in either case.
>>> This could certainly be a larger number; we are trying to find the
>>> minimum value for Ethernet here. To support 1500 octets, the MTU should
>>> be at least 1500, so I assume 1500 should be the default MTU.
>>>>
>>>>>
>>>>> virtio spec says: "The virtio network device is a virtual ethernet card",
>>>> Right,
>>>>> so the default MTU value should be 1500 for virtio-net.
>>>> ... but it doesn't say the default is 1500, at least not
>>>> explicitly. Why can't it be 1492 or even lower? In practice, if
>>>> the network backend has an MTU higher than 1500, there's nothing
>>>> wrong with the guest configuring a default MTU above 1500.
>>> same as above
>>>>
>>>>>
>>>>> For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set,
>>>>> the configuration space mac entry indicates the “physical” address
>>>>> of the network card, otherwise the driver would typically
>>>>> generate a random local MAC address." So there is no
>>>>> default MAC address if VIRTIO_NET_F_MAC is not set.
>>>>>
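As a sketch of what "generate a random local MAC address" typically means on the driver side: randomize the six octets, then clear the multicast bit and set the locally administered bit in the first octet. Linux's eth_random_addr() applies the same two bit fixes; here rand() is only a stand-in for a proper random source, and random_local_mac() is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fill mac[] with a random, unicast, locally administered address. */
void random_local_mac(uint8_t mac[6])
{
	for (int i = 0; i < 6; i++)
		mac[i] = (uint8_t)(rand() & 0xff);

	mac[0] &= 0xfe; /* clear the I/G (multicast) bit: unicast */
	mac[0] |= 0x02; /* set the U/L bit: locally administered */
}
```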
>>>>> This commit introduces the functions vdpa_dev_net_mtu_config_fill()
>>>>> and vdpa_dev_net_mac_config_fill() to fill MTU and MAC.
>>>>> It also fixes vdpa_dev_net_mq_config_fill() to report correct
>>>>> MQ when _F_MQ is not present.
>>>>>
>>>>> These functions should check device features rather than driver
>>>>> features, and struct vdpa_device is not needed as a parameter.
>>>>>
>>>>> The test & userspace tool output:
>>>>>
>>>>> Feature bits VIRTIO_NET_F_MTU, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ
>>>>> and VIRTIO_NET_F_MAC can be masked out by hardcoding.
>>>>>
>>>>> However, it is challenging to "disable" the related fields
>>>>> in the HW device config space, so let's just assume the values
>>>>> are meaningless if the feature bits are not set.
>>>>>
>>>>> Before this change, when feature bits for RSS, MQ, MTU and MAC
>>>>> are not set, iproute2 output:
>>>>> $vdpa vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false mtu 1500
>>>>>    negotiated_features
>>>>>
>>>>> Without this commit, vdpa_dev_net_config_fill() reads all
>>>>> config space fields unconditionally, so the MAC and MTU shown
>>>>> above should be assumed meaningless; and it checks MQ against
>>>>> driver_features, so we don't see max_vq_pairs.
>>>>>
>>>>> After applying this commit, when feature bits for
>>>>> MQ, RSS, MAC and MTU are not set, iproute2 output:
>>>>> $vdpa dev config show vdpa0
>>>>> vdpa0: link up link_announce false max_vq_pairs 1 mtu 1500
>>>>>    negotiated_features
>>>>>
>>>>> As explained above:
>>>>> There is no MAC here because VIRTIO_NET_F_MAC is not set,
>>>>> and there is no default value for MAC. It shows
>>>>> max_vq_pairs = 1 because even without the MQ feature,
>>>>> a functional virtio-net device must have one queue pair.
>>>>> mtu = 1500 is the default value that Ethernet
>>>>> requires.
>>>>>
>>>>> This commit also adds supplementary comments for the
>>>>> __virtio16_to_cpu(true, xxx) operations in
>>>>> vdpa_dev_net_config_fill() and vdpa_fill_stats_rec().
>>>>>
>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>> ---
>>>>>   drivers/vdpa/vdpa.c | 60 +++++++++++++++++++++++++++++++++++----------
>>>>>   1 file changed, 47 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>> index efb55a06e961..a74660b98979 100644
>>>>> --- a/drivers/vdpa/vdpa.c
>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>> @@ -801,19 +801,44 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct netlink_callba
>>>>>       return msg->len;
>>>>>   }
>>>>>
>>>>> -static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
>>>>> -                       struct sk_buff *msg, u64 features,
>>>>> +static int vdpa_dev_net_mq_config_fill(struct sk_buff *msg, u64 features,
>>>>>                          const struct virtio_net_config *config)
>>>>>   {
>>>>>       u16 val_u16;
>>>>>
>>>>> -    if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
>>>>> -        return 0;
>>>>> +    if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0 &&
>>>>> +        (features & BIT_ULL(VIRTIO_NET_F_RSS)) == 0)
>>>>> +        val_u16 = 1;
>>>>> +    else
>>>>> +        val_u16 = __virtio16_to_cpu(true, config->max_virtqueue_pairs);
>>>>>
>>>>> -    val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
>>>>>       return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, val_u16);
>>>>>   }
>>>>>
>>>>> +static int vdpa_dev_net_mtu_config_fill(struct sk_buff *msg, u64 features,
>>>>> +                    const struct virtio_net_config *config)
>>>>> +{
>>>>> +    u16 val_u16;
>>>>> +
>>>>> +    if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
>>>>> +        val_u16 = 1500;
>>>> As said, there's no virtio spec defined value for MTU. Please leave 
>>>> this field out if feature VIRTIO_NET_F_MTU is not negotiated.
>>> same as above
>>>>> +    else
>>>>> +        val_u16 = __virtio16_to_cpu(true, config->mtu);
>>>>> +
>>>>> +    return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16);
>>>>> +}
>>>>> +
>>>>> +static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64 features,
>>>>> +                    const struct virtio_net_config *config)
>>>>> +{
>>>>> +    if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) == 0)
>>>>> +        return 0;
>>>>> +    else
>>>>> +        return  nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>>>> +                sizeof(config->mac), config->mac);
>>>>> +}
>>>>> +
>>>>> +
>>>>>   static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg)
>>>>>   {
>>>>>       struct virtio_net_config config = {};
>>>>> @@ -822,18 +847,16 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
>>>>>
>>>>>       vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>>>>
>>>>> -    if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, sizeof(config.mac),
>>>>> -            config.mac))
>>>>> -        return -EMSGSIZE;
>>>>> +    /*
>>>>> +     * Assume little endian for now, userspace can tweak this for
>>>>> +     * legacy guest support.
>>>> You can leave it as a TODO for the kernel (vdpa core limitation), but
>>>> AFAIK there's nothing userspace needs to do to infer the 
>>>> endianness. IMHO it's the kernel's job to provide an abstraction 
>>>> rather than rely on userspace guessing it.
>>> We have discussed this in another thread, and this comment was
>>> suggested by MST.
>> Can you provide the context or a link? It shouldn't work like this,
>> otherwise it is breaking uABI. E.g. how will a legacy/BE supporting
>> kernel/device be backward compatible with an older vdpa tool (which has
>> had knowledge of this endianness implication/assumption from day one)?
> https://www.spinics.net/lists/netdev/msg837114.html
>
> The challenge is that the status field is virtio16, not le16, so
> le16_to_cpu(xxx) is wrong anyway. However, we cannot tell whether it
> is an LE or BE device from struct vdpa_device, so in most cases we
> assume it is LE, and leave this comment.
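To make the le16 vs virtio16 distinction concrete, a minimal userspace sketch: an le16 field is always little-endian, while a virtio16 field is little-endian for modern devices and guest-native for legacy ones, so the decoder has to be told which case applies (the role played by the little_endian argument of __virtio16_to_cpu()). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* le16: always little-endian in config space */
uint16_t decode_le16(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

uint16_t decode_be16(const uint8_t b[2])
{
	return (uint16_t)((b[0] << 8) | b[1]);
}

/*
 * virtio16: little_endian == true models a modern device (or a legacy
 * device on an LE guest); false models a legacy device on a BE guest.
 */
uint16_t decode_virtio16(bool little_endian, const uint8_t b[2])
{
	return little_endian ? decode_le16(b) : decode_be16(b);
}
```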
While the fix is fine, the comment is misleading and gives readers false
hope. This is vdpa_dev_net_config_fill(), the vdpa tool query path,
rather than a call from the VMM dealing with vhost/virtio plumbing
specifics. I think what's missing today in vdpa core is the detection of
guest type (legacy, transitional, or modern) with regard to endianness
through F_VERSION_1 and legacy interface access, the latter of which
would need some assistance from the VMM for sure. However, the presence of
information via the vdpa tool query is totally orthogonal. I don't see a
good reason why it has to be coupled with endianness. How is vdpa tool
userspace supposed to tweak it? I don't get it...
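A sketch of the kind of detection meant here, under stated assumptions: negotiating VIRTIO_F_VERSION_1 (bit 32) implies little-endian config fields; otherwise the device is driven through the legacy interface, which uses guest-native endianness that the vdpa core cannot see on its own. guest_is_le is a hypothetical input standing in for information the VMM would have to provide, and the helper name is hypothetical as well.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32

/* Are the device's config-space fields little-endian? */
bool cfg_is_little_endian(uint64_t negotiated_features, bool guest_is_le)
{
	/* Modern (or transitional driven as modern): always LE. */
	if (negotiated_features & (1ULL << VIRTIO_F_VERSION_1))
		return true;

	/* Legacy interface: guest-native endianness, supplied by the VMM. */
	return guest_is_le;
}
```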

-Siwei


>
> Thanks
>>
>> -Siwei
>>
>>>>
>>>>> +     */
>>>>> +    val_u16 = __virtio16_to_cpu(true, config.status);
>>>>>
>>>>>       val_u16 = __virtio16_to_cpu(true, config.status);
>>>>>       if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
>>>>>           return -EMSGSIZE;
>>>>>
>>>>> -    val_u16 = __virtio16_to_cpu(true, config.mtu);
>>>>> -    if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
>>>>> -        return -EMSGSIZE;
>>>>> -
>>>>>       features_driver = vdev->config->get_driver_features(vdev);
>>>>>       if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
>>>>>                     VDPA_ATTR_PAD))
>>>>> @@ -846,7 +869,13 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
>>>>>                     VDPA_ATTR_PAD))
>>>>>           return -EMSGSIZE;
>>>>>
>>>>> -    return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver, &config);
>>>>> +    if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
>>>>> +        return -EMSGSIZE;
>>>>> +
>>>>> +    if (vdpa_dev_net_mtu_config_fill(msg, features_device, &config))
>>>>> +        return -EMSGSIZE;
>>>>> +
>>>>> +    return vdpa_dev_net_mq_config_fill(msg, features_device, &config);
>>>>>   }
>>>>>
>>>>>   static int
>>>>> @@ -914,6 +943,11 @@ static int vdpa_fill_stats_rec(struct vdpa_device *vdev, struct sk_buff *msg,
>>>>>       }
>>>>>       vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>>>>
>>>>> +    /*
>>>>> +     * Assume little endian for now, userspace can tweak this for
>>>>> +     * legacy guest support.
>>>>> +     */
>>>>> +
>>>> Ditto.
>>> same as above
>>>
>>> Thanks
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>>       max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs);
>>>>>       if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp))
>>>>>           return -EMSGSIZE;
>>>>
>>>
>>
>



Thread overview: 42+ messages
2022-08-15  9:26 [PATCH 0/2] allow userspace to query device features Zhu Lingshan
2022-08-15  9:26 ` [PATCH 1/2] vDPA: allow userspace to query features of a vDPA device Zhu Lingshan
2022-08-15 18:15   ` Si-Wei Liu
2022-08-16  1:49     ` Zhu, Lingshan
2022-08-16  2:07       ` Parav Pandit
2022-08-16  4:21         ` Zhu, Lingshan
2022-08-15  9:26 ` [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev Zhu Lingshan
2022-08-15 15:52   ` Michael S. Tsirkin
2022-08-15 23:32   ` Si-Wei Liu
2022-08-16  1:58     ` Zhu, Lingshan
2022-08-16  4:26       ` Zhu, Lingshan
2022-08-16  7:58       ` Si-Wei Liu
2022-08-16  9:08         ` Zhu, Lingshan
2022-08-16 23:14           ` Si-Wei Liu [this message]
2022-08-17  2:14             ` Zhu, Lingshan
2022-08-17  8:55               ` Michael S. Tsirkin
2022-08-17  9:13                 ` Zhu, Lingshan
2022-08-17  9:39                   ` Michael S. Tsirkin
2022-08-17  9:43                     ` Zhu, Lingshan
2022-08-17 10:37                       ` Michael S. Tsirkin
2022-08-18  4:15                         ` Jason Wang
2022-08-18  7:58                           ` Zhu, Lingshan
2022-08-18 23:20                           ` Si-Wei Liu
2022-08-19  0:42                             ` Jason Wang
2022-08-19  3:52                               ` Michael S. Tsirkin
2022-08-20  8:55                               ` Si-Wei Liu
2022-08-22  5:07                                 ` Zhu, Lingshan
2022-08-23  3:26                                   ` Jason Wang
2022-08-23  6:52                                     ` Zhu, Lingshan
2022-08-30  9:43                                       ` Zhu, Lingshan
2022-08-26  6:23                                     ` Si-Wei Liu
2022-09-02  6:03                                       ` Jason Wang
2022-09-02  6:14                                         ` Michael S. Tsirkin
2022-09-05  3:54                                           ` Jason Wang
2022-08-16  2:32   ` Parav Pandit
2022-08-16  4:18     ` Zhu, Lingshan
2022-08-16 21:02       ` Parav Pandit
2022-08-16 21:09         ` Michael S. Tsirkin
2022-08-17  2:03           ` Zhu, Lingshan
2022-08-18  4:18             ` Jason Wang
2022-08-18  6:38               ` Zhu, Lingshan
2022-08-18 17:20               ` Parav Pandit
