From: Si-Wei Liu <si-wei.liu@oracle.com>
To: "Zhu, Lingshan" <lingshan.zhu@intel.com>,
jasowang@redhat.com, mst@redhat.com
Cc: virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, kvm@vger.kernel.org, parav@nvidia.com,
xieyongji@bytedance.com, gautam.dawar@amd.com
Subject: Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev
Date: Tue, 16 Aug 2022 00:58:41 -0700 [thread overview]
Message-ID: <9b6292f3-9bd5-ecd8-5e42-cd5d12f036e7@oracle.com> (raw)
In-Reply-To: <20e92551-a639-ec13-3d9c-13bb215422e1@intel.com>
On 8/15/2022 6:58 PM, Zhu, Lingshan wrote:
>
>
> On 8/16/2022 7:32 AM, Si-Wei Liu wrote:
>>
>>
>> On 8/15/2022 2:26 AM, Zhu Lingshan wrote:
>>> Some fields of virtio-net device config space are
>>> conditional on the feature bits, the spec says:
>>>
>>> "The mac address field always exists
>>> (though is only valid if VIRTIO_NET_F_MAC is set)"
>>>
>>> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ
>>> or VIRTIO_NET_F_RSS is set"
>>>
>>> "mtu only exists if VIRTIO_NET_F_MTU is set"
>>>
>>> so we should read MTU, MAC and MQ in the device config
>>> space only when these feature bits are offered.
>>>
>>> For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are
>>> not set, the virtio device should have
>>> one queue pair as default value, so when userspace querying queue
>>> pair numbers,
>>> it should return mq=1 than zero.
>>>
>>> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read
>>> MTU from the device config sapce.
>>> RFC894 <A Standard for the Transmission of IP Datagrams over
>>> Ethernet Networks>
>>> says:"The minimum length of the data field of a packet sent over an
>>> Ethernet is 1500 octets, thus the maximum length of an IP datagram
>>> sent over an Ethernet is 1500 octets. Implementations are encouraged
>>> to support full-length packets"
>> Noted there's a typo in the above "The *maximum* length of the data
>> field of a packet sent over an Ethernet is 1500 octets ..." and the
>> RFC was written 1984.
> the spec RFC894 says it is 1500, see <a
> href="https://urldefense.com/v3/__https://www.rfc-editor.org/rfc/rfc894.txt__;!!ACWV5N9M2RV99hQ!MdgxZjw5sp5Qz-GKfwT1IWcw_L4Jo1-UekuJPFz1UrG3YuqirKz7P9ksdJFh1vB6zHJ7z8Q04fpT0-9jWXCtlWM$">https://www.rfc-editor.org/rfc/rfc894.txt</a>
>>
>> Apparently that is no longer true with the introduction of Jumbo size
>> frame later in the 2000s. I'm not sure what is the point of mention
>> this ancient RFC. It doesn't say default MTU of any Ethernet
>> NIC/switch should be 1500 in either case.
> This could be a larger number for sure, we are trying to find out the
> min value for Ethernet here, to support 1500 octets, MTU should be
> 1500 at least, so I assume 1500 should be the default value for MTU
>>
>>>
>>> virtio spec says:"The virtio network device is a virtual ethernet
>>> card",
>> Right,
>>> so the default MTU value should be 1500 for virtio-net.
>> ... but it doesn't say the default is 1500. At least, not in explicit
>> way. Why it can't be 1492 or even lower? In practice, if the network
>> backend has a MTU higher than 1500, there's nothing wrong for guest
>> to configure default MTU more than 1500.
> same as above
>>
>>>
>>> For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set,
>>> the configuration space mac entry indicates the “physical” address
>>> of the network card, otherwise the driver would typically
>>> generate a random local MAC address." So there is no
>>> default MAC address if VIRTIO_NET_F_MAC not set.
>>>
>>> This commits introduces functions vdpa_dev_net_mtu_config_fill()
>>> and vdpa_dev_net_mac_config_fill() to fill MTU and MAC.
>>> It also fixes vdpa_dev_net_mq_config_fill() to report correct
>>> MQ when _F_MQ is not present.
>>>
>>> These functions should check devices features than driver
>>> features, and struct vdpa_device is not needed as a parameter
>>>
>>> The test & userspace tool output:
>>>
>>> Feature bit VIRTIO_NET_F_MTU, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ
>>> and VIRTIO_NET_F_MAC can be mask out by hardcode.
>>>
>>> However, it is challenging to "disable" the related fields
>>> in the HW device config space, so let's just assume the values
>>> are meaningless if the feature bits are not set.
>>>
>>> Before this change, when feature bits for RSS, MQ, MTU and MAC
>>> are not set, iproute2 output:
>>> $vdpa vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false mtu 1500
>>> negotiated_features
>>>
>>> without this commit, function vdpa_dev_net_config_fill()
>>> reads all config space fields unconditionally, so let's
>>> assume the MAC and MTU are meaningless, and it checks
>>> MQ with driver_features, so we don't see max_vq_pairs.
>>>
>>> After applying this commit, when feature bits for
>>> MQ, RSS, MAC and MTU are not set,iproute2 output:
>>> $vdpa dev config show vdpa0
>>> vdpa0: link up link_announce false max_vq_pairs 1 mtu 1500
>>> negotiated_features
>>>
>>> As explained above:
>>> Here is no MAC, because VIRTIO_NET_F_MAC is not set,
>>> and there is no default value for MAC. It shows
>>> max_vq_paris = 1 because even without MQ feature,
>>> a functional virtio-net must have one queue pair.
>>> mtu = 1500 is the default value as ethernet
>>> required.
>>>
>>> This commit also add supplementary comments for
>>> __virtio16_to_cpu(true, xxx) operations in
>>> vdpa_dev_net_config_fill() and vdpa_fill_stats_rec()
>>>
>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>> ---
>>> drivers/vdpa/vdpa.c | 60
>>> +++++++++++++++++++++++++++++++++++----------
>>> 1 file changed, 47 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>> index efb55a06e961..a74660b98979 100644
>>> --- a/drivers/vdpa/vdpa.c
>>> +++ b/drivers/vdpa/vdpa.c
>>> @@ -801,19 +801,44 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct
>>> sk_buff *msg, struct netlink_callba
>>> return msg->len;
>>> }
>>> -static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
>>> - struct sk_buff *msg, u64 features,
>>> +static int vdpa_dev_net_mq_config_fill(struct sk_buff *msg, u64
>>> features,
>>> const struct virtio_net_config *config)
>>> {
>>> u16 val_u16;
>>> - if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
>>> - return 0;
>>> + if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0 &&
>>> + (features & BIT_ULL(VIRTIO_NET_F_RSS)) == 0)
>>> + val_u16 = 1;
>>> + else
>>> + val_u16 = __virtio16_to_cpu(true,
>>> config->max_virtqueue_pairs);
>>> - val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
>>> return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, val_u16);
>>> }
>>> +static int vdpa_dev_net_mtu_config_fill(struct sk_buff *msg, u64
>>> features,
>>> + const struct virtio_net_config *config)
>>> +{
>>> + u16 val_u16;
>>> +
>>> + if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
>>> + val_u16 = 1500;
>> As said, there's no virtio spec defined value for MTU. Please leave
>> this field out if feature VIRTIO_NET_F_MTU is not negotiated.
> same as above
>>> + else
>>> + val_u16 = __virtio16_to_cpu(true, config->mtu);
>>> +
>>> + return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16);
>>> +}
>>> +
>>> +static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64
>>> features,
>>> + const struct virtio_net_config *config)
>>> +{
>>> + if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) == 0)
>>> + return 0;
>>> + else
>>> + return nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>> + sizeof(config->mac), config->mac);
>>> +}
>>> +
>>> +
>>> static int vdpa_dev_net_config_fill(struct vdpa_device *vdev,
>>> struct sk_buff *msg)
>>> {
>>> struct virtio_net_config config = {};
>>> @@ -822,18 +847,16 @@ static int vdpa_dev_net_config_fill(struct
>>> vdpa_device *vdev, struct sk_buff *ms
>>> vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>> - if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
>>> sizeof(config.mac),
>>> - config.mac))
>>> - return -EMSGSIZE;
>>> + /*
>>> + * Assume little endian for now, userspace can tweak this for
>>> + * legacy guest support.
>> You can leave it as a TODO for kernel (vdpa core limitation), but
>> AFAIK there's nothing userspace needs to do to infer the endianness.
>> IMHO it's the kernel's job to provide an abstraction rather than rely
>> on userspace guessing it.
> we have discussed it in another thread, and this comment is suggested
> by MST.
Can you provide the context or link? It shouldn't work like this,
otherwise it is breaking uABI. E.g. how will a legacy/BE supporting
kernel/device be backward compatible with older vdpa tool (which has
knowledge of this endianness implication/assumption from day one)?
-Siwei
>>
>>> + */
>>> + val_u16 = __virtio16_to_cpu(true, config.status);
>>> val_u16 = __virtio16_to_cpu(true, config.status);
>>> if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
>>> return -EMSGSIZE;
>>> - val_u16 = __virtio16_to_cpu(true, config.mtu);
>>> - if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
>>> - return -EMSGSIZE;
>>> -
>>> features_driver = vdev->config->get_driver_features(vdev);
>>> if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES,
>>> features_driver,
>>> VDPA_ATTR_PAD))
>>> @@ -846,7 +869,13 @@ static int vdpa_dev_net_config_fill(struct
>>> vdpa_device *vdev, struct sk_buff *ms
>>> VDPA_ATTR_PAD))
>>> return -EMSGSIZE;
>>> - return vdpa_dev_net_mq_config_fill(vdev, msg,
>>> features_driver, &config);
>>> + if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
>>> + return -EMSGSIZE;
>>> +
>>> + if (vdpa_dev_net_mtu_config_fill(msg, features_device, &config))
>>> + return -EMSGSIZE;
>>> +
>>> + return vdpa_dev_net_mq_config_fill(msg, features_device, &config);
>>> }
>>> static int
>>> @@ -914,6 +943,11 @@ static int vdpa_fill_stats_rec(struct
>>> vdpa_device *vdev, struct sk_buff *msg,
>>> }
>>> vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
>>> + /*
>>> + * Assume little endian for now, userspace can tweak this for
>>> + * legacy guest support.
>>> + */
>>> +
>> Ditto.
> same as above
>
> Thanks
>>
>> Thanks,
>> -Siwei
>>> max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs);
>>> if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp))
>>> return -EMSGSIZE;
>>
>
next prev parent reply other threads:[~2022-08-16 9:38 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-15 9:26 [PATCH 0/2] allow userspace to query device features Zhu Lingshan
2022-08-15 9:26 ` [PATCH 1/2] vDPA: allow userspace to query features of a vDPA device Zhu Lingshan
2022-08-15 18:15 ` Si-Wei Liu
2022-08-16 1:49 ` Zhu, Lingshan
2022-08-16 2:07 ` Parav Pandit
2022-08-16 4:21 ` Zhu, Lingshan
2022-08-15 9:26 ` [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev Zhu Lingshan
2022-08-15 15:52 ` Michael S. Tsirkin
2022-08-15 23:32 ` Si-Wei Liu
2022-08-16 1:58 ` Zhu, Lingshan
2022-08-16 4:26 ` Zhu, Lingshan
2022-08-16 7:58 ` Si-Wei Liu [this message]
2022-08-16 9:08 ` Zhu, Lingshan
2022-08-16 23:14 ` Si-Wei Liu
2022-08-17 2:14 ` Zhu, Lingshan
2022-08-17 8:55 ` Michael S. Tsirkin
2022-08-17 9:13 ` Zhu, Lingshan
2022-08-17 9:39 ` Michael S. Tsirkin
2022-08-17 9:43 ` Zhu, Lingshan
2022-08-17 10:37 ` Michael S. Tsirkin
2022-08-18 4:15 ` Jason Wang
2022-08-18 7:58 ` Zhu, Lingshan
2022-08-18 23:20 ` Si-Wei Liu
2022-08-19 0:42 ` Jason Wang
2022-08-19 3:52 ` Michael S. Tsirkin
2022-08-20 8:55 ` Si-Wei Liu
2022-08-22 5:07 ` Zhu, Lingshan
2022-08-23 3:26 ` Jason Wang
2022-08-23 6:52 ` Zhu, Lingshan
2022-08-30 9:43 ` Zhu, Lingshan
2022-08-26 6:23 ` Si-Wei Liu
2022-09-02 6:03 ` Jason Wang
2022-09-02 6:14 ` Michael S. Tsirkin
2022-09-05 3:54 ` Jason Wang
2022-08-16 2:32 ` Parav Pandit
2022-08-16 4:18 ` Zhu, Lingshan
2022-08-16 21:02 ` Parav Pandit
2022-08-16 21:09 ` Michael S. Tsirkin
2022-08-17 2:03 ` Zhu, Lingshan
2022-08-18 4:18 ` Jason Wang
2022-08-18 6:38 ` Zhu, Lingshan
2022-08-18 17:20 ` Parav Pandit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9b6292f3-9bd5-ecd8-5e42-cd5d12f036e7@oracle.com \
--to=si-wei.liu@oracle.com \
--cc=gautam.dawar@amd.com \
--cc=jasowang@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=lingshan.zhu@intel.com \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=parav@nvidia.com \
--cc=virtualization@lists.linux-foundation.org \
--cc=xieyongji@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).