* [RFC] Flexible SR-IOV support for virtio-net
From: Akihiko Odaki @ 2023-11-18 12:10 UTC
  To: Michael S. Tsirkin, Marcel Apfelbaum, Washizu Yui,
	qemu-devel@nongnu.org

Hi,

We are planning to add PCIe SR-IOV support to the virtio-net driver for
Windows ("NetKVM")[1], and we want an SR-IOV feature in QEMU's
virtio-net emulation code to test it. I expect there are other people
interested in such a feature, considering that people are using igb[2]
to test SR-IOV support in VMs.

Washizu Yui has already proposed an RFC patch to add an SR-IOV feature
to the virtio-net emulation[3][4], but it is preliminary and provides
no configurability for VFs.

Now I'm proposing to add SR-IOV support to virtio-net with full
configurability for VFs by following the implementation of virtio-net
failover[5]. I'm planning to write the patches myself, but I know there
are people interested in such patches, so I'd like to share the idea
beforehand.

The idea:

The problem when implementing configurability for VFs is that SR-IOV
VFs can be realized and unrealized at runtime at the guest's request.
So a naive implementation cannot deal with a command line like the
following:
-device virtio-net-pci,addr=0x0.0x0,sriov=on
-device virtio-net-pci,addr=0x0.0x1
-device virtio-net-pci,addr=0x0.0x2

This will realize the virtio-net functions at 0x0.0x1 and 0x0.0x2 when
the guest starts, instead of when the guest requests to enable VFs.

However, while reviewing the virtio-net emulation code, I realized that
virtio-net failover also "hides" devices when the guest starts. The
following command line hides hostdev0 when the guest starts and adds it
when the guest requests the VIRTIO_NET_F_STANDBY feature:

-device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
   bus=root2,failover=on
-device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1

So it should also be possible to do something similar: "hide" VFs and
realize/unrealize them when the guest requests.

There are two things I dislike about this idea when contrasting it with
the conventional multifunction feature[6], though. One is that the PF
must be added before the VFs; a similar limitation is imposed for
failover.

The other is that it will be specific to virtio-net. I was considering
implementing a "generic" SR-IOV feature that would work on various
devices, but I realized that would need lots of configuration
validation. We may eventually want it, but it's probably better to
avoid such a big leap as the first step.

Please tell me if you have questions or suggestions.

Regards,
Akihiko Odaki

[1] https://github.com/virtio-win/kvm-guest-drivers-windows
[2] https://qemu.readthedocs.io/en/v8.1.0/system/devices/igb.html
[3] 
https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
[4] 
https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
[5] https://qemu.readthedocs.io/en/v8.1.0/system/virtio-net-failover.html
[6] https://gitlab.com/qemu-project/qemu/-/blob/v8.1.2/docs/pcie.txt



* Re: [RFC] Flexible SR-IOV support for virtio-net
From: Yui Washizu @ 2023-11-28  8:47 UTC
  To: Akihiko Odaki, Michael S. Tsirkin, Marcel Apfelbaum,
	qemu-devel@nongnu.org


On 2023/11/18 21:10, Akihiko Odaki wrote:
> Hi,
>
> We are planning to add PCIe SR-IOV support to the virtio-net driver
> for Windows ("NetKVM")[1], and we want an SR-IOV feature in QEMU's
> virtio-net emulation code to test it. I expect there are other people
> interested in such a feature, considering that people are using igb[2]
> to test SR-IOV support in VMs.
>
> Washizu Yui has already proposed an RFC patch to add an SR-IOV feature
> to the virtio-net emulation[3][4], but it is preliminary and provides
> no configurability for VFs.
>
> Now I'm proposing to add SR-IOV support to virtio-net with full
> configurability for VFs by following the implementation of virtio-net
> failover[5]. I'm planning to write the patches myself, but I know
> there are people interested in such patches, so I'd like to share the
> idea beforehand.
>
> The idea:
>
> The problem when implementing configurability for VFs is that SR-IOV
> VFs can be realized and unrealized at runtime at the guest's request.
> So a naive implementation cannot deal with a command line like the
> following:
> -device virtio-net-pci,addr=0x0.0x0,sriov=on
> -device virtio-net-pci,addr=0x0.0x1
> -device virtio-net-pci,addr=0x0.0x2
>
> This will realize the virtio-net functions at 0x0.0x1 and 0x0.0x2 when
> the guest starts, instead of when the guest requests to enable VFs.
>
> However, while reviewing the virtio-net emulation code, I realized
> that virtio-net failover also "hides" devices when the guest starts.
> The following command line hides hostdev0 when the guest starts and
> adds it when the guest requests the VIRTIO_NET_F_STANDBY feature:
>
> -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
>   bus=root2,failover=on
> -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1
>
> So it should also be possible to do something similar: "hide" VFs and
> realize/unrealize them when the guest requests.
>
> There are two things I dislike about this idea when contrasting it
> with the conventional multifunction feature[6], though. One is that
> the PF must be added before the VFs; a similar limitation is imposed
> for failover.
>
> The other is that it will be specific to virtio-net. I was considering
> implementing a "generic" SR-IOV feature that would work on various
> devices, but I realized that would need lots of configuration
> validation. We may eventually want it, but it's probably better to
> avoid such a big leap as the first step.
>
> Please tell me if you have questions or suggestions.
>


Hi, Odaki-san

The idea appears to be practical and convenient.

I have a few things I want to confirm.
My understanding is that your idea makes the devices for VFs,
created with qdev_new() or qdev_realize(), invisible to the guest OS.
Is my understanding correct?
And, if your idea is realized, will it be possible to specify
the backend device for the virtio-net-pci device?

Could you provide insight into the next steps
beyond the implementation details?
When do you expect your implementation to be merged into QEMU?
Do you have a timeline for this plan?
Moreover, is there any way we can collaborate
on the implementation you're planning?

Regards,

Yui Washizu


> Regards,
> Akihiko Odaki
>
> [1] https://github.com/virtio-win/kvm-guest-drivers-windows
> [2] https://qemu.readthedocs.io/en/v8.1.0/system/devices/igb.html
> [3] 
> https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
> [4] 
> https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
> [5] https://qemu.readthedocs.io/en/v8.1.0/system/virtio-net-failover.html
> [6] https://gitlab.com/qemu-project/qemu/-/blob/v8.1.2/docs/pcie.txt



* Re: [RFC] Flexible SR-IOV support for virtio-net
From: Akihiko Odaki @ 2023-11-28  9:34 UTC
  To: Yui Washizu, Michael S. Tsirkin, Marcel Apfelbaum,
	qemu-devel@nongnu.org

On 2023/11/28 17:47, Yui Washizu wrote:
> 
> On 2023/11/18 21:10, Akihiko Odaki wrote:
>> Hi,
>>
>> We are planning to add PCIe SR-IOV support to the virtio-net driver
>> for Windows ("NetKVM")[1], and we want an SR-IOV feature in QEMU's
>> virtio-net emulation code to test it. I expect there are other people
>> interested in such a feature, considering that people are using
>> igb[2] to test SR-IOV support in VMs.
>>
>> Washizu Yui has already proposed an RFC patch to add an SR-IOV
>> feature to the virtio-net emulation[3][4], but it is preliminary and
>> provides no configurability for VFs.
>>
>> Now I'm proposing to add SR-IOV support to virtio-net with full
>> configurability for VFs by following the implementation of virtio-net
>> failover[5]. I'm planning to write the patches myself, but I know
>> there are people interested in such patches, so I'd like to share the
>> idea beforehand.
>>
>> The idea:
>>
>> The problem when implementing configurability for VFs is that SR-IOV
>> VFs can be realized and unrealized at runtime at the guest's request.
>> So a naive implementation cannot deal with a command line like the
>> following:
>> -device virtio-net-pci,addr=0x0.0x0,sriov=on
>> -device virtio-net-pci,addr=0x0.0x1
>> -device virtio-net-pci,addr=0x0.0x2
>>
>> This will realize the virtio-net functions at 0x0.0x1 and 0x0.0x2
>> when the guest starts, instead of when the guest requests to enable
>> VFs.
>>
>> However, while reviewing the virtio-net emulation code, I realized
>> that virtio-net failover also "hides" devices when the guest starts.
>> The following command line hides hostdev0 when the guest starts and
>> adds it when the guest requests the VIRTIO_NET_F_STANDBY feature:
>>
>> -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
>>   bus=root2,failover=on
>> -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1
>>
>> So it should also be possible to do something similar: "hide" VFs and
>> realize/unrealize them when the guest requests.
>>
>> There are two things I dislike about this idea when contrasting it
>> with the conventional multifunction feature[6], though. One is that
>> the PF must be added before the VFs; a similar limitation is imposed
>> for failover.
>>
>> The other is that it will be specific to virtio-net. I was
>> considering implementing a "generic" SR-IOV feature that would work
>> on various devices, but I realized that would need lots of
>> configuration validation. We may eventually want it, but it's
>> probably better to avoid such a big leap as the first step.
>>
>> Please tell me if you have questions or suggestions.
>>
> 
> 
> Hi, Odaki-san

Hi,

> 
> The idea appears to be practical and convenient.
> 
> I have a few things I want to confirm.
> My understanding is that your idea makes the devices for VFs,
> created with qdev_new() or qdev_realize(), invisible to the guest OS.
> Is my understanding correct?

Yes, the guest will request to enable VFs with the standard SR-IOV 
capability, and the virtio-net implementation will use appropriate 
QEMU-internal APIs to create and realize VFs accordingly.
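
To illustrate what I mean by QEMU-internal APIs, here is a rough sketch.
The hook names and the handling of saved device options below are just
placeholders I made up for illustration; only qdev_new(),
qdev_realize_and_unref(), qdev_get_parent_bus() and object_unparent()
are existing APIs, and error handling is elided:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "qapi/error.h"

/*
 * Sketch only: this hook is a placeholder, not existing QEMU code.
 * It would run when the guest sets VF Enable in the PF's SR-IOV
 * capability.
 */
static void sriov_vf_enable(DeviceState *pf, uint16_t num_vfs)
{
    for (uint16_t i = 0; i < num_vfs; i++) {
        /* Create a VF instance of the virtio-net PCI device. */
        DeviceState *vf = qdev_new("virtio-net-pci");

        /*
         * Apply the device options saved for this VF (netdev, mac,
         * PCI address, ...) here, then realize it on the PF's bus so
         * it becomes visible to the guest.
         */
        qdev_realize_and_unref(vf, qdev_get_parent_bus(pf), &error_fatal);
    }
}

/* The reverse when the guest clears VF Enable: unrealize a VF. */
static void sriov_vf_disable(DeviceState *vf)
{
    object_unparent(OBJECT(vf));
}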

> And, if your idea is realized, will it be possible to specify
> the backend device for the virtio-net-pci device?

Yes, you can specify a netdev just like for conventional virtio-net devices.

> 
> Could you provide insight into the next steps
> beyond the implementation details?
> When do you expect your implementation to be merged into QEMU?
> Do you have a timeline for this plan?
> Moreover, is there any way we can collaborate
> on the implementation you're planning?

I intend to upstream my implementation. The flexibility of this design
will make the SR-IOV support useful for many people, and I expect the
implementation will be clean enough for upstreaming. I'll submit it to
the mailing list when I finish it, and I'd appreciate it if you could
test and review it.

By the way, I have started the implementation and realized it may be
better to change the design, so I present the design changes below:

First, I intend to change the CLI. The interface in my last proposal
assumes there is only one PF per bus, marked with the "sriov" property.
However, the specification allows multiple PFs in a bus, so it's better
to design the CLI to allow multiple PFs, even though I'm not going to
implement such a feature at first.

The new CLI will instead add an "sriov-pf" property to VFs, which
designates the PF they are paired with. Below is an example of a
command line conforming to the new interface:

-device virtio-net-pci,addr=0x0.0x3,netdev=tap3,sriov-pf=pf1
-device virtio-net-pci,addr=0x0.0x2,netdev=tap2,id=pf1
-device virtio-net-pci,addr=0x0.0x1,netdev=tap1,sriov-pf=pf0
-device virtio-net-pci,addr=0x0.0x0,netdev=tap0,id=pf0
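
(For completeness, the matching backends could be defined as usual, for
example with tap; the exact tap options here are only an example and
depend on the host setup:)

-netdev tap,id=tap0,script=no,downscript=no
-netdev tap,id=tap1,script=no,downscript=no
-netdev tap,id=tap2,script=no,downscript=no
-netdev tap,id=tap3,script=no,downscript=no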

Another design change is *not* to use the "device hiding" API of
failover. This is because fully-realized devices are useful when
validating the configuration. In particular, VFs must have a consistent
BAR configuration, and that can be validated only after they are
realized.

So I'm now considering having "prototype VFs" realized before the PF
gets realized. Prototype VFs will be fully realized, but
virtio_write_config() and virtio_read_config() will do nothing for
those VFs, which effectively disables them. This is similar to how
functions are disabled until function 0 gets plugged for a conventional
multifunction device (cf. pci_host_config_write_common() and
pci_host_config_read_common()).
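
As a rough sketch of what I mean (the "prototype disabled" flag below
is a placeholder I'm using only for illustration, not an existing
field; just the shape of virtio_read_config()/virtio_write_config()
matches hw/virtio/virtio-pci.c, and the existing bodies are elided):

static uint32_t virtio_read_config(PCIDevice *pci_dev,
                                   uint32_t address, int len)
{
    VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);

    if (proxy->sriov_prototype_disabled) {  /* placeholder flag */
        /* A disabled prototype VF reads as all-ones, like an absent
         * function. */
        return ~0x0;
    }

    /* ... existing implementation ... */
}

static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
                                uint32_t val, int len)
{
    VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);

    if (proxy->sriov_prototype_disabled) {  /* placeholder flag */
        /* Ignore config writes to a disabled prototype VF. */
        return;
    }

    /* ... existing implementation ... */
}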

When the PF gets realized, it will validate the configuration by
inspecting the prototype VFs. If the configuration looks valid, the PF
backs up DeviceState::opts and unplugs the prototype VFs. It will later
use the backed-up device options to realize VFs when the guest requests
them.
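
Roughly, I have something like the following sketch in mind; the
PF-side storage is a placeholder, while qobject_ref(),
object_unparent() and qdev_device_add_from_qdict() (the API failover
already uses to re-create its hidden primary device) are existing APIs:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "monitor/qdev.h"
#include "qapi/error.h"
#include "qapi/qmp/qdict.h"

/* Placeholder: would actually live in the PF's state, one per VF. */
static QDict *saved_vf_opts;

static void sriov_pf_absorb_prototype_vf(DeviceState *vf)
{
    /* Keep the VF's original -device options (DeviceState::opts)
     * alive past the VF's destruction... */
    saved_vf_opts = qobject_ref(vf->opts);

    /* ...then unplug the prototype VF so the guest never sees it. */
    object_unparent(OBJECT(vf));
}

static void sriov_pf_realize_vf(Error **errp)
{
    /* Re-create and realize the VF from the backed-up options when
     * the guest enables VFs (assuming non-JSON -device syntax here). */
    qdev_device_add_from_qdict(saved_vf_opts, false, errp);
}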

This design change forces VFs to be created before the PF on the
command line. It is similar to how the conventional multifunction
feature requires function 0 to be realized after the other functions.

I may make other design changes as the implementation progresses, but 
the above is the current design I have in mind.

Regards,
Akihiko Odaki

