netdev.vger.kernel.org archive mirror
* How to implement message forwarding from one CID to another in vhost driver
@ 2024-05-18 10:17 Dorjoy Chowdhury
  2024-05-20  8:55 ` Stefano Garzarella
  0 siblings, 1 reply; 23+ messages in thread
From: Dorjoy Chowdhury @ 2024-05-18 10:17 UTC (permalink / raw)
  To: virtualization; +Cc: kvm, netdev, Alexander Graf, agraf, stefanha, sgarzare

Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
patch series has already been posted to the qemu-devel mailing list[2].

AWS Nitro Enclaves is an Amazon EC2[3] feature that allows creating isolated
execution environments, called enclaves, from Amazon EC2 instances, which are
used for processing highly sensitive data. Enclaves have no persistent storage
and no external networking. The enclave VMs are based on the Firecracker
microvm and have a vhost-vsock device for communication with the parent EC2
instance that spawned them, and a Nitro Secure Module (NSM) device for
cryptographic attestation. The parent instance VM always has CID 3 while the
enclave VM gets a dynamic CID. The enclave VMs can communicate with the parent
instance over various ports to CID 3; for example, the init process inside an
enclave sends a heartbeat to port 9000 upon boot, expecting a heartbeat reply,
which lets the parent instance know that the enclave VM has successfully
booted.
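
For concreteness, the guest-side heartbeat is just an ordinary AF_VSOCK
connection to CID 3; a minimal sketch of what the enclave init roughly does
(the payload byte here is only a placeholder, the real init defines its own):

/* Sketch of the enclave-side boot heartbeat: connect to the parent
 * (always CID 3) on port 9000, send a byte, wait for the echo. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
        struct sockaddr_vm parent = {
                .svm_family = AF_VSOCK,
                .svm_cid    = 3,      /* parent instance CID */
                .svm_port   = 9000,   /* heartbeat port */
        };
        char beat = 0x42, reply;      /* placeholder payload */
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0 ||
            connect(fd, (struct sockaddr *)&parent, sizeof(parent)) < 0) {
                perror("vsock connect to parent");
                return 1;
        }
        write(fd, &beat, 1);
        read(fd, &reply, 1);          /* parent echoes the heartbeat */
        close(fd);
        return 0;
}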

The plan is to eventually make the nitro enclave emulation in QEMU standalone,
i.e., without needing to run another VM with CID 3 that provides proper vsock
communication support. For this to work, one approach could be to teach the
vhost driver in the kernel to forward CID 3 messages to another CID N (set to
CID 2 for the host), i.e., it patches the CID from 3 to N on incoming messages
and from N to 3 on responses. This would enable users of the nitro-enclave
machine type in QEMU to run the necessary vsock servers/clients on the host
machine (some defaults could be implemented in QEMU as well, for example,
sending a reply to the heartbeat), which would rid them of the cumbersome step
of running another whole VM with CID 3. This way, users of the nitro-enclave
machine in QEMU could potentially also run multiple enclaves, with their
messages for CID 3 forwarded to different CIDs, which on the QEMU side could
then be specified using a new machine type option (parent-cid) if implemented.
I guess on the QEMU side this would be an ioctl call (or some other mechanism)
to indicate to the host kernel that the CID 3 messages need to be forwarded.
Does this approach of forwarding CID 3 messages to another CID sound good?
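
To illustrate the idea, here is a rough sketch (not actual vhost code, just
the rewrite I have in mind, using the uapi struct virtio_vsock_hdr layout):

#include <linux/virtio_vsock.h>

/* Guest -> host direction: the enclave addressed the packet to the
 * "parent" CID 3, so redirect it to the forwarding target N. */
static void patch_guest_to_host(struct virtio_vsock_hdr *hdr, u64 cid_n)
{
        if (le64_to_cpu(hdr->dst_cid) == 3)
                hdr->dst_cid = cpu_to_le64(cid_n);
}

/* Host -> guest direction: replies coming from CID N must look as if
 * they came from the parent, so rewrite the source CID back to 3. */
static void patch_host_to_guest(struct virtio_vsock_hdr *hdr, u64 cid_n)
{
        if (le64_to_cpu(hdr->src_cid) == cid_n)
                hdr->src_cid = cpu_to_le64(3);
}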

If this approach sounds good, I need some guidance on where the code
should be written in order to achieve this. I would greatly appreciate
any suggestions.

Thanks.

Regards,
Dorjoy

[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg03524.html
[3] https://aws.amazon.com/ec2/


* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-18 10:17 How to implement message forwarding from one CID to another in vhost driver Dorjoy Chowdhury
@ 2024-05-20  8:55 ` Stefano Garzarella
  2024-05-20 10:44   ` Dorjoy Chowdhury
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-05-20  8:55 UTC (permalink / raw)
  To: Dorjoy Chowdhury
  Cc: virtualization, kvm, netdev, Alexander Graf, agraf, stefanha

Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>Hi,
>
>Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
>patch series has already been posted to the qemu-devel mailing list[2].
>
>AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated
>execution environments, called enclaves, from Amazon EC2 instances, which are
>used for processing highly sensitive data. Enclaves have no persistent storage
>and no external networking. The enclave VMs are based on Firecracker microvm
>and have a vhost-vsock device for communication with the parent EC2 instance
>that spawned it and a Nitro Secure Module (NSM) device for cryptographic
>attestation. The parent instance VM always has CID 3 while the enclave VM gets
>a dynamic CID. The enclave VMs can communicate with the parent instance over
>various ports to CID 3, for example, the init process inside an enclave sends a
>heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
>parent instance know that the enclave VM has successfully booted.
>
>The plan is to eventually make the nitro enclave emulation in QEMU standalone
>i.e., without needing to run another VM with CID 3 with proper vsock

If you don't have to launch another VM, maybe we can avoid vhost-vsock 
and emulate virtio-vsock in user-space, having complete control over the 
behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4] 
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so maybe 
with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock

>communication support. For this to work, one approach could be to teach the
>vhost driver in kernel to forward CID 3 messages to another CID N

So in this case both CID 3 and N would be assigned to the same QEMU
process?

Do you have to allocate 2 separate virtio-vsock devices, one for the 
parent and one for the enclave?

>(set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
>and from N to 3 on responses. This will enable users of the

Will these messages have the VMADDR_FLAG_TO_HOST flag set?

We don't support this in vhost-vsock yet. If supporting it helps, we 
might, but we need to better understand how to avoid security issues; 
maybe each device needs to explicitly enable the feature and specify 
from which CIDs it accepts packets.

>nitro-enclave machine
>type in QEMU to run the necessary vsock server/clients in the host machine
>(some defaults can be implemented in QEMU as well, for example, sending a reply
>to the heartbeat) which will rid them of the cumbersome way of running another
>whole VM with CID 3. This way, users of nitro-enclave machine in QEMU, could
>potentially also run multiple enclaves with their messages for CID 3 forwarded
>to different CIDs which, in QEMU side, could then be specified using a new
>machine type option (parent-cid) if implemented. I guess in the QEMU side, this
>will be an ioctl call (or some other way) to indicate to the host kernel that
>the CID 3 messages need to be forwarded. Does this approach of

What if there is already a VM with CID = 3 in the system?

>forwarding CID 3 messages to another CID sound good?

It seems like too specific a case. If we can generalize it, maybe we 
could make this change, but we would like to avoid complicating 
vhost-vsock and keep it as simple as possible, so that we don't end up 
having to implement firewalls, etc.

So first I would see if vhost-user-vsock or the QEMU built-in device is 
right for this use-case.

Thanks,
Stefano

>
>If this approach sounds good, I need some guidance on where the code
>should be written in order to achieve this. I would greatly appreciate
>any suggestions.
>
>Thanks.
>
>Regards,
>Dorjoy
>
>[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
>[2] https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg03524.html
>[3] https://aws.amazon.com/ec2/
>



* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-20  8:55 ` Stefano Garzarella
@ 2024-05-20 10:44   ` Dorjoy Chowdhury
  2024-05-21  5:50     ` Alexander Graf
  0 siblings, 1 reply; 23+ messages in thread
From: Dorjoy Chowdhury @ 2024-05-20 10:44 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, kvm, netdev, Alexander Graf, agraf, stefanha

Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi Dorjoy,
>
> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
> >Hi,
> >
> >Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
> >emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
> >patch series has already been posted to the qemu-devel mailing list[2].
> >
> >AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated
> >execution environments, called enclaves, from Amazon EC2 instances, which are
> >used for processing highly sensitive data. Enclaves have no persistent storage
> >and no external networking. The enclave VMs are based on Firecracker microvm
> >and have a vhost-vsock device for communication with the parent EC2 instance
> >that spawned it and a Nitro Secure Module (NSM) device for cryptographic
> >attestation. The parent instance VM always has CID 3 while the enclave VM gets
> >a dynamic CID. The enclave VMs can communicate with the parent instance over
> >various ports to CID 3, for example, the init process inside an enclave sends a
> >heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
> >parent instance know that the enclave VM has successfully booted.
> >
> >The plan is to eventually make the nitro enclave emulation in QEMU standalone
> >i.e., without needing to run another VM with CID 3 with proper vsock
>
> If you don't have to launch another VM, maybe we can avoid vhost-vsock
> and emulate virtio-vsock in user-space, having complete control over the
> behavior.
>
> So we could use this opportunity to implement virtio-vsock in QEMU [4]
> or use vhost-user-vsock [5] and customize it somehow.
> (Note: vhost-user-vsock already supports sibling communication, so maybe
> with a few modifications it fits your case perfectly)
>
> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
> [5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock



Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.
>
>
>
> >communication support. For this to work, one approach could be to teach the
> >vhost driver in kernel to forward CID 3 messages to another CID N
>
> So in this case both CID 3 and N would be assigned to the same QEMU
> process?



CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it, and that
parent EC2 instance always has CID 3). But in the QEMU case, as we
don't want a parent VM (we want to run enclave VMs standalone), we
would need to forward the CID 3 messages to the host CID. I don't know
whether that means CID 3 and CID N are assigned to the same QEMU
process. Sorry.

>
> Do you have to allocate 2 separate virtio-vsock devices, one for the
> parent and one for the enclave?



If there is a parent VM, then I guess both parent and enclave VMs need
virtio-vsock devices.

>
> >(set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
> >and from N to 3 on responses. This will enable users of the
>
> Will these messages have the VMADDR_FLAG_TO_HOST flag set?
>
> We don't support this in vhost-vsock yet, if supporting it helps, we
> might, but we need to better understand how to avoid security issues, so
> maybe each device needs to explicitly enable the feature and specify
> from which CIDs it accepts packets.



I don't know about the flag. So I don't know if it will be set. Sorry.


>
> >nitro-enclave machine
> >type in QEMU to run the necessary vsock server/clients in the host machine
> >(some defaults can be implemented in QEMU as well, for example, sending a reply
> >to the heartbeat) which will rid them of the cumbersome way of running another
> >whole VM with CID 3. This way, users of nitro-enclave machine in QEMU, could
> >potentially also run multiple enclaves with their messages for CID 3 forwarded
> >to different CIDs which, in QEMU side, could then be specified using a new
> >machine type option (parent-cid) if implemented. I guess in the QEMU side, this
> >will be an ioctl call (or some other way) to indicate to the host kernel that
> >the CID 3 messages need to be forwarded. Does this approach of
>
> What if there is already a VM with CID = 3 in the system?



Good question! I don't know what should happen in this case.


>
> >forwarding CID 3 messages to another CID sound good?
>
> It seems too specific a case, if we can generalize it maybe we could
> make this change, but we would like to avoid complicating vhost-vsock
> and keep it as simple as possible to avoid then having to implement
> firewalls, etc.
>
> So first I would see if vhost-user-vsock or the QEMU built-in device is
> right for this use-case.



Thank you! I will check everything out and reach out if I need
further guidance about what needs to be done. And sorry that I wasn't
able to answer some of your questions.

Regards,
Dorjoy


* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-20 10:44   ` Dorjoy Chowdhury
@ 2024-05-21  5:50     ` Alexander Graf
  2024-05-23  8:45       ` Stefano Garzarella
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2024-05-21  5:50 UTC (permalink / raw)
  To: Dorjoy Chowdhury, Stefano Garzarella
  Cc: virtualization, kvm, netdev, Alexander Graf, stefanha

Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:
> Hey Stefano,
>
> Thanks for the reply.
>
>
> On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> Hi Dorjoy,
>>
>> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>> Hi,
>>>
>>> Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>> emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
>>> patch series has already been posted to the qemu-devel mailing list[2].
>>>
>>> AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated
>>> execution environments, called enclaves, from Amazon EC2 instances, which are
>>> used for processing highly sensitive data. Enclaves have no persistent storage
>>> and no external networking. The enclave VMs are based on Firecracker microvm
>>> and have a vhost-vsock device for communication with the parent EC2 instance
>>> that spawned it and a Nitro Secure Module (NSM) device for cryptographic
>>> attestation. The parent instance VM always has CID 3 while the enclave VM gets
>>> a dynamic CID. The enclave VMs can communicate with the parent instance over
>>> various ports to CID 3, for example, the init process inside an enclave sends a
>>> heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
>>> parent instance know that the enclave VM has successfully booted.
>>>
>>> The plan is to eventually make the nitro enclave emulation in QEMU standalone
>>> i.e., without needing to run another VM with CID 3 with proper vsock
>> If you don't have to launch another VM, maybe we can avoid vhost-vsock
>> and emulate virtio-vsock in user-space, having complete control over the
>> behavior.
>>
>> So we could use this opportunity to implement virtio-vsock in QEMU [4]
>> or use vhost-user-vsock [5] and customize it somehow.
>> (Note: vhost-user-vsock already supports sibling communication, so maybe
>> with a few modifications it fits your case perfectly)
>>
>> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>> [5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>
>
> Thanks for letting me know. Right now I don't have a complete picture
> but I will look into them. Thank you.
>>
>>
>>> communication support. For this to work, one approach could be to teach the
>>> vhost driver in kernel to forward CID 3 messages to another CID N
>> So in this case both CID 3 and N would be assigned to the same QEMU
>> process?
>
>
> CID N is assigned to the enclave VM. CID 3 was supposed to be the
> parent VM that spawns the enclave VM (this is how it is in AWS, where
> an EC2 instance VM spawns the enclave VM from inside it and that
> parent EC2 instance always has CID 3). But in the QEMU case as we
> don't want a parent VM (we want to run enclave VMs standalone) we
> would need to forward the CID 3 messages to host CID. I don't know if
> it means CID 3 and CID N is assigned to the same QEMU process. Sorry.


There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario, 
the "parent instance" that shows up as CID 3 in the Enclave doesn't 
really exist. Instead, when the Enclave attempts to talk to CID 3, it 
should really land on CID 0 (hypervisor). When the hypervisor tries to 
connect to the Enclave on port X, it should look as if it originates 
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario. Here, 
we have multiple "parent instances". Each of them thinks it's CID 3. 
Each can spawn an Enclave that talks to CID 3 and reach the parent. For 
this case, I think implementing all of virtio-vsock in user space is the 
best path forward. But in theory, you could also swizzle CIDs to make 
random "real" CIDs appear as CID 3.


>
>> Do you have to allocate 2 separate virtio-vsock devices, one for the
>> parent and one for the enclave?
>
>
> If there is a parent VM, then I guess both parent and enclave VMs need
> virtio-vsock devices.
>
>>> (set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
>>> and from N to 3 on responses. This will enable users of the
>> Will these messages have the VMADDR_FLAG_TO_HOST flag set?
>>
>> We don't support this in vhost-vsock yet, if supporting it helps, we
>> might, but we need to better understand how to avoid security issues, so
>> maybe each device needs to explicitly enable the feature and specify
>> from which CIDs it accepts packets.
>
>
> I don't know about the flag. So I don't know if it will be set. Sorry.


From the guest's point of view, the parent (CID 3) is just another VM. 
Since Linux as of

  https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117

always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I 
would say the message has the flag set.
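
Roughly, the check that patch added boils down to this (paraphrased, not
the literal kernel source):

/* If both endpoints have a non-hypervisor CID, mark the connection so
 * the host knows it is destined for another guest, not for the host. */
if (local_cid > 0 && remote_cid > 0)
        remote_addr->svm_flags |= VMADDR_FLAG_TO_HOST;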

How would you envision the host implementing the flag? Would the host 
allow user space to listen on any CID and hence receive the respective 
target connections? And wouldn't listening on CID 0 then mean you're 
effectively listening to "any" other CID? Thinking about that a bit 
more, that may be just what we need, yes :)


>
>
>>> nitro-enclave machine
>>> type in QEMU to run the necessary vsock server/clients in the host machine
>>> (some defaults can be implemented in QEMU as well, for example, sending a reply
>>> to the heartbeat) which will rid them of the cumbersome way of running another
>>> whole VM with CID 3. This way, users of nitro-enclave machine in QEMU, could
>>> potentially also run multiple enclaves with their messages for CID 3 forwarded
>>> to different CIDs which, in QEMU side, could then be specified using a new
>>> machine type option (parent-cid) if implemented. I guess in the QEMU side, this
>>> will be an ioctl call (or some other way) to indicate to the host kernel that
>>> the CID 3 messages need to be forwarded. Does this approach of
>> What if there is already a VM with CID = 3 in the system?
>
>
> Good question! I don't know what should happen in this case.


See case 2 above :). In a nutshell, I don't think it'd be legal to have 
a real CID 3 in that scenario.


>
>
>>> forwarding CID 3 messages to another CID sound good?
>> It seems too specific a case, if we can generalize it maybe we could
>> make this change, but we would like to avoid complicating vhost-vsock
>> and keep it as simple as possible to avoid then having to implement
>> firewalls, etc.
>>
>> So first I would see if vhost-user-vsock or the QEMU built-in device is
>> right for this use-case.
> Thanks you! I will check everything out and reach out if I need
> further guidance about what needs to be done. And sorry as I wasn't
> able to answer some of your questions.


As mentioned above, I think there is merit for both. I personally care a 
lot more about case 1 than case 2: we already have a working 
implementation of Nitro Enclaves in a Cloud setup. What is missing is a 
way to easily run a Nitro Enclave locally for development.


Alex



* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-21  5:50     ` Alexander Graf
@ 2024-05-23  8:45       ` Stefano Garzarella
  2024-05-27  7:08         ` Alexander Graf
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-05-23  8:45 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Dorjoy Chowdhury, virtualization, kvm, netdev, Alexander Graf,
	stefanha

On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>Howdy,
>
>On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>Hey Stefano,
>>
>>Thanks for the reply.
>>
>>
>>On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>>Hi Dorjoy,
>>>
>>>On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>Hi,
>>>>
>>>>Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
>>>>patch series has already been posted to the qemu-devel mailing list[2].
>>>>
>>>>AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated
>>>>execution environments, called enclaves, from Amazon EC2 instances, which are
>>>>used for processing highly sensitive data. Enclaves have no persistent storage
>>>>and no external networking. The enclave VMs are based on Firecracker microvm
>>>>and have a vhost-vsock device for communication with the parent EC2 instance
>>>>that spawned it and a Nitro Secure Module (NSM) device for cryptographic
>>>>attestation. The parent instance VM always has CID 3 while the enclave VM gets
>>>>a dynamic CID. The enclave VMs can communicate with the parent instance over
>>>>various ports to CID 3, for example, the init process inside an enclave sends a
>>>>heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
>>>>parent instance know that the enclave VM has successfully booted.
>>>>
>>>>The plan is to eventually make the nitro enclave emulation in QEMU standalone
>>>>i.e., without needing to run another VM with CID 3 with proper vsock
>>>If you don't have to launch another VM, maybe we can avoid vhost-vsock
>>>and emulate virtio-vsock in user-space, having complete control over the
>>>behavior.
>>>
>>>So we could use this opportunity to implement virtio-vsock in QEMU [4]
>>>or use vhost-user-vsock [5] and customize it somehow.
>>>(Note: vhost-user-vsock already supports sibling communication, so maybe
>>>with a few modifications it fits your case perfectly)
>>>
>>>[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>
>>
>>Thanks for letting me know. Right now I don't have a complete picture
>>but I will look into them. Thank you.
>>>
>>>
>>>>communication support. For this to work, one approach could be to teach the
>>>>vhost driver in kernel to forward CID 3 messages to another CID N
>>>So in this case both CID 3 and N would be assigned to the same QEMU
>>>process?
>>
>>
>>CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>parent VM that spawns the enclave VM (this is how it is in AWS, where
>>an EC2 instance VM spawns the enclave VM from inside it and that
>>parent EC2 instance always has CID 3). But in the QEMU case as we
>>don't want a parent VM (we want to run enclave VMs standalone) we
>>would need to forward the CID 3 messages to host CID. I don't know if
>>it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
>
>
>There are 2 use cases here:
>
>1) Enclave wants to treat host as parent (default). In this scenario, 
>the "parent instance" that shows up as CID 3 in the Enclave doesn't 
>really exist. Instead, when the Enclave attempts to talk to CID 3, it 
>should really land on CID 0 (hypervisor). When the hypervisor tries to 
>connect to the Enclave on port X, it should look as if it originates 
>from CID 3, not CID 0.
>
>2) Multiple parent VMs. Think of an actual cloud hosting scenario. 
>Here, we have multiple "parent instances". Each of them thinks it's 
>CID 3. Each can spawn an Enclave that talks to CID 3 and reach the 
>parent. For this case, I think implementing all of virtio-vsock in 
>user space is the best path forward. But in theory, you could also 
>swizzle CIDs to make random "real" CIDs appear as CID 3.
>

Thank you for clarifying the use cases!

Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion 
it's easier to go into user-space with vhost-user-vsock or the built-in 
device.

Maybe initially vhost-user-vsock is easier because we already 
have something that works and supports sibling communication (for case 
2).

>
>>
>>>Do you have to allocate 2 separate virtio-vsock devices, one for the
>>>parent and one for the enclave?
>>
>>
>>If there is a parent VM, then I guess both parent and enclave VMs need
>>virtio-vsock devices.
>>
>>>>(set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
>>>>and from N to 3 on responses. This will enable users of the
>>>Will these messages have the VMADDR_FLAG_TO_HOST flag set?
>>>
>>>We don't support this in vhost-vsock yet, if supporting it helps, we
>>>might, but we need to better understand how to avoid security issues, so
>>>maybe each device needs to explicitly enable the feature and specify
>>>from which CIDs it accepts packets.
>>
>>
>>I don't know about the flag. So I don't know if it will be set. Sorry.
>
>
>From the guest's point of view, the parent (CID 3) is just another VM. 
>Since Linux as of
>
> https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117
>
>always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I 
>would say the message has the flag set.
>
>How would you envision the host to implement the flag? Would the host 
>allow user space to listen on any CID and hence receive the respective 
>target connections? And wouldn't listening on CID 0 then mean you're 
>effectively listening to "any" other CID? Thinking about that a bit 
>more, that may be just what we need, yes :)

No, wait. I had intended that flag only for implementing sibling 
communication, so that the host doesn't re-forward those packets to 
sockets opened by applications on the host, but only to other VMs on the 
same host. So the host would always only have CID 2 assigned (CID 0 is 
not supported by vhost-vsock).

>
>
>>
>>
>>>>nitro-enclave machine
>>>>type in QEMU to run the necessary vsock server/clients in the host machine
>>>>(some defaults can be implemented in QEMU as well, for example, sending a reply
>>>>to the heartbeat) which will rid them of the cumbersome way of running another
>>>>whole VM with CID 3. This way, users of nitro-enclave machine in QEMU, could
>>>>potentially also run multiple enclaves with their messages for CID 3 forwarded
>>>>to different CIDs which, in QEMU side, could then be specified using a new
>>>>machine type option (parent-cid) if implemented. I guess in the QEMU side, this
>>>>will be an ioctl call (or some other way) to indicate to the host kernel that
>>>>the CID 3 messages need to be forwarded. Does this approach of
>>>What if there is already a VM with CID = 3 in the system?
>>
>>
>>Good question! I don't know what should happen in this case.
>
>
>See case 2 above :). In a nutshell, I don't think it'd be legal to 
>have a real CID 3 in that scenario.

Yeah, with vhost-vsock we can't, but with vhost-user-vsock I think it's 
fine since the guest CID is local to each instance. The host only sees
the unix socket (like with firecracker).

>
>
>>
>>
>>>>forwarding CID 3 messages to another CID sound good?
>>>It seems too specific a case, if we can generalize it maybe we could
>>>make this change, but we would like to avoid complicating vhost-vsock
>>>and keep it as simple as possible to avoid then having to implement
>>>firewalls, etc.
>>>
>>>So first I would see if vhost-user-vsock or the QEMU built-in device is
>>>right for this use-case.
>>Thanks you! I will check everything out and reach out if I need
>>further guidance about what needs to be done. And sorry as I wasn't
>>able to answer some of your questions.
>
>
>As mentioned above, I think there is merit for both. I personally care 
>a lot more for case 1 over case 2: We already have a working 
>implementation of Nitro Enclaves in a Cloud setup. What is missing is 
>a way to easily run a Nitro Enclave locally for development.

If both are fine, then I would lean more toward modifying 
vhost-user-vsock or adding a built-in device in QEMU.
We have more freedom, and it is also easier to update/debug.

Thanks,
Stefano



* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-23  8:45       ` Stefano Garzarella
@ 2024-05-27  7:08         ` Alexander Graf
  2024-05-27  7:54           ` Alexander Graf
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2024-05-27  7:08 UTC (permalink / raw)
  To: Stefano Garzarella, Alexander Graf
  Cc: Dorjoy Chowdhury, virtualization, kvm, netdev, stefanha

Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:
> On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>> Howdy,
>>
>> On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>> Hey Stefano,
>>>
>>> Thanks for the reply.
>>>
>>>
>>> On Mon, May 20, 2024, 2:55 PM Stefano Garzarella 
>>> <sgarzare@redhat.com> wrote:
>>>> Hi Dorjoy,
>>>>
>>>> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>> Hi,
>>>>>
>>>>> Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>> emulation support in QEMU. Alexander Graf is mentoring me on this 
>>>>> work. A v1
>>>>> patch series has already been posted to the qemu-devel mailing 
>>>>> list[2].
>>>>>
>>>>> AWS nitro enclaves is an Amazon EC2[3] feature that allows 
>>>>> creating isolated
>>>>> execution environments, called enclaves, from Amazon EC2 
>>>>> instances, which are
>>>>> used for processing highly sensitive data. Enclaves have no 
>>>>> persistent storage
>>>>> and no external networking. The enclave VMs are based on 
>>>>> Firecracker microvm
>>>>> and have a vhost-vsock device for communication with the parent 
>>>>> EC2 instance
>>>>> that spawned it and a Nitro Secure Module (NSM) device for 
>>>>> cryptographic
>>>>> attestation. The parent instance VM always has CID 3 while the 
>>>>> enclave VM gets
>>>>> a dynamic CID. The enclave VMs can communicate with the parent 
>>>>> instance over
>>>>> various ports to CID 3, for example, the init process inside an 
>>>>> enclave sends a
>>>>> heartbeat to port 9000 upon boot, expecting a heartbeat reply, 
>>>>> letting the
>>>>> parent instance know that the enclave VM has successfully booted.
>>>>>
>>>>> The plan is to eventually make the nitro enclave emulation in QEMU 
>>>>> standalone
>>>>> i.e., without needing to run another VM with CID 3 with proper vsock
>>>> If you don't have to launch another VM, maybe we can avoid vhost-vsock
>>>> and emulate virtio-vsock in user-space, having complete control 
>>>> over the
>>>> behavior.
>>>>
>>>> So we could use this opportunity to implement virtio-vsock in QEMU [4]
>>>> or use vhost-user-vsock [5] and customize it somehow.
>>>> (Note: vhost-user-vsock already supports sibling communication, so 
>>>> maybe
>>>> with a few modifications it fits your case perfectly)
>>>>
>>>> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>> [5] 
>>>> https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>
>>>
>>> Thanks for letting me know. Right now I don't have a complete picture
>>> but I will look into them. Thank you.
>>>>
>>>>
>>>>> communication support. For this to work, one approach could be to 
>>>>> teach the
>>>>> vhost driver in kernel to forward CID 3 messages to another CID N
>>>> So in this case both CID 3 and N would be assigned to the same QEMU
>>>> process?
>>>
>>>
>>> CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>> parent VM that spawns the enclave VM (this is how it is in AWS, where
>>> an EC2 instance VM spawns the enclave VM from inside it and that
>>> parent EC2 instance always has CID 3). But in the QEMU case as we
>>> don't want a parent VM (we want to run enclave VMs standalone) we
>>> would need to forward the CID 3 messages to host CID. I don't know if
>>> it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
>>
>>
>> There are 2 use cases here:
>>
>> 1) Enclave wants to treat host as parent (default). In this scenario,
>> the "parent instance" that shows up as CID 3 in the Enclave doesn't
>> really exist. Instead, when the Enclave attempts to talk to CID 3, it
>> should really land on CID 0 (hypervisor). When the hypervisor tries to
>> connect to the Enclave on port X, it should look as if it originates
>> from CID 3, not CID 0.
>>
>> 2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>> Here, we have multiple "parent instances". Each of them thinks it's
>> CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>> parent. For this case, I think implementing all of virtio-vsock in
>> user space is the best path forward. But in theory, you could also
>> swizzle CIDs to make random "real" CIDs appear as CID 3.
>>
>
> Thank you for clarifying the use cases!
>
> Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
> it's easier to go into user-space with vhost-user-vsock or the built-in
> device.


Sorry, I believe I meant CID 2. Effectively for case 1, when a process 
on the hypervisor listens on port 1234, it should be visible as 3:1234 
from the VM and when the hypervisor process connects to <VM CID>:1234, 
it should look as if that connection came from CID 3.


> Maybe initially with vhost-user-vsock it's easier because we already
> have some thing that works and supports sibling communication (for case
> 2).


The problem with vhost-user-vsock is that you don't get to use AF_VSOCK 
as a host process.

A typical Nitro Enclaves application is split into 2 parts: An 
in-Enclave component that listens/connects to vsock and a parent process 
that listens/connects to vsock. The experience of launching an Enclave 
is very similar to launching a QEMU VM: You run nitro-cli and tell it to 
pop up the Enclave based on an EIF file. Nitro-cli then tells you the 
CID that was allocated for the Enclave and you communicate to it using that.

What I would ideally like to have as development experience is that you 
run QEMU with unmodified Enclave components (the EIF file) and run your 
parent application unmodified on the host.

For that to work, the host application needs to be able to use AF_VSOCK.
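
Concretely, the kind of parent code I'd like to keep working unmodified
is a plain AF_VSOCK listener, e.g. a heartbeat responder along these
lines (a sketch; port 9000 is the heartbeat port mentioned earlier):

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
        struct sockaddr_vm addr = {
                .svm_family = AF_VSOCK,
                .svm_cid    = VMADDR_CID_ANY,  /* any local CID */
                .svm_port   = 9000,
        };
        int lfd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(lfd, 1) < 0) {
                perror("vsock listen");
                return 1;
        }
        for (;;) {
                char byte;
                int cfd = accept(lfd, NULL, NULL);
                if (cfd < 0)
                        continue;
                if (read(cfd, &byte, 1) == 1)   /* echo heartbeat back */
                        write(cfd, &byte, 1);
                close(cfd);
        }
}

Whatever approach we pick needs to make a listener like this show up as
3:9000 inside the enclave, without the parent code knowing about it.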


I agree that for this conversation, we should just ignore case 2 and 
consider it as "solved" through vhost-user-vsock, as that can create its 
own CID namespace between different VMs.


>
>>
>>>
>>>> Do you have to allocate 2 separate virtio-vsock devices, one for the
>>>> parent and one for the enclave?
>>>
>>>
>>> If there is a parent VM, then I guess both parent and enclave VMs need
>>> virtio-vsock devices.
>>>
>>>>> (set to CID 2 for host) i.e., it patches CID from 3 to N on 
>>>>> incoming messages
>>>>> and from N to 3 on responses. This will enable users of the
>>>> Will these messages have the VMADDR_FLAG_TO_HOST flag set?
>>>>
>>>> We don't support this in vhost-vsock yet, if supporting it helps, we
>>>> might, but we need to better understand how to avoid security 
>>>> issues, so
>>>> maybe each device needs to explicitly enable the feature and specify
>>>> from which CIDs it accepts packets.
>>>
>>>
>>> I don't know about the flag. So I don't know if it will be set. Sorry.
>>
>>
>> From the guest's point of view, the parent (CID 3) is just another VM.
>> Since Linux as of
>>
>>  https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117 
>>
>>
>> always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I
>> would say the message has the flag set.
>>
>> How would you envision the host to implement the flag? Would the host
>> allow user space to listen on any CID and hence receive the respective
>> target connections? And wouldn't listening on CID 0 then mean you're
>> effectively listening to "any" other CID? Thinking about that a bit
>> more, that may be just what we need, yes :)
>
> No, wait. The flag I had guessed only to implement sibling
> communication, so the host doesn't re-forward those packets to sockets
> opened by applications in the host, but only to other VMs in the same
> host. So the host would always only have CID 2 assigned (CID 0 is not
> supported by vhost-vsock).
>
>>
>>
>>>
>>>
>>>>> nitro-enclave machine
>>>>> type in QEMU to run the necessary vsock server/clients in the host 
>>>>> machine
>>>>> (some defaults can be implemented in QEMU as well, for example, 
>>>>> sending a reply
>>>>> to the heartbeat) which will rid them of the cumbersome way of 
>>>>> running another
>>>>> whole VM with CID 3. This way, users of nitro-enclave machine in 
>>>>> QEMU, could
>>>>> potentially also run multiple enclaves with their messages for CID 
>>>>> 3 forwarded
>>>>> to different CIDs which, in QEMU side, could then be specified 
>>>>> using a new
>>>>> machine type option (parent-cid) if implemented. I guess in the 
>>>>> QEMU side, this
>>>>> will be an ioctl call (or some other way) to indicate to the host 
>>>>> kernel that
>>>>> the CID 3 messages need to be forwarded. Does this approach of
>>>> What if there is already a VM with CID = 3 in the system?
>>>
>>>
>>> Good question! I don't know what should happen in this case.
>>
>>
>> See case 2 above :). In a nutshell, I don't think it'd be legal to
>> have a real CID 3 in that scenario.
>
> Yeah, with vhost-vsock we can't, but with vhost-user-vsock I think is
> fine since the guest CID is local for each instance. The host only sees
> the unix socket (like with firecracker).


See above why a unix socket is not really great CX :)


>
>>
>>
>>>
>>>
>>>>> forwarding CID 3 messages to another CID sound good?
>>>> It seems too specific a case, if we can generalize it maybe we could
>>>> make this change, but we would like to avoid complicating vhost-vsock
>>>> and keep it as simple as possible to avoid then having to implement
>>>> firewalls, etc.
>>>>
>>>> So first I would see if vhost-user-vsock or the QEMU built-in 
>>>> device is
>>>> right for this use-case.
>>> Thanks you! I will check everything out and reach out if I need
>>> further guidance about what needs to be done. And sorry as I wasn't
>>> able to answer some of your questions.
>>
>>
>> As mentioned above, I think there is merit for both. I personally care
>> a lot more for case 1 over case 2: We already have a working
>> implementation of Nitro Enclaves in a Cloud setup. What is missing is
>> a way to easily run a Nitro Enclave locally for development.
>
> If both are fine, then I would go more on modifying vhost-user-vsock or
> adding a built-in device in QEMU.
> We have more freedom and also easier to update/debug.


I agree on those points, but if we go down that route users can't simply 
reuse their existing code, no? At that point, they're probably better 
off just spawning another (micro)-VM on CID 3, as that at least gives 
them the ability to reuse their existing parent code.


Alex







* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-27  7:08         ` Alexander Graf
@ 2024-05-27  7:54           ` Alexander Graf
  2024-05-28 14:43             ` Stefano Garzarella
  2024-05-28 15:19             ` Paolo Bonzini
  0 siblings, 2 replies; 23+ messages in thread
From: Alexander Graf @ 2024-05-27  7:54 UTC (permalink / raw)
  To: Stefano Garzarella, Alexander Graf
  Cc: Dorjoy Chowdhury, virtualization, kvm, netdev, stefanha


On 27.05.24 09:08, Alexander Graf wrote:
> Hey Stefano,
>
> On 23.05.24 10:45, Stefano Garzarella wrote:
>> On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>> Howdy,
>>>
>>> On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>> Hey Stefano,
>>>>
>>>> Thanks for the reply.
>>>>
>>>>
>>>> On Mon, May 20, 2024, 2:55 PM Stefano Garzarella 
>>>> <sgarzare@redhat.com> wrote:
>>>>> Hi Dorjoy,
>>>>>
>>>>> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>>> emulation support in QEMU. Alexander Graf is mentoring me on this 
>>>>>> work. A v1
>>>>>> patch series has already been posted to the qemu-devel mailing 
>>>>>> list[2].
>>>>>>
>>>>>> AWS nitro enclaves is an Amazon EC2[3] feature that allows 
>>>>>> creating isolated
>>>>>> execution environments, called enclaves, from Amazon EC2 
>>>>>> instances, which are
>>>>>> used for processing highly sensitive data. Enclaves have no 
>>>>>> persistent storage
>>>>>> and no external networking. The enclave VMs are based on 
>>>>>> Firecracker microvm
>>>>>> and have a vhost-vsock device for communication with the parent 
>>>>>> EC2 instance
>>>>>> that spawned it and a Nitro Secure Module (NSM) device for 
>>>>>> cryptographic
>>>>>> attestation. The parent instance VM always has CID 3 while the 
>>>>>> enclave VM gets
>>>>>> a dynamic CID. The enclave VMs can communicate with the parent 
>>>>>> instance over
>>>>>> various ports to CID 3, for example, the init process inside an 
>>>>>> enclave sends a
>>>>>> heartbeat to port 9000 upon boot, expecting a heartbeat reply, 
>>>>>> letting the
>>>>>> parent instance know that the enclave VM has successfully booted.
>>>>>>
>>>>>> The plan is to eventually make the nitro enclave emulation in 
>>>>>> QEMU standalone
>>>>>> i.e., without needing to run another VM with CID 3 with proper vsock
>>>>> If you don't have to launch another VM, maybe we can avoid 
>>>>> vhost-vsock
>>>>> and emulate virtio-vsock in user-space, having complete control 
>>>>> over the
>>>>> behavior.
>>>>>
>>>>> So we could use this opportunity to implement virtio-vsock in QEMU 
>>>>> [4]
>>>>> or use vhost-user-vsock [5] and customize it somehow.
>>>>> (Note: vhost-user-vsock already supports sibling communication, so 
>>>>> maybe
>>>>> with a few modifications it fits your case perfectly)
>>>>>
>>>>> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>> [5] 
>>>>> https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>>
>>>>
>>>> Thanks for letting me know. Right now I don't have a complete picture
>>>> but I will look into them. Thank you.
>>>>>
>>>>>
>>>>>> communication support. For this to work, one approach could be to 
>>>>>> teach the
>>>>>> vhost driver in kernel to forward CID 3 messages to another CID N
>>>>> So in this case both CID 3 and N would be assigned to the same QEMU
>>>>> process?
>>>>
>>>>
>>>> CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>> parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>> an EC2 instance VM spawns the enclave VM from inside it and that
>>>> parent EC2 instance always has CID 3). But in the QEMU case as we
>>>> don't want a parent VM (we want to run enclave VMs standalone) we
>>>> would need to forward the CID 3 messages to host CID. I don't know if
>>>> it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
>>>
>>>
>>> There are 2 use cases here:
>>>
>>> 1) Enclave wants to treat host as parent (default). In this scenario,
>>> the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>> really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>> should really land on CID 0 (hypervisor). When the hypervisor tries to
>>> connect to the Enclave on port X, it should look as if it originates
>>> from CID 3, not CID 0.
>>>
>>> 2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>> Here, we have multiple "parent instances". Each of them thinks it's
>>> CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>> parent. For this case, I think implementing all of virtio-vsock in
>>> user space is the best path forward. But in theory, you could also
>>> swizzle CIDs to make random "real" CIDs appear as CID 3.
>>>
>>
>> Thank you for clarifying the use cases!
>>
>> Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>> it's easier to go into user-space with vhost-user-vsock or the built-in
>> device.
>
>
> Sorry, I believe I meant CID 2. Effectively for case 1, when a process 
> on the hypervisor listens on port 1234, it should be visible as 3:1234 
> from the VM and when the hypervisor process connects to <VM CID>:1234, 
> it should look as if that connection came from CID 3.


Now that I'm thinking about my message again: What if we just introduce 
a sysfs/sysctl file for vsock that indicates the "host CID" (default: 
2)? Users that want vhost-vsock to behave as if the host is CID 3 can 
just write 3 to it.

It means we'd need to change all references to VMADDR_CID_HOST to 
instead refer to a global variable that indicates the new "host CID". 
It'd need some more careful massaging to not break number namespace 
assumptions (<= CID_HOST no longer works), but the idea should fly.
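
Something along these lines (hypothetical sketch; the knob name and its
placement are made up for illustration):

/* A single tunable replacing hard-coded uses of VMADDR_CID_HOST. */
static unsigned int vsock_host_cid = VMADDR_CID_HOST;  /* default: 2 */
module_param(vsock_host_cid, uint, 0644);
MODULE_PARM_DESC(vsock_host_cid,
                 "CID the host answers to (e.g. 3 to act as enclave parent)");

/* ...and call sites change from
 *         if (cid == VMADDR_CID_HOST)
 * to something like
 *         if (cid == READ_ONCE(vsock_host_cid))
 */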

That would give us all 3 options:

1) User sets vsock.host_cid = 3 to simulate that the host is in reality 
an enclave parent
2) User spawns VM with CID = 3 to run parent payload inside
3) User spawns parent and enclave VMs with vhost-user-vsock, which 
creates its own CID namespace


Stefano, WDYT?


Alex






* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-27  7:54           ` Alexander Graf
@ 2024-05-28 14:43             ` Stefano Garzarella
  2024-05-28 15:19             ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Stefano Garzarella @ 2024-05-28 14:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alexander Graf, Dorjoy Chowdhury, virtualization, kvm, netdev,
	stefanha

On Mon, May 27, 2024 at 09:54:17AM GMT, Alexander Graf wrote:
>
>On 27.05.24 09:08, Alexander Graf wrote:
>>Hey Stefano,
>>
>>On 23.05.24 10:45, Stefano Garzarella wrote:
>>>On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>>>Howdy,
>>>>
>>>>On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>>>Hey Stefano,
>>>>>
>>>>>Thanks for the reply.
>>>>>
>>>>>
>>>>>On Mon, May 20, 2024, 2:55 PM Stefano Garzarella 
>>>>><sgarzare@redhat.com> wrote:
>>>>>>Hi Dorjoy,
>>>>>>
>>>>>>On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>>>Hi,
>>>>>>>
>>>>>>>Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>>>>emulation support in QEMU. Alexander Graf is mentoring me 
>>>>>>>on this work. A v1
>>>>>>>patch series has already been posted to the qemu-devel 
>>>>>>>mailing list[2].
>>>>>>>
>>>>>>>AWS nitro enclaves is an Amazon EC2[3] feature that allows 
>>>>>>>creating isolated
>>>>>>>execution environments, called enclaves, from Amazon EC2 
>>>>>>>instances, which are
>>>>>>>used for processing highly sensitive data. Enclaves have 
>>>>>>>no persistent storage
>>>>>>>and no external networking. The enclave VMs are based on 
>>>>>>>Firecracker microvm
>>>>>>>and have a vhost-vsock device for communication with the 
>>>>>>>parent EC2 instance
>>>>>>>that spawned it and a Nitro Secure Module (NSM) device for 
>>>>>>>cryptographic
>>>>>>>attestation. The parent instance VM always has CID 3 while 
>>>>>>>the enclave VM gets
>>>>>>>a dynamic CID. The enclave VMs can communicate with the 
>>>>>>>parent instance over
>>>>>>>various ports to CID 3, for example, the init process 
>>>>>>>inside an enclave sends a
>>>>>>>heartbeat to port 9000 upon boot, expecting a heartbeat 
>>>>>>>reply, letting the
>>>>>>>parent instance know that the enclave VM has successfully booted.
>>>>>>>
>>>>>>>The plan is to eventually make the nitro enclave emulation 
>>>>>>>in QEMU standalone
>>>>>>>i.e., without needing to run another VM with CID 3 with proper vsock
>>>>>>If you don't have to launch another VM, maybe we can avoid 
>>>>>>vhost-vsock
>>>>>>and emulate virtio-vsock in user-space, having complete 
>>>>>>control over the
>>>>>>behavior.
>>>>>>
>>>>>>So we could use this opportunity to implement virtio-vsock 
>>>>>>in QEMU [4]
>>>>>>or use vhost-user-vsock [5] and customize it somehow.
>>>>>>(Note: vhost-user-vsock already supports sibling 
>>>>>>communication, so maybe
>>>>>>with a few modifications it fits your case perfectly)
>>>>>>
>>>>>>[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>>>[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>>>
>>>>>
>>>>>Thanks for letting me know. Right now I don't have a complete picture
>>>>>but I will look into them. Thank you.
>>>>>>
>>>>>>
>>>>>>>communication support. For this to work, one approach 
>>>>>>>could be to teach the
>>>>>>>vhost driver in kernel to forward CID 3 messages to another CID N
>>>>>>So in this case both CID 3 and N would be assigned to the same QEMU
>>>>>>process?
>>>>>
>>>>>
>>>>>CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>>>parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>>>an EC2 instance VM spawns the enclave VM from inside it and that
>>>>>parent EC2 instance always has CID 3). But in the QEMU case as we
>>>>>don't want a parent VM (we want to run enclave VMs standalone) we
>>>>>would need to forward the CID 3 messages to host CID. I don't know if
>>>>>it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
>>>>
>>>>
>>>>There are 2 use cases here:
>>>>
>>>>1) Enclave wants to treat host as parent (default). In this 
>>>>scenario,
>>>>the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>>>really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>>>should really land on CID 0 (hypervisor). When the hypervisor tries to
>>>>connect to the Enclave on port X, it should look as if it originates
>>>>from CID 3, not CID 0.
>>>>
>>>>2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>>>Here, we have multiple "parent instances". Each of them thinks it's
>>>>CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>>>parent. For this case, I think implementing all of virtio-vsock in
>>>>user space is the best path forward. But in theory, you could also
>>>>swizzle CIDs to make random "real" CIDs appear as CID 3.
>>>>
>>>
>>>Thank you for clarifying the use cases!
>>>
>>>Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>>>it's easier to go into user-space with vhost-user-vsock or the built-in
>>>device.
>>
>>
>>Sorry, I believe I meant CID 2. Effectively for case 1, when a 
>>process on the hypervisor listens on port 1234, it should be visible 
>>as 3:1234 from the VM and when the hypervisor process connects to 
>><VM CID>:1234, it should look as if that connection came from CID 3.
>
>
>Now that I'm thinking about my message again: What if we just introduce 
>a sysfs/sysctl file for vsock that indicates the "host CID" (default: 
>2)? Users that want vhost-vsock to behave as if the host is CID 3 can 
>just write 3 to it.

I don't know if I understand the final use case well, so I'll try to 
summarize it:

what you would like is the ability to receive/send messages from the 
host to a guest as if the host were a sibling VM, i.e. as if it had a 
CID != 2 (in your case 3). The important point is to use AF_VSOCK in the 
host application, not a unix socket like firecracker does.

Is this correct?

I thought you were using firecracker for this scenario, so it seemed to 
make sense to expect user applications to support hybrid vsock.
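
For reference, with Firecracker's hybrid vsock the host application has
to talk to a Unix socket and do a small text handshake before the byte
stream starts; roughly like this (sketch from memory of the Firecracker
docs, details may differ):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int connect_hybrid_vsock(const char *uds_path, unsigned int guest_port)
{
        struct sockaddr_un sun = { .sun_family = AF_UNIX };
        char buf[64];
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        strncpy(sun.sun_path, uds_path, sizeof(sun.sun_path) - 1);
        if (fd < 0 || connect(fd, (struct sockaddr *)&sun, sizeof(sun)) < 0)
                return -1;

        /* "CONNECT <port>\n" is answered by "OK <host_port>\n". */
        snprintf(buf, sizeof(buf), "CONNECT %u\n", guest_port);
        write(fd, buf, strlen(buf));
        read(fd, buf, sizeof(buf));   /* consume the "OK ..." line */
        return fd;                    /* then it is a normal byte stream */
}

So a host application has to be adapted to do this handshake instead of
using AF_VSOCK directly.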

>
>It means we'd need to change all references to VMADDR_CID_HOST to 
>instead refer to a global variable that indicates the new "host CID". 
>It'd need some more careful massaging to not break number namespace 
>assumptions (<= CID_HOST no longer works), but the idea should fly.
>
>That would give us all 3 options:
>
>1) User sets vsock.host_cid = 3 to simulate that the host is in 
>reality an enclave parent
>2) User spawns VM with CID = 3 to run parent payload inside
>3) User spawns parent and enclave VMs with vhost-vsock-user which 
>creates its own CID namespace
>
>
>Stefano, WDYT?

This would require many changes in the af_vsock core as well. Perhaps we 
can avoid touching the core in this way:

1. extend vhost-vsock to support VMADDR_FLAG_TO_HOST (this is needed also
    when the user spawns a VM with CID = 3 using vhost-vsock).
    Some new ioctl/sysfs would be needed to create an allowlist of CIDs
    that may or may not be accepted. (note: as of now, vhost-vsock discards
    all packets that have dst_cid != 2)

2. create a new G2H transport that will be loaded in the host.
    The af_vsock core supports 3 transport types loaded at runtime
    simultaneously: loopback, G2H (e.g. virtio-vsock, hyper-v, vmci
    driver), and H2G (e.g. the vhost-vsock kernel module). We originally
    introduced this extension to support nested VMs. This split is used
    mostly to handle CIDs:
    - loopback (local CID = 1)
    - H2G (local CID = 2)
    - G2H (local CID > 2)

    Perhaps the simplest thing is to extend vsock_loopback to be used
    here, but instead of registering as loopback (which can only handle
    CID 1), it should register as G2H; this way we reuse all the logic
    already in the af_vsock core to handle CIDs > 2 (rough sketch below).

    The only problem is that in this case your host can't be nested.
    But upstream there's a proposal to support multiple virtio-vsock
    devices in a guest, so we could adapt it to support this case in the
    future.
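
    A rough sketch of point 2, based on how vsock_loopback registers
    today (the transport name here is made up; the essential change is
    the registration flag):

    /* vsock_loopback currently registers itself as a LOCAL transport:
     *     vsock_core_register(&loopback_transport.transport,
     *                         VSOCK_TRANSPORT_F_LOCAL);
     * the idea is to register essentially the same machinery as G2H,
     * so the af_vsock core applies its existing CID > 2 handling: */
    static int __init vsock_host_g2h_init(void)
    {
            return vsock_core_register(&host_g2h_transport.transport,
                                       VSOCK_TRANSPORT_F_G2H);
    }
    module_init(vsock_host_g2h_init);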


WDYT?

Thanks,
Stefano



* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-27  7:54           ` Alexander Graf
  2024-05-28 14:43             ` Stefano Garzarella
@ 2024-05-28 15:19             ` Paolo Bonzini
  2024-05-28 15:41               ` Stefano Garzarella
  1 sibling, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2024-05-28 15:19 UTC (permalink / raw)
  To: Alexander Graf, Stefano Garzarella, Alexander Graf
  Cc: Dorjoy Chowdhury, virtualization, kvm, netdev, stefanha

On 5/27/24 09:54, Alexander Graf wrote:
> 
> On 27.05.24 09:08, Alexander Graf wrote:
>> Hey Stefano,
>>
>> On 23.05.24 10:45, Stefano Garzarella wrote:
>>> On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>>> Howdy,
>>>>
>>>> On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>>> Hey Stefano,
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>>
>>>>> On Mon, May 20, 2024, 2:55 PM Stefano Garzarella 
>>>>> <sgarzare@redhat.com> wrote:
>>>>>> Hi Dorjoy,
>>>>>>
>>>>>> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>>>> emulation support in QEMU. Alexander Graf is mentoring me on this 
>>>>>>> work. A v1
>>>>>>> patch series has already been posted to the qemu-devel mailing 
>>>>>>> list[2].
>>>>>>>
>>>>>>> AWS nitro enclaves is an Amazon EC2[3] feature that allows 
>>>>>>> creating isolated
>>>>>>> execution environments, called enclaves, from Amazon EC2 
>>>>>>> instances, which are
>>>>>>> used for processing highly sensitive data. Enclaves have no 
>>>>>>> persistent storage
>>>>>>> and no external networking. The enclave VMs are based on 
>>>>>>> Firecracker microvm
>>>>>>> and have a vhost-vsock device for communication with the parent 
>>>>>>> EC2 instance
>>>>>>> that spawned it and a Nitro Secure Module (NSM) device for 
>>>>>>> cryptographic
>>>>>>> attestation. The parent instance VM always has CID 3 while the 
>>>>>>> enclave VM gets
>>>>>>> a dynamic CID. The enclave VMs can communicate with the parent 
>>>>>>> instance over
>>>>>>> various ports to CID 3, for example, the init process inside an 
>>>>>>> enclave sends a
>>>>>>> heartbeat to port 9000 upon boot, expecting a heartbeat reply, 
>>>>>>> letting the
>>>>>>> parent instance know that the enclave VM has successfully booted.
>>>>>>>
>>>>>>> The plan is to eventually make the nitro enclave emulation in 
>>>>>>> QEMU standalone
>>>>>>> i.e., without needing to run another VM with CID 3 with proper vsock
>>>>>> If you don't have to launch another VM, maybe we can avoid 
>>>>>> vhost-vsock
>>>>>> and emulate virtio-vsock in user-space, having complete control 
>>>>>> over the
>>>>>> behavior.
>>>>>>
>>>>>> So we could use this opportunity to implement virtio-vsock in QEMU 
>>>>>> [4]
>>>>>> or use vhost-user-vsock [5] and customize it somehow.
>>>>>> (Note: vhost-user-vsock already supports sibling communication, so 
>>>>>> maybe
>>>>>> with a few modifications it fits your case perfectly)
>>>>>>
>>>>>> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>>> [5] 
>>>>>> https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>>>
>>>>>
>>>>> Thanks for letting me know. Right now I don't have a complete picture
>>>>> but I will look into them. Thank you.
>>>>>>
>>>>>>
>>>>>>> communication support. For this to work, one approach could be to 
>>>>>>> teach the
>>>>>>> vhost driver in kernel to forward CID 3 messages to another CID N
>>>>>> So in this case both CID 3 and N would be assigned to the same QEMU
>>>>>> process?
>>>>>
>>>>>
>>>>> CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>>> parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>>> an EC2 instance VM spawns the enclave VM from inside it and that
>>>>> parent EC2 instance always has CID 3). But in the QEMU case as we
>>>>> don't want a parent VM (we want to run enclave VMs standalone) we
>>>>> would need to forward the CID 3 messages to host CID. I don't know if
>>>>> it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
>>>>
>>>>
>>>> There are 2 use cases here:
>>>>
>>>> 1) Enclave wants to treat host as parent (default). In this scenario,
>>>> the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>>> really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>>> should really land on CID 0 (hypervisor). When the hypervisor tries to
>>>> connect to the Enclave on port X, it should look as if it originates
>>>> from CID 3, not CID 0.
>>>>
>>>> 2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>>> Here, we have multiple "parent instances". Each of them thinks it's
>>>> CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>>> parent. For this case, I think implementing all of virtio-vsock in
>>>> user space is the best path forward. But in theory, you could also
>>>> swizzle CIDs to make random "real" CIDs appear as CID 3.
>>>>
>>>
>>> Thank you for clarifying the use cases!
>>>
>>> Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>>> it's easier to go into user-space with vhost-user-vsock or the built-in
>>> device.
>>
>>
>> Sorry, I believe I meant CID 2. Effectively for case 1, when a process 
>> on the hypervisor listens on port 1234, it should be visible as 3:1234 
>> from the VM and when the hypervisor process connects to <VM CID>:1234, 
>> it should look as if that connection came from CID 3.
> 
> 
> Now that I'm thinking about my message again: What if we just introduce 
> a sysfs/sysctl file for vsock that indicates the "host CID" (default: 
> 2)? Users that want vhost-vsock to behave as if the host is CID 3 can 
> just write 3 to it.
> 
> It means we'd need to change all references to VMADDR_CID_HOST to 
> instead refer to a global variable that indicates the new "host CID". 
> It'd need some more careful massaging to not break number namespace 
> assumptions (<= CID_HOST no longer works), but the idea should fly.

Forwarding one or more ports of a given CID to CID 2 (the host) should 
be doable with a dummy vhost client that listens on CID 3, connects to 
CID 2 and sends data back and forth.  Not hard enough to justify changing 
all references to VMADDR_CID_HOST (and also I am not sure if vsock 
supports network namespaces?  If not, the sysctl/sysfs way is not feasible 
because you cannot set it per-netns, can you?).  It also has the 
disadvantage that different QEMU instances are not isolated from each 
other.

I think it's either that or implementing virtio-vsock in userspace 
(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/, 
search for "To connect host<->guest").

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-28 15:19             ` Paolo Bonzini
@ 2024-05-28 15:41               ` Stefano Garzarella
  2024-05-28 15:49                 ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-05-28 15:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexander Graf, Alexander Graf, Dorjoy Chowdhury, virtualization,
	kvm, netdev, stefanha

On Tue, May 28, 2024 at 05:19:34PM GMT, Paolo Bonzini wrote:
>On 5/27/24 09:54, Alexander Graf wrote:
>>
>>On 27.05.24 09:08, Alexander Graf wrote:
>>>Hey Stefano,
>>>
>>>On 23.05.24 10:45, Stefano Garzarella wrote:
>>>>On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>>>>Howdy,
>>>>>
>>>>>On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>>>>Hey Stefano,
>>>>>>
>>>>>>Thanks for the reply.
>>>>>>
>>>>>>
>>>>>>On Mon, May 20, 2024, 2:55 PM Stefano Garzarella 
>>>>>><sgarzare@redhat.com> wrote:
>>>>>>>Hi Dorjoy,
>>>>>>>
>>>>>>>On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>>>>Hi,
>>>>>>>>
>>>>>>>>Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>>>>>emulation support in QEMU. Alexander Graf is mentoring 
>>>>>>>>me on this work. A v1
>>>>>>>>patch series has already been posted to the qemu-devel 
>>>>>>>>mailing list[2].
>>>>>>>>
>>>>>>>>AWS nitro enclaves is an Amazon EC2[3] feature that 
>>>>>>>>allows creating isolated
>>>>>>>>execution environments, called enclaves, from Amazon EC2 
>>>>>>>>instances, which are
>>>>>>>>used for processing highly sensitive data. Enclaves have 
>>>>>>>>no persistent storage
>>>>>>>>and no external networking. The enclave VMs are based on 
>>>>>>>>Firecracker microvm
>>>>>>>>and have a vhost-vsock device for communication with the 
>>>>>>>>parent EC2 instance
>>>>>>>>that spawned it and a Nitro Secure Module (NSM) device 
>>>>>>>>for cryptographic
>>>>>>>>attestation. The parent instance VM always has CID 3 
>>>>>>>>while the enclave VM gets
>>>>>>>>a dynamic CID. The enclave VMs can communicate with the 
>>>>>>>>parent instance over
>>>>>>>>various ports to CID 3, for example, the init process 
>>>>>>>>inside an enclave sends a
>>>>>>>>heartbeat to port 9000 upon boot, expecting a heartbeat 
>>>>>>>>reply, letting the
>>>>>>>>parent instance know that the enclave VM has successfully booted.
>>>>>>>>
>>>>>>>>The plan is to eventually make the nitro enclave 
>>>>>>>>emulation in QEMU standalone
>>>>>>>>i.e., without needing to run another VM with CID 3 with proper vsock
>>>>>>>If you don't have to launch another VM, maybe we can avoid 
>>>>>>>vhost-vsock
>>>>>>>and emulate virtio-vsock in user-space, having complete 
>>>>>>>control over the
>>>>>>>behavior.
>>>>>>>
>>>>>>>So we could use this opportunity to implement virtio-vsock 
>>>>>>>in QEMU [4]
>>>>>>>or use vhost-user-vsock [5] and customize it somehow.
>>>>>>>(Note: vhost-user-vsock already supports sibling 
>>>>>>>communication, so maybe
>>>>>>>with a few modifications it fits your case perfectly)
>>>>>>>
>>>>>>>[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>>>>[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>>>>
>>>>>>
>>>>>>Thanks for letting me know. Right now I don't have a complete picture
>>>>>>but I will look into them. Thank you.
>>>>>>>
>>>>>>>
>>>>>>>>communication support. For this to work, one approach 
>>>>>>>>could be to teach the
>>>>>>>>vhost driver in kernel to forward CID 3 messages to another CID N
>>>>>>>So in this case both CID 3 and N would be assigned to the same QEMU
>>>>>>>process?
>>>>>>
>>>>>>
>>>>>>CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>>>>parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>>>>an EC2 instance VM spawns the enclave VM from inside it and that
>>>>>>parent EC2 instance always has CID 3). But in the QEMU case as we
>>>>>>don't want a parent VM (we want to run enclave VMs standalone) we
>>>>>>would need to forward the CID 3 messages to host CID. I don't know if
>>>>>>it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
>>>>>
>>>>>
>>>>>There are 2 use cases here:
>>>>>
>>>>>1) Enclave wants to treat host as parent (default). In this scenario,
>>>>>the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>>>>really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>>>>should really land on CID 0 (hypervisor). When the hypervisor tries to
>>>>>connect to the Enclave on port X, it should look as if it originates
>>>>>from CID 3, not CID 0.
>>>>>
>>>>>2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>>>>Here, we have multiple "parent instances". Each of them thinks it's
>>>>>CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>>>>parent. For this case, I think implementing all of virtio-vsock in
>>>>>user space is the best path forward. But in theory, you could also
>>>>>swizzle CIDs to make random "real" CIDs appear as CID 3.
>>>>>
>>>>
>>>>Thank you for clarifying the use cases!
>>>>
>>>>Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>>>>it's easier to go into user-space with vhost-user-vsock or the built-in
>>>>device.
>>>
>>>
>>>Sorry, I believe I meant CID 2. Effectively for case 1, when a 
>>>process on the hypervisor listens on port 1234, it should be 
>>>visible as 3:1234 from the VM and when the hypervisor process 
>>>connects to <VM CID>:1234, it should look as if that connection 
>>>came from CID 3.
>>
>>
>>Now that I'm thinking about my message again: What if we just 
>>introduce a sysfs/sysctl file for vsock that indicates the "host 
>>CID" (default: 2)? Users that want vhost-vsock to behave as if the 
>>host is CID 3 can just write 3 to it.
>>
>>It means we'd need to change all references to VMADDR_CID_HOST to 
>>instead refer to a global variable that indicates the new "host 
>>CID". It'd need some more careful massaging to not break number 
>>namespace assumptions (<= CID_HOST no longer works), but the idea 
>>should fly.
>
>Forwarding one or more ports of a given CID to CID 2 (the host) should 
>be doable with a dummy vhost client that listens to CID 3, connects to 
>CID 2 and send data back and forth.

Good idea, a kind of socat that can also handle /dev/vhost-vsock. With 
rust-vmm crates it should be doable, but I think we always need to 
extend vhost-vsock to support VMADDR_FLAG_TO_HOST, because for now it 
does not allow guests to send packets to the host with a destination 
CID other than 2.

>Not hard enough to justify changing all references to VMADDR_CID_HOST

I agree.

>(and also I am not sure if vsock supports network namespaces?

nope, I had been working on it, but I could never finish it :-(
Tracking the work here: https://gitlab.com/vsock/vsock/-/issues/2

>then the sysctl/sysfs way is not feasible because you cannot set it 
>per-netns, can you?).  It also has the disadvantages that different 
>QEMU instances are not insulated.
>
>I think it's either that or implementing virtio-vsock in userspace (https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/, 
>search for "To connect host<->guest").

But in this case AF_VSOCK can't be used in the host, right?
So it's similar to vhost-user-vsock.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-28 15:41               ` Stefano Garzarella
@ 2024-05-28 15:49                 ` Paolo Bonzini
  2024-05-28 15:53                   ` Stefano Garzarella
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2024-05-28 15:49 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Alexander Graf, Alexander Graf, Dorjoy Chowdhury, virtualization,
	kvm, netdev, stefanha

On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >I think it's either that or implementing virtio-vsock in userspace
> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
> >search for "To connect host<->guest").
>
> For in this case AF_VSOCK can't be used in the host, right?
> So it's similar to vhost-user-vsock.

Not sure if I understand but in this case QEMU knows which CIDs are
forwarded to the host (either listen on vsock and connect to the host,
or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
involved.

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-28 15:49                 ` Paolo Bonzini
@ 2024-05-28 15:53                   ` Stefano Garzarella
  2024-05-28 16:38                     ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-05-28 15:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexander Graf, Alexander Graf, Dorjoy Chowdhury, virtualization,
	kvm, netdev, stefanha

On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
>On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >I think it's either that or implementing virtio-vsock in userspace
>> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
>> >search for "To connect host<->guest").
>>
>> For in this case AF_VSOCK can't be used in the host, right?
>> So it's similar to vhost-user-vsock.
>
>Not sure if I understand but in this case QEMU knows which CIDs are
>forwarded to the host (either listen on vsock and connect to the host,
>or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
>involved.
>

I meant that the application in the host that wants to connect to the 
guest cannot use AF_VSOCK in the host, but must use the one where QEMU 
is listening (e.g. AF_INET, AF_UNIX), right?

I think one of Alex's requirements was that the application in the host 
continue to use AF_VSOCK as in their environment.

Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-28 15:53                   ` Stefano Garzarella
@ 2024-05-28 16:38                     ` Paolo Bonzini
  2024-05-29  8:04                       ` Stefano Garzarella
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2024-05-28 16:38 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Alexander Graf, Alexander Graf, Dorjoy Chowdhury, virtualization,
	kvm, netdev, stefanha

On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
> >On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> >I think it's either that or implementing virtio-vsock in userspace
> >> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
> >> >search for "To connect host<->guest").
> >>
> >> For in this case AF_VSOCK can't be used in the host, right?
> >> So it's similar to vhost-user-vsock.
> >
> >Not sure if I understand but in this case QEMU knows which CIDs are
> >forwarded to the host (either listen on vsock and connect to the host,
> >or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
> >involved.
>
> I meant that the application in the host that wants to connect to the
> guest cannot use AF_VSOCK in the host, but must use the one where QEMU
> is listening (e.g. AF_INET, AF_UNIX), right?
>
> I think one of Alex's requirements was that the application in the host
> continue to use AF_VSOCK as in their environment.

Can the host use VMADDR_CID_LOCAL for host-to-host communication? If
so, the proposed "-object vsock-forward" syntax can connect to it and
it should work as long as the application on the host does not assume
that it is on CID 3.

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-28 16:38                     ` Paolo Bonzini
@ 2024-05-29  8:04                       ` Stefano Garzarella
  2024-05-29 10:43                         ` Alexander Graf
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-05-29  8:04 UTC (permalink / raw)
  To: Paolo Bonzini, Alexander Graf
  Cc: Alexander Graf, Dorjoy Chowdhury, virtualization, kvm, netdev,
	stefanha

On Tue, May 28, 2024 at 06:38:24PM GMT, Paolo Bonzini wrote:
>On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
>> >On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >> >I think it's either that or implementing virtio-vsock in userspace
>> >> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
>> >> >search for "To connect host<->guest").
>> >>
>> >> For in this case AF_VSOCK can't be used in the host, right?
>> >> So it's similar to vhost-user-vsock.
>> >
>> >Not sure if I understand but in this case QEMU knows which CIDs are
>> >forwarded to the host (either listen on vsock and connect to the host,
>> >or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
>> >involved.
>>
>> I meant that the application in the host that wants to connect to the
>> guest cannot use AF_VSOCK in the host, but must use the one where QEMU
>> is listening (e.g. AF_INET, AF_UNIX), right?
>>
>> I think one of Alex's requirements was that the application in the host
>> continue to use AF_VSOCK as in their environment.
>
>Can the host use VMADDR_CID_LOCAL for host-to-host communication?

Yep!

>If
>so, the proposed "-object vsock-forward" syntax can connect to it and
>it should work as long as the application on the host does not assume
>that it is on CID 3.

Right, good point!
We can also support something similar in vhost-user-vsock, where instead 
of using AF_UNIX and firecracker's hybrid vsock, we can redirect 
everything to VMADDR_CID_LOCAL.

Alex, what do you think? That would simplify things a lot.
The only difference is that the application in the host has to talk to 
VMADDR_CID_LOCAL (1).
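
Just to make "talk to VMADDR_CID_LOCAL" concrete: the host application 
simply uses CID 1 as the vsock address instead of the enclave's CID. A 
minimal, untested sketch for the connecting direction (it assumes the 
vsock_loopback module is loaded and that a peer is listening on that 
port):

     /* Host-side client: connect over the loopback transport (CID 1)
      * instead of to the guest CID directly. Port 9000 is only an
      * example. */
     #include <stdio.h>
     #include <unistd.h>
     #include <sys/socket.h>
     #include <linux/vm_sockets.h>

     int main(void)
     {
             struct sockaddr_vm addr = {
                     .svm_family = AF_VSOCK,
                     .svm_cid    = VMADDR_CID_LOCAL,  /* 1, not the guest CID */
                     .svm_port   = 9000,
             };
             int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

             if (fd < 0 ||
                 connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                     perror("vsock connect");
                     return 1;
             }
             write(fd, "hello\n", 6);
             close(fd);
             return 0;
     }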

Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-29  8:04                       ` Stefano Garzarella
@ 2024-05-29 10:43                         ` Alexander Graf
  2024-05-29 10:55                           ` Stefano Garzarella
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Graf @ 2024-05-29 10:43 UTC (permalink / raw)
  To: Stefano Garzarella, Paolo Bonzini
  Cc: Alexander Graf, Dorjoy Chowdhury, virtualization, kvm, netdev,
	stefanha


On 29.05.24 10:04, Stefano Garzarella wrote:
>
> On Tue, May 28, 2024 at 06:38:24PM GMT, Paolo Bonzini wrote:
>> On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella 
>> <sgarzare@redhat.com> wrote:
>>>
>>> On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
>>> >On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella 
>>> <sgarzare@redhat.com> wrote:
>>> >> >I think it's either that or implementing virtio-vsock in userspace
>>> >> 
>>> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
>>> >> >search for "To connect host<->guest").
>>> >>
>>> >> For in this case AF_VSOCK can't be used in the host, right?
>>> >> So it's similar to vhost-user-vsock.
>>> >
>>> >Not sure if I understand but in this case QEMU knows which CIDs are
>>> >forwarded to the host (either listen on vsock and connect to the host,
>>> >or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
>>> >involved.
>>>
>>> I meant that the application in the host that wants to connect to the
>>> guest cannot use AF_VSOCK in the host, but must use the one where QEMU
>>> is listening (e.g. AF_INET, AF_UNIX), right?
>>>
>>> I think one of Alex's requirements was that the application in the host
>>> continue to use AF_VSOCK as in their environment.
>>
>> Can the host use VMADDR_CID_LOCAL for host-to-host communication?
>
> Yep!
>
>> If
>> so, the proposed "-object vsock-forward" syntax can connect to it and
>> it should work as long as the application on the host does not assume
>> that it is on CID 3.
>
> Right, good point!
> We can also support something similar in vhost-user-vsock, where instead
> of using AF_UNIX and firecracker's hybrid vsock, we can redirect
> everything to VMADDR_CID_LOCAL.
>
> Alex what do you think? That would simplify things a lot to do.
> The only difference is that the application in the host has to talk to
> VMADDR_CID_LOCAL (1).


The application in the host would see an incoming connection from CID 1 
(which is probably fine) and would still be able to establish outgoing 
connections to the actual VM's CID as long as the Enclave doesn't check 
for the peer CID (I haven't seen anyone check yet). So yes, indeed, this 
should work.

The only case where I can see it breaking is when you run multiple 
Enclave VMs in parallel. In that case, each would try to listen on CID 3 
and the second one to do so would fail. But it's a well solvable problem: 
we could (in addition to the simple in-QEMU case) build an external 
daemon that does the proxying and hence owns CID 3.

So the immediate plan would be to:

   1) Build a new vhost-vsock-forward object model that connects to 
vhost as CID 3 and then forwards every packet from CID 1 to the 
Enclave-CID and every packet that arrives on CID 3 to CID 2.
   2) Create a machine option for -M nitro-enclave that automatically 
spawns the vhost-vsock-forward object. (default: off)


The above may need some fiddling with object creation times to ensure 
that the forward object gets CID 3 and the Enclave does not get it as 
its auto-assigned CID.


Thanks,

Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-29 10:43                         ` Alexander Graf
@ 2024-05-29 10:55                           ` Stefano Garzarella
  2024-06-25 17:44                             ` Dorjoy Chowdhury
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-05-29 10:55 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paolo Bonzini, Alexander Graf, Dorjoy Chowdhury, virtualization,
	kvm, netdev, stefanha

On Wed, May 29, 2024 at 12:43:57PM GMT, Alexander Graf wrote:
>
>On 29.05.24 10:04, Stefano Garzarella wrote:
>>
>>On Tue, May 28, 2024 at 06:38:24PM GMT, Paolo Bonzini wrote:
>>>On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella 
>>><sgarzare@redhat.com> wrote:
>>>>
>>>>On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
>>>>>On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella 
>>>><sgarzare@redhat.com> wrote:
>>>>>> >I think it's either that or implementing virtio-vsock in userspace
>>>>>> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
>>>>>> >search for "To connect host<->guest").
>>>>>>
>>>>>> For in this case AF_VSOCK can't be used in the host, right?
>>>>>> So it's similar to vhost-user-vsock.
>>>>>
>>>>>Not sure if I understand but in this case QEMU knows which CIDs are
>>>>>forwarded to the host (either listen on vsock and connect to the host,
>>>>>or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
>>>>>involved.
>>>>
>>>>I meant that the application in the host that wants to connect to the
>>>>guest cannot use AF_VSOCK in the host, but must use the one where QEMU
>>>>is listening (e.g. AF_INET, AF_UNIX), right?
>>>>
>>>>I think one of Alex's requirements was that the application in the host
>>>>continue to use AF_VSOCK as in their environment.
>>>
>>>Can the host use VMADDR_CID_LOCAL for host-to-host communication?
>>
>>Yep!
>>
>>>If
>>>so, the proposed "-object vsock-forward" syntax can connect to it and
>>>it should work as long as the application on the host does not assume
>>>that it is on CID 3.
>>
>>Right, good point!
>>We can also support something similar in vhost-user-vsock, where instead
>>of using AF_UNIX and firecracker's hybrid vsock, we can redirect
>>everything to VMADDR_CID_LOCAL.
>>
>>Alex what do you think? That would simplify things a lot to do.
>>The only difference is that the application in the host has to talk to
>>VMADDR_CID_LOCAL (1).
>
>
>The application in the host would see an incoming connection from CID 
>1 (which is probably fine) and would still be able to establish 
>outgoing connections to the actual VM's CID as long as the Enclave 
>doesn't check for the peer CID (I haven't seen anyone check yet). So 
>yes, indeed, this should work.
>
>The only case where I can see it breaking is when you run multiple 
>Enclave VMs in parallel. In that case, each would try to listen to CID 
>3 and the second that does would fail. But it's a well solvable 
>problem: We could (in addition to the simple in-QEMU case) build an 
>external daemon that does the proxying and hence owns CID3.

Well, we can modify vhost-user-vsock for that. It's already a daemon and 
already supports multiple VMs per single daemon, but as of now they have 
to have different CIDs.

>
>So the immediate plan would be to:
>
>  1) Build a new vhost-vsock-forward object model that connects to 
>vhost as CID 3 and then forwards every packet from CID 1 to the 
>Enclave-CID and every packet that arrives on to CID 3 to CID 2.

This, though, requires writing the virtio-vsock emulation in QEMU 
completely from scratch. If you have time that would be great; otherwise, 
if you want to do a PoC, my advice is to start with vhost-user-vsock, 
which is already there.

Thanks,
Stefano

>  2) Create a machine option for -M nitro-enclave that automatically 
>spawns the vhost-vsock-forward object. (default: off)
>
>
>The above may need some fiddling with object creation times to ensure 
>that the forward object gets CID 3, not the Enclave as auto-assigned 
>CID.
>
>
>Thanks,
>
>Alex
>
>
>
>
>Amazon Web Services Development Center Germany GmbH
>Krausenstr. 38
>10117 Berlin
>Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
>Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
>Sitz: Berlin
>Ust-ID: DE 365 538 597


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-05-29 10:55                           ` Stefano Garzarella
@ 2024-06-25 17:44                             ` Dorjoy Chowdhury
  2024-06-26  8:37                               ` Stefano Garzarella
  0 siblings, 1 reply; 23+ messages in thread
From: Dorjoy Chowdhury @ 2024-06-25 17:44 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Alexander Graf, Paolo Bonzini, Alexander Graf, virtualization,
	kvm, netdev, stefanha

Hey Stefano,

On Wed, May 29, 2024 at 4:56 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, May 29, 2024 at 12:43:57PM GMT, Alexander Graf wrote:
> >
> >On 29.05.24 10:04, Stefano Garzarella wrote:
> >>
> >>On Tue, May 28, 2024 at 06:38:24PM GMT, Paolo Bonzini wrote:
> >>>On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella
> >>><sgarzare@redhat.com> wrote:
> >>>>
> >>>>On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
> >>>>>On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella
> >>>><sgarzare@redhat.com> wrote:
> >>>>>> >I think it's either that or implementing virtio-vsock in userspace
> >>>>>> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
> >>>>>> >search for "To connect host<->guest").
> >>>>>>
> >>>>>> For in this case AF_VSOCK can't be used in the host, right?
> >>>>>> So it's similar to vhost-user-vsock.
> >>>>>
> >>>>>Not sure if I understand but in this case QEMU knows which CIDs are
> >>>>>forwarded to the host (either listen on vsock and connect to the host,
> >>>>>or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
> >>>>>involved.
> >>>>
> >>>>I meant that the application in the host that wants to connect to the
> >>>>guest cannot use AF_VSOCK in the host, but must use the one where QEMU
> >>>>is listening (e.g. AF_INET, AF_UNIX), right?
> >>>>
> >>>>I think one of Alex's requirements was that the application in the host
> >>>>continue to use AF_VSOCK as in their environment.
> >>>
> >>>Can the host use VMADDR_CID_LOCAL for host-to-host communication?
> >>
> >>Yep!
> >>
> >>>If
> >>>so, the proposed "-object vsock-forward" syntax can connect to it and
> >>>it should work as long as the application on the host does not assume
> >>>that it is on CID 3.
> >>
> >>Right, good point!
> >>We can also support something similar in vhost-user-vsock, where instead
> >>of using AF_UNIX and firecracker's hybrid vsock, we can redirect
> >>everything to VMADDR_CID_LOCAL.
> >>
> >>Alex what do you think? That would simplify things a lot to do.
> >>The only difference is that the application in the host has to talk to
> >>VMADDR_CID_LOCAL (1).
> >
> >
> >The application in the host would see an incoming connection from CID
> >1 (which is probably fine) and would still be able to establish
> >outgoing connections to the actual VM's CID as long as the Enclave
> >doesn't check for the peer CID (I haven't seen anyone check yet). So
> >yes, indeed, this should work.
> >
> >The only case where I can see it breaking is when you run multiple
> >Enclave VMs in parallel. In that case, each would try to listen to CID
> >3 and the second that does would fail. But it's a well solvable
> >problem: We could (in addition to the simple in-QEMU case) build an
> >external daemon that does the proxying and hence owns CID3.
>
> Well, we can modify vhost-user-vsock for that. It's already a daemon,
> already supports different VMs per single daemon but as of now they have
> to have different CIDs.
>
> >
> >So the immediate plan would be to:
> >
> >  1) Build a new vhost-vsock-forward object model that connects to
> >vhost as CID 3 and then forwards every packet from CID 1 to the
> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
>
> This though requires writing completely from scratch the virtio-vsock
> emulation in QEMU. If you have time that would be great, otherwise if
> you want to do a PoC, my advice is to start with vhost-user-vsock which
> is already there.
>

Can you give me some more details about how I can implement the
daemon? I would appreciate some pointers to code too.

Right now, the "nitro-enclave" machine type (wip) in QEMU
automatically spawns a VHOST_VSOCK device with the CID equal to the
"guest-cid" machine option. I think this is equivalent to using the
"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
need any change? I guess instead of "vhost-vsock-device", the
vhost-vsock device needs to be equivalent to "-device
vhost-user-vsock-device,guest-cid=N"?

The applications inside the nitro-enclave VM will still connect and
talk to CID 3. So on the daemon side, do we need to spawn a device
that has CID 3 and then forward everything this device receives to CID
1 (VMADDR_CID_LOCAL) on the same port, and everything it receives from
CID 1 to the "guest-cid"? The applications that will be running in the
host need to be changed so that instead of connecting to the
"guest-cid" of the nitro-enclave VM, they will instead connect to
VMADDR_CID_LOCAL. Is my understanding correct?

BTW is there anything related to the "VMADDR_FLAG_TO_HOST" flag that
needs to be checked? I remember some discussion about it.

It would be great if you could give me some details about how I can
achieve the CID 3 <-> CID 2 communication using the vhost-user-vsock.
Is this https://github.com/stefano-garzarella/vhost-user-vsock where I
would need to add support for forwarding everything to
VMADDR_CID_LOCAL via an option maybe?

Thanks and Regards,
Dorjoy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-06-25 17:44                             ` Dorjoy Chowdhury
@ 2024-06-26  8:37                               ` Stefano Garzarella
  2024-06-26 17:43                                 ` Dorjoy Chowdhury
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-06-26  8:37 UTC (permalink / raw)
  To: Dorjoy Chowdhury
  Cc: Alexander Graf, Paolo Bonzini, Alexander Graf, virtualization,
	kvm, netdev, stefanha

Hi Dorjoy,

On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
>Hey Stefano,

[...]

>> >
>> >So the immediate plan would be to:
>> >
>> >  1) Build a new vhost-vsock-forward object model that connects to
>> >vhost as CID 3 and then forwards every packet from CID 1 to the
>> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
>>
>> This though requires writing completely from scratch the virtio-vsock
>> emulation in QEMU. If you have time that would be great, otherwise if
>> you want to do a PoC, my advice is to start with vhost-user-vsock which
>> is already there.
>>
>
>Can you give me some more details about how I can implement the
>daemon? 

We already have a daemon written in Rust, so I don't recommend you 
rewrite one from scratch; just start with that. You can find the daemon 
and instructions on how to use it with QEMU here [1].

>I would appreciate some pointers to code too.

I sent the pointer to it in my first reply [2].

>
>Right now, the "nitro-enclave" machine type (wip) in QEMU
>automatically spawns a VHOST_VSOCK device with the CID equal to the
>"guest-cid" machine option. I think this is equivalent to using the
>"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
>need any change? I guess instead of "vhost-vsock-device", the
>vhost-vsock device needs to be equivalent to "-device
>vhost-user-vsock-device,guest-cid=N"?

Nope, the vhost-user-vsock device requires just a `chardev` option.
The chardev points to the Unix socket used by QEMU to talk with the 
daemon. The daemon has a parameter to set the CID. See [1] for the 
examples.

>
>The applications inside the nitro-enclave VM will still connect and
>talk to CID 3. So on the daemon side, do we need to spawn a device
>that has CID 3 and then forward everything this device receives to CID
>1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
>to the "guest-cid"? 

Yep, I think this is right.
Note: to use VMADDR_CID_LOCAL, the host needs to load the 
`vsock_loopback` kernel module.

Before modifying the code, if you want to do some testing, perhaps you 
can use socat (which supports both UNIX-* and VSOCK-*). For now the 
daemon exposes two unix sockets: one is used to communicate with QEMU via 
the vhost-user protocol, and the other is to be used by the application 
to communicate with vsock sockets in the guest using the hybrid protocol 
defined by firecracker. So you could run a socat between the latter 
and VMADDR_CID_LOCAL; the only problem I see is that you have to send 
the first string required by the hybrid protocol (CONNECT 1234), but for 
a PoC it should be ok.

I just tried the following and it works without touching any code:

shell1$ ./target/debug/vhost-device-vsock \
     --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock

shell2$ sudo modprobe vsock_loopback
shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock

shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
     -drive file=fedora40.qcow2,format=qcow2,if=virtio\
     -chardev socket,id=char0,path=/tmp/vhost3.socket \
     -device vhost-user-vsock-pci,chardev=char0 \
     -object memory-backend-memfd,id=mem,size=512M \
     -nographic

     guest$ nc --vsock -l 1234

shell4$ nc --vsock 1 1234
CONNECT 1234

     Note: the `CONNECT 1234` is required by the hybrid vsock protocol 
     defined by firecracker, so if we extend the vhost-device-vsock 
     daemon to forward packets to VMADDR_CID_LOCAL, that would not be 
     needed (including running socat).


This is just an example of how to use loopback; if from the VM you 
want to connect to a CID other than 2, then we have to modify the daemon 
to do that.

>The applications that will be running in the host
>need to be changed so that instead of connecting to the "guest-cid" of
>the nitro-enclave VM, they will instead connect to VMADDR_CID_LOCAL.
>Is my understanding correct?

Yep.

>
>BTW is there anything related to the "VMADDR_FLAG_TO_HOST" flag that
>needs to be checked? I remember some discussion about it.

No, that flag is handled by the driver. If that flag is on, the driver 
forwards the packet to the host, regardless of the destination CID. So 
it has to be set by the application in the guest, but it should already 
do that since that flag was introduced just for Nitro enclaves.
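
Just for reference, if a guest application did have to set it 
explicitly, it is only one extra field in struct sockaddr_vm. Untested 
sketch, assuming a kernel that already defines svm_flags and 
VMADDR_FLAG_TO_HOST (headers are the usual <sys/socket.h> and 
<linux/vm_sockets.h>):

     /* Guest-side sketch: connect to the "parent" CID 3 with the
      * to-host flag set, so the driver forwards the packets to the
      * host even though the destination CID is not 2. */
     struct sockaddr_vm addr = {
             .svm_family = AF_VSOCK,
             .svm_cid    = 3,                   /* parent CID the enclave expects */
             .svm_port   = 9000,                /* e.g. the heartbeat port */
             .svm_flags  = VMADDR_FLAG_TO_HOST,
     };
     int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

     if (fd >= 0 && connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
             /* ... talk to the parent ... */
     }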

>
>It would be great if you could give me some details about how I can
>achieve the CID 3 <-> CID 2 communication using the vhost-user-vsock.

CID 3 <-> CID 2 is the standard use case, right?
The readme in [1] contains several examples, let me know if you need 
more details ;-)

>Is this https://github.com/stefano-garzarella/vhost-user-vsock where I
>would need to add support for forwarding everything to
>VMADDR_CID_LOCAL via an option maybe?

Nope, that one was a PoC and the repo is archived; the daemon to use is [1].
BTW, I agree on the option for the forwarding.

Thanks,
Stefano

[1] 
https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
[2] 
https://lore.kernel.org/virtualization/CAFfO_h5_uAwdNJB=fjrxb_pPiwRDQxaZn=OvR3yrYd+c18tUdQ@mail.gmail.com/T/#m4a50f94a5329cd262412437ac80a4f406404bf20


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-06-26  8:37                               ` Stefano Garzarella
@ 2024-06-26 17:43                                 ` Dorjoy Chowdhury
  2024-06-30 10:54                                   ` Dorjoy Chowdhury
  2024-07-02 11:58                                   ` Stefano Garzarella
  0 siblings, 2 replies; 23+ messages in thread
From: Dorjoy Chowdhury @ 2024-06-26 17:43 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Alexander Graf, Paolo Bonzini, Alexander Graf, virtualization,
	kvm, netdev, stefanha

Hey Stefano,
Thanks a lot for all the details. I will look into them and reach out
if I need further input. Thanks! I have tried to summarize my
understanding below. Let me know if that sounds correct.

On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi Dorjoy,
>
> On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
> >Hey Stefano,
>
> [...]
>
> >> >
> >> >So the immediate plan would be to:
> >> >
> >> >  1) Build a new vhost-vsock-forward object model that connects to
> >> >vhost as CID 3 and then forwards every packet from CID 1 to the
> >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
> >>
> >> This though requires writing completely from scratch the virtio-vsock
> >> emulation in QEMU. If you have time that would be great, otherwise if
> >> you want to do a PoC, my advice is to start with vhost-user-vsock which
> >> is already there.
> >>
> >
> >Can you give me some more details about how I can implement the
> >daemon?
>
> We already have a demon written in Rust, so I don't recommend you
> rewrite one from scratch, just start with that. You can find the daemon
> and instructions on how to use it with QEMU here [1].
>
> >I would appreciate some pointers to code too.
>
> I sent the pointer to it in my first reply [2].
>
> >
> >Right now, the "nitro-enclave" machine type (wip) in QEMU
> >automatically spawns a VHOST_VSOCK device with the CID equal to the
> >"guest-cid" machine option. I think this is equivalent to using the
> >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
> >need any change? I guess instead of "vhost-vsock-device", the
> >vhost-vsock device needs to be equivalent to "-device
> >vhost-user-vsock-device,guest-cid=N"?
>
> Nope, the vhost-user-vsock device requires just a `chardev` option.
> The chardev points to the Unix socket used by QEMU to talk with the
> daemon. The daemon has a parameter to set the CID. See [1] for the
> examples.
>
> >
> >The applications inside the nitro-enclave VM will still connect and
> >talk to CID 3. So on the daemon side, do we need to spawn a device
> >that has CID 3 and then forward everything this device receives to CID
> >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
> >to the "guest-cid"?
>
> Yep, I think this is right.
> Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
> kernel module.
>
> Before modifying the code, if you want to do some testing, perhaps you
> can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
> now exposes two unix sockets, one is used to communicate with QEMU via
> the vhost-user protocol, and the other is to be used by the application
> to communicate with vsock sockets in the guest using the hybrid protocol
> defined by firecracker. So you could initiate a socat between the latter
> and VMADDR_CID_LOCAL, the only problem I see is that you have to send
> the first string provided by the hybrid protocol (CONNECT 1234), but for
> a PoC it should be ok.
>
> I just tried the following and it works without touching any code:
>
> shell1$ ./target/debug/vhost-device-vsock \
>      --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
>
> shell2$ sudo modprobe vsock_loopback
> shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
>
> shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>      -drive file=fedora40.qcow2,format=qcow2,if=virtio\
>      -chardev socket,id=char0,path=/tmp/vhost3.socket \
>      -device vhost-user-vsock-pci,chardev=char0 \
>      -object memory-backend-memfd,id=mem,size=512M \
>      -nographic
>
>      guest$ nc --vsock -l 1234
>
> shell4$ nc --vsock 1 1234
> CONNECT 1234
>
>      Note: the `CONNECT 1234` is required by the hybrid vsock protocol
>      defined by firecracker, so if we extend the vhost-device-vsock
>      daemon to forward packet to VMADDR_CID_LOCAL, that would not be
>      needed (including running socat).
>

Understood. Just trying to think out loud what the final UX will be
from the user perspective to successfully run a nitro VM before I try
to modify vhost-device-vsock to support forwarding to
VMADDR_CID_LOCAL.
I guess because the "vhost-user-vsock" device needs to be spawned
implicitly (without any explicit option) inside nitro-enclave in QEMU,
we now need to provide the "chardev" as a machine option, so the
nitro-enclave command would look something like below:
"./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
/path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
--enable-kvm -cpu host"
and then set the chardev id to the vhost-user-vsock device in the code
from the machine option.

The modified "vhost-device-vsock" would need to be run with the new
option that will forward everything to VMADDR_CID_LOCAL (below, "-z"
stands for the new option):
"./target/debug/vhost-device-vsock -z --vm
guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
This means the guest-cid of the nitro VM is CID 5, right?

And the applications in the host would need to use VMADDR_CID_LOCAL
for communication instead of "guest-cid" (5) (assuming vsock_loopback
is modprobed). Let's say there are 2 applications inside the nitro VM
that connect to CID 3 on port 9000 and 9001. And the applications on
the host listen on 9000 and 9001 using VMADDR_CID_LOCAL. So, after the
commands above (qemu VM and vhost-device-vsock) are run, the
communication between the applications in the host and the
applications in the nitro VM on ports 9000 and 9001 should just work,
right, without needing to run any extra socat commands or such? Or
will the user still need to run some socat commands for all the
relevant ports (e.g., 9000 and 9001)?

I am just wondering what kind of changes are needed in
vhost-device-vsock for forwarding packets to VMADDR_CID_LOCAL. Will
it be something like this: upon receiving a "connect" (from inside the
nitro VM) for any port, the codepath that handles "/tmp/vm5.vsock"
just connects to the same port over AF_VSOCK using the socket system
calls, and messages received on that port in "/tmp/vm5.vsock" are then
sent to the AF_VSOCK socket (roughly the sketch below)? Or am I not
thinking about it right, and the implementation would be something
different entirely (change the CID from 3 to 2 (or 1?) on the packets
before they are handled, in which case socat will probably still be
needed)? Will this work if the applications in the host want to
connect to applications inside the nitro VM (as opposed to
applications inside the nitro VM connecting to CID 3)?
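
To illustrate what I mean (purely hypothetical, in C rather than the
daemon's Rust, just to make my mental model explicit): once the daemon
has an AF_VSOCK socket connected to (VMADDR_CID_LOCAL, same port), I
imagine it relaying bytes between that socket and the guest-facing
connection, something like:

    /* Hypothetical relay loop, not actual vhost-device-vsock code:
     * guest_fd is the guest-facing connection handled by the daemon,
     * local_fd is an AF_VSOCK socket already connected to
     * (VMADDR_CID_LOCAL, same port). */
    #include <poll.h>
    #include <unistd.h>

    static void relay(int guest_fd, int local_fd)
    {
            struct pollfd pfd[2] = {
                    { .fd = guest_fd, .events = POLLIN },
                    { .fd = local_fd, .events = POLLIN },
            };
            char buf[4096];

            for (;;) {
                    if (poll(pfd, 2, -1) < 0)
                            return;
                    for (int i = 0; i < 2; i++) {
                            if (!(pfd[i].revents & (POLLIN | POLLHUP)))
                                    continue;
                            ssize_t n = read(pfd[i].fd, buf, sizeof(buf));
                            if (n <= 0)
                                    return;  /* peer closed or error */
                            /* short writes ignored for brevity */
                            write(pfd[1 - i].fd, buf, n);
                    }
            }
    }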

Thanks and Regards,
Dorjoy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-06-26 17:43                                 ` Dorjoy Chowdhury
@ 2024-06-30 10:54                                   ` Dorjoy Chowdhury
  2024-07-02 12:05                                     ` Stefano Garzarella
  2024-07-02 11:58                                   ` Stefano Garzarella
  1 sibling, 1 reply; 23+ messages in thread
From: Dorjoy Chowdhury @ 2024-06-30 10:54 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Alexander Graf, Paolo Bonzini, Alexander Graf, virtualization,
	kvm, netdev, stefanha

Hey Stefano,
Apart from my questions in my previous email, I have some others as well.

If the vhost-device-vsock modification to forward packets to
VMADDR_CID_LOCAL is implemented, does the VMADDR_FLAG_TO_HOST need to
be set by any application in the guest? I understand that the flag is
set automatically in the listen path by the driver (ref:
https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117
), but from the comments in the referenced patch, I am guessing the
applications in the guest that will "connect" (as opposed to listen)
will need to set the flag in the application code? So does the
VMADDR_FLAG_TO_HOST flag need to be set by the applications in the
guest that will "connect", or should it work without it? I am asking
because the nitro-enclave VMs have an "init" which, on boot, tries to
connect to CID 3 to send a "hello" (expecting a "hello" reply) to let
the parent VM know that it booted, but the init doesn't seem to set the
flag: https://github.com/aws/aws-nitro-enclaves-sdk-bootstrap/blob/main/init/init.c#L356C1-L361C7

I was following
https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock#sibling-vm-communication
to test if sibling communication works, and it seems like I didn't need
to modify socat to set VMADDR_FLAG_TO_HOST. I am wondering
why it works without any modification. Here is what I do:

shell1: ./vhost-device-vsock --vm
guest-cid=3,uds-path=/tmp/vm3.vsock,socket=/tmp/vhost3.socket --vm
guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket

shell2: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
-enable-kvm -m 8G -nic user,model=virtio -drive
file=/home/dorjoy/Forks/test_vm/fedora2.qcow2,media=disk,if=virtio
--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
socket,id=char0,reconnect=0,path=/tmp/vhost3.socket -device
vhost-user-vsock-pci,chardev=char0
    inside this guest I run: socat - VSOCK-LISTEN:9000

shell3: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
-enable-kvm -m 8G -nic user,model=virtio -drive
file=/home/dorjoy/Forks/test_vm/fedora40.qcow2,media=disk,if=virtio
--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
socket,id=char0,reconnect=0,path=/tmp/vhost4.socket -device
vhost-user-vsock-pci,chardev=char0
    inside this guest I run: socat - VSOCK-CONNECT:3:9000

Then when I type something in the socat terminal of one VM and hit
'enter', it shows up in the socat terminal of the other VM. From the
vhost-device-vsock documentation, I thought I would need to
patch socat to set "VMADDR_FLAG_TO_HOST", but I did not do anything
with socat. I simply did "sudo dnf install socat" in both VMs. I also
looked into the socat source code and I didn't see any reference to
"VMADDR_FLAG_TO_HOST". I am running Fedora 40 on both VMs. Do you
know why it works without the flag?

On Wed, Jun 26, 2024 at 11:43 PM Dorjoy Chowdhury
<dorjoychy111@gmail.com> wrote:
>
> Hey Stefano,
> Thanks a lot for all the details. I will look into them and reach out
> if I need further input. Thanks! I have tried to summarize my
> understanding below. Let me know if that sounds correct.
>
> On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > Hi Dorjoy,
> >
> > On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
> > >Hey Stefano,
> >
> > [...]
> >
> > >> >
> > >> >So the immediate plan would be to:
> > >> >
> > >> >  1) Build a new vhost-vsock-forward object model that connects to
> > >> >vhost as CID 3 and then forwards every packet from CID 1 to the
> > >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
> > >>
> > >> This though requires writing completely from scratch the virtio-vsock
> > >> emulation in QEMU. If you have time that would be great, otherwise if
> > >> you want to do a PoC, my advice is to start with vhost-user-vsock which
> > >> is already there.
> > >>
> > >
> > >Can you give me some more details about how I can implement the
> > >daemon?
> >
> > We already have a demon written in Rust, so I don't recommend you
> > rewrite one from scratch, just start with that. You can find the daemon
> > and instructions on how to use it with QEMU here [1].
> >
> > >I would appreciate some pointers to code too.
> >
> > I sent the pointer to it in my first reply [2].
> >
> > >
> > >Right now, the "nitro-enclave" machine type (wip) in QEMU
> > >automatically spawns a VHOST_VSOCK device with the CID equal to the
> > >"guest-cid" machine option. I think this is equivalent to using the
> > >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
> > >need any change? I guess instead of "vhost-vsock-device", the
> > >vhost-vsock device needs to be equivalent to "-device
> > >vhost-user-vsock-device,guest-cid=N"?
> >
> > Nope, the vhost-user-vsock device requires just a `chardev` option.
> > The chardev points to the Unix socket used by QEMU to talk with the
> > daemon. The daemon has a parameter to set the CID. See [1] for the
> > examples.
> >
> > >
> > >The applications inside the nitro-enclave VM will still connect and
> > >talk to CID 3. So on the daemon side, do we need to spawn a device
> > >that has CID 3 and then forward everything this device receives to CID
> > >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
> > >to the "guest-cid"?
> >
> > Yep, I think this is right.
> > Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
> > kernel module.
> >
> > Before modifying the code, if you want to do some testing, perhaps you
> > can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
> > now exposes two unix sockets, one is used to communicate with QEMU via
> > the vhost-user protocol, and the other is to be used by the application
> > to communicate with vsock sockets in the guest using the hybrid protocol
> > defined by firecracker. So you could initiate a socat between the latter
> > and VMADDR_CID_LOCAL, the only problem I see is that you have to send
> > the first string provided by the hybrid protocol (CONNECT 1234), but for
> > a PoC it should be ok.
> >
> > I just tried the following and it works without touching any code:
> >
> > shell1$ ./target/debug/vhost-device-vsock \
> >      --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
> >
> > shell2$ sudo modprobe vsock_loopback
> > shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
> >
> > shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
> >      -drive file=fedora40.qcow2,format=qcow2,if=virtio\
> >      -chardev socket,id=char0,path=/tmp/vhost3.socket \
> >      -device vhost-user-vsock-pci,chardev=char0 \
> >      -object memory-backend-memfd,id=mem,size=512M \
> >      -nographic
> >
> >      guest$ nc --vsock -l 1234
> >
> > shell4$ nc --vsock 1 1234
> > CONNECT 1234
> >
> >      Note: the `CONNECT 1234` is required by the hybrid vsock protocol
> >      defined by firecracker, so if we extend the vhost-device-vsock
> >      daemon to forward packet to VMADDR_CID_LOCAL, that would not be
> >      needed (including running socat).
> >
>
> Understood. Just trying to think out loud what the final UX will be
> from the user perspective to successfully run a nitro VM before I try
> to modify vhost-device-vsock to support forwarding to
> VMADDR_CID_LOCAL.
> I guess because the "vhost-user-vsock" device needs to be spawned
> implicitly (without any explicit option) inside nitro-enclave in QEMU,
> we now need to provide the "chardev" as a machine option, so the
> nitro-enclave command would look something like below:
> "./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
> /path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
> --enable-kvm -cpu host"
> and then set the chardev id to the vhost-user-vsock device in the code
> from the machine option.
>
> The modified "vhost-device-vsock" would need to be run with the new
> option that will forward everything to VMADDR_CID_LOCAL (below by the
> "-z" I mean the new option)
> "./target/debug/vhost-device-vsock -z --vm
> guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
> this means the guest-cid of the nitro VM is CID 5, right?
>
> And the applications in the host would need to use VMADDR_CID_LOCAL
> for communication instead of "guest-cid" (5) (assuming vsock_loopback
> is modprobed). Let's say there are 2 applications inside the nitro VM
> that connect to CID 3 on port 9000 and 9001. And the applications on
> the host listen on 9000 and 9001 using VMADDR_CID_LOCAL. So, after the
> commands above (qemu VM and vhost-device-vsock) are run, the
> communication between the applications in the host and the
> applications in the nitro VM on port 9000 and 9001 should just work,
> right, without needing to run any extra socat commands or such? or
> will the user still need to run some socat commands for all the
> relevant ports (e.g.,9000 and 9001)?
>
> I am just wondering what kind of changes are needed in
> vhost-device-vsock for forwarding packets to VMADDR_CID_LOCAL? Will
> that be something like this: the codepath that handles
> "/tmp/vm5.vsock", upon receiving a "connect" (from inside the nitro
> VM) for any port to "/tmp/vm5.vsock", vhost-device-vsock will just
> connect to the same port using AF_VSOCK using the socket system calls
> and messages received on that port in "/tmp/vm5.vsock" will be "send"
> to the AF_VSOCK socket? or am I not thinking right and the
> implementation would be something different entirely (change the CID
> from 3 to 2 (or 1?) on the packets before they are handled then socat
> will be needed probably)? Will this work if the applications in the
> host want to connect to applications inside the nitro VM (as opposed
> to applications inside the nitro VM connecting to CID 3)?
>
> Thanks and Regards,
> Dorjoy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-06-26 17:43                                 ` Dorjoy Chowdhury
  2024-06-30 10:54                                   ` Dorjoy Chowdhury
@ 2024-07-02 11:58                                   ` Stefano Garzarella
  1 sibling, 0 replies; 23+ messages in thread
From: Stefano Garzarella @ 2024-07-02 11:58 UTC (permalink / raw)
  To: Dorjoy Chowdhury
  Cc: Alexander Graf, Paolo Bonzini, Alexander Graf, virtualization,
	kvm, netdev, stefanha

On Wed, Jun 26, 2024 at 11:43:25PM GMT, Dorjoy Chowdhury wrote:
>Hey Stefano,
>Thanks a lot for all the details. I will look into them and reach out
>if I need further input. Thanks! I have tried to summarize my
>understanding below. Let me know if that sounds correct.
>
>On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> Hi Dorjoy,
>>
>> On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
>> >Hey Stefano,
>>
>> [...]
>>
>> >> >
>> >> >So the immediate plan would be to:
>> >> >
>> >> >  1) Build a new vhost-vsock-forward object model that connects to
>> >> >vhost as CID 3 and then forwards every packet from CID 1 to the
>> >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
>> >>
>> >> This though requires writing completely from scratch the virtio-vsock
>> >> emulation in QEMU. If you have time that would be great, otherwise if
>> >> you want to do a PoC, my advice is to start with vhost-user-vsock which
>> >> is already there.
>> >>
>> >
>> >Can you give me some more details about how I can implement the
>> >daemon?
>>
>> We already have a demon written in Rust, so I don't recommend you
>> rewrite one from scratch, just start with that. You can find the daemon
>> and instructions on how to use it with QEMU here [1].
>>
>> >I would appreciate some pointers to code too.
>>
>> I sent the pointer to it in my first reply [2].
>>
>> >
>> >Right now, the "nitro-enclave" machine type (wip) in QEMU
>> >automatically spawns a VHOST_VSOCK device with the CID equal to the
>> >"guest-cid" machine option. I think this is equivalent to using the
>> >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
>> >need any change? I guess instead of "vhost-vsock-device", the
>> >vhost-vsock device needs to be equivalent to "-device
>> >vhost-user-vsock-device,guest-cid=N"?
>>
>> Nope, the vhost-user-vsock device requires just a `chardev` option.
>> The chardev points to the Unix socket used by QEMU to talk with the
>> daemon. The daemon has a parameter to set the CID. See [1] for the
>> examples.
>>
>> >
>> >The applications inside the nitro-enclave VM will still connect and
>> >talk to CID 3. So on the daemon side, do we need to spawn a device
>> >that has CID 3 and then forward everything this device receives to CID
>> >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
>> >to the "guest-cid"?
>>
>> Yep, I think this is right.
>> Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
>> kernel module.
>>
>> Before modifying the code, if you want to do some testing, perhaps you
>> can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
>> now exposes two unix sockets, one is used to communicate with QEMU via
>> the vhost-user protocol, and the other is to be used by the application
>> to communicate with vsock sockets in the guest using the hybrid protocol
>> defined by firecracker. So you could initiate a socat between the latter
>> and VMADDR_CID_LOCAL, the only problem I see is that you have to send
>> the first string provided by the hybrid protocol (CONNECT 1234), but for
>> a PoC it should be ok.
>>
>> I just tried the following and it works without touching any code:
>>
>> shell1$ ./target/debug/vhost-device-vsock \
>>      --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
>>
>> shell2$ sudo modprobe vsock_loopback
>> shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
>>
>> shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>>      -drive file=fedora40.qcow2,format=qcow2,if=virtio\
>>      -chardev socket,id=char0,path=/tmp/vhost3.socket \
>>      -device vhost-user-vsock-pci,chardev=char0 \
>>      -object memory-backend-memfd,id=mem,size=512M \
>>      -nographic
>>
>>      guest$ nc --vsock -l 1234
>>
>> shell4$ nc --vsock 1 1234
>> CONNECT 1234
>>
>>      Note: the `CONNECT 1234` is required by the hybrid vsock protocol
>>      defined by firecracker, so if we extend the vhost-device-vsock
>>      daemon to forward packet to VMADDR_CID_LOCAL, that would not be
>>      needed (including running socat).
>>
>
>Understood. Just trying to think out loud what the final UX will be
>from the user perspective to successfully run a nitro VM before I try
>to modify vhost-device-vsock to support forwarding to
>VMADDR_CID_LOCAL.
>I guess because the "vhost-user-vsock" device needs to be spawned
>implicitly (without any explicit option) inside nitro-enclave in QEMU,
>we now need to provide the "chardev" as a machine option, so the
>nitro-enclave command would look something like below:
>"./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
>/path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
>--enable-kvm -cpu host"
>and then set the chardev id to the vhost-user-vsock device in the code
>from the machine option.

Yep, that looks like a reasonable approach. Maybe we can have something like:
     -M nitro-enclave,vhost-user-vsock=char0

>
>The modified "vhost-device-vsock" would need to be run with the new
>option that will forward everything to VMADDR_CID_LOCAL (below by the
>"-z" I mean the new option)
>"./target/debug/vhost-device-vsock -z --vm

IMHO the new option should be part of the --vm group (please avoid short
options) and should be mutually exclusive with `uds-path`. We may also
need a parameter for the CID to forward to.
Something like this:
     --vm guest-cid=5,forward-cid=1,socket=/tmp/vhost5.socket

>guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
>this means the guest-cid of the nitro VM is CID 5, right?

Yep.

>
>And the applications in the host would need to use VMADDR_CID_LOCAL
>for communication instead of "guest-cid" (5) (assuming vsock_loopback
>is modprobed). Let's say there are 2 applications inside the nitro VM
>that connect to CID 3 on port 9000 and 9001. And the applications on
>the host listen on 9000 and 9001 using VMADDR_CID_LOCAL. So, after the
>commands above (qemu VM and vhost-device-vsock) are run, the
>communication between the applications in the host and the
>applications in the nitro VM on port 9000 and 9001 should just work,
>right, without needing to run any extra socat commands or such? 

Right.

>or
>will the user still need to run some socat commands for all the
>relevant ports (e.g.,9000 and 9001)?

Nope, the "socat" work should be done by the vhost-device-vsock daemon.
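
Just to make this concrete: each host application only needs a plain
AF_VSOCK listener bound to VMADDR_CID_LOCAL, nothing daemon-specific.
A minimal sketch (untested, written from memory; port 9000 taken from
your example):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
        struct sockaddr_vm addr;
        int fd, conn;

        fd = socket(AF_VSOCK, SOCK_STREAM, 0);
        if (fd < 0) {
                perror("socket");
                return 1;
        }

        memset(&addr, 0, sizeof(addr));
        addr.svm_family = AF_VSOCK;
        addr.svm_cid = VMADDR_CID_LOCAL;  /* requires vsock_loopback on the host */
        addr.svm_port = 9000;             /* port the enclave app connects to on CID 3 */

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(fd, 1) < 0) {
                perror("bind/listen");
                return 1;
        }

        /* connection forwarded by the modified vhost-device-vsock */
        conn = accept(fd, NULL, NULL);
        if (conn < 0) {
                perror("accept");
                return 1;
        }

        /* ... read()/write() on conn as with any stream socket ... */
        close(conn);
        close(fd);
        return 0;
}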

>
>I am just wondering what kind of changes are needed in
>vhost-device-vsock for forwarding packets to VMADDR_CID_LOCAL? Will
>that be something like this: the codepath that handles
>"/tmp/vm5.vsock", upon receiving a "connect" (from inside the nitro
>VM) for any port to "/tmp/vm5.vsock", vhost-device-vsock will just
>connect to the same port using AF_VSOCK using the socket system calls
>and messages received on that port in "/tmp/vm5.vsock" will be "send"
>to the AF_VSOCK socket? or am I not thinking right and the
>implementation would be something different entirely (change the CID
>from 3 to 2 (or 1?) on the packets before they are handled then socat
>will be needed probably)? 

I think you're right.
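
The daemon is Rust, so the real change would go through its existing
connection handling, but the host-side leg is basically just an AF_VSOCK
connect to VMADDR_CID_LOCAL on the same port. A rough sketch of that leg
(illustration only, not the daemon's actual code):

/*
 * When the guest connects to CID 3 on `port`, open a local AF_VSOCK
 * connection to the same port; afterwards the daemon relays bytes
 * between the guest connection and the returned fd.
 */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

static int forward_to_local(unsigned int port)
{
        struct sockaddr_vm addr;
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;

        memset(&addr, 0, sizeof(addr));
        addr.svm_family = AF_VSOCK;
        addr.svm_cid = VMADDR_CID_LOCAL;  /* host application listens here */
        addr.svm_port = port;             /* same port the guest asked for */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                close(fd);
                return -1;
        }

        return fd;
}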

>Will this work if the applications in the
>host want to connect to applications inside the nitro VM (as opposed
>to applications inside the nitro VM connecting to CID 3)?

Nope, not out of the box. But if you know in advance which ports to
expose, you can add another parameter for them, so the daemon can listen
on the host side (e.g. cid=1, port=1234) and forward those connections
to the guest.

Stefano


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-06-30 10:54                                   ` Dorjoy Chowdhury
@ 2024-07-02 12:05                                     ` Stefano Garzarella
  2024-07-02 14:26                                       ` Dorjoy Chowdhury
  0 siblings, 1 reply; 23+ messages in thread
From: Stefano Garzarella @ 2024-07-02 12:05 UTC (permalink / raw)
  To: Dorjoy Chowdhury
  Cc: Alexander Graf, Paolo Bonzini, Alexander Graf, virtualization,
	kvm, netdev, stefanha

On Sun, Jun 30, 2024 at 04:54:18PM GMT, Dorjoy Chowdhury wrote:
>Hey Stefano,
>Apart from my questions in my previous email, I have some others as well.
>
>If the vhost-device-vsock modification to forward packets to
>VMADDR_CID_LOCAL is implemented, does the VMADDR_FLAG_TO_HOST need to
>be set by any application in the guest? I understand that the flag is
>set automatically in the listen path by the driver (ref:
>https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117
>), but from the comments in the referenced patch, I am guessing the
>applications in the guest that will "connect" (as opposed to listen)
>will need to set the flag in the application code? So does the
>VMADDR_FLAG_TO_HOST flag need to be set by the applications in the
>guest that will "connect" or should it work without it? I am asking
>because the nitro-enclave VMs have an "init" which tries to connect to
>CID 3 to send a "hello" on boot to let the parent VM know that it
>booted expecting a "hello" reply but the init doesn't seem to set the
>flag https://github.com/aws/aws-nitro-enclaves-sdk-bootstrap/blob/main/init/init.c#L356C1-L361C7

Looking at the af_vsock.c code, it looks like if we don't have any
H2G transport (e.g. vhost-vsock) loaded in the VM (it is only loaded for
nested VMs, so I guess this should not be the case for a nitro-enclave
VM), the packets are forwarded to the host in any case.

See 
https://elixir.bootlin.com/linux/latest/source/net/vmw_vsock/af_vsock.c#L469
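
Roughly, for a connecting stream socket the choice boils down to
something like this (a simplified model I'm writing from memory, so
please double-check against vsock_assign_transport() at the link above):

#include <stdbool.h>
#include <stdint.h>

#define VMADDR_CID_LOCAL    1U
#define VMADDR_CID_HOST     2U
#define VMADDR_FLAG_TO_HOST 0x01

enum transport { TRANSPORT_LOCAL, TRANSPORT_G2H, TRANSPORT_H2G };

/* h2g_loaded: true only when vhost-vsock is loaded inside the VM (nested VMs) */
static enum transport pick_transport(uint32_t remote_cid, uint8_t remote_flags,
                                     bool h2g_loaded)
{
        if (remote_cid == VMADDR_CID_LOCAL)
                return TRANSPORT_LOCAL;           /* vsock_loopback */
        if (remote_cid <= VMADDR_CID_HOST || !h2g_loaded ||
            (remote_flags & VMADDR_FLAG_TO_HOST))
                return TRANSPORT_G2H;             /* forwarded to the host */
        return TRANSPORT_H2G;                     /* handed to the nested guest */
}

So with no vhost-vsock loaded inside the nitro-enclave VM, the connect
to CID 3 ends up on the G2H transport, i.e. it reaches the host, even
without VMADDR_FLAG_TO_HOST being set.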

>.
>
>I was following
>https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock#sibling-vm-communication
>to test if sibling communication works and it seems like I didn't need
>to modify the "socat" to set the "VMADDR_FLAG_TO_HOST". I am wondering
>why it works without any modification. Here is what I do:
>
>shell1: ./vhost-device-vsock --vm
>guest-cid=3,uds-path=/tmp/vm3.vsock,socket=/tmp/vhost3.socket --vm
>guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket
>
>shell2: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
>-enable-kvm -m 8G -nic user,model=virtio -drive
>file=/home/dorjoy/Forks/test_vm/fedora2.qcow2,media=disk,if=virtio
>--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
>socket,id=char0,reconnect=0,path=/tmp/vhost3.socket -device
>vhost-user-vsock-pci,chardev=char0
>    inside this guest I run: socat - VSOCK-LISTEN:9000
>
>shell3: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
>-enable-kvm -m 8G -nic user,model=virtio -drive
>file=/home/dorjoy/Forks/test_vm/fedora40.qcow2,media=disk,if=virtio
>--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
>socket,id=char0,reconnect=0,path=/tmp/vhost4.socket -device
>vhost-user-vsock-pci,chardev=char0
>    inside this guest I run: socat - VSOCK-CONNECT:3:9000
>
>Then when I type something in the socat terminal of one VM and hit
>'enter', they pop up in the socat terminal of the other VM. From the
>documentation of the vhost-device-vsock, I thought I would need to
>patch socat to set the "VMADDR_FLAG_TO_HOST" but I did not do anything
>with socat. I simply did "sudo dnf install socat" in both VMs. I also
>looked into the socat source code and I didn't see any reference to
>"VMADDR_FLAG_TO_HOST". I am running "Fedora 40" on both VMs. Do you
>know why it works without the flag?

Yep, the driver will forward them to the host if the H2G transport is
not loaded, as in your case. So if you set VMADDR_FLAG_TO_HOST you are
sure the connection is always forwarded to the host; if you don't set
it, it is forwarded only when there is no nested VM using vhost-vsock.
In that case the driver cannot tell whether you want to talk to a
nested guest or to a sibling guest, which is why we added the flag.
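
If you wanted to be explicit in the guest application anyway, it is just
a matter of setting svm_flags on the address passed to connect().
Untested sketch (needs kernel headers that already have svm_flags, i.e.
>= 5.10; the init.c you linked could do the same before connecting to
CID 3):

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

static int connect_to_parent(unsigned int port)
{
        struct sockaddr_vm addr;
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;

        memset(&addr, 0, sizeof(addr));
        addr.svm_family = AF_VSOCK;
        addr.svm_cid = 3;                       /* parent instance CID */
        addr.svm_port = port;
        addr.svm_flags = VMADDR_FLAG_TO_HOST;   /* always hand this connection to the host */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                close(fd);
                return -1;
        }

        return fd;
}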

If the host uses vhost-vsock, those packets are discarded, but
vhost-device-vsock handles them.

Hope this clarifies things.

Stefano

>
>On Wed, Jun 26, 2024 at 11:43 PM Dorjoy Chowdhury
><dorjoychy111@gmail.com> wrote:
>>
>> Hey Stefano,
>> Thanks a lot for all the details. I will look into them and reach out
>> if I need further input. Thanks! I have tried to summarize my
>> understanding below. Let me know if that sounds correct.
>>
>> On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >
>> > Hi Dorjoy,
>> >
>> > On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
>> > >Hey Stefano,
>> >
>> > [...]
>> >
>> > >> >
>> > >> >So the immediate plan would be to:
>> > >> >
>> > >> >  1) Build a new vhost-vsock-forward object model that connects to
>> > >> >vhost as CID 3 and then forwards every packet from CID 1 to the
>> > >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
>> > >>
>> > >> This though requires writing completely from scratch the virtio-vsock
>> > >> emulation in QEMU. If you have time that would be great, otherwise if
>> > >> you want to do a PoC, my advice is to start with vhost-user-vsock which
>> > >> is already there.
>> > >>
>> > >
>> > >Can you give me some more details about how I can implement the
>> > >daemon?
>> >
>> > We already have a demon written in Rust, so I don't recommend you
>> > rewrite one from scratch, just start with that. You can find the daemon
>> > and instructions on how to use it with QEMU here [1].
>> >
>> > >I would appreciate some pointers to code too.
>> >
>> > I sent the pointer to it in my first reply [2].
>> >
>> > >
>> > >Right now, the "nitro-enclave" machine type (wip) in QEMU
>> > >automatically spawns a VHOST_VSOCK device with the CID equal to the
>> > >"guest-cid" machine option. I think this is equivalent to using the
>> > >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
>> > >need any change? I guess instead of "vhost-vsock-device", the
>> > >vhost-vsock device needs to be equivalent to "-device
>> > >vhost-user-vsock-device,guest-cid=N"?
>> >
>> > Nope, the vhost-user-vsock device requires just a `chardev` option.
>> > The chardev points to the Unix socket used by QEMU to talk with the
>> > daemon. The daemon has a parameter to set the CID. See [1] for the
>> > examples.
>> >
>> > >
>> > >The applications inside the nitro-enclave VM will still connect and
>> > >talk to CID 3. So on the daemon side, do we need to spawn a device
>> > >that has CID 3 and then forward everything this device receives to CID
>> > >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
>> > >to the "guest-cid"?
>> >
>> > Yep, I think this is right.
>> > Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
>> > kernel module.
>> >
>> > Before modifying the code, if you want to do some testing, perhaps you
>> > can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
>> > now exposes two unix sockets, one is used to communicate with QEMU via
>> > the vhost-user protocol, and the other is to be used by the application
>> > to communicate with vsock sockets in the guest using the hybrid protocol
>> > defined by firecracker. So you could initiate a socat between the latter
>> > and VMADDR_CID_LOCAL, the only problem I see is that you have to send
>> > the first string provided by the hybrid protocol (CONNECT 1234), but for
>> > a PoC it should be ok.
>> >
>> > I just tried the following and it works without touching any code:
>> >
>> > shell1$ ./target/debug/vhost-device-vsock \
>> >      --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
>> >
>> > shell2$ sudo modprobe vsock_loopback
>> > shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
>> >
>> > shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>> >      -drive file=fedora40.qcow2,format=qcow2,if=virtio\
>> >      -chardev socket,id=char0,path=/tmp/vhost3.socket \
>> >      -device vhost-user-vsock-pci,chardev=char0 \
>> >      -object memory-backend-memfd,id=mem,size=512M \
>> >      -nographic
>> >
>> >      guest$ nc --vsock -l 1234
>> >
>> > shell4$ nc --vsock 1 1234
>> > CONNECT 1234
>> >
>> >      Note: the `CONNECT 1234` is required by the hybrid vsock protocol
>> >      defined by firecracker, so if we extend the vhost-device-vsock
>> >      daemon to forward packet to VMADDR_CID_LOCAL, that would not be
>> >      needed (including running socat).
>> >
>>
>> Understood. Just trying to think out loud what the final UX will be
>> from the user perspective to successfully run a nitro VM before I try
>> to modify vhost-device-vsock to support forwarding to
>> VMADDR_CID_LOCAL.
>> I guess because the "vhost-user-vsock" device needs to be spawned
>> implicitly (without any explicit option) inside nitro-enclave in QEMU,
>> we now need to provide the "chardev" as a machine option, so the
>> nitro-enclave command would look something like below:
>> "./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
>> /path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
>> --enable-kvm -cpu host"
>> and then set the chardev id to the vhost-user-vsock device in the code
>> from the machine option.
>>
>> The modified "vhost-device-vsock" would need to be run with the new
>> option that will forward everything to VMADDR_CID_LOCAL (below by the
>> "-z" I mean the new option)
>> "./target/debug/vhost-device-vsock -z --vm
>> guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
>> this means the guest-cid of the nitro VM is CID 5, right?
>>
>> And the applications in the host would need to use VMADDR_CID_LOCAL
>> for communication instead of "guest-cid" (5) (assuming vsock_loopback
>> is modprobed). Let's say there are 2 applications inside the nitro VM
>> that connect to CID 3 on port 9000 and 9001. And the applications on
>> the host listen on 9000 and 9001 using VMADDR_CID_LOCAL. So, after the
>> commands above (qemu VM and vhost-device-vsock) are run, the
>> communication between the applications in the host and the
>> applications in the nitro VM on port 9000 and 9001 should just work,
>> right, without needing to run any extra socat commands or such? or
>> will the user still need to run some socat commands for all the
>> relevant ports (e.g.,9000 and 9001)?
>>
>> I am just wondering what kind of changes are needed in
>> vhost-device-vsock for forwarding packets to VMADDR_CID_LOCAL? Will
>> that be something like this: the codepath that handles
>> "/tmp/vm5.vsock", upon receiving a "connect" (from inside the nitro
>> VM) for any port to "/tmp/vm5.vsock", vhost-device-vsock will just
>> connect to the same port using AF_VSOCK using the socket system calls
>> and messages received on that port in "/tmp/vm5.vsock" will be "send"
>> to the AF_VSOCK socket? or am I not thinking right and the
>> implementation would be something different entirely (change the CID
>> from 3 to 2 (or 1?) on the packets before they are handled then socat
>> will be needed probably)? Will this work if the applications in the
>> host want to connect to applications inside the nitro VM (as opposed
>> to applications inside the nitro VM connecting to CID 3)?
>>
>> Thanks and Regards,
>> Dorjoy
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to implement message forwarding from one CID to another in vhost driver
  2024-07-02 12:05                                     ` Stefano Garzarella
@ 2024-07-02 14:26                                       ` Dorjoy Chowdhury
  0 siblings, 0 replies; 23+ messages in thread
From: Dorjoy Chowdhury @ 2024-07-02 14:26 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Alexander Graf, Paolo Bonzini, Alexander Graf, virtualization,
	kvm, netdev, stefanha

Hey Stefano,
Thanks a lot for all the details. I guess my next step is to try to
implement the forwarding logic in vhost-device-vsock and take it from
there.

Regards,
Dorjoy

On Tue, Jul 2, 2024 at 6:05 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Sun, Jun 30, 2024 at 04:54:18PM GMT, Dorjoy Chowdhury wrote:
> >Hey Stefano,
> >Apart from my questions in my previous email, I have some others as well.
> >
> >If the vhost-device-vsock modification to forward packets to
> >VMADDR_CID_LOCAL is implemented, does the VMADDR_FLAG_TO_HOST need to
> >be set by any application in the guest? I understand that the flag is
> >set automatically in the listen path by the driver (ref:
> >https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117
> >), but from the comments in the referenced patch, I am guessing the
> >applications in the guest that will "connect" (as opposed to listen)
> >will need to set the flag in the application code? So does the
> >VMADDR_FLAG_TO_HOST flag need to be set by the applications in the
> >guest that will "connect" or should it work without it? I am asking
> >because the nitro-enclave VMs have an "init" which tries to connect to
> >CID 3 to send a "hello" on boot to let the parent VM know that it
> >booted expecting a "hello" reply but the init doesn't seem to set the
> >flag https://github.com/aws/aws-nitro-enclaves-sdk-bootstrap/blob/main/init/init.c#L356C1-L361C7
>
> Looking at af_vsock.c code, it looks like that if we don't have any
> H2G transports (e.g. vhost-vsock) loaded in the VM (this is loaded for
> nested VMs, so I guess for nitro-enclave VM this should not be the
> case), the packets are forwarded to the host in any case.
>
> See
> https://elixir.bootlin.com/linux/latest/source/net/vmw_vsock/af_vsock.c#L469
>
> >.
> >
> >I was following
> >https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock#sibling-vm-communication
> >to test if sibling communication works and it seems like I didn't need
> >to modify the "socat" to set the "VMADDR_FLAG_TO_HOST". I am wondering
> >why it works without any modification. Here is what I do:
> >
> >shell1: ./vhost-device-vsock --vm
> >guest-cid=3,uds-path=/tmp/vm3.vsock,socket=/tmp/vhost3.socket --vm
> >guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket
> >
> >shell2: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
> >-enable-kvm -m 8G -nic user,model=virtio -drive
> >file=/home/dorjoy/Forks/test_vm/fedora2.qcow2,media=disk,if=virtio
> >--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
> >socket,id=char0,reconnect=0,path=/tmp/vhost3.socket -device
> >vhost-user-vsock-pci,chardev=char0
> >    inside this guest I run: socat - VSOCK-LISTEN:9000
> >
> >shell3: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
> >-enable-kvm -m 8G -nic user,model=virtio -drive
> >file=/home/dorjoy/Forks/test_vm/fedora40.qcow2,media=disk,if=virtio
> >--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
> >socket,id=char0,reconnect=0,path=/tmp/vhost4.socket -device
> >vhost-user-vsock-pci,chardev=char0
> >    inside this guest I run: socat - VSOCK-CONNECT:3:9000
> >
> >Then when I type something in the socat terminal of one VM and hit
> >'enter', they pop up in the socat terminal of the other VM. From the
> >documentation of the vhost-device-vsock, I thought I would need to
> >patch socat to set the "VMADDR_FLAG_TO_HOST" but I did not do anything
> >with socat. I simply did "sudo dnf install socat" in both VMs. I also
> >looked into the socat source code and I didn't see any reference to
> >"VMADDR_FLAG_TO_HOST". I am running "Fedora 40" on both VMs. Do you
> >know why it works without the flag?
>
> Yep, so the driver will forward them if the H2G transport is not loaded,
> like in your case. So if you set VMADDR_FLAG_TO_HOST you are sure that
> it is always forwarded to the host, if you don't set it, it is forwarded
> only if you don't have a nested VM using vhost-vsock. In that case we
> don't know how to differentiate the case of communication with a nested
> guest or a sibling guest, for this reason we added the flag.
>
> If the host uses vhost-vsock, that packets are discarded, but for
> vhost-device-vsock, we are handling them.
>
> Hope this clarify.
>
> Stefano
>
> >
> >On Wed, Jun 26, 2024 at 11:43 PM Dorjoy Chowdhury
> ><dorjoychy111@gmail.com> wrote:
> >>
> >> Hey Stefano,
> >> Thanks a lot for all the details. I will look into them and reach out
> >> if I need further input. Thanks! I have tried to summarize my
> >> understanding below. Let me know if that sounds correct.
> >>
> >> On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> >
> >> > Hi Dorjoy,
> >> >
> >> > On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
> >> > >Hey Stefano,
> >> >
> >> > [...]
> >> >
> >> > >> >
> >> > >> >So the immediate plan would be to:
> >> > >> >
> >> > >> >  1) Build a new vhost-vsock-forward object model that connects to
> >> > >> >vhost as CID 3 and then forwards every packet from CID 1 to the
> >> > >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
> >> > >>
> >> > >> This though requires writing completely from scratch the virtio-vsock
> >> > >> emulation in QEMU. If you have time that would be great, otherwise if
> >> > >> you want to do a PoC, my advice is to start with vhost-user-vsock which
> >> > >> is already there.
> >> > >>
> >> > >
> >> > >Can you give me some more details about how I can implement the
> >> > >daemon?
> >> >
> >> > We already have a demon written in Rust, so I don't recommend you
> >> > rewrite one from scratch, just start with that. You can find the daemon
> >> > and instructions on how to use it with QEMU here [1].
> >> >
> >> > >I would appreciate some pointers to code too.
> >> >
> >> > I sent the pointer to it in my first reply [2].
> >> >
> >> > >
> >> > >Right now, the "nitro-enclave" machine type (wip) in QEMU
> >> > >automatically spawns a VHOST_VSOCK device with the CID equal to the
> >> > >"guest-cid" machine option. I think this is equivalent to using the
> >> > >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
> >> > >need any change? I guess instead of "vhost-vsock-device", the
> >> > >vhost-vsock device needs to be equivalent to "-device
> >> > >vhost-user-vsock-device,guest-cid=N"?
> >> >
> >> > Nope, the vhost-user-vsock device requires just a `chardev` option.
> >> > The chardev points to the Unix socket used by QEMU to talk with the
> >> > daemon. The daemon has a parameter to set the CID. See [1] for the
> >> > examples.
> >> >
> >> > >
> >> > >The applications inside the nitro-enclave VM will still connect and
> >> > >talk to CID 3. So on the daemon side, do we need to spawn a device
> >> > >that has CID 3 and then forward everything this device receives to CID
> >> > >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
> >> > >to the "guest-cid"?
> >> >
> >> > Yep, I think this is right.
> >> > Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
> >> > kernel module.
> >> >
> >> > Before modifying the code, if you want to do some testing, perhaps you
> >> > can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
> >> > now exposes two unix sockets, one is used to communicate with QEMU via
> >> > the vhost-user protocol, and the other is to be used by the application
> >> > to communicate with vsock sockets in the guest using the hybrid protocol
> >> > defined by firecracker. So you could initiate a socat between the latter
> >> > and VMADDR_CID_LOCAL, the only problem I see is that you have to send
> >> > the first string provided by the hybrid protocol (CONNECT 1234), but for
> >> > a PoC it should be ok.
> >> >
> >> > I just tried the following and it works without touching any code:
> >> >
> >> > shell1$ ./target/debug/vhost-device-vsock \
> >> >      --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
> >> >
> >> > shell2$ sudo modprobe vsock_loopback
> >> > shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
> >> >
> >> > shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
> >> >      -drive file=fedora40.qcow2,format=qcow2,if=virtio\
> >> >      -chardev socket,id=char0,path=/tmp/vhost3.socket \
> >> >      -device vhost-user-vsock-pci,chardev=char0 \
> >> >      -object memory-backend-memfd,id=mem,size=512M \
> >> >      -nographic
> >> >
> >> >      guest$ nc --vsock -l 1234
> >> >
> >> > shell4$ nc --vsock 1 1234
> >> > CONNECT 1234
> >> >
> >> >      Note: the `CONNECT 1234` is required by the hybrid vsock protocol
> >> >      defined by firecracker, so if we extend the vhost-device-vsock
> >> >      daemon to forward packet to VMADDR_CID_LOCAL, that would not be
> >> >      needed (including running socat).
> >> >
> >>
> >> Understood. Just trying to think out loud what the final UX will be
> >> from the user perspective to successfully run a nitro VM before I try
> >> to modify vhost-device-vsock to support forwarding to
> >> VMADDR_CID_LOCAL.
> >> I guess because the "vhost-user-vsock" device needs to be spawned
> >> implicitly (without any explicit option) inside nitro-enclave in QEMU,
> >> we now need to provide the "chardev" as a machine option, so the
> >> nitro-enclave command would look something like below:
> >> "./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
> >> /path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
> >> --enable-kvm -cpu host"
> >> and then set the chardev id to the vhost-user-vsock device in the code
> >> from the machine option.
> >>
> >> The modified "vhost-device-vsock" would need to be run with the new
> >> option that will forward everything to VMADDR_CID_LOCAL (below by the
> >> "-z" I mean the new option)
> >> "./target/debug/vhost-device-vsock -z --vm
> >> guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
> >> this means the guest-cid of the nitro VM is CID 5, right?
> >>
> >> And the applications in the host would need to use VMADDR_CID_LOCAL
> >> for communication instead of "guest-cid" (5) (assuming vsock_loopback
> >> is modprobed). Let's say there are 2 applications inside the nitro VM
> >> that connect to CID 3 on port 9000 and 9001. And the applications on
> >> the host listen on 9000 and 9001 using VMADDR_CID_LOCAL. So, after the
> >> commands above (qemu VM and vhost-device-vsock) are run, the
> >> communication between the applications in the host and the
> >> applications in the nitro VM on port 9000 and 9001 should just work,
> >> right, without needing to run any extra socat commands or such? or
> >> will the user still need to run some socat commands for all the
> >> relevant ports (e.g.,9000 and 9001)?
> >>
> >> I am just wondering what kind of changes are needed in
> >> vhost-device-vsock for forwarding packets to VMADDR_CID_LOCAL? Will
> >> that be something like this: the codepath that handles
> >> "/tmp/vm5.vsock", upon receiving a "connect" (from inside the nitro
> >> VM) for any port to "/tmp/vm5.vsock", vhost-device-vsock will just
> >> connect to the same port using AF_VSOCK using the socket system calls
> >> and messages received on that port in "/tmp/vm5.vsock" will be "send"
> >> to the AF_VSOCK socket? or am I not thinking right and the
> >> implementation would be something different entirely (change the CID
> >> from 3 to 2 (or 1?) on the packets before they are handled then socat
> >> will be needed probably)? Will this work if the applications in the
> >> host want to connect to applications inside the nitro VM (as opposed
> >> to applications inside the nitro VM connecting to CID 3)?
> >>
> >> Thanks and Regards,
> >> Dorjoy
> >
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-07-02 14:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-18 10:17 How to implement message forwarding from one CID to another in vhost driver Dorjoy Chowdhury
2024-05-20  8:55 ` Stefano Garzarella
2024-05-20 10:44   ` Dorjoy Chowdhury
2024-05-21  5:50     ` Alexander Graf
2024-05-23  8:45       ` Stefano Garzarella
2024-05-27  7:08         ` Alexander Graf
2024-05-27  7:54           ` Alexander Graf
2024-05-28 14:43             ` Stefano Garzarella
2024-05-28 15:19             ` Paolo Bonzini
2024-05-28 15:41               ` Stefano Garzarella
2024-05-28 15:49                 ` Paolo Bonzini
2024-05-28 15:53                   ` Stefano Garzarella
2024-05-28 16:38                     ` Paolo Bonzini
2024-05-29  8:04                       ` Stefano Garzarella
2024-05-29 10:43                         ` Alexander Graf
2024-05-29 10:55                           ` Stefano Garzarella
2024-06-25 17:44                             ` Dorjoy Chowdhury
2024-06-26  8:37                               ` Stefano Garzarella
2024-06-26 17:43                                 ` Dorjoy Chowdhury
2024-06-30 10:54                                   ` Dorjoy Chowdhury
2024-07-02 12:05                                     ` Stefano Garzarella
2024-07-02 14:26                                       ` Dorjoy Chowdhury
2024-07-02 11:58                                   ` Stefano Garzarella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).