Using virtio as a physical (wire-level) transport

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Using virtio as a physical (wire-level) transport
@ 2010-08-04 23:04 Ira W. Snyder
  2010-08-05 21:30 ` Michael S. Tsirkin
  0 siblings, 1 reply; 8+ messages in thread
From: Ira W. Snyder @ 2010-08-04 23:04 UTC (permalink / raw)
  To: Michael S. Tsirkin, Rusty Russell; +Cc: netdev, Zang Roy, virtualization

Hello Michael, Rusty,

I'm trying to figure out how to use virtio-net and vhost-net to
communicate over a physical transport (PCI bus) instead of shared
memory (for example, qemu/kvm guest).

We've talked about this several times in the past, and I currently have
some time to devote to this again. I'm trying to figure out if virtio is
still a viable solution, or if it has been evolved such that it is
unusable for this application.

I am trying to create a generic system to allow the  type of
communications described below. I would like to create something that
can be easily ported to any slave computer which meets the following
requirements:

1) it is a PCI slave (agent) (it acts like any other PCI card)
2) it has an inter-processor communications mechanism
3) it has a DMA engine

There is a reasonable amount of demand for such a system. I get
inquiries about the prototype code I posted to linux-netdev at least
once a month. This sort of system is used regularly in the
telecommunications industry, among others.

Here is a quick drawing of the system I work with. Please forgive my
poor ascii art skills.

+-----------------+
| master computer |
|                 |                             +-------------------+
| PCI slot #1     | <-- physical connection --> | slave computer #1 |
| virtio-net if#1 |                             | vhost-net if#1    |
|                 |                             +-------------------+
|                 |
|                 |                             +-------------------+
| PCI slot #2     | <-- physical connection --> | slave computer #2 |
| virtio-net if#2 |                             | vhost-net if#2    |
|                 |                             +-------------------+
|                 |
|                 |                             +-------------------+
| PCI slot #n     | <-- physical connection --> | slave computer #n |
| virtio-net if#n |                             | vhost-net if#n    |
|                 |                             +-------------------+
+-----------------+

The reason for using vhost-net on the "slave" side is because vhost-net
is the component that performs data copies. In most cases, the slave
computers are non-x86 and have DMA controllers. DMA is an absolute
necessity when copying data across the PCI bus.

Do you think virtio is a viable solution to solve this problem? If not,
can you suggest anything else?

Another reason I ask this question is that I have previously invested
several months implementing a similar solution, only to have it outright
rejected for "not being the right way". If you don't think something
like this has any hope, I'd rather not waste another month of my life.
If you can think of a solution that is likely to be "the right way", I'd
rather you told me before I implement any code.

Making my life harder since the last time I tried this, mainline commit
7c5e9ed0c (virtio_ring: remove a level of indirection) has removed the
possibility of using an alternative virtqueue implementation. The commit
message suggests that you might be willing to add this capability back.
Would this be an option?

Thanks for your time,
Ira

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using virtio as a physical (wire-level) transport
  2010-08-04 23:04 Using virtio as a physical (wire-level) transport Ira W. Snyder
@ 2010-08-05 21:30 ` Michael S. Tsirkin
  2010-08-05 23:01   ` Ira W. Snyder
  0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2010-08-05 21:30 UTC (permalink / raw)
  To: Ira W. Snyder; +Cc: Rusty Russell, virtualization, Zang Roy, netdev

Hi Ira,

> Making my life harder since the last time I tried this, mainline commit
> 7c5e9ed0c (virtio_ring: remove a level of indirection) has removed the
> possibility of using an alternative virtqueue implementation. The commit
> message suggests that you might be willing to add this capability back.
> Would this be an option?

Sorry about that.

With respect to this commit, we only had one implementation upstream
and extra levels of indirection made extending the API
much harder for no apparent benefit.

When there's more than one ring implementation with very small amount of
common code, I think that it might make sense to readd the indirection
back, to separate the code cleanly.

OTOH if the two implementations share a lot of code, I think that it
might be better to just add a couple of if statements here and there.
This way compiler even might have a chance to compile the code out if
the feature is disabled in kernel config.

-- 
MST

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using virtio as a physical (wire-level) transport
  2010-08-05 21:30 ` Michael S. Tsirkin
@ 2010-08-05 23:01   ` Ira W. Snyder
  2010-08-05 23:20     ` Michael S. Tsirkin
  0 siblings, 1 reply; 8+ messages in thread
From: Ira W. Snyder @ 2010-08-05 23:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rusty Russell, virtualization, Zang Roy, netdev

On Fri, Aug 06, 2010 at 12:30:50AM +0300, Michael S. Tsirkin wrote:
> Hi Ira,
> 
> > Making my life harder since the last time I tried this, mainline commit
> > 7c5e9ed0c (virtio_ring: remove a level of indirection) has removed the
> > possibility of using an alternative virtqueue implementation. The commit
> > message suggests that you might be willing to add this capability back.
> > Would this be an option?
> 
> Sorry about that.
> 
> With respect to this commit, we only had one implementation upstream
> and extra levels of indirection made extending the API
> much harder for no apparent benefit.
> 
> When there's more than one ring implementation with very small amount of
> common code, I think that it might make sense to readd the indirection
> back, to separate the code cleanly.
> 
> OTOH if the two implementations share a lot of code, I think that it
> might be better to just add a couple of if statements here and there.
> This way compiler even might have a chance to compile the code out if
> the feature is disabled in kernel config.
> 

The virtqueue implementation I envision will be almost identical to the
current virtio_ring virtqueue implementation, with the following
exceptions:

* the "shared memory" will actually be remote, on the PCI BAR of a device
* iowrite32(), ioread32() and friends will be needed to access the memory
* there will only be a fixed number of virtqueues available, due to PCI
  BAR size
* cross-endian virtqueues must work
* kick needs to be cross-machine (using PCI IRQ's)

I don't think it is feasible to add this to the existing implementation.
I think the requirement of being cross-endian will be the hardest to
overcome. Rusty did not envision the cross-endian use case when he
designed this, and it shows, in virtio_ring, virtio_net and vhost. I
have no idea what to do about this. Do you have any ideas?


I plan to create a custom socket similar to tun/macvtap which will use
DMA to transfer around data. This, along with a few other tricks, will
allow me to use vhost_net to operate the device. Along with a custom
virtqueue implementation meeting the requirements above, this seems like
a good plan.

Thanks for responding,
Ira

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using virtio as a physical (wire-level) transport
  2010-08-05 23:01   ` Ira W. Snyder
@ 2010-08-05 23:20     ` Michael S. Tsirkin
  2010-08-06 15:34       ` Ira W. Snyder
  0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2010-08-05 23:20 UTC (permalink / raw)
  To: Ira W. Snyder; +Cc: Rusty Russell, virtualization, Zang Roy, netdev

On Thu, Aug 05, 2010 at 04:01:03PM -0700, Ira W. Snyder wrote:
> On Fri, Aug 06, 2010 at 12:30:50AM +0300, Michael S. Tsirkin wrote:
> > Hi Ira,
> > 
> > > Making my life harder since the last time I tried this, mainline commit
> > > 7c5e9ed0c (virtio_ring: remove a level of indirection) has removed the
> > > possibility of using an alternative virtqueue implementation. The commit
> > > message suggests that you might be willing to add this capability back.
> > > Would this be an option?
> > 
> > Sorry about that.
> > 
> > With respect to this commit, we only had one implementation upstream
> > and extra levels of indirection made extending the API
> > much harder for no apparent benefit.
> > 
> > When there's more than one ring implementation with very small amount of
> > common code, I think that it might make sense to readd the indirection
> > back, to separate the code cleanly.
> > 
> > OTOH if the two implementations share a lot of code, I think that it
> > might be better to just add a couple of if statements here and there.
> > This way compiler even might have a chance to compile the code out if
> > the feature is disabled in kernel config.
> > 
> 
> The virtqueue implementation I envision will be almost identical to the
> current virtio_ring virtqueue implementation, with the following
> exceptions:
> 
> * the "shared memory" will actually be remote, on the PCI BAR of a device
> * iowrite32(), ioread32() and friends will be needed to access the memory
> * there will only be a fixed number of virtqueues available, due to PCI
>   BAR size
> * cross-endian virtqueues must work
> * kick needs to be cross-machine (using PCI IRQ's)
> 
> I don't think it is feasible to add this to the existing implementation.
> I think the requirement of being cross-endian will be the hardest to
> overcome. Rusty did not envision the cross-endian use case when he
> designed this, and it shows, in virtio_ring, virtio_net and vhost. I
> have no idea what to do about this. Do you have any ideas?

My guess is sticking an if around each access in virtio would hurt,
if this is what you are asking about.

Just a crazy idea: vhost already uses wrappers like get_user etc,
maybe when building kernel for your board you could
redefine these to also byteswap?

-- 
MST

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using virtio as a physical (wire-level) transport
  2010-08-05 23:20     ` Michael S. Tsirkin
@ 2010-08-06 15:34       ` Ira W. Snyder
  2010-08-14 11:34         ` Alexander Graf
  0 siblings, 1 reply; 8+ messages in thread
From: Ira W. Snyder @ 2010-08-06 15:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Rusty Russell, virtualization, Zang Roy, netdev

On Fri, Aug 06, 2010 at 02:20:42AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 05, 2010 at 04:01:03PM -0700, Ira W. Snyder wrote:
> > On Fri, Aug 06, 2010 at 12:30:50AM +0300, Michael S. Tsirkin wrote:
> > > Hi Ira,
> > > 
> > > > Making my life harder since the last time I tried this, mainline commit
> > > > 7c5e9ed0c (virtio_ring: remove a level of indirection) has removed the
> > > > possibility of using an alternative virtqueue implementation. The commit
> > > > message suggests that you might be willing to add this capability back.
> > > > Would this be an option?
> > > 
> > > Sorry about that.
> > > 
> > > With respect to this commit, we only had one implementation upstream
> > > and extra levels of indirection made extending the API
> > > much harder for no apparent benefit.
> > > 
> > > When there's more than one ring implementation with very small amount of
> > > common code, I think that it might make sense to readd the indirection
> > > back, to separate the code cleanly.
> > > 
> > > OTOH if the two implementations share a lot of code, I think that it
> > > might be better to just add a couple of if statements here and there.
> > > This way compiler even might have a chance to compile the code out if
> > > the feature is disabled in kernel config.
> > > 
> > 
> > The virtqueue implementation I envision will be almost identical to the
> > current virtio_ring virtqueue implementation, with the following
> > exceptions:
> > 
> > * the "shared memory" will actually be remote, on the PCI BAR of a device
> > * iowrite32(), ioread32() and friends will be needed to access the memory
> > * there will only be a fixed number of virtqueues available, due to PCI
> >   BAR size
> > * cross-endian virtqueues must work
> > * kick needs to be cross-machine (using PCI IRQ's)
> > 
> > I don't think it is feasible to add this to the existing implementation.
> > I think the requirement of being cross-endian will be the hardest to
> > overcome. Rusty did not envision the cross-endian use case when he
> > designed this, and it shows, in virtio_ring, virtio_net and vhost. I
> > have no idea what to do about this. Do you have any ideas?
> 
> My guess is sticking an if around each access in virtio would hurt,
> if this is what you are asking about.
> 

Yes, I think so too. I think using le32 byte order everywhere in virtio
would be a good thing. In addition, it means that on all x86, things
continue to work as-is. It would also have no overhead in the most
common case: x86-on-x86.

This problem is not limited to my new use of virtio. Virtio is
completely useless in a relatively common virtualization scenario:
x86 host with qemu-ppc guest. Or any other big endian guest system.

> Just a crazy idea: vhost already uses wrappers like get_user etc,
> maybe when building kernel for your board you could
> redefine these to also byteswap?
> 

I think idea is clever, but also psychotic :) I'm sure it would work,
but that only solves the problem of virtio ring descriptors. The
virtio-net header contains several __u16 fields which would also need
to be fixed-endianness.

Thanks,
Ira

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using virtio as a physical (wire-level) transport
  2010-08-06 15:34       ` Ira W. Snyder
@ 2010-08-14 11:34         ` Alexander Graf
  2010-08-16  0:19           ` Rusty Russell
  2010-09-06 11:19           ` Michael S. Tsirkin
  0 siblings, 2 replies; 8+ messages in thread
From: Alexander Graf @ 2010-08-14 11:34 UTC (permalink / raw)
  To: Ira W. Snyder
  Cc: Michael S. Tsirkin, netdev@vger.kernel.org, Zang Roy,
	virtualization@lists.linux-foundation.org


Am 06.08.2010 um 11:34 schrieb "Ira W. Snyder" <iws@ovro.caltech.edu>:

> On Fri, Aug 06, 2010 at 02:20:42AM +0300, Michael S. Tsirkin wrote:
>> On Thu, Aug 05, 2010 at 04:01:03PM -0700, Ira W. Snyder wrote:
>>> On Fri, Aug 06, 2010 at 12:30:50AM +0300, Michael S. Tsirkin wrote:
>>>> Hi Ira,
>>>> 
>>>>> Making my life harder since the last time I tried this, mainline commit
>>>>> 7c5e9ed0c (virtio_ring: remove a level of indirection) has removed the
>>>>> possibility of using an alternative virtqueue implementation. The commit
>>>>> message suggests that you might be willing to add this capability back.
>>>>> Would this be an option?
>>>> 
>>>> Sorry about that.
>>>> 
>>>> With respect to this commit, we only had one implementation upstream
>>>> and extra levels of indirection made extending the API
>>>> much harder for no apparent benefit.
>>>> 
>>>> When there's more than one ring implementation with very small amount of
>>>> common code, I think that it might make sense to readd the indirection
>>>> back, to separate the code cleanly.
>>>> 
>>>> OTOH if the two implementations share a lot of code, I think that it
>>>> might be better to just add a couple of if statements here and there.
>>>> This way compiler even might have a chance to compile the code out if
>>>> the feature is disabled in kernel config.
>>>> 
>>> 
>>> The virtqueue implementation I envision will be almost identical to the
>>> current virtio_ring virtqueue implementation, with the following
>>> exceptions:
>>> 
>>> * the "shared memory" will actually be remote, on the PCI BAR of a device
>>> * iowrite32(), ioread32() and friends will be needed to access the memory
>>> * there will only be a fixed number of virtqueues available, due to PCI
>>>  BAR size
>>> * cross-endian virtqueues must work
>>> * kick needs to be cross-machine (using PCI IRQ's)
>>> 
>>> I don't think it is feasible to add this to the existing implementation.
>>> I think the requirement of being cross-endian will be the hardest to
>>> overcome. Rusty did not envision the cross-endian use case when he
>>> designed this, and it shows, in virtio_ring, virtio_net and vhost. I
>>> have no idea what to do about this. Do you have any ideas?
>> 
>> My guess is sticking an if around each access in virtio would hurt,
>> if this is what you are asking about.
>> 
> 
> Yes, I think so too. I think using le32 byte order everywhere in virtio
> would be a good thing. In addition, it means that on all x86, things
> continue to work as-is. It would also have no overhead in the most
> common case: x86-on-x86.
> 
> This problem is not limited to my new use of virtio. Virtio is
> completely useless in a relatively common virtualization scenario:
> x86 host with qemu-ppc guest. Or any other big endian guest system.

This one actually works because we know that we're building for a BE guest. But I agree that it's a mess and clearly a very incorrect design decision.

>> Just a crazy idea: vhost already uses wrappers like get_user etc,
>> maybe when building kernel for your board you could
>> redefine these to also byteswap?
>> 
> 
> I think idea is clever, but also psychotic :) I'm sure it would work,
> but that only solves the problem of virtio ring descriptors. The
> virtio-net header contains several __u16 fields which would also need
> to be fixed-endianness.

I'd vote for defining virtio v2 that makes everything LE. Maybe we could even have an LE capability with a grace period of phasing out non-LE capable hosts and guests.


Alex


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using virtio as a physical (wire-level) transport
  2010-08-14 11:34         ` Alexander Graf
@ 2010-08-16  0:19           ` Rusty Russell
  2010-09-06 11:19           ` Michael S. Tsirkin
  1 sibling, 0 replies; 8+ messages in thread
From: Rusty Russell @ 2010-08-16  0:19 UTC (permalink / raw)
  To: virtualization
  Cc: Alexander Graf, Ira W. Snyder, netdev@vger.kernel.org, Zang Roy,
	Michael S. Tsirkin

On Sat, 14 Aug 2010 09:04:19 pm Alexander Graf wrote:
> 
> Am 06.08.2010 um 11:34 schrieb "Ira W. Snyder" <iws@ovro.caltech.edu>:
> > This problem is not limited to my new use of virtio. Virtio is
> > completely useless in a relatively common virtualization scenario:
> > x86 host with qemu-ppc guest. Or any other big endian guest system.
> 
> This one actually works because we know that we're building for a BE guest.
> But I agree that it's a mess and clearly a very incorrect design decision.

Yes, since you need to know the guest's endian to virtualize it, the
correct interpretation of the virtio ring seemed the least problem.  Perhaps
I went overboard in simplification here, but it seemed pure legacy.

If we did a virtio2, as has been suggested, it would be possible to address
this.  You could of course do a hack where you detect the ring endianness
the first time they use it (based on avail.flags, avail.index and the
descriptor it would be quite reliable in practice).

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Using virtio as a physical (wire-level) transport
  2010-08-14 11:34         ` Alexander Graf
  2010-08-16  0:19           ` Rusty Russell
@ 2010-09-06 11:19           ` Michael S. Tsirkin
  1 sibling, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2010-09-06 11:19 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ira W. Snyder, netdev@vger.kernel.org, Zang Roy,
	virtualization@lists.linux-foundation.org

On Sat, Aug 14, 2010 at 07:34:19AM -0400, Alexander Graf wrote:
> I'd vote for defining virtio v2 that makes everything LE. Maybe we
> could even have an LE capability with a grace period of phasing out
> non-LE capable hosts and guests.

So there are multiple ideas floating for modifying the ring,
and together they might warrant virtio2.
This includes removing available ring, publishing consumer indexes,
possibly some interrupt mitigation ideas, and we can put
endian-ness there.

-- 
MST

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-09-06 11:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-04 23:04 Using virtio as a physical (wire-level) transport Ira W. Snyder
2010-08-05 21:30 ` Michael S. Tsirkin
2010-08-05 23:01   ` Ira W. Snyder
2010-08-05 23:20     ` Michael S. Tsirkin
2010-08-06 15:34       ` Ira W. Snyder
2010-08-14 11:34         ` Alexander Graf
2010-08-16  0:19           ` Rusty Russell
2010-09-06 11:19           ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).