From: "Ira W. Snyder" <iws@ovro.caltech.edu>
To: Avi Kivity <avi@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Gregory Haskins <gregory.haskins@gmail.com>,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	alacrityvm-devel@lists.sourceforge.net,
	Anthony Liguori <anthony@codemonkey.ws>,
	Ingo Molnar <mingo@elte.hu>,
	Gregory Haskins <ghaskins@novell.com>
Subject: Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a "vbus-proxy" bus model for vbus_driver objects
Date: Wed, 19 Aug 2009 08:28:11 -0700
Message-ID: <20090819152811.GA22294@ovro.caltech.edu>
In-Reply-To: <4A8B9051.3020505@redhat.com>

On Wed, Aug 19, 2009 at 08:40:33AM +0300, Avi Kivity wrote:
> On 08/19/2009 03:38 AM, Ira W. Snyder wrote:
>> On Wed, Aug 19, 2009 at 12:26:23AM +0300, Avi Kivity wrote:
>>    
>>> On 08/18/2009 11:59 PM, Ira W. Snyder wrote:
>>>      
>>>> On a non shared-memory system (where the guest's RAM is not just a chunk
>>>> of userspace RAM in the host system), virtio's management model seems to
>>>> fall apart. Feature negotiation doesn't work as one would expect.
>>>>
>>>>        
>>> In your case, virtio-net on the main board accesses PCI config space
>>> registers to perform the feature negotiation; software on your PCI cards
>>> needs to trap these config space accesses and respond to them according
>>> to virtio ABI.
>>>
>>>      
>> Is this "real PCI" (physical hardware) or "fake PCI" (software PCI
>> emulation) that you are describing?
>>
>>    
>
> Real PCI.
>
>> The host (x86, PCI master) must use "real PCI" to actually configure the
>> boards, enable bus mastering, etc. Just like any other PCI device, such
>> as a network card.
>>
>> On the guests (ppc, PCI agents) I cannot add/change PCI functions (the
>> last .[0-9] in the PCI address) nor can I change PCI BARs once the
>> board has started. I'm pretty sure that would violate the PCI spec,
>> since the PCI master would need to re-scan the bus, and re-assign
>> addresses, which is a task for the BIOS.
>>    
>
> Yes.  Can the boards respond to PCI config space cycles coming from the  
> host, or is the config space implemented in silicon and immutable?   
> (reading on, I see the answer is no).  virtio-pci uses the PCI config  
> space to configure the hardware.
>

Yes, the PCI config space is implemented in silicon. I can change a few
things (mostly PCI BAR attributes), but not much.

>>> (There's no real guest on your setup, right?  just a kernel running on
>>> an x86 system and other kernels running on the PCI cards?)
>>>
>>>      
>> Yes, the x86 (PCI master) runs Linux (booted via PXELinux). The ppc's
>> (PCI agents) also run Linux (booted via U-Boot). They are independent
>> Linux systems, with a physical PCI interconnect.
>>
>> The x86 has CONFIG_PCI=y, however the ppc's have CONFIG_PCI=n. Linux's
>> PCI stack does bad things as a PCI agent. It always assumes it is a PCI
>> master.
>>
>> It is possible for me to enable CONFIG_PCI=y on the ppc's by removing
>> the PCI bus from their list of devices provided by OpenFirmware. They
>> can not access PCI via normal methods. PCI drivers cannot work on the
>> ppc's, because Linux assumes it is a PCI master.
>>
>> To the best of my knowledge, I cannot trap configuration space accesses
>> on the PCI agents. I haven't needed that for anything I've done thus
>> far.
>>
>>    
>
> Well, if you can't do that, you can't use virtio-pci on the host.   
> You'll need another virtio transport (equivalent to "fake pci" you  
> mentioned above).
>

Ok.

Is there something similar that I can study as an example? Should I look
at virtio-pci?

>>>> This does appear to be solved by vbus, though I haven't written a
>>>> vbus-over-PCI implementation, so I cannot be completely sure.
>>>>
>>>>        
>>> Even if virtio-pci doesn't work out for some reason (though it should),
>>> you can write your own virtio transport and implement its config space
>>> however you like.
>>>
>>>      
>> This is what I did with virtio-over-PCI. The way virtio-net negotiates
>> features makes this work non-intuitively.
>>    
>
> I think you tried to take two virtio-nets and make them talk together?   
> That won't work.  You need the code from qemu to talk to virtio-net  
> config space, and vhost-net to pump the rings.
>

It *is* possible to make two unmodified virtio-net's talk together. I've
done it, and it is exactly what the virtio-over-PCI patch does. Study it
and you'll see how I connected the rx/tx queues together.
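
To make the pairing idea concrete, here is a minimal user-space sketch
in C. It is not the code from the virtio-over-PCI patch: the struct
names, BUF_SIZE, pair_queues() and fake_dma_copy() are all made up for
illustration, and a plain memcpy() stands in for the DMA engine. It only
shows each pending tx packet being copied into the next pre-allocated
rx buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64  /* hypothetical size of one pre-allocated rx buffer */

/* One receive buffer, posted in advance by the receiving side. */
struct rx_buf {
    unsigned char data[BUF_SIZE];
    size_t used;     /* bytes filled in; 0 means the buffer is still empty */
};

/* One pending transmit packet on the sending side. */
struct tx_pkt {
    const unsigned char *data;
    size_t len;
};

/* Stand-in for the DMA engine: just a memcpy() in this sketch. */
static void fake_dma_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/*
 * Pair pending tx packets with empty rx buffers, in order.
 * Returns the number of packets moved. Stops when either side runs
 * out, when a packet does not fit, or when an rx buffer is not empty.
 */
static size_t pair_queues(const struct tx_pkt *tx, size_t ntx,
                          struct rx_buf *rx, size_t nrx)
{
    size_t moved = 0;

    assert(tx != NULL && rx != NULL);
    while (moved < ntx && moved < nrx) {
        if (tx[moved].len > BUF_SIZE || rx[moved].used != 0)
            break;
        fake_dma_copy(rx[moved].data, tx[moved].data, tx[moved].len);
        rx[moved].used = tx[moved].len;
        moved++;
    }
    return moved;
}
```

With three tx packets and only two rx buffers posted, pair_queues()
moves two packets and leaves the third pending until the receiver posts
more buffers.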

The feature negotiation code also works, but in a very unintuitive
manner. I made it work in the virtio-over-PCI patch, but the devices are
hardcoded into the driver. It would be quite a bit of work to swap
virtio-net and virtio-console, for example.

>>>> I'm not at all clear on how to get feature negotiation to work on a
>>>> system like mine. From my study of lguest and kvm (see below) it looks
>>>> like userspace will need to be involved, via a miscdevice.
>>>>
>>>>        
>>> I don't see why.  Is the kernel on the PCI cards in full control of all
>>> accesses?
>>>
>>>      
>> I'm not sure what you mean by this. Could you be more specific? This is
>> a normal, unmodified vanilla Linux kernel running on the PCI agents.
>>    
>
> I meant, does board software implement the config space accesses issued  
> from the host, and it seems the answer is no.
>
>
>> In my virtio-over-PCI patch, I hooked two virtio-net's together. I wrote
>> an algorithm to pair the tx/rx queues together. Since virtio-net
>> pre-fills its rx queues with buffers, I was able to use the DMA engine
>> to copy from the tx queue into the pre-allocated memory in the rx queue.
>>
>>    
>
> Please find a name other than virtio-over-PCI since it conflicts with  
> virtio-pci.  You're tunnelling virtio config cycles (which are usually  
> done on pci config cycles) on a new protocol which is itself tunnelled  
> over PCI shared memory.
>

Sorry about that. Do you have suggestions for a better name?

I called it virtio-over-PCI in my previous postings to LKML, so until a
new patch is written and posted, I'll keep referring to it by the name
used in the past, so people can search for it.

When I post virtio patches, should I CC another mailing list in addition
to LKML?

>>>>
>>>>        
>>> Yeah.  You'll need to add byteswaps.
>>>
>>>      
>> I wonder if Rusty would accept a new feature:
>> VIRTIO_F_NET_LITTLE_ENDIAN, which would allow the virtio-net driver to
>> use LE for all of its multi-byte fields.
>>
>> I don't think the transport should have to care about the endianness.
>>    
>
> Given this is not mainstream use, it would have to have zero impact when  
> configured out.
>

Yes, of course.

That said, I'm not sure how qemu-system-ppc running on x86 could
possibly communicate using virtio-net. This would mean the guest is an
emulated big-endian PPC, while the host is a little-endian x86. I
haven't actually tested this situation, so perhaps I am wrong.
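
For reference, the kind of byteswapping discussed above can be sketched
in portable C as below. This is only an illustration: struct demo_hdr
and the put_le16()/get_le16() helpers are hypothetical, not the
virtio-net header or any kernel API. The point is that building the
wire format byte by byte gives the same little-endian layout on both a
big-endian ppc and a little-endian x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value as little-endian bytes, whatever the host order. */
static void put_le16(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)(v >> 8);
}

/* Load a 16-bit little-endian value, whatever the host order. */
static uint16_t get_le16(const uint8_t *in)
{
    return (uint16_t)(in[0] | (in[1] << 8));
}

/* Hypothetical header with one single-byte and one multi-byte field. */
struct demo_hdr {
    uint8_t  flags;
    uint16_t hdr_len;
};

#define DEMO_HDR_WIRE_SIZE 3  /* 1 byte flags + 2 bytes hdr_len */

/* Encode the header into its little-endian wire format. */
static void demo_hdr_encode(uint8_t *out, const struct demo_hdr *h)
{
    assert(out != NULL && h != NULL);
    out[0] = h->flags;
    put_le16(&out[1], h->hdr_len);
}

/* Decode the little-endian wire format back into host order. */
static void demo_hdr_decode(struct demo_hdr *h, const uint8_t *in)
{
    assert(h != NULL && in != NULL);
    h->flags = in[0];
    h->hdr_len = get_le16(&in[1]);
}
```

Inside the kernel the same job would normally be done with the existing
byteorder helpers such as cpu_to_le16() and le16_to_cpu(), which compile
to nothing on little-endian hosts.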

>> True. It's slowpath setup, so I don't care how fast it is. For reasons
>> outside my control, the x86 (PCI master) is running a RHEL5 system. This
>> means glibc-2.5, which doesn't have eventfd support, AFAIK. I could try
>> and push for an upgrade. This obviously makes cat/echo really nice: it
>> doesn't depend on glibc, only the kernel version.
>>
>> I don't give much weight to the above, because I can use the eventfd
>> syscalls directly, without glibc support. It is just more painful.
>>    
>
> The x86 side only needs to run virtio-net, which is present in RHEL 5.3.  
> You'd only need to run virtio-tunnel or however it's called.  All the 
> eventfd magic takes place on the PCI agents.
>

I can upgrade the kernel to anything I want on both the x86 and ppc's.
I'd like to avoid changing the x86 (RHEL5) userspace, though. On the
ppc's, I have full control over the userspace environment.
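
As a rough sketch of using eventfd without glibc support, as mentioned
in the quoted text above: the helper names raw_eventfd(),
raw_eventfd_signal() and raw_eventfd_wait() are made up for this
example. It assumes a kernel new enough to provide eventfd and headers
that define SYS_eventfd2 or SYS_eventfd; with older headers the syscall
number would have to be supplied by hand.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create an eventfd via syscall(2) directly. Returns an fd, or -1. */
static int raw_eventfd(unsigned int initval)
{
#if defined(SYS_eventfd2)
    return (int)syscall(SYS_eventfd2, initval, 0);
#elif defined(SYS_eventfd)
    return (int)syscall(SYS_eventfd, initval);
#else
    (void)initval;
    return -1;   /* headers too old: no syscall number available */
#endif
}

/* Add n to the eventfd counter. Returns 0 on success, -1 on error. */
static int raw_eventfd_signal(int fd, uint64_t n)
{
    return write(fd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/*
 * Read the accumulated counter and reset it to zero. Blocks while the
 * counter is zero. Returns 0 on success, -1 on error.
 */
static int raw_eventfd_wait(int fd, uint64_t *count)
{
    assert(count != NULL);
    return read(fd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

Once the descriptor exists, only plain read(2) and write(2) are needed,
so the missing glibc wrapper matters only for the creation step.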

Thanks,
Ira

