From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37399)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <tiejun.chen@intel.com>) id 1X1Y32-0007et-GQ
	for qemu-devel@nongnu.org; Mon, 30 Jun 2014 05:38:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <tiejun.chen@intel.com>) id 1X1Y2w-0001oo-9o
	for qemu-devel@nongnu.org; Mon, 30 Jun 2014 05:38:52 -0400
Received: from mga09.intel.com ([134.134.136.24]:51332)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <tiejun.chen@intel.com>) id 1X1Y2v-0001n2-SC
	for qemu-devel@nongnu.org; Mon, 30 Jun 2014 05:38:46 -0400
Message-ID: <53B1300D.10001@intel.com>
Date: Mon, 30 Jun 2014 17:38:21 +0800
From: "Chen, Tiejun" <tiejun.chen@intel.com>
MIME-Version: 1.0
References: <20140625090925.GH32652@redhat.com> <53AA9480.1010005@intel.com>
	<53AA96DF.6070501@redhat.com> <53AA9B58.6050803@intel.com>
	<53AA9C4E.9070506@redhat.com> <53ABE551.3080407@intel.com>
	<53ABF00E.6000309@redhat.com> <53B0D0C5.60000@intel.com>
	<20140630064822.GB14491@redhat.com> <53B110CA.6070606@intel.com>
	<20140630090511.GB15777@redhat.com>
In-Reply-To: <20140630090511.GB15777@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough
	support
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: peter.maydell@linaro.org, xen-devel@lists.xensource.com, stefano.stabellini@eu.citrix.com, allen.m.kay@intel.com, Kelly.Zytaruk@amd.com, qemu-devel@nongnu.org, yang.z.zhang@intel.com, anthony@codemonkey.ws, anthony.perard@citrix.com, Paolo Bonzini <pbonzini@redhat.com>

On 2014/6/30 17:05, Michael S. Tsirkin wrote:
> On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
>> On 2014/6/30 14:48, Michael S. Tsirkin wrote:
>>> On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
>>>> On 2014/6/26 18:03, Paolo Bonzini wrote:
>>>>> Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
>>>>>>
>>>>>>>
>>>>>>> - offsets 0x0000..0x0fff map to configuration space of the host MCH
>>>>>>>
>>>>>>
>>>>>> Are you saying the config space in the video device?
>>>>>
>>>>> No, I am saying in a new BAR, or at some magic offset of an existing
>>>>> MMIO BAR.
>>>>>
>>>>
>>>> As I mentioned previously, the IGD guy told me we have no any unused a
>>>> offset or BAR in the config space.
>>>>
>>>> And guy who are responsible for the native driver seems not be accept to
>>>> extend some magic offset of an existing MMIO BAR.
>>>>
>>>> In addition I think in a short time its not possible to migrate i440fx to
>>>> q35 as a PCIe machine of xen.
>>>
>>> That seems like a weak motivation.  I don't see a need to get something
>>> merged upstream in a short time: this seems sure to miss 2.1,
>>> so you have the time to make it architecturally sound.
>>> "Making existing guests work" would be a better motivation.
>>
>> Yes.
>
> So focus on this then. Existing guests will probably work
> fine on a newer chipset - likely better than on i440fx.
> xen management tools need to do some work to support this?
> That will just give everyone more long term benefits.
>
> If instead we create a hack that does not resemble
> any existing hardware even remotely, what's the
> chance that it will not break with any future
> guest modification?
>
>
>>> Isn't this possible with an mch chipset?
>>
>> If you're saying q35, I mean AFAIK we have no any plan to migrate to this
>> MCH in xen case.
>
> q35 or a newer chipset that's closer to what guests expect.
>
>> Additionally, I think I should follow this feature after
>> q35 can work for xen scenario.
>
> What's stopping you?

I mean if you want create an new machine based on q35, actually this is 
equal to start making xen to migrate to q35 now. Right? I can't image 
how much effort should be done.

So this is a reason why I'm saying I'd like to follow this feature after 
q35 can work with xen completely.

>
>>>
>>>
>>>> So could we do this step by step:
>>>>
>>>> #1 phase: We just cover current qemu-xen implementation based on i44fx, so
>>>> still provide that pseudo ISA bridge at 00:1f.0 as we already did.
>>>
>>> By the way there is no reason to put it at 00:1f.0 specifically I think.
>>> So it seems simple: create a dummy device that gets device and
>>> vendor id as properties. If you really like, add an option to get values
>>
>> Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
>> passthrough: create pseudo intel isa bridge. There, we fake this device just
>> at 00:1f.0.
>> But you guys don't like this, and shouldn't this be just this point we
>> discussing now?
>>
>> If you guys agree that , everything is fine.
>
> Actually, this isn't what you did.
> Don't tie it to xen, and don't tie it to 1f.
> Just make it a simple stub pci device.
> Whoever wants it, creates it.
>
> The thing I worry about, is the chance this will break going forward.
> So you created a system with 2 isa bridges.

I don't set class type to claim this is an ISA bridge, just a normal PCI 
device.

+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice 
*hdev)
+{
+    struct PCIDevice *dev;
+
+    char rid;
+
+    /* We havt to use a simple PCI device to fake this ISA bridge
+     * to avoid making some confusion to BIOS and ACPI.
+     */
+    dev = pci_create(bus, PCI_DEVFN(0x1f, 0), 
"pseudo-intel-pch-isa-bridge");
+
+    qdev_init_nofail(&dev->qdev);
+
+    pci_config_set_vendor_id(dev->config, hdev->vendor_id);
+    pci_config_set_device_id(dev->config, hdev->device_id);
+
+    xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1);
+
+    pci_config_set_revision(dev->config, rid);
+
+    XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n");
+    return 0;
+}

> This is already not something that exists on real hardware.
> So it might break some guests that will get confused.
> Maybe we are lucky and most guests see an unfamiliar device
> and ignore it. It seems believable.
>
> But your MCH hack overrides registers in the pci host.

We just try to write *one* register we already confirm this is safe 
enough. Other register are read-only.

> Are we lucky and there's nothing in these registers
> of interest to guests? This seems much more fragile.
> So please poke at the spec, and compile the list
> of registers you want to touch, figure out why they are
> safe to override, and put this all in code comments.
>
> And the same thing that applies to the isa bridge

We just expose its own vendor/device ids here. We don't access to change 
anything in the isa bridge.

> applies here too. It should work without QEMU touching
> hosts' hardware.
>
>
>
>> >from sysfs: device and vendor id are world readable, so just get them
>>> directly and not through xen wrappers, this way you can open the files
>>> RO and not RW.
>>> You seem to poke at revision as well, I don't see
>>> driver looking at that - strictly necessary?
>>> If yes please patch host kernel to expose that info in sysfs,
>>> though we can fall back on pci config if not there.
>>>
>>> MCH (bridge_dev) hacks in i915 are nastier.
>>> To clean them up, we really have to have an explicit driver for this
>>> bridge, not a pass-through device.  Long term, the right thing to do is
>>> likely to extend host driver and expose the necessary information in
>>> sysfs on host kernel.
>>>
>>>
>>
>> I'm a bit confused. Any sysfs should be filled by the associated PCIe
>> device, shouldn't it? So qemu still need to emulate this PCIe device
>> firstly, then set properties into sysfs.
>
> I am talking about getting host properties into qemu.
> You don't want to give qemu R/W root access to host sysfs system
> of the root bridge, that's not secure.

igd_pci_read()
	|
	+ xen_host_pci_get_block()
		|
		+ xen_host_pci_config_read(()
			|
			+  pread()

So shouldn't we already expose these information via sysfs?

> Avoiding read only access to filesystem is a good idea too, so it
> should be possible to pass all parameters in as
> device properties, and let whoever starts qemu
> figure out what are reasonable values.
>
>>>
>>>
>>>> #2 phase: Now, we will choose a capability ID that won't be conflicting with
>>>> others. To do this properly, we need to get one from PCI SIG group. To have
>>>> this workable and consistently validated, this method shouldn't be virt
>>>> specific. Then native driver should use the same method.
>>>
>>> You mean you will be able to talk sense into hardware guys?
>>> I doubt that. If they could be convinced to make e.g. i915 base a
>>
>> We're negotiating this, so this is just our long term solution we figure out
>> currently.
>>
>>> proper BAR, why didn't they?
>>
>> We already have no any free BAR as we mentioned previously.
>
> I thought you were talking about modifying hardware?

Yes.

Tiejun

>
>>>
>>>
>>>> So when xen work on
>>>> q35 PCIe machine, we can walk this way.
>>>
>>> If you are emulating MCH anyway, pick one that is close
>>> to what i915 driver expects. It would then work with existing
>>
>> Looks you guys prefer we create a new MCH anyway, right? But is it necessary
>> to create a new based on i440fx just for a little change?
>>
>> Thanks
>> Tiejun
>
> You can inherit it. Maybe you are lucky and this happens to
> work without conflicting with whatever other guests want to do.
> But if you ask me, you are really just piling up hacks.
> If some guest does not work on i440, you should just work on
> emulating whatever it does work on.
> That would have real value.
>
>>> devices, without new capability IDs.
>>>
>>>
>>>> Anthony,
>>>>
>>>> Any comments to address this in xen case?
>>>>
>>>> Thanks
>>>> Tiejun
>>>
>>>
>
>