From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44096)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <tiejun.chen@intel.com>) id 1X1o24-00010G-8g
	for qemu-devel@nongnu.org; Mon, 30 Jun 2014 22:43:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <tiejun.chen@intel.com>) id 1X1o1z-0007yi-89
	for qemu-devel@nongnu.org; Mon, 30 Jun 2014 22:42:56 -0400
Received: from mga02.intel.com ([134.134.136.20]:16672)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <tiejun.chen@intel.com>) id 1X1o1y-0007yE-QS
	for qemu-devel@nongnu.org; Mon, 30 Jun 2014 22:42:51 -0400
Message-ID: <53B21FAA.1050207@intel.com>
Date: Tue, 01 Jul 2014 10:40:42 +0800
From: "Chen, Tiejun" <tiejun.chen@intel.com>
MIME-Version: 1.0
References: <53AA9C4E.9070506@redhat.com> <53ABE551.3080407@intel.com>
	<53ABF00E.6000309@redhat.com> <53B0D0C5.60000@intel.com>
	<20140630064822.GB14491@redhat.com> <53B110CA.6070606@intel.com>
	<20140630090511.GB15777@redhat.com> <53B1300D.10001@intel.com>
	<20140630095509.GA17700@redhat.com> <53B139E6.1020607@intel.com>
	<20140630112845.GA29477@redhat.com>
In-Reply-To: <20140630112845.GA29477@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD
	passthrough support
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: peter.maydell@linaro.org, xen-devel@lists.xensource.com, stefano.stabellini@eu.citrix.com, allen.m.kay@intel.com, qemu-devel@nongnu.org, Kelly.Zytaruk@amd.com, anthony.perard@citrix.com, anthony@codemonkey.ws, yang.z.zhang@intel.com, Paolo Bonzini <pbonzini@redhat.com>

On 2014/6/30 19:28, Michael S. Tsirkin wrote:
> On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote:
>> On 2014/6/30 17:55, Michael S. Tsirkin wrote:
>>> On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:
>>>> On 2014/6/30 17:05, Michael S. Tsirkin wrote:
>>>>> On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
>>>>>> On 2014/6/30 14:48, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
>>>>>>>> On 2014/6/26 18:03, Paolo Bonzini wrote:
>>>>>>>>> Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> - offsets 0x0000..0x0fff map to configuration space of the host MCH
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Are you saying the config space in the video device?
>>>>>>>>>
>>>>>>>>> No, I am saying in a new BAR, or at some magic offset of an existing
>>>>>>>>> MMIO BAR.
>>>>>>>>>
>>>>>>>>
>>>>>>>> As I mentioned previously, the IGD guy told me we have no any unused a
>>>>>>>> offset or BAR in the config space.
>>>>>>>>
>>>>>>>> And guy who are responsible for the native driver seems not be accept to
>>>>>>>> extend some magic offset of an existing MMIO BAR.
>>>>>>>>
>>>>>>>> In addition I think in a short time its not possible to migrate i440fx to
>>>>>>>> q35 as a PCIe machine of xen.
>>>>>>>
>>>>>>> That seems like a weak motivation.  I don't see a need to get something
>>>>>>> merged upstream in a short time: this seems sure to miss 2.1,
>>>>>>> so you have the time to make it architecturally sound.
>>>>>>> "Making existing guests work" would be a better motivation.
>>>>>>
>>>>>> Yes.
>>>>>
>>>>> So focus on this then. Existing guests will probably work
>>>>> fine on a newer chipset - likely better than on i440fx.
>>>>> xen management tools need to do some work to support this?
>>>>> That will just give everyone more long term benefits.
>>>>>
>>>>> If instead we create a hack that does not resemble
>>>>> any existing hardware even remotely, what's the
>>>>> chance that it will not break with any future
>>>>> guest modification?
>>>>>
>>>>>
>>>>>>> Isn't this possible with an mch chipset?
>>>>>>
>>>>>> If you're saying q35, I mean AFAIK we have no any plan to migrate to this
>>>>>> MCH in xen case.
>>>>>
>>>>> q35 or a newer chipset that's closer to what guests expect.
>>>>>
>>>>>> Additionally, I think I should follow this feature after
>>>>>> q35 can work for xen scenario.
>>>>>
>>>>> What's stopping you?
>>>>
>>>> I mean if you want create an new machine based on q35, actually this is
>>>> equal to start making xen to migrate to q35 now. Right? I can't image how
>>>> much effort should be done.
>>>
>>> I don't see why you don't try. Sounds like a more robust solution to me.
>>
>> As I think this is another requirement to us. I'm not sure if I have enough
>> time to touch this currently.
>>
>>>
>>>> So this is a reason why I'm saying I'd like to follow this feature after q35
>>>> can work with xen completely.
>>>
>>> Then we'll end up with more configurations to support, and to what end?
>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> So could we do this step by step:
>>>>>>>>
>>>>>>>> #1 phase: We just cover current qemu-xen implementation based on i44fx, so
>>>>>>>> still provide that pseudo ISA bridge at 00:1f.0 as we already did.
>>>>>>>
>>>>>>> By the way there is no reason to put it at 00:1f.0 specifically I think.
>>>>>>> So it seems simple: create a dummy device that gets device and
>>>>>>> vendor id as properties. If you really like, add an option to get values
>>>>>>
>>>>>> Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
>>>>>> passthrough: create pseudo intel isa bridge. There, we fake this device just
>>>>>> at 00:1f.0.
>>>>>> But you guys don't like this, and shouldn't this be just this point we
>>>>>> discussing now?
>>>>>>
>>>>>> If you guys agree that , everything is fine.
>>>>>
>>>>> Actually, this isn't what you did.
>>>>> Don't tie it to xen, and don't tie it to 1f.
>>>>> Just make it a simple stub pci device.
>>>>> Whoever wants it, creates it.
>>>>>
>>>>> The thing I worry about, is the chance this will break going forward.
>>>>> So you created a system with 2 isa bridges.
>>>>
>>>> I don't set class type to claim this is an ISA bridge, just a normal PCI
>>>> device.
>>>> +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
>>>> *hdev)
>>>> +{
>>>> +    struct PCIDevice *dev;
>>>> +
>>>> +    char rid;
>>>> +
>>>> +    /* We havt to use a simple PCI device to fake this ISA bridge
>>>> +     * to avoid making some confusion to BIOS and ACPI.
>>>> +     */
>>>> +    dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
>>>> "pseudo-intel-pch-isa-bridge");
>>>> +
>>>> +    qdev_init_nofail(&dev->qdev);
>>>> +
>>>> +    pci_config_set_vendor_id(dev->config, hdev->vendor_id);
>>>> +    pci_config_set_device_id(dev->config, hdev->device_id);
>>>> +
>>>> +    xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1);
>>>> +
>>>> +    pci_config_set_revision(dev->config, rid);
>>>> +
>>>> +    XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n");
>>>> +    return 0;
>>>> +}
>>>
>>> Then I don't see how it's supposed to work.
>>> Doesn't i915 look for an isa bridge?
>>>
>>>          /*
>>>           * The reason to probe ISA bridge instead of Dev31:Fun0 is to
>>>           * make graphics device passthrough work easy for VMM, that only
>>>           * need to expose ISA bridge to let driver know the real hardware
>>>           * underneath. This is a requirement from virtualization team.
>>>           *
>>>           * In some virtualized environments (e.g. XEN), there is irrelevant
>>>           * ISA bridge in the system. To work reliably, we should scan trhough
>>>           * all the ISA bridge devices and check for the first match, instead
>>>           * of only checking the first one.
>>>           */
>>>          while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
>>>                  if (pch->vendor == PCI_VENDOR_ID_INTEL) {
>>>                          unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK;
>>>                          dev_priv->pch_id = id;
>>>
>>>                          if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {
>>>                                  dev_priv->pch_type = PCH_IBX;
>>>
>>> etc
>>>
>>
>> I already post this to mainline to change as follows:
>>
>> -	while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
>> +	pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
>> +	if (pch) {
>
> I see - responded on that mail - but I thought the point is to make
> existing guests run? In fact you said so explicitly.
>
>
>> Please refer to this,
>>
>> [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of
>> check class type
>>
>> Linux Native guys would like to accept this. And actually Windows always use
>> devfn to detect this.
>
> In fact I see this:
>
>      linux 2.6.35-3.9 probes the 1st IA bridge
>
>          no idea how would you fix this.
>          try changing default class for the main bridge?
>
>      linux 3.10 probes all ISA bridges
>
>          requires your stub to be the isa bridge?
>
>
> I don't see why are driver guys so willing to do crazy things.  They
> want to match specific device/vendor id pairs, why don't they do just
> that? Why poke at class, random slot numbers etc etc?

AFAIK what they did is from our previous incorrect requirement as I 
understand. So we need to correct this.

Thanks
Tiejun

>
>
>
>>>
>>>
>>>>> This is already not something that exists on real hardware.
>>>>> So it might break some guests that will get confused.
>>>>> Maybe we are lucky and most guests see an unfamiliar device
>>>>> and ignore it. It seems believable.
>>>>>
>>>>> But your MCH hack overrides registers in the pci host.
>>>>
>>>> We just try to write *one* register we already confirm this is safe enough.
>>>
>>> This should go in code in form of comments:
>>> document what this register does on 440fx
>>> and why it's safe to override.
>>> We don't see what you
>>> confirmed off-line.
>>
>> That offset is one specific to IGD usage, not for i440fx common. This is why
>> we need to expose something in the host bridge. They're just introduced to
>> support IGD.
>>
>>>
>>>> Other register are read-only.
>>>
>>> Doesn't matter, need to document these as well.
>>
>> I think everything are covered in igd_pci_write()/igd_pci_write().
>>
>>>
>>>>> Are we lucky and there's nothing in these registers
>>>>> of interest to guests? This seems much more fragile.
>>>>> So please poke at the spec, and compile the list
>>>>> of registers you want to touch, figure out why they are
>>>>> safe to override, and put this all in code comments.
>>>>>
>>>>> And the same thing that applies to the isa bridge
>>>>
>>>> We just expose its own vendor/device ids here. We don't access to change
>>>> anything in the isa bridge.
>>>>
>>>>> applies here too. It should work without QEMU touching
>>>>> hosts' hardware.
>>>>>
>>>>>
>>>>>
>>>>>> >from sysfs: device and vendor id are world readable, so just get them
>>>>>>> directly and not through xen wrappers, this way you can open the files
>>>>>>> RO and not RW.
>>>>>>> You seem to poke at revision as well, I don't see
>>>>>>> driver looking at that - strictly necessary?
>>>>>>> If yes please patch host kernel to expose that info in sysfs,
>>>>>>> though we can fall back on pci config if not there.
>>>>>>>
>>>>>>> MCH (bridge_dev) hacks in i915 are nastier.
>>>>>>> To clean them up, we really have to have an explicit driver for this
>>>>>>> bridge, not a pass-through device.  Long term, the right thing to do is
>>>>>>> likely to extend host driver and expose the necessary information in
>>>>>>> sysfs on host kernel.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I'm a bit confused. Any sysfs should be filled by the associated PCIe
>>>>>> device, shouldn't it? So qemu still need to emulate this PCIe device
>>>>>> firstly, then set properties into sysfs.
>>>>>
>>>>> I am talking about getting host properties into qemu.
>>>>> You don't want to give qemu R/W root access to host sysfs system
>>>>> of the root bridge, that's not secure.
>>>>
>>>> igd_pci_read()
>>>> 	|
>>>> 	+ xen_host_pci_get_block()
>>>> 		|
>>>> 		+ xen_host_pci_config_read(()
>>>> 			|
>>>> 			+  pread()
>>>>
>>>> So shouldn't we already expose these information via sysfs?
>>>
>>> That's poking at sysfs of a real device, and after opening it RW.
>>
>> I don't think we can really write anything to those read-only sysfs
>> interface even we open them as RW.
>>
>> Tiejun
>>
>>>
>>>>> Avoiding read only access to filesystem is a good idea too, so it
>>>>> should be possible to pass all parameters in as
>>>>> device properties, and let whoever starts qemu
>>>>> figure out what are reasonable values.
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> #2 phase: Now, we will choose a capability ID that won't be conflicting with
>>>>>>>> others. To do this properly, we need to get one from PCI SIG group. To have
>>>>>>>> this workable and consistently validated, this method shouldn't be virt
>>>>>>>> specific. Then native driver should use the same method.
>>>>>>>
>>>>>>> You mean you will be able to talk sense into hardware guys?
>>>>>>> I doubt that. If they could be convinced to make e.g. i915 base a
>>>>>>
>>>>>> We're negotiating this, so this is just our long term solution we figure out
>>>>>> currently.
>>>>>>
>>>>>>> proper BAR, why didn't they?
>>>>>>
>>>>>> We already have no any free BAR as we mentioned previously.
>>>>>
>>>>> I thought you were talking about modifying hardware?
>>>>
>>>> Yes.
>>>>
>>>> Tiejun
>>>
>>> And they can't figure out how to stick extra info in an existing BAR?
>>> Oh well, that's hardware for you.
>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> So when xen work on
>>>>>>>> q35 PCIe machine, we can walk this way.
>>>>>>>
>>>>>>> If you are emulating MCH anyway, pick one that is close
>>>>>>> to what i915 driver expects. It would then work with existing
>>>>>>
>>>>>> Looks you guys prefer we create a new MCH anyway, right? But is it necessary
>>>>>> to create a new based on i440fx just for a little change?
>>>>>>
>>>>>> Thanks
>>>>>> Tiejun
>>>>>
>>>>> You can inherit it. Maybe you are lucky and this happens to
>>>>> work without conflicting with whatever other guests want to do.
>>>>> But if you ask me, you are really just piling up hacks.
>>>>> If some guest does not work on i440, you should just work on
>>>>> emulating whatever it does work on.
>>>>> That would have real value.
>>>>>
>>>>>>> devices, without new capability IDs.
>>>>>>>
>>>>>>>
>>>>>>>> Anthony,
>>>>>>>>
>>>>>>>> Any comments to address this in xen case?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Tiejun
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>>>
>>>
>