* [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
From: Laszlo Ersek @ 2016-02-25 12:44 UTC
To: Michael S. Tsirkin, Igor Mammedov; +Cc: qemu devel list, Gerd Hoffmann

Hi,

On 02/25/16 12:57, Michael S. Tsirkin wrote:
> ----- Forwarded message from Igor Mammedov <imammedo@redhat.com> -----
>
> Date: Thu, 11 Feb 2016 16:16:05 +0100
> From: Igor Mammedov <imammedo@redhat.com>
> To: "Michael S. Tsirkin" <mst@redhat.com>
> To: lersek@redhat.com
> Subject: on pci rebalancing
> Message-ID: <20160211161605.0022ed38@nial.brq.redhat.com>
> In-Reply-To: <20160209131656-mutt-send-email-mst@redhat.com>
>
>>>>> For PCI rebalance to work on Windows, one has to provide working PCI driver
>>>>> otherwise OS will ignore it when rebalancing happens and
>>>>> might map something else over ignored BAR.
>>>>
>>>> Does it disable the BAR then? Or just move it elsewhere?
>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
>>> another device with driver over it.
>>
>> Interesting. On classical PCI this is a forbidden configuration.
>> Maybe we do something that confuses windows?
>> Could you tell me how to reproduce this behaviour?
> #cat > t << EOF
> pci_update_mappings_del
> pci_update_mappings_add
> EOF
>
> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>   -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>   -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>   -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>
> wait till OS boots, note BARs programmed for ivshmem
> in my case it was
>   01:01.0 0,0xfe800000+0x100
> then execute script and watch pci_update_mappings* trace events
>
> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>
> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> and then programs new BARs, where:
>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> creates overlapping BAR with ivshmem

Michael informed me of this on IRC (and forwarded this email to me). I hope to start a new thread with my response. (I also reedited the subject fully.)

So, to summarize what I said on IRC first. The situation where firmware recognizes and enables a PCI device, hands control to the OS, and then the OS lacks a driver for the PCI device, is completely normal and expected. For UEFI specifically, I can name a general argument and a specific argument.

The general argument is that the actions that need to be taken in ExitBootServices() callbacks do not include clearing the IO or MMIO decode bits in PCI device command registers. Command register manipulation happens when a PCI device driver (one that conforms to the UEFI driver model) *binds* or *unbinds* a device. And unbinding a device is not possible in an ExitBootServices() callback, if only because such callbacks are forbidden from modifying the memory map, whereas unbinding would release allocated memory.

So what we use such callbacks for is aborting in-flight, outstanding DMA-like transfers. Resetting virtio devices is also an example (think outstanding receive requests for virtio-net).

Now let's move on to the specific argument I mentioned above. The Graphics Output Protocol (GOP) is a UEFI abstraction that was specifically designed for the case when the operating system doesn't yet have a display driver installed, but the user obviously has to use the display somehow. The GOP is most frequently provided on top of an EFI_PCI_IO_PROTOCOL instance; meaning simply that the "GOP driver" is a UEFI driver that drives a PCI device. In short, the driver provides the GOP on top of a PCI device.

Now, the GOP is supposed to communicate the pixel format and the frame buffer base address for the currently active graphics mode to the software that consumes the GOP. This includes UEFI applications of course (think a boot loader putting up a splash screen or an animation), but importantly, the runtime OS is *also* supposed to inherit these characteristics from boot services time. The OS can then use simple unaccelerated MMIO writes to display things on the screen, until the user installs an accelerated driver.

(Concrete example: this is why you can see *anything at all* on the screen, when you run e.g. Windows Server 2012 R2 on top of OVMF and a QXL display, before installing the QXL WDDM driver in the guest.)

Clearly, the frame buffer base address communicated through the GOP points into one of the MMIO BARs of the PCI device. If, at ExitBootServices(), MMIO decoding were disabled for the PCI device that underlies the GOP, that would *completely* defeat the GOP design. The OS's attempt to poke at those MMIO addresses would be futile -- and in fact the OS has no idea what PCI device (if any) the framebuffer is supposed to be related to. This is the jurisdiction of the OS-level display driver -- if one exists and is installed.

So, this is a Windows bug in my opinion. Even when there is no OS-level driver, a PCI device is fully expected to be decoding resources, if the firmware brought it up.

--*--

Okay, so Michael asked me to try to reproduce the above with OVMF, and see what happens. Unfortunately I'm not really knowledgeable about ivshmem, hotplug, et cetera. Let me instead tell Igor about using OVMF.

(1) Please follow the instructions on Gerd's page <https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64" package.

(2) Create a separate directory for testing. In this directory, run the following command:

  cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd

Also create a disk image for your new guest, etc.

(3) Use the following command line snippet to work with OVMF:

  qemu-system-x86_64 \
    -machine accel=kvm \
    -smp cpus=2 \
    -m 2048 \
    \
    -debugcon file:ovmf.debug.log \
    -global isa-debugcon.iobase=0x402 \
    \
    -device qxl-vga \
    \
    -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
    -drive if=pflash,format=raw,unit=1,file=myvars.fd \
    \
    [your options here]

You can of course customize the # of VCPUs, memory size, disks, CD-ROMs, network, and so on.

Recommended: when you use the -device option to add the disk and the CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the "bootindex" property. OVMF will adhere to the boot order. It is recommended to set bootindex=0 for your main disk, bootindex=1 for your OS installer CD-ROM, and *no* bootindex for your virtio-win driver disk. This way at first boot (with no OS installed) OVMF will boot the installer CD-ROM. Further boots (with the same command line) will boot the installed OS.

Caveat: I never used the -snapshot option with OVMF virtual machines; it might or might not work.

Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows running on OVMF many months ago, but I can't tell off-hand if it will work right now.

Thanks
Laszlo
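
The overlap reported in Igor's trace output can be checked mechanically. What follows is only a minimal sketch, assuming the BAR is printed in the "<bus>:<dev>.<fn> <bar>,<base>+<size>" form seen in the two trace excerpts quoted above; parse_mapping and overlaps are hypothetical helper names for illustration, not part of QEMU.

```python
import re

# Assumed layout of the BAR part of a trace line, taken from the excerpts
# quoted in this thread: "<bus>:<dev>.<fn> <bar>,<base>+<size>".
TRACE_RE = re.compile(
    r"(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<bar>\d+),(?P<base>0x[0-9a-f]+)\+(?P<size>0x[0-9a-f]+)"
)


def parse_mapping(line):
    """Return (bdf, bar_index, start, end) with a half-open range, or None."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return m.group("bdf"), int(m.group("bar")), base, base + size


def overlaps(a, b):
    """True if the half-open address ranges of two mappings intersect."""
    return a[2] < b[3] and b[2] < a[3]


# The two mappings quoted in the forwarded message.
ivshmem = parse_mapping("01:01.0 0,0xfe800000+0x100")
e1000 = parse_mapping(
    "pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000"
)
print(overlaps(ivshmem, e1000))
```

Both ranges start at 0xfe800000, so the check reports an overlap for the pair from the trace. A mapping that begins exactly where another ends does not count as overlapping, because the ranges are treated as half-open.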
* Re: [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
From: Laszlo Ersek @ 2016-02-25 13:00 UTC
To: Michael S. Tsirkin, Igor Mammedov; +Cc: qemu devel list, Gerd Hoffmann

On 02/25/16 13:44, Laszlo Ersek wrote:
> [...]

I should also mention that you might not be able to reproduce the same
situation with the "ivshmem" device. Namely, if there is no UEFI driver
for that PCI device (and OVMF certainly doesn't have one), then its MMIO
and IO decoding bits will *never* be set. As I said, command register
massaging is the jurisdiction of the individual UEFI driver that
ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.

Therefore you should probably try to reproduce the issue with another
PCI device type that OVMF has a driver for, but Windows has none
(installed at least). I'm quite hard pressed to name such a device type,
unfortunately. :(

Perhaps one of the more obscure emulated NICs could work in place of
ivshmem. (The IPXE oproms provide UEFI drivers for those.)

Thanks
Laszlo
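
For reference, the "decoding bits" discussed in this message are the two lowest bits of the 16-bit PCI command register at config space offset 0x04. The sketch below only illustrates that bit layout; decode_command is a hypothetical helper name, and the sample values are made up for illustration rather than read from any guest in this thread.

```python
# Bit masks in the 16-bit PCI command register (config space offset 0x04).
PCI_COMMAND_IO = 0x1      # device responds to I/O space accesses
PCI_COMMAND_MEMORY = 0x2  # device responds to memory space (MMIO) accesses
PCI_COMMAND_MASTER = 0x4  # device may act as a bus master (DMA)


def decode_command(value):
    """Report which decode and bus-master bits are set in a command register value."""
    return {
        "io_decode": bool(value & PCI_COMMAND_IO),
        "mmio_decode": bool(value & PCI_COMMAND_MEMORY),
        "bus_master": bool(value & PCI_COMMAND_MASTER),
    }


# Illustrative values only: a device that no driver has enabled, and a
# device with I/O decode, MMIO decode and bus mastering all turned on.
print(decode_command(0x0000))
print(decode_command(0x0007))
```

A device whose command register still reads as zero in the low bits claims no IO or MMIO addresses, whatever its BARs contain, which is why a device left unbound by the firmware may not reproduce the overlap.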
* Re: [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
From: Michael S. Tsirkin @ 2016-02-25 13:30 UTC
To: Laszlo Ersek; +Cc: Igor Mammedov, qemu devel list, Gerd Hoffmann

On Thu, Feb 25, 2016 at 02:00:09PM +0100, Laszlo Ersek wrote:
> On 02/25/16 13:44, Laszlo Ersek wrote:
> > [...]
> >
> > Caveat: I never used the -snapshot option with OVMF virtual machines; it might or might not work.
> >
> > Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows running on OVMF many months ago, but I can't tell off-hand if it will work right now.
>
> I should also mention that you might not be able to reproduce the same
> situation with the "ivshmem" device. Namely, if there is no UEFI driver
> for that PCI device (and OVMF certainly doesn't have one), then its MMIO
> and IO decoding bits will *never* be set. As I said, command register
> massaging is the jurisdiction of the individual UEFI driver that
> ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.
>
> Therefore you should probably try to reproduce the issue with another
> PCI device type that OVMF has a driver for, but Windows has none
> (installed at least). I'm quite hard pressed to name such a device type,
> unfortunately. :(

virtio?

> Perhaps one of the more obscure emulated NICs could work in place of
> ivshmem. (The IPXE oproms provide UEFI drivers for those.)
>
> Thanks
> Laszlo
* Re: [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
From: Laszlo Ersek @ 2016-02-25 14:05 UTC
To: Michael S. Tsirkin; +Cc: Igor Mammedov, qemu devel list, Gerd Hoffmann

On 02/25/16 14:30, Michael S. Tsirkin wrote:
> On Thu, Feb 25, 2016 at 02:00:09PM +0100, Laszlo Ersek wrote:
>> On 02/25/16 13:44, Laszlo Ersek wrote:
>>> [...]
>>>
>>> Caveat: I never used the -snapshot option with OVMF virtual machines; it might or might not work.
>>>
>>> Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows running on OVMF many months ago, but I can't tell off-hand if it will work right now.
>>
>> I should also mention that you might not be able to reproduce the same
>> situation with the "ivshmem" device. Namely, if there is no UEFI driver
>> for that PCI device (and OVMF certainly doesn't have one), then its MMIO
>> and IO decoding bits will *never* be set. As I said, command register
>> massaging is the jurisdiction of the individual UEFI driver that
>> ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.
>>
>> Therefore you should probably try to reproduce the issue with another
>> PCI device type that OVMF has a driver for, but Windows has none
>> (installed at least). I'm quite hard pressed to name such a device type,
>> unfortunately. :(
>
> virtio?

... was my first thought as well, but OVMF at the moment supports only legacy (0.9.5) virtio-pci devices (and virtio-mmio only on AARCH64) -- those don't have MMIO BARs, only IO BARs.

Theoretically the Windows overlap issue should be triggerable with IO BARs just the same (resource - resource, right?), but I doubt it will be reproducible in practice.

Laszlo

>> Perhaps one of the more obscure emulated NICs could work in place of
>> ivshmem. (The IPXE oproms provide UEFI drivers for those.)
>>
>> Thanks
>> Laszlo
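
The IO BAR versus MMIO BAR distinction made in this reply is encoded in the low bits of each 32-bit BAR register. The following is a rough sketch of that encoding only; classify_bar is a hypothetical helper name and the raw values are invented for illustration, not taken from a virtio or ivshmem device.

```python
def classify_bar(raw):
    """Classify a raw 32-bit BAR register value as an IO or MMIO BAR."""
    if raw & 0x1:
        # Bit 0 set: I/O space BAR; the base address is in bits 31:2.
        return {"kind": "io", "base": raw & ~0x3 & 0xFFFFFFFF}
    mem_type = (raw >> 1) & 0x3  # bits 2:1: 0 = 32-bit, 2 = 64-bit
    return {
        "kind": "mmio",
        "width": {0: 32, 2: 64}.get(mem_type),  # None for other encodings
        "prefetchable": bool(raw & 0x8),        # bit 3
        # Low 32 bits of the base; a 64-bit BAR keeps the upper half
        # in the next BAR register.
        "base": raw & ~0xF & 0xFFFFFFFF,
    }


# Illustrative values only.
print(classify_bar(0x0000C001))  # an I/O BAR based at 0xc000
print(classify_bar(0xFE800000))  # a 32-bit non-prefetchable MMIO BAR
```

Either kind of BAR describes an address range that the OS must keep disjoint from other devices' ranges, which matches the "resource - resource" remark above.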
* Re: [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
  2016-02-25 14:05       ` Laszlo Ersek
@ 2016-02-25 14:18         ` Michael S. Tsirkin
  2016-02-25 17:37           ` Laszlo Ersek
  0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2016-02-25 14:18 UTC
  To: Laszlo Ersek; +Cc: Igor Mammedov, qemu devel list, Gerd Hoffmann

On Thu, Feb 25, 2016 at 03:05:08PM +0100, Laszlo Ersek wrote:
> On 02/25/16 14:30, Michael S. Tsirkin wrote:
> > On Thu, Feb 25, 2016 at 02:00:09PM +0100, Laszlo Ersek wrote:
> >> On 02/25/16 13:44, Laszlo Ersek wrote:
> >>> Hi,
> >>>
> >>> On 02/25/16 12:57, Michael S. Tsirkin wrote:
> >>>> ----- Forwarded message from Igor Mammedov <imammedo@redhat.com> -----
> >>>>
> >>>> Date: Thu, 11 Feb 2016 16:16:05 +0100
> >>>> From: Igor Mammedov <imammedo@redhat.com>
> >>>> To: "Michael S. Tsirkin" <mst@redhat.com>
> >>>> To: lersek@redhat.com
> >>>> Subject: on pci rebalancing
> >>>> Message-ID: <20160211161605.0022ed38@nial.brq.redhat.com>
> >>>> In-Reply-To: <20160209131656-mutt-send-email-mst@redhat.com>
> >>>>
> >>>>>>>> For PCI rebalance to work on Windows, one has to provide working PCI driver
> >>>>>>>> otherwise OS will ignore it when rebalancing happens and
> >>>>>>>> might map something else over ignored BAR.
> >>>>>>>
> >>>>>>> Does it disable the BAR then? Or just move it elsewhere?
> >>>>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
> >>>>>> another device with driver over it.
> >>>>>
> >>>>> Interesting. On classical PCI this is a forbidden configuration.
> >>>>> Maybe we do something that confuses windows?
> >>>>> Could you tell me how to reproduce this behaviour?
> >>>> #cat > t << EOF
> >>>> pci_update_mappings_del
> >>>> pci_update_mappings_add
> >>>> EOF
> >>>>
> >>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>>> -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>>> -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>>> -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>>>
> >>>> wait till OS boots, note BARs programmed for ivshmem
> >>>> in my case it was
> >>>> 01:01.0 0,0xfe800000+0x100
> >>>> then execute script and watch pci_update_mappings* trace events
> >>>>
> >>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>>>
> >>>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> >>>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> >>>> and then programs new BARs, where:
> >>>> pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >>>> creates overlapping BAR with ivshmem
> >>>
> >>> Michael informed me of this on IRC (and forwarded this email to me). I hope to start a new thread with my response. (I also reedited the subject fully.)
> >>>
> >>> So, to summarize what I said on IRC first. The situation where firmware recognizes and enables a PCI device, hands control to the OS, and then the OS lacks a driver for the PCI device, is completely normal and expected. For UEFI specifically, I can name a general argument and a specific argument.
> >>>
> >>> The general argument is that actions that need to be taken in ExitBootServices() callbacks do not include clearing IO or MMIO decode bits in PCI device command registers. Command register manipulation happens when a PCI device driver (that conforms to the UEFI driver model) *binds* or *unbinds* a device.
And unbinding a device is not possible in the ExitBootServices() callback, minimally because such callbacks are forbidden from modifying the memory map -- but unbinding would release allocated memory.
> >>>
> >>> So what we use such callbacks for is aborting in-flight, outstanding DMA-like transfers. Re-setting virtio devices is also an example (think outstanding receive requests for virtio-net).
> >>>
> >>> Now let's move on to the specific argument I mentioned above. The Graphics Output Protocol (GOP) is a UEFI abstraction that was specifically designed with the case in mind when the operating system doesn't have a display driver -- yet installed --, but the user obviously has to use the display somehow. The GOP is most frequently provided on top of an EFI_PCI_IO_PROTOCOL instance; meaning simply that the "GOP driver" is a UEFI driver that drives a PCI device. In short, the driver provides the GOP on top of a PCI device.
> >>>
> >>> Now, the GOP is supposed to communicate the pixel format and the frame buffer base address for the currently active graphics mode to the software that consumes the GOP. This includes UEFI applications of course (think a boot loader putting up a splash screen or an animation), but importantly, the runtime OS is *also* supposed to inherit these characteristics from boot services time. The OS can then use simple unaccelerated MMIO writes to display things on the screen, until the user installs an accelerated driver.
> >>>
> >>> (Concrete example: this is why you can see *anything at all* on the screen, when you run e.g. Windows Server 2012 R2 on top of OVMF and a QXL display, before installing the QXL WDDM driver in the guest.)
> >>>
> >>> Clearly, the frame buffer base address communicated through the GOP points into one of the MMIO BARs of the PCI device. If, at ExitBootServices(), MMIO decoding were disabled for the PCI device that underlies the GOP, that would *completely* defeat the GOP design.
The OS's attempt to poke at those MMIO addresses would be futile -- and in fact the OS has no idea what PCI device (if any) the framebuffer is supposed to be related to. This is the jurisdiction of the OS-level display driver -- if one exists and is installed.
> >>>
> >>> So, this is a Windows bug in my opinion. Just because there is no OS-level driver, a PCI device is fully expected to be decoding resources, if the firmware brought it up.
> >>>
> >>> --*--
> >>>
> >>> Okay, so Michael asked me to try to reproduce the above with OVMF, and see what happens. Unfortunately I'm not really knowledgeable about ivshmem, hotplug, et cetera. Let me instead tell Igor about using OVMF.
> >>>
> >>> (1) Please follow the instructions on Gerd's page <https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64" package.
> >>>
> >>> (2) Create a separate directory for testing. In this directory, run the following command:
> >>>
> >>>   cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd
> >>>
> >>> Also create a disk image for your new guest, etc.
> >>>
> >>> (3) Use the following command line snippet to work with OVMF:
> >>>
> >>>   qemu-system-x86_64 \
> >>>     -machine accel=kvm \
> >>>     -smp cpus=2 \
> >>>     -m 2048 \
> >>>     \
> >>>     -debugcon file:ovmf.debug.log \
> >>>     -global isa-debugcon.iobase=0x402 \
> >>>     \
> >>>     -device qxl-vga \
> >>>     \
> >>>     -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
> >>>     -drive if=pflash,format=raw,unit=1,file=myvars.fd \
> >>>     \
> >>>     [your options here]
> >>>
> >>> You can of course customize the # of VCPUs, memory size, disks, CD-ROMs, network, and so on.
> >>>
> >>> Recommended: when you use the -device option to add the disk and the CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the "bootindex" property. OVMF will adhere to the boot order.
It is recommended to set bootindex=0 for your main disk, bootindex=1 for your OS installer CD-ROM, and *no* bootindex for your virtio-win driver disk. This way at first boot (with no OS installed) OVMF will boot the installer CD-ROM. Further boots (with the same command line) will boot the installed OS.
> >>>
> >>> Caveat: I never used the -snapshot option with OVMF virtual machines; it might or might not work.
> >>>
> >>> Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows running on OVMF many months ago, but I can't tell off-hand if it will work right now.
> >>
> >> I should also mention that you might not be able to reproduce the same
> >> situation with the "ivshmem" device. Namely, if there is no UEFI driver
> >> for that PCI device (and OVMF certainly doesn't have one), then its MMIO
> >> and IO decoding bits will *never* be set. As I said, command register
> >> massaging is the jurisdiction of the individual UEFI driver that
> >> ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.
> >>
> >> Therefore you should probably try to reproduce the issue with another
> >> PCI device type that OVMF has a driver for, but Windows has none
> >> (installed at least). I'm quite hard pressed to name such a device type,
> >> unfortunately. :(
> >
> > virtio?
>
> ... was my first thought as well, but OVMF at the moment supports only
> legacy (0.9.5) virtio-pci devices

Oh. We'll have to fix that too :(

> (and virtio-mmio only on AARCH64) --
> those don't have MMIO BARs, only IO BARs.

Well that's not exactly true - there is an MSI-X BAR.
Maybe OVMF does not enable that, though.

> Theoretically the Windows overlap issue should be triggerable with IO
> BARs just the same (resource - resource, right?), but I doubt it will be
> reproducible in practice.
>
> Laszlo
>
> >> Perhaps one of the more obscure emulated NICs could work in place of
> >> ivshmem. (The IPXE oproms provide UEFI drivers for those.)
> >>
> >> Thanks
> >> Laszlo
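The overlap in Igor's trace can be double-checked with plain interval arithmetic. This is a minimal sketch, assuming each `base+size` pair in the trace output describes a half-open range [base, base+size); `bars_overlap` is a hypothetical helper name, and the two ranges are the ivshmem and e1000 BARs quoted in the report.

```shell
# Hypothetical helper: succeed (exit status 0) if the two ranges
# [base1, base1+size1) and [base2, base2+size2) share at least one address.
# Arguments: base1 size1 base2 size2 (decimal or 0x-prefixed hex).
bars_overlap() {
    start1=$(( $1 )); end1=$(( $1 + $2 ))
    start2=$(( $3 )); end2=$(( $3 + $4 ))
    [ "$start1" -lt "$end2" ] && [ "$start2" -lt "$end1" ]
}

# ivshmem BAR 0 (01:01.0) versus the e1000 BAR 0 programmed at 01:11.0
if bars_overlap 0xfe800000 0x100 0xfe800000 0x20000; then
    echo "overlap"
else
    echo "no overlap"
fi
```

The same check could be fed from the pci_update_mappings_add trace lines to flag any later mapping that lands on a BAR which was never unmapped.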
* Re: [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
  2016-02-25 14:18         ` Michael S. Tsirkin
@ 2016-02-25 17:37           ` Laszlo Ersek
  2016-02-25 19:44             ` Michael S. Tsirkin
  0 siblings, 1 reply; 7+ messages in thread
From: Laszlo Ersek @ 2016-02-25 17:37 UTC
  To: Michael S. Tsirkin; +Cc: Igor Mammedov, qemu devel list, Gerd Hoffmann

On 02/25/16 15:18, Michael S. Tsirkin wrote:
> On Thu, Feb 25, 2016 at 03:05:08PM +0100, Laszlo Ersek wrote:
>> On 02/25/16 14:30, Michael S. Tsirkin wrote:

>>> virtio?
>>
>> ... was my first thought as well, but OVMF at the moment supports only
>> legacy (0.9.5) virtio-pci devices
>
> Oh. We'll have to fix that too :(

Yes, there's a BZ open about it. It's very big work. Due to independent
reasons, I skimmed the virtio 1.0 spec the other day, specifically for
seeing what it would take to port the OVMF drivers forward to virtio
1.0. It's going to be a *lot* of work.

>> (and virtio-mmio only on AARCH64) --
>> those don't have MMIO BARs, only IO BARs.
>
> Well that's not exactly true - there is an MSI-X BAR.
> Maybe OVMF does not enable that, though.

Correct.

The virtio stuff in OVMF adheres extremely closely to the 0.9.5 spec
(and the actual QEMU code was only studied when the guest wouldn't work
as described by the 0.9.5 spec -- this usually boiled down to silent
framing assumptions made by QEMU, and then the guest code was
accommodated), but the virtio code in OVMF is purposely absolutely
minimal, feature-wise.

I also looked up Gerd's virtio 1.0 patch series in the SeaBIOS git
history (from summer 2015, IIRC). It was extensive. Extrapolating from
that, you can imagine what it will take for OVMF.

Thanks
Laszlo
* Re: [Qemu-devel] how Windows treats BARs of driver-less devices when other devices are hotplugged
  2016-02-25 17:37           ` Laszlo Ersek
@ 2016-02-25 19:44             ` Michael S. Tsirkin
  0 siblings, 0 replies; 7+ messages in thread
From: Michael S. Tsirkin @ 2016-02-25 19:44 UTC
  To: Laszlo Ersek; +Cc: Igor Mammedov, qemu devel list, Gerd Hoffmann

On Thu, Feb 25, 2016 at 06:37:56PM +0100, Laszlo Ersek wrote:
> On 02/25/16 15:18, Michael S. Tsirkin wrote:
> > On Thu, Feb 25, 2016 at 03:05:08PM +0100, Laszlo Ersek wrote:
> >> On 02/25/16 14:30, Michael S. Tsirkin wrote:
>
> >>> virtio?
> >>
> >> ... was my first thought as well, but OVMF at the moment supports only
> >> legacy (0.9.5) virtio-pci devices
> >
> > Oh. We'll have to fix that too :(
>
> Yes, there's a BZ open about it. It's very big work. Due to independent
> reasons, I skimmed the virtio 1.0 spec the other day, specifically for
> seeing what it would take to port the OVMF drivers forward to virtio
> 1.0. It's going to be a *lot* of work.

A hint: review at least cs03 or latest draft csprd05.
First hit on google is ancient draft csprd01.

> >> (and virtio-mmio only on AARCH64) --
> >> those don't have MMIO BARs, only IO BARs.
> >
> > Well that's not exactly true - there is an MSI-X BAR.
> > Maybe OVMF does not enable that, though.
>
> Correct.
>
> The virtio stuff in OVMF adheres extremely closely to the 0.9.5 spec
> (and the actual QEMU code was only studied when the guest wouldn't work
> as described by the 0.9.5 spec -- this usually boiled down to silent
> framing assumptions made by QEMU, and then the guest code was
> accommodated), but the virtio code in OVMF is purposely absolutely
> minimal, feature-wise.
>
> I also looked up Gerd's virtio 1.0 patch series in the SeaBIOS git
> history (from summer 2015, IIRC). It was extensive. Extrapolating from
> that, you can imagine what it will take for OVMF.
>
> Thanks
> Laszlo

Basically the same amount as seabios I guess.
--
MST