* Enable more than 255 VCPU support without irq remapping function in the guest @ 2016-04-26 16:14 Lan, Tianyu 2016-04-26 16:17 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Lan, Tianyu @ 2016-04-26 16:14 UTC (permalink / raw) To: pbonzini, kvm, yang.zhang.wz, tglx, gleb, mst, jan.kiszka, x86 Hi All: Recently I have been working on extending the maximum vCPU count beyond 256 on both KVM and Xen. Some HPC cases need that many vCPUs. This requires using x2APIC in the guest, which supports 32-bit APIC IDs. The Linux kernel requires the irq remapping function when enabling x2APIC if the maximum APIC ID is greater than 255 (for details see try_to_enable_x2apic()). The irq remapping function is what allows irqs to be delivered to CPUs beyond 255; the IOAPIC only has an 8-bit target APIC ID field and can only deliver irqs to CPUs 0~255. So far neither KVM nor Xen enables the irq remapping function. Enabling it looks like a huge job: it needs rework of the IO-APIC, local APIC and MSI parts, plus virtual VT-d support added to KVM. A quicker way to enable more than 256 vCPUs is to eliminate the dependency between irq remapping and x2APIC in the guest Linux kernel. So far I can boot the guest after removing the dependency. The side effect I see is that irqs can only be delivered to vCPUs 0~255, but 256 vCPUs seem enough to balance irq requests in the guest; in most cases there are few devices in the guest. I wonder whether this is feasible. There may be other side effects I didn't think of. Your comments would be very much appreciated. Tianyu Lan Best regards. ^ permalink raw reply [flat|nested] 25+ messages in thread
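The dependency described above can be summarized with a small toy model: without interrupt remapping, the guest kernel refuses x2APIC as soon as any APIC ID exceeds 255, because the IOAPIC/MSI destination field is only 8 bits wide. The C sketch below is purely illustrative; it is not the kernel's try_to_enable_x2apic(), and its identifiers are made up.

/* Toy model of the guest-side x2APIC decision described above.
 * Illustrative only; not the kernel's try_to_enable_x2apic(). */
#include <stdbool.h>
#include <stdio.h>

/* Without IR, IOAPIC/MSI can only address 8-bit APIC IDs (0..255). */
static bool x2apic_usable(bool irq_remapping, unsigned int max_apic_id)
{
        if (irq_remapping)
                return true;            /* IR can reach any 32-bit APIC ID */
        return max_apic_id <= 255;      /* otherwise stay within 8-bit IDs */
}

int main(void)
{
        /* A 288-vCPU guest has APIC IDs up to at least 287. */
        printf("288 vCPUs, no IR: x2APIC %s\n",
               x2apic_usable(false, 287) ? "usable" : "refused");
        printf("288 vCPUs, IR:    x2APIC %s\n",
               x2apic_usable(true, 287) ? "usable" : "refused");
        return 0;
}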
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-26 16:14 Enable more than 255 VCPU support without irq remapping function in the guest Lan, Tianyu @ 2016-04-26 16:17 ` Jan Kiszka 2016-04-26 16:49 ` Radim Krčmář 2016-04-27 5:15 ` Lan Tianyu 0 siblings, 2 replies; 25+ messages in thread From: Jan Kiszka @ 2016-04-26 16:17 UTC (permalink / raw) To: Lan, Tianyu, pbonzini, kvm, yang.zhang.wz, tglx, gleb, mst, x86, Radim Krčmář, Peter Xu On 2016-04-26 18:14, Lan, Tianyu wrote: > Hi All: > > Recently I am working on extending max vcpu to more than 256 on the both > KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to > use X2APIC in the guest which supports 32-bit APIC id. Linux kernel > requires irq remapping function during enabling X2APIC when max APIC id > is more than 255(More detail please see try_to_enable_x2apic()). > > The irq remapping function helps to deliver irq to cpu 255~. IOAPIC just > supports 8-bit target APIC id field and only can deliver irq to > cpu 0~255. > > So far both KVM/Xen doesn't enable irq remapping function. If enable the > function, it seems a huge job which need to rework IO-APIC, local APIC, > MSI parts and add virtual VTD support in the KVM. > > Other quick way to enable more than 256 VCPUs is to eliminate the > dependency between irq remapping and X2APIC in the guest linux kernel. > So far I can boot the guest after removing the dependency. > The side effect I thought is that irq only can deliver to 0~255 vcpus > but 256 vcpus seem enough to balance irq requests in the guest. In the > most cases, there are fewer devices in the guest. > > I wonder whether it's feasible. There maybe some other side effects I > didn't think of. Very appreciate for your comments. Radim is working on the KVM side already, Peter is currently driving the VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) Jan PS: Please no PV mess, at least without good reasons. -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-26 16:17 ` Jan Kiszka @ 2016-04-26 16:49 ` Radim Krčmář 2016-04-27 4:10 ` Yang Zhang 2016-04-27 5:39 ` Lan Tianyu 2016-04-27 5:15 ` Lan Tianyu 1 sibling, 2 replies; 25+ messages in thread From: Radim Krčmář @ 2016-04-26 16:49 UTC (permalink / raw) To: Jan Kiszka Cc: Lan, Tianyu, pbonzini, kvm, yang.zhang.wz, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov 2016-04-26 18:17+0200, Jan Kiszka: > On 2016-04-26 18:14, Lan, Tianyu wrote: >> Hi All: >> >> Recently I am working on extending max vcpu to more than 256 on the both >> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >> requires irq remapping function during enabling X2APIC when max APIC id >> is more than 255(More detail please see try_to_enable_x2apic()). Our of curiosity, how many VCPUs are you aiming at? >> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC just >> supports 8-bit target APIC id field and only can deliver irq to >> cpu 0~255. >> >> So far both KVM/Xen doesn't enable irq remapping function. If enable the >> function, it seems a huge job which need to rework IO-APIC, local APIC, >> MSI parts and add virtual VTD support in the KVM. >> >> Other quick way to enable more than 256 VCPUs is to eliminate the >> dependency between irq remapping and X2APIC in the guest linux kernel. >> So far I can boot the guest after removing the dependency. >> The side effect I thought is that irq only can deliver to 0~255 vcpus >> but 256 vcpus seem enough to balance irq requests in the guest. In the >> most cases, there are fewer devices in the guest. >> >> I wonder whether it's feasible. There maybe some other side effects I >> didn't think of. Very appreciate for your comments. > > Radim is working on the KVM side already, Peter is currently driving the > VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) + Igor extends QEMU to support more than 255 in internal structures and ACPI. What remains mostly untracked is Seabios/OVMF. > PS: Please no PV mess, at least without good reasons. Seconded. (If we designed all related devices as virtware, then it would not be that bad, but slightly modifying and putting hardware drivers into situations that cannot happen in hardware, not even in the spec, and then juggling the KVM side to make them work, is a road to hell.) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-26 16:49 ` Radim Krčmář @ 2016-04-27 4:10 ` Yang Zhang 2016-04-27 5:24 ` Jan Kiszka 2016-04-27 5:39 ` Lan Tianyu 1 sibling, 1 reply; 25+ messages in thread From: Yang Zhang @ 2016-04-27 4:10 UTC (permalink / raw) To: Radim Krčmář, Jan Kiszka Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016/4/27 0:49, Radim Krčmář wrote: > 2016-04-26 18:17+0200, Jan Kiszka: >> On 2016-04-26 18:14, Lan, Tianyu wrote: >>> Hi All: >>> >>> Recently I am working on extending max vcpu to more than 256 on the both >>> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>> requires irq remapping function during enabling X2APIC when max APIC id >>> is more than 255(More detail please see try_to_enable_x2apic()). > > Our of curiosity, how many VCPUs are you aiming at? > >>> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC just >>> supports 8-bit target APIC id field and only can deliver irq to >>> cpu 0~255. >>> >>> So far both KVM/Xen doesn't enable irq remapping function. If enable the >>> function, it seems a huge job which need to rework IO-APIC, local APIC, >>> MSI parts and add virtual VTD support in the KVM. >>> >>> Other quick way to enable more than 256 VCPUs is to eliminate the >>> dependency between irq remapping and X2APIC in the guest linux kernel. >>> So far I can boot the guest after removing the dependency. >>> The side effect I thought is that irq only can deliver to 0~255 vcpus >>> but 256 vcpus seem enough to balance irq requests in the guest. In the >>> most cases, there are fewer devices in the guest. >>> >>> I wonder whether it's feasible. There maybe some other side effects I >>> didn't think of. Very appreciate for your comments. >> >> Radim is working on the KVM side already, Peter is currently driving the >> VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) > > + Igor extends QEMU to support more than 255 in internal structures and > ACPI. What remains mostly untracked is Seabios/OVMF. If we don't want the interrupt from internal device delivers to CPU >255, do we still need the VT-d interrupt remapping emulation? I think firmware is able to send IPI to wakeup APs even without IR and OS is able to do it too. So basically, only KVM and Qemu's support is enough. > >> PS: Please no PV mess, at least without good reasons. > > Seconded. > > (If we designed all related devices as virtware, then it would not be > that bad, but slightly modifying and putting hardware drivers into > situations that cannot happen in hardware, not even in the spec, and > then juggling the KVM side to make them work, is a road to hell.) > -- best regards yang ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-27 4:10 ` Yang Zhang @ 2016-04-27 5:24 ` Jan Kiszka 2016-04-27 6:24 ` Lan Tianyu 2016-04-27 9:39 ` Yang Zhang 0 siblings, 2 replies; 25+ messages in thread From: Jan Kiszka @ 2016-04-27 5:24 UTC (permalink / raw) To: Yang Zhang, Radim Krčmář Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016-04-27 06:10, Yang Zhang wrote: > On 2016/4/27 0:49, Radim Krčmář wrote: >> 2016-04-26 18:17+0200, Jan Kiszka: >>> On 2016-04-26 18:14, Lan, Tianyu wrote: >>>> Hi All: >>>> >>>> Recently I am working on extending max vcpu to more than 256 on the >>>> both >>>> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >>>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>>> requires irq remapping function during enabling X2APIC when max APIC id >>>> is more than 255(More detail please see try_to_enable_x2apic()). >> >> Our of curiosity, how many VCPUs are you aiming at? >> >>>> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC >>>> just >>>> supports 8-bit target APIC id field and only can deliver irq to >>>> cpu 0~255. >>>> >>>> So far both KVM/Xen doesn't enable irq remapping function. If enable >>>> the >>>> function, it seems a huge job which need to rework IO-APIC, local APIC, >>>> MSI parts and add virtual VTD support in the KVM. >>>> >>>> Other quick way to enable more than 256 VCPUs is to eliminate the >>>> dependency between irq remapping and X2APIC in the guest linux kernel. >>>> So far I can boot the guest after removing the dependency. >>>> The side effect I thought is that irq only can deliver to 0~255 vcpus >>>> but 256 vcpus seem enough to balance irq requests in the guest. In the >>>> most cases, there are fewer devices in the guest. >>>> >>>> I wonder whether it's feasible. There maybe some other side effects I >>>> didn't think of. Very appreciate for your comments. >>> >>> Radim is working on the KVM side already, Peter is currently driving the >>> VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) >> >> + Igor extends QEMU to support more than 255 in internal structures and >> ACPI. What remains mostly untracked is Seabios/OVMF. > > If we don't want the interrupt from internal device delivers to CPU >>255, do we still need the VT-d interrupt remapping emulation? I think > firmware is able to send IPI to wakeup APs even without IR and OS is > able to do it too. So basically, only KVM and Qemu's support is enough. What are "internal devices" for you? And which OS do you know that would handle such artificial setups without prio massive patching? We do need VT-d IR emulation in order to present our guest a well specified and support architecture for running > 255 CPUs. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-27 5:24 ` Jan Kiszka @ 2016-04-27 6:24 ` Lan Tianyu 2016-04-27 6:56 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Lan Tianyu @ 2016-04-27 6:24 UTC (permalink / raw) To: Jan Kiszka, Yang Zhang, Radim Krčmář Cc: pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016年04月27日 13:24, Jan Kiszka wrote: >> > If we don't want the interrupt from internal device delivers to CPU >>> >>255, do we still need the VT-d interrupt remapping emulation? I think >> > firmware is able to send IPI to wakeup APs even without IR and OS is >> > able to do it too. So basically, only KVM and Qemu's support is enough. Yes, just starting more than 255 APs doesn't need IR. > What are "internal devices" for you? And which OS do you know that would > handle such artificial setups without prio massive patching? > > We do need VT-d IR emulation in order to present our guest a well > specified and support architecture for running > 255 CPUs. Changing the guest kernel would be a big concern. I found that commit ce69a784 already optimized the guest to use X2APIC without IR when APIC IDs are below 256, so I proposed my idea to get everyone's feedback on whether it's possible to relax the IR requirement when APIC IDs exceed 255 in the guest. commit ce69a784504222c3ab6f1b3c357d09ec5772127a Author: Gleb Natapov <gleb@redhat.com> Date: Mon Jul 20 15:24:17 2009 +0300 x86/apic: Enable x2APIC without interrupt remapping under KVM KVM would like to provide x2APIC interface to a guest without emulating interrupt remapping device. The reason KVM prefers guest to use x2APIC is that x2APIC interface is better virtualizable and provides better performance than mmio xAPIC interface: - msr exits are faster than mmio (no page table walk, emulation) - no need to read back ICR to look at the busy bit - one 64 bit ICR write instead of two 32 bit writes - shared code with the Hyper-V paravirt interface Included patch changes x2APIC enabling logic to enable it even if IR initialization failed, but kernel runs under KVM and no apic id is greater than 255 (if there is one spec requires BIOS to move to x2apic mode before starting an OS). It's great to know Peter is already working on the IR. -- Best regards Tianyu Lan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-27 6:24 ` Lan Tianyu @ 2016-04-27 6:56 ` Jan Kiszka 0 siblings, 0 replies; 25+ messages in thread From: Jan Kiszka @ 2016-04-27 6:56 UTC (permalink / raw) To: Lan Tianyu, Yang Zhang, Radim Krčmář Cc: pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016-04-27 08:24, Lan Tianyu wrote: > On 2016年04月27日 13:24, Jan Kiszka wrote: >>>> If we don't want the interrupt from internal device delivers to CPU >>>>>> 255, do we still need the VT-d interrupt remapping emulation? I think >>>> firmware is able to send IPI to wakeup APs even without IR and OS is >>>> able to do it too. So basically, only KVM and Qemu's support is enough. > > Yes, just starting more than 255 APs doesn't need IR. > >> What are "internal devices" for you? And which OS do you know that would >> handle such artificial setups without prio massive patching? >> >> We do need VT-d IR emulation in order to present our guest a well >> specified and support architecture for running > 255 CPUs. > > Changing guest kernel will be big concern. I found commit ce69a784 did > optimization to use X2APIC without IR in the guest when APIC id is less > than 256 and so I proposed my idea to see everyone's feedback. Whether > it's possible to relax the IR requirement when APIC id > 255 in the guest. You can't do that easily because you can't address those additional CPUs from *any* device then, only via IPIs. That means, Linux would have to be changed to only set up IRQ affinity masks in the 0-254 range. I suppose you would even have to patch tools like irqbalanced to not issue mask changes via /proc that include larger CPU IDs. Practically not feasible, already on Linux. Not to speak of other guest OSes. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-27 5:24 ` Jan Kiszka 2016-04-27 6:24 ` Lan Tianyu @ 2016-04-27 9:39 ` Yang Zhang 2016-04-27 9:45 ` Jan Kiszka 1 sibling, 1 reply; 25+ messages in thread From: Yang Zhang @ 2016-04-27 9:39 UTC (permalink / raw) To: Jan Kiszka, Radim Krčmář Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016/4/27 13:24, Jan Kiszka wrote: > On 2016-04-27 06:10, Yang Zhang wrote: >> On 2016/4/27 0:49, Radim Krčmář wrote: >>> 2016-04-26 18:17+0200, Jan Kiszka: >>>> On 2016-04-26 18:14, Lan, Tianyu wrote: >>>>> Hi All: >>>>> >>>>> Recently I am working on extending max vcpu to more than 256 on the >>>>> both >>>>> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >>>>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>>>> requires irq remapping function during enabling X2APIC when max APIC id >>>>> is more than 255(More detail please see try_to_enable_x2apic()). >>> >>> Our of curiosity, how many VCPUs are you aiming at? >>> >>>>> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC >>>>> just >>>>> supports 8-bit target APIC id field and only can deliver irq to >>>>> cpu 0~255. >>>>> >>>>> So far both KVM/Xen doesn't enable irq remapping function. If enable >>>>> the >>>>> function, it seems a huge job which need to rework IO-APIC, local APIC, >>>>> MSI parts and add virtual VTD support in the KVM. >>>>> >>>>> Other quick way to enable more than 256 VCPUs is to eliminate the >>>>> dependency between irq remapping and X2APIC in the guest linux kernel. >>>>> So far I can boot the guest after removing the dependency. >>>>> The side effect I thought is that irq only can deliver to 0~255 vcpus >>>>> but 256 vcpus seem enough to balance irq requests in the guest. In the >>>>> most cases, there are fewer devices in the guest. >>>>> >>>>> I wonder whether it's feasible. There maybe some other side effects I >>>>> didn't think of. Very appreciate for your comments. >>>> >>>> Radim is working on the KVM side already, Peter is currently driving the >>>> VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) >>> >>> + Igor extends QEMU to support more than 255 in internal structures and >>> ACPI. What remains mostly untracked is Seabios/OVMF. >> >> If we don't want the interrupt from internal device delivers to CPU >>> 255, do we still need the VT-d interrupt remapping emulation? I think >> firmware is able to send IPI to wakeup APs even without IR and OS is >> able to do it too. So basically, only KVM and Qemu's support is enough. > > What are "internal devices" for you? And which OS do you know that would > handle such artificial setups without prio massive patching? Sorry, a typo. I mean the external devices of IOAPIC/MSI/MSIX. Doesn't current Linux use x2apic without IR in VM? > > We do need VT-d IR emulation in order to present our guest a well > specified and support architecture for running > 255 CPUs. I mean in Tianyu's case, if he doesn't care about to deliver external interrupt to CPU >255, IR is not required. -- best regards yang ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-27 9:39 ` Yang Zhang @ 2016-04-27 9:45 ` Jan Kiszka 2016-04-28 1:11 ` Yang Zhang 0 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2016-04-27 9:45 UTC (permalink / raw) To: Yang Zhang, Radim Krčmář Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016-04-27 11:39, Yang Zhang wrote: > On 2016/4/27 13:24, Jan Kiszka wrote: >> On 2016-04-27 06:10, Yang Zhang wrote: >>> On 2016/4/27 0:49, Radim Krčmář wrote: >>>> 2016-04-26 18:17+0200, Jan Kiszka: >>>>> On 2016-04-26 18:14, Lan, Tianyu wrote: >>>>>> Hi All: >>>>>> >>>>>> Recently I am working on extending max vcpu to more than 256 on the >>>>>> both >>>>>> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >>>>>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>>>>> requires irq remapping function during enabling X2APIC when max >>>>>> APIC id >>>>>> is more than 255(More detail please see try_to_enable_x2apic()). >>>> >>>> Our of curiosity, how many VCPUs are you aiming at? >>>> >>>>>> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC >>>>>> just >>>>>> supports 8-bit target APIC id field and only can deliver irq to >>>>>> cpu 0~255. >>>>>> >>>>>> So far both KVM/Xen doesn't enable irq remapping function. If enable >>>>>> the >>>>>> function, it seems a huge job which need to rework IO-APIC, local >>>>>> APIC, >>>>>> MSI parts and add virtual VTD support in the KVM. >>>>>> >>>>>> Other quick way to enable more than 256 VCPUs is to eliminate the >>>>>> dependency between irq remapping and X2APIC in the guest linux >>>>>> kernel. >>>>>> So far I can boot the guest after removing the dependency. >>>>>> The side effect I thought is that irq only can deliver to 0~255 vcpus >>>>>> but 256 vcpus seem enough to balance irq requests in the guest. In >>>>>> the >>>>>> most cases, there are fewer devices in the guest. >>>>>> >>>>>> I wonder whether it's feasible. There maybe some other side effects I >>>>>> didn't think of. Very appreciate for your comments. >>>>> >>>>> Radim is working on the KVM side already, Peter is currently >>>>> driving the >>>>> VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) >>>> >>>> + Igor extends QEMU to support more than 255 in internal structures and >>>> ACPI. What remains mostly untracked is Seabios/OVMF. >>> >>> If we don't want the interrupt from internal device delivers to CPU >>>> 255, do we still need the VT-d interrupt remapping emulation? I think >>> firmware is able to send IPI to wakeup APs even without IR and OS is >>> able to do it too. So basically, only KVM and Qemu's support is enough. >> >> What are "internal devices" for you? And which OS do you know that would >> handle such artificial setups without prio massive patching? > > Sorry, a typo. I mean the external devices of IOAPIC/MSI/MSIX. Doesn't > current Linux use x2apic without IR in VM? If and only if there only need to be 254 CPUs to be addressed. > >> >> We do need VT-d IR emulation in order to present our guest a well >> specified and support architecture for running > 255 CPUs. > > I mean in Tianyu's case, if he doesn't care about to deliver external > interrupt to CPU >255, IR is not required. What matters is the guest OS. See my other reply on this why this doesn't work, even for Linux. 
Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-27 9:45 ` Jan Kiszka @ 2016-04-28 1:11 ` Yang Zhang 2016-04-28 6:54 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Yang Zhang @ 2016-04-28 1:11 UTC (permalink / raw) To: Jan Kiszka, Radim Krčmář Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016/4/27 17:45, Jan Kiszka wrote: > On 2016-04-27 11:39, Yang Zhang wrote: >> On 2016/4/27 13:24, Jan Kiszka wrote: >>> On 2016-04-27 06:10, Yang Zhang wrote: >>>> On 2016/4/27 0:49, Radim Krčmář wrote: >>>>> 2016-04-26 18:17+0200, Jan Kiszka: >>>>>> On 2016-04-26 18:14, Lan, Tianyu wrote: >>>>>>> Hi All: >>>>>>> >>>>>>> Recently I am working on extending max vcpu to more than 256 on the >>>>>>> both >>>>>>> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >>>>>>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>>>>>> requires irq remapping function during enabling X2APIC when max >>>>>>> APIC id >>>>>>> is more than 255(More detail please see try_to_enable_x2apic()). >>>>> >>>>> Our of curiosity, how many VCPUs are you aiming at? >>>>> >>>>>>> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC >>>>>>> just >>>>>>> supports 8-bit target APIC id field and only can deliver irq to >>>>>>> cpu 0~255. >>>>>>> >>>>>>> So far both KVM/Xen doesn't enable irq remapping function. If enable >>>>>>> the >>>>>>> function, it seems a huge job which need to rework IO-APIC, local >>>>>>> APIC, >>>>>>> MSI parts and add virtual VTD support in the KVM. >>>>>>> >>>>>>> Other quick way to enable more than 256 VCPUs is to eliminate the >>>>>>> dependency between irq remapping and X2APIC in the guest linux >>>>>>> kernel. >>>>>>> So far I can boot the guest after removing the dependency. >>>>>>> The side effect I thought is that irq only can deliver to 0~255 vcpus >>>>>>> but 256 vcpus seem enough to balance irq requests in the guest. In >>>>>>> the >>>>>>> most cases, there are fewer devices in the guest. >>>>>>> >>>>>>> I wonder whether it's feasible. There maybe some other side effects I >>>>>>> didn't think of. Very appreciate for your comments. >>>>>> >>>>>> Radim is working on the KVM side already, Peter is currently >>>>>> driving the >>>>>> VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) >>>>> >>>>> + Igor extends QEMU to support more than 255 in internal structures and >>>>> ACPI. What remains mostly untracked is Seabios/OVMF. >>>> >>>> If we don't want the interrupt from internal device delivers to CPU >>>>> 255, do we still need the VT-d interrupt remapping emulation? I think >>>> firmware is able to send IPI to wakeup APs even without IR and OS is >>>> able to do it too. So basically, only KVM and Qemu's support is enough. >>> >>> What are "internal devices" for you? And which OS do you know that would >>> handle such artificial setups without prio massive patching? >> >> Sorry, a typo. I mean the external devices of IOAPIC/MSI/MSIX. Doesn't >> current Linux use x2apic without IR in VM? > > If and only if there only need to be 254 CPUs to be addressed. > >> >>> >>> We do need VT-d IR emulation in order to present our guest a well >>> specified and support architecture for running > 255 CPUs. >> >> I mean in Tianyu's case, if he doesn't care about to deliver external >> interrupt to CPU >255, IR is not required. > > What matters is the guest OS. See my other reply on this why this > doesn't work, even for Linux. 
Since there are only a few devices in his case, setting the irq affinity manually is enough. -- best regards yang ^ permalink raw reply [flat|nested] 25+ messages in thread
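"Setting the irq affinity manually" here means writing a CPU mask to /proc/irq/<N>/smp_affinity in the guest. A minimal sketch follows; the IRQ number and mask are placeholders, and, as Jan notes above, without interrupt remapping such masks would have to stay within CPUs 0-254.

/* Minimal sketch of steering one guest IRQ by hand: write a hex CPU
 * bitmask to /proc/irq/<N>/smp_affinity. IRQ 24 and the mask "f" are
 * placeholders; needs root inside the guest. */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/irq/24/smp_affinity", "w");

        if (!f) {
                perror("fopen /proc/irq/24/smp_affinity");
                return 1;
        }
        fputs("f\n", f);        /* hex mask 0xf: allow CPUs 0-3 */
        fclose(f);
        return 0;
}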
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-28 1:11 ` Yang Zhang @ 2016-04-28 6:54 ` Jan Kiszka 2016-04-28 15:32 ` Radim Krčmář 0 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2016-04-28 6:54 UTC (permalink / raw) To: Yang Zhang, Radim Krčmář Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016-04-28 03:11, Yang Zhang wrote: > On 2016/4/27 17:45, Jan Kiszka wrote: >> On 2016-04-27 11:39, Yang Zhang wrote: >>> On 2016/4/27 13:24, Jan Kiszka wrote: >>>> On 2016-04-27 06:10, Yang Zhang wrote: >>>>> On 2016/4/27 0:49, Radim Krčmář wrote: >>>>>> 2016-04-26 18:17+0200, Jan Kiszka: >>>>>>> On 2016-04-26 18:14, Lan, Tianyu wrote: >>>>>>>> Hi All: >>>>>>>> >>>>>>>> Recently I am working on extending max vcpu to more than 256 on the >>>>>>>> both >>>>>>>> KVM/Xen. For some HPC cases, it needs many vcpus. The job >>>>>>>> requires to >>>>>>>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>>>>>>> requires irq remapping function during enabling X2APIC when max >>>>>>>> APIC id >>>>>>>> is more than 255(More detail please see try_to_enable_x2apic()). >>>>>> >>>>>> Our of curiosity, how many VCPUs are you aiming at? >>>>>> >>>>>>>> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC >>>>>>>> just >>>>>>>> supports 8-bit target APIC id field and only can deliver irq to >>>>>>>> cpu 0~255. >>>>>>>> >>>>>>>> So far both KVM/Xen doesn't enable irq remapping function. If >>>>>>>> enable >>>>>>>> the >>>>>>>> function, it seems a huge job which need to rework IO-APIC, local >>>>>>>> APIC, >>>>>>>> MSI parts and add virtual VTD support in the KVM. >>>>>>>> >>>>>>>> Other quick way to enable more than 256 VCPUs is to eliminate the >>>>>>>> dependency between irq remapping and X2APIC in the guest linux >>>>>>>> kernel. >>>>>>>> So far I can boot the guest after removing the dependency. >>>>>>>> The side effect I thought is that irq only can deliver to 0~255 >>>>>>>> vcpus >>>>>>>> but 256 vcpus seem enough to balance irq requests in the guest. In >>>>>>>> the >>>>>>>> most cases, there are fewer devices in the guest. >>>>>>>> >>>>>>>> I wonder whether it's feasible. There maybe some other side >>>>>>>> effects I >>>>>>>> didn't think of. Very appreciate for your comments. >>>>>>> >>>>>>> Radim is working on the KVM side already, Peter is currently >>>>>>> driving the >>>>>>> VT-d interrupt emulation topic in QEMU. It's in reach, I would >>>>>>> say. :) >>>>>> >>>>>> + Igor extends QEMU to support more than 255 in internal >>>>>> structures and >>>>>> ACPI. What remains mostly untracked is Seabios/OVMF. >>>>> >>>>> If we don't want the interrupt from internal device delivers to CPU >>>>>> 255, do we still need the VT-d interrupt remapping emulation? I think >>>>> firmware is able to send IPI to wakeup APs even without IR and OS is >>>>> able to do it too. So basically, only KVM and Qemu's support is >>>>> enough. >>>> >>>> What are "internal devices" for you? And which OS do you know that >>>> would >>>> handle such artificial setups without prio massive patching? >>> >>> Sorry, a typo. I mean the external devices of IOAPIC/MSI/MSIX. Doesn't >>> current Linux use x2apic without IR in VM? >> >> If and only if there only need to be 254 CPUs to be addressed. >> >>> >>>> >>>> We do need VT-d IR emulation in order to present our guest a well >>>> specified and support architecture for running > 255 CPUs. 
>>> >>> I mean in Tianyu's case, if he doesn't care about to deliver external >>> interrupt to CPU >255, IR is not required. >> >> What matters is the guest OS. See my other reply on this why this >> doesn't work, even for Linux. > > Since there only few devices in his case, set the irq affinity manually > is enough. Ah, wait - are we talking about emulating the Xeon Phi architecture in QEMU, accelerated by KVM? Then maybe you can point to a more detailed description of its interrupt architecture than that rather vague "Xeon Phi Coprocessor System Software Developers Guide" I was just looking at provides. While the Phi may not have VT-d internally, it still has a need to translate incoming MSI/MSI-X messages (via that PEG port?) to something that can address more than 255 APIC IDs, no? Possibly, you only need an extended KVM kernel interface for the Phi that allows injecting APIC interrupts to more than 255 CPUs. That interface has to be designed anyway, for normal x86 systems, and is what Radim was talking about. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-28 6:54 ` Jan Kiszka @ 2016-04-28 15:32 ` Radim Krčmář 2016-04-29 2:09 ` Yang Zhang 0 siblings, 1 reply; 25+ messages in thread From: Radim Krčmář @ 2016-04-28 15:32 UTC (permalink / raw) To: Jan Kiszka Cc: Yang Zhang, Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov 2016-04-28 08:54+0200, Jan Kiszka: > On 2016-04-28 03:11, Yang Zhang wrote: >> On 2016/4/27 17:45, Jan Kiszka wrote: >>> On 2016-04-27 11:39, Yang Zhang wrote: >>>> I mean in Tianyu's case, if he doesn't care about to deliver external >>>> interrupt to CPU >255, IR is not required. >>> >>> What matters is the guest OS. See my other reply on this why this >>> doesn't work, even for Linux. >> >> Since there only few devices in his case, set the irq affinity manually >> is enough. You could configure non-IPIs to work, but we want to create options that are hard to break. > Ah, wait - are we talking about emulating the Xeon Phi architecture in > QEMU, accelerated by KVM? Knights Landing will also be manufactured as a CPU, hopefully without many peculiarities. I think we are talking about extending KVM's IR-less x2APIC, when standard x2APIC is the future. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-28 15:32 ` Radim Krčmář @ 2016-04-29 2:09 ` Yang Zhang 2016-04-29 3:01 ` Nadav Amit 2016-04-29 4:59 ` Jan Kiszka 0 siblings, 2 replies; 25+ messages in thread From: Yang Zhang @ 2016-04-29 2:09 UTC (permalink / raw) To: Radim Krčmář, Jan Kiszka Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016/4/28 23:32, Radim Krčmář wrote: > 2016-04-28 08:54+0200, Jan Kiszka: >> On 2016-04-28 03:11, Yang Zhang wrote: >>> On 2016/4/27 17:45, Jan Kiszka wrote: >>>> On 2016-04-27 11:39, Yang Zhang wrote: >>>>> I mean in Tianyu's case, if he doesn't care about to deliver external >>>>> interrupt to CPU >255, IR is not required. >>>> >>>> What matters is the guest OS. See my other reply on this why this >>>> doesn't work, even for Linux. >>> >>> Since there only few devices in his case, set the irq affinity manually >>> is enough. > > You could configure non-IPIs to work, but we want to create options that > are hard to break. > >> Ah, wait - are we talking about emulating the Xeon Phi architecture in >> QEMU, accelerated by KVM? > > Knights Landing will also be manufactured as a CPU, hopefully without > many peculiarities. > > I think we are talking about extending KVM's IR-less x2APIC, when > standard x2APIC is the future. Yes, Since IR is only useful for the external device, and 255 CPUs is enough to handle the interrupts from external devices. Besides, i think virtual VT-d will bring extra performance impaction for devices, so if IR-less X2APIC also works well with more than 255 VCPUs, maybe extending KVM with IR-less x2apic is not a bad idea. -- best regards yang ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-29 2:09 ` Yang Zhang @ 2016-04-29 3:01 ` Nadav Amit 2016-05-03 1:34 ` Yang Zhang 2016-04-29 4:59 ` Jan Kiszka 1 sibling, 1 reply; 25+ messages in thread From: Nadav Amit @ 2016-04-29 3:01 UTC (permalink / raw) To: Yang Zhang Cc: Radim Krčmář, Jan Kiszka, Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov Yang Zhang <yang.zhang.wz@gmail.com> wrote: > On 2016/4/28 23:32, Radim Krčmář wrote: >> I think we are talking about extending KVM's IR-less x2APIC, when >> standard x2APIC is the future. > > Yes, Since IR is only useful for the external device, and 255 CPUs is enough to handle the interrupts from external devices. Besides, i think virtual VT-d will bring extra performance impaction for devices, so if IR-less X2APIC also works well with more than 255 VCPUs, maybe extending KVM with IR-less x2apic is not a bad idea. So will you use x2APIC physical mode in this system? Try not to send a multicast IPI to 400 cores in the VM... Nadav ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-29 3:01 ` Nadav Amit @ 2016-05-03 1:34 ` Yang Zhang 0 siblings, 0 replies; 25+ messages in thread From: Yang Zhang @ 2016-05-03 1:34 UTC (permalink / raw) To: Nadav Amit Cc: Radim Krčmář, Jan Kiszka, Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016/4/29 11:01, Nadav Amit wrote: > Yang Zhang <yang.zhang.wz@gmail.com> wrote: > >> On 2016/4/28 23:32, Radim Krčmář wrote: >>> I think we are talking about extending KVM's IR-less x2APIC, when >>> standard x2APIC is the future. >> >> Yes, Since IR is only useful for the external device, and 255 CPUs is enough to handle the interrupts from external devices. Besides, i think virtual VT-d will bring extra performance impaction for devices, so if IR-less X2APIC also works well with more than 255 VCPUs, maybe extending KVM with IR-less x2apic is not a bad idea. > > So will you use x2APIC physical mode in this system? Probably, cluster mode is the better choice. > Try not to send a multicast IPI to 400 cores in the VM... Yes, a multicast IPI to so many cores is a disaster in VM, like flush_tlb_others(). -- best regards yang ^ permalink raw reply [flat|nested] 25+ messages in thread
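As background for the physical-vs-cluster question: in x2APIC cluster mode the 32-bit logical ID carries a 16-bit cluster number plus a one-hot bitmap of at most 16 CPUs, so a single "multicast" ICR write only reaches CPUs inside one cluster, which is why an IPI to hundreds of cores turns into many writes. A small sketch of the architectural mapping (the helper name is made up):

/* x2APIC cluster-mode logical ID: bits 31:16 = cluster (APIC ID / 16),
 * bits 15:0 = one-hot position within the cluster (APIC ID % 16).
 * Helper name is illustrative. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t x2apic_logical_id(uint32_t apic_id)
{
        return ((apic_id >> 4) << 16) | (1u << (apic_id & 0xf));
}

int main(void)
{
        /* APIC ID 287 (the 288th CPU if IDs are contiguous) -> cluster 17, bit 15. */
        printf("APIC ID 287 -> logical ID 0x%08" PRIx32 "\n",
               x2apic_logical_id(287));
        return 0;
}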
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-29 2:09 ` Yang Zhang 2016-04-29 3:01 ` Nadav Amit @ 2016-04-29 4:59 ` Jan Kiszka 2016-05-03 1:52 ` Yang Zhang 1 sibling, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2016-04-29 4:59 UTC (permalink / raw) To: Yang Zhang, Radim Krčmář Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016-04-29 04:09, Yang Zhang wrote: > On 2016/4/28 23:32, Radim Krčmář wrote: >> 2016-04-28 08:54+0200, Jan Kiszka: >>> On 2016-04-28 03:11, Yang Zhang wrote: >>>> On 2016/4/27 17:45, Jan Kiszka wrote: >>>>> On 2016-04-27 11:39, Yang Zhang wrote: >>>>>> I mean in Tianyu's case, if he doesn't care about to deliver external >>>>>> interrupt to CPU >255, IR is not required. >>>>> >>>>> What matters is the guest OS. See my other reply on this why this >>>>> doesn't work, even for Linux. >>>> >>>> Since there only few devices in his case, set the irq affinity manually >>>> is enough. >> >> You could configure non-IPIs to work, but we want to create options that >> are hard to break. >> >>> Ah, wait - are we talking about emulating the Xeon Phi architecture in >>> QEMU, accelerated by KVM? >> >> Knights Landing will also be manufactured as a CPU, hopefully without >> many peculiarities. >> >> I think we are talking about extending KVM's IR-less x2APIC, when >> standard x2APIC is the future. > > Yes, Since IR is only useful for the external device, and 255 CPUs is > enough to handle the interrupts from external devices. Besides, i think > virtual VT-d will bring extra performance impaction for devices, so if > IR-less X2APIC also works well with more than 255 VCPUs, maybe extending > KVM with IR-less x2apic is not a bad idea. IR-less x2APIC for guest architectures that are expected to provide IR remains a bad idea, at least until we have hard numbers what this suspected performance impact actually is. Unless you update IRQ affinities an insane rates, the impact should not be relevant because remapping results are cached (for the irqfd hot-path) or you are already taking the long way (userspace device emulation). Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-29 4:59 ` Jan Kiszka @ 2016-05-03 1:52 ` Yang Zhang 2016-05-03 2:03 ` Nadav Amit 0 siblings, 1 reply; 25+ messages in thread From: Yang Zhang @ 2016-05-03 1:52 UTC (permalink / raw) To: Jan Kiszka, Radim Krčmář Cc: Lan, Tianyu, pbonzini, kvm, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov On 2016/4/29 12:59, Jan Kiszka wrote: > On 2016-04-29 04:09, Yang Zhang wrote: >> On 2016/4/28 23:32, Radim Krčmář wrote: >>> 2016-04-28 08:54+0200, Jan Kiszka: >>>> On 2016-04-28 03:11, Yang Zhang wrote: >>>>> On 2016/4/27 17:45, Jan Kiszka wrote: >>>>>> On 2016-04-27 11:39, Yang Zhang wrote: >>>>>>> I mean in Tianyu's case, if he doesn't care about to deliver external >>>>>>> interrupt to CPU >255, IR is not required. >>>>>> >>>>>> What matters is the guest OS. See my other reply on this why this >>>>>> doesn't work, even for Linux. >>>>> >>>>> Since there only few devices in his case, set the irq affinity manually >>>>> is enough. >>> >>> You could configure non-IPIs to work, but we want to create options that >>> are hard to break. >>> >>>> Ah, wait - are we talking about emulating the Xeon Phi architecture in >>>> QEMU, accelerated by KVM? >>> >>> Knights Landing will also be manufactured as a CPU, hopefully without >>> many peculiarities. >>> >>> I think we are talking about extending KVM's IR-less x2APIC, when >>> standard x2APIC is the future. >> >> Yes, Since IR is only useful for the external device, and 255 CPUs is >> enough to handle the interrupts from external devices. Besides, i think >> virtual VT-d will bring extra performance impaction for devices, so if >> IR-less X2APIC also works well with more than 255 VCPUs, maybe extending >> KVM with IR-less x2apic is not a bad idea. > > IR-less x2APIC for guest architectures that are expected to provide IR > remains a bad idea, at least until we have hard numbers what this > suspected performance impact actually is. Unless you update IRQ > affinities an insane rates, the impact should not be relevant because > remapping results are cached (for the irqfd hot-path) or you are already > taking the long way (userspace device emulation). I think it is not only interrupt. There must have the DMAR emulation and the cost for DMA is heavy in VM(DMA operations are very frequently). I cannot remember whether there are strong dependency in hardware between DMAR and IR(I know IR is relying on QI). Even hardware dependency is ok, is it ok for OS running in hardware with IR but without DMAR? -- best regards yang ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-05-03 1:52 ` Yang Zhang @ 2016-05-03 2:03 ` Nadav Amit 2016-05-03 4:55 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Nadav Amit @ 2016-05-03 2:03 UTC (permalink / raw) To: Yang Zhang Cc: Jan Kiszka, Radim Krčmář, Lan, Tianyu, Paolo Bonzini, kvm, Thomas Gleixner, mst, x86, Peter Xu, Igor Mammedov Yang Zhang <yang.zhang.wz@gmail.com> wrote: > I think it is not only interrupt. There must have the DMAR emulation and > the cost for DMA is heavy in VM(DMA operations are very frequently). I > cannot remember whether there are strong dependency in hardware between > DMAR and IR(I know IR is relying on QI). Even hardware dependency is ok, > is it ok for OS running in hardware with IR but without DMAR? Do you know a way for the IOMMU to report that DMAR is disabled, while IR is enabled? Anyhow, the VM can use IOMMU passthrough mode to avoid most IOMMU overhead. Regardless, a recent patch-set should improve DMAR performance considerably [1]. Regards, Nadav [1] https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg12386.html ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-05-03 2:03 ` Nadav Amit @ 2016-05-03 4:55 ` Jan Kiszka 2016-05-04 1:46 ` Yang Zhang 0 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2016-05-03 4:55 UTC (permalink / raw) To: Nadav Amit, Yang Zhang Cc: Radim Krčmář, Lan, Tianyu, Paolo Bonzini, kvm, Thomas Gleixner, mst, x86, Peter Xu, Igor Mammedov On 2016-05-03 04:03, Nadav Amit wrote: > Yang Zhang <yang.zhang.wz@gmail.com> wrote: > >> I think it is not only interrupt. There must have the DMAR emulation and >> the cost for DMA is heavy in VM(DMA operations are very frequently). I >> cannot remember whether there are strong dependency in hardware between >> DMAR and IR(I know IR is relying on QI). Even hardware dependency is ok, >> is it ok for OS running in hardware with IR but without DMAR? > > Do you know a way for the IOMMU to report that DMAR is disabled, while IR > is enabled? The hardware cannot decide about disabling this, but the guest can, of course. In fact, you can even configure Linux to have DMAR off by default until you pass "intel_iommu=on" on the command line (I think distros still do this - at least they used to). No idea about other OSes, though. > > Anyhow, the VM can use IOMMU passthrough mode to avoid most IOMMU overhead. > Regardless, a recent patch-set should improve DMAR performance > considerably [1]. The bottleneck with emulated DMAR is rather in QEMU. But DMAR can be almost as cheap as IR once we get it running for VFIO and vhost: both need proper caching because they do not work with QEMU in the loop for each and every DMA transfer. Still no need to deviate from physical hardware. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-05-03 4:55 ` Jan Kiszka @ 2016-05-04 1:46 ` Yang Zhang 2016-05-04 1:56 ` Nadav Amit 2016-05-04 5:38 ` Jan Kiszka 0 siblings, 2 replies; 25+ messages in thread From: Yang Zhang @ 2016-05-04 1:46 UTC (permalink / raw) To: Jan Kiszka, Nadav Amit Cc: Radim Krčmář, Lan, Tianyu, Paolo Bonzini, kvm, Thomas Gleixner, mst, x86, Peter Xu, Igor Mammedov On 2016/5/3 12:55, Jan Kiszka wrote: > On 2016-05-03 04:03, Nadav Amit wrote: >> Yang Zhang <yang.zhang.wz@gmail.com> wrote: >> >>> I think it is not only interrupt. There must have the DMAR emulation and >>> the cost for DMA is heavy in VM(DMA operations are very frequently). I >>> cannot remember whether there are strong dependency in hardware between >>> DMAR and IR(I know IR is relying on QI). Even hardware dependency is ok, >>> is it ok for OS running in hardware with IR but without DMAR? >> >> Do you know a way for the IOMMU to report that DMAR is disabled, while IR >> is enabled? > > The hardware cannot decide about disabling this, but the guest can, of > course. In fact, you can even configure Linux to have DMAR off by > default until you pass "intel_iommu=on" on the command line (I think > distros still do this - at least they used to). No idea about other > OSes, though. If we can disable DMAR in guest, it should be enough. > >> >> Anyhow, the VM can use IOMMU passthrough mode to avoid most IOMMU overhead. >> Regardless, a recent patch-set should improve DMAR performance >> considerably [1]. > > The bottleneck with emulated DMAR is rather in QEMU. But DMAR can be > almost as cheap as IR once we get it running for VFIO and vhost: both > need proper caching because they do not work with QEMU in the loop for > each and every DMA transfer. Still no need to deviate from physical > hardware. Sorry, i don't know detail about how VFIO and vhost work with IR. But it seems hard to do proper caching since DMA allocations are very frequently in Linux unless we move the whole iommu emulation to kernel. Another idea is using two iommus: one for Qemu and one for device in kernel like vfio and vhost. I did the similar thing in Xen in several years ago which uses two iommus solution and it works well in my experiment environment. Besides, this solution is easy for nested device pass-through. The Page 32 of [1] has more detail. [1] http://docplayer.net/10559370-Nested-virtualization.html -- best regards yang ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-05-04 1:46 ` Yang Zhang @ 2016-05-04 1:56 ` Nadav Amit 2016-05-04 5:38 ` Jan Kiszka 1 sibling, 0 replies; 25+ messages in thread From: Nadav Amit @ 2016-05-04 1:56 UTC (permalink / raw) To: Yang Zhang Cc: Jan Kiszka, Radim Krčmář, Lan, Tianyu, Paolo Bonzini, kvm, Thomas Gleixner, mst, x86, Peter Xu, Igor Mammedov Yang Zhang <yang.zhang.wz@gmail.com> wrote: > On 2016/5/3 12:55, Jan Kiszka wrote: >> On 2016-05-03 04:03, Nadav Amit wrote: >> >> The bottleneck with emulated DMAR is rather in QEMU. But DMAR can be >> almost as cheap as IR once we get it running for VFIO and vhost: both >> need proper caching because they do not work with QEMU in the loop for >> each and every DMA transfer. Still no need to deviate from physical >> hardware. > > Sorry, i don't know detail about how VFIO and vhost work with IR. But it seems hard to do proper caching since DMA allocations are very frequently in Linux unless we move the whole iommu emulation to kernel. Another idea is using two iommus: one for Qemu and one for device in kernel like vfio and vhost. I did the similar thing in Xen in several years ago which uses two iommus solution and it works well in my experiment environment. Besides, this solution is easy for nested device pass-through. The Page 32 of [1] has more detail. > > [1] http://docplayer.net/10559370-Nested-virtualization.html I did a similar work as well several years ago [2], and achieved similar results. The problem with these results is that you don’t show the CPU utilization. Sure, for 1GBE netperf it might be fine, but I am not sure it would be useful for more demanding tasks. Isn’t it possible to use the PASID as a sort of virtual function? Regards, Nadav [2] https://www.usenix.org/legacy/event/atc11/tech/final_files/Amit.pdf ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-05-04 1:46 ` Yang Zhang 2016-05-04 1:56 ` Nadav Amit @ 2016-05-04 5:38 ` Jan Kiszka 1 sibling, 0 replies; 25+ messages in thread From: Jan Kiszka @ 2016-05-04 5:38 UTC (permalink / raw) To: Yang Zhang, Nadav Amit Cc: Radim Krčmář, Lan, Tianyu, Paolo Bonzini, kvm, Thomas Gleixner, mst, x86, Peter Xu, Igor Mammedov On 2016-05-04 03:46, Yang Zhang wrote: > On 2016/5/3 12:55, Jan Kiszka wrote: >> On 2016-05-03 04:03, Nadav Amit wrote: >>> >>> Anyhow, the VM can use IOMMU passthrough mode to avoid most IOMMU >>> overhead. >>> Regardless, a recent patch-set should improve DMAR performance >>> considerably [1]. >> >> The bottleneck with emulated DMAR is rather in QEMU. But DMAR can be >> almost as cheap as IR once we get it running for VFIO and vhost: both >> need proper caching because they do not work with QEMU in the loop for >> each and every DMA transfer. Still no need to deviate from physical >> hardware. > > Sorry, i don't know detail about how VFIO and vhost work with IR. But it > seems hard to do proper caching since DMA allocations are very > frequently in Linux unless we move the whole iommu emulation to kernel. There is technically no reason for Linux to reprogram the DMAR units unless it changes partitioning (or really wants to enforce strict DMA containment for each device). You can surely tune this to no updates at all for the guest Linux under normal operations. Jan -- Siemens AG, Corporate Technology, CT RDA ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-26 16:49 ` Radim Krčmář 2016-04-27 4:10 ` Yang Zhang @ 2016-04-27 5:39 ` Lan Tianyu 2016-04-27 14:38 ` Radim Krčmář 1 sibling, 1 reply; 25+ messages in thread From: Lan Tianyu @ 2016-04-27 5:39 UTC (permalink / raw) To: Radim Krčmář, Jan Kiszka Cc: pbonzini, kvm, yang.zhang.wz, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov Hi Radim: On 2016年04月27日 00:49, Radim Krčmář wrote: > 2016-04-26 18:17+0200, Jan Kiszka: >> On 2016-04-26 18:14, Lan, Tianyu wrote: >>> Hi All: >>> >>> Recently I am working on extending max vcpu to more than 256 on the both >>> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>> requires irq remapping function during enabling X2APIC when max APIC id >>> is more than 255(More detail please see try_to_enable_x2apic()). > > Our of curiosity, how many VCPUs are you aiming at? I think it's 1024. In the short term, we hope the hypervisor supports at least 288 vCPUs, because the Xeon Phi chip already supports 288 logical CPUs. As hardware develops there will be more logical CPUs, and we hope one guest can use all of the CPU resources on the chip to meet HPC requirements. > >>> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC just >>> supports 8-bit target APIC id field and only can deliver irq to >>> cpu 0~255. >>> >>> So far both KVM/Xen doesn't enable irq remapping function. If enable the >>> function, it seems a huge job which need to rework IO-APIC, local APIC, >>> MSI parts and add virtual VTD support in the KVM. >>> >>> Other quick way to enable more than 256 VCPUs is to eliminate the >>> dependency between irq remapping and X2APIC in the guest linux kernel. >>> So far I can boot the guest after removing the dependency. >>> The side effect I thought is that irq only can deliver to 0~255 vcpus >>> but 256 vcpus seem enough to balance irq requests in the guest. In the >>> most cases, there are fewer devices in the guest. >>> >>> I wonder whether it's feasible. There maybe some other side effects I >>> didn't think of. Very appreciate for your comments. >> >> Radim is working on the KVM side already, Peter is currently driving the >> VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) > > + Igor extends QEMU to support more than 255 in internal structures and > ACPI. What remains mostly untracked is Seabios/OVMF. Thanks for your information. How about the KVM X2APIC part? Do you have a patch to extend KVM X2APIC to support 32-bit APIC IDs? > >> PS: Please no PV mess, at least without good reasons. > > Seconded. > > (If we designed all related devices as virtware, then it would not be > that bad, but slightly modifying and putting hardware drivers into > situations that cannot happen in hardware, not even in the spec, and > then juggling the KVM side to make them work, is a road to hell.) > -- Best regards Tianyu Lan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-27 5:39 ` Lan Tianyu @ 2016-04-27 14:38 ` Radim Krčmář 0 siblings, 0 replies; 25+ messages in thread From: Radim Krčmář @ 2016-04-27 14:38 UTC (permalink / raw) To: Lan Tianyu Cc: Jan Kiszka, pbonzini, kvm, yang.zhang.wz, tglx, gleb, mst, x86, Peter Xu, Igor Mammedov 2016-04-27 13:39+0800, Lan Tianyu: > On 2016年04月27日 00:49, Radim Krčmář wrote: >> 2016-04-26 18:17+0200, Jan Kiszka: >>> On 2016-04-26 18:14, Lan, Tianyu wrote: >>>> Hi All: >>>> >>>> Recently I am working on extending max vcpu to more than 256 on the both >>>> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >>>> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >>>> requires irq remapping function during enabling X2APIC when max APIC id >>>> is more than 255(More detail please see try_to_enable_x2apic()). >> >> Our of curiosity, how many VCPUs are you aiming at? > > I think it's 1024. > > In the short term, we hope hypervisor at least supports 288 vcpus > because Xeon phi chip already supports 288 logical cpus. As hardware > development, there will be more logical cpus and we hope one guest can > totally uses all cpu resources on the chip to meet HPC requirement. Thanks, I think KVM will start by bumping the hard VCPU limit to 512 or 1024, with recommended maximum being 288. You'll be able to raise the hard limit just by configuing and recompiling. > How about KVM X2APIC part? Do you have patch > to extend KVM X2APIC to support 32-bit APIC ID? I do, in limbo, as QEMU cannot create VCPUs with higher APIC IDs, yet. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Enable more than 255 VCPU support without irq remapping function in the guest 2016-04-26 16:17 ` Jan Kiszka 2016-04-26 16:49 ` Radim Krčmář @ 2016-04-27 5:15 ` Lan Tianyu 1 sibling, 0 replies; 25+ messages in thread From: Lan Tianyu @ 2016-04-27 5:15 UTC (permalink / raw) To: Jan Kiszka, pbonzini, kvm, yang.zhang.wz, tglx, gleb, mst, x86, Radim Krčmář, Peter Xu On 2016年04月27日 00:17, Jan Kiszka wrote: > On 2016-04-26 18:14, Lan, Tianyu wrote: >> Hi All: >> >> Recently I am working on extending max vcpu to more than 256 on the both >> KVM/Xen. For some HPC cases, it needs many vcpus. The job requires to >> use X2APIC in the guest which supports 32-bit APIC id. Linux kernel >> requires irq remapping function during enabling X2APIC when max APIC id >> is more than 255(More detail please see try_to_enable_x2apic()). >> >> The irq remapping function helps to deliver irq to cpu 255~. IOAPIC just >> supports 8-bit target APIC id field and only can deliver irq to >> cpu 0~255. >> >> So far both KVM/Xen doesn't enable irq remapping function. If enable the >> function, it seems a huge job which need to rework IO-APIC, local APIC, >> MSI parts and add virtual VTD support in the KVM. >> >> Other quick way to enable more than 256 VCPUs is to eliminate the >> dependency between irq remapping and X2APIC in the guest linux kernel. >> So far I can boot the guest after removing the dependency. >> The side effect I thought is that irq only can deliver to 0~255 vcpus >> but 256 vcpus seem enough to balance irq requests in the guest. In the >> most cases, there are fewer devices in the guest. >> >> I wonder whether it's feasible. There maybe some other side effects I >> didn't think of. Very appreciate for your comments. > > Radim is working on the KVM side already, Peter is currently driving the > VT-d interrupt emulation topic in QEMU. It's in reach, I would say. :) Oh. Thanks for your information. Very helpful :) > > Jan > > PS: Please no PV mess, at least without good reasons. > -- Best regards Tianyu Lan ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread
Thread overview: 25+ messages
2016-04-26 16:14 Enable more than 255 VCPU support without irq remapping function in the guest Lan, Tianyu
2016-04-26 16:17 ` Jan Kiszka
2016-04-26 16:49 ` Radim Krčmář
2016-04-27 4:10 ` Yang Zhang
2016-04-27 5:24 ` Jan Kiszka
2016-04-27 6:24 ` Lan Tianyu
2016-04-27 6:56 ` Jan Kiszka
2016-04-27 9:39 ` Yang Zhang
2016-04-27 9:45 ` Jan Kiszka
2016-04-28 1:11 ` Yang Zhang
2016-04-28 6:54 ` Jan Kiszka
2016-04-28 15:32 ` Radim Krčmář
2016-04-29 2:09 ` Yang Zhang
2016-04-29 3:01 ` Nadav Amit
2016-05-03 1:34 ` Yang Zhang
2016-04-29 4:59 ` Jan Kiszka
2016-05-03 1:52 ` Yang Zhang
2016-05-03 2:03 ` Nadav Amit
2016-05-03 4:55 ` Jan Kiszka
2016-05-04 1:46 ` Yang Zhang
2016-05-04 1:56 ` Nadav Amit
2016-05-04 5:38 ` Jan Kiszka
2016-04-27 5:39 ` Lan Tianyu
2016-04-27 14:38 ` Radim Krčmář
2016-04-27 5:15 ` Lan Tianyu