From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: [PATCH, RFC, resend] x86: eliminate bogus IRQ restrictions
Date: Fri, 13 Aug 2010 14:26:03 +0100
Message-ID: <4C65640B020000780000FBAC@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=__PartFDD0D9FB.0__="
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

--=__PartFDD0D9FB.0__=
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

As pointed out in
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00077.html
the limits introduced in c/s 20072 are at least questionable. Eliminate
them in favor of a more dynamic approach: There's no real need for an
upper limit on nr_irqs (as anything beyond nr_irqs_gsi isn't visible to
domains anyway), and the split point (and hence ratio) between GSI and
MSI/MSI-X IRQs doesn't need to be hard coded, but can instead be
controlled on the command line in case there are *very* many GSIs.

The default used for nr_irqs will be rather large with this patch, so
it may not be acceptable without also switching to a sparse irq_desc[]
as was done not so long ago in Linux.

The added capping of any domain's nr_pirqs is based on the observation
that no domain can possibly have more than the system wide number of
IRQs.
The opposite case may in fact also require some adjustment:
Defaulting the number of non-GSI IRQs available (namely to Dom0) to a
fixed value may not be the best choice going forward, since if there
indeed are very many non-GSI interrupt sources, it won't be possible
for the kernel to make use of them without giving "extra_guest_irqs="
on the command line (but the goal should be to allow things to work
right by default even on large systems).

Signed-off-by: Jan Beulich

--- 2010-08-12.orig/xen/arch/x86/io_apic.c  2010-08-06 08:44:33.000000000 +0200
+++ 2010-08-12/xen/arch/x86/io_apic.c  2010-08-12 17:01:37.000000000 +0200
@@ -2503,6 +2503,9 @@ void dump_ioapic_irq_info(void)
 
 unsigned highest_gsi(void);
 
+static unsigned int __initdata max_gsi_irqs;
+integer_param("max_gsi_irqs", max_gsi_irqs);
+
 void __init init_ioapic_mappings(void)
 {
     unsigned long ioapic_phys;
@@ -2547,19 +2550,37 @@ void __init init_ioapic_mappings(void)
 
     nr_irqs_gsi = max(nr_irqs_gsi, highest_gsi());
 
+    if ( max_gsi_irqs == 0 )
+        max_gsi_irqs = nr_irqs ? nr_irqs / 8 : PAGE_SIZE;
+    else if ( nr_irqs != 0 && max_gsi_irqs > nr_irqs )
+    {
+        printk(XENLOG_WARNING "\"max_gsi_irqs=\" cannot be specified larger"
+                              " than \"nr_irqs=\"\n");
+        max_gsi_irqs = nr_irqs;
+    }
+    if ( max_gsi_irqs < 16 )
+        max_gsi_irqs = 16;
+
+    /* for PHYSDEVOP_pirq_eoi_gmfn guest assumptions */
+    if ( max_gsi_irqs > PAGE_SIZE * 8 )
+        max_gsi_irqs = PAGE_SIZE * 8;
+
     if ( !smp_found_config || skip_ioapic_setup || nr_irqs_gsi < 16 )
         nr_irqs_gsi = 16;
-    else if ( nr_irqs_gsi > MAX_GSI_IRQS)
+    else if ( nr_irqs_gsi > max_gsi_irqs )
     {
-        /* for PHYSDEVOP_pirq_eoi_gmfn guest assumptions */
-        printk(KERN_WARNING "Limiting number of GSI IRQs found (%u) to %lu\n",
-               nr_irqs_gsi, MAX_GSI_IRQS);
-        nr_irqs_gsi = MAX_GSI_IRQS;
+        printk(XENLOG_WARNING "Limiting to %u GSI IRQs (found %u)\n",
+               max_gsi_irqs, nr_irqs_gsi);
+        nr_irqs_gsi = max_gsi_irqs;
     }
 
-    if (nr_irqs < 2 * nr_irqs_gsi)
-        nr_irqs = 2 * nr_irqs_gsi;
-
-    if (nr_irqs > MAX_NR_IRQS)
-        nr_irqs = MAX_NR_IRQS;
+    if ( nr_irqs == 0 )
+        nr_irqs = cpu_has_apic ?
+                  max(16U + num_present_cpus() * NR_DYNAMIC_VECTORS,
+                      8 * nr_irqs_gsi) :
+                  nr_irqs_gsi;
+    else if ( nr_irqs < 16 )
+        nr_irqs = 16;
+    printk(XENLOG_INFO "IRQ limits: %u GSI, %u MSI/MSI-X\n",
+           nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
 }
--- 2010-08-12.orig/xen/arch/x86/irq.c  2010-07-05 08:49:19.000000000 +0200
+++ 2010-08-12/xen/arch/x86/irq.c  2010-08-12 17:01:37.000000000 +0200
@@ -29,7 +29,7 @@ int __read_mostly opt_noirqbalance = 0;
 boolean_param("noirqbalance", opt_noirqbalance);
 
 unsigned int __read_mostly nr_irqs_gsi = 16;
-unsigned int __read_mostly nr_irqs = 1024;
+unsigned int __read_mostly nr_irqs;
 integer_param("nr_irqs", nr_irqs);
 
 u8 __read_mostly *irq_vector;
--- 2010-08-12.orig/xen/common/domain.c  2010-08-06 08:44:34.000000000 +0200
+++ 2010-08-12/xen/common/domain.c  2010-08-12 17:01:37.000000000 +0200
@@ -274,6 +274,8 @@ struct domain *domain_create(
             d->nr_pirqs = nr_irqs_gsi + extra_domU_irqs;
         else
             d->nr_pirqs = nr_irqs_gsi + extra_dom0_irqs;
+        if ( d->nr_pirqs > nr_irqs )
+            d->nr_pirqs = nr_irqs;
 
         d->pirq_to_evtchn = xmalloc_array(u16, d->nr_pirqs);
         d->pirq_mask = xmalloc_array(
--- 2010-08-12.orig/xen/include/asm-x86/irq.h  2010-07-05 13:42:16.000000000 +0200
+++ 2010-08-12/xen/include/asm-x86/irq.h  2010-08-12 17:01:37.000000000 +0200
@@ -23,9 +23,6 @@
 #define irq_to_desc(irq)    (&irq_desc[irq])
 #define irq_cfg(irq)        (&irq_cfg[irq])
 
-#define MAX_GSI_IRQS PAGE_SIZE * 8
-#define MAX_NR_IRQS (2 * MAX_GSI_IRQS)
-
 struct irq_cfg {
         int  vector;
         cpumask_t domain;

--=__PartFDD0D9FB.0__=--
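
For readers who want to see what defaults the new logic in
init_ioapic_mappings() would produce, the arithmetic can be sketched as a
standalone function. This is only an illustration of the patch's
computation, not code from the tree: compute_irq_limits(), struct
irq_limits and max_uint() are hypothetical names, PAGE_SIZE is assumed to
be 4096, and the NR_DYNAMIC_VECTORS value of 192 is a placeholder rather
than something stated in the patch. The ioapic_usable flag stands in for
"smp_found_config && !skip_ioapic_setup", and has_apic for cpu_has_apic.

```c
/* Assumptions for illustration only; the real values come from Xen headers. */
#define PAGE_SIZE          4096U  /* assumes 4 KiB pages */
#define NR_DYNAMIC_VECTORS 192U   /* placeholder, not taken from the patch */

/* Hypothetical result type: the two limits the patch computes. */
struct irq_limits {
    unsigned int nr_irqs_gsi;
    unsigned int nr_irqs;
};

static unsigned int max_uint(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/*
 * Hypothetical helper mirroring the arithmetic of the patched
 * init_ioapic_mappings().
 *   nr_irqs, max_gsi_irqs: command line values, 0 meaning "not given"
 *   found_gsi:             number of GSIs discovered on the system
 *   cpus:                  number of present CPUs
 *   has_apic:              stand-in for cpu_has_apic
 *   ioapic_usable:         stand-in for smp_found_config && !skip_ioapic_setup
 */
static struct irq_limits compute_irq_limits(unsigned int nr_irqs,
                                            unsigned int max_gsi_irqs,
                                            unsigned int found_gsi,
                                            unsigned int cpus,
                                            int has_apic,
                                            int ioapic_usable)
{
    struct irq_limits lim;
    unsigned int nr_irqs_gsi = found_gsi;

    /* Derive the GSI cap: default, or clamp to an explicit nr_irqs. */
    if ( max_gsi_irqs == 0 )
        max_gsi_irqs = nr_irqs ? nr_irqs / 8 : PAGE_SIZE;
    else if ( nr_irqs != 0 && max_gsi_irqs > nr_irqs )
        max_gsi_irqs = nr_irqs;
    if ( max_gsi_irqs < 16 )
        max_gsi_irqs = 16;

    /* Upper bound kept for the PHYSDEVOP_pirq_eoi_gmfn guest assumptions. */
    if ( max_gsi_irqs > PAGE_SIZE * 8 )
        max_gsi_irqs = PAGE_SIZE * 8;

    /* Apply the cap to the number of GSIs actually found. */
    if ( !ioapic_usable || nr_irqs_gsi < 16 )
        nr_irqs_gsi = 16;
    else if ( nr_irqs_gsi > max_gsi_irqs )
        nr_irqs_gsi = max_gsi_irqs;

    /* Default nr_irqs scales with CPU count and GSI count. */
    if ( nr_irqs == 0 )
        nr_irqs = has_apic ?
                  max_uint(16U + cpus * NR_DYNAMIC_VECTORS,
                           8 * nr_irqs_gsi) :
                  nr_irqs_gsi;
    else if ( nr_irqs < 16 )
        nr_irqs = 16;

    lim.nr_irqs_gsi = nr_irqs_gsi;
    lim.nr_irqs = nr_irqs;
    return lim;
}
```

Under these assumptions, a call such as compute_irq_limits(0, 0, 24, 4, 1, 1)
models a 4-CPU box with 24 GSIs and no command line overrides, while
passing a non-zero first argument models an explicit "nr_irqs=". The
result also shows why the default may be "rather large": it grows with
both the CPU count and the number of GSIs.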