* Re: [PATCH -v2] x86: increase NR_IRQS and nr_irqs [not found] ` <4B398ECD.1080506@kernel.org> @ 2010-01-04 3:06 ` Jesse Brandeburg 2010-01-04 3:20 ` Yinghai Lu 0 siblings, 1 reply; 38+ messages in thread From: Jesse Brandeburg @ 2010-01-04 3:06 UTC (permalink / raw) To: Yinghai Lu Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg On Mon, Dec 28, 2009 at 9:08 PM, Yinghai Lu <yinghai@kernel.org> wrote: > have a system with lots of igb and ixgbe, when iov/vf are enabled for > them, we hit the limit of 3064. > > when system have 20 pcie installed, and one card have 2 functions, and one > function need 64 msi-x, > may need 20 * 2 * 64 = 2560 for msi-x > but if iov and vf are enabled > may need 20 * 2 * 64 * 3 = 7680 for msi-x > assume system with 5 ioapic, nr_irqs_gsi will be 120. > NR_CPUS = 512, and nr_cpu_ids = 128 > will have NR_IRQS = 256 + 512 * 64 = 33024 > will have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824 > > when SPARSE_IRQ is not set, there is no increase with data > when NR_CPUS=128, and SPARSE_IRQ is set > text data bss dec hex filename > 21837444 4216564 12480736 38534744 24bfe58 vmlinux.before > 21837442 4216580 12480736 38534758 24bfe66 vmlinux.after > when NR_CPUS=4096, and SPARSE_IRQ is set > text data bss dec hex filename > 21878619 5610244 13415392 40904255 270263f vmlinux.before > 21878617 5610244 13415392 40904253 270263d vmlinux.after > > -v2: update comments to address Ingo's concern > > Signed-off-by: Yinghai Lu <yinghai@kernel.org> I'm not sure this is the best plan, but may be okay for now. What happens when all of your slots have 6 port 82599 ixgbe adapters in them? They are being made[1], as well as quad port 82576 igb adapters, however I'm not fully sure of the SRIOV support of the bridges being used on those adapters. Is it on the table to (re-)design this subsystem to be a little more dynamic? There are probably examples in ppc64 or ia64 directories. 
Every time you suggest a limit I can find a case where it won't be
enough.

[1] http://www.hotlavasystems.com/products_10gbe.html

* Re: [PATCH -v2] x86: increase NR_IRQS and nr_irqs 2010-01-04 3:06 ` [PATCH -v2] x86: increase NR_IRQS and nr_irqs Jesse Brandeburg @ 2010-01-04 3:20 ` Yinghai Lu 2010-01-04 6:56 ` Subject: [PATCH 1/2] x86: get back 15 vectors Yinghai Lu ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 3:20 UTC (permalink / raw) To: Jesse Brandeburg Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg On 01/03/2010 07:06 PM, Jesse Brandeburg wrote: > On Mon, Dec 28, 2009 at 9:08 PM, Yinghai Lu <yinghai@kernel.org> wrote: >> have a system with lots of igb and ixgbe, when iov/vf are enabled for >> them, we hit the limit of 3064. >> >> when system have 20 pcie installed, and one card have 2 functions, and one >> function need 64 msi-x, >> may need 20 * 2 * 64 = 2560 for msi-x >> but if iov and vf are enabled >> may need 20 * 2 * 64 * 3 = 7680 for msi-x >> assume system with 5 ioapic, nr_irqs_gsi will be 120. >> NR_CPUS = 512, and nr_cpu_ids = 128 >> will have NR_IRQS = 256 + 512 * 64 = 33024 >> will have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824 >> >> when SPARSE_IRQ is not set, there is no increase with data >> when NR_CPUS=128, and SPARSE_IRQ is set >> text data bss dec hex filename >> 21837444 4216564 12480736 38534744 24bfe58 vmlinux.before >> 21837442 4216580 12480736 38534758 24bfe66 vmlinux.after >> when NR_CPUS=4096, and SPARSE_IRQ is set >> text data bss dec hex filename >> 21878619 5610244 13415392 40904255 270263f vmlinux.before >> 21878617 5610244 13415392 40904253 270263d vmlinux.after >> >> -v2: update comments to address Ingo's concern >> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > I'm not sure this is the best plan, but may be okay for now. What > happens when all of your slots have 6 port 82599 ixgbe adapters in > them? 
> They are being made[1], as well as quad port 82576 igb
> adapters, however I'm not fully sure of the SRIOV support of the
> bridges being used on those adapters.
>
> Is it on the table to (re-)design this subsystem to be a little more
> dynamic?  There are probably examples in ppc64 or ia64 directories.
> Every time you suggest a limit I can find a case where it won't be
> enough.
>
> [1] http://www.hotlavasystems.com/products_10gbe.html

you mean 20 * 6 * 64 * 3 ? 23040

maybe we can just keep nr_irqs = (256 - 32 - 10) * nr_cpu_ids?

32: for exception
10: for IPI etc.

YH

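[Editorial note] The MSI-X demand figures and the proposed per-CPU bound in the message above can be checked with a few lines of C. This is only a sketch of the arithmetic quoted in the thread: the function names msix_demand() and nr_irqs_bound() are made up for illustration and are not kernel functions, and the constants 256, 32 and 10 are taken directly from Yinghai's formula.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): MSI-X vectors needed by
 * `cards` adapters with `funcs` functions each, `vecs_per_func` MSI-X
 * vectors per function, multiplied by `iov_factor` when IOV/VF are
 * enabled (the thread uses a factor of 3).
 */
static unsigned int msix_demand(unsigned int cards, unsigned int funcs,
				unsigned int vecs_per_func,
				unsigned int iov_factor)
{
	return cards * funcs * vecs_per_func * iov_factor;
}

/*
 * Hypothetical helper: the bound proposed in the thread,
 * nr_irqs = (256 - 32 - 10) * nr_cpu_ids, where 32 vectors go to
 * exceptions and about 10 to IPIs and similar system vectors.
 */
static unsigned int nr_irqs_bound(unsigned int nr_cpu_ids)
{
	const unsigned int nr_vectors = 256, exceptions = 32, ipi_etc = 10;

	assert(nr_cpu_ids > 0);
	return (nr_vectors - exceptions - ipi_etc) * nr_cpu_ids;
}
```

With nr_cpu_ids = 128 the bound works out to 214 * 128 = 27392, which would cover the 23040 vectors of the 6-port case Jesse raised.
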
* Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 3:20 ` Yinghai Lu @ 2010-01-04 6:56 ` Yinghai Lu 2010-01-04 16:18 ` Eric W. Biederman 2010-01-04 6:58 ` [PATCH 2/2] x86: get more exact nr_irqs Yinghai Lu 2010-01-04 6:59 ` [PATCH 1/2] x86: get back 15 vectors Yinghai Lu 2 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 6:56 UTC (permalink / raw) To: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Eric W. Biederman Cc: linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41) for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one. also try to reuse 0x30 to 0x3f after smp_affinity for irq[0,15] is changed to other cpu. Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/kernel/apic/io_apic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Index: linux-2.6/arch/x86/kernel/apic/io_apic.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c +++ linux-2.6/arch/x86/kernel/apic/io_apic.c @@ -1162,7 +1162,8 @@ __assign_irq_vector(int irq, struct irq_ * Also, we've got to be careful not to trash gate * 0x80, because int 0x80 is hm, kind of importantish. ;) */ - static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0; + static int current_vector = FIRST_EXTERNAL_VECTOR + 1; + static int current_offset = 0; unsigned int old_vector; int cpu, err; cpumask_var_t tmp_mask; @@ -1198,7 +1199,7 @@ next: if (vector >= first_system_vector) { /* If out of vectors on large boxen, must share them. */ offset = (offset + 1) % 8; - vector = FIRST_DEVICE_VECTOR + offset; + vector = FIRST_EXTERNAL_VECTOR + 1 + offset; } if (unlikely(current_vector == vector)) continue; ^ permalink raw reply [flat|nested] 38+ messages in thread
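[Editorial note] The wrap-around logic that the hunk above touches can be modelled in user space as follows. This is a rough sketch, not the kernel code: next_candidate_vector() is a made-up name, the stride of 8 between candidates is an assumption (the step is not visible in the quoted hunk), and FIRST_SYSTEM_VECTOR is a placeholder constant standing in for the kernel's first_system_vector variable.

```c
#include <assert.h>

#define FIRST_DEVICE_VECTOR	0x41	/* value quoted in the thread */
#define FIRST_SYSTEM_VECTOR	0xef	/* assumption: placeholder upper bound */
#define VECTOR_STRIDE		8	/* assumption: step not shown in the hunk */

/*
 * Hypothetical model of one step of the candidate search: advance by
 * the stride, and when the system vectors are reached, bump the offset
 * (modulo 8) and restart from FIRST_DEVICE_VECTOR + offset, as the
 * quoted hunk does.
 */
static int next_candidate_vector(int vector, int *offset)
{
	assert(*offset >= 0 && *offset < VECTOR_STRIDE);

	vector += VECTOR_STRIDE;
	if (vector >= FIRST_SYSTEM_VECTOR) {
		/* out of vectors in this pass: shift the start by one */
		*offset = (*offset + 1) % 8;
		vector = FIRST_DEVICE_VECTOR + *offset;
	}
	return vector;
}
```

The patch under discussion only changes the restart base of this walk; whether a candidate is actually usable is still decided by the used_vectors bitmap.
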
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 6:56 ` Subject: [PATCH 1/2] x86: get back 15 vectors Yinghai Lu @ 2010-01-04 16:18 ` Eric W. Biederman 2010-01-04 18:40 ` Yinghai Lu 2010-01-04 19:01 ` H. Peter Anvin 0 siblings, 2 replies; 38+ messages in thread From: Eric W. Biederman @ 2010-01-04 16:18 UTC (permalink / raw) To: Yinghai Lu Cc: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg Yinghai Lu <yinghai@kernel.org> writes: This patch is wrong. > between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41) > > for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one. We can not use any of 0x20 - 0x2f for ioapic irqs. We need the entire priority level to ensure that the irq move cleanup ipi is of a lower priority. > also try to reuse 0x30 to 0x3f after smp_affinity for irq[0,15] is changed to other cpu. There may be a point with 0x30 to 0x3f as I recall when those irqs come through a legacy pic we need to reserve those vectors on all cpus. Eric > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > --- > arch/x86/kernel/apic/io_apic.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > Index: linux-2.6/arch/x86/kernel/apic/io_apic.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c > +++ linux-2.6/arch/x86/kernel/apic/io_apic.c > @@ -1162,7 +1162,8 @@ __assign_irq_vector(int irq, struct irq_ > * Also, we've got to be careful not to trash gate > * 0x80, because int 0x80 is hm, kind of importantish. 
;) > */ > - static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0; > + static int current_vector = FIRST_EXTERNAL_VECTOR + 1; > + static int current_offset = 0; > unsigned int old_vector; > int cpu, err; > cpumask_var_t tmp_mask; > @@ -1198,7 +1199,7 @@ next: > if (vector >= first_system_vector) { > /* If out of vectors on large boxen, must share them. */ > offset = (offset + 1) % 8; > - vector = FIRST_DEVICE_VECTOR + offset; > + vector = FIRST_EXTERNAL_VECTOR + 1 + offset; > } > if (unlikely(current_vector == vector)) > continue; ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 16:18 ` Eric W. Biederman
@ 2010-01-04 18:40   ` Yinghai Lu
  2010-01-04 19:04     ` Eric W. Biederman
  2010-01-04 19:01   ` H. Peter Anvin
  1 sibling, 1 reply; 38+ messages in thread
From: Yinghai Lu @ 2010-01-04 18:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg

On 01/04/2010 08:18 AM, Eric W. Biederman wrote:
> Yinghai Lu <yinghai@kernel.org> writes:
>
> This patch is wrong.
>
>> between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41)
>>
>> for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one.
>
> We can not use any of 0x20 - 0x2f for ioapic irqs.  We need the entire
> priority level to ensure that the irq move cleanup ipi is of a lower
> priority.
>
>> also try to reuse 0x30 to 0x3f after smp_affinity for irq[0,15] is changed to other cpu.
>
> There may be a point with 0x30 to 0x3f as I recall when those irqs come through a legacy
> pic we need to reserve those vectors on all cpus.

ok, I see.

any reason that we can not use 0x40?

Thanks

Yinghai

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 18:40 ` Yinghai Lu
@ 2010-01-04 19:04   ` Eric W. Biederman
  2010-01-04 19:14     ` H. Peter Anvin
  0 siblings, 1 reply; 38+ messages in thread
From: Eric W. Biederman @ 2010-01-04 19:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg

Yinghai Lu <yinghai@kernel.org> writes:

> On 01/04/2010 08:18 AM, Eric W. Biederman wrote:
>> Yinghai Lu <yinghai@kernel.org> writes:
>>
>> This patch is wrong.
>>
>>> between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41)
>>>
>>> for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one.
>>
>> We can not use any of 0x20 - 0x2f for ioapic irqs.  We need the entire
>> priority level to ensure that the irq move cleanup ipi is of a lower
>> priority.
>>
>>> also try to reuse 0x30 to 0x3f after smp_affinity for irq[0,15] is changed to other cpu.
>>
>> There may be a point with 0x30 to 0x3f as I recall when those irqs come through a legacy
>> pic we need to reserve those vectors on all cpus.
>
> ok, I see.
>
> any reason that we can not use 0x40?

Not that I know of.  Reading the comment it looks like it was only
skipped so that the initial assignment of vectors would be.

0x31, 0x41, 0x51, 0x61, 0x71, 0x81, 0x91, 0xa1, 0xb1, 0xc1, 0xd1, 0xe1
Instead of.
0x30, 0x40, 0x50, 0x60, 0x70, 0x90, 0xa0, 0xb0, 0xc0, 0xc0, 0xe0

Which doesn't seem to be the worst notion, but at the point we are looking
for every vector we can get it does seem to be problematic.

Eric

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 19:04 ` Eric W. Biederman
@ 2010-01-04 19:14   ` H. Peter Anvin
  0 siblings, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2010-01-04 19:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg

On 01/04/2010 11:04 AM, Eric W. Biederman wrote:
>>
>> any reason that we can not use 0x40?
>
> Not that I know of.  Reading the comment it looks like it was only
> skipped so that the initial assignment of vectors would be.
>
> 0x31, 0x41, 0x51, 0x61, 0x71, 0x81, 0x91, 0xa1, 0xb1, 0xc1, 0xd1, 0xe1
> Instead of.
> 0x30, 0x40, 0x50, 0x60, 0x70, 0x90, 0xa0, 0xb0, 0xc0, 0xc0, 0xe0
>
> Which doesn't seem to be the worst notion, but at the point we are looking
> for every vector we can get it does seem to be problematic.
>

This can presumably be worked around by tweaking the initial assignment
algorithm slightly, without losing a whole vector to that.

Also, if we abuse vector 0x1f as the IRQ reassignment vector, we free up
a full 16 vectors per CPU -- this seems worthwhile especially since it
is a decision that can be trivially undone in the future: this is all
kernel internal, we're not creating any kind of API.

	-hpa

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 16:18 ` Eric W. Biederman
  2010-01-04 18:40   ` Yinghai Lu
@ 2010-01-04 19:01   ` H. Peter Anvin
  2010-01-04 19:09     ` Eric W. Biederman
  1 sibling, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-01-04 19:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Yinghai Lu, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg

On 01/04/2010 08:18 AM, Eric W. Biederman wrote:
> Yinghai Lu <yinghai@kernel.org> writes:
>
> This patch is wrong.
>
>> between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41)
>>
>> for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one.
>
> We can not use any of 0x20 - 0x2f for ioapic irqs.  We need the entire
> priority level to ensure that the irq move cleanup ipi is of a lower
> priority.
>

Almost makes one want to abuse 0x1f for that.  Although 0x00..0x1f are
reserved for exceptions, the APICs range down to 0x10, and well, when
0x1f ends up actually getting used as an exception vector that we
support, then we can trivially change that.  In the meantime it would
actually make use of an otherwise-unusable APIC priority level.

	-hpa

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 19:01 ` H. Peter Anvin
@ 2010-01-04 19:09   ` Eric W. Biederman
  2010-01-04 19:35     ` Yinghai Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Eric W. Biederman @ 2010-01-04 19:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg

"H. Peter Anvin" <hpa@zytor.com> writes:

> On 01/04/2010 08:18 AM, Eric W. Biederman wrote:
>> Yinghai Lu <yinghai@kernel.org> writes:
>>
>> This patch is wrong.
>>
>>> between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41)
>>>
>>> for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one.
>>
>> We can not use any of 0x20 - 0x2f for ioapic irqs.  We need the entire
>> priority level to ensure that the irq move cleanup ipi is of a lower
>> priority.
>>
>
> Almost makes one want to abuse 0x1f for that.  Although 0x00..0x1f are
> reserved for exceptions, the APICs range down to 0x10, and well, when
> 0x1f ends up actually getting used as an exception vector that we
> support, then we can trivially change that.  In the meantime it would
> actually make use of an otherwise-unusable APIC priority level.

An optimization like that (with a big fat comment) seems reasonable
to me.

Eric

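[Editorial note] The "priority level" argument in the messages above rests on the local APIC taking an interrupt's priority class from the upper four bits of its vector. The sketch below illustrates why a cleanup vector at 0x1f sits below every vector from 0x20 upward, while a cleanup vector at 0x20 only stays lowest if the rest of 0x20 - 0x2f is kept free. The helper names are invented for illustration and do not exist in the kernel.

```c
#include <assert.h>

/*
 * The local APIC derives the priority class of an interrupt from
 * bits 7:4 of its vector number.
 */
static unsigned int apic_priority_class(unsigned int vector)
{
	assert(vector <= 0xff);
	return vector >> 4;
}

/*
 * Hypothetical check: returns 1 if every vector in
 * [first_dev, last_dev] is in a strictly higher priority class than
 * cleanup_vector, 0 otherwise.
 */
static int cleanup_is_lowest(unsigned int cleanup_vector,
			     unsigned int first_dev, unsigned int last_dev)
{
	unsigned int v;

	for (v = first_dev; v <= last_dev; v++)
		if (apic_priority_class(v) <=
		    apic_priority_class(cleanup_vector))
			return 0;
	return 1;
}
```

For example, cleanup_is_lowest(0x20, 0x21, 0xfe) fails because 0x21 shares class 2 with 0x20, which is the reason the whole 0x20 - 0x2f block was being held back before this discussion.
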
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 19:09 ` Eric W. Biederman @ 2010-01-04 19:35 ` Yinghai Lu 2010-01-04 19:45 ` Suresh Siddha ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 19:35 UTC (permalink / raw) To: Eric W. Biederman Cc: H. Peter Anvin, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg On 01/04/2010 11:09 AM, Eric W. Biederman wrote: > "H. Peter Anvin" <hpa@zytor.com> writes: > >> On 01/04/2010 08:18 AM, Eric W. Biederman wrote: >>> Yinghai Lu <yinghai@kernel.org> writes: >>> >>> This patch is wrong. >>> >>>> between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41) >>>> >>>> for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one. >>> >>> We can not use any of 0x20 - 0x2f for ioapic irqs. We need the entire >>> priority level to ensure that the irq move cleanup ipi is of a lower >>> priority. >>> >> >> Almost makes one want to abuse 0x1f for that. Although 0x00..0x1f are >> reserved for exceptions, the APICs range down to 0x10, and well, when >> 0x1f ends up actually getting used as an exception vector that we >> support, then we can trivially change that. In the meantime it would >> actually make use of an otherwise-unusable APIC priority level. > > An optimization like that (with a big fat comment) seems reasonable > to me. so we can use [0x10, 0x1f] sth like this? 
Subject: [PATCH 1/2] x86: get back 16 vectors -v2: according to hpa that we could start from 0x10 according to Eric, we should hold 16 vectors for IRQ MOVE Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/include/asm/irq_vectors.h | 3 ++- arch/x86/kernel/apic/io_apic.c | 5 +++-- arch/x86/kernel/irqinit.c | 11 +++++++++-- 3 files changed, 14 insertions(+), 5 deletions(-) Index: linux-2.6/arch/x86/kernel/apic/io_apic.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c +++ linux-2.6/arch/x86/kernel/apic/io_apic.c @@ -1162,7 +1162,8 @@ __assign_irq_vector(int irq, struct irq_ * Also, we've got to be careful not to trash gate * 0x80, because int 0x80 is hm, kind of importantish. ;) */ - static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0; + static int current_vector = 0; + static int current_offset = 0; unsigned int old_vector; int cpu, err; cpumask_var_t tmp_mask; @@ -1198,7 +1199,7 @@ next: if (vector >= first_system_vector) { /* If out of vectors on large boxen, must share them. 
*/ offset = (offset + 1) % 8; - vector = FIRST_DEVICE_VECTOR + offset; + vector = 0 + offset; } if (unlikely(current_vector == vector)) continue; Index: linux-2.6/arch/x86/include/asm/irq_vectors.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h +++ linux-2.6/arch/x86/include/asm/irq_vectors.h @@ -30,8 +30,9 @@ /* * IDT vectors usable for external interrupt sources start * at 0x20: + * hpa said we can start from 0x10 */ -#define FIRST_EXTERNAL_VECTOR 0x20 +#define FIRST_EXTERNAL_VECTOR 0x10 #ifdef CONFIG_X86_32 # define SYSCALL_VECTOR 0x80 Index: linux-2.6/arch/x86/kernel/irqinit.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/irqinit.c +++ linux-2.6/arch/x86/kernel/irqinit.c @@ -149,6 +149,8 @@ static void __init smp_intr_init(void) { #ifdef CONFIG_SMP #if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC) + int i; + /* * The reschedule interrupt is a CPU-to-CPU reschedule-helper * IPI, driven by wakeup. @@ -174,7 +176,9 @@ static void __init smp_intr_init(void) /* Low priority IPI to cleanup after moving an irq */ set_intr_gate(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt); - set_bit(IRQ_MOVE_CLEANUP_VECTOR, used_vectors); + /* Eric said: Need to hold entire priority */ + for (i = IRQ_MOVE_CLEANUP_VECTOR; i < IRQ_MOVE_CLEANUP_VECTOR+0x10; i++) + set_bit(i, used_vectors); /* IPI used for rebooting/stopping */ alloc_intr_gate(REBOOT_VECTOR, reboot_interrupt); @@ -222,6 +226,9 @@ void __init native_init_IRQ(void) /* Execute any quirks before the call gates are initialised: */ x86_init.irqs.pre_vector_init(); + for (i = 0; i < 0x10; i++) + set_bit(i, used_vectors); + apic_intr_init(); /* @@ -229,7 +236,7 @@ void __init native_init_IRQ(void) * us. 
(some of these will be overridden and become * 'special' SMP interrupts) */ - for (i = FIRST_EXTERNAL_VECTOR; i < NR_VECTORS; i++) { + for (i = 0; i < NR_VECTORS; i++) { /* IA32_SYSCALL_VECTOR could be used in trap_init already. */ if (!test_bit(i, used_vectors)) set_intr_gate(i, interrupt[i-FIRST_EXTERNAL_VECTOR]); ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 19:35 ` Yinghai Lu
@ 2010-01-04 19:45   ` Suresh Siddha
  2010-01-04 19:50     ` H. Peter Anvin
  2010-01-04 19:48   ` H. Peter Anvin
  2010-01-04 20:08   ` Eric W. Biederman
  2 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2010-01-04 19:45 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, H. Peter Anvin, Jesse Brandeburg, Ingo Molnar,
	Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton,
	NetDEV list, Brandeburg, Jesse

On Mon, 2010-01-04 at 11:35 -0800, Yinghai Lu wrote:
> sth like this?
>
> Subject: [PATCH 1/2] x86: get back 16 vectors
>
> -v2: according to hpa that we could start from 0x10
>      according to Eric, we should hold 16 vectors for IRQ MOVE
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>

Yinghai we have to change IRQ_MOVE_CLEANUP_VECTOR to 0x1f or so. From
the cpu perspective this vector is documented as illegal, so we need to
check if this change will work on the cpu's we have today to get some
confidence.

thanks,
suresh

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 19:45 ` Suresh Siddha
@ 2010-01-04 19:50   ` H. Peter Anvin
  2010-01-05 0:05     ` Suresh Siddha
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-01-04 19:50 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Yinghai Lu, Eric W. Biederman, Jesse Brandeburg, Ingo Molnar,
	Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton,
	NetDEV list, Brandeburg, Jesse

On 01/04/2010 11:45 AM, Suresh Siddha wrote:
> On Mon, 2010-01-04 at 11:35 -0800, Yinghai Lu wrote:
>> sth like this?
>>
>> Subject: [PATCH 1/2] x86: get back 16 vectors
>>
>> -v2: according to hpa that we could start from 0x10
>>      according to Eric, we should hold 16 vectors for IRQ MOVE
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>
>
> Yinghai we have to change IRQ_MOVE_CLEANUP_VECTOR to 0x1f or so. From
> the cpu perspective this vector is documented as illegal, so we need to
> check if this change will work on the cpu's we have today to get some
> confidence.
>

It's documented as reserved, not illegal.  The ability for the APIC to
generate vectors starting at 0x10 is documented, as is the ability for
the CPU to receive any vector number as an interrupt -- in fact, the
legacy BIOS relies on being able to receive interrupts starting at
vector 0x08.  It causes problems galore, but only at the software level.

	-hpa

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 19:50 ` H. Peter Anvin
@ 2010-01-05 0:05   ` Suresh Siddha
  2010-01-05 0:16     ` Yinghai Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2010-01-05 0:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Eric W. Biederman, Jesse Brandeburg, Ingo Molnar,
	Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton,
	NetDEV list, Brandeburg, Jesse

On Mon, 2010-01-04 at 11:50 -0800, H. Peter Anvin wrote:
> On 01/04/2010 11:45 AM, Suresh Siddha wrote:
> > On Mon, 2010-01-04 at 11:35 -0800, Yinghai Lu wrote:
> >> sth like this?
> >>
> >> Subject: [PATCH 1/2] x86: get back 16 vectors
> >>
> >> -v2: according to hpa that we could start from 0x10
> >>      according to Eric, we should hold 16 vectors for IRQ MOVE
> >>
> >> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >>
> >
> > Yinghai we have to change IRQ_MOVE_CLEANUP_VECTOR to 0x1f or so. From
> > the cpu perspective this vector is documented as illegal, so we need to
> > check if this change will work on the cpu's we have today to get some
> > confidence.
> >
>
> It's documented as reserved, not illegal.  The ability for the APIC to
> generate vectors starting at 0x10 is documented, as is the ability for
> the CPU to receive any vector number as an interrupt -- in fact, the
> legacy BIOS relies on being able to receive interrupts starting at
> vector 0x08.  It causes problems galore, but only at the software level.

I have checked out a couple of platforms (including 32-bit atom) and the
0x1f vector logic seems to be working.

Hopefully we won't have other hardware or software issues (vmm
restrictions etc) with this logic.

thanks,
suresh

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-05 0:05 ` Suresh Siddha
@ 2010-01-05 0:16   ` Yinghai Lu
  0 siblings, 0 replies; 38+ messages in thread
From: Yinghai Lu @ 2010-01-05 0:16 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: H. Peter Anvin, Eric W. Biederman, Jesse Brandeburg, Ingo Molnar,
	Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton,
	NetDEV list, Brandeburg, Jesse

On 01/04/2010 04:05 PM, Suresh Siddha wrote:
> On Mon, 2010-01-04 at 11:50 -0800, H. Peter Anvin wrote:
>> On 01/04/2010 11:45 AM, Suresh Siddha wrote:
>>> On Mon, 2010-01-04 at 11:35 -0800, Yinghai Lu wrote:
>>>> sth like this?
>>>>
>>>> Subject: [PATCH 1/2] x86: get back 16 vectors
>>>>
>>>> -v2: according to hpa that we could start from 0x10
>>>>      according to Eric, we should hold 16 vectors for IRQ MOVE
>>>>
>>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>>>
>>>
>>> Yinghai we have to change IRQ_MOVE_CLEANUP_VECTOR to 0x1f or so. From
>>> the cpu perspective this vector is documented as illegal, so we need to
>>> check if this change will work on the cpu's we have today to get some
>>> confidence.
>>>
>>
>> It's documented as reserved, not illegal.  The ability for the APIC to
>> generate vectors starting at 0x10 is documented, as is the ability for
>> the CPU to receive any vector number as an interrupt -- in fact, the
>> legacy BIOS relies on being able to receive interrupts starting at
>> vector 0x08.  It causes problems galore, but only at the software level.
>
> I have checked out couple of platforms (including 32-bit atom) and 0x1f
> vector logic seems to be working.
>
> Hopefully we won't have other hardware or software issues (vmm
> restrictions etc) with this logic.
>

good. hope it is final version.

let's have hpa own it.

From: "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH] x86: get back 16 vectors

-v2: according to hpa that we could start from 0x1f
-v3: update comments from Eric
-v4: update comments from hpa
-v5: use round up for IRQ0_VECTOR according to hpa

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/irq_vectors.h | 40 ++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 14 deletions(-)

Index: linux-2.6/arch/x86/include/asm/irq_vectors.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h
+++ linux-2.6/arch/x86/include/asm/irq_vectors.h
@@ -30,26 +30,38 @@
 /*
  * IDT vectors usable for external interrupt sources start
  * at 0x20:
+ * hpa said we can start from 0x1f.
+ * 0x1f is documented as reserved. However, the ability for the APIC
+ * to generate vectors starting at 0x10 is documented, as is the
+ * ability for the CPU to receive any vector number as an interrupt.
+ * 0x1f is used for IRQ_MOVE_CLEANUP_VECTOR since that vector needs
+ * an entire privilege level (16 vectors) all by itself at a higher
+ * priority than any actual device vector. Thus, by placing it in the
+ * otherwise-unusable 0x10 privilege level, we avoid wasting a full
+ * 16-vector block.
  */
-#define FIRST_EXTERNAL_VECTOR 0x20
+#define FIRST_EXTERNAL_VECTOR 0x1f
 
+#define IA32_SYSCALL_VECTOR 0x80
 #ifdef CONFIG_X86_32
 # define SYSCALL_VECTOR 0x80
-# define IA32_SYSCALL_VECTOR 0x80
-#else
-# define IA32_SYSCALL_VECTOR 0x80
 #endif
 
 /*
- * Reserve the lowest usable priority level 0x20 - 0x2f for triggering
+ * Reserve the lowest usable priority level 0x10 - 0x1f for triggering
  * cleanup after irq migration.
+ * this overlaps with the reserved range for cpu exceptions so this
+ * will need to be changed to 0x20 - 0x2f if the last cpu exception is
+ * ever allocated.
  */
+
 #define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR
 
 /*
- * Vectors 0x30-0x3f are used for ISA interrupts.
+ * Vectors 0x20-0x2f are used for ISA interrupts.
+ * round up to the next 16-vector boundary
  */
-#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10)
+#define IRQ0_VECTOR ((FIRST_EXTERNAL_VECTOR + 16) & ~15)
 
 #define IRQ1_VECTOR (IRQ0_VECTOR + 1)
 #define IRQ2_VECTOR (IRQ0_VECTOR + 2)

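[Editorial note] The -v5 change to IRQ0_VECTOR replaces a fixed "+ 0x10" with a round-up to the next 16-vector boundary. A minimal user-space sketch of that expression follows; irq0_vector() is an illustrative name, not a kernel symbol, and the expression itself is copied from the patch.

```c
#include <assert.h>

/*
 * Mirrors the macro from the patch:
 *   #define IRQ0_VECTOR ((FIRST_EXTERNAL_VECTOR + 16) & ~15)
 * Adding 16 and clearing the low four bits yields the first vector of
 * the next 16-vector block above first_external_vector.
 */
static unsigned int irq0_vector(unsigned int first_external_vector)
{
	assert(first_external_vector <= 0xef);
	return (first_external_vector + 16) & ~15u;
}
```

With FIRST_EXTERNAL_VECTOR at 0x1f this gives 0x20, and with the old value 0x20 it still gives 0x30, so the ISA block stays aligned either way. The previous form, FIRST_EXTERNAL_VECTOR + 0x10, would have produced 0x2f for 0x1f, leaving the sixteen ISA vectors straddling two priority classes.
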
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 19:35 ` Yinghai Lu
  2010-01-04 19:45   ` Suresh Siddha
@ 2010-01-04 19:48   ` H. Peter Anvin
  2010-01-04 20:06     ` Yinghai Lu
  2010-01-04 20:08   ` Eric W. Biederman
  2 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-01-04 19:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg, Suresh Siddha

[Adding Suresh to the Cc: list]

On 01/04/2010 11:35 AM, Yinghai Lu wrote:
>
> so we can use [0x10, 0x1f]
>
> sth like this?
>

No!!!

[0x10, 0x1f] is reserved for exceptions.  We can probably get away with
stealing *one* vector... presumably at the end (0x1f).  However, we can
absolutely not use the whole block: 0x10-0x13 is occupied by exceptions
we already have OS support for (#MF, #AC, #MC, and #XM), and it's pretty
much guaranteed we'll have more coming.  However, growth is quite slow
and since this is a kernel-internal vector (not accessible to user
space) it is not creating an API.

In other words, we could change FIRST_EXTERNAL_VECTOR to 0x1f, and use
it for IRQ_MOVE_CLEANUP_VECTOR.  Then use 0x20..0x2f for the legacy vectors.

	-hpa

* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 19:48 ` H. Peter Anvin @ 2010-01-04 20:06 ` Yinghai Lu 2010-01-04 20:14 ` Eric W. Biederman 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 20:06 UTC (permalink / raw) To: H. Peter Anvin Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha On 01/04/2010 11:48 AM, H. Peter Anvin wrote: > [Adding Suresh to the Cc: list] > > On 01/04/2010 11:35 AM, Yinghai Lu wrote: >> >> so we can use [0x10, 0x1f] >> >> sth like this? >> > > No!!! > > [0x10, 0x1f] is reserved for exceptions. We can probably get away with > stealing *one* vector... presumably at the end (0x1f). However, we can > absolutely not use the whole block: 0x10-0x13 is occupied by exceptions > we already have OS support for (#MF, #AC, #MC, and #XM), and it's pretty > much guaranteed we'll have more coming. However, growth is quite slow > and since this is a kernel-internal vector (not accessible to user > space) it is not creating an API. > > In other words, we could change FIRST_EXTERNAL_VECTOR to 0x1f, and use > it for IRQ_MOVE_CLEANUP_VECTOR. Then use 0x20..0x2f for the legacy vectors. 
> Subject: [PATCH 1/2] x86: get back 16 vectors -v2: according to hpa that we could start from 0x1f Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/include/asm/irq_vectors.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) Index: linux-2.6/arch/x86/include/asm/irq_vectors.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h +++ linux-2.6/arch/x86/include/asm/irq_vectors.h @@ -30,8 +30,9 @@ /* * IDT vectors usable for external interrupt sources start * at 0x20: + * hpa said we can start from 0x1f */ -#define FIRST_EXTERNAL_VECTOR 0x20 +#define FIRST_EXTERNAL_VECTOR 0x1f #ifdef CONFIG_X86_32 # define SYSCALL_VECTOR 0x80 @@ -41,15 +42,15 @@ #endif /* - * Reserve the lowest usable priority level 0x20 - 0x2f for triggering + * Reserve the lowest usable priority level 0x1f for triggering * cleanup after irq migration. */ #define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR /* - * Vectors 0x30-0x3f are used for ISA interrupts. + * Vectors 0x20-0x2f are used for ISA interrupts. */ -#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) +#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 1) #define IRQ1_VECTOR (IRQ0_VECTOR + 1) #define IRQ2_VECTOR (IRQ0_VECTOR + 2) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 20:06 ` Yinghai Lu @ 2010-01-04 20:14 ` Eric W. Biederman 2010-01-04 20:33 ` Yinghai Lu 0 siblings, 1 reply; 38+ messages in thread From: Eric W. Biederman @ 2010-01-04 20:14 UTC (permalink / raw) To: Yinghai Lu Cc: H. Peter Anvin, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha Yinghai Lu <yinghai@kernel.org> writes: > On 01/04/2010 11:48 AM, H. Peter Anvin wrote: >> [Adding Suresh to the Cc: list] >> >> On 01/04/2010 11:35 AM, Yinghai Lu wrote: >>> >>> so we can use [0x10, 0x1f] >>> >>> sth like this? >>> >> >> No!!! >> >> [0x10, 0x1f] is reserved for exceptions. We can probably get away with >> stealing *one* vector... presumably at the end (0x1f). However, we can >> absolutely not use the whole block: 0x10-0x13 is occupied by exceptions >> we already have OS support for (#MF, #AC, #MC, and #XM), and it's pretty >> much guaranteed we'll have more coming. However, growth is quite slow >> and since this is a kernel-internal vector (not accessible to user >> space) it is not creating an API. >> >> In other words, we could change FIRST_EXTERNAL_VECTOR to 0x1f, and use >> it for IRQ_MOVE_CLEANUP_VECTOR. Then use 0x20..0x2f for the legacy vectors. >> > Subject: [PATCH 1/2] x86: get back 16 vectors > > -v2: according to hpa that we could start from 0x1f The code in the patch is ok, but the comments are wrong. 
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > --- > arch/x86/include/asm/irq_vectors.h | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > Index: linux-2.6/arch/x86/include/asm/irq_vectors.h > =================================================================== > --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h > +++ linux-2.6/arch/x86/include/asm/irq_vectors.h > @@ -30,8 +30,9 @@ > /* > * IDT vectors usable for external interrupt sources start > * at 0x20: > + * hpa said we can start from 0x1f You need to document the reasons here. > */ > -#define FIRST_EXTERNAL_VECTOR 0x20 > +#define FIRST_EXTERNAL_VECTOR 0x1f > > #ifdef CONFIG_X86_32 > # define SYSCALL_VECTOR 0x80 > @@ -41,15 +42,15 @@ > #endif > > /* > - * Reserve the lowest usable priority level 0x20 - 0x2f for triggering > + * Reserve the lowest usable priority level 0x1f for triggering Should be: + * Reserve the lowest usable priority level 0x10 - 0x1f for triggering > * cleanup after irq migration. + * this overlaps with the reserved range for cpu exceptions so this + * will need to be changed to 0x20 - 0x2f if the last cpu exception is + * ever allocated. > */ > #define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR > > /* > - * Vectors 0x30-0x3f are used for ISA interrupts. > + * Vectors 0x20-0x2f are used for ISA interrupts. > */ > -#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) > +#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 1) > > #define IRQ1_VECTOR (IRQ0_VECTOR + 1) > #define IRQ2_VECTOR (IRQ0_VECTOR + 2) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 20:14 ` Eric W. Biederman @ 2010-01-04 20:33 ` Yinghai Lu 2010-01-04 21:10 ` H. Peter Anvin 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 20:33 UTC (permalink / raw) To: Eric W. Biederman Cc: H. Peter Anvin, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha Subject: [PATCH 1/2] x86: get back 16 vectors -v2: according to hpa that we could start from 0x1f -v3: update comments Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/include/asm/irq_vectors.h | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) Index: linux-2.6/arch/x86/include/asm/irq_vectors.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h +++ linux-2.6/arch/x86/include/asm/irq_vectors.h @@ -30,8 +30,12 @@ /* * IDT vectors usable for external interrupt sources start * at 0x20: + * hpa said we can start from 0x1f. + * 0x1f is documented as reserved. The ability for the APIC to + * generate vectors starting at 0x10 is documented, as is the ability for + * the CPU to receive any vector number as an interrupt */ -#define FIRST_EXTERNAL_VECTOR 0x20 +#define FIRST_EXTERNAL_VECTOR 0x1f #ifdef CONFIG_X86_32 # define SYSCALL_VECTOR 0x80 @@ -41,15 +45,19 @@ #endif /* - * Reserve the lowest usable priority level 0x20 - 0x2f for triggering + * Reserve the lowest usable priority level 0x10 - 0x1f for triggering * cleanup after irq migration. + * this overlaps with the reserved range for cpu exceptions so this + * will need to be changed to 0x20 - 0x2f if the last cpu exception is + * ever allocated. */ + #define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR /* - * Vectors 0x30-0x3f are used for ISA interrupts. + * Vectors 0x20-0x2f are used for ISA interrupts. 
*/ -#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) +#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 1) #define IRQ1_VECTOR (IRQ0_VECTOR + 1) #define IRQ2_VECTOR (IRQ0_VECTOR + 2) @@ -68,6 +76,13 @@ #define IRQ15_VECTOR (IRQ0_VECTOR + 15) /* + * First APIC vector available to drivers: (vectors 0x30-0xee) we + * start at 0x31 to spread out vectors evenly between priority + * levels. (0x80 is the syscall vector) + */ +#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) + +/* * Special IRQ vectors used by the SMP architecture, 0xf0-0xff * * some of the following vectors are 'rare', they are merged @@ -120,13 +135,6 @@ */ #define MCE_SELF_VECTOR 0xeb -/* - * First APIC vector available to drivers: (vectors 0x30-0xee) we - * start at 0x31(0x41) to spread out vectors evenly between priority - * levels. (0x80 is the syscall vector) - */ -#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) - #define NR_VECTORS 256 #define FPU_IRQ 13 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 20:33 ` Yinghai Lu
@ 2010-01-04 21:10 ` H. Peter Anvin
  2010-01-04 21:20 ` Yinghai Lu
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-01-04 21:10 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg, Suresh Siddha

On 01/04/2010 12:33 PM, Yinghai Lu wrote:
> --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h
> +++ linux-2.6/arch/x86/include/asm/irq_vectors.h
> @@ -30,8 +30,12 @@
>  /*
>   * IDT vectors usable for external interrupt sources start
>   * at 0x20:
> + * hpa said we can start from 0x1f.
> + * 0x1f is documented as reserved. The ability for the APIC to
> + * generate vectors starting at 0x10 is documented, as is the ability for
> + * the CPU to receive any vector number as an interrupt
>   */
> -#define FIRST_EXTERNAL_VECTOR 0x20
> +#define FIRST_EXTERNAL_VECTOR 0x1f
>

This really isn't a sufficient explanation either. I know writing
English prose is very difficult for you, but I'm sorry, you really need
to start getting better about your comments and commit messages.

In this case, the text is missing one very important piece of the
justification: otherwise we have to waste a full 16 vectors in order for
the IRQ migration interrupt to get its own priority level.

Thus, something like this:

 * 0x1f is documented as reserved. However, the ability for the APIC
 * to generate vectors starting at 0x10 is documented, as is the
 * ability for the CPU to receive any vector number as an interrupt.
 * 0x1f is used for IRQ_MOVE_CLEANUP_VECTOR since that vector needs
 * an entire privilege level (16 vectors) all by itself at a higher
 * priority than any actual device vector. Thus, by placing it in the
 * otherwise-unusable 0x10 privilege level, we avoid wasting a full
 * 16-vector block.

	-hpa

^ permalink raw reply [flat|nested] 38+ messages in thread
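For readers following the argument: the 16-vector "privilege level" in the proposed comment corresponds to the local APIC's priority class, which is taken from the upper four bits of the vector number. A minimal sketch in C, using hypothetical helper names that are not kernel symbols:

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; these are not kernel symbols. */

/* The local APIC takes an interrupt's priority class from vector bits 7:4. */
static inline unsigned int vector_priority_class(unsigned int vector)
{
	return (vector >> 4) & 0xf;
}

/* True if both vectors fall into the same 16-vector priority class. */
static inline bool same_priority_class(unsigned int a, unsigned int b)
{
	return vector_priority_class(a) == vector_priority_class(b);
}
```

Under this model 0x1f falls in class 1 and the legacy ISA vectors 0x20..0x2f in class 2, so a cleanup vector placed at 0x1f shares its class with no device vector.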
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 21:10 ` H. Peter Anvin @ 2010-01-04 21:20 ` Yinghai Lu 2010-01-04 21:33 ` H. Peter Anvin 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 21:20 UTC (permalink / raw) To: H. Peter Anvin Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha On 01/04/2010 01:10 PM, H. Peter Anvin wrote: > * 0x1f is documented as reserved. However, the ability for the APIC > * to generate vectors starting at 0x10 is documented, as is the > * ability for the CPU to receive any vector number as an interrupt. > * 0x1f is used for IRQ_MOVE_CLEANUP_VECTOR since that vector needs > * an entire privilege level (16 vectors) all by itself at a higher > * priority than any actual device vector. Thus, by placing it in the > * otherwise-unusable 0x10 privilege level, we avoid wasting a full > * 16-vector block. Subject: [PATCH 1/2] x86: get back 16 vectors -v2: according to hpa that we could start from 0x1f -v3: update comments from Eric -v4: update comments from hpa Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/include/asm/irq_vectors.h | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-) Index: linux-2.6/arch/x86/include/asm/irq_vectors.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h +++ linux-2.6/arch/x86/include/asm/irq_vectors.h @@ -30,8 +30,17 @@ /* * IDT vectors usable for external interrupt sources start * at 0x20: + * hpa said we can start from 0x1f. + * 0x1f is documented as reserved. However, the ability for the APIC + * to generate vectors starting at 0x10 is documented, as is the + * ability for the CPU to receive any vector number as an interrupt. 
+ * 0x1f is used for IRQ_MOVE_CLEANUP_VECTOR since that vector needs + * an entire privilege level (16 vectors) all by itself at a higher + * priority than any actual device vector. Thus, by placing it in the + * otherwise-unusable 0x10 privilege level, we avoid wasting a full + * 16-vector block. */ -#define FIRST_EXTERNAL_VECTOR 0x20 +#define FIRST_EXTERNAL_VECTOR 0x1f #ifdef CONFIG_X86_32 # define SYSCALL_VECTOR 0x80 @@ -41,15 +50,19 @@ #endif /* - * Reserve the lowest usable priority level 0x20 - 0x2f for triggering + * Reserve the lowest usable priority level 0x10 - 0x1f for triggering * cleanup after irq migration. + * this overlaps with the reserved range for cpu exceptions so this + * will need to be changed to 0x20 - 0x2f if the last cpu exception is + * ever allocated. */ + #define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR /* - * Vectors 0x30-0x3f are used for ISA interrupts. + * Vectors 0x20-0x2f are used for ISA interrupts. */ -#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) +#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 1) #define IRQ1_VECTOR (IRQ0_VECTOR + 1) #define IRQ2_VECTOR (IRQ0_VECTOR + 2) @@ -68,6 +81,13 @@ #define IRQ15_VECTOR (IRQ0_VECTOR + 15) /* + * First APIC vector available to drivers: (vectors 0x30-0xee) we + * start at 0x31 to spread out vectors evenly between priority + * levels. (0x80 is the syscall vector) + */ +#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) + +/* * Special IRQ vectors used by the SMP architecture, 0xf0-0xff * * some of the following vectors are 'rare', they are merged @@ -120,13 +140,6 @@ */ #define MCE_SELF_VECTOR 0xeb -/* - * First APIC vector available to drivers: (vectors 0x30-0xee) we - * start at 0x31(0x41) to spread out vectors evenly between priority - * levels. (0x80 is the syscall vector) - */ -#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) - #define NR_VECTORS 256 #define FPU_IRQ 13 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 21:20 ` Yinghai Lu @ 2010-01-04 21:33 ` H. Peter Anvin 2010-01-04 22:01 ` Yinghai Lu 0 siblings, 1 reply; 38+ messages in thread From: H. Peter Anvin @ 2010-01-04 21:33 UTC (permalink / raw) To: Yinghai Lu Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha On 01/04/2010 01:20 PM, Yinghai Lu wrote: > --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h > +++ linux-2.6/arch/x86/include/asm/irq_vectors.h > @@ -30,8 +30,17 @@ > /* > * IDT vectors usable for external interrupt sources start > * at 0x20: > + * hpa said we can start from 0x1f. > + * 0x1f is documented as reserved. However, the ability for the APIC > + * to generate vectors starting at 0x10 is documented, as is the > + * ability for the CPU to receive any vector number as an interrupt. > + * 0x1f is used for IRQ_MOVE_CLEANUP_VECTOR since that vector needs > + * an entire privilege level (16 vectors) all by itself at a higher > + * priority than any actual device vector. Thus, by placing it in the > + * otherwise-unusable 0x10 privilege level, we avoid wasting a full > + * 16-vector block. > */ > -#define FIRST_EXTERNAL_VECTOR 0x20 > +#define FIRST_EXTERNAL_VECTOR 0x1f > > #ifdef CONFIG_X86_32 > # define SYSCALL_VECTOR 0x80 > @@ -41,15 +50,19 @@ > #endif > > /* > - * Reserve the lowest usable priority level 0x20 - 0x2f for triggering > + * Reserve the lowest usable priority level 0x10 - 0x1f for triggering > * cleanup after irq migration. > + * this overlaps with the reserved range for cpu exceptions so this > + * will need to be changed to 0x20 - 0x2f if the last cpu exception is > + * ever allocated. > */ > + > #define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR > > /* > - * Vectors 0x30-0x3f are used for ISA interrupts. > + * Vectors 0x20-0x2f are used for ISA interrupts. 
> */ > -#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) > +#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 1) > > #define IRQ1_VECTOR (IRQ0_VECTOR + 1) > #define IRQ2_VECTOR (IRQ0_VECTOR + 2) > @@ -68,6 +81,13 @@ > #define IRQ15_VECTOR (IRQ0_VECTOR + 15) > I'm not sure that making IRQ_MOVE_CLEANUP_VECTOR and IRQ0_VECTOR offsets from FIRST_EXTERNAL_VECTOR makes sense from a readability perspective. These are now magic numbers, and making them offsets is only confusing, as it implies we could do it differently. If nothing else, the actual logic for IRQ0_VECTOR should be: #define IRQ0_VECTOR ((FIRST_EXTERNAL_VECTOR + 16) & ~15) ... since that is what we actually want -- we round up to the next 16-vector boundary. Both +16 and +1 misrepresent the logic. > /* > + * First APIC vector available to drivers: (vectors 0x30-0xee) we > + * start at 0x31 to spread out vectors evenly between priority > + * levels. (0x80 is the syscall vector) > + */ > +#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) > + We really should fix that so we can do +1 here instead of +2; that presumably means fixing the logic so we do something smarter than just jump over 0x80. -hpa ^ permalink raw reply [flat|nested] 38+ messages in thread
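The rounding hpa describes can be checked in isolation. The sketch below assumes the 0x1f value proposed in this thread; NEXT_VECTOR_BLOCK is a hypothetical helper name, not an existing kernel macro:

```c
/* Value proposed for FIRST_EXTERNAL_VECTOR in this thread. */
#define FIRST_EXTERNAL_VECTOR	0x1f

/*
 * Hypothetical helper, not a kernel macro: the first vector of the
 * 16-vector block that follows the block containing v.
 */
#define NEXT_VECTOR_BLOCK(v)	(((v) + 16) & ~15)

/* hpa's suggested definition, expressed through the helper. */
#define IRQ0_VECTOR		NEXT_VECTOR_BLOCK(FIRST_EXTERNAL_VECTOR)
```

The expression gives 0x20 for a FIRST_EXTERNAL_VECTOR of 0x1f and 0x30 for 0x20, so a single definition reproduces both the old +0x10 result and the new +1 result.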
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 21:33 ` H. Peter Anvin @ 2010-01-04 22:01 ` Yinghai Lu 2010-01-04 23:03 ` H. Peter Anvin 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 22:01 UTC (permalink / raw) To: H. Peter Anvin Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha On 01/04/2010 01:33 PM, H. Peter Anvin wrote: > On 01/04/2010 01:20 PM, Yinghai Lu wrote: >> --- linux-2.6.orig/arch/x86/include/asm/irq_vectors.h >> +++ linux-2.6/arch/x86/include/asm/irq_vectors.h >> @@ -30,8 +30,17 @@ >> /* >> * IDT vectors usable for external interrupt sources start >> * at 0x20: >> + * hpa said we can start from 0x1f. >> + * 0x1f is documented as reserved. However, the ability for the APIC >> + * to generate vectors starting at 0x10 is documented, as is the >> + * ability for the CPU to receive any vector number as an interrupt. >> + * 0x1f is used for IRQ_MOVE_CLEANUP_VECTOR since that vector needs >> + * an entire privilege level (16 vectors) all by itself at a higher >> + * priority than any actual device vector. Thus, by placing it in the >> + * otherwise-unusable 0x10 privilege level, we avoid wasting a full >> + * 16-vector block. >> */ >> -#define FIRST_EXTERNAL_VECTOR 0x20 >> +#define FIRST_EXTERNAL_VECTOR 0x1f >> >> #ifdef CONFIG_X86_32 >> # define SYSCALL_VECTOR 0x80 >> @@ -41,15 +50,19 @@ >> #endif >> >> /* >> - * Reserve the lowest usable priority level 0x20 - 0x2f for triggering >> + * Reserve the lowest usable priority level 0x10 - 0x1f for triggering >> * cleanup after irq migration. >> + * this overlaps with the reserved range for cpu exceptions so this >> + * will need to be changed to 0x20 - 0x2f if the last cpu exception is >> + * ever allocated. >> */ >> + >> #define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR >> >> /* >> - * Vectors 0x30-0x3f are used for ISA interrupts. 
>> + * Vectors 0x20-0x2f are used for ISA interrupts. >> */ >> -#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10) >> +#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 1) >> >> #define IRQ1_VECTOR (IRQ0_VECTOR + 1) >> #define IRQ2_VECTOR (IRQ0_VECTOR + 2) >> @@ -68,6 +81,13 @@ >> #define IRQ15_VECTOR (IRQ0_VECTOR + 15) >> > > I'm not sure that making IRQ_MOVE_CLEANUP_VECTOR and IRQ0_VECTOR offsets > from FIRST_EXTERNAL_VECTOR makes sense from a readability perspective. > These are now magic numbers, and making them offsets is only confusing, > as it implies we could do it differently. > > If nothing else, the actual logic for IRQ0_VECTOR should be: > > #define IRQ0_VECTOR ((FIRST_EXTERNAL_VECTOR + 16) & ~15) > > ... since that is what we actually want -- we round up to the next > 16-vector boundary. Both +16 and +1 misrepresent the logic. that will be good, if later update FIRST_EXTERNAL_VECTOR... > >> /* >> + * First APIC vector available to drivers: (vectors 0x30-0xee) we >> + * start at 0x31 to spread out vectors evenly between priority >> + * levels. (0x80 is the syscall vector) >> + */ >> +#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) >> + > > We really should fix that so we can do +1 here instead of +2; that > presumably means fixing the logic so we do something smarter than just > jump over 0x80. we already use used_vectors to skip 0x80. so we could change that to +1? YH ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 22:01 ` Yinghai Lu
@ 2010-01-04 23:03 ` H. Peter Anvin
  2010-01-04 23:32 ` Yinghai Lu
  0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2010-01-04 23:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg, Suresh Siddha

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]

On 01/04/2010 02:01 PM, Yinghai Lu wrote:
>>
>>> /*
>>> + * First APIC vector available to drivers: (vectors 0x30-0xee) we
>>> + * start at 0x31 to spread out vectors evenly between priority
>>> + * levels. (0x80 is the syscall vector)
>>> + */
>>> +#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2)
>>> +
>>
>> We really should fix that so we can do +1 here instead of +2; that
>> presumably means fixing the logic so we do something smarter than just
>> jump over 0x80.
>
> we already use used_vectors to skip 0x80. so we could change that to +1?
>

Yes, but the problem is that we *skip* 0x80, which leads to suboptimal
allocation on systems with only a handful of vectors.

The easy solution to accomplishing what we want without wasting vector
0x30 is obviously to start allocation at 0x31, but not by artificially
limiting the vector space; see the attached patch.

For what it's worth, this code(__assign_irq_vector() in
arch/x86/kernel/apic/io_apic.c) has me somewhat confused about the use
of the constant 8:

	vector += 8;

The only justification that I can immediately think of is to try to
assign exactly two sources to each priority level (since early APICs
started losing interrupts with more than two sources per priority level.)

This is ancient code -- predates not just the git but the bk history --
and as such I would assume that that is the motivation.

	-hpa

[-- Attachment #2: 0001-x86-irq-Don-t-waste-a-vector-to-improve-vector-sprea.patch --]
[-- Type: text/x-patch, Size: 2372 bytes --]

>From 6e586d7d1da44360b16d2c0ac2bea623c1fcfa6e Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <hpa@zytor.com>
Date: Mon, 4 Jan 2010 15:00:57 -0800
Subject: [PATCH] x86, irq: Don't waste a vector to improve vector spread

We want to use a vector-assignment sequence that avoids stumbling onto
0x80 earlier in the sequence, in order to improve the spread of
vectors across priority levels on machines with a small number of
interrupt sources. Right now, this is done by simply making the first
vector (0x31 or 0x41) completely unusable. This is unnecessary; all
we need is to start assignment at a +1 offset, we don't actually need
to prohibit the usage of this vector once we have wrapped around.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/include/asm/irq_vectors.h |    9 +++++----
 arch/x86/kernel/apic/io_apic.c     |    3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 4611f08..718dcdd 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -121,11 +121,12 @@
 #define MCE_SELF_VECTOR			0xeb
 
 /*
- * First APIC vector available to drivers: (vectors 0x30-0xee) we
- * start at 0x31(0x41) to spread out vectors evenly between priority
- * levels. (0x80 is the syscall vector)
+ * First APIC vector available to drivers: (vectors 0x30-0xee). We
+ * start allocating at 0x31(0x41) to spread out vectors evenly between
+ * priority levels. (0x80 is the syscall vector)
  */
-#define FIRST_DEVICE_VECTOR		(IRQ15_VECTOR + 2)
+#define FIRST_DEVICE_VECTOR		(IRQ15_VECTOR + 1)
+#define VECTOR_OFFSET_START		1
 
 #define NR_VECTORS			 256
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index de00c46..e289148 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1162,7 +1162,8 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
 	 * Also, we've got to be careful not to trash gate
 	 * 0x80, because int 0x80 is hm, kind of importantish. ;)
 	 */
-	static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0;
+	static int current_vector = FIRST_DEVICE_VECTOR + VECTOR_OFFSET_START;
+	static int current_offset = VECTOR_OFFSET_START % 8;
 	unsigned int old_vector;
 	int cpu, err;
 	cpumask_var_t tmp_mask;
-- 
1.6.5.2

^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 23:03 ` H. Peter Anvin @ 2010-01-04 23:32 ` Yinghai Lu 2010-01-04 23:38 ` H. Peter Anvin 0 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 23:32 UTC (permalink / raw) To: H. Peter Anvin Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha On 01/04/2010 03:03 PM, H. Peter Anvin wrote: > On 01/04/2010 02:01 PM, Yinghai Lu wrote: >>> >>>> /* >>>> + * First APIC vector available to drivers: (vectors 0x30-0xee) we >>>> + * start at 0x31 to spread out vectors evenly between priority >>>> + * levels. (0x80 is the syscall vector) >>>> + */ >>>> +#define FIRST_DEVICE_VECTOR (IRQ15_VECTOR + 2) >>>> + >>> >>> We really should fix that so we can do +1 here instead of +2; that >>> presumably means fixing the logic so we do something smarter than just >>> jump over 0x80. >> >> we already use used_vectors to skip 0x80. so we could change that to +1? >> > > Yes, but the problem is that we *skip* 0x80, which leads to suboptimal > allocation on systems with only a handful of vectors. > > The easy solution to accomplishing what we want without wasting vector > 0x30 is obviously to start allocation at 0x31, but not by artificially > limiting the vector space; see the attached patch. > > For what it's worth, this code(__assign_irq_vector() in > arch/x86/kernel/apic/io_apic.c) has me somewhat confused about the use > of the constant 8: > > vector += 8; > > The only justification that I can immediately think of is to try to > assign exactly two sources to each priority level (since early APICs > started losing interrupts with more than two sources per priority level.) > > This is ancient code -- predates not just the git but the bk history -- > and as such I would assume that that is the motivation. yes the patch get back 0x30, 0x38, 0x40, 0x48 etc back. 
YH ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors 2010-01-04 23:32 ` Yinghai Lu @ 2010-01-04 23:38 ` H. Peter Anvin 2010-01-04 23:42 ` Yinghai Lu 2010-01-04 23:49 ` Yinghai Lu 0 siblings, 2 replies; 38+ messages in thread From: H. Peter Anvin @ 2010-01-04 23:38 UTC (permalink / raw) To: Yinghai Lu Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg, Suresh Siddha On 01/04/2010 03:32 PM, Yinghai Lu wrote: > > yes the patch get back 0x30, 0x38, 0x40, 0x48 etc back. > I would have expected the old code to still have had 0x38, 0x40, 0x48, ... available, just missing the first one? -hpa ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 23:38 ` H. Peter Anvin
@ 2010-01-04 23:42 ` Yinghai Lu
  2010-01-04 23:49 ` Yinghai Lu
  1 sibling, 0 replies; 38+ messages in thread
From: Yinghai Lu @ 2010-01-04 23:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg, Suresh Siddha

On 01/04/2010 03:38 PM, H. Peter Anvin wrote:
> On 01/04/2010 03:32 PM, Yinghai Lu wrote:
>>
>> yes the patch get back 0x30, 0x38, 0x40, 0x48 etc back.
>>
>
> I would have expected the old code to still have had 0x38, 0x40, 0x48,
> ... available, just missing the first one?
>

you are right. the others can still be used via offset=7.

YH

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 23:38 ` H. Peter Anvin
  2010-01-04 23:42 ` Yinghai Lu
@ 2010-01-04 23:49 ` Yinghai Lu
  2010-01-04 23:59 ` H. Peter Anvin
  1 sibling, 1 reply; 38+ messages in thread
From: Yinghai Lu @ 2010-01-04 23:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg, Suresh Siddha

why not start from 0x30, 0x38, 0x40, 0x48 ...
instead of 0x31, 0x39, 0x41, 0x49...?

YH

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 23:49 ` Yinghai Lu
@ 2010-01-04 23:59 ` H. Peter Anvin
  0 siblings, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2010-01-04 23:59 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg, Suresh Siddha

On 01/04/2010 03:49 PM, Yinghai Lu wrote:
> why not start from 0x30, 0x38, 0x40, 0x48 ...
> instead of 0x31, 0x39, 0x41, 0x49...?
>
> YH

The whole point is to defer the skip over 0x80.

	-hpa

^ permalink raw reply [flat|nested] 38+ messages in thread
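A simplified model of the allocation walk shows what "defer the skip" means. This is a sketch, not the kernel's __assign_irq_vector(): it assumes a single CPU, no other vectors in use, the 0x30-0xee device range quoted in the patch comment, and it hands out the starting vector first. The function name and VECTOR_LIMIT are made up for illustration:

```c
/* Assumed values, taken from the comments quoted in this thread. */
#define FIRST_DEVICE_VECTOR	0x30
#define VECTOR_LIMIT		0xef	/* one past the 0x30-0xee device range */
#define SYSCALL_VECTOR		0x80

/*
 * Count how many vectors a stride-8 walk hands out before it first
 * lands on SYSCALL_VECTOR. The walk starts at FIRST_DEVICE_VECTOR plus
 * start_offset, steps by 8, and moves to the next offset (mod 8) each
 * time it runs past VECTOR_LIMIT.
 */
static int allocations_before_syscall_vector(int start_offset)
{
	int offset = start_offset % 8;
	int vector = FIRST_DEVICE_VECTOR + offset;
	int handed_out = 0;

	for (;;) {
		if (vector == SYSCALL_VECTOR)
			return handed_out;

		handed_out++;
		vector += 8;
		if (vector >= VECTOR_LIMIT) {
			/* Wrapped: continue with the next offset. */
			offset = (offset + 1) % 8;
			vector = FIRST_DEVICE_VECTOR + offset;
		}
	}
}
```

In this model a walk that starts at offset 0 runs into 0x80 after handing out 10 vectors, while a walk that starts at offset 1 only reaches it after 177, once the offset has wrapped back to 0.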
* Re: Subject: [PATCH 1/2] x86: get back 15 vectors
  2010-01-04 19:35 ` Yinghai Lu
  2010-01-04 19:45 ` Suresh Siddha
  2010-01-04 19:48 ` H. Peter Anvin
@ 2010-01-04 20:08 ` Eric W. Biederman
  2 siblings, 0 replies; 38+ messages in thread
From: Eric W. Biederman @ 2010-01-04 20:08 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg

Yinghai Lu <yinghai@kernel.org> writes:

> On 01/04/2010 11:09 AM, Eric W. Biederman wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>>
>>> On 01/04/2010 08:18 AM, Eric W. Biederman wrote:
>>>> Yinghai Lu <yinghai@kernel.org> writes:
>>>>
>>>> This patch is wrong.
>>>>
>>>>> between FIRST_EXTERNAL_VECTOR (0x20) and FIRST_DEVICE_VECTOR (0x41)
>>>>>
>>>>> for 0x20 and 0x2f, we are safe be used_vectors will prevent it to use used one.
>>>>
>>>> We can not use any of 0x20 - 0x2f for ioapic irqs. We need the entire
>>>> priority level to ensure that the irq move cleanup ipi is of a lower
>>>> priority.
>>>>
>>>
>>> Almost makes one want to abuse 0x1f for that. Although 0x00..0x1f are
>>> reserved for exceptions, the APICs range down to 0x10, and well, when
>>> 0x1f ends up actually getting used as an exception vector that we
>>> support, then we can trivially change that. In the meantime it would
>>> actually make use of an otherwise-unusable APIC priority level.
>>
>> An optimization like that (with a big fat comment) seems reasonable
>> to me.
>
> so we can use [0x10, 0x1f]
>
> sth like this?

Something like that.

We cannot use all of 0x10 - 0x1f; it is simply that the hardware can
address all of that range. 0x10 is already defined as something, I
forget what. 0x12 is already the MCE_VECTOR. Since hardware has not
yet defined 0x1f (and is not likely to for a while), we can use that.

So we wind up sharing that hardware priority level between a single ipi
and the hardware exceptions.

Eric

^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 2/2] x86: get more exact nr_irqs 2010-01-04 3:20 ` Yinghai Lu 2010-01-04 6:56 ` Subject: [PATCH 1/2] x86: get back 15 vectors Yinghai Lu @ 2010-01-04 6:58 ` Yinghai Lu 2010-01-04 16:55 ` Eric W. Biederman 2010-01-04 6:59 ` [PATCH 1/2] x86: get back 15 vectors Yinghai Lu 2 siblings, 1 reply; 38+ messages in thread From: Yinghai Lu @ 2010-01-04 6:58 UTC (permalink / raw) To: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Eric W. Biederman Cc: linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg first check with NR_VECTORS - FIRST_EXTERNAL_VECTOR - 0x20 aka minus exceptions and system vectors. NR_CPUS = 512, and nr_cpu_ids = 128 will have NR_IRQS = 256 + 512 * 64 = 33024 assume we have 20 intel ixgbe 6 port cards (with sriov and ixgbevf) 20 * 6 * 64 * 3 = 23040 first will get: 128 * (256 - 64) = 24576 then with nr_irqs_gsi will get (120 + 8 * 128 + 120 * 256) = 31864 so 24576 will be used for nr_irqs. 24576 * 8 = 196608 bytes will be used for irq_desc_ptrs[] before this patch: have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824 and irq_desc_ptrs[] is 70592 Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/kernel/apic/io_apic.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) Index: linux-2.6/arch/x86/kernel/apic/io_apic.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c +++ linux-2.6/arch/x86/kernel/apic/io_apic.c @@ -3833,15 +3833,20 @@ int __init arch_probe_nr_irqs(void) { int nr; - if (nr_irqs > (NR_VECTORS * nr_cpu_ids)) - nr_irqs = NR_VECTORS * nr_cpu_ids; + /* 0x20 for ipi etc system vectors */ + nr = NR_VECTORS - FIRST_EXTERNAL_VECTOR - 0x20; + + nr *= nr_cpu_ids; + + if (nr < nr_irqs) + nr_irqs = nr; nr = nr_irqs_gsi + 8 * nr_cpu_ids; #if defined(CONFIG_PCI_MSI) || defined(CONFIG_HT_IRQ) /* * for MSI and HT dyn irq */ - nr += nr_irqs_gsi * 64; + nr += nr_irqs_gsi * 256; #endif if (nr < nr_irqs) nr_irqs = nr; ^ permalink raw 
reply [flat|nested] 38+ messages in thread
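The arithmetic in the changelog can be reproduced with a small sketch. The two functions below mirror the before and after versions of the limit calculation in the diff, with the kernel globals passed in as plain parameters and CONFIG_PCI_MSI or CONFIG_HT_IRQ assumed enabled; the function names are hypothetical:

```c
#define NR_VECTORS	256

/* Limit as computed before the patch (MSI/HT factor of 64 per GSI). */
static int nr_irqs_before(int nr_irqs, int nr_cpu_ids, int nr_irqs_gsi)
{
	int nr;

	if (nr_irqs > NR_VECTORS * nr_cpu_ids)
		nr_irqs = NR_VECTORS * nr_cpu_ids;

	nr = nr_irqs_gsi + 8 * nr_cpu_ids + nr_irqs_gsi * 64;
	if (nr < nr_irqs)
		nr_irqs = nr;

	return nr_irqs;
}

/*
 * Limit as computed by the proposed patch: per-CPU vectors minus the
 * exception range and a raw 0x20 for system vectors, then the GSI
 * based estimate with a factor of 256 per GSI.
 */
static int nr_irqs_after(int nr_irqs, int first_external_vector,
			 int nr_cpu_ids, int nr_irqs_gsi)
{
	int nr;

	nr = (NR_VECTORS - first_external_vector - 0x20) * nr_cpu_ids;
	if (nr < nr_irqs)
		nr_irqs = nr;

	nr = nr_irqs_gsi + 8 * nr_cpu_ids + nr_irqs_gsi * 256;
	if (nr < nr_irqs)
		nr_irqs = nr;

	return nr_irqs;
}
```

With nr_irqs starting at NR_IRQS = 33024, nr_cpu_ids = 128 and nr_irqs_gsi = 120, the old calculation yields 8824 and the new one 24576 when FIRST_EXTERNAL_VECTOR is 0x20. At 8 bytes per pointer that is 70592 and 196608 bytes for irq_desc_ptrs[], matching the figures in the changelog. With FIRST_EXTERNAL_VECTOR at 0x1f, as proposed in patch 1/2, the same calculation gives 24704.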
* Re: [PATCH 2/2] x86: get more exact nr_irqs
  2010-01-04 6:58 ` [PATCH 2/2] x86: get more exact nr_irqs Yinghai Lu
@ 2010-01-04 16:55 ` Eric W. Biederman
  2010-01-04 19:03 ` Yinghai Lu
  0 siblings, 1 reply; 38+ messages in thread
From: Eric W. Biederman @ 2010-01-04 16:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list,
	Jesse Brandeburg

Yinghai Lu <yinghai@kernel.org> writes:

> first check with NR_VECTORS - FIRST_EXTERNAL_VECTOR - 0x20
> aka minus exceptions and system vectors.
>
> NR_CPUS = 512, and nr_cpu_ids = 128
> will have NR_IRQS = 256 + 512 * 64 = 33024
>
> assume we have 20 intel ixgbe 6 port cards (with sriov and ixgbevf)
> 20 * 6 * 64 * 3 = 23040
>
> first will get:
> 128 * (256 - 64) = 24576
> then with nr_irqs_gsi will get
> (120 + 8 * 128 + 120 * 256) = 31864
>
> so 24576 will be used for nr_irqs.
>
> 24576 * 8 = 196608 bytes will be used for irq_desc_ptrs[]
>
> before this patch:
> have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824
> and irq_desc_ptrs[] is 70592
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

I am lost. arch_probe_nr_irqs appears to be total nonsense.

We have three concepts.
- The number of irq sources we can talk about. (nr_irqs)
- The number of irqs we can possibly service. ((NR_VECTORS - 0x30) * nr_cpu_ids)
- The number of irqs we actually connected up to cards in the
  system that we need to do something with.

Why do we need to allocate arrays at all?

arch_probe_nr_irqs looks like a pile of magic numbers (even more magic
with the addition of 0x20), that is always going to be a little bit
wrong.

We should be able to remove the arrays altogether and allocate
irq_desc dynamically.
> --- > arch/x86/kernel/apic/io_apic.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > Index: linux-2.6/arch/x86/kernel/apic/io_apic.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c > +++ linux-2.6/arch/x86/kernel/apic/io_apic.c > @@ -3833,15 +3833,20 @@ int __init arch_probe_nr_irqs(void) > { > int nr; > > - if (nr_irqs > (NR_VECTORS * nr_cpu_ids)) > - nr_irqs = NR_VECTORS * nr_cpu_ids; > + /* 0x20 for ipi etc system vectors */ > + nr = NR_VECTORS - FIRST_EXTERNAL_VECTOR - 0x20; If you are going to subtract of the number of ipis please put appropriate defines in irq_vectors.h. A raw 0x20 is wrong. > + > + nr *= nr_cpu_ids; > + > + if (nr < nr_irqs) > + nr_irqs = nr; > nr = nr_irqs_gsi + 8 * nr_cpu_ids; > #if defined(CONFIG_PCI_MSI) || defined(CONFIG_HT_IRQ) > /* > * for MSI and HT dyn irq > */ > - nr += nr_irqs_gsi * 64; > + nr += nr_irqs_gsi * 256; This part seems like magic voodoo. Why should their be a correlation between the number of gsis and the number of msis? Eric ^ permalink raw reply [flat|nested] 38+ messages in thread
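One possible reading of the request for defines, as a sketch only: give each reserved range a name so that the bound documents itself. Every macro name below is hypothetical and is not an existing irq_vectors.h define; NR_VECTORS = 256 is assumed from the changelog. The sketch also shows that the review's 0x30 and the patch's 0x20 + 0x20 lead to different bounds, which is the sort of drift raw numbers invite.

```c
#include <assert.h>

/* All names are hypothetical; the values are assumptions taken from the thread. */
#define SKETCH_NR_VECTORS		256
#define SKETCH_NR_EXCEPTION_VECTORS	0x20	/* below FIRST_EXTERNAL_VECTOR */
#define SKETCH_NR_SYSTEM_VECTORS	0x20	/* the patch's raw 0x20 (ipi etc) */
#define SKETCH_NR_RESERVED_REVIEW	0x30	/* the figure used in the review */

/* Irqs serviceable by nr_cpu_ids cpus when `reserved` vectors per cpu are unavailable. */
static int serviceable_irqs(int reserved, int nr_cpu_ids)
{
	assert(reserved >= 0 && reserved <= SKETCH_NR_VECTORS);
	assert(nr_cpu_ids > 0);
	return (SKETCH_NR_VECTORS - reserved) * nr_cpu_ids;
}

/* Bound as computed by the patch: exceptions plus system vectors reserved. */
static int patch_bound(int nr_cpu_ids)
{
	return serviceable_irqs(SKETCH_NR_EXCEPTION_VECTORS +
				SKETCH_NR_SYSTEM_VECTORS, nr_cpu_ids);
}

/* Bound as written in the review: (NR_VECTORS - 0x30) * nr_cpu_ids. */
static int review_bound(int nr_cpu_ids)
{
	return serviceable_irqs(SKETCH_NR_RESERVED_REVIEW, nr_cpu_ids);
}
```

For nr_cpu_ids = 128 the two bounds are 24576 and 26624, a difference of 16 vectors per cpu.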
* Re: [PATCH 2/2] x86: get more exact nr_irqs
From: Yinghai Lu @ 2010-01-04 19:03 UTC
To: Eric W. Biederman
Cc: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg

On Mon, Jan 4, 2010 at 8:55 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Yinghai Lu <yinghai@kernel.org> writes:
>
>> first check with NR_VECTORS - FIRST_EXTERNAL_VECTOR - 0x20
>> aka minus exceptions and system vectors.
>>
>> NR_CPUS = 512, and nr_cpu_ids = 128
>> will have NR_IRQS = 256 + 512 * 64 = 33024
>>
>> assume we have 20 intel ixgbe 6 port cards (with sriov and ixgbevf)
>> 20 * 6 * 64 * 3 = 23040
>>
>> first will get:
>> 128 * (256 - 64) = 24576
>> then with nr_irqs_gsi will get
>> (120 + 8 * 128 + 120 * 256) = 31864
>>
>> so 24576 will be used for nr_irqs.
>>
>> 24576 * 8 = 196608 bytes will be used for irq_desc_ptrs[]
>>
>> before this patch:
>> have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824
>> and irq_desc_ptrs[] is 70592
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> I am lost. arch_probe_nr_irqs appears to be total nonsense.
>
> We have three concepts.
> - The number of irq sources we can talk about. ( nr_irqs)
> - The number of irqs we can possibly service. ((NR_VECTORS - 0x30) *nr_cpu_ids)
> - The number of irqs we actually connected up to cards in the
>   system that we need to do something with.
>
> Why do we need to allocate arrays at all?
>

irq_desc is allocated dynamically.

but irq_desc_ptrs is pointer array, it need to be allocated after
nr_irqs is probed.

YH

* Re: [PATCH 2/2] x86: get more exact nr_irqs
From: Eric W. Biederman @ 2010-01-04 19:16 UTC
To: Yinghai Lu
Cc: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg

Yinghai Lu <yinghai@kernel.org> writes:

> On Mon, Jan 4, 2010 at 8:55 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Yinghai Lu <yinghai@kernel.org> writes:
>>
>>> first check with NR_VECTORS - FIRST_EXTERNAL_VECTOR - 0x20
>>> aka minus exceptions and system vectors.
>>>
>>> NR_CPUS = 512, and nr_cpu_ids = 128
>>> will have NR_IRQS = 256 + 512 * 64 = 33024
>>>
>>> assume we have 20 intel ixgbe 6 port cards (with sriov and ixgbevf)
>>> 20 * 6 * 64 * 3 = 23040
>>>
>>> first will get:
>>> 128 * (256 - 64) = 24576
>>> then with nr_irqs_gsi will get
>>> (120 + 8 * 128 + 120 * 256) = 31864
>>>
>>> so 24576 will be used for nr_irqs.
>>>
>>> 24576 * 8 = 196608 bytes will be used for irq_desc_ptrs[]
>>>
>>> before this patch:
>>> have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824
>>> and irq_desc_ptrs[] is 70592
>>>
>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>
>> I am lost. arch_probe_nr_irqs appears to be total nonsense.
>>
>> We have three concepts.
>> - The number of irq sources we can talk about. ( nr_irqs)
>> - The number of irqs we can possibly service. ((NR_VECTORS - 0x30) *nr_cpu_ids)
>> - The number of irqs we actually connected up to cards in the
>>   system that we need to do something with.
>>
>> Why do we need to allocate arrays at all?
>>
>
> irq_desc is allocated dynamically.
>
> but irq_desc_ptrs is pointer array, it need to be allocated after
> nr_irqs is probed.

If we care about memory use efficiency let's replace irq_desc_ptrs
with a rbtree or a radix_tree. Something that moves the memory use
penalty onto those machines that have a lot of irqs.

Eric

* Re: [PATCH 2/2] x86: get more exact nr_irqs
From: H. Peter Anvin @ 2010-01-04 19:30 UTC
To: Eric W. Biederman
Cc: Yinghai Lu, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg

On 01/04/2010 11:16 AM, Eric W. Biederman wrote:
>
> If we care about memory use efficiency let's replace irq_desc_ptrs
> with a rbtree or a radix_tree. Something that moves the memory use
> penalty onto those machines that have a lot of irqs.
>

rbtree doesn't make much sense for something that is addressed by index,
and doesn't need to answer questions of the form "give me the highest
member <= X". A hash table or radix tree makes sense, depending on the
expected sparseness of the index.

	-hpa

* Re: [PATCH 2/2] x86: get more exact nr_irqs
From: Yinghai Lu @ 2010-01-04 19:47 UTC
To: H. Peter Anvin
Cc: Eric W. Biederman, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg

On 01/04/2010 11:30 AM, H. Peter Anvin wrote:
> On 01/04/2010 11:16 AM, Eric W. Biederman wrote:
>>
>> If we care about memory use efficiency let's replace irq_desc_ptrs
>> with a rbtree or a radix_tree. Something that moves the memory use
>> penalty onto those machines that have a lot of irqs.
>>
>
> rbtree doesn't make much sense for something that is addressed by index,
> and doesn't need to answer questions of the form "give me the highest
> member <= X". A hash table or radix tree makes sense, depending on the
> expected sparseness of the index.

Will check whether we can use a radix tree for it, like powerpc does.

YH

* Re: [PATCH 2/2] x86: get more exact nr_irqs
From: Eric W. Biederman @ 2010-01-04 20:05 UTC
To: H. Peter Anvin
Cc: Yinghai Lu, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg

"H. Peter Anvin" <hpa@zytor.com> writes:

> On 01/04/2010 11:16 AM, Eric W. Biederman wrote:
>>
>> If we care about memory use efficiency let's replace irq_desc_ptrs
>> with a rbtree or a radix_tree. Something that moves the memory use
>> penalty onto those machines that have a lot of irqs.
>>
>
> rbtree doesn't make much sense for something that is addressed by index,
> and doesn't need to answer questions of the form "give me the highest
> member <= X". A hash table or radix tree makes sense, depending on the
> expected sparseness of the index.

Not counting irqs for msi's I think we are looking 36% to 25% fill. Maybe
a little lower. The sparseness is much higher if we count the number of
irqs that we might/use allocate as we do today.

Short of driver hotplug msis should be allocated densely, unless we start
reserving all possible 4K msi-x vectors.

For each ioapic we allocate 16 gsis, and only maybe four of them are
connected to actual pci slots.

This is essentially a slow path operation, so as long as we are not
too expensive we can use any data structure we want. In kernel hash
tables don't grow well so I don't think a hash table is a good choice,
and a hash table is essentially what we have now.

The truth is we don't know how many irqs we will have until msi
supporting drivers claim all of theirs.

I think a radix-tree would likely be the least intrusive choice as it
does not imply any changes to the data structure indexed.

Eric

* Re: [PATCH 2/2] x86: get more exact nr_irqs
From: H. Peter Anvin @ 2010-01-04 21:50 UTC
To: Eric W. Biederman
Cc: Yinghai Lu, Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg

On 01/04/2010 12:05 PM, Eric W. Biederman wrote:
>>
>> rbtree doesn't make much sense for something that is addressed by index,
>> and doesn't need to answer questions of the form "give me the highest
>> member <= X". A hash table or radix tree makes sense, depending on the
>> expected sparseness of the index.
>
> Not counting irqs for msi's I think we are looking 36% to 25% fill. Maybe
> a little lower. The sparseness is much higher if we count the number of
> irqs that we might/use allocate as we do today.
>
> Short of driver hotplug msis should be allocated densely, unless we start
> reserving all possible 4K msi-x vectors.
>
> For each ioapic we allocate 16 gsis, and only maybe four of them are
> connected to actual pci slots.
>
> This is essentially a slow path operation, so as long as we are not
> too expensive we can use any data structure we want. In kernel hash
> tables don't grow well so I don't think a hash table is a good choice,
> and a hash table is essentially what we have now.
>
> The truth is we don't know how many irqs we will have until msi
> supporting drivers claim all of theirs.
>
> I think a radix-tree would likely be the least intrusive choice as it
> does not imply any changes to the data structure indexed.
>

Yes, for that kind of density a radix tree is a good choice.

	-hpa

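To make the radix tree idea concrete, here is a minimal userspace sketch of an irq-number-to-descriptor map that only allocates nodes along paths that are actually used. It is not the kernel's radix tree API and not what any of the patches do; the type and function names (irq_desc_sketch, desc_insert, desc_lookup) are hypothetical, and the choice of three levels of 6 bits each is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

#define BITS_PER_LEVEL	6
#define SLOTS		(1u << BITS_PER_LEVEL)
#define LEVELS		3
#define MAX_IRQ		(1u << (BITS_PER_LEVEL * LEVELS))	/* 262144 */

/* Stand-in for the real descriptor; only here to have something to store. */
struct irq_desc_sketch {
	unsigned int irq;
};

/* Interior nodes hold child node pointers, leaf nodes hold descriptors. */
struct node {
	void *slot[SLOTS];
};

static struct node root;
static size_t nodes_allocated;	/* nodes created on demand, root excluded */

/* Level 0 is the root and consumes the most significant bits. */
static unsigned int slot_index(unsigned int irq, int level)
{
	int shift = BITS_PER_LEVEL * (LEVELS - 1 - level);

	return (irq >> shift) & (SLOTS - 1);
}

/* Returns 0 on success, -1 if a node could not be allocated. */
static int desc_insert(unsigned int irq, struct irq_desc_sketch *desc)
{
	struct node *n = &root;
	int level;

	assert(irq < MAX_IRQ);
	for (level = 0; level < LEVELS - 1; level++) {
		void **slot = &n->slot[slot_index(irq, level)];

		if (!*slot) {
			*slot = calloc(1, sizeof(struct node));
			if (!*slot)
				return -1;
			nodes_allocated++;
		}
		n = *slot;
	}
	n->slot[slot_index(irq, LEVELS - 1)] = desc;
	return 0;
}

/* Returns the descriptor for irq, or NULL if none was inserted. */
static struct irq_desc_sketch *desc_lookup(unsigned int irq)
{
	struct node *n = &root;
	int level;

	if (irq >= MAX_IRQ)
		return NULL;
	for (level = 0; level < LEVELS - 1; level++) {
		n = n->slot[slot_index(irq, level)];
		if (!n)
			return NULL;
	}
	return n->slot[slot_index(irq, LEVELS - 1)];
}
```

In this sketch two far-apart irqs, say 5 and 24575, cost four 64-slot nodes besides the root, while neighbouring irqs share nodes. Memory use therefore follows the irqs actually registered rather than a boot-time guess at nr_irqs.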
* [PATCH 1/2] x86: get back 15 vectors
From: Yinghai Lu @ 2010-01-04 6:59 UTC
To: Jesse Brandeburg, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Eric W. Biederman
Cc: linux-kernel@vger.kernel.org, Andrew Morton, NetDEV list, Jesse Brandeburg

Get back the vectors between FIRST_EXTERNAL_VECTOR (0x20) and
FIRST_DEVICE_VECTOR (0x41).

For 0x20 to 0x2f we are safe, because used_vectors will prevent a
vector that is already in use from being handed out again.

Also try to reuse 0x30 to 0x3f after smp_affinity for irq[0,15] is
changed to another cpu.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/apic/io_apic.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-2.6/arch/x86/kernel/apic/io_apic.c
@@ -1162,7 +1162,8 @@ __assign_irq_vector(int irq, struct irq_
 	 * Also, we've got to be careful not to trash gate
 	 * 0x80, because int 0x80 is hm, kind of importantish. ;)
 	 */
-	static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0;
+	static int current_vector = FIRST_EXTERNAL_VECTOR + 1;
+	static int current_offset = 0;
 	unsigned int old_vector;
 	int cpu, err;
 	cpumask_var_t tmp_mask;
@@ -1198,7 +1199,7 @@ next:
 		if (vector >= first_system_vector) {
 			/* If out of vectors on large boxen, must share them. */
 			offset = (offset + 1) % 8;
-			vector = FIRST_DEVICE_VECTOR + offset;
+			vector = FIRST_EXTERNAL_VECTOR + 1 + offset;
 		}
 		if (unlikely(current_vector == vector))
 			continue;
