Message-ID: <52D329AA.2010305@redhat.com>
Date: Sun, 12 Jan 2014 18:47:54 -0500
From: Prarit Bhargava
To: Ingo Molnar
CC: linux-kernel@vger.kernel.org, Andi Kleen, Michel Lespinasse,
    Seiji Aguchi, Yang Zhang, Paul Gortmaker, Janet Morgan, Tony Luck,
    Ruiv Wang, Gong Chen, "H. Peter Anvin", x86@kernel.org, Thomas Gleixner
Subject: Re: [PATCH] x86: Add check for number of available vectors before CPU down [v7]
References: <1389220399-14591-1-git-send-email-prarit@redhat.com> <20140112095651.GA9205@gmail.com>
In-Reply-To: <20140112095651.GA9205@gmail.com>

On 01/12/2014 04:56 AM, Ingo Molnar wrote:
>
> * Prarit Bhargava wrote:
>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
>>
>> When a cpu is downed on a system, the irqs on the cpu are assigned to
>> other cpus.  It is possible, however, that when a cpu is downed there
>> aren't enough free vectors on the remaining cpus to account for the
>> vectors from the cpu that is being downed.
>>
>> This results in an interesting "overflow" condition where irqs are
>> "assigned" to a CPU but are not handled.
>>
>> For example, when downing cpus on a 1-64 logical processor system:
>>
>>
>> [ 232.021745] smpboot: CPU 61 is now offline
>> [ 238.480275] smpboot: CPU 62 is now offline
>> [ 245.991080] ------------[ cut here ]------------
>> [ 245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
>> [ 246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
>> [ 246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
>> [ 246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
>> [ 246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
>> [ 246.057371] 0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
>> [ 246.065728] ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
>> [ 246.074073] 0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
>> [ 246.082430] Call Trace:
>> [ 246.085174] [] dump_stack+0x46/0x58
>> [ 246.091633] [] warn_slowpath_common+0x8c/0xc0
>> [ 246.098352] [] warn_slowpath_fmt+0x46/0x50
>> [ 246.104786] [] dev_watchdog+0x246/0x250
>> [ 246.110923] [] ? dev_deactivate_queue.constprop.31+0x80/0x80
>> [ 246.119097] [] call_timer_fn+0x3a/0x110
>> [ 246.125224] [] ? update_process_times+0x6f/0x80
>> [ 246.132137] [] ? dev_deactivate_queue.constprop.31+0x80/0x80
>> [ 246.140308] [] run_timer_softirq+0x1f0/0x2a0
>> [ 246.146933] [] __do_softirq+0xe0/0x220
>> [ 246.152976] [] call_softirq+0x1c/0x30
>> [ 246.158920] [] do_softirq+0x55/0x90
>> [ 246.164670] [] irq_exit+0xa5/0xb0
>> [ 246.170227] [] smp_apic_timer_interrupt+0x4a/0x60
>> [ 246.177324] [] apic_timer_interrupt+0x6a/0x70
>> [ 246.184041] [] ? cpuidle_enter_state+0x5b/0xe0
>> [ 246.191559] [] ? cpuidle_enter_state+0x57/0xe0
>> [ 246.198374] [] cpuidle_idle_call+0xbd/0x200
>> [ 246.204900] [] arch_cpu_idle+0xe/0x30
>> [ 246.210846] [] cpu_startup_entry+0xd0/0x250
>> [ 246.217371] [] rest_init+0x77/0x80
>> [ 246.223028] [] start_kernel+0x3ee/0x3fb
>> [ 246.229165] [] ? repair_env_string+0x5e/0x5e
>> [ 246.235787] [] x86_64_start_reservations+0x2a/0x2c
>> [ 246.242990] [] x86_64_start_kernel+0xf8/0xfc
>> [ 246.249610] ---[ end trace fb74fdef54d79039 ]---
>> [ 246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
>> [ 246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
>> Last login: Mon Nov 11 08:35:14 from 10.18.17.119
>> [root@(none) ~]# [ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
>> [ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>> [ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
>> [ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>>
>> (last lines keep repeating.  ixgbe driver is dead until module reload.)
>>
>> If the downed cpu has more vectors than are free on the remaining cpus on the
>> system, it is possible that some vectors are "orphaned" even though they are
>> assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
>> watchdog fired and notified that something was wrong.
>>
>> This patch adds a function, check_irq_vectors_for_cpu_disable(), which
>> compares the number of vectors in use on the CPU going down with the number
>> of free vectors available on the remaining online cpus.  If there aren't
>> enough vectors for the CPU to go down, an error is returned and propagated
>> back to userspace.
>>
>> v2: Do not need to look at percpu irqs
>> v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
>>     Priority Mode
>> v4: Additional changes suggested by Gong Chen.
>> v5/v6/v7: Updated comment text
>>
>> Signed-off-by: Prarit Bhargava
>> Reviewed-by: Gong Chen
>> Cc: Andi Kleen
>> Cc: Michel Lespinasse
>> Cc: Seiji Aguchi
>> Cc: Yang Zhang
>> Cc: Paul Gortmaker
>> Cc: Janet Morgan
>> Cc: Tony Luck
>> Cc: Ruiv Wang
>> Cc: Gong Chen
>> Cc: H. Peter Anvin
>> Cc: Gong Chen
>> Cc: x86@kernel.org
>> ---
>>  arch/x86/include/asm/irq.h |    1 +
>>  arch/x86/kernel/irq.c      |   69 ++++++++++++++++++++++++++++++++++++++++++++
>>  arch/x86/kernel/smpboot.c  |    6 ++++
>>  3 files changed, 76 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
>> index 0ea10f27..cb6cfcd 100644
>> --- a/arch/x86/include/asm/irq.h
>> +++ b/arch/x86/include/asm/irq.h
>> @@ -25,6 +25,7 @@ extern void irq_ctx_init(int cpu);
>>
>>  #ifdef CONFIG_HOTPLUG_CPU
>>  #include
>> +extern int check_irq_vectors_for_cpu_disable(void);
>>  extern void fixup_irqs(void);
>>  extern void irq_force_complete_move(int);
>>  #endif
>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
>> index 22d0687..579a49e 100644
>> --- a/arch/x86/kernel/irq.c
>> +++ b/arch/x86/kernel/irq.c
>> @@ -262,6 +262,75 @@ __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs)
>>  EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
>>
>>  #ifdef CONFIG_HOTPLUG_CPU
>> +/*
>> + * This cpu is going to be removed and its vectors migrated to the remaining
>> + * online cpus.  Check to see if there are enough vectors in the remaining cpus.
>> + */
>> +int check_irq_vectors_for_cpu_disable(void)
>> +{
>> +        int irq, cpu;
>> +        unsigned int this_cpu, vector, this_count, count;
>> +        struct irq_desc *desc;
>> +        struct irq_data *data;
>> +        struct cpumask affinity_new, online_new;
>> +
>> +        this_cpu = smp_processor_id();
>> +        cpumask_copy(&online_new, cpu_online_mask);
>> +        cpu_clear(this_cpu, online_new);
>> +
>> +        this_count = 0;
>> +        for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
>> +                irq = __this_cpu_read(vector_irq[vector]);
>> +                if (irq >= 0) {
>> +                        desc = irq_to_desc(irq);
>> +                        data = irq_desc_get_irq_data(desc);
>> +                        cpumask_copy(&affinity_new, data->affinity);
>> +                        cpu_clear(this_cpu, affinity_new);
>> +
>> +                        /* Do not count inactive or per-cpu irqs. */
>> +                        if (!irq_has_action(irq) || irqd_is_per_cpu(data))
>> +                                continue;
>> +
>> +                        /*
>> +                         * A single irq may be mapped to multiple
>> +                         * cpu's vector_irq[] (for example IOAPIC cluster
>> +                         * mode).  In this case we have two
>> +                         * possibilities:
>> +                         *
>> +                         * 1) the resulting affinity mask is empty; that is
>> +                         * the down'd cpu is the last cpu in the irq's
>> +                         * affinity mask, or
>> +                         *
>> +                         * 2) the resulting affinity mask is no longer
>> +                         * a subset of the online cpus but the affinity
>> +                         * mask is not zero; that is the down'd cpu is the
>> +                         * last online cpu in a user set affinity mask.
>> +                         */
>> +                        if (cpumask_empty(&affinity_new) ||
>> +                            !cpumask_subset(&affinity_new, &online_new))
>> +                                this_count++;
>> +                }
>> +        }
>> +
>> +        count = 0;
>> +        for_each_online_cpu(cpu) {
>> +                if (cpu == this_cpu)
>> +                        continue;
>> +                for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
>> +                     vector++) {
>> +                        if (per_cpu(vector_irq, cpu)[vector] < 0)
>> +                                count++;
>> +                }
>> +        }
>> +
>> +        if (count < this_count) {
>> +                pr_warn("CPU %d disable failed: CPU has %u vectors assigned and there are only %u available.\n",
>> +                        this_cpu, this_count, count);
>> +                return -ERANGE;
>> +        }
>> +        return 0;
>> +}
>> +
>>  /* A cpu has been removed from cpu_online_mask.  Reset irq affinities. */
>>  void fixup_irqs(void)
>>  {
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 85dc05a..391ea52 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1312,6 +1312,12 @@ void cpu_disable_common(void)
>>
>>  int native_cpu_disable(void)
>>  {
>> +        int ret;
>> +
>> +        ret = check_irq_vectors_for_cpu_disable();
>> +        if (ret)
>> +                return ret;
>> +
>>          clear_local_APIC();
>>
>>          cpu_disable_common();
>
> Looks mostly OK, but how about locking: can an IRQ be allocated in
> parallel while all this is going on?

Ingo,

No -- the whole thing is called under stop_machine(), so nothing else can
allocate or move a vector while the check runs.

P.

>
> Thanks,
>
> Ingo
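P.S.: for anyone tracing the locking question through the source, here is a
rough, hand-abridged sketch (not verbatim kernel code; error paths and the
CPU_DYING notifier calls are omitted) of how we get here with every other CPU
parked.  _cpu_down() in kernel/cpu.c runs take_cpu_down() on the dying CPU via
__stop_machine(), and take_cpu_down() calls __cpu_disable(), which on x86
resolves to native_cpu_disable() through smp_ops.  Since all other CPUs are
spinning with interrupts disabled at that point, no new vector can be
allocated while check_irq_vectors_for_cpu_disable() takes its counts:

        /* kernel/cpu.c (abridged): runs on the CPU being offlined,
         * with all other CPUs held in stop_machine(). */
        static int take_cpu_down(void *_param)
        {
                int err;

                /* Ensure this CPU doesn't handle any more interrupts. */
                err = __cpu_disable();  /* x86: smp_ops.cpu_disable() */
                if (err < 0)
                        return err;

                /* CPU_DYING notifiers etc. elided */
                return 0;
        }

        /* arch/x86/kernel/smpboot.c with this patch applied (abridged): */
        int native_cpu_disable(void)
        {
                int ret;

                /* Safe to count vectors here: stop_machine() guarantees
                 * nothing can concurrently allocate or migrate a vector. */
                ret = check_irq_vectors_for_cpu_disable();
                if (ret)
                        return ret;

                clear_local_APIC();
                cpu_disable_common();
                return 0;
        }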