public inbox for linux-kernel@vger.kernel.org
* [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
@ 2008-10-17 12:32 Max Kellermann
  2008-10-17 14:33 ` Glauber Costa
  2008-10-20  6:27 ` Ian Campbell
  0 siblings, 2 replies; 50+ messages in thread
From: Max Kellermann @ 2008-10-17 12:32 UTC (permalink / raw)
  To: linux-kernel, gcosta, ijc

Hi,

Ian: this is a follow-up to your post "NFS regression? Odd delays and
lockups accessing an NFS export" a few weeks ago
(http://lkml.org/lkml/2008/9/27/42).

I am able to trigger this bug within a few minutes on a customer's
machine (large web hoster, a *lot* of NFS traffic).

Symptom: with 2.6.26 (and 2.6.27.1, too), the load average goes to
100+ and dmesg says "INFO: task migration/2:9 blocked for more than
120 seconds." with varying task names.  Except for the high load
average, the machine seems to work.
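
For anyone who wants to count how often and for which tasks this
message shows up, here is a minimal sketch of a parser for the line
quoted above.  The function name parse_hung_task() and its signature
are made up for illustration; it only assumes the log line has the
form "INFO: task NAME:PID blocked for more than N seconds." and is not
part of any kernel or tool API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one log line of the form
 *   INFO: task migration/2:9 blocked for more than 120 seconds.
 * On success, copy the task name into `name`, store the pid and the
 * timeout in seconds, and return 1.  Return 0 if the line does not
 * match or the name does not fit into the buffer.
 */
int parse_hung_task(const char *line, char *name, size_t name_len,
                    int *pid, int *seconds)
{
    static const char prefix[] = "INFO: task ";
    static const char marker[] = " blocked for more than ";
    const char *start, *end, *colon, *p;
    size_t len;

    start = strstr(line, prefix);
    if (start == NULL)
        return 0;
    start += sizeof(prefix) - 1;

    end = strstr(start, marker);
    if (end == NULL)
        return 0;

    /* The pid follows the last ':' before the marker; the task
     * name itself may contain ':' or '/'. */
    colon = NULL;
    for (p = start; p < end; p++)
        if (*p == ':')
            colon = p;
    if (colon == NULL || colon == start)
        return 0;

    len = (size_t)(colon - start);
    if (name_len == 0 || len >= name_len)
        return 0;
    memcpy(name, start, len);
    name[len] = '\0';

    if (sscanf(colon + 1, "%d", pid) != 1)
        return 0;
    if (sscanf(end + sizeof(marker) - 1, "%d seconds", seconds) != 1)
        return 0;
    return 1;
}
```

Feeding it each line of saved dmesg output (e.g. read with fgets())
and tallying the returned names would show whether the blocked tasks
are mostly kernel threads such as migration/N or ordinary processes.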

With git bisect, I was finally able to identify the guilty commit.
It's not "Ensure we zap only the access and acl caches when setting
new acls" like you guessed, Ian.  According to my bisect,
6becedbb06072c5741d4057b9facecb4b3143711 is the origin of the problem;
e481fcf8563d300e7f8875cae5fdc41941d29de0 (its parent) works well.

Glauber: that is your patch "x86: minor adjustments for do_boot_cpu"
(http://lkml.org/lkml/2008/3/19/143).  I don't understand this patch
well, and I fail to see a connection with the symptom, but maybe
somebody else does...

See patch below (applies to 2.6.27.1).  So far, it looks like the
problem is solved on the server, with no visible side effects.

Max


Revert "x86: minor adjustments for do_boot_cpu"

According to a bisect, Glauber Costa's patch induced high load and
"task ... blocked for more than 120 seconds" messages in dmesg.  This
patch reverts 6becedbb06072c5741d4057b9facecb4b3143711.

Signed-off-by: Max Kellermann <mk@cm4all.com>
---

 arch/x86/kernel/smpboot.c |   21 ++++++++-------------
 1 files changed, 8 insertions(+), 13 deletions(-)


diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7985c5b..789cf84 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -808,7 +808,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
  * Returns zero if CPU booted OK, else error code from wakeup_secondary_cpu.
  */
 {
-	unsigned long boot_error = 0;
+	unsigned long boot_error;
 	int timeout;
 	unsigned long start_ip;
 	unsigned short nmi_high = 0, nmi_low = 0;
@@ -828,7 +828,11 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
 	}
 #endif
 
-	alternatives_smp_switch(1);
+	/*
+	 * Save current MTRR state in case it was changed since early boot
+	 * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
+	 */
+	mtrr_save_state();
 
 	c_idle.idle = get_idle_for_cpu(cpu);
 
@@ -873,6 +877,8 @@ do_rest:
 	/* start_ip had better be page-aligned! */
 	start_ip = setup_trampoline();
 
+	alternatives_smp_switch(1);
+
 	/* So we see what's up   */
 	printk(KERN_INFO "Booting processor %d/%d ip %lx\n",
 			  cpu, apicid, start_ip);
@@ -891,11 +897,6 @@ do_rest:
 		store_NMI_vector(&nmi_high, &nmi_low);
 
 		smpboot_setup_warm_reset_vector(start_ip);
-		/*
-		 * Be paranoid about clearing APIC errors.
-	 	*/
-		apic_write(APIC_ESR, 0);
-		apic_read(APIC_ESR);
 	}
 
 	/*
@@ -986,12 +987,6 @@ int __cpuinit native_cpu_up(unsigned int cpu)
 		return -ENOSYS;
 	}
 
-	/*
-	 * Save current MTRR state in case it was changed since early boot
-	 * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
-	 */
-	mtrr_save_state();
-
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
 #ifdef CONFIG_X86_32


end of thread, other threads:[~2009-05-25 13:12 UTC | newest]

Thread overview: 50+ messages
2008-10-17 12:32 [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Max Kellermann
2008-10-17 14:33 ` Glauber Costa
2008-10-20  6:51   ` Max Kellermann
2008-10-20  7:43     ` Ian Campbell
2008-10-20 13:15     ` Glauber Costa
2008-10-20 14:12       ` Max Kellermann
2008-10-20 14:34         ` Cyrill Gorcunov
2008-10-20 14:21       ` Cyrill Gorcunov
2009-05-22 20:59     ` H. Peter Anvin
2009-05-25 13:12       ` Max Kellermann
2008-10-20  6:27 ` Ian Campbell
2008-11-01 11:45   ` Ian Campbell
2008-11-01 13:41     ` Trond Myklebust
2008-11-02 14:40       ` Ian Campbell
2008-11-07  2:12         ` kenneth johansson
2008-11-04 19:10       ` Ian Campbell
2008-11-25  7:09       ` Ian Campbell
2008-11-25 13:28         ` Trond Myklebust
2008-11-25 13:38           ` Ian Campbell
2008-11-25 13:57             ` Trond Myklebust
2008-11-25 14:04               ` Ian Campbell
2008-11-26 22:12                 ` Ian Campbell
2008-12-01  0:17                   ` [PATCH 0/3] " Trond Myklebust
2008-12-01  0:18                     ` [PATCH 1/3] SUNRPC: Ensure the server closes sockets in a timely fashion Trond Myklebust
2008-12-17 15:27                       ` Tom Tucker
2008-12-17 18:08                         ` Trond Myklebust
2008-12-17 18:59                           ` Tom Tucker
2008-12-01  0:19                     ` [PATCH 2/3] SUNRPC: We only need to call svc_delete_xprt() once Trond Myklebust
2008-12-01  0:20                     ` [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports Trond Myklebust
2008-12-17 15:35                       ` Tom Tucker
2008-12-17 19:07                         ` Trond Myklebust
2008-12-23 14:49                           ` Tom Tucker
2008-12-23 23:39                             ` Tom Tucker
2008-12-01  0:29                     ` [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds" Trond Myklebust
2008-12-02 15:22                       ` Kasparek Tomas
2008-12-02 15:37                         ` Trond Myklebust
2008-12-02 16:26                           ` Kasparek Tomas
2008-12-02 18:10                             ` Trond Myklebust
2008-12-01 22:09                     ` Ian Campbell
2008-12-06 12:16                       ` Ian Campbell
2008-12-14 18:24                         ` Ian Campbell
2008-12-16 17:55                           ` J. Bruce Fields
2008-12-16 18:39                             ` Ian Campbell
2009-01-07 22:21                               ` J. Bruce Fields
2009-01-08 18:20                                 ` J. Bruce Fields
2009-01-08 21:22                                   ` Ian Campbell
2009-01-08 21:26                                     ` J. Bruce Fields
2009-01-12  9:46                                       ` Ian Campbell
2009-01-22  8:27                                       ` Ian Campbell
2009-01-22 16:44                                         ` J. Bruce Fields
