* Re: [patch 003/152] jbd: fix commit of ordered data buffers
[not found] ` <20060929090253.GA17124@atrey.karlin.mff.cuni.cz>
@ 2006-09-29 16:11 ` Badari Pulavarty
2006-09-29 19:20 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Badari Pulavarty @ 2006-09-29 16:11 UTC (permalink / raw)
To: Jan Kara; +Cc: akpm, torvalds, stable, ext4
On Fri, 2006-09-29 at 11:02 +0200, Jan Kara wrote:
...
> > >+ }
> > >+ /* Someone already cleaned up the buffer? */
> > >+ if (!buffer_jbd(bh)
> > >+ || jh->b_transaction != commit_transaction
> > >+ || jh->b_jlist != BJ_SyncData) {
> > >+ jbd_unlock_bh_state(bh);
> > >+ if (locked)
> > >+ unlock_buffer(bh);
> > >+ BUFFER_TRACE(bh, "already cleaned up");
> > >+ put_bh(bh);
> > >+ continue;
> ---> Here the buffer was refiled by someone else
I am a little concerned about this particular code. We know that
someone else will do the unfile/remove, but we will keep
spinning on it until that happens, won't we? Why don't we
skip it and move on to the next one?
We are seeing a few of the following messages while running tests and
are wondering if your patch is causing them:
BUG: spinlock lockup on CPU#1, scp/30189, c00000000fb503d8 (Not tainted)
Call Trace:
[C000000018FDB320] [C0000000000102E0] .show_stack+0x68/0x1b0 (unreliable)
[C000000018FDB3C0] [C0000000001734F4] ._raw_spin_lock+0x138/0x184
[C000000018FDB460] [C00000000025AD24] ._spin_lock+0x10/0x24
[C000000018FDB4E0] [D000000000172E14] .journal_dirty_data+0xa4/0x2c0 [jbd]
[C000000018FDB580] [D000000000205BAC] .ext3_journal_dirty_data+0x28/0x70 [ext3]
[C000000018FDB610] [D0000000002048BC] .walk_page_buffers+0xb0/0x134 [ext3]
[C000000018FDB6D0] [D000000000208280] .ext3_ordered_commit_write+0x74/0x114 [ext3]
[C000000018FDB780] [C00000000007E710] .generic_file_buffered_write+0x51c/0x794
[C000000018FDB930] [C00000000007ECB8] .__generic_file_aio_write_nolock+0x330/0x3bc
[C000000018FDBA20] [C00000000007EDBC] .generic_file_aio_write+0x78/0x104
[C000000018FDBAE0] [D000000000202EFC] .ext3_file_write+0x2c/0xd4 [ext3]
[C000000018FDBB70] [C0000000000A8090] .do_sync_write+0xd4/0x130
[C000000018FDBCF0] [C0000000000A8E14] .vfs_write+0x118/0x200
[C000000018FDBD90] [C0000000000A9584] .sys_write+0x4c/0x8c
[C000000018FDBE30] [C000000000008434] syscall_exit+0x0/0x40
Thanks,
Badari
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 003/152] jbd: fix commit of ordered data buffers
2006-09-29 19:20 ` Andrew Morton
@ 2006-09-29 19:18 ` Ingo Molnar
2006-09-29 19:54 ` Andi Kleen
0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2006-09-29 19:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Badari Pulavarty, Jan Kara, torvalds, stable, ext4, Andi Kleen
* Andrew Morton <akpm@osdl.org> wrote:
> gad, there have been so many all-CPU-backtrace patches over the years.
>
> <optimistically cc's Ingo>
>
> Ingo, do you think that's something which we should have in the
> spinlock debugging code? A trace to let us see which CPU is holding
> that lock, and where from? I guess if the other cpu is stuck in
> spin_lock_irqsave() then we'll get stuck delivering the IPI, so it'd
> need to be async.
I used to have this in -rt for i686 and x86_64: the NMI watchdog tick
would print on all CPUs at the next tick (i.e. no need to actually
initiate an IPI) - but it was all a bit hacky [though it worked]. It fell
victim to some recent flux in that area.
< optimistically cc's Andi ;-) >
Ingo
* Re: [patch 003/152] jbd: fix commit of ordered data buffers
2006-09-29 16:11 ` [patch 003/152] jbd: fix commit of ordered data buffers Badari Pulavarty
@ 2006-09-29 19:20 ` Andrew Morton
2006-09-29 19:18 ` Ingo Molnar
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2006-09-29 19:20 UTC (permalink / raw)
To: Badari Pulavarty, Ingo Molnar; +Cc: Jan Kara, torvalds, stable, ext4
On Fri, 29 Sep 2006 09:11:46 -0700
Badari Pulavarty <pbadari@us.ibm.com> wrote:
> On Fri, 2006-09-29 at 11:02 +0200, Jan Kara wrote:
> ...
> > > >+ }
> > > >+ /* Someone already cleaned up the buffer? */
> > > >+ if (!buffer_jbd(bh)
> > > >+ || jh->b_transaction != commit_transaction
> > > >+ || jh->b_jlist != BJ_SyncData) {
> > > >+ jbd_unlock_bh_state(bh);
> > > >+ if (locked)
> > > >+ unlock_buffer(bh);
> > > >+ BUFFER_TRACE(bh, "already cleaned up");
> > > >+ put_bh(bh);
> > > >+ continue;
> > ---> Here the buffer was refiled by someone else
>
> I am a little concerned about this particular code. We know that
> someone else will do the unfile/remove, but we will keep
> spinning on it until that happens, won't we? Why don't we
> skip it and move on to the next one?
That check looks OK to me. Either
a) the buffer has been taken off the journal altogether or
b) it's been moved to the running transaction or
c) it's journalled, it's on the committing transaction but it's not on
t_sync_datalist any more.
So it shouldn't be possible for that buffer to be on
commit_transaction->t_sync_datalist.
> We are seeing a few of the following messages while running tests and
> are wondering if your patch is causing them:
>
> BUG: spinlock lockup on CPU#1, scp/30189, c00000000fb503d8 (Not tainted)
> Call Trace:
> [C000000018FDB320] [C0000000000102E0] .show_stack+0x68/0x1b0 (unreliable)
> [C000000018FDB3C0] [C0000000001734F4] ._raw_spin_lock+0x138/0x184
> [C000000018FDB460] [C00000000025AD24] ._spin_lock+0x10/0x24
> [C000000018FDB4E0] [D000000000172E14] .journal_dirty_data+0xa4/0x2c0 [jbd]
> [C000000018FDB580] [D000000000205BAC] .ext3_journal_dirty_data+0x28/0x70 [ext3]
> [C000000018FDB610] [D0000000002048BC] .walk_page_buffers+0xb0/0x134 [ext3]
> [C000000018FDB6D0] [D000000000208280] .ext3_ordered_commit_write+0x74/0x114
Presumably j_list_lock got stuck. What we really need to see here is
the backtrace from other CPUs.
gad, there have been so many all-CPU-backtrace patches over the years.
<optimistically cc's Ingo>
Ingo, do you think that's something which we should have in the spinlock
debugging code? A trace to let us see which CPU is holding that lock,
and where from? I guess if the other cpu is stuck in spin_lock_irqsave()
then we'll get stuck delivering the IPI, so it'd need to be async.
* Re: [patch 003/152] jbd: fix commit of ordered data buffers
2006-09-29 19:18 ` Ingo Molnar
@ 2006-09-29 19:54 ` Andi Kleen
2006-09-29 20:20 ` Andrew Morton
0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2006-09-29 19:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Badari Pulavarty, Jan Kara, torvalds, stable, ext4
On Friday 29 September 2006 21:18, Ingo Molnar wrote:
>
> * Andrew Morton <akpm@osdl.org> wrote:
>
> > gad, there have been so many all-CPU-backtrace patches over the years.
> >
> > <optimistically cc's Ingo>
> >
> > Ingo, do you think that's something which we should have in the
> > spinlock debugging code? A trace to let us see which CPU is holding
> > that lock, and where from? I guess if the other cpu is stuck in
> > spin_lock_irqsave() then we'll get stuck delivering the IPI, so it'd
> > need to be async.
>
> > I used to have this in -rt for i686 and x86_64: the NMI watchdog tick
> > would print on all CPUs at the next tick (i.e. no need to actually
> > initiate an IPI) - but it was all a bit hacky [though it worked]. It fell
> > victim to some recent flux in that area.
You mean spinlock debugging setting a global variable and the NMI
watchdog testing that? Makes sense. I can put it on my todo list.
-Andi
* Re: [patch 003/152] jbd: fix commit of ordered data buffers
2006-09-29 19:54 ` Andi Kleen
@ 2006-09-29 20:20 ` Andrew Morton
2006-09-29 21:22 ` Ingo Molnar
2006-09-29 21:26 ` Ingo Molnar
0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2006-09-29 20:20 UTC (permalink / raw)
To: Andi Kleen
Cc: Ingo Molnar, Badari Pulavarty, Jan Kara, torvalds, stable, ext4
On Fri, 29 Sep 2006 21:54:30 +0200
Andi Kleen <ak@suse.de> wrote:
> On Friday 29 September 2006 21:18, Ingo Molnar wrote:
> >
> > * Andrew Morton <akpm@osdl.org> wrote:
> >
> > > gad, there have been so many all-CPU-backtrace patches over the years.
> > >
> > > <optimistically cc's Ingo>
> > >
> > > Ingo, do you think that's something which we should have in the
> > > spinlock debugging code? A trace to let us see which CPU is holding
> > > that lock, and where from? I guess if the other cpu is stuck in
> > > spin_lock_irqsave() then we'll get stuck delivering the IPI, so it'd
> > > need to be async.
> >
> > I used to have this in -rt for i686 and x86_64: the NMI watchdog tick
> > would print on all CPUs at the next tick (i.e. no need to actually
> > initiate an IPI) - but it was all a bit hacky [though it worked]. It fell
> > victim to some recent flux in that area.
>
> You mean spinlock debugging setting a global variable and the NMI
> watchdog testing that? Makes sense. I can put it on my todo list.
It does make sense.
<type, type>
Something like this? (compiled only)
From: Andrew Morton <akpm@osdl.org>
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu
backtrace, so we get to see which CPU is holding the lock, and where.
Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
arch/i386/kernel/nmi.c | 14 ++++++++++++++
arch/x86_64/kernel/nmi.c | 17 ++++++++++++++++-
include/asm-i386/nmi.h | 3 +++
include/asm-x86_64/nmi.h | 3 +++
include/linux/nmi.h | 5 +++++
lib/spinlock_debug.c | 4 ++++
6 files changed, 45 insertions(+), 1 deletion(-)
diff -puN lib/spinlock_debug.c~spinlock-debug-all-cpu-backtrace lib/spinlock_debug.c
--- a/lib/spinlock_debug.c~spinlock-debug-all-cpu-backtrace
+++ a/lib/spinlock_debug.c
@@ -7,6 +7,7 @@
*/
#include <linux/spinlock.h>
+#include <linux/nmi.h>
#include <linux/interrupt.h>
#include <linux/debug_locks.h>
#include <linux/delay.h>
@@ -116,6 +117,9 @@ static void __spin_lock_debug(spinlock_t
raw_smp_processor_id(), current->comm,
current->pid, lock);
dump_stack();
+#ifdef CONFIG_SMP
+ trigger_all_cpu_backtrace();
+#endif
}
}
}
diff -puN arch/i386/kernel/nmi.c~spinlock-debug-all-cpu-backtrace arch/i386/kernel/nmi.c
--- a/arch/i386/kernel/nmi.c~spinlock-debug-all-cpu-backtrace
+++ a/arch/i386/kernel/nmi.c
@@ -23,6 +23,7 @@
#include <linux/percpu.h>
#include <linux/dmi.h>
#include <linux/kprobes.h>
+#include <linux/cpumask.h>
#include <asm/smp.h>
#include <asm/nmi.h>
@@ -40,6 +41,8 @@
static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner);
static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[3]);
+static cpumask_t backtrace_mask = CPU_MASK_NONE;
+
/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
* offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
*/
@@ -905,6 +908,12 @@ __kprobes int nmi_watchdog_tick(struct p
touched = 1;
}
+ if (cpu_isset(cpu, backtrace_mask)) {
+ cpu_clear(cpu, backtrace_mask);
+ printk("NMI backtrace for cpu %d\n", cpu);
+ dump_stack();
+ }
+
sum = per_cpu(irq_stat, cpu).apic_timer_irqs;
/* if the apic timer isn't firing, this cpu isn't doing much */
@@ -1031,6 +1040,11 @@ int proc_nmi_enabled(struct ctl_table *t
#endif
+void __trigger_all_cpu_backtrace(void)
+{
+ backtrace_mask = CPU_MASK_ALL;
+}
+
EXPORT_SYMBOL(nmi_active);
EXPORT_SYMBOL(nmi_watchdog);
EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
diff -puN arch/x86_64/kernel/nmi.c~spinlock-debug-all-cpu-backtrace arch/x86_64/kernel/nmi.c
--- a/arch/x86_64/kernel/nmi.c~spinlock-debug-all-cpu-backtrace
+++ a/arch/x86_64/kernel/nmi.c
@@ -12,14 +12,15 @@
* Mikael Pettersson : PM converted to driver model. Disable/enable API.
*/
+#include <linux/nmi.h>
#include <linux/mm.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/sysdev.h>
-#include <linux/nmi.h>
#include <linux/sysctl.h>
#include <linux/kprobes.h>
+#include <linux/cpumask.h>
#include <asm/smp.h>
#include <asm/nmi.h>
@@ -37,6 +38,8 @@
static DEFINE_PER_CPU(unsigned, perfctr_nmi_owner);
static DEFINE_PER_CPU(unsigned, evntsel_nmi_owner[2]);
+static cpumask_t backtrace_mask = CPU_MASK_NONE;
+
/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
* offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
*/
@@ -778,6 +781,7 @@ int __kprobes nmi_watchdog_tick(struct p
{
int sum;
int touched = 0;
+ int cpu = smp_processor_id();
struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
u64 dummy;
int rc=0;
@@ -795,6 +799,12 @@ int __kprobes nmi_watchdog_tick(struct p
touched = 1;
}
+ if (cpu_isset(cpu, backtrace_mask)) {
+ cpu_clear(cpu, backtrace_mask);
+ printk("NMI backtrace for cpu %d\n", cpu);
+ dump_stack();
+ }
+
#ifdef CONFIG_X86_MCE
/* Could check oops_in_progress here too, but it's safer
not too */
@@ -927,6 +937,11 @@ int proc_nmi_enabled(struct ctl_table *t
#endif
+void __trigger_all_cpu_backtrace(void)
+{
+ backtrace_mask = CPU_MASK_ALL;
+}
+
EXPORT_SYMBOL(nmi_active);
EXPORT_SYMBOL(nmi_watchdog);
EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
diff -puN include/linux/nmi.h~spinlock-debug-all-cpu-backtrace include/linux/nmi.h
--- a/include/linux/nmi.h~spinlock-debug-all-cpu-backtrace
+++ a/include/linux/nmi.h
@@ -14,9 +14,14 @@
* disables interrupts for a long time. This call is stateless.
*/
#ifdef ARCH_HAS_NMI_WATCHDOG
+#include <asm/nmi.h>
extern void touch_nmi_watchdog(void);
#else
# define touch_nmi_watchdog() do { } while(0)
#endif
+#ifndef trigger_all_cpu_backtrace
+#define trigger_all_cpu_backtrace() do { } while (0)
+#endif
+
#endif
diff -puN include/asm-i386/nmi.h~spinlock-debug-all-cpu-backtrace include/asm-i386/nmi.h
--- a/include/asm-i386/nmi.h~spinlock-debug-all-cpu-backtrace
+++ a/include/asm-i386/nmi.h
@@ -36,4 +36,7 @@ extern unsigned int nmi_watchdog;
#define NMI_LOCAL_APIC 2
#define NMI_INVALID 3
+void __trigger_all_cpu_backtrace(void);
+#define trigger_all_cpu_backtrace() __trigger_all_cpu_backtrace()
+
#endif /* ASM_NMI_H */
diff -puN include/asm-x86_64/nmi.h~spinlock-debug-all-cpu-backtrace include/asm-x86_64/nmi.h
--- a/include/asm-x86_64/nmi.h~spinlock-debug-all-cpu-backtrace
+++ a/include/asm-x86_64/nmi.h
@@ -70,4 +70,7 @@ extern unsigned int nmi_watchdog;
#define NMI_LOCAL_APIC 2
#define NMI_INVALID 3
+void __trigger_all_cpu_backtrace(void);
+#define trigger_all_cpu_backtrace() __trigger_all_cpu_backtrace()
+
#endif /* ASM_NMI_H */
_
* Re: [patch 003/152] jbd: fix commit of ordered data buffers
2006-09-29 20:20 ` Andrew Morton
@ 2006-09-29 21:22 ` Ingo Molnar
2006-09-29 21:26 ` Ingo Molnar
1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2006-09-29 21:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, Badari Pulavarty, Jan Kara, torvalds, stable, ext4
* Andrew Morton <akpm@osdl.org> wrote:
> + if (cpu_isset(cpu, backtrace_mask)) {
> + cpu_clear(cpu, backtrace_mask);
> + printk("NMI backtrace for cpu %d\n", cpu);
> + dump_stack();
> + }
yeah. There's just one issue: this will mix the printks from all
CPUs at once (the NMIs, if IO-APIC-triggered, fire at once). The way I
solved this was to add a private spinlock around the printk and
dump_stack().
Ingo
* Re: [patch 003/152] jbd: fix commit of ordered data buffers
2006-09-29 20:20 ` Andrew Morton
2006-09-29 21:22 ` Ingo Molnar
@ 2006-09-29 21:26 ` Ingo Molnar
1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2006-09-29 21:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, Badari Pulavarty, Jan Kara, torvalds, stable, ext4
* Andrew Morton <akpm@osdl.org> wrote:
> +void __trigger_all_cpu_backtrace(void)
> +{
> + backtrace_mask = CPU_MASK_ALL;
> +}
I'd suggest cpu_online_map; I also added a busy-loop to wait for the
backtrace_mask to go back to 0. That way we get a consistent dump all
across (as the NMI will dump on this CPU too).
If the NMI isn't working on some CPU, the wait might be quite long ;-)
But it's not a big issue, I think, as the system is likely hosed already.
Ingo
end of thread, other threads:[~2006-09-29 21:34 UTC | newest]
Thread overview: 7+ messages
-- links below jump to the message on this page --
[not found] <200609260630.k8Q6UrvQ011999@shell0.pdx.osdl.net>
[not found] ` <451C4DDE.60307@us.ibm.com>
[not found] ` <20060929090253.GA17124@atrey.karlin.mff.cuni.cz>
2006-09-29 16:11 ` [patch 003/152] jbd: fix commit of ordered data buffers Badari Pulavarty
2006-09-29 19:20 ` Andrew Morton
2006-09-29 19:18 ` Ingo Molnar
2006-09-29 19:54 ` Andi Kleen
2006-09-29 20:20 ` Andrew Morton
2006-09-29 21:22 ` Ingo Molnar
2006-09-29 21:26 ` Ingo Molnar