* [Cluster-devel] [GFS2] Git tree update
@ 2007-04-06 9:19 Steven Whitehouse
0 siblings, 0 replies; 10+ messages in thread
From: Steven Whitehouse @ 2007-04-06 9:19 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
Linus has announced 2.6.21-rc6, so I've updated the -nmw GFS2 git tree
accordingly,
Steve.
* [Cluster-devel] GFS2 git tree update
@ 2007-08-06 7:37 Steven Whitehouse
0 siblings, 0 replies; 10+ messages in thread
From: Steven Whitehouse @ 2007-08-06 7:37 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
Linus has released 2.6.23-rc2 so I've rebased the GFS2/DLM -nmw git
tree,
Steve.
* [Cluster-devel] GFS2 git tree update
@ 2007-09-25 8:55 Steven Whitehouse
0 siblings, 0 replies; 10+ messages in thread
From: Steven Whitehouse @ 2007-09-25 8:55 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
Linus has released 2.6.23-rc8, so I've updated the -nmw git tree to that
level,
Steve.
* [Cluster-devel] [GFS2] Git tree update
@ 2007-10-29 8:29 Steven Whitehouse
2007-10-29 10:25 ` Fabio Massimo Di Nitto
0 siblings, 1 reply; 10+ messages in thread
From: Steven Whitehouse @ 2007-10-29 8:29 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
Since Linus released -rc1 when I was on holiday last week, I've just
updated the git tree. There have been quite a lot of changes in the core
kernel which affect GFS2 but which were not part of the GFS2 -nmw git
tree in this last merge window, so that's another reason for updating so
that we have all those changes in the -nmw tree now,
Steve.
* [Cluster-devel] [GFS2] Git tree update
2007-10-29 8:29 [Cluster-devel] [GFS2] Git " Steven Whitehouse
@ 2007-10-29 10:25 ` Fabio Massimo Di Nitto
2007-10-31 15:34 ` Steven Whitehouse
0 siblings, 1 reply; 10+ messages in thread
From: Fabio Massimo Di Nitto @ 2007-10-29 10:25 UTC (permalink / raw)
To: cluster-devel.redhat.com
Steven Whitehouse wrote:
> Hi,
>
> Since Linus released -rc1 when I was on holiday last week, I've just
> updated the git tree. There have been quite a lot of changes in the core
> kernel which affect GFS2 but which were not part of the GFS2 -nmw git
> tree in this last merge window, so that's another reason for updating so
> that we have all those changes in the -nmw tree now,
>
> Steve.
>
Hi Steve,
I am still getting a bunch of oopses at mount time with the very latest tree.
Last commit in -nmw: 9904cd0ecaf9c72de837214fa1b359d4c87220a8
dmesg attached.
Cheers
Fabio
--
I'm going to make him an offer he can't refuse.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20071029/18ff174f/attachment.ksh>
* [Cluster-devel] [GFS2] Git tree update
2007-10-29 10:25 ` Fabio Massimo Di Nitto
@ 2007-10-31 15:34 ` Steven Whitehouse
2007-11-01 7:30 ` Fabio Massimo Di Nitto
2007-11-08 17:35 ` Fabio Massimo Di Nitto
0 siblings, 2 replies; 10+ messages in thread
From: Steven Whitehouse @ 2007-10-31 15:34 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
On Mon, 2007-10-29 at 11:25 +0100, Fabio Massimo Di Nitto wrote:
> Steven Whitehouse wrote:
> > Hi,
> >
> > Since Linus released -rc1 when I was on holiday last week, I've just
> > updated the git tree. There have been quite a lot of changes in the core
> > kernel which affect GFS2 but which were not part of the GFS2 -nmw git
> > tree in this last merge window, so that's another reason for updating so
> > that we have all those changes in the -nmw tree now,
> >
> > Steve.
> >
>
> Hi Steve,
>
> I am still getting a bunch of oopses at mount time with the very latest tree.
>
> Last commit in -nmw: 9904cd0ecaf9c72de837214fa1b359d4c87220a8
>
> dmesg attached.
>
> Cheers
> Fabio
>
This is a bit odd. I've been trying to track this down, but I've not
been able to reproduce it here. The odd thing is that it appears from
the context that the spin lock has been unlocked by something inside the
call to run_queue() (from gfs2_glmutex_unlock()) but I can't spot where
that has happened.
Even more odd, the original report hit a different area of the code: the
check for the spinlock in gfs2_demote_wake(). The reason that particular
one stood out is that according to the stack trace it was called from
drop_bh() where the preceding function call was to lock that very lock.
So in that case it looks like a race with something else getting to the
glock's spin lock in the space of the call to that function. If you are
running with spin lock debugging, that means that the spin lock must
have been initialised correctly, otherwise that would have flagged up
earlier, so it looks like it's been overwritten, or that the glock has
been freed underneath it, both of which events should be impossible.
So I'm still looking into this, but slightly confused at the moment. Has
anybody else seen anything similar? I still can't figure out why you see
this and I don't,
Steve.
> plain text document attachment (dmesg)
> [ 0.000000] Linux version 2.6.24-rc1 (fabbione at gordian) (gcc version 4.2.3 20071023 (prerelease) (Ubuntu 4.2.2-3ubuntu3)) #1 Mon Oct 29 09:57:01 CET 2007
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
> [ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
> [ 0.000000] BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
> [ 0.000000] BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
> [ 0.000000] BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
> [ 0.000000] BIOS-e820: 000000000fef0000 - 000000000feff000 (ACPI data)
> [ 0.000000] BIOS-e820: 000000000feff000 - 000000000ff00000 (ACPI NVS)
> [ 0.000000] BIOS-e820: 000000000ff00000 - 0000000010000000 (usable)
> [ 0.000000] BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> [ 0.000000] BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> [ 0.000000] BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
> [ 0.000000] 256MB LOWMEM available.
> [ 0.000000] Entering add_active_range(0, 0, 65536) 0 entries of 256 used
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0 -> 4096
> [ 0.000000] Normal 4096 -> 65536
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] early_node_map[1] active PFN ranges
> [ 0.000000] 0: 0 -> 65536
> [ 0.000000] On node 0 totalpages: 65536
> [ 0.000000] DMA zone: 32 pages used for memmap
> [ 0.000000] DMA zone: 0 pages reserved
> [ 0.000000] DMA zone: 4064 pages, LIFO batch:0
> [ 0.000000] Normal zone: 480 pages used for memmap
> [ 0.000000] Normal zone: 60960 pages, LIFO batch:15
> [ 0.000000] Movable zone: 0 pages used for memmap
> [ 0.000000] DMI present.
> [ 0.000000] Allocating PCI resources starting at 20000000 (gap: 10000000:eec00000)
> [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 65024
> [ 0.000000] Kernel command line: root=UUID=eaf2480b-e70d-4420-8223-a13ddc6b5277 ro quiet splash
> [ 0.000000] Enabling fast FPU save and restore... done.
> [ 0.000000] Enabling unmasked SIMD FPU exception support... done.
> [ 0.000000] Initializing CPU#0
> [ 0.000000] PID hash table entries: 1024 (order: 10, 4096 bytes)
> [ 0.000000] Detected 2161.181 MHz processor.
> [ 8600.278550] Console: colour VGA+ 80x25
> [ 8600.278763] console [tty0] enabled
> [ 8600.279619] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> [ 8600.279742] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
> [ 8600.280599] Memory: 251276k/262144k available (1473k kernel code, 10192k reserved, 657k data, 136k init, 0k highmem)
> [ 8600.280628] virtual kernel memory layout:
> [ 8600.280632] fixmap : 0xffffb000 - 0xfffff000 ( 16 kB)
> [ 8600.280634] vmalloc : 0xd0800000 - 0xffff9000 ( 759 MB)
> [ 8600.280637] lowmem : 0xc0000000 - 0xd0000000 ( 256 MB)
> [ 8600.280638] .init : 0xc0318000 - 0xc033a000 ( 136 kB)
> [ 8600.280640] .data : 0xc0270577 - 0xc0314b5c ( 657 kB)
> [ 8600.280641] .text : 0xc0100000 - 0xc0270577 (1473 kB)
> [ 8600.280662] Checking if this processor honours the WP bit even in supervisor mode... Ok.
> [ 8600.281519] SLUB: Genslabs=11, HWalign=64, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
> [ 8600.431714] Calibrating delay using timer specific routine.. 4326.86 BogoMIPS (lpj=21634340)
> [ 8600.431870] Mount-cache hash table entries: 512
> [ 8600.432026] CPU: After generic identify, caps: 0febfbff 20100000 00000000 00000000 00002211 00000000 00000001 00000000
> [ 8600.432182] CPU: L1 I cache: 32K, L1 D cache: 32K
> [ 8600.432208] CPU: L2 cache: 4096K
> [ 8600.432320] CPU: After all inits, caps: 0febfbff 20100000 00000000 00003940 00002211 00000000 00000001 00000000
> [ 8600.432457] Compat vDSO mapped to ffffe000.
> [ 8600.432613] CPU: Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz stepping 08
> [ 8600.432765] Checking 'hlt' instruction... OK.
> [ 8600.471731] Freeing SMP alternatives: 0k freed
> [ 8600.475196] net_namespace: 64 bytes
> [ 8600.475770] NET: Registered protocol family 16
> [ 8600.476856] PCI: PCI BIOS revision 2.10 entry at 0xfd9a0, last bus=2
> [ 8600.476889] PCI: Using configuration type 1
> [ 8600.476914] Setting up standard PCI resources
> [ 8600.478911] PCI: Probing PCI hardware
> [ 8600.479011] PCI: Probing PCI hardware (bus 00)
> [ 8600.479393] PCI quirk: region 1000-103f claimed by PIIX4 ACPI
> [ 8600.479427] PCI quirk: region 1040-104f claimed by PIIX4 SMB
> [ 8600.481295] PCI: Using IRQ router PIIX/ICH [8086/7110] at 0000:00:07.0
> [ 8600.481451] PCI: setting IRQ 11 as level-triggered
> [ 8600.481479] PCI: Found IRQ 11 for device 0000:00:11.0
> [ 8600.502857] PCI: Bridge: 0000:00:01.0
> [ 8600.502873] IO window: disabled.
> [ 8600.502938] MEM window: disabled.
> [ 8600.502984] PREFETCH window: disabled.
> [ 8600.503049] PCI: Bridge: 0000:00:11.0
> [ 8600.503082] IO window: 2000-2fff
> [ 8600.503117] MEM window: 20000000-200fffff
> [ 8600.503134] PREFETCH window: disabled.
> [ 8600.503290] PCI: Setting latency timer of device 0000:00:01.0 to 64
> [ 8600.503446] PCI: Found IRQ 11 for device 0000:00:11.0
> [ 8600.503602] NET: Registered protocol family 2
> [ 8600.511710] Time: tsc clocksource has been installed.
> [ 8600.591507] IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
> [ 8600.591663] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
> [ 8600.591727] TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
> [ 8600.591795] TCP: Hash tables configured (established 8192 bind 8192)
> [ 8600.591849] TCP reno registered
> [ 8600.621458] Unpacking initramfs... done
> [ 8600.811142] Freeing initrd memory: 5389k freed
> [ 8600.812831] IA-32 Microcode Update Driver: v1.14a <tigran@aivazian.fsnet.co.uk>
> [ 8600.814163] io scheduler noop registered
> [ 8600.814223] io scheduler cfq registered (default)
> [ 8600.814306] Limiting direct PCI/PCI transfers.
> [ 8600.814391] Boot video device is 0000:00:0f.0
> [ 8600.840908] Real Time Clock Driver v1.12ac
> [ 8600.840908] toshiba: not a supported Toshiba laptop
> [ 8601.354347] serio: i8042 KBD port at 0x60,0x64 irq 1
> [ 8601.354458] serio: i8042 AUX port at 0x60,0x64 irq 12
> [ 8601.355472] TCP cubic registered
> [ 8601.355496] NET: Registered protocol family 1
> [ 8601.355629] NET: Registered protocol family 10
> [ 8601.355785] lo: Disabled Privacy Extensions
> [ 8601.355941] Using IPI Shortcut mode
> [ 8601.360324] Freeing unused kernel memory: 136k freed
> [ 8601.362943] input: AT Translated Set 2 keyboard as /class/input/input0
> [ 8601.859353] SCSI subsystem initialized
> [ 8601.863796] pcnet32.c:v1.34 14.Aug.2007 tsbogend at alpha.franken.de
> [ 8601.863952] pcnet32: PCnet/PCI II 79C970A at 0x2000, 00 0c 29 96 8c ce assigned IRQ 11.
> [ 8601.864108] eth0: registered as PCnet/PCI II 79C970A
> [ 8601.864248] pcnet32: 1 cards_found.
> [ 8601.875638] Fusion MPT base driver 3.04.06
> [ 8601.875653] Copyright (c) 1999-2007 LSI Corporation
> [ 8601.879201] Fusion MPT SPI Host driver 3.04.06
> [ 8601.879357] PCI: setting IRQ 9 as level-triggered
> [ 8601.879362] PCI: Found IRQ 9 for device 0000:00:10.0
> [ 8601.879518] mptbase: ioc0: Initiating bringup
> [ 8602.058963] ioc0: LSI53C1030 B0: Capabilities={Initiator}
> [ 8602.378627] scsi0 : ioc0: LSI53C1030 B0, FwRev=01032920h, Ports=1, MaxQ=128, IRQ=9
> [ 8602.659033] scsi 0:0:0:0: Direct-Access VMware, VMware Virtual S 1.0 PQ: 0 ANSI: 2
> [ 8602.659189] target0:0:0: Beginning Domain Validation
> [ 8602.659345] target0:0:0: Domain Validation skipping write tests
> [ 8602.659369] target0:0:0: Ending Domain Validation
> [ 8602.659525] target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
> [ 8602.677893] sd 0:0:0:0: [sda] 2097152 512-byte hardware sectors (1074 MB)
> [ 8602.678049] sd 0:0:0:0: [sda] Write Protect is off
> [ 8602.678076] sd 0:0:0:0: [sda] Mode Sense: 5d 00 00 00
> [ 8602.678199] sd 0:0:0:0: [sda] Cache data unavailable
> [ 8602.678224] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 8602.678380] sd 0:0:0:0: [sda] 2097152 512-byte hardware sectors (1074 MB)
> [ 8602.678498] sd 0:0:0:0: [sda] Write Protect is off
> [ 8602.678502] sd 0:0:0:0: [sda] Mode Sense: 5d 00 00 00
> [ 8602.678549] sd 0:0:0:0: [sda] Cache data unavailable
> [ 8602.678552] sd 0:0:0:0: [sda] Assuming drive cache: write through
> [ 8602.678708] sda: sda1 sda2 < sda5 >
> [ 8602.681818] sd 0:0:0:0: [sda] Attached SCSI disk
> [ 8602.682303] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [ 8602.741145] EXT3-fs: INFO: recovery required on readonly filesystem.
> [ 8602.741163] EXT3-fs: write access will be enabled during recovery.
> [ 8602.852901] kjournald starting. Commit interval 5 seconds
> [ 8602.853098] EXT3-fs: sda1: orphan cleanup on readonly fs
> [ 8602.853254] ext3_orphan_cleanup: deleting unreferenced inode 87514
> [ 8602.857672] ext3_orphan_cleanup: deleting unreferenced inode 34644
> [ 8602.857891] ext3_orphan_cleanup: deleting unreferenced inode 33228
> [ 8602.858090] ext3_orphan_cleanup: deleting unreferenced inode 33226
> [ 8602.858407] ext3_orphan_cleanup: deleting unreferenced inode 33042
> [ 8602.858588] ext3_orphan_cleanup: deleting unreferenced inode 32690
> [ 8602.858607] ext3_orphan_cleanup: deleting unreferenced inode 44139
> [ 8602.858803] ext3_orphan_cleanup: deleting unreferenced inode 32600
> [ 8602.859009] EXT3-fs: sda1: 8 orphan inodes deleted
> [ 8602.859033] EXT3-fs: recovery complete.
> [ 8602.861884] EXT3-fs: mounted filesystem with ordered data mode.
> [ 8603.509589] eth0: link up
> [ 8607.232336] loop: module loaded
> [ 8607.422493] Adding 112412k swap on /dev/sda5. Priority:-1 extents:1 across:112412k
> [ 8607.522211] EXT3 FS on sda1, internal journal
> [ 8607.681659] NET: Registered protocol family 17
> [ 8609.955508] aoe: AoE v32 initialised.
> [ 8609.978283] aoe: e1.0: setting 1024 byte data frames on eth0
> [ 8609.985190] aoe: e0.0: setting 1024 byte data frames on eth0
> [ 8609.995532] aoe: 00055d5e8f24 e1.0 v400a has 2097152 sectors
> [ 8609.995629] aoe: 00055d5e8f24 e0.0 v400a has 10485760 sectors
> [ 8609.995791] etherd/e1.0: unknown partition table
> [ 8610.007530] etherd/e0.0: unknown partition table
> [ 8612.783214] DLM (built Oct 29 2007 09:54:25) installed
> [ 8613.430435] GFS2 (built Oct 29 2007 09:54:33) installed
> [ 8613.459845] Lock_DLM (built Oct 29 2007 09:54:38) installed
> [ 8614.448363] eth0: no IPv6 routers present
> [ 9461.041465] GFS <CVS> (built Oct 29 2007 10:48:13) installed
> [ 9827.567183] dlm: closing connection to node 2
> [ 9830.412134] dlm: closing connection to node 3
> [ 9882.757970] dlm: closing connection to node 1
> [ 9985.743663] GFS2: fsid=: Trying to join cluster "lock_dlm", "gutsy:gfs2"
> [ 9985.791351] dlm: Using TCP for communications
> [ 9985.875126] GFS2: fsid=gutsy:gfs2.0: Joined cluster. Now mounting FS...
> [ 9985.999197] GFS2: fsid=gutsy:gfs2.0: jid=0, already locked for use
> [ 9985.999236] GFS2: fsid=gutsy:gfs2.0: jid=0: Looking at journal...
> [ 9986.053325] ------------[ cut here ]------------
> [ 9986.053428] kernel BUG at fs/gfs2/glock.c:704!
> [ 9986.053513] invalid opcode: 0000 [#1]
> [ 9986.053645] Modules linked in: gfs lock_dlm gfs2 dlm configfs aoe af_packet loop evdev sg sd_mod mptspi mptscsih mptbase scsi_transport_spi pcnet32 mii scsi_mod
> [ 9986.053948]
> [ 9986.054079] Pid: 3151, comm: gfs2_scand Not tainted (2.6.24-rc1 #1)
> [ 9986.054161] EIP: 0060:[<d08e1a58>] EFLAGS: 00010296 CPU: 0
> [ 9986.054540] EIP is at gfs2_glmutex_unlock+0x18/0x20 [gfs2]
> [ 9986.054626] EAX: 00000000 EBX: c1516488 ECX: c15164cc EDX: c1516488
> [ 9986.054707] ESI: 00000000 EDI: c1516488 EBP: 00000000 ESP: c2271fb4
> [ 9986.054788] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 9986.054900] Process gfs2_scand (pid: 3151, ti=c2270000 task=c261d9f0 task.ti=c2270000)
> [ 9986.054992] Stack: d08e080d d08e2050 0000069c 00000000 d08e0840 00000000 d08e0871 fffffffc
> [ 9986.055173] c0122c42 c0122c00 00000000 00000000 c0102b2b c1c07e8c 00000000 00000000
> [ 9986.055303] 00000000 0009b4a1 00001c01
> [ 9986.055374] Call Trace:
> [ 9986.055566] [<d08e080d>] examine_bucket+0x4d/0x80 [gfs2]
> [ 9986.055679] [<d08e2050>] scan_glock+0x0/0x60 [gfs2]
> [ 9986.055769] [<d08e0840>] gfs2_scand+0x0/0x60 [gfs2]
> [ 9986.055845] [<d08e0871>] gfs2_scand+0x31/0x60 [gfs2]
> [ 9986.055929] [<c0122c42>] kthread+0x42/0x70
> [ 9986.056154] [<c0122c00>] kthread+0x0/0x70
> [ 9986.056215] [<c0102b2b>] kernel_thread_helper+0x7/0x1c
> [ 9986.056397] =======================
> [ 9986.056472] Code: 5b e9 bd fd ff ff 8d b6 00 00 00 00 8d bc 27 00 00 00 00 0f ba 70 08 01 c7 40 2c 00 00 00 00 c7 40 30 00 00 00 00 e8 38 f9 ff ff <0f> 0b eb fe 8d 74 26 00 83 ec 0c 89 c1 89 5c 24 04 89 74 24 08
> [ 9986.056816] EIP: [<d08e1a58>] gfs2_glmutex_unlock+0x18/0x20 [gfs2] SS:ESP 0068:c2271fb4
> [ 9986.539316] GFS2: fsid=gutsy:gfs2.0: jid=0: Done
> [ 9986.539620] GFS2: fsid=gutsy:gfs2.0: jid=1: Trying to acquire journal lock...
> [ 9986.552102] GFS2: fsid=gutsy:gfs2.0: jid=1: Looking at journal...
> [ 9987.083169] GFS2: fsid=gutsy:gfs2.0: jid=1: Done
> [ 9987.083407] GFS2: fsid=gutsy:gfs2.0: jid=2: Trying to acquire journal lock...
> [ 9987.086628] ------------[ cut here ]------------
> [ 9987.087017] kernel BUG at fs/gfs2/glock.c:464!
> [ 9987.087187] invalid opcode: 0000 [#2]
> [ 9987.087563] Modules linked in: gfs lock_dlm gfs2 dlm configfs aoe af_packet loop evdev sg sd_mod mptspi mptscsih mptbase scsi_transport_spi pcnet32 mii scsi_mod
> [ 9987.089921]
> [ 9987.090108] Pid: 4053, comm: lock_dlm2 Tainted: G D (2.6.24-rc1 #1)
> [ 9987.090290] EIP: 0060:[<d08df890>] EFLAGS: 00010246 CPU: 0
> [ 9987.090589] EIP is at gfs2_demote_wake+0x0/0x10 [gfs2]
> [ 9987.090882] EAX: c227bbc8 EBX: c227bbc8 ECX: 00000000 EDX: 00000000
> [ 9987.091060] ESI: 00000000 EDI: 00000000 EBP: c247c000 ESP: c14a5f34
> [ 9987.091245] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 9987.091418] Process lock_dlm2 (pid: 4053, ti=c14a4000 task=c1512a60 task.ti=c14a4000)
> [ 9987.091602] Stack: d08e0ca4 c247c000 c14a5fb4 d08f8e80 c247c000 c14a5fb4 c227bbc8 c1708000
> [ 9987.092870] d08e0ab7 c026e7b9 c1708150 c1708000 c2762700 fffefffe d089799c c1512a90
> [ 9987.094058] 0646ec9e 00000143 00000001 00003359 c033f730 c14a5fb4 c170818c 00000000
> [ 9987.095675] Call Trace:
> [ 9987.095973] [<d08e0ca4>] drop_bh+0x84/0x150 [gfs2]
> [ 9987.096265] [<d08e0ab7>] gfs2_glock_cb+0x87/0x1a0 [gfs2]
> [ 9987.096563] [<c026e7b9>] schedule+0x149/0x270
> [ 9987.096861] [<d089799c>] gdlm_thread+0x3bc/0x660 [lock_dlm]
> [ 9987.097165] [<c010ca90>] default_wake_function+0x0/0x10
> [ 9987.097461] [<d0897c40>] gdlm_thread2+0x0/0x10 [lock_dlm]
> [ 9987.097745] [<c0122c42>] kthread+0x42/0x70
> [ 9987.098126] [<c0122c00>] kthread+0x0/0x70
> [ 9987.098541] [<c0102b2b>] kernel_thread_helper+0x7/0x1c
> [ 9987.099088] =======================
> [ 9987.099254] Code: 24 89 d3 89 74 24 04 89 c6 e8 9d fd ff ff 89 d9 89 f2 8b 1c 24 8b 74 24 04 83 c4 08 e9 6a ff ff ff 8d 76 00 8d bc 27 00 00 00 00 <0f> 0b eb fe 8d b6 00 00 00 00 8d bf 00 00 00 00 53 89 c2 8b 58
> [ 9987.107178] EIP: [<d08df890>] gfs2_demote_wake+0x0/0x10 [gfs2] SS:ESP 0068:c14a5f34
> [ 9987.107816] ------------[ cut here ]------------
> [ 9987.108003] kernel BUG at fs/gfs2/glock.c:464!
> [ 9987.108171] invalid opcode: 0000 [#3]
> [ 9987.108492] Modules linked in: gfs lock_dlm gfs2 dlm configfs aoe af_packet loop evdev sg sd_mod mptspi mptscsih mptbase scsi_transport_spi pcnet32 mii scsi_mod
> [ 9987.111008]
> [ 9987.111159] Pid: 4052, comm: lock_dlm1 Tainted: G D (2.6.24-rc1 #1)
> [ 9987.111339] EIP: 0060:[<d08df890>] EFLAGS: 00010246 CPU: 0
> [ 9987.111522] EIP is at gfs2_demote_wake+0x0/0x10 [gfs2]
> [ 9987.111692] EAX: c1516658 EBX: c1516658 ECX: 00000000 EDX: c1516658
> [ 9987.111869] ESI: 00000000 EDI: 00000000 EBP: c247c000 ESP: c246bf34
> [ 9987.112048] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> [ 9987.112222] Process lock_dlm1 (pid: 4052, ti=c246a000 task=c1512530 task.ti=c246a000)
> [ 9987.112407] Stack: d08e0ca4 c247c000 c246bfb4 d08f9040 c247c000 c246bfb4 c1516658 c1708000
> [ 9987.113591] d08e0ab7 c026e7b9 c1708150 c1708000 c1c5db80 fffefffe d089799c c1512560
> [ 9987.114866] 06113ba1 00000143 00000001 000045a5 c033f730 c246bfb4 c170818c 00000001
> [ 9987.116068] Call Trace:
> [ 9987.116365] [<d08e0ca4>] drop_bh+0x84/0x150 [gfs2]
> [ 9987.116656] [<d08e0ab7>] gfs2_glock_cb+0x87/0x1a0 [gfs2]
> [ 9987.116947] [<c026e7b9>] schedule+0x149/0x270
> [ 9987.117224] [<d089799c>] gdlm_thread+0x3bc/0x660 [lock_dlm]
> [ 9987.117514] [<c010ca90>] default_wake_function+0x0/0x10
> [ 9987.117796] [<d0897c50>] gdlm_thread1+0x0/0x10 [lock_dlm]
> [ 9987.118105] [<c0122c42>] kthread+0x42/0x70
> [ 9987.118384] [<c0122c00>] kthread+0x0/0x70
> [ 9987.119677] [<c0102b2b>] kernel_thread_helper+0x7/0x1c
> [ 9987.119957] =======================
> [ 9987.120119] Code: 24 89 d3 89 74 24 04 89 c6 e8 9d fd ff ff 89 d9 89 f2 8b 1c 24 8b 74 24 04 83 c4 08 e9 6a ff ff ff 8d 76 00 8d bc 27 00 00 00 00 <0f> 0b eb fe 8d b6 00 00 00 00 8d bf 00 00 00 00 53 89 c2 8b 58
> [ 9987.127641] EIP: [<d08df890>] gfs2_demote_wake+0x0/0x10 [gfs2] SS:ESP 0068:c246bf34
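The quoted dmesg contains three BUG reports at two distinct sites, which is easy to miss in a log of this length. As a rough aid, the sketch below pairs each "kernel BUG at" line with the "EIP is at" line that follows it. The function name `summarise_oopses` is hypothetical, and the regular expressions are inferred only from the log lines quoted above, so other kernel versions or architectures may need different patterns.

```python
import re

# Patterns inferred from the dmesg excerpt quoted in this thread (assumption:
# other kernels may format these lines differently).
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
EIP_RE = re.compile(r"EIP is at (?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarise_oopses(dmesg_text):
    """Return one (file, line, function) tuple per BUG report in the log."""
    reports = []
    pending = None  # (file, line) of a BUG line still waiting for its EIP line
    for raw in dmesg_text.splitlines():
        bug = BUG_RE.search(raw)
        if bug:
            pending = (bug.group("file"), int(bug.group("line")))
            continue
        eip = EIP_RE.search(raw)
        if eip and pending is not None:
            reports.append((pending[0], pending[1], eip.group("func")))
            pending = None
    return reports
```

Fed the attachment above, this should report glock.c:704 in gfs2_glmutex_unlock once and glock.c:464 in gfs2_demote_wake twice, matching the two code paths discussed in the message.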
* [Cluster-devel] [GFS2] Git tree update
2007-10-31 15:34 ` Steven Whitehouse
@ 2007-11-01 7:30 ` Fabio Massimo Di Nitto
2007-11-08 17:35 ` Fabio Massimo Di Nitto
1 sibling, 0 replies; 10+ messages in thread
From: Fabio Massimo Di Nitto @ 2007-11-01 7:30 UTC (permalink / raw)
To: cluster-devel.redhat.com
Steven Whitehouse wrote:
> Hi,
>
> On Mon, 2007-10-29 at 11:25 +0100, Fabio Massimo Di Nitto wrote:
>> Steven Whitehouse wrote:
>>> Hi,
>>>
>>> Since Linus released -rc1 when I was on holiday last week, I've just
>>> updated the git tree. There have been quite a lot of changes in the core
>>> kernel which affect GFS2 but which were not part of the GFS2 -nmw git
>>> tree in this last merge window, so that's another reason for updating so
>>> that we have all those changes in the -nmw tree now,
>>>
>>> Steve.
>>>
>> Hi Steve,
>>
>> I am still getting a bunch of oopses at mount time with the very latest tree.
>>
>> Last commit in -nmw: 9904cd0ecaf9c72de837214fa1b359d4c87220a8
>>
>> dmesg attached.
>>
>> Cheers
>> Fabio
>>
> This is a bit odd. I've been trying to track this down, but I've not
> been able to reproduce it here. The odd thing is that it appears from
> the context that the spin lock has been unlocked by something inside the
> call to run_queue() (from gfs2_glmutex_unlock()) but I can't spot where
> that has happened.
Could it be related to the setup I am using or to the kernel config?
This is a testbed, so I can do virtually anything to debug and test.
The device is a single 5 GB AoE disk. The cluster is three VMware Workstation
machines. The config is as simple as it can be to run such a cluster.
> Even more odd, the original report hit a different area of the code: the
> check for the spinlock in gfs2_demote_wake(). The reason that particular
> one stood out is that according to the stack trace it was called from
> drop_bh() where the preceding function call was to lock that very lock.
> So in that case it looks like a race with something else getting to the
> glock's spin lock in the space of the call to that function. If you are
> running with spin lock debugging, that means that the spin lock must
> have been initialised correctly, otherwise that would have flagged up
> earlier, so it looks like it's been overwritten, or that the glock has
> been freed underneath it, both of which events should be impossible.
I did try to look at the code, with very little luck. I could smell the race,
as I found the lock being taken right before entering the code that oopses, but
I have no idea what could have unlocked it. I am traveling until the 11th of Nov
with very little network connectivity, but the next steps are to try a clean
2.6.24-* without -nmw and start bisecting the tree to see when this problem was
introduced. Assuming it's not AoE causing it.
>
> So I'm still looking into this, but slightly confused at the moment. Has
> anybody else seen anything similar? I still can't figure out why you see
> this and I don't,
Not that I know of. I doubt there are that many people testing gfs2 "crack of
the day"; the only reason I was there was to test gfs1, and I told myself: what
the heck, let's give it a spin. Speaking of which, gfs1 works fine, so at least
the locking code in gfs2 that is shared between the two looks good.
Fabio
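For readers unfamiliar with the bisecting step mentioned above, here is a minimal sketch of the search it performs; `git bisect` automates the same idea over real commit history. The function name `first_bad` and the `is_bad` callback are hypothetical, and the sketch assumes the first revision is known good, the last is known bad, and the problem never disappears once introduced.

```python
def first_bad(revisions, is_bad):
    """Return the index of the first bad revision in an ordered list.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return hi
```

Here `is_bad` would stand for "build this revision, boot it, and see whether the mount oopses", so each call is expensive; halving the range keeps the number of test boots logarithmic in the number of revisions.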
* [Cluster-devel] [GFS2] Git tree update
2007-10-31 15:34 ` Steven Whitehouse
2007-11-01 7:30 ` Fabio Massimo Di Nitto
@ 2007-11-08 17:35 ` Fabio Massimo Di Nitto
2007-11-12 11:48 ` Fabio Massimo Di Nitto
1 sibling, 1 reply; 10+ messages in thread
From: Fabio Massimo Di Nitto @ 2007-11-08 17:35 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi Steven,
a quick update on this issue. I am still looking into it.
I was able to reproduce the problem on vanilla .22/.23/.24 but NOT on the .22
Ubuntu kernel, which is .22 plus a pull from the -nmw tree as it was at the time.
I am starting to strongly believe that the problem is caused by either a gcc bug
OR my kernel config.
Anyway, I will keep you posted as soon as I have time to make more progress.
Thanks
Fabio
Steven Whitehouse wrote:
> Hi
>
> On Mon, 2007-10-29 at 11:25 +0100, Fabio Massimo Di Nitto wrote:
>> Steven Whitehouse wrote:
>>> Hi,
>>>
>>> Since Linus released -rc1 when I was on holiday last week, I've just
>>> updated the git tree. There have been quite a lot of changes in the core
>>> kernel which affect GFS2 but which were not part of the GFS2 -nmw git
>>> tree in this last merge window, so that's another reason for updating so
>>> that we have all those changes in the -nmw tree now,
>>>
>>> Steve.
>>>
>> Hi Steve,
>>
>> I am still getting a bunch of oopses at mount time with the very latest tree.
>>
>> Last commit in -nmw: 9904cd0ecaf9c72de837214fa1b359d4c87220a8
>>
>> dmesg attached.
>>
>> Cheers
>> Fabio
>>
* [Cluster-devel] [GFS2] Git tree update
2007-11-08 17:35 ` Fabio Massimo Di Nitto
@ 2007-11-12 11:48 ` Fabio Massimo Di Nitto
0 siblings, 0 replies; 10+ messages in thread
From: Fabio Massimo Di Nitto @ 2007-11-12 11:48 UTC (permalink / raw)
To: cluster-devel.redhat.com
Fabio Massimo Di Nitto wrote:
> Hi Steven,
>
> a quick update on this issue. I am still looking into it.
>
> I was able to reproduce the problem on vanilla .22/.23/.24 but NOT on the .22
> Ubuntu kernel, which is .22 plus a pull from the -nmw tree as it was at the time.
>
> I am starting to strongly believe that the problem is caused by either a gcc bug
> OR my kernel config.
I can confirm this is a kernel config issue.
I just booted 2.6.24-rc2 and here are the results:
http://people.ubuntu.com/~fabbione/oops/
bad.config will OOPS on mount with the same OOPS as posted before.
good.config will mount and operate fine. We get an OOPS on umount on 2/3 of the
nodes.
dmesg.node* have the umount OOPS.
If you have any suggestions on which kernel options might cause the mount
failure, I will dedicate time to testing them one at a time.
Cheers
Fabio
--
I'm going to make him an offer he can't refuse.
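One way to narrow the list of options to test is to diff the two config files first. The sketch below is only an illustration: `parse_config` and `diff_configs` are hypothetical helper names, and it assumes the usual kernel .config conventions of `CONFIG_FOO=value` lines and `# CONFIG_FOO is not set` comments. Options missing from a file are treated as unset.

```python
import re

# Standard .config line shapes: an assignment, or an explicit "not set" comment.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option name to its value, using "n" for unset options."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(good_text, bad_text):
    """Return sorted (option, good_value, bad_value) tuples that differ."""
    good, bad = parse_config(good_text), parse_config(bad_text)
    diffs = []
    for name in sorted(set(good) | set(bad)):
        g = good.get(name, "n")  # absent from a file counts as unset
        b = bad.get(name, "n")
        if g != b:
            diffs.append((name, g, b))
    return diffs
```

Run against the contents of good.config and bad.config, the result is the candidate list; those options could then be flipped one at a time, or in halves, to find the one that triggers the mount failure.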
* [Cluster-devel] GFS2: Git tree update
@ 2009-09-28 8:42 Steven Whitehouse
0 siblings, 0 replies; 10+ messages in thread
From: Steven Whitehouse @ 2009-09-28 8:42 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
Since -rc1 is now out, I've rebased the -nmw git tree to that. I've also
temporarily dropped the patch removing the sysfs files, as there still
appear to be some userspace issues there. I'll resubmit that
patch once those issues are dealt with.
I've also pushed in the patch for making the subsequent meta mounts work
correctly. It's a slightly different version from the last one, since
bd_mount_sem has gone away, but is otherwise identical,
Steve.