* [Ocfs2-devel] Error when mount ocfs2 partition on kernel 2.6.6
@ 2004-05-21 5:24 Chen, Yukun
2004-05-21 13:07 ` Mark Fasheh
0 siblings, 1 reply; 2+ messages in thread
From: Chen, Yukun @ 2004-05-21 5:24 UTC (permalink / raw)
To: ocfs2-devel
Hi All
I find a hang issue when trying to mount an ocfs2 partition in
a cluster environment. Machine 1 and machine 2 share a raid array.
Steps to reproduce: (m1 represent machine 1, and m2 represent
machine 2)
First In m1
1. load_ocfs2
2. mkfs.ocfs2 -b 4 -F -L ocfs2 -m /ocfs /dev/sdd2
3. mount /dev/sdd2 /ocfs -t ocfs2
Then in m2
1. load_ocfs2
2. mount /dev/sdd2 /ocfs -t ocfs2 <===System hangs and the
following is the call trace information
Oracle Cluster FileSystem x y (build z)
ocfs2: hostname is westvile2
sectbits=9, clusterbits=12, dirbits=17, filebits=9
ocfs2: Adding west1 (node 0) to clustered device (8,66)
(3263) ERROR at /root/ocfs/ocfs2-6/src/dlm.c, 1395: DISKVOTE!!:
req_lock=2, flags=40000401, offset=1532416, inode=3
(3263) ERROR@/root/ocfs/ocfs2-6/src/dlm.c, 1398: DISKVOTE!!: this=1,
master=0, locktype=0, ronode=-1, romap=00000000
Call Trace:
[<dc16dbf4>] (0xdc16dbf4)
.........
[<c0ed0000>] (0xdc16dfd0)
[<ffffe410>] (0xdc16dfec)
(3263) ERROR at /root/ocfs/ocfs2-6/src/dlm.c, 1395: DISKVOTE!!:
req_lock=2, flags=40000401, offset=1532416, inode=3
(3263) ERROR at /root/ocfs/ocfs2-6/src/dlm.c, 1398: DISKVOTE!!: this=1,
master=0, locktype=0, ronode=-1, romap=00000000
Call Trace:
[<dc16dbf4>] (0xdc16dbf4)
[<ffffffff>] (0xdc16dc04)
...........
[<c0105f71>] (0xdc16dfc0)
[<c0ed0000>] (0xdc16dfd0)
[<ffffe410>] (0xdc16dfec)
ocfs2: Removing west1 (node 0) from clustered device (8,66)
ocfs2: Recovering node 0 from device (8,66)
(fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0,
recovered transactions 5 to 5
(fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 0 and revoked
0/0 blocks
kjournald starting. Commit interval 5 seconds
Unable to handle kernel NULL pointer dereference at virtual address
0000001c
printing eip:
e0ad4eeb
*pde = 1dfef067
*pte = 00000000
Oops: 0000 [#1]
SMP
CPU: 2
EIP: 0060:[<e0ad4eeb>] Not tainted
EFLAGS: 00010286 (2.6.6)
EIP is at ocfs_bh_sem_lookup+0x1b/0x370 [ocfs2]
eax: 00000000 ebx: 00000800 ecx: db405dc4 edx: 00000000
esi: 00000000 edi: 00000000 ebp: dbe51e84 esp: dbe51e44
ds: 007b es: 007b ss: 0068
Process ocfs2rec-0 (pid: 3267, threadinfo=dbe50000 task=dd3ed480)
Stack: db58d378 dc2ee600 db58d378 c1416be0 00014025 88ad2fc6 000000ba
dd3ed630
c1416be0 c0498e20 dbe51ea8 00000000 c0498e20 00000000 00000000
00000000
dbe51e98 e0ad5257 00000000 00000000 00000000 dbe51efc e0ac175b
00000000
Call Trace:
[<e0ad5257>] ocfs_bh_sem_lock+0x17/0x60 [ocfs2]
[<e0ac175b>] ocfs_sync_local_to_main+0x9b/0x7e0 [ocfs2]
[<e0ad4c90>] ocfs_bh_sem_put+0x20/0xb0 [ocfs2]
[<e0ad5472>] ocfs_bh_sem_unlock+0x32/0x50 [ocfs2]
[<e0adee39>] ocfs_read_bhs+0x3f9/0x8e0 [ocfs2]
[<e0ac369d>] ocfs_shutdown_local_alloc+0xdd/0x350 [ocfs2]
[<e0ac39f7>] ocfs_recover_local_alloc+0xe7/0x1c6 [ocfs2]
[<e0ae4ac9>] ocfs_recover_vol+0x699/0xc70 [ocfs2]
[<e0ae4163>] __ocfs_recovery_thread+0x113/0x1e0 [ocfs2]
[<e0ae4050>] __ocfs_recovery_thread+0x0/0x1e0 [ocfs2]
[<c01042e5>] kernel_thread_helper+0x5/0x10
Any ideas on this issue?
Thanx.
Aaron
Intel China Software Lab
Tel: 8621-52574545 Ext.1587
E_mail:yukun.chen@intel.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20040521/9aabd332/attachment.htm
^ permalink raw reply [flat|nested] 2+ messages in thread
* [Ocfs2-devel] Error when mount ocfs2 partition on kernel 2.6.6
2004-05-21 5:24 [Ocfs2-devel] Error when mount ocfs2 partition on kernel 2.6.6 Chen, Yukun
@ 2004-05-21 13:07 ` Mark Fasheh
0 siblings, 0 replies; 2+ messages in thread
From: Mark Fasheh @ 2004-05-21 13:07 UTC (permalink / raw)
To: ocfs2-devel
Hi,
On Fri, May 21, 2004 at 06:24:25PM +0800, Chen, Yukun wrote:
> Hi All
>
> I find a hang issue when trying to mount an ocfs2 partition in a
> cluster environment. Machine 1 and machine 2 share a raid array.
>
> Steps to reproduce: (m1 represent machine 1, and m2 represent machine
> 2)
<snip>
So one of the machines is trying to recover the other one and dies, right?
<snip>
> Oracle Cluster FileSystem x y (build z)
> ocfs2: hostname is westvile2
> sectbits=9, clusterbits=12, dirbits=17, filebits=9
> ocfs2: Adding west1 (node 0) to clustered device (8,66)
> (3263) ERROR at /root/ocfs/ocfs2-6/src/dlm.c, 1395: DISKVOTE!!: req_lock=2,
> flags=40000401, offset=1532416, inode=3
> (3263) ERROR at /root/ocfs/ocfs2-6/src/dlm.c, 1398: DISKVOTE!!: this=1, master=
> 0, locktype=0, ronode=-1, romap=00000000
> Call Trace:
> [<dc16dbf4>] (0xdc16dbf4)
>
> ???
> [<c0ed0000>] (0xdc16dfd0)
> [<ffffe410>] (0xdc16dfec)
>
> (3263) ERROR at /root/ocfs/ocfs2-6/src/dlm.c, 1395: DISKVOTE!!: req_lock=2,
> flags=40000401, offset=1532416, inode=3
> (3263) ERROR at /root/ocfs/ocfs2-6/src/dlm.c, 1398: DISKVOTE!!: this=1, master=
> 0, locktype=0, ronode=-1, romap=00000000
> Call Trace:
> [<dc16dbf4>] (0xdc16dbf4)
> [<ffffffff>] (0xdc16dc04)
>
> ???..
> [<c0105f71>] (0xdc16dfc0)
> [<c0ed0000>] (0xdc16dfd0)
> [<ffffe410>] (0xdc16dfec)
>
> ocfs2: Removing west1 (node 0) from clustered device (8,66)
> ocfs2: Recovering node 0 from device (8,66)
> (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0,
> recovered transactions 5 to 5
> (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 0 and revoked 0/0
> blocks
> kjournald starting. Commit interval 5 seconds
> Unable to handle kernel NULL pointer dereference at virtual address 0000001c
> printing eip:
> e0ad4eeb
> *pde = 1dfef067
> *pte = 00000000
> Oops: 0000 [#1]
> SMP
> CPU: 2
> EIP: 0060:[<e0ad4eeb>] Not tainted
> EFLAGS: 00010286 (2.6.6)
> EIP is at ocfs_bh_sem_lookup+0x1b/0x370 [ocfs2]
> eax: 00000000 ebx: 00000800 ecx: db405dc4 edx: 00000000
> esi: 00000000 edi: 00000000 ebp: dbe51e84 esp: dbe51e44
> ds: 007b es: 007b ss: 0068
> Process ocfs2rec-0 (pid: 3267, threadinfo=dbe50000 task=dd3ed480)
> Stack: db58d378 dc2ee600 db58d378 c1416be0 00014025 88ad2fc6 000000ba dd3ed630
> c1416be0 c0498e20 dbe51ea8 00000000 c0498e20 00000000 00000000 00000000
> dbe51e98 e0ad5257 00000000 00000000 00000000 dbe51efc e0ac175b 00000000
> Call Trace:
> [<e0ad5257>] ocfs_bh_sem_lock+0x17/0x60 [ocfs2]
> [<e0ac175b>] ocfs_sync_local_to_main+0x9b/0x7e0 [ocfs2]
> [<e0ad4c90>] ocfs_bh_sem_put+0x20/0xb0 [ocfs2]
> [<e0ad5472>] ocfs_bh_sem_unlock+0x32/0x50 [ocfs2]
> [<e0adee39>] ocfs_read_bhs+0x3f9/0x8e0 [ocfs2]
> [<e0ac369d>] ocfs_shutdown_local_alloc+0xdd/0x350 [ocfs2]
> [<e0ac39f7>] ocfs_recover_local_alloc+0xe7/0x1c6 [ocfs2]
> [<e0ae4ac9>] ocfs_recover_vol+0x699/0xc70 [ocfs2]
> [<e0ae4163>] __ocfs_recovery_thread+0x113/0x1e0 [ocfs2]
> [<e0ae4050>] __ocfs_recovery_thread+0x0/0x1e0 [ocfs2]
> [<c01042e5>] kernel_thread_helper+0x5/0x10
>
>
> Any ideas on this issue?
Ahh yes, that's a bug I inadvertantly introduced recently. The fix is in my
local tree and will get pushed out shortly with some other changes I'm
making to alloc.c. In the meantime you can apply this patch. Ignore the
warnings you'll get due to unused variables and whatnot -- that'll be
cleaned up in my commit.
--Mark
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com
Index: src/alloc.c
===================================================================
--- src/alloc.c (revision 930)
+++ src/alloc.c (working copy)
@@ -3416,50 +3416,6 @@ static int ocfs_sync_local_to_main(ocfs_
}
OCFS_BH_PUT_DATA(local_alloc_bh);
- if (bm_lock_bh) {
- local_lock = 0;
- bh = bm_lock_bh;
- }
-
- if (bm_inode) {
- atomic_inc(&bm_inode->i_count);
- local_inode = bm_inode;
- } else {
- local_inode = ocfs_iget(osb, OCFS_BITMAP_LOCK_OFFSET, NULL);
- if (!local_inode) {
- status = -EINVAL;
- LOG_ERROR_STATUS(status);
- goto bail;
- }
- }
-
- if (local_lock) {
- down (&(osb->vol_alloc_sem));
-
- /* Get the allocation lock here */
- status = ocfs_acquire_lock (osb, OCFS_BITMAP_LOCK_OFFSET,
- OCFS_DLM_EXCLUSIVE_LOCK,
- (in_recovery) ? FLAG_FILE_RECOVERY
- : 0, &bh, local_inode);
- if (status < 0) {
- if (status != -EINTR)
- LOG_ERROR_STATUS (status);
- goto bail;
- }
- got_lock = 1;
- }
-
- bitmapblocks = (OCFS_ALIGN(osb->cluster_bitmap.validbits,
- OCFS_BITS_IN_CHUNK) / OCFS_BITS_IN_CHUNK);
-
- status = ocfs_read_bhs(osb, osb->vol_layout.bitmap_off,
- bitmapblocks * osb->sect_size,
- osb->cluster_bitmap.chunk, 0, NULL);
- if (status < 0) {
- LOG_ERROR_STATUS(status);
- goto bail;
- }
-
if (!(*f)) {
*f = ocfs_alloc_bitmap_free_head();
if (*f == NULL) {
@@ -3500,23 +3456,6 @@ static int ocfs_sync_local_to_main(ocfs_
OCFS_BH_PUT_DATA(local_alloc_bh);
bail:
- if (local_lock) {
- up (&(osb->vol_alloc_sem));
-
- if (got_lock) {
- tmpstat = ocfs_release_lock (osb,
- OCFS_BITMAP_LOCK_OFFSET,
- OCFS_DLM_EXCLUSIVE_LOCK,
- 0, bh, local_inode);
- if (tmpstat < 0)
- LOG_ERROR_STATUS (tmpstat);
- }
- if (bh != NULL)
- brelse(bh);
- }
-
- if (local_inode)
- iput(local_inode);
LOG_EXIT_STATUS(status);
return(status);
@@ -3997,7 +3936,7 @@ void ocfs_shutdown_local_alloc(ocfs_supe
else
bh = osb->local_alloc_bh;
- status = ocfs_sync_local_to_main(osb, &f, NULL, NULL, NULL,
+ status = ocfs_sync_local_to_main(osb, &f, bh, NULL, NULL,
in_recovery);
if (status < 0)
LOG_ERROR_STATUS(status);
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2004-05-21 13:07 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-21 5:24 [Ocfs2-devel] Error when mount ocfs2 partition on kernel 2.6.6 Chen, Yukun
2004-05-21 13:07 ` Mark Fasheh
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.