* [Ocfs2-devel] Bug in error handling
@ 2004-03-09 12:40 John L. Villalovos
  2004-03-09 15:13 ` Mark Fasheh
  0 siblings, 1 reply; 6+ messages in thread
From: John L. Villalovos @ 2004-03-09 12:40 UTC (permalink / raw)
  To: ocfs2-devel

I have encountered a bug on my system when OCFS2 calls journal_wipe().

The call returns an error of -22 (-EINVAL).

The problem is that the error path seems to leave things in an inconsistent state as it returns through the functions that called it, so bad things happen later on :(

This diff simulates the error that I received. I am trying to figure out what has been partially initialized by the time this gets called, but I am having some difficulty tracking it all down :(

John


Index: journal.c
===================================================================
--- journal.c	(revision 766)
+++ journal.c	(working copy)
@@ -1261,8 +1261,11 @@
  	if (!journal)
  		BUG();

-	status = journal_wipe(journal->k_journal, full);
+// FIXME: Simulate BUG
+//	status = journal_wipe(journal->k_journal, full);
+	status = -22;

+
  	LOG_EXIT_STATUS(status);
  	return(status);
  }

* [Ocfs2-devel] Bug in error handling
@ 2004-03-09 15:18 Villalovos, John L
  2004-03-09 15:43 ` Mark Fasheh
  2004-03-09 17:00 ` Mark Fasheh
  0 siblings, 2 replies; 6+ messages in thread
From: Villalovos, John L @ 2004-03-09 15:18 UTC (permalink / raw)
  To: ocfs2-devel

> At what point did you first see the error? Was it a 1st mount of a fresh
> file system or just a normal mount? I assume the file system failed to mount
> because of this error... Can you be more specific as to what Bad Things (TM)
> were happening?  :) Did it crash or what?

It was NOT a 1st mount.  It was a disk that had been previously used.

It appears that the mount fails, but some globals are then probably left in a
partially set state.

Here is what I saw after it happened:

# mount -t ocfs2 /dev/sda1 /ocfs2
JBD: no valid journal superblock found
(1856) ERROR: status = -22, /root/ocfs/ocfs2/src/osb.c, 424
(1856) ERROR: status = -22, /root/ocfs/ocfs2/src/super.c, 1047
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       or too many mounted file systems
[root@linuxjohn2 load_ocfs]#
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
d10bb369
*pde = 0e9b5067
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<d10bb369>]    Tainted: GF
EFLAGS: 00010286
EIP is at ocfs_bh_sem_lookup+0x29/0x650 [ocfs2]
eax: 00000000   ebx: cbc57984   ecx: 000000f9   edx: 000007f9
esi: cbc57984   edi: cbc57984   ebp: 00000800   esp: cc1d5eac
ds: 007b   es: 007b   ss: 0068
Process ocfs2nm-0 (pid: 1857, threadinfo=cc1d4000 task=ccfbd660)
Stack: cbc6a374 cc1d5ed0 0000001f cba16e90 00000010 00000010 ce654a00 cfa0a200
       cfa0a200 00000000 00000000 00000000 00000000 ccfbea40 00000000 cbc57984
       00000000 cc1d5f3c c035cd80 ccdc87b4 00000010 ce6549f0 00011c00 cfa0a200
Call Trace:
 [<d10bb9a1>] ocfs_bh_sem_lock+0x11/0x60 [ocfs2]
 [<d10c6267>] ocfs_read_bhs+0x227/0x930 [ocfs2]
 [<d10bbd6a>] ocfs_bh_sem_hash_prune+0x19a/0x390 [ocfs2]
 [<d10d5d6e>] ocfs_volume_thread+0x29e/0x930 [ocfs2]
 [<d10d5ad0>] ocfs_volume_thread+0x0/0x930 [ocfs2]
 [<c0109295>] kernel_thread_helper+0x5/0x10

Code: 8b 00 89 c3 d3 e3 8d 4d f6 d3 e0 31 c3 88 d1 89 5c 24 34 8b



After this point I couldn't unload OCFS2 anymore.

John

* [Ocfs2-devel] Bug in error handling
@ 2004-03-09 19:14 Villalovos, John L
  0 siblings, 0 replies; 6+ messages in thread
From: Villalovos, John L @ 2004-03-09 19:14 UTC (permalink / raw)
  To: ocfs2-devel

> Ok, could you update from latest SVN and let me know if that fixed it?
> I wasn't getting the NULL pointer error in ocfs_bh_sem_lookup like you, but
> I was definitely seeing one in ocfs_inode_hash_prune_all where we were
> assuming that an inode existed on the inum when in fact it didn't :) The fix,
> of course, was to check for its existence before acting on it!
>
> Alternatively, if you don't want to update from SVN, you can apply this
> patch.

I will give that a try, though I reformatted my partition, so I may not be
able to reproduce it.

Just a note: I am doing this on a 2.6.3 kernel.

This is where it was crashing:

ocfs_bh_sem * ocfs_bh_sem_lookup(struct buffer_head *bh)
{
        int depth, bucket;
        struct list_head *head, *iter = NULL;
        ocfs_bh_sem *sem = NULL, *newsem = NULL;

        bucket = ocfs_bh_sem_hash_fn(bh);  <<<<<<<<<----



#define ocfs_bh_sem_hash_fn(_b)   \
        (_hashfn((unsigned int)BH_GET_DEVICE((_b)), (_b)->b_blocknr) & \
         ocfs_bh_hash_shift)


The NULL dereference occurs in this macro:

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
#define BH_GET_DEVICE(bh) ((bh->b_bdev)->bd_dev)  <<<<-------------
#else
#define BH_GET_DEVICE(bh) (bh->b_dev)
#endif


end of thread

Thread overview: 6+ messages
2004-03-09 12:40 [Ocfs2-devel] Bug in error handling John L. Villalovos
2004-03-09 15:13 ` Mark Fasheh
  -- strict thread matches above, loose matches on Subject: below --
2004-03-09 15:18 Villalovos, John L
2004-03-09 15:43 ` Mark Fasheh
2004-03-09 17:00 ` Mark Fasheh
2004-03-09 19:14 Villalovos, John L
