* [Ocfs2-devel] Bug in error handling
@ 2004-03-09 12:40 John L. Villalovos
2004-03-09 15:13 ` Mark Fasheh
0 siblings, 1 reply; 6+ messages in thread
From: John L. Villalovos @ 2004-03-09 12:40 UTC (permalink / raw)
To: ocfs2-devel
I have encountered on my system a bug when OCFS2 tries to do a journal_wipe.
At the time that it does the call it gets back an error of -22.
The problem is that it seems to leave stuff in an inconsistent state when it exits out of the functions that have called it. So later on bad things happen :(
This diff simulates the error that I received. I am trying to figure out what has been partially initialized by the time this gets called, but I am having a bit of difficulty tracking it all down :(
John
Index: journal.c
===================================================================
--- journal.c (revision 766)
+++ journal.c (working copy)
@@ -1261,8 +1261,11 @@
if (!journal)
BUG();
- status = journal_wipe(journal->k_journal, full);
+// FIXME: Simulate BUG
+// status = journal_wipe(journal->k_journal, full);
+ status = -22;
+
LOG_EXIT_STATUS(status);
return(status);
}
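
To illustrate the failure mode described above (an error return that leaves earlier setup steps half done), here is a minimal user-space sketch of the usual goto-based unwind. It is not the OCFS2 code: `demo_volume`, `demo_journal_wipe`, `demo_mount` and `demo_umount` are hypothetical names, and the `fail_wipe` flag only mimics the simulated -22 (-EINVAL) from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-volume state; not the real OCFS2 structures. */
struct demo_volume {
    int *hash;      /* allocated in step 1 */
    int *journal;   /* allocated in step 2 */
    int  mounted;   /* set to 1 only after every step succeeded */
};

/* Simulated journal wipe: fail_wipe mimics the -22 (-EINVAL) in the report. */
static int demo_journal_wipe(int fail_wipe)
{
    return fail_wipe ? -EINVAL : 0;
}

/* Set up the volume; on failure, undo exactly the steps already taken. */
static int demo_mount(struct demo_volume *vol, int fail_wipe)
{
    int status;

    assert(vol != NULL);
    vol->hash = NULL;
    vol->journal = NULL;
    vol->mounted = 0;

    vol->hash = calloc(16, sizeof(*vol->hash));
    if (!vol->hash) {
        status = -ENOMEM;
        goto bail;
    }

    vol->journal = calloc(1, sizeof(*vol->journal));
    if (!vol->journal) {
        status = -ENOMEM;
        goto free_hash;
    }

    status = demo_journal_wipe(fail_wipe);
    if (status < 0)
        goto free_journal;

    vol->mounted = 1;
    return 0;

free_journal:
    free(vol->journal);
    vol->journal = NULL;
free_hash:
    free(vol->hash);
    vol->hash = NULL;
bail:
    return status;
}

/* Tear down a successfully mounted volume. */
static void demo_umount(struct demo_volume *vol)
{
    free(vol->journal);
    free(vol->hash);
    vol->journal = NULL;
    vol->hash = NULL;
    vol->mounted = 0;
}
```

The point of the sketch is that a caller seeing a negative status can rely on every pointer being NULL again, so nothing that runs later trips over partially initialized state.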
* [Ocfs2-devel] Bug in error handling
2004-03-09 12:40 [Ocfs2-devel] Bug in error handling John L. Villalovos
@ 2004-03-09 15:13 ` Mark Fasheh
0 siblings, 0 replies; 6+ messages in thread
From: Mark Fasheh @ 2004-03-09 15:13 UTC (permalink / raw)
To: ocfs2-devel
At what point did you first see the error? Was it a 1st mount of a fresh
file system or just a normal mount? I assume the file system failed to mount
because of this error... Can you be more specific as to what Bad Things (TM)
were happening? :) Did it crash or what?
--Mark
On Tue, Mar 09, 2004 at 10:39:52AM -0800, John L. Villalovos wrote:
> I have encountered on my system a bug when OCFS2 tries to do a journal_wipe.
>
> At the time that it does the call it gets back an error of -22.
>
> The problem is that it seems to leave stuff in an inconsistent state when
> it exits out of the functions that have called it. So later on bad things
> happen :(
>
> This diff simulates the error that I received. I am trying to figure out
> what has been partially initialized by the time this gets called, but I
> am having a bit of difficulty tracking it all down :(
>
> John
>
>
> Index: journal.c
> ===================================================================
> --- journal.c (revision 766)
> +++ journal.c (working copy)
> @@ -1261,8 +1261,11 @@
> if (!journal)
> BUG();
>
> - status = journal_wipe(journal->k_journal, full);
> +// FIXME: Simulate BUG
> +// status = journal_wipe(journal->k_journal, full);
> + status = -22;
>
> +
> LOG_EXIT_STATUS(status);
> return(status);
> }
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com
* [Ocfs2-devel] Bug in error handling
@ 2004-03-09 15:18 Villalovos, John L
2004-03-09 15:43 ` Mark Fasheh
2004-03-09 17:00 ` Mark Fasheh
0 siblings, 2 replies; 6+ messages in thread
From: Villalovos, John L @ 2004-03-09 15:18 UTC (permalink / raw)
To: ocfs2-devel
> At what point did you first see the error? Was it a 1st mount
> of a fresh
> file system or just a normal mount? I assume the file system
> failed to mount
> because of this error... Can you be more specific as to what
> Bad Things (TM)
> were happening? :) Did it crash or what?
It was NOT a 1st mount. It was a disk that had been previously used.
It appears that the mount fails but then some globals are probably in a
partially set state.
Here is what I saw after it happened:
# mount -t ocfs2 /dev/sda1 /ocfs2
JBD: no valid journal superblock found
(1856) ERROR: status = -22, /root/ocfs/ocfs2/src/osb.c, 424
(1856) ERROR: status = -22, /root/ocfs/ocfs2/src/super.c, 1047
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
or too many mounted file systems
[root@linuxjohn2 load_ocfs]# Unable to handle kernel NULL pointer
dereference at virtual address 00000000
printing eip:
d10bb369
*pde = 0e9b5067
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<d10bb369>] Tainted: GF
EFLAGS: 00010286
EIP is at ocfs_bh_sem_lookup+0x29/0x650 [ocfs2]
eax: 00000000 ebx: cbc57984 ecx: 000000f9 edx: 000007f9
esi: cbc57984 edi: cbc57984 ebp: 00000800 esp: cc1d5eac
ds: 007b es: 007b ss: 0068
Process ocfs2nm-0 (pid: 1857, threadinfo=cc1d4000 task=ccfbd660)
Stack: cbc6a374 cc1d5ed0 0000001f cba16e90 00000010 00000010 ce654a00 cfa0a200
       cfa0a200 00000000 00000000 00000000 00000000 ccfbea40 00000000 cbc57984
       00000000 cc1d5f3c c035cd80 ccdc87b4 00000010 ce6549f0 00011c00 cfa0a200
Call Trace:
[<d10bb9a1>] ocfs_bh_sem_lock+0x11/0x60 [ocfs2]
[<d10c6267>] ocfs_read_bhs+0x227/0x930 [ocfs2]
[<d10bbd6a>] ocfs_bh_sem_hash_prune+0x19a/0x390 [ocfs2]
[<d10d5d6e>] ocfs_volume_thread+0x29e/0x930 [ocfs2]
[<d10d5ad0>] ocfs_volume_thread+0x0/0x930 [ocfs2]
[<c0109295>] kernel_thread_helper+0x5/0x10
Code: 8b 00 89 c3 d3 e3 8d 4d f6 d3 e0 31 c3 88 d1 89 5c 24 34 8b
After this point I couldn't unload OCFS2 anymore.
John
* [Ocfs2-devel] Bug in error handling
2004-03-09 15:18 Villalovos, John L
@ 2004-03-09 15:43 ` Mark Fasheh
2004-03-09 17:00 ` Mark Fasheh
1 sibling, 0 replies; 6+ messages in thread
From: Mark Fasheh @ 2004-03-09 15:43 UTC (permalink / raw)
To: ocfs2-devel
On Tue, Mar 09, 2004 at 01:18:11PM -0800, Villalovos, John L wrote:
> It was NOT a 1st mount. It was a disk that had been previously used.
>
> It appears that the mount fails but then some globals are probably in a
> partially set state.
>
> Here is what I saw after it happened:
>
> # mount -t ocfs2 /dev/sda1 /ocfs2
> JBD: no valid journal superblock found
> (1856) ERROR: status = -22, /root/ocfs/ocfs2/src/osb.c, 424
> (1856) ERROR: status = -22, /root/ocfs/ocfs2/src/super.c, 1047
> mount: wrong fs type, bad option, bad superblock on /dev/sda1,
> or too many mounted file systems
> [root@linuxjohn2 load_ocfs]# Unable to handle kernel NULL pointer
> dereference at virtual address 00000000
<snip>
Alright, we should definitely *not* be doing that :) I'll add your previous
patch and check it out.
--Mark
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com
* [Ocfs2-devel] Bug in error handling
2004-03-09 15:18 Villalovos, John L
2004-03-09 15:43 ` Mark Fasheh
@ 2004-03-09 17:00 ` Mark Fasheh
1 sibling, 0 replies; 6+ messages in thread
From: Mark Fasheh @ 2004-03-09 17:00 UTC (permalink / raw)
To: ocfs2-devel
On Tue, Mar 09, 2004 at 01:18:11PM -0800, Villalovos, John L wrote:
> It was NOT a 1st mount. It was a disk that had been previously used.
>
> It appears that the mount fails but then some globals are probably in a
> partially set state.
Ok, could you update from latest SVN and let me know if that fixed it?
I wasn't getting the NULL pointer error in ocfs_bh_sem_lookup like you, but
I was definitely seeing one in ocfs_inode_hash_prune_all where we were
assuming that an inode existed on the inum when in fact it didn't :) The fix,
of course, was to check for its existence before acting on it!
Alternatively, if you don't want to update from SVN, you can apply this
patch.
--Mark
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com
Index: hash.c
===================================================================
--- hash.c (revision 766)
+++ hash.c (working copy)
@@ -1267,18 +1267,19 @@ static int ocfs_inode_hash_prune_all(ocf
inum = list_entry(iter, ocfs_inode_num, i_list);
list_del(&inum->i_list);
- /* this log_error_args is mainly for debugging */
- if (atomic_read(&inum->i_inode->i_count) > 2)
- LOG_ERROR_ARGS("inode (%lu) with i_count = %u left in "
- "system, (voteoff = %u.%u, "
- "fileoff = %u.%u)\n",
- inum->i_inode->i_ino,
- atomic_read(&inum->i_inode->i_count),
- HILO(inum->i_voteoff),
- HILO(inum->i_feoff));
+ if (inum->i_inode) {
+ /* this log_error_args is mainly for debugging */
+ if (atomic_read(&inum->i_inode->i_count) > 2)
+ LOG_ERROR_ARGS("inode (%lu) with i_count = %u "
+ "left in system, (voteoff = "
+ "%u.%u, fileoff = %u.%u)\n",
+ inum->i_inode->i_ino,
+ atomic_read(&inum->i_inode->i_count),
+ HILO(inum->i_voteoff),
+ HILO(inum->i_feoff));
- if (inum->i_inode)
iput(inum->i_inode);
+ }
ocfs_free_inode_num(inum);
}
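
The guard added by the patch above can be sketched in isolation as follows. This is a simplified user-space illustration, not the hash.c code: `demo_inode`, `demo_inode_num`, `demo_iput`, `demo_push` and `demo_prune_all` are hypothetical names, a plain singly linked list stands in for the kernel list, and the return value (a count of entries without an inode) is added only to make the behaviour observable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical miniatures of the structures touched by the patch. */
struct demo_inode {
    unsigned long i_ino;
    int i_count;
};

struct demo_inode_num {
    struct demo_inode *i_inode;     /* may be NULL: no inode was attached */
    struct demo_inode_num *next;
};

/* Drop one reference; free the inode when the count reaches zero. */
static void demo_iput(struct demo_inode *inode)
{
    if (--inode->i_count == 0)
        free(inode);
}

/* Push a new entry, optionally with an inode holding one reference. */
static struct demo_inode_num *demo_push(struct demo_inode_num *head,
                                        int with_inode, unsigned long ino)
{
    struct demo_inode_num *inum = calloc(1, sizeof(*inum));

    assert(inum != NULL);
    if (with_inode) {
        inum->i_inode = calloc(1, sizeof(*inum->i_inode));
        assert(inum->i_inode != NULL);
        inum->i_inode->i_ino = ino;
        inum->i_inode->i_count = 1;
    }
    inum->next = head;
    return inum;
}

/* Free every entry; returns how many entries had no inode attached. */
static int demo_prune_all(struct demo_inode_num *head)
{
    int missing = 0;

    while (head) {
        struct demo_inode_num *inum = head;

        head = inum->next;
        if (inum->i_inode) {
            /* Only touch the inode once we know it exists. */
            if (inum->i_inode->i_count > 2)
                fprintf(stderr, "inode (%lu) with i_count = %d left\n",
                        inum->i_inode->i_ino, inum->i_inode->i_count);
            demo_iput(inum->i_inode);
        } else {
            missing++;
        }
        free(inum);
    }
    return missing;
}
```

As in the patch, both the debugging message and the reference drop sit inside the same NULL check, so an entry without an inode is simply freed.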
* [Ocfs2-devel] Bug in error handling
@ 2004-03-09 19:14 Villalovos, John L
0 siblings, 0 replies; 6+ messages in thread
From: Villalovos, John L @ 2004-03-09 19:14 UTC (permalink / raw)
To: ocfs2-devel
> Ok, could you update from latest SVN and let me know if that fixed it?
> I wasn't getting the NULL pointer error in ocfs_bh_sem_lookup
> like you, but
> I was definitely seeing one in ocfs_inode_hash_prune_all where we were
> assuming that an inode existed on the inum when in fact it
> didn't :) The fix,
> of course, was to check for its existence before acting on it!
>
> Alternatively, if you don't want to update from SVN, you can
> apply this
> patch.
I will try to give that a try. Though I reformatted my partition so I
may not be able to reproduce.
Just a note. I am doing this on a 2.6.3 kernel.
Where I was having it crash was on:
ocfs_bh_sem * ocfs_bh_sem_lookup(struct buffer_head *bh)
{
int depth, bucket;
struct list_head *head, *iter = NULL;
ocfs_bh_sem *sem = NULL, *newsem = NULL;
bucket = ocfs_bh_sem_hash_fn(bh); <<<<<<<<<----
#define ocfs_bh_sem_hash_fn(_b) \
(_hashfn((unsigned int)BH_GET_DEVICE((_b)), (_b)->b_blocknr) & \
ocfs_bh_hash_shift)
This macro is where the NULL reference occurs:
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
#define BH_GET_DEVICE(bh) ((bh->b_bdev)->bd_dev) <<<<-------------
#else
#define BH_GET_DEVICE(bh) (bh->b_dev)
#endif
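
For illustration only, a user-space sketch of a lookup that reports an error instead of dereferencing a NULL device pointer could look like the following. None of this is OCFS2 code: `demo_bdev`, `demo_bh`, `demo_bh_get_device` and `demo_bh_hash` are hypothetical names, and the XOR mixing step is just a placeholder for `_hashfn`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space mirrors of the kernel structures. */
struct demo_bdev {
    unsigned int bd_dev;
};

struct demo_bh {
    struct demo_bdev *b_bdev;   /* NULL models the state seen in the oops */
    unsigned long b_blocknr;
};

/* Store the device number in *dev, or return -EINVAL rather than
 * dereferencing a NULL buffer head or a NULL b_bdev. */
static int demo_bh_get_device(const struct demo_bh *bh, unsigned int *dev)
{
    assert(dev != NULL);
    if (!bh || !bh->b_bdev)
        return -EINVAL;
    *dev = bh->b_bdev->bd_dev;
    return 0;
}

/* Compute a bucket index; the XOR is a placeholder for the real hash. */
static int demo_bh_hash(const struct demo_bh *bh, unsigned int mask,
                        unsigned int *bucket)
{
    unsigned int dev;
    int status;

    assert(bucket != NULL);
    status = demo_bh_get_device(bh, &dev);
    if (status < 0)
        return status;
    *bucket = (dev ^ (unsigned int)bh->b_blocknr) & mask;
    return 0;
}
```

Whether a check like this belongs in the hash function or in the caller that hands over the buffer head is a separate design question; the sketch only shows the shape of the guard.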