[Ocfs2-devel] Crash in ocfs_volume_thread

* [Ocfs2-devel] Crash in ocfs_volume_thread
@ 2004-03-17 15:42 Villalovos, John L
  2004-03-17 16:32 ` Mark Fasheh
  0 siblings, 1 reply; 4+ messages in thread
From: Villalovos, John L @ 2004-03-17 15:42 UTC
  To: ocfs2-devel

I am trying to debug an issue where I get a crash in the
ocfs_volume_thread.

Here is the scenario.

This is all done under a 2.6.x kernel

I have a corrupted partition (I think).  

I created this corrupted partition by:

1. Run mkfs.ocfs2
2. Mount the partition once.  This caused errors.
3. Reboot the system.
4. Try to mount the partition again and then it crashes in the
ocfs_volume_thread.

I get the following errors:

kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/osb.c, 427
kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/super.c, 1063

Then the crash occurs because I have a NULL pointer in line 670 of io.c
which seems to be called by ocfs_volume_thread.

The NULL pointer is in bh->b_bdev.  This being NULL causes a crash to
occur later on when BH_GET_DEVICE(bh) is called and tries to do a
bh->b_bdev->bd_dev.
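
For illustration only, a defensive check for the NULL b_bdev described
above could look roughly like the sketch below. It is a user-space mock,
not the ocfs2 code: struct mock_bdev, struct mock_bh and
bh_get_device_checked() are hypothetical names, and only the b_bdev and
bd_dev fields mentioned above are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. Only the
 * two fields named in the report are modelled. */
struct mock_bdev {
        unsigned int bd_dev;
};

struct mock_bh {
        struct mock_bdev *b_bdev;
};

/* Store the device number in *dev and return 0, or return -EINVAL when
 * the buffer head has no backing device instead of dereferencing NULL. */
int bh_get_device_checked(const struct mock_bh *bh, unsigned int *dev)
{
        assert(dev != NULL);    /* caller must supply an output slot */

        if (bh == NULL || bh->b_bdev == NULL)
                return -EINVAL;

        *dev = bh->b_bdev->bd_dev;
        return 0;
}
```

A check like this would only turn the crash into an error return. It
would not address why a thread is still touching buffers after the mount
has failed.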

From looking at the code in super.c it seems like it creates the thread
but when the error occurs it doesn't take care of destroying the
thread(s) that have been created.  Does this interpretation seem
correct?

So should the threads get killed if the mount fails?

Below are snippets of the code to help you find your way.

Thanks,
John

The osb.c code for the error:
        /* If the journal was unmounted cleanly then we don't want to
         * recover anything. Otherwise, journal_load will do that
         * dirty work for us :) */
        if (!mounted) {
--->>>          status = ocfs_journal_wipe(&osb->journal, 0);
                if (status < 0) {
                        LOG_ERROR_STATUS(status);
                        goto finally;
                }

The super.c code for the error:
        /* Read the publish sector for this node and cleanup dirent being */
        /* modified when we crashed. */
        LOG_TRACE_STR ("ocfs_check_volume...");
        ocfs_down_sem (&(osb->osb_res), true);
-->>    status = ocfs_check_volume (osb);
        ocfs_up_sem (&(osb->osb_res));
        if (status < 0) {
                LOG_ERROR_STATUS (status);
                goto leave;
        }

The io.c code for the error:
        for (i = 0 ; i < nr ; i++) {
                if (bhs[i] == NULL) {
--->>>                  bhs[i] = getblk (dev, blocknum++, sb->s_blocksize);
                        if (bhs[i] == NULL) {
                                LOG_TRACE_STR("bh == NULL");
                                status = -EIO;
                                LOG_ERROR_STATUS(status);
                                goto bail;
                        }
                }
                bh = bhs[i];

* [Ocfs2-devel] Crash in ocfs_volume_thread
  2004-03-17 15:42 [Ocfs2-devel] Crash in ocfs_volume_thread Villalovos, John L
@ 2004-03-17 16:32 ` Mark Fasheh
  2004-03-17 17:12   ` Rusty Lynch
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Fasheh @ 2004-03-17 16:32 UTC
  To: ocfs2-devel

On Wed, Mar 17, 2004 at 01:42:12PM -0800, Villalovos, John L wrote:
> I am trying to debug an issue where I get a crash in the
> ocfs_volume_thread.
> 
> Here is the scenario.
> 
> This is all done under a 2.6.x kernel
> 
> I have a corrupted partition (I think).  
> 
> I created this corrupted partition by:
> 
> 1. Run mkfs.ocfs2
> 2. Mount the partition once.  This caused errors.
> 3. Reboot the system.
> 4. Try to mount the partition again and then it crashes in the
> ocfs_volume_thread.
Hmm, this particular scenario I don't believe has been considered very
much... (trying to mount a partition that failed the whole 1st mount
business)

> I get the following errors:
> 
> kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/osb.c, 427
> kernel: (17777) ERROR: status = -22, /root/ocfs/new-ocfs2/src/super.c, 1063
> 
> Then the crash occurs because I have a NULL pointer in line 670 of io.c
> which seems to be called by ocfs_volume_thread.
> 
> The NULL pointer is in bh->b_bdev.  This being NULL causes a crash to
> occur later on when BH_GET_DEVICE(bh) is called and tries to do a
> bh->b_bdev->bd_dev.
> 
> 
> From looking at the code in super.c it seems like it creates the thread
> but when the error occurs it doesn't take care of destroying the
> thread(s) that have been created.  Does this interpretation seem
> correct?
> 
> So should the threads get killed if the mount fails?
Definitely.

So then the fs is being cleaned up, but the nm thread obviously isn't. Are
any of the other threads still alive too? (Perhaps we haven't even gotten a
chance to start them yet actually).
	--Mark

--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

* [Ocfs2-devel] Crash in ocfs_volume_thread
  2004-03-17 16:32 ` Mark Fasheh
@ 2004-03-17 17:12   ` Rusty Lynch
  0 siblings, 0 replies; 4+ messages in thread
From: Rusty Lynch @ 2004-03-17 17:12 UTC
  To: ocfs2-devel

On Wed, Mar 17, 2004 at 02:32:16PM -0800, Mark Fasheh wrote:
> On Wed, Mar 17, 2004 at 01:42:12PM -0800, Villalovos, John L wrote:
> > I am trying to debug an issue where I get a crash in the
> > ocfs_volume_thread.
> > 
> > Here is the scenario.
> > 
> > This is all done under a 2.6.x kernel
> > 
> > I have a corrupted partition (I think).  
> > 
> > I created this corrupted partition by:
> > 
> > 1. Run mkfs.ocfs2
> > 2. Mount the partition once.  This caused errors.
> > 3. Reboot the system.
> > 4. Try to mount the partition again and then it crashes in the
> > ocfs_volume_thread.
> Hmm, this particular scenario I don't believe has been considered very
> much... (trying to mount a partition that failed the whole 1st mount
> business)

I currently get around this bug by making sure my first mount of a
partition is from a 2.4 build.  Maybe the fix for this is to just
go ahead and rip out the first mount logic and place it in mkfs.ocfs
as has been mentioned before.

* [Ocfs2-devel] Crash in ocfs_volume_thread
@ 2004-03-17 17:45 Villalovos, John L
  0 siblings, 0 replies; 4+ messages in thread
From: Villalovos, John L @ 2004-03-17 17:45 UTC
  To: ocfs2-devel

> Hmm, this particular scenario I don't believe has been considered very
> much... (trying to mount a partition that failed the whole 1st mount
> business)

Well I guess you could think of it as a corrupted filesystem.
Attempting to mount a corrupted file system shouldn't cause the module
to crash with a NULL pointer error.  It should just fail in the mount.

> > So should the threads get killed if the mount fails?
> Definitely.
> 
> So then the fs is being cleaned up, but the nm thread 
> obviously isn't. Are
> any of the other threads still alive too? (Perhaps we haven't 
> even gotten a
> chance to start them yet actually).

Well it appears that at least two threads are getting created before
that part of super.c.  Then when it fails it just exits without
cleaning up those threads.

John
