BUG in reiserfs_write_full

* BUG in reiserfs_write_full_page().
@ 2003-07-17 21:19 Michael Gaughen
  2003-07-17 21:33 ` Chris Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Gaughen @ 2003-07-17 21:19 UTC (permalink / raw)
  To: reiserfs-list

Hello,

We have a test machine that continues to BUG() in
reiserfs_write_full_page().  The machine is running SLES8
(2.4.19-152, UP).  Here is the (kdb) stack trace:

qar3s2 login: kernel BUG at inode.c:2220!

[0]kdb> bt
EBP        EIP        Function (args)
0xc1f65b3c 0xcfe4bdd8 [reiserfs]reiserfs_write_full_page+0x98 (0xc10bde90, 0x95)
                               reiserfs .text 0xcfe40060 0xcfe4bd40 0xcfe4c130
0xc1f65b4c 0xcfe4c178 [reiserfs]reiserfs_writepage+0x28 (0xc10bde90, 0x1d2, 0xc1f65b80, 0xc1f64000, 0x0)
                               reiserfs .text 0xcfe40060 0xcfe4c150 0xcfe4c180
0xc1f65b80 0xc014c02f shrink_cache+0x42f (0xc1f65bb0, 0x1d2, 0x3a, 0x1b)
                               kernel .text 0xc0100000 0xc014bc00 0xc014c0d0
0xc1f65b98 0xc014c309 shrink_caches+0x49 (0xc1f65bb0, 0x1d2, 0x0, 0xc03d4680, 0x1)
                               kernel .text 0xc0100000 0xc014c2c0 0xc014c320
0xc1f65bc0 0xc014c382 try_to_free_pages+0x62 (0x0, 0x1d2, 0x0, 0xc03d506c, 0x1)
                               kernel .text 0xc0100000 0xc014c320 0xc014c430
0xc1f65bdc 0xc014d6e2 balance_classzone+0x72 (0xc1f65c04, 0x1, 0x1000, 0x8, 0xc1f64000)
                               kernel .text 0xc0100000 0xc014d670 0xc014d810
0xc1f65c14 0xc014da04 _wrapped_alloc_pages+0x1f4 (0x1d2, 0x0, 0xc03d5060, 0xc1f75514, 0x1)
                               kernel .text 0xc0100000 0xc014d810 0xc014db20
0xc1f65c34 0xc014db43 __alloc_pages+0x23 (0xc172a940, 0xc101fa30, 0x0, 0xcfa81708, 0xc101fa30)
                               kernel .text 0xc0100000 0xc014db20 0xc014dbe0
0xc1f65c60 0xc0141f25 page_cache_read+0xa5 (0xc172a940, 0x0, 0x5, 0x7)
                               kernel .text 0xc0100000 0xc0141e80 0xc0141f90
0xc1f65c78 0xc0141fce read_cluster_nonblocking+0x3e (0xc1f75514, 0x5, 0xcfa8171c, 0xc1f754a4, 0x7)
                               kernel .text 0xc0100000 0xc0141f90 0xc0141fe0
0xc1f65cac 0xc014394c filemap_nopage+0x12c (0xcff2b650, 0x804d000, 0x0, 0xcfb0cf60, 0x0)
                               kernel .text 0xc0100000 0xc0143820 0xc0143a60
0xc1f65cf8 0xc013d956 do_no_page+0xc6 (0xcff30940, 0xcff2b650, 0x804da24, 0x0, 0xc1c93268)
                               kernel .text 0xc0100000 0xc013d890 0xc013dc90
0xc1f65d2c 0xc013df70 handle_mm_fault+0xf0 (0xcff30940, 0xcff2b650, 0x804da24, 0x0, 0xffffff85)
                               kernel .text 0xc0100000 0xc013de80 0xc013e040
0xc1f65e20 0xc0121580 do_page_fault_hook+0x26b (0xc1f65e30, 0x0, 0xc48e9000, 0x1, 0xc2d65ca0)
                               kernel .text 0xc0100000 0xc0121315 0xc0121afb
           0xc0109a0c error_code+0x34
                               kernel .text 0xc0100000 0xc01099d8 0xc0109a14


Based on the line number info (inode.c:2220), here is the code snippet,
taken from reiserfs_write_full_page():

    if (reiserfs_transaction_running(inode->i_sb)) {
        BUG();
    }

The most likely reason is that syslogd (the current process at the
time of the BUG()) was attempting to write to a log file.  As part of
that write, a new transaction was started via
__block_prepare_write()->reiserfs_get_block().  Then
do_generic_file_write() took a page fault, leading to the
memory-freeing code and the call to reiserfs_write_full_page().
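
The sequence above can be sketched as a small userspace model in C.
This is only an illustration under stated assumptions, not kernel
code: struct task, model_writepage(), model_page_fault() and
model_file_write() are hypothetical names, and the journal_info field
merely stands in for the per-task transaction state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task journal state. */
struct task {
    void *journal_info;   /* non-NULL while a transaction is running */
    bool  hit_bug;        /* set where the real code would call BUG() */
};

/* Models the writepage entry point: it must not run inside a transaction. */
static void model_writepage(struct task *t)
{
    if (t->journal_info != NULL)
        t->hit_bug = true;
}

/* Models memory reclaim reached from a page fault: it writes out a dirty page. */
static void model_page_fault(struct task *t)
{
    model_writepage(t);
}

/* Models the write path: start a transaction, then possibly fault while
 * copying user data, then end the transaction. Returns true if the
 * writepage check fired. */
static bool model_file_write(struct task *t, bool fault_during_copy)
{
    int handle = 1;

    t->journal_info = &handle;      /* transaction started */
    if (fault_during_copy)
        model_page_fault(t);        /* reclaim re-enters the filesystem */
    t->journal_info = NULL;         /* transaction ended */
    return t->hit_bug;
}
```

Calling model_file_write() with fault_during_copy set models the
syslogd case; without the fault, the check in model_writepage() is
never reached.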

AFAICT, this problem only exists in the data-logging patch, and it
still exists in the latest patch against 2.4.22.  I searched around,
but I couldn't find any mention of this problem or a fix.

Any thoughts? Ideas?

Thanks,
-Mike


* Re: BUG in reiserfs_write_full_page().
  2003-07-17 21:19 BUG in reiserfs_write_full_page() Michael Gaughen
@ 2003-07-17 21:33 ` Chris Mason
  2003-07-17 21:53   ` Michael Gaughen
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Mason @ 2003-07-17 21:33 UTC (permalink / raw)
  To: Michael Gaughen; +Cc: reiserfs-list

On Thu, 2003-07-17 at 17:19, Michael Gaughen wrote:
> Hello,
> 
> We have a test machine that continues to BUG() in 
> reiserfs_write_full_page().
> The machine is running SLES8 (2.4.19-152, UP). Here is the (kdb) stack 
> trace:

Hmmm, the allocation masks are supposed to be set such that writepage
won't get called.  I'll take a look.  How easy is it to reproduce?  If
you have any test cases that can trigger it, please send them along.

-chris



* Re: BUG in reiserfs_write_full_page().
  2003-07-17 21:33 ` Chris Mason
@ 2003-07-17 21:53   ` Michael Gaughen
  2003-07-18 13:23     ` Chris Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Gaughen @ 2003-07-17 21:53 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list

Chris Mason wrote:

>
>Hmmm, the allocation masks are supposed to be set such that writepage
>won't get called.  I'll take a look.  How easy is it to reproduce?  If
>you have any test cases that can trigger it, please send them along.
>

Actually, the problem may have to do with the common journal_info kept
within the task structure.  Looking at ext3_writepage, they have this
to say:

        /*
         * We give up here if we're reentered, because it might be
         * for a different filesystem.  One *could* look for a
         * nested transaction opportunity.
         */
        lock_kernel();
        if (ext3_journal_current_handle())
                goto out_fail;

and in the case of out_fail, they do:

        unlock_kernel();
        SetPageDirty(page);
        UnlockPage(page);
        return ret;

So, in the case of reiserfs_write_full_page(), the BUG() is falsely
triggered due to a transaction that was started on another filesystem
(ext3).  And the fix would simply be to do something along the lines
of ext3...
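
To make the ext3-style bail-out concrete, here is a minimal userspace
sketch in C.  It assumes the page arrives locked with its dirty bit
already cleared by the caller; struct page_model, enum wp_result and
model_writepage_bailout() are hypothetical names and not part of
either filesystem.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two page flags the bail-out path touches. */
struct page_model {
    bool dirty;
    bool locked;
};

enum wp_result { WP_WRITTEN, WP_DEFERRED };

/* If the task already holds a journal handle (possibly for another
 * filesystem), give up: redirty the page, unlock it, and let a later
 * pass write it. Otherwise pretend the page was written. */
static enum wp_result model_writepage_bailout(struct page_model *pg,
                                              const void *journal_info)
{
    if (journal_info != NULL) {
        pg->dirty = true;    /* plays the role of SetPageDirty(page) */
        pg->locked = false;  /* plays the role of UnlockPage(page) */
        return WP_DEFERRED;
    }
    pg->dirty = false;       /* contents are assumed to reach disk */
    pg->locked = false;
    return WP_WRITTEN;
}
```

The point of the design is that a deferred page stays dirty, so no
data is lost; it is simply written on a later pass, when the task is
no longer inside a transaction.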

Thanks,
-Mike


* Re: BUG in reiserfs_write_full_page().
  2003-07-17 21:53   ` Michael Gaughen
@ 2003-07-18 13:23     ` Chris Mason
  2003-07-18 13:45       ` Nikita Danilov
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Mason @ 2003-07-18 13:23 UTC (permalink / raw)
  To: Michael Gaughen; +Cc: reiserfs-list

On Thu, 2003-07-17 at 17:53, Michael Gaughen wrote:
> Chris Mason wrote:
> 
> >
> >Hmmm, the allocation masks are supposed to be set such that writepage
> >won't get called.  I'll take a look.  How easy is it to reproduce?  If
> >you have any test cases that can trigger it, please send them along.
> >
> 
> Actually, the problem may have to do with the common journal_info kept
> within the task structure.  Looking at ext3_writepage, they have this to 
> say:
> 
>         /*
>          * We give up here if we're reentered, because it might be
>          * for a different filesystem.  One *could* look for a
>          * nested transaction opportunity.
>          */
>         lock_kernel();
>         if (ext3_journal_current_handle())
>                 goto out_fail;
> 
> and in the case of out_fail, they do:
> 
>         unlock_kernel();
>         SetPageDirty(page);
>         UnlockPage(page);
>         return ret;
> 
> So, in the case of reiserfs_write_full_page(), the BUG() is falsely 
> triggered
> due to a transaction that was started on another filesystem (ext3).  And the
> fix would simply be to do something along the lines of ext3...

The reiserfs data logging actually does more than ext3 to make sure
things get along, recording the super of the filesystem holding the
transaction.  So, it is actually possible to start a new non-nested
transaction.

But, we shouldn't have to.  Other parts of the OS should be protecting
us from a writepage being called at this time, which is why the bug is
there.  Someone did a non GFP_NOFS allocation with a transaction
running, which can trigger a variety of deadlocks, especially from
shrink_icache and friends.
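
As a rough illustration of the protection being relied on here, the
sketch below models the rule that reclaim only calls into a
filesystem's writepage when the allocation mask permits filesystem
activity.  It is a userspace model: MODEL_GFP_FS and MODEL_GFP_WAIT
are hypothetical bit values chosen for the example, and
reclaim_may_call_writepage() is not a kernel function.

```c
#include <stdbool.h>

/* Hypothetical bits standing in for "may wait" and "may call into the
 * filesystem"; the values are illustrative only. */
#define MODEL_GFP_WAIT 0x10u
#define MODEL_GFP_FS   0x100u

/* Reclaim may write a dirty page back through the filesystem only if
 * the allocation that triggered reclaim allowed filesystem calls. */
static bool reclaim_may_call_writepage(unsigned int gfp_mask, bool page_dirty)
{
    return page_dirty && (gfp_mask & MODEL_GFP_FS) != 0;
}
```

Under this model, an allocation made inside a transaction is safe only
if its mask leaves the FS bit out, which is what a NOFS-style mask is
for.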

-chris



* Re: BUG in reiserfs_write_full_page().
  2003-07-18 13:23     ` Chris Mason
@ 2003-07-18 13:45       ` Nikita Danilov
  2003-07-18 14:00         ` Chris Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Nikita Danilov @ 2003-07-18 13:45 UTC (permalink / raw)
  To: Chris Mason; +Cc: Michael Gaughen, reiserfs-list

Chris Mason writes:
 > On Thu, 2003-07-17 at 17:53, Michael Gaughen wrote:
 > > Chris Mason wrote:
 > > 
 > > >
 > > >Hmmm, the allocation masks are supposed to be set such that writepage
 > > >won't get called.  I'll take a look.  How easy is it to reproduce?  If
 > > >you have any test cases that can trigger it, please send them along.
 > > >
 > > 
 > > Actually, the problem may have to do with the common journal_info kept
 > > within the task structure.  Looking at ext3_writepage, they have this to 
 > > say:
 > > 
 > >         /*
 > >          * We give up here if we're reentered, because it might be
 > >          * for a different filesystem.  One *could* look for a
 > >          * nested transaction opportunity.
 > >          */
 > >         lock_kernel();
 > >         if (ext3_journal_current_handle())
 > >                 goto out_fail;
 > > 
 > > and in the case of out_fail, they do:
 > > 
 > >         unlock_kernel();
 > >         SetPageDirty(page);
 > >         UnlockPage(page);
 > >         return ret;
 > > 
 > > So, in the case of reiserfs_write_full_page(), the BUG() is falsely 
 > > triggered
 > > due to a transaction that was started on another filesystem (ext3).  And the
 > > fix would simply be to do something along the lines of ext3...
 > 
 > The reiserfs data logging actually does more than ext3 to make sure
 > things get along, recording the super of the filesystem holding the
 > transaction.  So, it is actually possible to start a new non-nested
 > transaction.

This can result in a/b<->b/a deadlock, right?

 > 
 > But, we shouldn't have to.  Other parts of the OS should be protecting
 > us from a writepage being called at this time, which is why the bug is
 > there.  Someone did a non GFP_NOFS allocation with a transaction

__bread()->__getblk()->__find_get_block()->find_get_page() allocates
a page with bdev->bd_inode->i_mapping->gfp_flags, which is GFP_USER
and therefore includes GFP_FS.

 > running, which can trigger a variety of deadlocks, especially from
 > shrink_icache and friends.
 > 
 > -chris
 > 

Nikita.

 > 


* Re: BUG in reiserfs_write_full_page().
  2003-07-18 13:45       ` Nikita Danilov
@ 2003-07-18 14:00         ` Chris Mason
  2003-07-18 18:42           ` Michael Gaughen
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Mason @ 2003-07-18 14:00 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Michael Gaughen, reiserfs-list

On Fri, 2003-07-18 at 09:45, Nikita Danilov wrote:

>  > > So, in the case of reiserfs_write_full_page(), the BUG() is falsely 
>  > > triggered
>  > > due to a transaction that was started on another filesystem (ext3).  And the
>  > > fix would simply be to do something along the lines of ext3...
>  > 
>  > The reiserfs data logging actually does more than ext3 to make sure
>  > things get along, recording the super of the filesystem holding the
>  > transaction.  So, it is actually possible to start a new non-nested
>  > transaction.
> 
> This can result in a/b<->b/a deadlock, right?
> 
Sorry, I wasn't clear.  The transaction nesting code could detect and
deal with it (making sure not to nest into an ext3 transaction or a
reiserfs transaction on a different FS), but there are still other
deadlocks to deal with.
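
The detection step could look roughly like the following userspace
sketch.  It assumes the running handle records which filesystem owns
it; struct super_model, struct handle_model and decide_nesting() are
hypothetical names, and the sketch covers only the nesting decision,
not the other deadlocks.

```c
#include <stddef.h>

/* Hypothetical stand-in for a superblock. */
struct super_model {
    int id;
};

/* Hypothetical handle that records which filesystem owns the running
 * transaction. A handle owned by some other filesystem type can be
 * modelled as an owner that never compares equal. */
struct handle_model {
    const struct super_model *owner;
};

enum nest_decision { START_NEW, NEST_INTO_CURRENT, MUST_NOT_NEST };

/* Decide what a filesystem may do when asked to start a transaction. */
static enum nest_decision decide_nesting(const struct handle_model *current_handle,
                                         const struct super_model *sb)
{
    if (current_handle == NULL)
        return START_NEW;            /* nothing running for this task */
    if (current_handle->owner == sb)
        return NEST_INTO_CURRENT;    /* same filesystem: nesting is safe */
    return MUST_NOT_NEST;            /* different filesystem owns it */
}
```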

>  > 
>  > But, we shouldn't have to.  Other parts of the OS should be protecting
>  > us from a writepage being called at this time, which is why the bug is
>  > there.  Someone did a non GFP_NOFS allocation with a transaction
> 
> __bread()->__getblk()->__find_get_block()->find_get_page() allocates
> page with bdev->bd_inode->i_mapping->gfp_flags, which is GFP_USER, that
> includes GFP_FS.

You're in 2.5 land ;-)  There seem to be a few problems there, I've got
an oops in find_inode and a deadlock under load, but I still need to
read the (huge) sysrq-t to figure things out.

-chris



* Re: BUG in reiserfs_write_full_page().
  2003-07-18 14:00         ` Chris Mason
@ 2003-07-18 18:42           ` Michael Gaughen
  2003-07-21 21:24             ` Michael Gaughen
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Gaughen @ 2003-07-18 18:42 UTC (permalink / raw)
  To: Chris Mason; +Cc: Nikita Danilov, reiserfs-list

Chris Mason wrote:

>On Fri, 2003-07-18 at 09:45, Nikita Danilov wrote:
>  
>
>> > 
>> > But, we shouldn't have to.  Other parts of the OS should be protecting
>> > us from a writepage being called at this time, which is why the bug is
>> > there.  Someone did a non GFP_NOFS allocation with a transaction
>>
>>__bread()->__getblk()->__find_get_block()->find_get_page() allocates
>>page with bdev->bd_inode->i_mapping->gfp_flags, which is GFP_USER, that
>>includes GFP_FS.
>>    
>>
>
>You're in 2.5 land ;-)  There seem to be a few problems there, I've got
>an oops in find_inode and a deadlock under load, but I still need to
>read the (huge) sysrq-t to figure things out.
>

But, in 2.4, the page fault handling code also gets the gfp_mask from
the inode's address_space gfp_mask (in
->page_cache_read()->page_cache_alloc()).  It looks like clean_inode()
sets the gfp_mask to GFP_HIGHUSER, which also includes GFP_FS, and
unless I am missing something, I don't see that GFP_FS is ever cleared
in this case.
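
One way to check this against the trace is to decode the value 0x1d2
that recurs in the argument lists of the allocation and reclaim
frames.  The sketch below does that in userspace C.  The flag values
are written from memory of the 2.4-era include/linux/mm.h and should
be treated as assumptions to verify against the actual tree; the M_
prefix and mask_allows_fs() are hypothetical names.

```c
#include <stdbool.h>

/* Assumed 2.4-era flag bits (verify against include/linux/mm.h). */
#define M_GFP_HIGHMEM 0x02u
#define M_GFP_WAIT    0x10u
#define M_GFP_HIGH    0x20u
#define M_GFP_IO      0x40u
#define M_GFP_HIGHIO  0x80u
#define M_GFP_FS      0x100u

/* Assumed composite masks built from the bits above. */
#define M_GFP_NOFS     (M_GFP_HIGH | M_GFP_WAIT | M_GFP_IO | M_GFP_HIGHIO)
#define M_GFP_HIGHUSER (M_GFP_WAIT | M_GFP_IO | M_GFP_HIGHIO | \
                        M_GFP_FS | M_GFP_HIGHMEM)

/* True if an allocation with this mask lets reclaim call into a filesystem. */
static bool mask_allows_fs(unsigned int gfp_mask)
{
    return (gfp_mask & M_GFP_FS) != 0;
}
```

If those values are right, M_GFP_HIGHUSER works out to 0x1d2, which
would mean the faulting allocation carried the FS bit, while a NOFS
mask would not.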

-Mike


* Re: BUG in reiserfs_write_full_page().
  2003-07-18 18:42           ` Michael Gaughen
@ 2003-07-21 21:24             ` Michael Gaughen
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Gaughen @ 2003-07-21 21:24 UTC (permalink / raw)
  To: Chris Mason; +Cc: Michael Gaughen, reiserfs-list

Michael Gaughen wrote:

> Chris Mason wrote:
>
>> On Fri, 2003-07-18 at 09:45, Nikita Danilov wrote:
>>  
>>
>>> > > But, we shouldn't have to.  Other parts of the OS should be 
>>> protecting
>>> > us from a writepage being called at this time, which is why the 
>>> bug is
>>> > there.  Someone did a non GFP_NOFS allocation with a transaction
>>>
>>> __bread()->__getblk()->__find_get_block()->find_get_page() allocates
>>> page with bdev->bd_inode->i_mapping->gfp_flags, which is GFP_USER, that
>>> includes GFP_FS.
>>>   
>>
>>
>> You're in 2.5 land ;-)  There seem to be a few problems there, I've got
>> an oops in find_inode and a deadlock under load, but I still need to
>> read the (huge) sysrq-t to figure things out.
>>
>
> But, in 2.4, the page fault handling code also gets the gfp_mask from 
> the inode's
> address_space gfp_mask (in ->page_cache_read()->page_cache_alloc()).  
> It looks
> like clean_inode() sets the gfp_mask to GFP_HIGHUSER, which also includes
> GFP_FS, and unless I am missing something, I don't see that GFP_FS is 
> ever
> cleared in this case. 


And to take this a step further (if what I described is indeed the
problem), GFP_FS could be cleared in inode->i_mapping->gfp_mask,
inside of reiserfs_read_inode2() and reiserfs_new_inode().  Then you
don't have to rely on the core kernel for protection, and you truly
guarantee that writepage will not be called in this case.  Does that
sound reasonable?
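
A minimal sketch of that idea, as a userspace model rather than a
patch: struct mapping_model and model_forbid_fs_reclaim() are
hypothetical names, and the 0x100 value for the FS bit is an
assumption about the 2.4 flag layout.

```c
/* Hypothetical stand-in for the address_space field in question. */
struct mapping_model {
    unsigned int gfp_mask;
};

/* Assumed bit for "reclaim may call into the filesystem". */
#define M_GFP_FS 0x100u

/* What the inode setup paths would do: strip the FS bit so page cache
 * allocations for this mapping can never re-enter writepage. All other
 * bits of the mask are left untouched. */
static void model_forbid_fs_reclaim(struct mapping_model *m)
{
    m->gfp_mask &= ~M_GFP_FS;
}
```

Applied to a mask of 0x1d2, this leaves 0xd2: the allocation can still
wait and do I/O, but reclaim on its behalf would skip the filesystem.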

-Mike

