All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Mingming Cao <cmm@us.ibm.com>
Cc: Dmitri Monakhov <dmonakhov@openvz.org>,
	ext4 <linux-ext4@vger.kernel.org>
Subject: Re: delayed allocation result in BUG at fs/buffer.c:2880!
Date: Thu, 20 Mar 2008 23:32:34 +0530	[thread overview]
Message-ID: <20080320180234.GB6931@skywalker> (raw)
In-Reply-To: <1206033709.3637.15.camel@localhost.localdomain>


Adding linux-ext4 back.

On Thu, Mar 20, 2008 at 10:21:49AM -0700, Mingming Cao wrote:
> On Thu, 2008-03-20 at 20:46 +0530, Aneesh Kumar K.V wrote:
> > On Thu, Mar 20, 2008 at 05:04:47PM +0300, Dmitri Monakhov wrote:
> > > On 17:39 Thu 20 Mar     , Aneesh Kumar K.V wrote:
> > > > On Thu, Mar 20, 2008 at 11:16:19AM +0300, Dmitri Monakhov wrote:
> > > > > On 21:39 Wed 19 Mar     , Eric Sandeen wrote:
> > > > > > Solofo.Ramangalahy@bull.net wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > During stress testing (workload: racer from ltp + fio/iometer), here
> > > > > > > is an error I am encountering:
> > > > > > > 8<------------------------------------------------------------------------------
> > > > > > > kernel: WARNING: at fs/buffer.c:1680 __block_write_full_page+0xd4/0x2af()
> > > > > > 
> > > > > > So this is WARN_ON(bh->b_size != blocksize);
> > > > > > 
> > > > > > What is b_size in this case?
> > > > > FS block size, because this page pinned bh (it comes from page_buffers(page)), but
> > > > > not dummy bh which may comes from {write,read}pages or direct_IO. 
> > > > > Page's bh i_size must always be equal to fs blocksize.
> > > > > This bh always constructed via following construction
> > > > > if (!page_has_buffers(page))
> > > > > 	create_empty_buffers(page, 1<<inode->i_blkbits, flags)
> > > > > So page's bh->b_size was inited with right value from very beginning, but
> > > > > apparently somewhere this size was changed 
> > > > > I guess i've localized buggy place, at least it's looks strange.
> > > > > ext4_da_get_block_prep ()
> > > > > {
> > > > > ...
> > > > > 	BUG_ON(create == 0);
> > > > >         BUG_ON(bh_result->b_size != inode->i_sb->s_blocksize);
> > > > > 	ret =  ext4_get_blocks_wrap(NULL,  inode, iblock, 1,  bh_result, 0, 0);
> > > > > #Here ext4_get_block_write called with max_blocks == 1  ^^^^^
> > > > > 	...
> > > > > 	if (ret > 0) {
> > > > >                         bh_result->b_size = (ret << inode->i_blkbits);
> > > > > 	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > > ## I don't understand this place. I hoped what (ret <= max_blocks) must always
> > > > > ##be true true. But after I've add debug info printing I've got following result.
> > > > >                 ret = 0;
> > > > >         }
> > > > > ...
> > > > > }
> > > > > Some times I've seen following ,message 
> > > > >  bh= {state=0,size=114688, blknr=18446744073709551615 dev=0000000000000000,count=0}, ret=28
> > > > > And because it was page-cache's bh later this result in WARNING.
> > > > 
> > > > Is that a fallocate space ?. For falloc space we can return values
> > > > greater than max_blocks. ext4_ext_get_blocks was made to return  >0
> > > > for a read on prealloc space to ensure delalloc doesn't reserve space
> > > > for the same. I guess we need to make sure we don't return more than
> > > > max_blocks. Can you try the patch below
> > > Ok Warning has gone, but resulted bh still incorrectly filled.
> > > I've found what function ext4_da_get_block_prep() return bh witch 
> > > is !mapped and !delayed, which is prohibited because it is always called with
> > > create != 0. BH debug info at the end of this function result in following msg
> > > 
> > > BH={state=0, size=4096, blknr=18446744073709551615,dev=0000000000000000,
> > >   count=0} block =288 ret=1
> > > 
> > > Later this incorrectly filled bh result in BUG_ON triggering
> > > ------------[ cut here ]------------
> > >  kernel BUG at fs/buffer.c:2880!
> > >  invalid opcode: 0000 [1] SMP
> > >  CPU 1

.....

> > > 
> > > > 
> > > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > > > index d6ae40a..4985fd5 100644
> > > > --- a/fs/ext4/extents.c
> > > > +++ b/fs/ext4/extents.c
> > > > @@ -2600,8 +2600,18 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> > > >  			}
> > > >  			if (create == EXT4_CREATE_UNINITIALIZED_EXT)
> > > >  				goto out;
> > > > -			if (!create)
> > > > +			if (!create) {
> > > > +				/*
> > > > +				 * We have blocks reserved already. We
> > > > +				 * return allocated blocks so that delalloc
> > > > +				 * won't do block reservation for us. But
> > > > +				 * the buffer head will be unmapped so that
> > > > +				 * a read from the block return 0
> > > > +				 */
> > > > +				if (allocated > max_blocks)
> > > > +					allocated = max_blocks;
> > > >  				goto out2;
> > > > +			}
> > > >  
> > > >  			ret = ext4_ext_convert_to_initialized(handle, inode,
> > > >  								path, iblock,
> > 
> > With prealloc space we still need to make sure buffer heads are marked
> > new and delayed. 
> I doubt this. prealloc space should not mark as delayed. The allocation
> already done. delayed flag triggeres block reservation for delayed
> allocation, with is not needed for preallocation, that will cause double
> accounting for free space.
> 
> With delayed allocation, where hit preallocated space, get_block() right
> now returns bh as new but return value > 0 (it's possible that returns >
> maxblocks, as we just return a single large extent).


As Dmitri mentioned in the previous mail if the buffer head is not
marked as delayed or new, in __block_prepare_write after get_block
we would do a ll_rw_block(READ, 1, &bh); and that will result in BUG_ON.


> 
> > Only difference between prealloc and get_block failure
> > case should be in failure case we need to do block reservation.
> 
> Correct, in the failure case, the returned number of blocks from
> get_block() is 0, but with preallocation, the return value is positive.
> Both case the resulting bh is remains new, unmapped.
> 
> >   With
> > prealloc we still like to get get_block called again with create = 1
> > so that the uninit extent get split. 
> > 
> I could not see why we still need doing create =1 at write_begin time
> with delayed allocation, if the space has already preallocated.
> 
> The preallocation extent split could be defered at write out time,
> get_block() is always called with create = 1 at writepage() time. 
> 


I meant at writepage time.



> > I would also like to test it locally. How are you reproducing it. Just
> > fsstress won't reproduce it right ?
> > 
> 
> Not sure which ext4 tree Dmitri is testing, I have a patch to handle
> preallocation case in delayed allocation, I wonder if that makes the
> problem goes away? 
> 
> http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=delalloc-ext4-preallocation-handling.patch;h=ba3b70ecba99137d452b6692c92caabe8831392e;hb=80aeb2ef59cdb97bf527570cb273f6e5d5d27e3f
> 

  parent reply	other threads:[~2008-03-20 18:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-18  9:49 [2.6.25-rc5-ext4-36c86] attempt to access beyond end of device Solofo.Ramangalahy
2008-03-18 16:23 ` Dmitri Monakhov
2008-03-20  2:19   ` Eric Sandeen
2008-03-19 11:35 ` Andreas Dilger
2008-03-19 11:56   ` Solofo.Ramangalahy
2008-03-20  2:39 ` Eric Sandeen
2008-03-20  3:04   ` Solofo.Ramangalahy
2008-03-20  8:16   ` Dmitri Monakhov
2008-03-20 12:09     ` Aneesh Kumar K.V
2008-03-20 14:04       ` delayed allocation result in BUG at fs/buffer.c:2880! Dmitri Monakhov
     [not found]         ` <20080320151645.GA23301@skywalker>
     [not found]           ` <1206033709.3637.15.camel@localhost.localdomain>
2008-03-20 18:02             ` Aneesh Kumar K.V [this message]
2008-03-20 18:18               ` Mingming Cao
2008-03-21  0:49     ` [2.6.25-rc5-ext4-36c86] attempt to access beyond end of device Mingming Cao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080320180234.GB6931@skywalker \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=cmm@us.ibm.com \
    --cc=dmonakhov@openvz.org \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.