* Re: raid10 make_request failure during iozone benchmark upon btrfs
From: NeilBrown @ 2012-07-03  2:47 UTC
  To: Kerin Millar; +Cc: linux-raid, linux-btrfs


On Tue, 03 Jul 2012 03:13:33 +0100 Kerin Millar <kerframil@gmail.com> wrote:

> Hi,
> 
> On 03/07/2012 02:39, NeilBrown wrote:
> 
> [snip]
> 
>  >>> Could you please double check that you are running a kernel with
>  >>>
>  >>> commit aba336bd1d46d6b0404b06f6915ed76150739057
>  >>> Author: NeilBrown<neilb@suse.de>
>  >>> Date:   Thu May 31 15:39:11 2012 +1000
>  >>>
>  >>>       md: raid1/raid10: fix problem with merge_bvec_fn
>  >>>
>  >>> in it?
>  >>
>  >> I am indeed. I searched the list beforehand and noticed the patch in
>  >> question. Not sure which -rc it landed in but I checked my source tree
>  >> and it's definitely in there.
>  >>
>  >> Cheers,
>  >>
>  >> --Kerin
>  >
>  > Thanks.
>  > Looking at it again, I see that it is definitely a different bug; that patch
>  > wouldn't affect it.
>  >
>  > But I cannot see what could possibly be causing the problem.
>  > You have a 256K chunk size, so requests should be limited to 512 sectors
>  > aligned at a 512-sector boundary.
>  > However all the requests that are causing errors are 512 sectors long, but
>  > aligned on a 256-sector boundary (which is not also a 512-sector boundary).
>  > This is wrong.
> 
> I see.
> 
>  >
>  > It could be that btrfs is submitting bad requests, but I think it always uses
>  > bio_add_page, and bio_add_page appears to do the right thing.
>  > It could be that dm-linear is causing the problem, but it seems to correctly
>  > ask the underlying device for alignment, and reports that alignment to
>  > bio_add_page.
>  > It could be that md/raid10 is the problem, but I cannot find any fault in
>  > raid10_mergeable_bvec - it performs much the same tests as the
>  > raid10 make_request function does.
>  >
>  > So it is a mystery.
>  >
>  > Is this failure repeatable?
> 
> Yes, it's reproducible with 100% consistency. Furthermore, I tried to
> use the btrfs volume as a store for the package manager, so as to try
> with a 'realistic' workload. Many of these errors were triggered
> immediately upon invoking the package manager. In case it matters, the
> package manager is portage (in Gentoo Linux) and the directory structure
> entails a shallow directory depth with a large number of small files
> distributed throughout. I haven't been able to reproduce this with xfs,
> ext4 or reiserfs.
> 
>  >
>  > If so, could you please insert
>  >     WARN_ON_ONCE(1);
>  > in drivers/md/raid10.c where it prints out the message: just after the
>  > "bad_map:" label.
>  >
>  > Also, in raid10_mergeable_bvec, insert
>  >     WARN_ON_ONCE(max < 0);
>  > just before
>  > 		if (max < 0)
>  > 			/* bio_add cannot handle a negative return */
>  > 			max = 0;
>  >
>  > and then see if either of those generates a warning, and post the full
>  > stack trace if so.
> 
> OK. I ran iozone again on a fresh filesystem, mounted with the default
> options. Here's the trace that appears, just before the first
> make_request_bug message:
> 
> WARNING: at drivers/md/raid10.c:1094 make_request+0xda5/0xe20()
> Hardware name: ProLiant MicroServer
> Modules linked in: btrfs zlib_deflate lzo_compress kvm_amd kvm sp5100_tco i2c_piix4
> Pid: 1031, comm: btrfs-submit-1 Not tainted 3.5.0-rc5 #3
> Call Trace:
> [<ffffffff81031987>] ? warn_slowpath_common+0x67/0xa0
> [<ffffffff81442b45>] ? make_request+0xda5/0xe20
> [<ffffffff81460b34>] ? __split_and_process_bio+0x2d4/0x600
> [<ffffffff81063429>] ? set_next_entity+0x29/0x60
> [<ffffffff810652c3>] ? pick_next_task_fair+0x63/0x140
> [<ffffffff81450b7f>] ? md_make_request+0xbf/0x1e0
> [<ffffffff8123d12f>] ? generic_make_request+0xaf/0xe0
> [<ffffffff8123d1c3>] ? submit_bio+0x63/0xe0
> [<ffffffff81040abd>] ? try_to_del_timer_sync+0x7d/0x120
> [<ffffffffa016839a>] ? run_scheduled_bios+0x23a/0x520 [btrfs]
> [<ffffffffa0170e40>] ? worker_loop+0x120/0x520 [btrfs]
> [<ffffffffa0170d20>] ? btrfs_queue_worker+0x2e0/0x2e0 [btrfs]
> [<ffffffff810520c5>] ? kthread+0x85/0xa0
> [<ffffffff815441f4>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff81052040>] ? kthread_freezable_should_stop+0x60/0x60
> [<ffffffff815441f0>] ? gs_change+0xb/0xb
> 
> Cheers,
> 
> --Kerin

Thanks.  Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)

The symptom is that iozone on btrfs on md/raid10 can result in

[  919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
[  919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0


i.e. RAID10 has a 256K chunk size, but is getting 256K requests which overlap
two chunks - the last half of one chunk and the first half of the next.
That isn't allowed and raid10_mergeable_bvec, called by bio_add_page, should
prevent it.
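
To make that concrete: the numbers in that log line really do show a request
starting half-way into a chunk.  Here is a tiny user-space model of the test
(the names are mine, not the kernel's):

#include <stdio.h>

int main(void)
{
	unsigned long long sector = 6653500160ULL; /* from the log line */
	unsigned int sectors = 512;     /* 256K request, in 512-byte sectors */
	unsigned int chunk_sects = 512; /* 256K chunk */

	unsigned long long offset = sector & (chunk_sects - 1);

	printf("offset into chunk: %llu sectors\n", offset); /* prints 256 */
	if (offset + sectors > chunk_sects)
		printf("request crosses a chunk boundary\n");
	return 0;
}

6653500160 & 511 is 256, and 256 + 512 runs past the 512-sector chunk, which
is exactly the condition that the make_request error path trips on.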

However btrfs_map_bio() sets ->bi_sector to a new value without verifying
that the resulting bio is still acceptable - which it isn't.

The core problem is that you cannot build a bio for one location, then use it
freely at another location.
md/raid1 handles this by checking each addition to a bio against all the
possible locations that it might read/write it.  Maybe btrfs could do the
same.
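
Roughly like this - a sketch with made-up types, not the actual md code:
before growing a bio, ask every location that might end up seeing it whether
the larger bio would still fit.

#include <stdbool.h>

struct location {
	unsigned long long dev_sector; /* where this copy would start */
	unsigned int chunk_sects;      /* chunk size seen at that device */
};

/* Would 'len' sectors starting at this location stay inside one chunk? */
static bool fits(const struct location *loc, unsigned int len)
{
	unsigned long long offset = loc->dev_sector & (loc->chunk_sects - 1);

	return offset + len <= loc->chunk_sects;
}

/* Grow the bio only if every candidate location still accepts it. */
static bool can_grow(const struct location *locs, int nr,
		     unsigned int cur_sectors, unsigned int add_sectors)
{
	for (int i = 0; i < nr; i++)
		if (!fits(&locs[i], cur_sectors + add_sectors))
			return false;
	return true;
}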
Alternatively we could work with Kent Overstreet (of bcache fame) to remove
the restriction that the fs must make the bio compatible with the device -
instead requiring the device to split bios when needed, and making it easy to
do that (currently it is not easy).
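
The shape of that would be something like the following - again only a
user-space sketch with invented types, showing the split happening below the
filesystem rather than in it:

struct piece {
	unsigned long long sector;
	unsigned int sectors;
};

typedef void (*submit_fn)(const struct piece *);

/* Carve a request into pieces that each stay inside one chunk. */
static void split_and_submit(unsigned long long sector, unsigned int sectors,
			     unsigned int chunk_sects, submit_fn submit)
{
	while (sectors) {
		unsigned int room = chunk_sects - (sector & (chunk_sects - 1));
		unsigned int len = sectors < room ? sectors : room;
		struct piece p = { .sector = sector, .sectors = len };

		submit(&p);
		sector += len;
		sectors -= len;
	}
}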
And there are probably other alternatives.

Thanks,
NeilBrown



* Re: raid10 make_request failure during iozone benchmark upon btrfs
From: Chris Mason @ 2012-07-03 15:08 UTC
  To: NeilBrown
  Cc: Kerin Millar, linux-raid@vger.kernel.org,
	linux-btrfs@vger.kernel.org

On Mon, Jul 02, 2012 at 08:47:27PM -0600, NeilBrown wrote:
> Thanks.  Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)
> 
> The symptom is that iozone on btrfs on md/raid10 can result in
> 
> [  919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
> [  919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> 
> 
> i.e. RAID10 has a 256K chunk size, but is getting 256K requests which overlap
> two chunks - the last half of one chunk and the first half of the next.
> That isn't allowed and raid10_mergeable_bvec, called by bio_add_page, should
> prevent it.
> 
> However btrfs_map_bio() sets ->bi_sector to a new value without verifying
> that the resulting bio is still acceptable - which it isn't.
> 
> The core problem is that you cannot build a bio for one location, then use it
> freely at another location.
> md/raid1 handles this by checking each addition to a bio against all the
> possible locations that it might read/write it.  Maybe btrfs could do the
> same.
> Alternatively we could work with Kent Overstreet (of bcache fame) to remove
> the restriction that the fs must make the bio compatible with the device -
> instead requiring the device to split bios when needed, and making it easy
> to do that (currently it is not easy).
> And there are probably other alternatives.

In this case btrfs should really break the bio down into smaller chunks
and hand-feed the lower layers.  There are corners where we think the
device can handle a certain size and then later figure out we were just
too optimistic.  So we should deal with it by breaking the bio up and
then lowering our max.
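
Something along those lines, as I read it (a user-space sketch; the helpers
are hypothetical stand-ins for the real submission path, not btrfs code):

#include <stdbool.h>
#include <stdio.h>

static unsigned int max_sectors = 512;	/* optimistic initial guess */

/* Stand-in for the lower layer; here it only takes 256 sectors at a time. */
static bool lower_layer_accepts(unsigned long long sector, unsigned int len)
{
	return len <= 256;
}

static void do_submit(unsigned long long sector, unsigned int len)
{
	printf("submit %u sectors at %llu\n", len, sector);
}

static void submit_with_fallback(unsigned long long sector,
				 unsigned int sectors)
{
	while (sectors) {
		unsigned int len = sectors < max_sectors ? sectors : max_sectors;

		while (len > 1 && !lower_layer_accepts(sector, len)) {
			len /= 2;		/* we were too optimistic */
			max_sectors = len;	/* lower our max for next time */
		}
		do_submit(sector, len);
		sector += len;
		sectors -= len;
	}
}

int main(void)
{
	submit_with_fallback(6653500160ULL, 512);
	return 0;
}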

-chris


