linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Cc: linux-btrfs@vger.kernel.org, Eryu Guan <guaneryu@gmail.com>,
	David Sterba <dsterba@suse.com>
Subject: Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
Date: Wed, 15 Jun 2016 17:09:55 -0700	[thread overview]
Message-ID: <20160616000955.GC8071@localhost.localdomain> (raw)
In-Reply-To: <2040012.GtZQCNmHYt@localhost.localdomain>

On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > Hello Liu Bo,
> > 
> > We have to fix the following check in check_super() as well,
> > 
> >        if (btrfs_super_stripesize(sb) != 4096) {
> >                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
> >                 goto error_out;
> >         }
> > 
> > i.e. btrfs_super_stripesize(sb) must be equal to
> > btrfs_super_sectorsize(sb).
> > 
> > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > devices with existing Btrfs filesystem instances,
> > 
> > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > check_super() to,
> > 
> >         if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> >                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
> >                 goto error_out;
> >         }
> > 
> > 
> > main()
> >  for each device file passed as an argument,
> >    test_dev_for_mkfs()
> >      check_mounted
> >        check_mounted_where
> >          btrfs_scan_one_device
> >            btrfs_read_dev_super
> >              check_super() call will fail for existing filesystems which
> > have stripesize set to 4k. All existing filesystem instances will fall into
> > this category.
> > 
> > This error value is pushed up the call stack and this causes the device to
> > not get added to the fs_devices_mnt list in check_mounted_where(). Hence we
> > would fail to correctly check the mount status of the multi-device btrfs
> > filesystems.
> > 
> 
> 
> We can end up in the following scenario,
> - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
>   filesystem. The filesystem was created by an older version of mkfs.btrfs
>   which set stripesize to 4k. 
> - losetup -a
>    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)  
>    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)  
>    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)  
> - /etc/mtab lists only /dev/loop0
> - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
>    The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
>    though /dev/loop1 has already been mounted and has
>    /root/disk-imgs/file-1.img as its backing file.
> 
> So IMHO the only solution is to have the stripesize check in check_super() to
> allow both '4k' and 'sectorsize' as valid values i.e.
> 
>         if ((btrfs_super_stripesize(sb) != 4096)
> 	    && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
>                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
>                 goto error_out;
>         }

That's a good one.

But if we go back to the original point, in the kernel side,
1. in open_ctree(), root->stripesize = btrfs_super_stripesize();

2. in find_free_extent(),
	...
	search_start = ALIGN(offset, root->stripesize);
	...
3. in btrfs_alloc_tree_block(),
	...
	ret = btrfs_reserve_extent(..., &ins, ...);
	...
	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);

4. in btrfs_init_new_buffer(),
	...
	buf = btrfs_find_create_tree_block(root, bytenr);
	...

Because 'num_bytes' we pass to find_free_extent() is aligned to
sectorsize, the free space we can find is aligned to sectorsize,
which means 'offset' in '1. find_free_extent()' is aligned to sectorsize.

If our stripesize is larger than sectorsize, say 4 * sectorsize,
for data allocation it's fine while for metadata block allocations it's
not.  It is possible that when we allocate a new metadata block, we can
end up with an existing eb returned by '4. in btrfs_init_new_buffer()'.

PS: There is something wrong around '2. in find_free_extent()',
we only do alignment on offset, but for num_bytes, we don't do that,
so we may end up with a overlap with other data extents or metadata
blocks.

So I think we can just replace this ALING with a warning since the offset
returned by searching free space tree is aligned to block_group->full_stripe_len,
which is either sectorsize or BTRFS_STRIPE_LEN * nr_stripes (for
raid56), then we can just drop the check for stripesize everywhere.

Thanks,

-liubo

  reply	other threads:[~2016-06-16  0:08 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-14 21:33 [PATCH] Btrfs: let super_stripesize match with sectorsize Liu Bo
2016-06-15  3:42 ` Chandan Rajendra
2016-06-15 10:20   ` Chandan Rajendra
2016-06-16  0:09     ` Liu Bo [this message]
2016-06-16  8:23       ` Chandan Rajendra
2016-06-16 17:01         ` Liu Bo
2016-06-17  5:18           ` Chandan Rajendra
2016-06-17  6:08             ` Liu Bo
2016-06-17  6:51               ` Chandan Rajendra
2016-06-17 16:30                 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160616000955.GC8071@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=chandan@linux.vnet.ibm.com \
    --cc=dsterba@suse.com \
    --cc=guaneryu@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).