From: Liu Bo <bo.li.liu@oracle.com>
To: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Cc: linux-btrfs@vger.kernel.org, Eryu Guan <guaneryu@gmail.com>,
David Sterba <dsterba@suse.com>
Subject: Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
Date: Wed, 15 Jun 2016 17:09:55 -0700 [thread overview]
Message-ID: <20160616000955.GC8071@localhost.localdomain> (raw)
In-Reply-To: <2040012.GtZQCNmHYt@localhost.localdomain>
On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > Hello Liu Bo,
> >
> > We have to fix the following check in check_super() as well,
> >
> > if (btrfs_super_stripesize(sb) != 4096) {
> > error("invalid stripesize %u", btrfs_super_stripesize(sb));
> > goto error_out;
> > }
> >
> > i.e. btrfs_super_stripesize(sb) must be equal to
> > btrfs_super_sectorsize(sb).
> >
> > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > devices with existing Btrfs filesystem instances,
> >
> > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > check_super() to,
> >
> > if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> > error("invalid stripesize %u", btrfs_super_stripesize(sb));
> > goto error_out;
> > }
> >
> >
> > main()
> > for each device file passed as an argument,
> > test_dev_for_mkfs()
> > check_mounted
> > check_mounted_where
> > btrfs_scan_one_device
> > btrfs_read_dev_super
> > check_super() call will fail for existing filesystems which
> > have stripesize set to 4k. All existing filesystem instances will fall into
> > this category.
> >
> > This error value is pushed up the call stack and this causes the device to
> > not get added to the fs_devices_mnt list in check_mounted_where(). Hence we
> > would fail to correctly check the mount status of the multi-device btrfs
> > filesystems.
> >
>
>
> We can end up in the following scenario,
> - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> filesystem. The filesystem was created by an older version of mkfs.btrfs
> which set stripesize to 4k.
> - losetup -a
> /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> - /etc/mtab lists only /dev/loop0
> - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
> though /dev/loop1 has already been mounted and has
> /root/disk-imgs/file-1.img as its backing file.
>
> So IMHO the only solution is to have the stripesize check in check_super() to
> allow both '4k' and 'sectorsize' as valid values i.e.
>
> if ((btrfs_super_stripesize(sb) != 4096)
> && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
> error("invalid stripesize %u", btrfs_super_stripesize(sb));
> goto error_out;
> }
That's a good one.
But if we go back to the original point, in the kernel side,
1. in open_ctree(), root->stripesize = btrfs_super_stripesize();
2. in find_free_extent(),
...
search_start = ALIGN(offset, root->stripesize);
...
3. in btrfs_alloc_tree_block(),
...
ret = btrfs_reserve_extent(..., &ins, ...);
...
buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
4. in btrfs_init_new_buffer(),
...
buf = btrfs_find_create_tree_block(root, bytenr);
...
Because 'num_bytes' we pass to find_free_extent() is aligned to
sectorsize, the free space we can find is aligned to sectorsize,
which means 'offset' in '1. find_free_extent()' is aligned to sectorsize.
If our stripesize is larger than sectorsize, say 4 * sectorsize,
for data allocation it's fine while for metadata block allocations it's
not. It is possible that when we allocate a new metadata block, we can
end up with an existing eb returned by '4. in btrfs_init_new_buffer()'.
PS: There is something wrong around '2. in find_free_extent()',
we only do alignment on offset, but for num_bytes, we don't do that,
so we may end up with a overlap with other data extents or metadata
blocks.
So I think we can just replace this ALING with a warning since the offset
returned by searching free space tree is aligned to block_group->full_stripe_len,
which is either sectorsize or BTRFS_STRIPE_LEN * nr_stripes (for
raid56), then we can just drop the check for stripesize everywhere.
Thanks,
-liubo
next prev parent reply other threads:[~2016-06-16 0:08 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-14 21:33 [PATCH] Btrfs: let super_stripesize match with sectorsize Liu Bo
2016-06-15 3:42 ` Chandan Rajendra
2016-06-15 10:20 ` Chandan Rajendra
2016-06-16 0:09 ` Liu Bo [this message]
2016-06-16 8:23 ` Chandan Rajendra
2016-06-16 17:01 ` Liu Bo
2016-06-17 5:18 ` Chandan Rajendra
2016-06-17 6:08 ` Liu Bo
2016-06-17 6:51 ` Chandan Rajendra
2016-06-17 16:30 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160616000955.GC8071@localhost.localdomain \
--to=bo.li.liu@oracle.com \
--cc=chandan@linux.vnet.ibm.com \
--cc=dsterba@suse.com \
--cc=guaneryu@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.