From: Anand Jain <anand.jain@oracle.com>
To: Naohiro Aota <naohiro.aota@wdc.com>,
linux-btrfs@vger.kernel.org, dsterba@suse.com
Subject: Re: [PATCH 0/1] zoned: moving superblock logging zones
Date: Mon, 29 Mar 2021 11:33:02 +0800
Message-ID: <814c44b8-b700-7878-a881-f89cd99982e8@oracle.com>
In-Reply-To: <cover.1615773143.git.naohiro.aota@wdc.com>
On 15/03/2021 13:53, Naohiro Aota wrote:
> The following patch changes the location of the superblock logging zones
> from fixed zone numbers to fixed LBAs.
>
> Here is some background on how the superblock works on zoned btrfs.
>
> This document will be promoted to btrfs-dev-docs in the future.
>
> # Superblock logging for zoned btrfs
>
> The superblock and its copies are the only data structures in btrfs with a
> fixed location on a device. Since we cannot overwrite these blocks if they
> are placed in sequential write required zones, we cannot use the regular
> method of updating superblocks with zoned btrfs.
It looks like a ZBC command that does both the write pointer reset and the
write could have helped here.
> We also cannot limit the
> position of superblocks to conventional zones as that would prevent using
> zoned block devices that do not have this zone type (e.g. NVMe ZNS SSDs).
>
> To solve this problem, we use superblock log writing. This method uses two
> sequential write required zones as a circular buffer to write updated
> superblocks. Once the first zone is filled up, writing continues in the
> second zone. When both zones are filled up, the first zone is reset before
> it is written again, and writing continues in the first zone. Once the
> first zone is full again, the second zone is reset and the latest
> superblock is written to it. With this logging, we can always
> determine the position of the latest superblock by inspecting the zones'
> write pointer information provided by the device. One corner case is when
> both zones are full. For this situation, we read out the last superblock of
> each zone and compare them to determine which copy is the latest one.
>
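A minimal sketch of the lookup described above, as I read it. The Zone
type, the 4096-byte entry size, and the use of a generation number to
compare the two last superblocks are assumptions made for illustration,
not taken from the patch.

```python
from dataclasses import dataclass

SB_SIZE = 4096  # assumed size of one superblock log entry, in bytes


@dataclass
class Zone:
    start: int  # byte offset of the start of the zone
    size: int   # zone size in bytes
    wp: int     # write pointer reported by the device, as a byte offset

    @property
    def end(self) -> int:
        return self.start + self.size

    @property
    def empty(self) -> bool:
        return self.wp == self.start

    @property
    def full(self) -> bool:
        return self.wp >= self.end


def latest_sb_offset(z0, z1, last_gen=None):
    """Return the byte offset of the latest superblock in a log pair.

    Returns None if nothing was ever logged. last_gen is a hypothetical
    callback giving the generation of the last superblock in a zone; it
    is only called in the corner case where both zones are full.
    """
    if z0.empty and z1.empty:
        return None
    if z0.full and z1.full:
        newest = z0 if last_gen(z0) > last_gen(z1) else z1
        return newest.end - SB_SIZE
    if not z0.full and (z1.empty or z1.full):
        active, other = z0, z1   # the log is currently in the first zone
    elif z0.full:
        active, other = z1, z0   # the log is currently in the second zone
    else:
        raise ValueError("inconsistent write pointers in log pair")
    if active.empty:
        # The active zone was just reset (or never written): the latest
        # copy is the last entry of the other, full zone.
        return other.end - SB_SIZE
    return active.wp - SB_SIZE
```

For example, with 1 MiB zones, a first zone holding two entries and an
empty second zone, the function returns offset 4096, the second entry of
the first zone.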
> ## Placement of superblock logging zones
>
> We use the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
>
> - Primary superblock: zone starting at offset 0 and the following zone
> - First copy: zone containing offset 64GB and the following zone
> - Second copy: zone containing offset 256GB and the following zone
>
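To check my understanding of the placement rule, here is a small sketch
that maps the three fixed offsets to zone numbers for a given zone size.
It assumes that "GB" above means GiB and that the zone size is a power of
2; the names are made up for the example.

```python
GIB = 1 << 30

# Fixed byte offsets of the primary superblock and its two copies,
# assuming the "64GB" and "256GB" above are binary units.
SB_LOG_OFFSETS = (0, 64 * GIB, 256 * GIB)


def sb_log_zone_pairs(zone_size):
    """Return (first_zone, second_zone) numbers for each log pair."""
    if zone_size <= 0 or zone_size & (zone_size - 1):
        raise ValueError("zone size must be a power of 2")
    pairs = []
    for offset in SB_LOG_OFFSETS:
        zone = offset // zone_size  # zone containing the fixed offset
        pairs.append((zone, zone + 1))
    return pairs
```

With 256 MiB zones this gives zones (0, 1), (256, 257) and (1024, 1025).
With 32 GiB zones the first copy lands in zones (2, 3), which still does
not overlap the primary pair (0, 1); with 64 GiB zones it would.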
> These zones are reserved for superblock logging and never used for data or
> metadata blocks. Zones containing the offsets used to store superblocks in
> a regular (non-zoned) btrfs volume are also reserved to avoid confusion.
>
> The first copy position is much larger than for a regular btrfs volume
> (64M). This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone size, up to 32GB. But we only allow zone sizes
> up to 8GB for now.
>
> ## Writing superblock in conventional zones
>
> Conventional zones do not have a write pointer. This zone type thus cannot
> be used with superblock logging since determining the position of the
> latest copy of the superblock in a zone pair would be impossible.
>
> To address this problem, if either of the zones containing the fixed offset
> locations for zone logging is a conventional zone, superblock updates are
> done in-place using the first block of the conventional zone.
>
> ## Reading zoned btrfs dump image without zone information
>
> Reading a zoned btrfs image without zone information is challenging but
> possible.
>
> We can always find a superblock copy at or after the fixed offset locations
> that determine the position of the logging zones. With such a copy, the
> superblock incompatible flags indicate whether the volume is zoned. With a
> chunk item in the sys_chunk_array, we can determine the zone size from the
> size of a device extent, itself determined from the chunk length,
> num_stripes, and sub_stripes. With this information, all blocks within the
> two logging zones containing the fixed locations can be inspected to find
> the newest superblock copy.
>
> The first zone of a log pair may be empty and have no superblock copy. This
> can happen if a system crashes after resetting the first zone of a pair and
> before writing out a new superblock. In this case, a superblock copy can be
> found in the second zone of a log pair. The start of this second zone can
> be found by inspecting the blocks located at the fixed offset of the log
> pair plus each possible zone size (4M [1], 8M, 16M, 32M, 64M, 128M, 256M,
> 512M, 1G, 2G, 4G, 8G [2])[3]. Once we find a superblock, we can follow the
> same procedure as above to find the latest superblock copy within the zone
> log pair.
>
> [1] 4M = BTRFS_MKFS_SYSTEM_GROUP_SIZE. We cannot mkfs on a device with a
> zone size less than 4MB because we cannot create the initial temporary
> system chunk with that size.
> [2] The maximum size we support for now.
> [3] The zone size is limited to these 12 cases, as it must be a power of 2.
>
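The probing step above could be sketched as follows. This only enumerates
the candidate offsets to inspect. It assumes the sizes are binary units
and that the fixed offset is aligned to the zone size, which holds for
the three offsets listed earlier; the function name is made up.

```python
MIB = 1 << 20
GIB = 1 << 30

# Power-of-2 zone sizes from 4M up to the 8G maximum quoted above.
POSSIBLE_ZONE_SIZES = [4 * MIB << shift for shift in range(12)]


def candidate_second_zone_offsets(fixed_offset):
    """Return byte offsets where the second log zone may start.

    Each candidate is the fixed offset of the log pair plus one possible
    zone size. A reader would check each of them for a valid superblock.
    """
    return [fixed_offset + size for size in POSSIBLE_ZONE_SIZES]
```

For the primary pair (fixed offset 0) the candidates are simply the zone
sizes themselves, from 4M to 8G.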
> Once we find the latest superblock, it is no different than reading a
> regular btrfs image. You can further confirm the determined zone size by
> comparing it with the size of a device extent because it is the same as the
> zone size.
>
> In addition, since the write offset within the logging buffer differs
> between the primary and the copies [4], their first zones are reset at
> different times. So, if the superblock at offset 0 is missing, we can also
> try reading the head of the log buffer of a copy.
>
> [4] Because mkfs updates the primary during the initial process, advancing
> only the write pointer of the primary log buffer.
>
> ## Superblock writing on an emulated zoned device
>
> When a regular device is mounted in zoned mode, btrfs emulates conventional
> zones by slicing the device into fixed-size zones. In this case, however,
> we do not follow the above rule of writing superblocks at the head of the
> logging zones if they are conventional. Doing so would introduce a
> chicken-and-egg problem. To know that a given btrfs is zoned, we need to
> read a superblock to see the incompatible flags. But to read a superblock
> properly from a zoned position, we need to know a priori that the file
> system is zoned (e.g. because it resides on a zoned device), leading to a
> recursive dependency.
>
> We can use the regular superblock update method on an emulated zoned
> device to break the recursion. Since the zones containing the regular
> locations are always reserved, it is safe to do so. Then, we can naturally
> read a regular superblock on a regular device and determine whether the
> file system is zoned.
>
> Naohiro Aota (1):
> btrfs: zoned: move superblock logging zone location
>
> fs/btrfs/zoned.c | 40 ++++++++++++++++++++++++++++++----------
> 1 file changed, 30 insertions(+), 10 deletions(-)
>