From: Theodore Ts'o <tytso@mit.edu>
To: linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] Draft Linux kernel interfaces for SMR/ZBC drives
Date: Tue, 11 Feb 2014 13:43:43 -0500
Message-ID: <20140211184343.GA11971@thunk.org>
In-Reply-To: <nsxtxckhfsh.fsf@closure.thunk.org>
Based on the comments raised on the list, here is a revised version of
the proposed ZBC kernel interface.
Changes from the last version:
1) Aligned the ZBC_FLAG values with the ZBC specification to
simplify implementations
2) Aligned the free_sectors_criteria values (mostly) with the ZBC
specification
3) Clarified the behaviour of blkdev_query_zones()
4) Added an ioctl interface to expose this functionality to userspace
5) Removed the proposed simplified data variant
Please let me know what you think!
- Ted
/*
* Note: this structure is 24 bytes. Using 256 MB zones, an 8TB drive
* will have 32,768 zones. That means if we tried to use a contiguous
* array we would need to allocate 768k of contiguous, non-swappable
* kernel memory. (Boo, hiss.)
*
* This is large enough that it would be painful to hang an array off the
* block_device structure. So we will define a function
* blkdev_query_zones() to selectively return information for some
* number of zones.
*
* It is anticipated that the block device driver will store this
* information in a compressed form, and that z_checkpoint_offset will
* not be dynamically tracked. That is, the checkpoint offset will,
* if non-zero, indicate that the drive suffered a power fail event, and
* the file system or userspace process may need to implement recovery
* procedures. Once the file system or userspace process writes to an
* SMR band, the checkpoint offset will be cleared and future queries
* for the SMR band will return the checkpoint offset == write_ptr.
*/
struct zone_status {
	sector_t	z_start;
	__u32		z_length;
	__u32		z_write_ptr_offset;	/* offset */
	__u32		z_checkpoint_offset;	/* offset */
	__u32		z_flags;	/* full, ro, offline, reset_requested */
};
#define Z_FLAG_RESET_REQUESTED		0x0001
#define Z_FLAGS_OFFLINE			0x0002
#define Z_FLAGS_RO			0x0004
#define Z_FLAGS_FULL			0x0008
#define Z_FLAG_TYPE_MASK		0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL	0x0100
#define Z_FLAG_TYPE_SEQUENTIAL		0x0200
/*
* Query the block_device bdev for information about the zones
* starting at start_sector that match the criteria specified by
* free_sectors_criteria. Zone status information for at most
* max_zones will be placed into the memory array ret_zones (which is
* allocated by the caller, not by the blkdev_query_zones function),
* in ascending LBA order. The return value will be a kernel error
* code if negative, or the number of zones actually returned if
* non-negative.
*
* If free_sectors_criteria is positive, then return zones that have
* at least that many sectors available to be written. If it is zero,
* then match all zones. If free_sectors_criteria is negative, then
* return the zones that match the following criteria:
*
*   -1  Match all full zones
*   -2  Match all open zones
*       (the zone has at least one written sector and is not full)
*   -3  Match all free zones
*       (the zone has no written sectors)
*   -4  Match all read-only zones
*   -5  Match all offline zones
*   -6  Match all zones where the write ptr != the checkpoint ptr
*
* The negative values are taken from Table 4 of 14-010r1, with the
* exception of -6, which is not in the draft spec --- but IMHO should
* be :-) It is anticipated, though, that the kernel will keep this
* info in memory and so will handle matching zones which meet
* these criteria itself, without needing to issue a ZBC command for
* each call to blkdev_query_zones().
*/
extern int blkdev_query_zones(struct block_device *bdev,
			      sector_t start_sector,
			      int free_sectors_criteria,
			      int max_zones,
			      struct zone_status *ret_zones);
/*
* Reset the write pointer for a sequential write zone.
*
* Returns -EINVAL if the start_sector is not the beginning of a
* sequential write zone.
*/
extern int blkdev_reset_zone_ptr(struct block_device *bdev,
				 sector_t start_sector);
/* ioctl interface */
ZBCQUERY
	u64 starting_lba		/* IN */
	u32 criteria			/* IN */
	u32 *num_zones			/* IN/OUT */
	struct zone_status *ptr		/* OUT */
ZBCRESETZONE
	u64 starting_lba