linux-fsdevel.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] Draft Linux kernel interfaces for SMR/ZBC drives
Date: Tue, 11 Feb 2014 13:43:43 -0500
Message-ID: <20140211184343.GA11971@thunk.org>
In-Reply-To: <nsxtxckhfsh.fsf@closure.thunk.org>

Based on the comments raised on the list, here is a revised version of
the proposed ZBC kernel interface.

Changes from the last version:

1)  Aligned the Z_FLAG values with the ZBC specification to
	simplify implementations
2)  Mostly aligned the free_sectors_criteria values with the ZBC
	specification
3)  Clarified the behaviour of blkdev_query_zones()
4)  Added an ioctl interface to expose this functionality to userspace
5)  Removed the proposed simplified data variant

Please let me know what you think!

						- Ted


/*
 * Note: this structure is 24 bytes.  Using 256 MB zones, an 8TB drive
 * will have 32,768 zones.   That means if we tried to use a contiguous
 * array we would need to allocate 768k of contiguous, non-swappable
 * kernel memory.  (Boo, hiss.) 
 *
 * This is large enough that it would be painful to hang an array off the
 * block_device structure.  So we will define a function
 * blkdev_query_zones() to selectively return information for some
 * number of zones.
 *
 * It is anticipated that the block device driver will store this
 * information in a compressed form, and that z_checkpoint_offset will
 * not be dynamically tracked.  That is, the checkpoint offset, if
 * non-zero, indicates that the drive suffered a power-fail event, and
 * the file system or userspace process may need to implement recovery
 * procedures.  Once the file system or userspace process writes to an
 * SMR band, the checkpoint offset will be cleared and future queries
 * for the SMR band will return the checkpoint offset == write_ptr.
 */
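struct zone_status {
	sector_t	z_start;		/* first sector of the zone */
	__u32		z_length;		/* zone length, in sectors */
	__u32		z_write_ptr_offset;	/* write pointer, relative to z_start */
	__u32		z_checkpoint_offset;	/* checkpoint, relative to z_start */
	__u32		z_flags;		/* full, ro, offline, reset_requested, type */
};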
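As a quick check of the sizing arithmetic in the comment above, the following userspace sketch mirrors struct zone_status with <stdint.h> types. It assumes sector_t is 64 bits wide and reads "8TB" in binary units (2^43 bytes); zone_count() and zone_array_bytes() are hypothetical helpers, not part of the proposal.

```c
#include <stdint.h>

/* Userspace mirror of the proposed struct zone_status.
 * Assumption: sector_t is 64 bits wide. */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Hypothetical helper: number of zones on a drive of drive_bytes bytes
 * split into zone_bytes-sized zones (rounds up for a short last zone). */
static uint64_t zone_count(uint64_t drive_bytes, uint64_t zone_bytes)
{
	return (drive_bytes + zone_bytes - 1) / zone_bytes;
}

/* Hypothetical helper: bytes needed for one flat zone_status array. */
static uint64_t zone_array_bytes(uint64_t nzones)
{
	return nzones * sizeof(struct zone_status);
}
```

With these assumptions, zone_count(8ULL << 40, 256ULL << 20) gives 32,768 and the flat array needs 786,432 bytes (768 KiB), matching the figures in the comment.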

#define Z_FLAG_RESET_REQUESTED	0x0001
#define Z_FLAGS_OFFLINE		0x0002
#define Z_FLAGS_RO		0x0004
#define Z_FLAGS_FULL		0x0008

#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL 0x0100
#define Z_FLAG_TYPE_SEQUENTIAL	0x0200
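The low bits above are independent status flags, while Z_FLAG_TYPE_MASK covers a multi-bit type field, so a zone's type is best compared after masking rather than tested bit by bit. A minimal sketch; the macro values are copied from the proposal (including its mixed Z_FLAG_/Z_FLAGS_ prefixes), and zone_is_sequential() and zone_is_writable() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the draft proposal. */
#define Z_FLAG_RESET_REQUESTED		0x0001
#define Z_FLAGS_OFFLINE			0x0002
#define Z_FLAGS_RO			0x0004
#define Z_FLAGS_FULL			0x0008
#define Z_FLAG_TYPE_MASK		0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL	0x0100
#define Z_FLAG_TYPE_SEQUENTIAL		0x0200

/* The type is a field, not a bit: mask first, then compare the value. */
static bool zone_is_sequential(uint32_t z_flags)
{
	return (z_flags & Z_FLAG_TYPE_MASK) == Z_FLAG_TYPE_SEQUENTIAL;
}

/* Hypothetical policy: a zone can take new writes unless it is
 * offline, read-only, or already full. */
static bool zone_is_writable(uint32_t z_flags)
{
	return !(z_flags & (Z_FLAGS_OFFLINE | Z_FLAGS_RO | Z_FLAGS_FULL));
}
```

Comparing the masked value keeps room for further type codes inside the 0x0F00 field without changing callers.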


/*
 * Query the block_device bdev for information about the zones
 * starting at start_sector that match the criteria specified by
 * free_sectors_criteria.  Zone status information for at most
 * max_zones zones will be placed into the memory array ret_zones
 * (which is allocated by the caller, not by blkdev_query_zones()),
 * in ascending LBA order.  The return value is a negative kernel
 * error code on failure, or otherwise the (non-negative) number of
 * zones actually returned.
 *
 * If free_sectors_criteria is positive, then return zones that have
 * at least that many sectors available to be written.  If it is zero,
 * then match all zones.  If free_sectors_criteria is negative, then
 * return the zones that match the following criteria:
 *
 *	-1     Match all full zones
 *	-2     Match all open zones
 *		  (the zone has at least one written sector and is not full)
 *	-3     Match all free zones
 *		  (the zone has no written sectors)
 *	-4     Match all read-only zones
 *	-5     Match all offline zones
 *	-6     Match all zones where the write ptr != the checkpoint ptr
 *
 * The negative values are taken from Table 4 of 14-010r1, with the
 * exception of -6, which is not in the draft spec --- but IMHO should
 * be :-) It is anticipated, though, that the kernel will keep this
 * info in memory and so will handle matching zones which meet
 * these criteria itself, without needing to issue a ZBC command for
 * each call to blkdev_query_zones().
 */
extern int blkdev_query_zones(struct block_device *bdev,
			      sector_t start_sector,
			      int free_sectors_criteria,
			      int max_zones,
			      struct zone_status *ret_zones);
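To make the free_sectors_criteria encoding concrete, here is a userspace model of the matching the comment expects the kernel to do from its in-memory zone table, without issuing ZBC commands. It is a sketch under stated assumptions: zone_matches() and query_zones_model() are hypothetical names; "starting at start_sector" is read as z_start >= start_sector; a zone counts as full when Z_FLAGS_FULL is set, free when its write pointer offset is 0, and open when it has written sectors but is not full.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define Z_FLAGS_OFFLINE	0x0002
#define Z_FLAGS_RO	0x0004
#define Z_FLAGS_FULL	0x0008

/* Userspace mirror of struct zone_status (sector_t assumed 64-bit). */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Does zone z satisfy criteria?  criteria must already be validated. */
static bool zone_matches(const struct zone_status *z, int criteria)
{
	uint32_t free_sectors = z->z_length - z->z_write_ptr_offset;
	bool full = z->z_flags & Z_FLAGS_FULL;

	if (criteria > 0)	/* at least this many writable sectors */
		return free_sectors >= (uint32_t)criteria;
	switch (criteria) {
	case 0:  return true;					/* all */
	case -1: return full;					/* full */
	case -2: return z->z_write_ptr_offset > 0 && !full;	/* open */
	case -3: return z->z_write_ptr_offset == 0;		/* free */
	case -4: return z->z_flags & Z_FLAGS_RO;		/* read-only */
	case -5: return z->z_flags & Z_FLAGS_OFFLINE;		/* offline */
	case -6: return z->z_write_ptr_offset != z->z_checkpoint_offset;
	default: return false;
	}
}

/* Hypothetical model of blkdev_query_zones(): copy up to max_zones
 * matching zones, in ascending LBA order, from a table sorted by
 * z_start.  Returns the number copied, or -EINVAL. */
static int query_zones_model(const struct zone_status *table, int ntable,
			     uint64_t start_sector, int criteria,
			     int max_zones, struct zone_status *ret_zones)
{
	int found = 0;

	if (criteria < -6 || max_zones < 0)
		return -EINVAL;
	for (int i = 0; i < ntable && found < max_zones; i++) {
		if (table[i].z_start < start_sector)
			continue;
		if (zone_matches(&table[i], criteria))
			ret_zones[found++] = table[i];
	}
	return found;
}
```

A recovery pass after a power failure could, for example, call query_zones_model() with criteria -6 to find the zones whose checkpoint lags the write pointer.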

/*
 * Reset the write pointer for a sequential write zone.
 *
 * Returns -EINVAL if the start_sector is not the beginning of a
 * sequential write zone.
 */
extern int blkdev_reset_zone_ptr(struct block_device *bdev,
				 sector_t start_sector);
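The -EINVAL rule for blkdev_reset_zone_ptr() can be modelled the same way. This sketch goes beyond what the draft states by assuming that a reset also sets the checkpoint offset back to 0 and clears Z_FLAGS_FULL; reset_zone_ptr_model() is a hypothetical name that operates on an in-memory table rather than a device.

```c
#include <errno.h>
#include <stdint.h>

#define Z_FLAGS_FULL			0x0008
#define Z_FLAG_TYPE_MASK		0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL	0x0100
#define Z_FLAG_TYPE_SEQUENTIAL		0x0200

/* Userspace mirror of struct zone_status (sector_t assumed 64-bit). */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Reset the zone that begins exactly at start_sector.  Returns -EINVAL
 * if no zone begins there or if that zone is not sequential-write. */
static int reset_zone_ptr_model(struct zone_status *table, int ntable,
				uint64_t start_sector)
{
	for (int i = 0; i < ntable; i++) {
		struct zone_status *z = &table[i];

		if (z->z_start != start_sector)
			continue;
		if ((z->z_flags & Z_FLAG_TYPE_MASK) != Z_FLAG_TYPE_SEQUENTIAL)
			break;		/* a zone starts here, but not sequential */
		z->z_write_ptr_offset = 0;
		z->z_checkpoint_offset = 0;	/* assumption: checkpoint follows */
		z->z_flags &= ~(uint32_t)Z_FLAGS_FULL;	/* zone is empty again */
		return 0;
	}
	return -EINVAL;
}
```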


/* ioctl interface */

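ZBCQUERY
	u64 starting_lba	/* IN */
	u32 criteria		/* IN: same codes as free_sectors_criteria */
	u32 *num_zones		/* IN: room in ptr[]; OUT: zones returned */
	struct zone_status *ptr	/* OUT */

ZBCRESETZONE
	u64 starting_lba	/* IN: start of a sequential write zone */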
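Because ZBCQUERY returns at most *num_zones entries per call, a userspace caller that wants every zone has to page through the device, resuming just past the last zone returned. The sketch below runs that loop against a fake in-process handler rather than a real ioctl: struct zbc_query_arg, fake_zbcquery() and count_all_zones() are hypothetical, the argument layout simply follows the field list above, and the fake returns 0 or a negative errno instead of ioctl()'s -1/errno convention. Note also that criteria is listed as u32 while the in-kernel API uses negative codes, so those would have to travel as two's-complement bit patterns unless the field is made signed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct zone_status (sector_t assumed 64-bit). */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Hypothetical ZBCQUERY argument block, following the draft's fields. */
struct zbc_query_arg {
	uint64_t starting_lba;		/* IN */
	uint32_t criteria;		/* IN */
	uint32_t *num_zones;		/* IN: room in ptr[]; OUT: returned */
	struct zone_status *ptr;	/* OUT */
};

/* Fake drive with four 100-sector zones. */
static const struct zone_status fake_zones[] = {
	{0, 100, 0, 0, 0}, {100, 100, 0, 0, 0},
	{200, 100, 0, 0, 0}, {300, 100, 0, 0, 0},
};

/* Stand-in for ioctl(fd, ZBCQUERY, arg); handles only criteria 0. */
static int fake_zbcquery(struct zbc_query_arg *arg)
{
	uint32_t room = *arg->num_zones, n = 0;

	if (arg->criteria != 0)
		return -EINVAL;
	for (size_t i = 0; i < sizeof(fake_zones) / sizeof(fake_zones[0]) &&
	     n < room; i++)
		if (fake_zones[i].z_start >= arg->starting_lba)
			arg->ptr[n++] = fake_zones[i];
	*arg->num_zones = n;
	return 0;
}

/* Walk all zones in batches of up to 'batch' entries.  Returns the
 * number of zones seen, or a negative errno. */
static int count_all_zones(uint32_t batch)
{
	struct zone_status buf[16];
	uint64_t lba = 0;
	int total = 0;

	if (batch == 0 || batch > 16)
		return -EINVAL;
	for (;;) {
		uint32_t n = batch;
		struct zbc_query_arg arg = { lba, 0, &n, buf };
		int rc = fake_zbcquery(&arg);

		if (rc < 0)
			return rc;
		if (n == 0)
			break;		/* no zones at or beyond lba */
		total += n;
		lba = buf[n - 1].z_start + buf[n - 1].z_length;
	}
	return total;
}
```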


