From: Christoph Hellwig <hch@infradead.org>
To: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@infradead.org>,
linux-block <linux-block@vger.kernel.org>,
Damien Le Moal <Damien.LeMoal@wdc.com>,
Keith Busch <kbusch@kernel.org>,
"linux-scsi @ vger . kernel . org" <linux-scsi@vger.kernel.org>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
"linux-fsdevel @ vger . kernel . org"
<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v4 07/10] scsi: sd_zbc: emulate ZONE_APPEND commands
Date: Tue, 7 Apr 2020 10:05:01 -0700 [thread overview]
Message-ID: <20200407170501.GF13893@infradead.org> (raw)
In-Reply-To: <20200403101250.33245-8-johannes.thumshirn@wdc.com>
On Fri, Apr 03, 2020 at 07:12:47PM +0900, Johannes Thumshirn wrote:
> Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) with a
> start LBA set to the target zone write pointer position.
>
> In order to always know the write pointer position of a sequential write
> zone, the queue flag QUEUE_FLAG_ZONE_WP_OFST is set to get an
> initialized write pointer offset array attached to the device request
> queue. The values of the cache are maintained in sync with the device
> as follows:
> 1) the write pointer offset of a zone is reset to 0 when a
> REQ_OP_ZONE_RESET command completes.
> 2) the write pointer offset of a zone is set to the zone size when a
> REQ_OP_ZONE_FINISH command completes.
> 3) the write pointer offset of a zone is incremented by the number of
> 512B sectors written when a write or a zone append command completes
> 4) the write pointer offset of all zones is reset to 0 when a
> REQ_OP_ZONE_RESET_ALL command completes.
>
> Since the block layer does not write lock zones for zone append
> commands, to ensure a sequential ordering of the write commands used for
> the emulation, the target zone of a zone append command is locked when
> the function sd_zbc_prepare_zone_append() is called from
> sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained
> (e.g. a zone append is in-flight or a regular write has already locked
> the zone), the zone append command dispatching is delayed by returning
> BLK_STS_ZONE_RESOURCE.
>
> Since zone reset and finish operations can be issued concurrently with
> writes and zone append requests, ensure a coherent update of the zone
> write pointer offsets by also write locking the target zones for these
> zone management requests.
>
> Finally, to avoid the need for write locking all zones for
> REQ_OP_ZONE_RESET_ALL requests, use a spinlock to protect accesses and
> modifications of the zone write pointer offsets. This spinlock is
> initialized from sd_probe() using the new function sd_zbc_init().
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> drivers/scsi/sd.c | 26 ++-
> drivers/scsi/sd.h | 38 ++++-
> drivers/scsi/sd_zbc.c | 367 +++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 413 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 2710a0e5ae6d..569b22ab394e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1206,6 +1206,14 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
> }
> }
>
> + if (req_op(rq) == REQ_OP_ZONE_APPEND) {
> + ret = sd_zbc_prepare_zone_append(cmd, &lba, nr_blocks);
> + if (ret) {
> + scsi_mq_uninit_cmd(cmd);
> + return ret;
> + }
> + }
> +
> fua = rq->cmd_flags & REQ_FUA ? 0x8 : 0;
> dix = scsi_prot_sg_count(cmd);
> dif = scsi_host_dif_capable(cmd->device->host, sdkp->protection_type);
> @@ -1287,6 +1295,7 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd)
> return sd_setup_flush_cmnd(cmd);
> case REQ_OP_READ:
> case REQ_OP_WRITE:
> + case REQ_OP_ZONE_APPEND:
> return sd_setup_read_write_cmnd(cmd);
> case REQ_OP_ZONE_RESET:
> return sd_zbc_setup_zone_mgmt_cmnd(cmd, ZO_RESET_WRITE_POINTER,
> @@ -2055,7 +2064,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
>
> out:
> if (sd_is_zoned(sdkp))
> - sd_zbc_complete(SCpnt, good_bytes, &sshdr);
> + good_bytes = sd_zbc_complete(SCpnt, good_bytes, &sshdr);
>
> SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
> "sd_done: completed %d of %d bytes\n",
> @@ -3371,6 +3380,10 @@ static int sd_probe(struct device *dev)
> sdkp->first_scan = 1;
> sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
>
> + error = sd_zbc_init_disk(sdkp);
> + if (error)
> + goto out_free_index;
> +
> sd_revalidate_disk(gd);
>
> gd->flags = GENHD_FL_EXT_DEVT;
> @@ -3408,6 +3421,7 @@ static int sd_probe(struct device *dev)
> out_put:
> put_disk(gd);
> out_free:
> + sd_zbc_release_disk(sdkp);
> kfree(sdkp);
> out:
> scsi_autopm_put_device(sdp);
> @@ -3484,6 +3498,8 @@ static void scsi_disk_release(struct device *dev)
> put_disk(disk);
> put_device(&sdkp->device->sdev_gendev);
>
> + sd_zbc_release_disk(sdkp);
> +
> kfree(sdkp);
> }
>
> @@ -3664,19 +3680,19 @@ static int __init init_sd(void)
> if (!sd_page_pool) {
> printk(KERN_ERR "sd: can't init discard page pool\n");
> err = -ENOMEM;
> - goto err_out_ppool;
> + goto err_out_cdb_pool;
> }
>
> err = scsi_register_driver(&sd_template.gendrv);
> if (err)
> - goto err_out_driver;
> + goto err_out_ppool;
>
> return 0;
>
> -err_out_driver:
> +err_out_ppool:
> mempool_destroy(sd_page_pool);
>
> -err_out_ppool:
> +err_out_cdb_pool:
> mempool_destroy(sd_cdb_pool);
>
> err_out_cache:
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 50fff0bf8c8e..74448c250fea 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -79,6 +79,10 @@ struct scsi_disk {
> u32 zones_optimal_open;
> u32 zones_optimal_nonseq;
> u32 zones_max_open;
> + u32 *zones_wp_ofst;
> + spinlock_t zones_wp_ofst_lock;
> + struct work_struct zone_wp_ofst_work;
> + char *zone_wp_update_buf;
> #endif
> atomic_t openers;
> sector_t capacity; /* size in logical blocks */
> @@ -207,17 +211,32 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp)
>
> #ifdef CONFIG_BLK_DEV_ZONED
>
> +int sd_zbc_init_disk(struct scsi_disk *sdkp);
> +void sd_zbc_release_disk(struct scsi_disk *sdkp);
> extern int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
> extern void sd_zbc_print_zones(struct scsi_disk *sdkp);
> blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
> unsigned char op, bool all);
> -extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
> - struct scsi_sense_hdr *sshdr);
> +unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
> + struct scsi_sense_hdr *sshdr);
> int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
> unsigned int nr_zones, report_zones_cb cb, void *data);
>
> +blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba,
> + unsigned int nr_blocks);
> +
> #else /* CONFIG_BLK_DEV_ZONED */
>
> +static inline int sd_zbc_init(void)
> +{
> + return 0;
> +}
> +
> +static inline void sd_zbc_exit(void) {}
> +
> +static inline void sd_zbc_init_disk(struct scsi_disk *sdkp) {}
> +static inline void sd_zbc_release_disk(struct scsi_disk *sdkp) {}
> +
> static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
> unsigned char *buf)
> {
> @@ -233,9 +252,18 @@ static inline blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
> return BLK_STS_TARGET;
> }
>
> -static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
> - unsigned int good_bytes,
> - struct scsi_sense_hdr *sshdr) {}
> +static inline unsigned int sd_zbc_complete(struct scsi_cmnd *cmd,
> + unsigned int good_bytes, struct scsi_sense_hdr *sshdr)
> +{
> + return 0;
> +}
> +
> +static inline blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd,
> + sector_t *lba,
> + unsigned int nr_blocks)
> +{
> + return BLK_STS_TARGET;
> +}
>
> #define sd_zbc_report_zones NULL
>
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index ee156fbf3780..49c78040fa84 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -42,6 +42,30 @@ static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
> return cb(&zone, idx, data);
> }
>
> +static unsigned int sd_zbc_get_zone_wp_ofst(struct blk_zone *zone)
> +{
> + if (zone->type == ZBC_ZONE_TYPE_CONV)
> + return 0;
> +
> + switch (zone->cond) {
> + case BLK_ZONE_COND_IMP_OPEN:
> + case BLK_ZONE_COND_EXP_OPEN:
> + case BLK_ZONE_COND_CLOSED:
> + return zone->wp - zone->start;
> + case BLK_ZONE_COND_FULL:
> + return zone->len;
> + case BLK_ZONE_COND_EMPTY:
> + case BLK_ZONE_COND_OFFLINE:
> + case BLK_ZONE_COND_READONLY:
> + default:
> + /*
> + * Offline and read-only zones do not have a valid
> + * write pointer. Use 0 as for an empty zone.
> + */
> + return 0;
> + }
> +}
> +
> /**
> * sd_zbc_do_report_zones - Issue a REPORT ZONES scsi command.
> * @sdkp: The target disk
> @@ -229,6 +253,134 @@ static blk_status_t sd_zbc_cmnd_checks(struct scsi_cmnd *cmd)
> return BLK_STS_OK;
> }
>
> +#define SD_ZBC_INVALID_WP_OFST ~(0u)
Normally you want the braces on the outside:
(~OU)
to avoid issues with complex expressions.
> + /* Only zone reset and zone finish need zone write locking */
> + if (op != ZO_RESET_WRITE_POINTER && op != ZO_FINISH_ZONE)
> + return BLK_STS_OK;
> +
> + if (all) {
> + /* We do not write lock all zones for an all zone reset */
> + if (op == ZO_RESET_WRITE_POINTER)
> + return BLK_STS_OK;
> +
> + /* Finishing all zones is not supported */
> + return BLK_STS_IOERR;
I still find different locking rules for the all vs individual zone
operations weird.
next prev parent reply other threads:[~2020-04-07 17:05 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-03 10:12 [PATCH v4 00/10] Introduce Zone Append for writing to zoned block devices Johannes Thumshirn
2020-04-03 10:12 ` [PATCH v4 01/10] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no Johannes Thumshirn
2020-04-03 10:12 ` [PATCH v4 02/10] block: Introduce REQ_OP_ZONE_APPEND Johannes Thumshirn
2020-04-07 16:48 ` Christoph Hellwig
2020-04-08 11:31 ` Johannes Thumshirn
2020-04-08 15:51 ` Christoph Hellwig
2020-04-08 15:53 ` Johannes Thumshirn
2020-04-03 10:12 ` [PATCH v4 03/10] block: introduce blk_req_zone_write_trylock Johannes Thumshirn
2020-04-07 16:53 ` Christoph Hellwig
2020-04-07 16:54 ` Christoph Hellwig
2020-04-08 8:29 ` [PATCH v4 04/10] block: Modify revalidate zones Damien Le Moal
2020-04-08 15:58 ` hch
2020-04-03 10:12 ` Johannes Thumshirn
2020-04-03 10:12 ` [PATCH v4 05/10] scsi: sd_zbc: factor out sanity checks for zoned commands Johannes Thumshirn
2020-04-03 10:12 ` [PATCH v4 06/10] scsi: export scsi_mq_uninit_cmnd Johannes Thumshirn
2020-04-07 17:00 ` Christoph Hellwig
2020-04-08 11:32 ` Johannes Thumshirn
2020-04-08 15:59 ` hch
2020-04-03 10:12 ` [PATCH v4 07/10] scsi: sd_zbc: emulate ZONE_APPEND commands Johannes Thumshirn
2020-04-07 17:05 ` Christoph Hellwig [this message]
2020-04-08 8:14 ` Damien Le Moal
2020-04-08 15:58 ` hch
2020-04-08 16:13 ` Johannes Thumshirn
2020-04-03 10:12 ` [PATCH v4 08/10] null_blk: Support REQ_OP_ZONE_APPEND Johannes Thumshirn
2020-04-07 17:05 ` Christoph Hellwig
2020-04-03 10:12 ` [PATCH v4 09/10] block: export bio_release_pages and bio_iov_iter_get_pages Johannes Thumshirn
2020-04-03 10:12 ` [PATCH v4 10/10] zonefs: use REQ_OP_ZONE_APPEND for sync DIO Johannes Thumshirn
2020-04-07 16:49 ` [PATCH v4 00/10] Introduce Zone Append for writing to zoned block devices Christoph Hellwig
2020-04-08 8:28 ` Johannes Thumshirn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200407170501.GF13893@infradead.org \
--to=hch@infradead.org \
--cc=Damien.LeMoal@wdc.com \
--cc=axboe@kernel.dk \
--cc=johannes.thumshirn@wdc.com \
--cc=kbusch@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).