* Re: [PATCH 1/1] dm-zoned: Zoned block device target
[not found] <20170209041849.30964-1-damien.lemoal@wdc.com>
@ 2017-02-09 9:36 ` Christoph Hellwig
2017-02-10 0:51 ` Damien Le Moal
2017-02-09 15:27 ` Mike Snitzer
2017-02-09 16:07 ` Hannes Reinecke
2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2017-02-09 9:36 UTC (permalink / raw)
To: Damien Le Moal
Cc: dm-devel, Alasdair Kergon, Mike Snitzer, Hannes Reinecke,
Christoph Hellwig, linux-block
On Thu, Feb 09, 2017 at 01:18:49PM +0900, Damien Le Moal wrote:
> +
> +/*
> + * Target BIO completion.
> + */
> +static inline void dmz_bio_end(struct bio *bio, int err)
> +{
> + struct dm_zone_bioctx *bioctx =
> + dm_per_bio_data(bio, sizeof(struct dm_zone_bioctx));
> +
> + if (atomic_dec_and_test(&bioctx->ref)) {
It seems like this could be simplified a bit by using bio_chain
to chain the clones to the original bio.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] dm-zoned: Zoned block device target
[not found] <20170209041849.30964-1-damien.lemoal@wdc.com>
2017-02-09 9:36 ` [PATCH 1/1] dm-zoned: Zoned block device target Christoph Hellwig
@ 2017-02-09 15:27 ` Mike Snitzer
2017-02-10 1:09 ` Damien Le Moal
2017-02-09 16:07 ` Hannes Reinecke
2 siblings, 1 reply; 6+ messages in thread
From: Mike Snitzer @ 2017-02-09 15:27 UTC (permalink / raw)
To: Damien Le Moal
Cc: dm-devel, Alasdair Kergon, Hannes Reinecke, Christoph Hellwig,
linux-block
On Wed, Feb 08 2017 at 11:18pm -0500,
Damien Le Moal <damien.lemoal@wdc.com> wrote:
> The dm-zoned device mapper target provides transparent write access
> to zoned block devices (ZBC and ZAC compliant block devices).
> dm-zoned hides to the device user (a file system or an application
> doing raw block device accesses) any constraint imposed on write
> requests by the device. Write requests are processed using a
> combination of on-disk buffering using the device conventional zones,
> allowing any random write access to be safely executed, or direct in-place
> processing for requests aligned on a zone sequential write pointer position.
> A background reclaim process ensures that some conventional zones are always
> available for executing unaligned write requests. The reclaim process
> overhead is minimized by managing buffer zones in a least-recently-written
> order and first targeting the oldest buffer zones. Doing so, blocks under
> regular write access (such as metadata blocks of an FS) remain stored in
> conventional zones, resulting in no apparent write overhead.
>
> dm-zoned implementation focus on simplicity and on minimizing overhead
> (CPU, memory and storage overhead). For a 10TB host-managed disk with
> 256 MB zones, dm-zoned memory usage per disk instance is at most 4.5 MB
> and as little as 5 zones will be used internally for storing metadata
> and performing buffer zone reclaim operations. This is achieved using
> zone level indirection rather than a full block indirection system for
> managing block movement between zones.
>
> dm-zoned primary target is host-managed zoned block devices but it can also
> be used with host-aware device models to mitigate potential device-side
> performance degradation due to excessive random writing.
>
> dm-zoned backend devices can be formatted and checked using the dmzadm
> utility available at:
>
> https://github.com/hgst/dm-zoned-tools
>
> This patch applies on top of 4.10-rc7 tree.
>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Looks like this work has come a long way. While I am _still_ not
hearing from any customers (or partners) that SMR is a priority I do
acknowledge that that obviously isn't the case for WDC and others.
So, I'll sign up for reviewing this DM target, with a focus on DM
interface and implementation details, _after_ I get some technical
review (e.g. Reviewed-by) from people like Hannes and others more
invested in the SMR technologies. Basically I don't know enough about
the constraints of these devices to fully appreciate and catch some
issue in the particulars of the algorithms used to mitigate SMR's
quirks.
I'd also like to understand how invested WDC is in maintaining this DM
target for years and years if/when I were to merge this DM target into
mainline.
Thanks,
Mike
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] dm-zoned: Zoned block device target
[not found] <20170209041849.30964-1-damien.lemoal@wdc.com>
2017-02-09 9:36 ` [PATCH 1/1] dm-zoned: Zoned block device target Christoph Hellwig
2017-02-09 15:27 ` Mike Snitzer
@ 2017-02-09 16:07 ` Hannes Reinecke
2017-02-10 1:24 ` Damien Le Moal
2 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2017-02-09 16:07 UTC (permalink / raw)
To: Damien Le Moal, dm-devel, Alasdair Kergon, Mike Snitzer
Cc: Christoph Hellwig, linux-block
On 02/09/2017 05:18 AM, Damien Le Moal wrote:
> The dm-zoned device mapper target provides transparent write access
> to zoned block devices (ZBC and ZAC compliant block devices).
> dm-zoned hides to the device user (a file system or an application
> doing raw block device accesses) any constraint imposed on write
> requests by the device. Write requests are processed using a
> combination of on-disk buffering using the device conventional zones,
> allowing any random write access to be safely executed, or direct in-place
> processing for requests aligned on a zone sequential write pointer position.
> A background reclaim process ensures that some conventional zones are always
> available for executing unaligned write requests. The reclaim process
> overhead is minimized by managing buffer zones in a least-recently-written
> order and first targeting the oldest buffer zones. Doing so, blocks under
> regular write access (such as metadata blocks of an FS) remain stored in
> conventional zones, resulting in no apparent write overhead.
>
[ .. ]
> +/*
> + * CRC32
> + */
> +static u32 dmz_sb_crc32(u32 crc, const void *buf, size_t length)
> +{
> + unsigned char *p = (unsigned char *)buf;
> + int i;
> +
> +#define CRCPOLY_LE 0xedb88320
> +
> + while (length--) {
> + crc ^= *p++;
> + for (i = 0; i < 8; i++)
> + crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
> + }
> +
> + return crc;
> +}
> +
Why do you insist on your own CRC32 implementations?
Don't we have enough already?
[ .. ]
> +/*
> + * Write blocks.
> + */
> +static int dmz_reclaim_write(struct dm_zoned_target *dzt,
> + struct dm_zone *zone,
> + struct dm_zoned_ioreg **ioregs,
> + unsigned int nr_ioregs)
> +{
> + struct dm_zoned_ioreg *ioreg;
> + sector_t chunk_block;
> + int i, ret = 0;
> +
> + for (i = 0; i < nr_ioregs; i++) {
> +
> + ioreg = ioregs[i];
> +
> + /* Wait for the read I/O to complete */
> + wait_for_completion_io(&ioreg->wait);
> +
> + if (ret || ioreg->err) {
> + if (ret == 0)
> + ret = ioreg->err;
> + dmz_reclaim_free_ioreg(ioreg);
> + ioregs[i] = NULL;
> + continue;
> + }
> +
> + chunk_block = ioreg->chunk_block;
> +
> + dmz_dev_debug(dzt,
> + "Reclaim: Write %s zone %u, block %llu+%u\n",
> + dmz_is_rnd(zone) ? "RND" : "SEQ",
> + dmz_id(dzt, zone),
> + (unsigned long long)chunk_block,
> + ioreg->nr_blocks);
> +
> + /*
> + * If we are writing in a sequential zones,
> + * we must make sure that writes are sequential. So
> + * fill up any eventual hole between writes.
> + */
> + if (dmz_is_seq(zone)) {
> + ret = dmz_reclaim_align_wp(dzt, zone, chunk_block);
> + if (ret)
> + break;
> + }
> +
> + /* Do write */
> + dmz_reclaim_submit_ioreg(dzt, zone, ioreg, REQ_OP_WRITE);
> + wait_for_completion_io(&ioreg->wait);
> +
> + ret = ioreg->err;
> + if (ret) {
> + dmz_dev_err(dzt, "Reclaim: Write failed\n");
> + } else {
> + if (dmz_is_seq(zone))
> + zone->wp_block += ioreg->nr_blocks;
> + ret = dmz_validate_blocks(dzt, zone, chunk_block,
> + ioreg->nr_blocks);
> + }
> +
> + ioregs[i] = NULL;
> + dmz_reclaim_free_ioreg(ioreg);
> +
> + }
> +
> + return ret;
> +}
> +
> +/*
> + * Move valid blocks of src_zone into dst_zone.
> + */
> +static int dmz_reclaim_copy_zone(struct dm_zoned_target *dzt,
> + struct dm_zone *src_zone,
> + struct dm_zone *dst_zone)
> +{
> + struct dm_zoned_ioreg *ioregs[DMZ_RECLAIM_MAX_IOREGS];
> + struct dm_zoned_ioreg *ioreg;
> + sector_t chunk_block = 0;
> + sector_t first_block, end_block;
> + int nr_ioregs = 0, i, ret;
> +
> + if (dmz_is_seq(src_zone))
> + end_block = src_zone->wp_block;
> + else
> + end_block = dzt->zone_nr_blocks;
> +
> + while (chunk_block < end_block) {
> +
> + /* Read valid regions from source zone */
> + nr_ioregs = 0;
> + first_block = chunk_block;
> + while (nr_ioregs < DMZ_RECLAIM_MAX_IOREGS &&
> + chunk_block < end_block) {
> +
> + ioreg = dmz_reclaim_read(dzt, src_zone, chunk_block);
> + if (IS_ERR(ioreg)) {
> + ret = PTR_ERR(ioreg);
> + goto err;
> + }
> + if (!ioreg)
> + break;
> +
> + chunk_block = ioreg->chunk_block + ioreg->nr_blocks;
> + ioregs[nr_ioregs] = ioreg;
> + nr_ioregs++;
> +
> + }
> +
> + /* Are we done ? */
> + if (!nr_ioregs)
> + break;
> +
> + /* Write in destination zone */
> + ret = dmz_reclaim_write(dzt, dst_zone, ioregs, nr_ioregs);
> + if (ret != 0) {
> + dmz_invalidate_blocks(dzt, dst_zone,
> + first_block,
> + end_block - first_block);
> + goto err;
> + }
> +
> + }
> +
> + return 0;
> +
> +err:
> + for (i = 0; i < nr_ioregs; i++) {
> + ioreg = ioregs[i];
> + if (ioreg) {
> + wait_for_completion_io(&ioreg->wait);
> + dmz_reclaim_free_ioreg(ioreg);
> + }
> + }
> +
> + return ret;
> +}
> +
Have you looked at using dm-kcopyd for doing the reclaim?
What's inhibiting you from using it?
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N�rnberg
GF: F. Imend�rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG N�rnberg)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] dm-zoned: Zoned block device target
2017-02-09 9:36 ` [PATCH 1/1] dm-zoned: Zoned block device target Christoph Hellwig
@ 2017-02-10 0:51 ` Damien Le Moal
0 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2017-02-10 0:51 UTC (permalink / raw)
To: Christoph Hellwig
Cc: dm-devel, Alasdair Kergon, Mike Snitzer, Hannes Reinecke,
linux-block
Christoph,
On 2/9/17 18:36, Christoph Hellwig wrote:
> On Thu, Feb 09, 2017 at 01:18:49PM +0900, Damien Le Moal wrote:
>> +
>> +/*
>> + * Target BIO completion.
>> + */
>> +static inline void dmz_bio_end(struct bio *bio, int err)
>> +{
>> + struct dm_zone_bioctx *bioctx =
>> + dm_per_bio_data(bio, sizeof(struct dm_zone_bioctx));
>> +
>> + if (atomic_dec_and_test(&bioctx->ref)) {
>
> It seems like this could be simplified a bit by using bio_chain
> to chain the clones to the original bio.
Thank you for the hint. I will look into this.
Best regards.
--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
Damien.LeMoal@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] dm-zoned: Zoned block device target
2017-02-09 15:27 ` Mike Snitzer
@ 2017-02-10 1:09 ` Damien Le Moal
0 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2017-02-10 1:09 UTC (permalink / raw)
To: Mike Snitzer
Cc: dm-devel, Alasdair Kergon, Hannes Reinecke, Christoph Hellwig,
linux-block, Bart Van Assche
Mike,
On 2/10/17 00:27, Mike Snitzer wrote:
> Looks like this work has come a long way. While I am _still_ not
> hearing from any customers (or partners) that SMR is a priority I do
> acknowledge that that obviously isn't the case for WDC and others.
>
> So, I'll sign up for reviewing this DM target, with a focus on DM
> interface and implementation details, _after_ I get some technical
> review (e.g. Reviewed-by) from people like Hannes and others more
> invested in the SMR technologies. Basically I don't know enough about
> the constraints of these devices to fully appreciate and catch some
> issue in the particulars of the algorithms used to mitigate SMR's
> quirks.
Fair enough. I understand.
I would also like to let you know that I worked closely with Hannes and
Martin and also Shaun Tauncheff (Seagate) to put together the final
patches of the ZBC support for 4.10. I also contributed to part of the
f2fs support for these devices, which also was included in 4.10.
(And I also maintain the libzbc project in github)
> I'd also like to understand how invested WDC is in maintaining this DM
> target for years and years if/when I were to merge this DM target into
> mainline.
We are fully invested. I am part of the system software group in WD
Research and one of our mission is to support Linux kernel and open
source software in general. Bart (Van Assche), who I believe you know,
is also a member of our group now that Sandisk and WD merger completed.
Hannes and Christoph already sent some comments regarding the code. I
will make appropriate changes and re-post the patch for review.
Thank you.
Best regards.
--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
Damien.LeMoal@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] dm-zoned: Zoned block device target
2017-02-09 16:07 ` Hannes Reinecke
@ 2017-02-10 1:24 ` Damien Le Moal
0 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2017-02-10 1:24 UTC (permalink / raw)
To: Hannes Reinecke, dm-devel, Alasdair Kergon, Mike Snitzer
Cc: Christoph Hellwig, linux-block
Hannes,
On 2/10/17 01:07, Hannes Reinecke wrote:
>> +/*
>> + * CRC32
>> + */
>> +static u32 dmz_sb_crc32(u32 crc, const void *buf, size_t length)
>> +{
>> + unsigned char *p = (unsigned char *)buf;
>> + int i;
>> +
>> +#define CRCPOLY_LE 0xedb88320
>> +
>> + while (length--) {
>> + crc ^= *p++;
>> + for (i = 0; i < 8; i++)
>> + crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
>> + }
>> +
>> + return crc;
>> +}
>> +
> Why do you insist on your own CRC32 implementations?
> Don't we have enough already?
Indeed. I will change this to use existing kernel API.
> Have you looked at using dm-kcopyd for doing the reclaim?
> What's inhibiting you from using it?
I did look at it, and I did like it very much. It would be much more
simple to use than open coding everything like I did. But the problem is
that while the read part is fine, the write part is not. The reason is
that most of the time, the zone being reclaimed is badly fragmented
(valid blocks all over the place in the zone). Reading those valid
blocks with kcopyd is fine, but when moving them to the target zone, the
overall write sequence MUST be sequential, which means that
blkdev_issue_zeroout calls must be inserted between non sequential
regions (the indirection system is per zone, not per block, so the
relative offsets of the blocks read in the zone being reclaimed must be
preserved in the destination zone).
I could modify kcopyd_copy to do this zeroout insertion of course. In
fact, I think that would be the better approach. But I think that can be
done later, once/if Mike accepts dm-zoned. Or I could re-submit dm-zoned
together with a patch for kcopyd_copy.
Mike,
Any advice on which approach is best/more acceptable for you ?
Thanks !
Best regards.
--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
Damien.LeMoal@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-02-10 1:28 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20170209041849.30964-1-damien.lemoal@wdc.com>
2017-02-09 9:36 ` [PATCH 1/1] dm-zoned: Zoned block device target Christoph Hellwig
2017-02-10 0:51 ` Damien Le Moal
2017-02-09 15:27 ` Mike Snitzer
2017-02-10 1:09 ` Damien Le Moal
2017-02-09 16:07 ` Hannes Reinecke
2017-02-10 1:24 ` Damien Le Moal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).