From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	hch@lst.de, darrick.wong@oracle.com, axboe@kernel.dk,
	tytso@mit.edu, adilger.kernel@dilger.ca, ming.lei@redhat.com,
	jthumshirn@suse.de, minwoo.im.dev@gmail.com,
	damien.lemoal@wdc.com, andrea.parri@amarulasolutions.com,
	hare@suse.com, tj@kernel.org, hannes@cmpxchg.org,
	khlebnikov@yandex-team.ru, ajay.joshi@wdc.com,
	bvanassche@acm.org, arnd@arndb.de, houtao1@huawei.com,
	asml.silence@gmail.com, linux-block@vger.kernel.org,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE
Date: Wed, 22 Apr 2020 20:40:01 -0400
Message-ID: <yq11rofghlq.fsf@oracle.com>
In-Reply-To: <20200419223646.GB9765@dread.disaster.area> (Dave Chinner's message of "Mon, 20 Apr 2020 08:36:46 +1000")


Dave,

>> Not before overwriting, no. Once you have allocated an LBA it remains
>> allocated until you discard it.

> Ok, so you are confirming what I thought: it's almost completely
> useless to us.
>
> i.e. this requires issuing IO to "reserve" space whilst preserving
> data before every metadata object goes from clean to dirty in memory.

You can only reserve the space prior to writing a block for the first
time. Once an LBA has been written ("Mapped" in the SCSI state machine),
it remains allocated until it is explicitly deallocated (via a
discard/Unmap operation).
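
To make the transitions concrete, here is a minimal sketch of the per-LBA
provisioning states described above. It is illustrative only. "Mapped" is the
state named in this message; "anchored" and "deallocated" are the companion
terms from SBC and are assumptions as far as this thread goes. The function
names (reserve, write, unmap) are hypothetical, and the model ignores
everything a real device does beyond these three operations.

```python
from enum import Enum


class LbaState(Enum):
    DEALLOCATED = "deallocated"  # no physical resource backs the LBA
    ANCHORED = "anchored"        # space reserved, nothing written yet
    MAPPED = "mapped"            # written at least once


def reserve(state):
    """Hypothetical allocate operation: only has an effect before the first write."""
    if state is LbaState.DEALLOCATED:
        return LbaState.ANCHORED
    # Anchored and mapped LBAs are already allocated, so nothing changes.
    return state


def write(state):
    """A write leaves the LBA mapped, whatever state it was in before."""
    return LbaState.MAPPED


def unmap(state):
    """In this model, discard/Unmap is the only transition back to deallocated."""
    return LbaState.DEALLOCATED


def is_allocated(state):
    """True for any state that holds a physical resource in this model."""
    return state is not LbaState.DEALLOCATED
```

In this model, calling reserve() on a mapped LBA is a no-op, which is the
point Dave is objecting to: there is nothing to reserve before an overwrite.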

This part of the SCSI spec was written eons ago under the assumption
that when a physical resource backing a given LBA had been established,
you could write the block over and over without having to allocate new
space.

This used to be true, but obviously the introduction of de-duplication
blew a major hole in that. I have been perusing the spec over and over
trying to understand how block provisioning state transitions are
defined when dedup is in the picture. However, much is left unexplained.
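
As a rough illustration of why dedup breaks the "mapped means allocated"
assumption, consider a toy content-addressed store. This is a sketch under
stated assumptions, not a model of any real array: the class and method names
are hypothetical, and blocks are shared purely by byte-for-byte equality.

```python
class DedupStore:
    """Toy dedup store: identical data written to different LBAs shares one block."""

    def __init__(self):
        self.lba_to_block = {}  # LBA -> physical block id
        self.block_data = {}    # physical block id -> data
        self.refcount = {}      # physical block id -> number of LBAs using it
        self.by_content = {}    # data -> physical block id
        self.next_block = 0

    def blocks_in_use(self):
        """Number of physical blocks currently consumed."""
        return len(self.refcount)

    def write(self, lba, data):
        """Write data to an LBA, sharing an existing block when the content matches."""
        self._release(lba)
        block = self.by_content.get(data)
        if block is None:
            # No existing copy of this content: a new physical block is needed.
            block = self.next_block
            self.next_block += 1
            self.block_data[block] = data
            self.by_content[data] = block
            self.refcount[block] = 0
        self.refcount[block] += 1
        self.lba_to_block[lba] = block

    def _release(self, lba):
        """Drop the LBA's reference and free the block if nobody else uses it."""
        block = self.lba_to_block.pop(lba, None)
        if block is None:
            return
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.by_content[self.block_data.pop(block)]
            del self.refcount[block]


store = DedupStore()
store.write(0, b"A")
store.write(1, b"A")          # shares the block already holding b"A"
shared = store.blocks_in_use()
store.write(1, b"B")          # overwrite of an already-written LBA
after_overwrite = store.blocks_in_use()
```

Here both LBAs are "mapped" after the first two writes, yet overwriting LBA 1
with different data still consumes an additional physical block. Having
written the LBA once no longer guarantees that a later overwrite needs no new
space.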

As a result, I reached out to various folks, including the people who
worked on this feature in the standards way back. The response I got
from them is that the allocation operation got irreparably broken when
support for de-duplication was added to the spec. Nobody attempted to
fix the state transitions since most vendors only cared about
deallocation. Consequently, specifying the exact behavior of the
allocation operation in the context of dedup fell by the wayside.

The recommendation I got was that we should not rely on this feature
despite it being advertised as supported by the storage. I looked at
whether it was feasible to support it on non-dedup devices only, but it
does not look worthwhile to pursue. As a result, there is no need for
the block layer allocation operation to have parity with SCSI, although
we may want to keep NVMe in mind when defining the semantics.

-- 
Martin K. Petersen	Oracle Linux Engineering

