linux-block.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	hch@lst.de, darrick.wong@oracle.com, axboe@kernel.dk,
	tytso@mit.edu, adilger.kernel@dilger.ca, ming.lei@redhat.com,
	jthumshirn@suse.de, minwoo.im.dev@gmail.com,
	damien.lemoal@wdc.com, andrea.parri@amarulasolutions.com,
	hare@suse.com, tj@kernel.org, hannes@cmpxchg.org,
	khlebnikov@yandex-team.ru, ajay.joshi@wdc.com,
	bvanassche@acm.org, arnd@arndb.de, houtao1@huawei.com,
	asml.silence@gmail.com, linux-block@vger.kernel.org,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE
Date: Fri, 3 Apr 2020 13:57:57 +1100
Message-ID: <20200403025757.GL10737@dread.disaster.area>
In-Reply-To: <yq1imih4aj0.fsf@oracle.com>

On Thu, Apr 02, 2020 at 09:34:43PM -0400, Martin K. Petersen wrote:
> 
> Hi Dave!
> 
> > Ok, so ext4 has a very limited max allocation size for an extent, so
> > I expect this won't cause huge latency problems. However, what
> > happens when we use XFS, have a 64kB block size, and fallocate() is
> > allocating disk space in contiguous 100GB extents and passing those
> > down to the block device?
> 
> Depends on the device.

Great. :(

> > How does this get split by dm devices? Are raid stripes going to dice
> > this into separate stripe unit sized bios, so instead of single large
> > requests we end up with hundreds or thousands of tiny allocation
> > requests being issued?
> 
> There is nothing special about this operation. It needs to be handled
> the same way as all other splits. I.e. ideally coalesced at the bottom
> of the stack so we can issue larger, contiguous commands to the
> hardware.
> 
> > How are we expecting hardware to behave here? Is this a queued
> > command in the scsi/nvme/sata protocols? Or is this, for the moment,
> > just a special snowflake that we can't actually use in production
> > because the hardware just can't handle what we throw at it?
> 
> For now it's SCSI and queued. Only found in high-end thinly provisioned
> storage arrays and not in your average SSD.

So it's a special snowflake :)

> The performance expectation for REQ_OP_ALLOCATE is that it is faster
> than a write to the same block range since the device potentially needs
> to do less work. I.e. the device simply needs to decrement the free
> space and mark the LBAs reserved in a map. It doesn't need to write all
> the blocks to zero them. If you want zeroed blocks, use
> REQ_OP_WRITE_ZEROES.

I suspect that the implications of wiring filesystems directly up to
this haven't been thought through entirely....

> > IOWs, what sort of latency issues is this operation going to cause
> > on real hardware? Is this going to be like discard? i.e. where we
> > end up not using it at all because so few devices actually handle
> > the massive stream of operations the filesystem will end up sending
> > the device(s) in the course of normal operations?
> 
> The intended use case, from a SCSI perspective, is that on a thinly
> provisioned device you can use this operation to preallocate blocks so
> that future writes to the LBAs in question will not fail due to the
> device being out of space. I.e. you would use this to pin down block
> ranges where you can not tolerate write failures. The advantage over
> writing the blocks individually is that dedup won't apply and that the
> device doesn't actually have to go write all the individual blocks.

.... because when backed by thinp storage, plumbing user level
fallocate() straight through from the filesystem introduces a
trivial, user level storage DoS vector....

i.e. a user can just fallocate a bunch of files and, because the
filesystem can do that instantly, can also run the back end array
out of space almost instantly. Storage admins are going to love
this!
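
To make the concern concrete, here is a rough toy model of an
overcommitted thin pool. It is only a sketch: ThinPool, its methods,
the volume names and the block counts are all made up for
illustration and do not correspond to any real array, driver or
kernel interface. It assumes an allocate request pins blocks in the
pool without writing any data, as described above.

```python
from dataclasses import dataclass, field


@dataclass
class ThinPool:
    """Toy model of a thinly provisioned array shared by several volumes."""
    physical_blocks: int                         # real capacity backing the pool
    reserved: set = field(default_factory=set)   # (volume, lba) pairs pinned so far

    def free_blocks(self) -> int:
        return self.physical_blocks - len(self.reserved)

    def allocate(self, volume: str, start_lba: int, nr_blocks: int) -> bool:
        """Reserve a range without writing data; fail if the pool lacks space."""
        wanted = {(volume, lba) for lba in range(start_lba, start_lba + nr_blocks)}
        new = wanted - self.reserved
        if len(new) > self.free_blocks():
            return False
        self.reserved |= new
        return True

    def write(self, volume: str, lba: int) -> bool:
        """A write succeeds if the block is already reserved or space remains."""
        if (volume, lba) in self.reserved:
            return True
        return self.allocate(volume, lba, 1)


# Hypothetical setup: 1000 physical blocks behind two volumes that each
# advertise 1000 logical blocks, i.e. 2x overcommit.
pool = ThinPool(physical_blocks=1000)

# A user on "vol_a" preallocates files covering the whole volume. No data
# is written, so in this model nothing slows the request down.
user_ok = pool.allocate("vol_a", 0, 1000)

# A tenant on "vol_b" now attempts an ordinary write to an unmapped block.
victim_ok = pool.write("vol_b", 0)
```

In this model the preallocation succeeds, the pool is left with no
free blocks, and the unrelated write on the other volume fails, while
writes into the already pinned range on vol_a still succeed.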

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
