linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 0/9] dm-thin/xfs: prototype a block reservation allocation model
Date: Tue, 22 Mar 2016 09:36:21 +1100
Message-ID: <20160321223621.GN11812@dastard>
In-Reply-To: <20160321133346.GD25476@redhat.com>

On Mon, Mar 21, 2016 at 02:33:46PM +0100, Carlos Maiolino wrote:
> Hi.
> 
> From my point of view, I like the idea of an interface between the filesystem,
> and the thin-provisioned device, so that we can actually know if the thin
> volume is running out of space or not, but, before we actually start to discuss
> how this should be implemented, I'd like to ask if this should be implemented.

TL;DR: No-brainer, yes.

> After a few days discussing this with some block layer and dm-thin developers,
> what I most hear/read is that a thin volume should be transparent to the
> filesystem. So, the filesystem itself should not know it's running over a
> thin-provisioned volume. And such interface being discussed here, breaks this
> abstraction.

We're adding things like fallocate to block devices to control
preallocation, zeroing and freeing of ranges within the block device
from user space. If filesystems can't directly control and query
block device ranges on thinp block devices, then why should we let
userspace have this capability?

The problem we need to solve is that users want transparency between
filesystems and thinp devices. They don't want the filesystem to
tell them they have lots of space available, and then get unexpected
ENOSPC because the thinp pool backing the fs has run out of space.
Users don't want a write over a region they have run
posix_fallocate() on to return ENOSPC because the thinp pool ran out
of space, even after the filesystem said it guaranteed space was
available. Filesystems want to know that they should run fstrim
passes internally when the underlying thinp pool is running out of
space so that they can free as much unused space as possible.
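
To make the failure mode concrete, here is a minimal userspace sketch,
not kernel code. Every name in it (thin_pool, toy_fs, pool_write,
fs_fallocate) is hypothetical and the sizes are arbitrary. It assumes a
thin volume that advertises more blocks than the pool can back, and a
filesystem that only accounts for the advertised space:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define VIRT_BLOCKS 8   /* blocks the thin volume advertises (arbitrary) */

/* Hypothetical thin pool: may hold fewer physical blocks than advertised. */
struct thin_pool {
    size_t phys_free;             /* unallocated physical blocks */
    bool mapped[VIRT_BLOCKS];     /* is this virtual block backed yet? */
};

/* Hypothetical filesystem that only tracks the virtual address space. */
struct toy_fs {
    size_t free_blocks;           /* what a statfs-style query would report */
    bool allocated[VIRT_BLOCKS];  /* blocks preallocated by the filesystem */
};

/* The first write to an unmapped virtual block consumes a physical block. */
static int pool_write(struct thin_pool *p, size_t blk)
{
    assert(blk < VIRT_BLOCKS);
    if (p->mapped[blk])
        return 0;                 /* already backed, overwrite in place */
    if (p->phys_free == 0)
        return -ENOSPC;           /* pool exhausted */
    p->phys_free--;
    p->mapped[blk] = true;
    return 0;
}

/* Preallocation updates filesystem metadata only; the pool is never asked. */
static int fs_fallocate(struct toy_fs *fs, size_t blk)
{
    assert(blk < VIRT_BLOCKS);
    if (fs->allocated[blk])
        return 0;
    if (fs->free_blocks == 0)
        return -ENOSPC;
    fs->free_blocks--;
    fs->allocated[blk] = true;
    return 0;
}
```

With a pool of two physical blocks, fs_fallocate() succeeds for four
blocks and the filesystem still reports four free, yet the third
pool_write() to a new block returns -ENOSPC. That is the surprise
users are complaining about.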

So there are lots of reasons why we need closer functional integration of
the filesystem and block layers, but doing this does not need to
break the abstraction layer between the filesystem and block device.
Indeed, we already have mechanisms to provide block layer
functionality to the filesystems, and this patchset uses one of them - the
bdev ops structure.
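
As a rough illustration of the idea, and not the code from the
patchset, the sketch below models an ops table in userspace C. The
names (blkdev, blkdev_ops, reserve_space, get_reserved, fs_prealloc)
are all hypothetical and do not match the kernel's
block_device_operations. The point is only that the filesystem calls
through a method table and needs no knowledge of what sits behind it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct blkdev;

/* Hypothetical method table a device may fill in; not a kernel structure. */
struct blkdev_ops {
    int (*reserve_space)(struct blkdev *dev, size_t nblocks);
    size_t (*get_reserved)(const struct blkdev *dev);
};

struct blkdev {
    const struct blkdev_ops *ops; /* NULL: device has no space management */
    size_t pool_free;             /* unreserved physical blocks */
    size_t pool_reserved;         /* blocks promised to the filesystem */
};

/* Thin device: move blocks from the free count to the reserved count. */
static int thin_reserve_space(struct blkdev *dev, size_t nblocks)
{
    if (nblocks > dev->pool_free)
        return -ENOSPC;           /* fail now, not at write time */
    dev->pool_free -= nblocks;
    dev->pool_reserved += nblocks;
    return 0;
}

static size_t thin_get_reserved(const struct blkdev *dev)
{
    return dev->pool_reserved;
}

static const struct blkdev_ops thin_ops = {
    .reserve_space = thin_reserve_space,
    .get_reserved  = thin_get_reserved,
};

/* Filesystem side: reserve through the ops table if the device offers it. */
static int fs_prealloc(struct blkdev *dev, size_t nblocks)
{
    assert(dev != NULL);
    if (!dev->ops || !dev->ops->reserve_space)
        return 0;                 /* plain device: nothing to reserve */
    return dev->ops->reserve_space(dev, nblocks);
}
```

In this model a preallocation that the pool cannot back fails up
front with -ENOSPC, and a device with no ops behaves exactly as it
does today. The filesystem and the device stay separate; only the
channel between them gets richer.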

Just because the filesystem knows that the underlying device has
its own space management and it has to interact with it to give
users the correct results does not mean we are "breaking layering
abstractions". Filesystems have long assumed that the LBA space
presented by the block device is a physical representation of the
underlying device.

We know this is not true, and has not been true for a long time.
Most devices really present a virtual LBA space to the higher
layers, and manipulate their underlying "physical" storage in a
manner that suits them best. SSDs do this, thinp does this, RAID
does this, dedupe/compressing/encrypting storage does this, etc.
IOWs, we've got virtual LBA abstractions right through the storage
stack, whether the higher layers realise it or not.

IOWs, we know that filesystems have been using virtual LBA address
spaces for a long time, yet we keep a block device model that
treats them as a physical, unchangeable address space with known
physical characteristics (e.g. seek time is correlated with LBA
distance). We need to stop thinking of block devices as linear
devices and start treating them as they really are - a set of
devices capable of complex management operations, and we need
to start exposing those management operations for the higher layer
to be able to take advantage of.

Filesystems can take advantage of block devices that expose some of
their space management operations. We can make the interactions
users have on these storage stacks much better if we expose smarter
primitives from the block devices to the filesystems. We don't need
to break or change any abstractions - the filesystem is still very
much separate from the block device - but we need to improve the
communications and functionality channels between them.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
