From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <57287339.7070802@redhat.com>
References: <929635034.3140318.1461840230292.JavaMail.yahoo.ref@mail.yahoo.com> <929635034.3140318.1461840230292.JavaMail.yahoo@mail.yahoo.com> <445afc4b9ae3fbf477f8f66db9d28580@dds.nl> <57234421.5040902@redhat.com> <57287339.7070802@redhat.com>
Date: Tue, 3 May 2016 06:41:37 -0400
From: Mark Mielke
Content-Type: multipart/alternative; boundary=001a113ac342ea88fd0531edc09a
Subject: Re: [linux-lvm] thin handling of available space
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
To: Zdenek Kabelac
Cc: LVM general discussion and development

--001a113ac342ea88fd0531edc09a
Content-Type: text/plain; charset=UTF-8

On Tue, May 3, 2016 at 5:45 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:

> On 2.5.2016 16:32, Mark Mielke wrote:
>
>>     If you seek for a filesystem with over-provisioning - look at btrfs, zfs
>>     and other variants...
>>
>> I have to say that I am disappointed with this view, particularly if this is a
>> view held by Red Hat. To me this represents a misunderstanding of the purpose
>>
>
> So first - this is AMAZING deduction you've just shown.
>
> You've cut sentence out of the middle of a thread and used as kind of evidence
> that Red Hat is suggesting usage of ZFS, Btrfs - sorry man - read this
> thread again...
>

My intent wasn't to cut a sentence in the middle. I responded to each
sentence in its place. I think it really comes down to this:

>> This seems to be a crux of this debate between you and the other people. You
>> think the block storage should be as transparent as possible, as if the
>> storage was not thin. Others, including me, think that this theory is
>> impractical, as it leads to edge cases where the file system could choose to
>>
>
> It's purely practical and it's the 'crucial' difference between
>
> i.e. thin+XFS/ext4 and BTRFS.
>

I think I captured the crux of this pretty well. If anybody suggests that
there could be value in exposing any information related to the nature of
the "thinly provisioned block devices", you suggest that the only route
forward here is BTRFS and ZFS. You are saying, directly and indirectly,
that anybody who disagrees with you should switch to what you feel are the
only solutions that are in this space, and that LVM should never be in this
space.

I think I understand your perspective. However, I don't agree with it. I
don't agree that the best solution is one that fails at the last instant
with ENOSPC and/or leaves the file system read-only. I think there is a
whole lot of grey possibilities between the polar extremes of "BTRFS/ZFS"
vs "thin+XFS/ext4 with last instant failure".

What started me on this list was the CYA mandatory warning about
over-provisioning, which I think is inappropriate and is causing us tooling
problems. But seeing the debate unfold, and having seen some related
failures in the Docker LVM thin pool case where the system may completely
lock up, I have concluded that this type of failure represents a
fundamental difference of opinion about what thin volumes are for, and what
place they have. As I see them as highly valuable for various reasons
including Docker image layers (something Red Hat appears to agree with,
having targeted LVM thinp instead of the union file systems), and the
snapshot use cases I presented prior, I think there must be a way to avoid
the worst scenarios, if the right people consider all the options, and
don't write off options prematurely due to preconceived notions about what
is and what is not appropriate in terms of communication of information
between system layers.

There are many types of information that *are* passed from the block device
layer to the file system layer. I don't see why awareness of thin volumes
should not be one of them.
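
To make the point about information crossing layers concrete, here is a
minimal sketch that reads a few of the per-device hints the Linux block
layer already publishes under /sys/block/<device>/queue. The function name
read_queue_hints and the choice of attributes are only illustrative
assumptions; none of these attributes describes thin pool fullness, which
is exactly the gap being discussed.

```python
from pathlib import Path

# A few queue attributes the Linux block layer exposes in sysfs.
# The selection is illustrative, not exhaustive.
QUEUE_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
    "discard_granularity",
    "rotational",
)


def read_queue_hints(device, sysfs_root="/sys/block"):
    """Return {attribute: int} for the queue attributes found for `device`.

    `sysfs_root` is a parameter only so the function can be pointed at a
    different directory tree (for example, in a test).
    """
    queue_dir = Path(sysfs_root) / device / "queue"
    hints = {}
    for name in QUEUE_ATTRS:
        try:
            hints[name] = int((queue_dir / name).read_text().strip())
        except (OSError, ValueError):
            # Attribute missing, unreadable, or not an integer: skip it.
            continue
    return hints
```

Calling read_queue_hints("dm-0") on a system where dm-0 exists would return
whichever of these attributes that kernel exposes for the device; a device
that does not exist simply yields an empty dict.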

For example, and I'm not pretending this is the best idea that should be
implemented, but just to see where the discussion might lead:

The Linux kernel needs to deal with problems such as memory being swapped
out due to memory pressure. In various cases, it is dangerous to swap
memory out. The memory can be protected from being swapped out where
required using various techniques such as pinning pages. This takes up
extra RAM, but ensures that the memory can be safely accessed and written
as required. If the file system has particular areas of importance that
need to be writable to prevent file system failure, perhaps the file system
should have a way of communicating this to the volume layer. The naive
approach here might be to preallocate these critical blocks before
proceeding with any updates to these blocks, such that the failure
situations can all be "safe" situations, where ENOSPC can be returned
without a danger of the file system locking up or going read-only.

Or, maybe I am out of my depth, and this is crazy talk... :-)

(Personally, I don't really need a "df" to approximate available
storage... I just don't want the system to fail badly in the "out of disk
space" scenario... I can't speak for others, though... I do *not* want
BTRFS/ZFS... I just want a sanely behaving LVM + XFS...)

--
Mark Mielke <mark.mielke@gmail.com>

--001a113ac342ea88fd0531edc09a
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a113ac342ea88fd0531edc09a--