linux-lvm.redhat.com archive mirror
From: Zdenek Kabelac <zkabelac@redhat.com>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] Reserve space for specific thin logical volumes
Date: Wed, 13 Sep 2017 01:02:20 +0200	[thread overview]
Message-ID: <dc2f993c-6a90-ff8d-6d06-e629ac4931a4@redhat.com> (raw)
In-Reply-To: <d994dea1f0b3a800cc8d546bf8d9214f@assyoma.it>

On 13.9.2017 00:41, Gionatan Danti wrote:
> On 13-09-2017 00:16, Zdenek Kabelac wrote:
>> On 12.9.2017 23:36, Gionatan Danti wrote:
>>> On 12-09-2017 21:44, matthew patton wrote:
>>
>>> Again, please don't speak about things you don't know.
>>> I am *not* interested in thin provisioning itself at all; on the other 
>>> side, I find CoW and fast snapshots very useful.
>>
>>
>> Not going to comment on KVM storage architecture - but with this
>> statement you have a VERY simple use case:
>>
>>
>> Just minimize chance for overprovisioning -
>>
>> let's go by example:
>>
>> you have 10 10GiB volumes and you have 20 snapshots...
>>
>>
>> to not overprovision - you need 10 GiB * 30 LVs = 300 GiB of thin-pool.
>>
>> If that sounds like too much,
>>
>> you can go with 150 GiB - to always 100% cover all 'base' volumes,
>> and have some room for snapshots.
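The sizing arithmetic above can be sketched as a quick check (hypothetical helper; sizes in GiB):

```python
def no_overprovision_pool_gib(volumes_gib, snapshots_per_volume):
    """Worst-case thin-pool size that can never overprovision: every
    volume and every one of its snapshots may diverge completely, so
    each LV needs its full virtual size available in the pool."""
    return sum(size * (1 + snaps)
               for size, snaps in zip(volumes_gib, snapshots_per_volume))

# 10 volumes of 10 GiB with 2 snapshots each = 30 LVs -> 300 GiB
print(no_overprovision_pool_gib([10] * 10, [2] * 10))  # -> 300
```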
>>
>>
>> Now the fun begins - while monitoring is running -
>> you get a callback at 50%, 55%... 95%, 100% -
>> at each moment you can take whatever action you need.
>>
>>
>> So assume 100 GiB is the bare minimum for base volumes - you ignore any
>> state with less than 66% occupancy of the thin-pool and you start solving
>> problems at 85% (~128 GiB) - you know some snapshot would be better
>> dropped.
>> You may try 'harder' actions at higher percentages.
>> (you need to consider how many dirty pages you leave floating in your
>> system, and other variables)
>>
>> Also you pick, with some logic, the snapshot which you want to drop -
>> maybe the oldest?
>> (see airplane :) URL link)....
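The "drop the oldest" selection is only a policy choice; a minimal sketch (gathering the name/creation-time pairs, e.g. via `lvs --noheadings -o lv_name,lv_time`, is left as hypothetical glue code):

```python
from datetime import datetime

def pick_snapshot_to_drop(snapshots):
    """Pick a removal candidate; here simply the oldest snapshot.
    `snapshots` maps LV name -> creation time.  This is only the
    selection policy, not the lvm plumbing around it."""
    if not snapshots:
        return None
    return min(snapshots, key=snapshots.get)

snaps = {
    "vm1-snap-20170910": datetime(2017, 9, 10),
    "vm1-snap-20170912": datetime(2017, 9, 12, 22, 0),
}
print(pick_snapshot_to_drop(snaps))  # -> vm1-snap-20170910
```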
>>
>> Anyway - at this moment you still have plenty of time to solve it
>> without any danger of losing a write operation...
>> All you can lose is some 'snapshot' which might otherwise have been
>> present a bit longer... but that is supposedly fine with your model
>> workflow...
>>
>> Of course you are getting into serious problems if you try to keep all
>> these demo volumes within 50 GiB with massive overprovisioning ;)
>>
>> There you have a much harder time deciding what should happen, what
>> should be removed, and where it is possibly better to STOP everything
>> and let the admin decide the ideal next step....
>>
> 
> Hi Zdenek,
> I fully agree with what you said above, and I sincerely thank you for taking 
> the time to reply.
> However, I am not sure I understand *why* reserving space for a thin
> volume seems like a bad idea to you.
> 
> Let's take a 100 GB thin pool, and suppose we want to *never* run out of
> space in spite of taking multiple snapshots.
> To achieve that, I need to a) carefully size the original volume, b) ask
> the thin pool to reserve the needed space, and c) count the "live" data
> (REFER in ZFS terms) allocated inside the thin volume.
> 
> Step-by-step example:
> - create a 40 GB thin volume and subtract its size from the thin pool (USED 40 
> GB, FREE 60 GB, REFER 0 GB);
> - overwrite the entire volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
> - snapshot the volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
> - completely overwrite the original volume (USED 80 GB, FREE 20 GB, REFER 40 GB);
> - a new snapshot creation will fail (REFER is higher than FREE).
> 
> Result: the thin pool is *never allowed* to fill. You need to keep track
> of per-volume USED and REFER space, but thinp performance should not be
> impacted in any manner. This is not theoretical: it already works this
> way with ZVOLs and refreservation, *without* involving/requiring any
> advanced coupling/integration between the block and filesystem layers.
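The step-by-step accounting above can be mimicked with a toy model (this models the *proposed* reservation scheme, not actual thinp behaviour; all names are made up):

```python
class ReservingPool:
    """Toy accounting for a single volume under the proposed scheme:
    creating a volume reserves its full virtual size up front, and a
    new snapshot is refused whenever REFER exceeds FREE."""
    def __init__(self, size_gb):
        self.size = size_gb
        self.used = 0    # reserved + diverged space
        self.refer = 0   # live data of the volume

    @property
    def free(self):
        return self.size - self.used

    def create_volume(self, size_gb):
        self.used += size_gb              # reserve the full virtual size

    def overwrite_volume(self, written_gb, behind_snapshot=False):
        self.refer = written_gb
        if behind_snapshot:               # diverging consumes new chunks
            self.used += written_gb

    def snapshot(self):
        if self.refer > self.free:
            raise RuntimeError("snapshot refused: REFER > FREE")

pool = ReservingPool(100)
pool.create_volume(40)                    # USED 40, FREE 60, REFER 0
pool.overwrite_volume(40)                 # USED 40, FREE 60, REFER 40
pool.snapshot()                           # ok: REFER 40 <= FREE 60
pool.overwrite_volume(40, behind_snapshot=True)  # USED 80, FREE 20
try:
    pool.snapshot()                       # refused: REFER 40 > FREE 20
except RuntimeError as e:
    print(e)
```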
> 
> Don't get me wrong: I am sure that, if you choose to not implement this 
> scheme, you have a very good reason to do that. Moreover, I understand that 
> patches are welcome :)
> 
> But I would like to understand *why* this possibility is ruled out with such 
> firmness.
> 

There could be a simple answer and a complex one :)

I'd start with the simple one - already presented here -

when you write to an INDIVIDUAL thin volume target - the respective dm
thin target DOES manipulate a single btree set - it does NOT care that
there are other snapshots and it never influences them.

You ask here to heavily 'change' the thin-pool logic - so that writing
to THIN volume A can remove/influence volume B - this is very
problematic for many reasons.

We can go into the details of btree updates (that should really be
discussed with its authors on the dm channel ;)) - but I think the key
element is capturing the idea that using thinLV A does not change
thinLV B.


----


Now to your free 'reserved' space fiction :)
There is NO way to decide WHO deserves to use the reserve :)

Every thin volume is equal (the fact that we call some thin LV a
snapshot is a user-land fiction - in the kernel all thinLVs are just
equal - every thinLV references a set of thin-pool chunks).

(for late-night thinking - what would a snapshot of a snapshot which is
fully overwritten be? ;))

So when you now see that all thinLVs just map sets of chunks,
and that all thinLVs can be active and running concurrently - how do you
want to use reserves in the thin-pool :) ?
When do you decide it? (you need to see this is a total race minefield)
How do you actually orchestrate locking around this single point of failure ;) ?
You will surely come up with the idea of a separate reserve for every thinLV?
How big should it actually be?
Are you going to 'refill' those reserves when the thin-pool gets emptier?
How do you decide which thinLV deserves the bigger reserves ;) ??

I assume you can start to SEE the whole point of this misery....

So instead - you can start with a normal thin-pool - keep it simple in
the kernel, and solve the complexity in user-space.

There you can decide if you want to extend the thin-pool...
You may drop some snapshot...
You may fstrim mounted thinLVs...
You can kill volumes way before the situation becomes unmaintainable....

All you need to accept is - you will kill them at 95% -
in your world with reserves this would already be reported as 100% full,
with a totally unknown size of reserves :)
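The user-space escalation described here can be as small as a threshold-to-action map (thresholds and action names are illustrative, not any real lvm policy):

```python
def pool_action(data_percent):
    """Map thin-pool data occupancy to an escalating user-space
    action; the thresholds mirror the ones discussed above."""
    if data_percent < 85:
        return "monitor"                 # still plenty of room
    if data_percent < 90:
        return "drop-oldest-snapshot"    # reclaim chunks cheaply
    if data_percent < 95:
        return "fstrim-or-lvextend"      # trim mounts / grow the pool
    return "kill-volumes"                # last resort, before 100%
```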

Regards

Zdenek
