From: Zdenek Kabelac <zkabelac@redhat.com>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: LVM general discussion and development <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] Reserve space for specific thin logical volumes
Date: Wed, 13 Sep 2017 01:02:20 +0200 [thread overview]
Message-ID: <dc2f993c-6a90-ff8d-6d06-e629ac4931a4@redhat.com> (raw)
In-Reply-To: <d994dea1f0b3a800cc8d546bf8d9214f@assyoma.it>
Dne 13.9.2017 v 00:41 Gionatan Danti napsal(a):
> Il 13-09-2017 00:16 Zdenek Kabelac ha scritto:
>> Dne 12.9.2017 v 23:36 Gionatan Danti napsal(a):
>>> Il 12-09-2017 21:44 matthew patton ha scritto:
>>
>>> Again, please don't speak about things you don't know.
>>> I am *not* interested in thin provisioning itself at all; on the other
>>> side, I find CoW and fast snapshots very useful.
>>
>>
>> Not going to comment KVM storage architecture - but with this statemnet -
>> you have VERY simple usage:
>>
>>
>> Just minimize chance for overprovisioning -
>>
>> let's go by example:
>>
>> you have� 10� 10GiB volumes� and you have 20 snapshots...
>>
>>
>> to not overprovision - you need 10 GiB * 30 LV� = 300GiB thin-pool.
>>
>> if that sounds too-much.
>>
>> you can go with 150 GiB - to always 100% cover all 'base' volumes.
>> and have some room for snapshots.
>>
>>
>> Now the fun begins - while monitoring is running -
>> you get callback for� 50%, 55%... 95% 100%
>> at each moment� you can do whatever action you need.
>>
>>
>> So assume 100GiB is bare minimum for base volumes - you ignore any
>> state with less then 66% occupancy of thin-pool and you start solving
>> problems with 85% (~128GiB)- you know some snapshot is better to be
>> dropped.
>> You may try 'harder' actions for higher percentage.
>> (you need to consider how many dirty pages you leave floating your system
>> and other variables)
>>
>> Also you pick with some logic the snapshot which you want to drop -
>> Maybe the oldest ?
>> (see airplane :) URL link)....
>>
>> Anyway - you have plenty of time to solve it still at this moment
>> without any danger of losing write operation...
>> All you can lose is some 'snapshot' which might have been present a
>> bit longer...� but that is supposedly fine with your model workflow...
>>
>> Of course you are getting in serious problem, if you try to keep all
>> these demo-volumes within 50GiB with massive overprovisioning ;)
>>
>> There you have much hard times what should happen what should be
>> removed and where is possibly better to STOP everything and let admin
>> decide what is the ideal next step....
>>
>
> Hi Zdenek,
> I fully agree with what you said above, and I sincerely thank you for taking
> the time to reply.
> However, I am not sure to understand *why* reserving space for a thin volume
> seems a bad idea to you.
>
> Lets have a 100 GB thin pool, and wanting to *never* run out of space in spite
> of taking multiple snapshots.
> To achieve that, I need to a) carefully size the original volume, b) ask the
> thin pool to reserve the needed space and c) counting the "live" data (REFER
> in ZFS terms) allocated inside the thin volume.
>
> Step-by-step example:
> - create a 40 GB thin volume and subtract its size from the thin pool (USED 40
> GB, FREE 60 GB, REFER 0 GB);
> - overwrite the entire volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
> - snapshot the volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
> - completely overwrite the original volume (USED 80 GB, FREE 20 GB, REFER 40 GB);
> - a new snapshot creation will fails (REFER is higher then FREE).
>
> Result: thin pool is *never allowed* to fill. You need to keep track of
> per-volume USED and REFER space, but thinp performance should not be impacted
> in any manner. This is not theoretical: it is already working in this manner
> with ZVOLs and refreservation, *without* involing/requiring any advanced
> coupling/integration between block and filesystem layers.
>
> Don't get me wrong: I am sure that, if you choose to not implement this
> scheme, you have a very good reason to do that. Moreover, I understand that
> patches are welcome :)
>
> But I would like to understand *why* this possibility is ruled out with such
> firmness.
>
There could be a simple answer and complex one :)
I'd start with simple one - already presented here -
when you write to INDIVIDUAL thin volume target - respective dn thin target
DOES manipulate with single btree set - it does NOT care there are some other
snapshot and never influnces them -
You ask here to heavily 'change' thin-pool logic - so writing to THIN volume A
can remove/influence volume B - this is very problematic for meny reasons.
We can go into details of BTree updates (that should be really discussed with
its authors on dm channel ;)) - but I think the key element is capturing the
idea the usage of thinLV A does not change thinLV B.
----
Now to your free 'reserved' space fiction :)
There is NO way to decide WHO deserves to use the reserve :)
Every thin volume is equal - (the fact we call some thin LV snapshot is
user-land fiction - in kernel all thinLV are just equal - every thinLV
reference set of thin-pool chunks) -
(for late-night thinking - what would be snapshot of snapshot which is fully
overwritten ;))
So when you now see that all thinLVs just maps set of chunks,
and all thinLVs can be active and running concurrently - how do you want to
use reserves in thin-pool :) ?
When do you decide it ? (you need to see this is total race-lend)
How do you actually orchestrate locking around this single point of failure ;) ?
You will surely come with and idea of having reserve separate for every thinLV ?
How big it should actually be ?
Are you going to 'refill' those reserves when thin-pool gets emptier ?
How you decide which thinLV deserves bigger reserves ;) ??
I assume you can start to SEE the whole point of this misery....
So instead - you can start with normal thin-pool - keep it simple in kernel,
and solve complexity in user-space.
There you can decide - if you want to extend thin-pool...
You may drop some snapshot...
You may fstrim mounted thinLVs...
You can kill volumes way before the situation becomes unmaintable....
All you need to accept is - you will kill them at 95% -
in your world with reserves it would be already reported as 100% full,
with totally unknown size of reserves :)
Regards
Zdenek
next prev parent reply other threads:[~2017-09-12 23:02 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <418002318.647314.1505245480415.ref@mail.yahoo.com>
[not found] ` <418002318.647314.1505245480415@mail.yahoo.com>
2017-09-12 21:36 ` [linux-lvm] Reserve space for specific thin logical volumes Gionatan Danti
2017-09-12 22:16 ` Zdenek Kabelac
2017-09-12 22:41 ` Gionatan Danti
2017-09-12 23:02 ` Zdenek Kabelac [this message]
2017-09-12 23:15 ` Gionatan Danti
2017-09-12 23:31 ` Zdenek Kabelac
2017-09-13 8:21 ` Gionatan Danti
[not found] <914479528.2618507.1505463313888.ref@mail.yahoo.com>
2017-09-15 8:15 ` matthew patton
2017-09-15 10:01 ` Zdenek Kabelac
2017-09-15 18:35 ` Xen
[not found] <498090067.1559855.1505338608244.ref@mail.yahoo.com>
2017-09-13 21:36 ` matthew patton
[not found] <1771452279.913055.1505269434212.ref@mail.yahoo.com>
2017-09-13 2:23 ` matthew patton
2017-09-13 7:25 ` Zdenek Kabelac
[not found] <57374453.843393.1505261050977.ref@mail.yahoo.com>
2017-09-13 0:04 ` matthew patton
2017-09-13 7:10 ` Zdenek Kabelac
[not found] <691633892.829188.1505258696384.ref@mail.yahoo.com>
2017-09-12 23:24 ` matthew patton
[not found] <1575245610.821680.1505258554456.ref@mail.yahoo.com>
2017-09-12 23:22 ` matthew patton
2017-09-13 7:53 ` Gionatan Danti
2017-09-13 8:15 ` Zdenek Kabelac
2017-09-13 8:28 ` Gionatan Danti
2017-09-13 8:44 ` Zdenek Kabelac
2017-09-13 10:46 ` Gionatan Danti
[not found] <1806055156.426333.1505228581063.ref@mail.yahoo.com>
2017-09-12 15:03 ` matthew patton
2017-09-12 17:09 ` Gionatan Danti
2017-09-12 22:41 ` Zdenek Kabelac
2017-09-12 22:55 ` Gionatan Danti
2017-09-12 23:11 ` Zdenek Kabelac
[not found] <832949972.1610294.1505170613541.ref@mail.yahoo.com>
2017-09-11 22:56 ` matthew patton
2017-09-12 5:28 ` Gionatan Danti
2017-09-08 10:35 Gionatan Danti
2017-09-08 11:06 ` Xen
2017-09-09 22:04 ` Gionatan Danti
2017-09-11 10:35 ` Zdenek Kabelac
2017-09-11 10:55 ` Xen
2017-09-11 11:20 ` Zdenek Kabelac
2017-09-11 12:06 ` Xen
2017-09-11 12:45 ` Xen
2017-09-11 13:11 ` Zdenek Kabelac
2017-09-11 13:46 ` Xen
2017-09-12 11:46 ` Zdenek Kabelac
2017-09-12 12:37 ` Xen
2017-09-12 14:37 ` Zdenek Kabelac
2017-09-12 16:44 ` Xen
2017-09-12 17:14 ` Gionatan Danti
2017-09-12 21:57 ` Zdenek Kabelac
2017-09-13 17:41 ` Xen
2017-09-13 19:17 ` Zdenek Kabelac
2017-09-14 3:19 ` Xen
2017-09-12 17:00 ` Gionatan Danti
2017-09-12 23:25 ` Brassow Jonathan
2017-09-13 8:15 ` Gionatan Danti
2017-09-13 8:33 ` Zdenek Kabelac
2017-09-13 18:43 ` Xen
2017-09-13 19:35 ` Zdenek Kabelac
2017-09-14 5:59 ` Xen
2017-09-14 19:05 ` Zdenek Kabelac
2017-09-15 2:06 ` Brassow Jonathan
2017-09-15 6:02 ` Gionatan Danti
2017-09-15 8:37 ` Xen
2017-09-15 7:34 ` Xen
2017-09-15 9:22 ` Zdenek Kabelac
2017-09-16 22:33 ` Xen
2017-09-17 6:31 ` Xen
2017-09-17 7:10 ` Xen
2017-09-18 19:20 ` Gionatan Danti
2017-09-20 13:05 ` Xen
2017-09-21 9:49 ` Zdenek Kabelac
2017-09-21 10:22 ` Xen
2017-09-21 13:02 ` Zdenek Kabelac
2017-09-21 14:49 ` Xen
2017-09-21 20:32 ` Zdenek Kabelac
2017-09-18 8:56 ` Zdenek Kabelac
2017-09-11 14:00 ` Xen
2017-09-11 17:34 ` Zdenek Kabelac
2017-09-11 15:31 ` Eric Ren
2017-09-11 15:52 ` Zdenek Kabelac
2017-09-11 21:35 ` Eric Ren
2017-09-11 17:41 ` David Teigland
2017-09-11 21:08 ` Eric Ren
2017-09-11 16:55 ` David Teigland
2017-09-11 17:43 ` Zdenek Kabelac
2017-09-11 21:59 ` Gionatan Danti
2017-09-12 11:01 ` Zdenek Kabelac
2017-09-12 11:34 ` Gionatan Danti
2017-09-12 12:03 ` Zdenek Kabelac
2017-09-12 12:47 ` Xen
2017-09-12 13:51 ` pattonme
2017-09-12 14:57 ` Zdenek Kabelac
2017-09-12 16:49 ` Xen
2017-09-12 16:57 ` Gionatan Danti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dc2f993c-6a90-ff8d-6d06-e629ac4931a4@redhat.com \
--to=zkabelac@redhat.com \
--cc=g.danti@assyoma.it \
--cc=linux-lvm@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).