From: Zdenek Kabelac
Date: Wed, 28 Feb 2018 22:43:26 +0100
Subject: Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
To: LVM general discussion and development, Gionatan Danti
Cc: Xen

On 28.2.2018 at 20:07, Gionatan Danti wrote:
> Hi all,
>
> On 28-02-2018 10:26, Zdenek Kabelac wrote:
>> Overprovisioning on the DEVICE level simply IS NOT equivalent to a full
>> filesystem like you would like to see all the time here, and it has
>> already been explained to you many times that filesystems are simply
>> not there yet - fixes are ongoing, but it will take time, and it's
>> really pointless to exercise this on 2-3 year old kernels...
>
> This was really beaten to death in the past months/threads. I generally
> agree with Zdenek.
>
> To recap (Zdenek, correct me if I am wrong): the main problem is that,
> on a full pool, async writes will more-or-less silently fail (with
> errors shown in dmesg, but nothing more). Another possible cause of
> problems is that, even on a full pool, *some* writes will complete
> correctly (the ones on already allocated chunks).

By default, a full pool starts to 'error' all 'writes' after 60 seconds.

> In the past it was argued that putting the entire pool in read-only
> mode (where *all* writes fail, but reads are permitted to complete)
> would be a better fail-safe mechanism; however, it was stated that no
> current dm target permits that.

Yep - I'd probably like to see a slightly different mechanism - one where
all ongoing writes would be failing - so far, some 'writes' will pass
(those to already provisioned areas) and some will fail (those to
unprovisioned ones).

The main problem is that - after a reboot - this 'missing/unprovisioned'
space may present some old data...

> Two (good) solutions were given, both relying on scripting (see the
> "thin_command" option in lvm.conf):
> - fsfreeze on a nearly full pool (ie: >=98%);
> - replace the dm-thin target with the error target (using dmsetup).

Yep - this can all happen via 'monitoring'. The key is to do it early,
before the disaster happens.

> I really think that with the good scripting infrastructure currently
> built into lvm this is a more-or-less solved problem.

It still depends - there is always some sort of 'race' - unless you are
willing to 'give up' early enough to always be sure, considering there
are technologies that may write many GB/s...

>> Do NOT take a thin snapshot of your root filesystem, so you will avoid
>> the thin-pool overprovisioning problem.
>
> But is someone *really* pushing thinp for the root filesystem? I always
> used it

You can use a rootfs with thinp - it's very fast for testing, e.g.
upgrades, with a quick revert back - there just should be enough free
space.
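To make that upgrade-and-revert workflow concrete, here is a minimal
sketch, assuming a root LV that lives in a thin pool. The names are
hypothetical, the '--addtag snap' matches the tagging scheme discussed
further down, and merging a thin snapshot requires a reasonably recent
lvm2 (see lvmthin(7) and lvconvert(8)):

#!/bin/sh
# Before the upgrade: take a thin snapshot of the (hypothetical)
# root LV - it consumes no extra space up front.
lvcreate -s vg0/root -n root_pre_upgrade --addtag snap

# ... run the distribution upgrade and test the result ...

# Happy with it? Drop the safety net:
lvremove -y vg0/root_pre_upgrade

# Not happy? Merge the snapshot back into the origin; for an in-use
# root volume the merge is deferred to the next activation (reboot):
lvconvert --merge vg0/root_pre_upgrade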
> In stress testing, I never saw a system crash on a full thin pool, but
> I was not using it on the root filesystem. Are there any ill effects on
> system stability which I need to know about?

Depends on the version of the kernel and the filesystem in use. Note the
RHEL/CentOS kernel has lots of backports even when it looks quite old.

> The solution is to use scripting/thin_command with lvm tags. For
> example:
> - tag all snapshots with a "snap" tag;
> - when usage is dangerously high, drop all volumes with the "snap" tag.

Yep - every user has different plans in mind - scripting gives the user
the freedom to adapt this logic to local needs...

>>> However, I don't have the space for a full copy of every filesystem,
>>> so if I snapshot, I will automatically overprovision.

As long as the admin responsible controls the space in the thin-pool and
takes action long before the thin-pool runs out of space, all is fine.
If the admin hopes for some kind of magic to happen - we have a
problem....

>> Back to rule #1 - thin-p is about 'delaying' the deliverance of real
>> space. If you already have a plan to never deliver the promised space
>> - you need to live with the consequences....
>
> I am not sure I 100% agree with that. Thinp is not only about
> "delaying" space provisioning; it clearly is also (mostly?) about fast,
> modern, usable snapshots. Docker, snapper, stratis, etc. all use thinp
> mainly for its fast, efficient snapshot capability. Denying that is not
> so useful and leads to "overwarning" (ie: when snapshotting a volume on
> a virtually fillable thin pool).

Snapshots are using space - with the hope that if you 'really' need that
space, you either add this space to your system - or you drop snapshots.
Still the same logic applies....

>> !SNAPSHOTS ARE NOT BACKUPS!
>>
>> This is the key problem with your thinking here (unfortunately you are
>> not 'alone' with this thinking)
>
> Snapshots are not backups, as they do not protect from hardware
> problems (and denying that would be lame); however, they are an
> invaluable *part* of a successful backup strategy. Having multiple
> rollback targets, even on the same machine, is a very useful tool.

Backups primarily sit on completely different storage.

If you keep backups of data in the same pool:

1.) an error in a single chunk shared by all your backups + origin means
total data loss - especially in the case where the filesystem uses
'BTrees' and some 'root node' is lost - it can easily render your origin
+ all backups completely useless.

2.) problems in the thin-pool metadata can make all your origin+backups
just an unordered mess of chunks.

> Again, I don't understand why we are speaking about system crashes. On
> a root *not* using thinp, I never saw a system crash due to a full data
> pool.
>
> Oh, and I use thinp on RHEL/CentOS only (Debian/Ubuntu backports are
> way too limited).

Yep - this case is known to be pretty stable.

But as said - with today's 'rush' of development and the load of updates
- users do want to try a 'new distro upgrade' - if it works, all is fine
- if it doesn't, let's have a quick road back - so using a thin volume
for the rootfs is a pretty wanted case. The trouble is that there are
quite a lot of issues that are non-trivial to solve.

There are also some ongoing ideas/projects - one of them was to have
thinLVs with a priority to be always fully provisioned - so such a
thinLV could never be the one to have unprovisioned chunks....

The other was a better integration of the filesystem with 'provisioned'
volumes.

Zdenek
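Pulling the remedies from this thread together - dropping tagged
snapshots, fsfreeze, and swapping in the error target - a policy script
of the kind that could sit behind lvm.conf's thin_command hook might
look like the sketch below. The pool argument, the thresholds, the mount
point, and the dm device name are all illustrative assumptions, not the
documented interface; see lvmthin(7) for how dmeventd actually invokes
thin_command before using anything like this.

#!/bin/sh
# Sketch of a thin-pool policy script; NOT a drop-in thin_command.
# Assumptions: the pool is passed as VG/LV in $1, and a single
# filesystem backed by the pool is mounted at $MNT.

POOL="$1"               # hypothetical, e.g. "vg0/pool0"
MNT="/srv/thindata"     # hypothetical mount point of a thin LV

# Integer percentage of allocated data space in the pool.
PCT=$(lvs --noheadings -o data_percent "$POOL" | cut -d. -f1 | tr -d ' ')

# Stage 1: reclaim space by dropping every LV tagged "snap".
if [ "$PCT" -ge 90 ]; then
    lvremove -y @snap
fi

# Stage 2: freeze the filesystem so no further writes reach the pool.
if [ "$PCT" -ge 95 ]; then
    fsfreeze --freeze "$MNT"
fi

# Stage 3 (last resort): replace the pool's live table with the error
# target, so *all* new I/O fails instead of some writes half-passing.
# The dm name mangling (vg-lv-tpool; dashes in names get doubled) is
# simplified here.
if [ "$PCT" -ge 98 ]; then
    dmsetup wipe_table "$(echo "$POOL" | tr '/' '-')-tpool"
fi

Relatedly, the 60-second queueing mentioned near the top of the thread
can be turned off per pool: 'lvchange --errorwhenfull y VG/pool' makes a
full pool error writes immediately instead of queueing them first.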