* [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Gionatan Danti @ 2017-03-08 16:14 UTC (permalink / raw)
To: linux-lvm
Hi list,
I would like to understand if this is an lvmthin metadata size bug or if
I am simply missing something.
These are my system specs:
- CentOS 7.3 64 bit with kernel 3.10.0-514.6.1.el7
- LVM version 2.02.166-1.el7_3.2
- two Linux software RAID devices, md127 (root) and md126 (storage)
MD array specs (the interesting one is md126):
Personalities : [raid10]
md126 : active raid10 sdd2[3] sda3[0] sdb2[1] sdc2[2]
557632000 blocks super 1.2 128K chunks 2 near-copies [4/4] [UUUU]
bitmap: 1/5 pages [4KB], 65536KB chunk
md127 : active raid10 sdc1[2] sda2[0] sdd1[3] sdb1[1]
67178496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk
As you can see, /dev/md126 has a 128KB chunk size. I used this device to
host a physical volume and volume group on which I created a thinpool of
512GB. Then, I created a thin logical volume of the same size (512 GB)
and started to fill it. Somewhere near (but not at) full capacity, the
volume went offline due to metadata exhaustion.
Let's see how the logical volume was created and how it appears:
[root@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G; lvs -a -o +chunk_size
Using default stripesize 64.00 KiB.
Logical volume "thinpool" created.
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-a-tz-- 512.00g                 0.00   0.83                              128.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 512.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   7.62g                                                          0
The metadata volume is considerably smaller (~2x) than I expected, and not
big enough to reach 100% data utilization. Indeed, thin_metadata_size shows
a minimum metadata volume size of over 130 MB:
[root@blackhole ]# thin_metadata_size -b 128k -s 512g -m 1 -u m
thin_metadata_size - 130.04 mebibytes estimated metadata area size for
"--block-size=128kibibytes --pool-size=512gibibytes --max-thins=1"
Now, the interesting thing: by explicitly setting --chunksize=128, the
metadata volume is twice as big (and in line with my expectations):
[root@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G --chunksize=128; lvs -a -o +chunk_size
Using default stripesize 64.00 KiB.
Logical volume "thinpool" created.
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 256.00m                                                          0
  thinpool         vg_kvm    twi-a-tz-- 512.00g                 0.00   0.42                              128.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 512.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 256.00m                                                          0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   7.62g                                                          0
Why did I see two very different metadata volume sizes? The chunk size was
128 KB in both cases; the only difference is that I explicitly specified it
on the command line...
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Zdenek Kabelac @ 2017-03-08 18:55 UTC (permalink / raw)
To: LVM general discussion and development, Gionatan Danti
On 8.3.2017 at 17:14, Gionatan Danti wrote:
> Hi list,
> I would like to understand if this is a lvmthin metadata size bug of if I am
> simply missing something.
>
> These are my system specs:
> - CentOS 7.3 64 bit with kernel 3.10.0-514.6.1.el7
> - LVM version 2.02.166-1.el7_3.2
> - two linux software RAID device, md127 (root) and md126 (storage)
>
> MD array specs (the interesting one is md126)
> Personalities : [raid10]
> md126 : active raid10 sdd2[3] sda3[0] sdb2[1] sdc2[2]
> 557632000 blocks super 1.2 128K chunks 2 near-copies [4/4] [UUUU]
> bitmap: 1/5 pages [4KB], 65536KB chunk
>
> md127 : active raid10 sdc1[2] sda2[0] sdd1[3] sdb1[1]
> 67178496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> As you can see, /dev/md126 has a 128KB chunk size. I used this device to host
> a physical volume and volume group on which I created a thinpool of 512GB.
> Then, I create a thin logical volume of the same size (512 GB) and started to
> fill it. Somewhere near (but not at) the full capacity, I saw the volume
> offline due to metadata exhaustion.
>
> Let see how the logical volume was created and how it appear:
> [root@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G; lvs -a -o +chunk_size
> Using default stripesize 64.00 KiB.
> Logical volume "thinpool" created.
> LV VG Attr LSize Pool Origin Data% Meta% Move
> Log Cpy%Sync Convert Chunk
> [lvol0_pmspare] vg_kvm ewi------- 128.00m
> 0
> thinpool vg_kvm twi-a-tz-- 512.00g 0.00 0.83
> 128.00k
> [thinpool_tdata] vg_kvm Twi-ao---- 512.00g
> 0
> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
> 0
> root vg_system -wi-ao---- 50.00g
> 0
> swap vg_system -wi-ao---- 7.62g
> 0
>
> The metadata volume is quite smaller (~2x) than I expected, and not big enough
> to reach 100% data utilization. Indeed, thin_metadata_size show a minimum
> metadata volume size of over 130 MB:
>
> [root@blackhole ]# thin_metadata_size -b 128k -s 512g -m 1 -u m
> thin_metadata_size - 130.04 mebibytes estimated metadata area size for
> "--block-size=128kibibytes --pool-size=512gibibytes --max-thins=1"
>
> Now, the interesting thing: by explicitly setting --chunksize=128, the
> metadata volume is 2x bigger (and in line with my expectations):
Hi
If you do NOT specify any setting - lvm2 targets a 128M metadata size.
If you specify '--chunksize' lvm2 tries to find a better fit, and it happens
to be slightly better with a 256M metadata size.
Basically - you could specify everything down to the last bit - and if you
don't, lvm2 does a little 'magic' and tries to come up with 'reasonable'
defaults for the given kernel and time.
That said - I have some rework of this code in my git tree - mainly for
better support of metadata profiles.
(And my git calculation gives me a 256K chunk size + 128M metadata size - so
there was possibly something not completely right in version 166.)
> Why I saw two very different metadata volume sizes? Chunksize was 128 KB in
> both cases; the only difference is that I explicitly specified it on the
> command line...
You should NOT forget that using a 'thin-pool' without any monitoring and
automatic resize is somewhat 'dangerous'.
So while lvm2 is not (ATM) enforcing automatic resize when data or metadata
space has reached a predefined threshold - I'd highly recommend using it.
The upcoming version 169 will even provide support for an 'external tool' to
be called when threshold levels are surpassed, for even more advanced
configuration options.
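(For illustration - the threshold values below are arbitrary examples, but
the lvm.conf keys and the command are standard lvm2:)

# In /etc/lvm/lvm.conf, activation section - let dmeventd grow the pool
# automatically once it crosses a usage threshold:
#   thin_pool_autoextend_threshold = 70    # start extending at 70% usage
#   thin_pool_autoextend_percent   = 20    # grow by 20% of the current size
# and make sure the pool is monitored:
lvchange --monitor y vg_kvm/thinpool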
Regards
Zdenek
NB. metadata size is not related to mdraid in any way.
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Gionatan Danti @ 2017-03-09 11:24 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 08/03/2017 19:55, Zdenek Kabelac wrote:
>
> Hi
>
> If you do NOT specify any setting - lvm2 targets 128M metadata size.
>
> If you specify '--chunksize' lvm2 tries to find better fit and it happens
> to be slightly better with 256M metadata size.
>
> Basically - you could specify anything to the last bit - and if you
> don't lvm2 does a little 'magic' and tries to come with 'reasonable'
> defaults for given kernel and time.
>
> That said - I've in my git tree some rework of this code - mainly for
> better support of metadata profiles.
> (And my git calculation gives me 256K chunksize + 128M metadata size -
> so there was possibly something not completely right in version 166)
>
>
A 256 KB chunk size would be perfectly reasonable.
>> Why I saw two very different metadata volume sizes? Chunksize was 128
>> KB in
>> both cases; the only difference is that I explicitly specified it on the
>> command line...
>
> You should NOT forget - that using 'thin-pool' without any monitoring
> and automatic resize is somewhat 'dangerous'.
>
True, but I should have no problem if not using snapshots or
overprovisioning - i.e. when all data chunks are allocated (filesystem
full) but nothing is overprovisioned. This time, however, the created
metadata volume was *insufficient* to even address the provisioned data
chunks.
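(When that happens, the metadata LV can at least be grown online - a
standard lvextend option, the size below is just an example:)

# Grow the pool's metadata LV by 128 MiB while the pool stays active:
lvextend --poolmetadatasize +128M vg_kvm/thinpool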
> So while lvm2 is not (ATM) enforcing automatic resize when data or
> metadata space has reached predefined threshold - I'd highly recommnend
> to use it.
>
> Upcoming version 169 will provide even support for 'external tool' to be
> called when threshold levels are surpassed for even more advanced
> configuration options.
>
>
> Regards
>
> Zdenek
>
>
> NB. metadata size is not related to mdraid in any way.
>
>
>
I am under the impression that the 128 KB chunk size was chosen because it
matches the MD chunk size. Indeed, further tests seem to confirm this.
WITH 128 KB MD CHUNK SIZE:
[root@gdanti-laptop test]# mdadm --create md127 --level=raid10 --assume-clean --chunk=128 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
[root@gdanti-laptop test]# pvcreate /dev/md127; vgcreate vg_kvm /dev/md127; lvcreate --thin vg_kvm --name thinpool -L 500G
[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-a-tz-- 500.00g                 0.00   0.80                              128.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   3.75g                                                          0
WITH 256 KB MD CHUNK SIZE:
[root@gdanti-laptop test]# mdadm --create md127 --level=raid10 --assume-clean --chunk=256 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
[root@gdanti-laptop test]# pvcreate /dev/md127; vgcreate vg_kvm /dev/md127; lvcreate --thin vg_kvm --name thinpool -L 500G
[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-a-tz-- 500.00g                 0.00   0.42                              256.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   3.75g                                                          0
So it seems the MD chunk size has a strong influence on the LVM thin chunk
size choice.
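(For what it's worth, the geometry hints lvm2 consults come from the block
layer; reading them from sysfs - illustrative commands, device name taken
from the test above - shows what the MD array advertises:)

# I/O hints exported by the MD array; lvm2 consults these when choosing sizes.
cat /sys/block/md127/queue/minimum_io_size   # typically the RAID chunk size
cat /sys/block/md127/queue/optimal_io_size   # typically chunk size * data disks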
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Zdenek Kabelac @ 2017-03-09 11:53 UTC (permalink / raw)
To: LVM general discussion and development, Gionatan Danti
On 9.3.2017 at 12:24, Gionatan Danti wrote:
> On 08/03/2017 19:55, Zdenek Kabelac wrote:
>>
>> Hi
>>
>> If you do NOT specify any setting - lvm2 targets 128M metadata size.
>>
>> If you specify '--chunksize' lvm2 tries to find better fit and it happens
>> to be slightly better with 256M metadata size.
>>
>> Basically - you could specify anything to the last bit - and if you
>> don't lvm2 does a little 'magic' and tries to come with 'reasonable'
>> defaults for given kernel and time.
>>
>> That said - I've in my git tree some rework of this code - mainly for
>> better support of metadata profiles.
>> (And my git calculation gives me 256K chunksize + 128M metadata size -
>> so there was possibly something not completely right in version 166)
>>
>>
>
> 256 KB chunksize would be perfectly reasonable
>
>>> Why I saw two very different metadata volume sizes? Chunksize was 128
>>> KB in
>>> both cases; the only difference is that I explicitly specified it on the
>>> command line...
>>
>> You should NOT forget - that using 'thin-pool' without any monitoring
>> and automatic resize is somewhat 'dangerous'.
>>
>
> True, but I should have no problem if not using snapshot or overprovisioning -
> ie when all data chunks are allocated (filesystem full) but no
> overprovisioned. This time, however, the created metadata pool was
> *insufficient* to even address the provisioned data chunks.
Hmm - it would be interesting to see your 'metadata' - 128M of metadata for
512G should still be quite a good fit when you are not using snapshots.
What was your actual test scenario? (Lots of LVs?)
But as said - there is no guarantee that the size fits every possible use
case - the user is supposed to understand what kind of technology he is
using, and when he 'opts out' of automatic resize - he needs to deploy his
own monitoring.
Otherwise you would simply have to always create a 16G metadata LV if you do
not want to run out of metadata space.
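(If you want to pin the sizes yourself at creation time, both knobs are
ordinary lvcreate options - the values below are only an example:)

# Create the pool with an explicit chunk size and an explicitly sized tmeta,
# instead of relying on lvm2's automatic choices:
lvcreate --thin vg_kvm/thinpool -L 512G --chunksize 128k --poolmetadatasize 256M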
> I am under impression that 128 KB size was chosen because this was MD chunk
> size. Indeed further tests seem to confirm this.
Ahh yeah - there was a small issue - when the 'hint' for device geometry was
used, it started from the 'default' 64K size instead of the already-computed
256K chunk size.
Regards
Zdenek
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Gionatan Danti @ 2017-03-09 15:33 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 09/03/2017 12:53, Zdenek Kabelac wrote:
>
> Hmm - it would be interesting to see your 'metadata' - it should be still
> quite good fit 128M of metadata for 512G when you are not using snapshots.
>
> What's been your actual test scenario ?? (Lots of LVs??)
>
Nothing unusual - I had a single thinvol with an XFS filesystem used to
store an HDD image gathered using ddrescue.
Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
512GB volume with 128 KB chunks? My testing suggests something different.
For example, take a look at this empty thinpool/thinvol:
[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-aotz-- 500.00g                 0.00   0.81                              128.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        0.00                                     0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   3.75g                                                          0
As you can see, since it is an empty volume, metadata is at only 0.81%.
Let's write 5 GB (1% of the thin data volume):
[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-aotz-- 500.00g                 1.00   1.80                              128.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        1.00                                     0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   3.75g                                                          0
Metadata grew by the same 1%. Accounting for the initial 0.81%
utilization, this means that a nearly full data volume (with *no*
overprovisioning nor snapshots) will exhaust its metadata *before* really
becoming 100% full.
While I can absolutely understand that this is expected behavior when
using snapshots and/or overprovisioning, in this extremely simple case
metadata should not be exhausted before data. In other words, the
initial metadata creation process should *at least* consider that a
plain volume can be 100% full, and allocate accordingly.
The interesting part is that when not using MD, everything works properly:
metadata is about 2x its minimal value (as reported by
thin_metadata_size), and this provides an ample buffer for
snapshotting/overprovisioning. When using MD, the bad interaction between
RAID chunks and thin metadata chunks ends with a too-small metadata volume.
This can become very bad. Take a look at what happens when creating a
thin pool on an MD RAID whose chunks are 64 KB:
[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-a-tz-- 500.00g                 0.00   1.58                              64.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   3.75g                                                          0
Thin chunks are now 64 KB - with the *same* 128 MB metadata volume size.
Now metadata can only address ~50% of the thin volume space.
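(As a purely illustrative cross-check, not run as part of the tests above -
thin_metadata_size for this 64 KB case gives an estimate well above the
128 MiB that was actually allocated:)

# Minimum metadata estimate for a 500G pool with 64k chunks and one thin LV:
thin_metadata_size -b 64k -s 500g -m 1 -u m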
> But as said - there is no guarantee of the size to fit for any possible
> use case - user is supposed to understand what kind of technology he is
> using,
> and when he 'opt-out' from automatic resize - he needs to deploy his own
> monitoring.
True, but this trivial case should really work without tuning/monitoring.
In short, let it at least fail gracefully in the simple case...
>
> Otherwise you would have to simply always create 16G metadata LV if you
> do not want to run out of metadata space.
>
>
Absolutely true. I've written this email to report a bug, indeed ;)
Thank you all for this outstanding work.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Gionatan Danti @ 2017-03-20 9:47 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
Hi all,
any comments on the report below?
Thanks.
On 09/03/2017 16:33, Gionatan Danti wrote:
> On 09/03/2017 12:53, Zdenek Kabelac wrote:
>>
>> Hmm - it would be interesting to see your 'metadata' - it should be
>> still
>> quite good fit 128M of metadata for 512G when you are not using
>> snapshots.
>>
>> What's been your actual test scenario ?? (Lots of LVs??)
>>
>
> Nothing unusual - I had a single thinvol with an XFS filesystem used to
> store an HDD image gathered using ddrescue.
>
> Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
> 512GB volume with 128 KB chunks? My testing suggests something
> different. For example, give it a look at this empty thinpool/thinvol:
>
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
> LV VG Attr LSize Pool Origin Data%
> Meta% Move Log Cpy%Sync Convert Chunk
> [lvol0_pmspare] vg_kvm ewi------- 128.00m
> 0
> thinpool vg_kvm twi-aotz-- 500.00g 0.00
> 0.81 128.00k
> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
> 0
> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
> 0
> thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 0.00
> 0
> root vg_system -wi-ao---- 50.00g
> 0
> swap vg_system -wi-ao---- 3.75g
> 0
>
> As you can see, as it is a empty volume, metadata is at only 0.81% Let
> write 5 GB (1% of thin data volume):
>
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
> LV VG Attr LSize Pool Origin Data%
> Meta% Move Log Cpy%Sync Convert Chunk
> [lvol0_pmspare] vg_kvm ewi------- 128.00m
> 0
> thinpool vg_kvm twi-aotz-- 500.00g 1.00
> 1.80 128.00k
> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
> 0
> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
> 0
> thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 1.00
> 0
> root vg_system -wi-ao---- 50.00g
> 0
> swap vg_system -wi-ao---- 3.75g
> 0
>
> Metadata grown by the same 1%. Accounting for the initial 0.81
> utilization, this means that a near full data volume (with *no*
> overprovisionig nor snapshots) will exhaust its metadata *before* really
> becoming 100% full.
>
> While I can absolutely understand that this is expected behavior when
> using snapshots and/or overprovisioning, in this extremely simple case
> metadata should not be exhausted before data. In other words, the
> initial metadata creation process should be *at least* consider that a
> plain volume can be 100% full, and allocate according.
>
> The interesting part is that when not using MD, all is working properly:
> metadata are about 2x their minimal value (as reported by
> thin_metadata_size), and this provide ample buffer for
> snapshotting/overprovisioning. When using MD, the bad iteration between
> RAID chunks and thin metadata chunks ends with a too small metadata volume.
>
> This can become very bad. Give a look at what happens when creating a
> thin pool on a MD raid whose chunks are at 64 KB:
>
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
> LV VG Attr LSize Pool Origin Data% Meta%
> Move Log Cpy%Sync Convert Chunk
> [lvol0_pmspare] vg_kvm ewi------- 128.00m
> 0
> thinpool vg_kvm twi-a-tz-- 500.00g 0.00 1.58
> 64.00k
> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
> 0
> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
> 0
> root vg_system -wi-ao---- 50.00g
> 0
> swap vg_system -wi-ao---- 3.75g
> 0
>
> Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
> volume size. Now metadata can only address ~50% of thin volume space.
>
>> But as said - there is no guarantee of the size to fit for any possible
>> use case - user is supposed to understand what kind of technology he is
>> using,
>> and when he 'opt-out' from automatic resize - he needs to deploy his own
>> monitoring.
>
> True, but this trivial case should really works without
> tuning/monitoring. In short, let fail gracefully on a simple case...
>>
>> Otherwise you would have to simply always create 16G metadata LV if you
>> do not want to run out of metadata space.
>>
>>
>
> Absolutely true. I've written this email to report a bug, indeed ;)
> Thank you all for this outstanding work.
>
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Zdenek Kabelac @ 2017-03-20 9:51 UTC (permalink / raw)
To: LVM general discussion and development, Gionatan Danti
On 20.3.2017 at 10:47, Gionatan Danti wrote:
> Hi all,
> any comments on the report below?
>
> Thanks.
Please check the upstream behavior (git HEAD).
It will still take a while before the final release, so do not use it
regularly yet (as a few things may still change).
Not sure which other comment you are looking for.
Zdenek
>
> On 09/03/2017 16:33, Gionatan Danti wrote:
>> On 09/03/2017 12:53, Zdenek Kabelac wrote:
>>>
>>> Hmm - it would be interesting to see your 'metadata' - it should be
>>> still
>>> quite good fit 128M of metadata for 512G when you are not using
>>> snapshots.
>>>
>>> What's been your actual test scenario ?? (Lots of LVs??)
>>>
>>
>> Nothing unusual - I had a single thinvol with an XFS filesystem used to
>> store an HDD image gathered using ddrescue.
>>
>> Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
>> 512GB volume with 128 KB chunks? My testing suggests something
>> different. For example, give it a look at this empty thinpool/thinvol:
>>
>> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>> LV VG Attr LSize Pool Origin Data%
>> Meta% Move Log Cpy%Sync Convert Chunk
>> [lvol0_pmspare] vg_kvm ewi------- 128.00m
>> 0
>> thinpool vg_kvm twi-aotz-- 500.00g 0.00
>> 0.81 128.00k
>> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
>> 0
>> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
>> 0
>> thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 0.00
>> 0
>> root vg_system -wi-ao---- 50.00g
>> 0
>> swap vg_system -wi-ao---- 3.75g
>> 0
>>
>> As you can see, as it is a empty volume, metadata is at only 0.81% Let
>> write 5 GB (1% of thin data volume):
>>
>> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>> LV VG Attr LSize Pool Origin Data%
>> Meta% Move Log Cpy%Sync Convert Chunk
>> [lvol0_pmspare] vg_kvm ewi------- 128.00m
>> 0
>> thinpool vg_kvm twi-aotz-- 500.00g 1.00
>> 1.80 128.00k
>> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
>> 0
>> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
>> 0
>> thinvol vg_kvm Vwi-a-tz-- 500.00g thinpool 1.00
>> 0
>> root vg_system -wi-ao---- 50.00g
>> 0
>> swap vg_system -wi-ao---- 3.75g
>> 0
>>
>> Metadata grown by the same 1%. Accounting for the initial 0.81
>> utilization, this means that a near full data volume (with *no*
>> overprovisionig nor snapshots) will exhaust its metadata *before* really
>> becoming 100% full.
>>
>> While I can absolutely understand that this is expected behavior when
>> using snapshots and/or overprovisioning, in this extremely simple case
>> metadata should not be exhausted before data. In other words, the
>> initial metadata creation process should be *at least* consider that a
>> plain volume can be 100% full, and allocate according.
>>
>> The interesting part is that when not using MD, all is working properly:
>> metadata are about 2x their minimal value (as reported by
>> thin_metadata_size), and this provide ample buffer for
>> snapshotting/overprovisioning. When using MD, the bad iteration between
>> RAID chunks and thin metadata chunks ends with a too small metadata volume.
>>
>> This can become very bad. Give a look at what happens when creating a
>> thin pool on a MD raid whose chunks are at 64 KB:
>>
>> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>> LV VG Attr LSize Pool Origin Data% Meta%
>> Move Log Cpy%Sync Convert Chunk
>> [lvol0_pmspare] vg_kvm ewi------- 128.00m
>> 0
>> thinpool vg_kvm twi-a-tz-- 500.00g 0.00 1.58
>> 64.00k
>> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
>> 0
>> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
>> 0
>> root vg_system -wi-ao---- 50.00g
>> 0
>> swap vg_system -wi-ao---- 3.75g
>> 0
>>
>> Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
>> volume size. Now metadata can only address ~50% of thin volume space.
>>
>>> But as said - there is no guarantee of the size to fit for any possible
>>> use case - user is supposed to understand what kind of technology he is
>>> using,
>>> and when he 'opt-out' from automatic resize - he needs to deploy his own
>>> monitoring.
>>
>> True, but this trivial case should really works without
>> tuning/monitoring. In short, let fail gracefully on a simple case...
>>>
>>> Otherwise you would have to simply always create 16G metadata LV if you
>>> do not want to run out of metadata space.
>>>
>>>
>>
>> Absolutely true. I've written this email to report a bug, indeed ;)
>> Thank you all for this outstanding work.
>>
>
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Gionatan Danti @ 2017-03-20 10:45 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 20/03/2017 10:51, Zdenek Kabelac wrote:
>
> Please check upstream behavior (git HEAD)
> It will still take a while before final release so do not use it
> regularly yet (as few things still may change).
I will surely try with git head and report back here.
>
> Not sure for which other comment you look for.
>
> Zdenek
>
>
>
1. You suggested that a 128 MB metadata volume is "quite good" for a
512GB volume and 128KB chunks. However, my tests show that a nearly full
data volume (with *no* overprovisioning nor snapshots) will exhaust its
metadata *before* really becoming 100% full.
2. On an MD RAID with a 64KB chunk size, things become much worse:
[root@gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-a-tz-- 500.00g                 0.00   1.58                              64.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-ao----   3.75g                                                          0
Thin chunks are now 64 KB - with the *same* 128 MB metadata volume size.
Now metadata can only address ~50% of the thin volume space.
So, am I missing something, or does the RHEL 7.3-provided LVM have a serious
problem identifying the correct metadata volume size when running on top of
an MD RAID device?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Zdenek Kabelac @ 2017-03-20 11:01 UTC (permalink / raw)
To: Gionatan Danti, LVM general discussion and development
On 20.3.2017 at 11:45, Gionatan Danti wrote:
> On 20/03/2017 10:51, Zdenek Kabelac wrote:
>>
>> Please check upstream behavior (git HEAD)
>> It will still take a while before final release so do not use it
>> regularly yet (as few things still may change).
>
> I will surely try with git head and report back here.
>
>>
>> Not sure for which other comment you look for.
>>
>> Zdenek
>>
>>
>>
>
> 1. you suggested that a 128 MB metadata volume is "quite good" for a 512GB
> volume and 128KB chunkgs. However, my tests show that a near full data volume
> (with *no* overprovisionig nor snapshots) will exhaust its metadata *before*
> really becoming 100% full.
>
> 2. On a MD RAID with 64KB chunk size, things become much worse:
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
> LV VG Attr LSize Pool Origin Data% Meta%
> Move Log Cpy%Sync Convert Chunk
> [lvol0_pmspare] vg_kvm ewi------- 128.00m
> 0
> thinpool vg_kvm twi-a-tz-- 500.00g 0.00 1.58
> 64.00k
> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
> 0
> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
> 0
> root vg_system -wi-ao---- 50.00g
> 0
> swap vg_system -wi-ao---- 3.75g
> 0
>
> Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
> volume size. Now metadata can only address ~50% of thin volume space.
>
> So, I am missing something or the RHEL 7.3-provided LVM has some serious
> problems identifing correct metadata volume size when running on top of a MD
> RAID device?
As said - please try with HEAD - and report back if you still see a problem.
There were a couple of issues fixed along this path.
In my test it seems 500G needs at least 258M of metadata with a 64K chunk size.
On the other hand - it has never been documented that a thin-pool without
monitoring is supposed to fit a single LV, AFAIK - basically the user needs
to know what kind of technology he is using when he uses thin provisioning -
but of course we continuously try to improve things to be more usable.
Zdenek
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Gionatan Danti @ 2017-03-20 11:52 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 20/03/2017 12:01, Zdenek Kabelac wrote:
>
>
> As said - please try with HEAD - and report back if you still see a
> problem.
> There were couple issue fixed along this path.
>
Ok, I tried now with tools and library from git:
LVM version: 2.02.169(2)-git (2016-11-30)
Library version: 1.02.138-git (2016-11-30)
Driver version: 4.34.0
I can confirm that the thin chunk size is no longer bound (by default) to
the MD RAID chunk size. For example, having created a ~500 GB MD RAID 10
array with 64 KB chunks, creating a thinpool shows this:
[root@blackhole ~]# lvcreate --thinpool vg_kvm/thinpool -L 500G
[root@blackhole ~]# lvs -a -o +chunk_size
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                          0
  thinpool         vg_kvm    twi-a-tz-- 500.00g                 0.00   0.42                              256.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                          0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                          0
  root             vg_system -wi-ao----  50.00g                                                          0
  swap             vg_system -wi-a-----   7.62g
Should I open a bug against the RHEL-provided packages?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Zdenek Kabelac @ 2017-03-20 13:57 UTC (permalink / raw)
To: Gionatan Danti, LVM general discussion and development
On 20.3.2017 at 12:52, Gionatan Danti wrote:
> On 20/03/2017 12:01, Zdenek Kabelac wrote:
>>
>>
>> As said - please try with HEAD - and report back if you still see a
>> problem.
>> There were couple issue fixed along this path.
>>
>
> Ok, I tried now with tools and library from git:
>
> LVM version: 2.02.169(2)-git (2016-11-30)
> Library version: 1.02.138-git (2016-11-30)
> Driver version: 4.34.0
>
> I can confirm that now thin chunk size is no more bound (by default) by MD
> RAID chunk. For example, having created a ~500 GB MD RAID 10 array with 64 KB
> chunks, creating a thinpool shows that:
>
> [root@blackhole ~]# lvcreate --thinpool vg_kvm/thinpool -L 500G
> [root@blackhole ~]# lvs -a -o +chunk_size
> WARNING: Failed to connect to lvmetad. Falling back to device scanning.
> LV VG Attr LSize Pool Origin Data% Meta% Move
> Log Cpy%Sync Convert Chunk
> [lvol0_pmspare] vg_kvm ewi------- 128.00m
> 0
> thinpool vg_kvm twi-a-tz-- 500.00g 0.00 0.42
> 256.00k
> [thinpool_tdata] vg_kvm Twi-ao---- 500.00g
> 0
> [thinpool_tmeta] vg_kvm ewi-ao---- 128.00m
> 0
> root vg_system -wi-ao---- 50.00g
> 0
> swap vg_system -wi-a----- 7.62g
>
> Should I open a bug against the RHEL-provided packages?
Well, if you want to get support for your existing packages you would
need to go via the 'GSS' channel.
You may open a BZ - which will get closed with the next release, RHEL 7.4
(as you have already confirmed that upstream has resolved the issue).
Zdenek
* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
From: Gionatan Danti @ 2017-03-20 14:25 UTC (permalink / raw)
To: Zdenek Kabelac, LVM general discussion and development
On 20/03/2017 14:57, Zdenek Kabelac wrote:
>
>
> Well if you want to get support for your existing packages - you would
> need to go via 'GSS' channel.
>
Sorry, but what do you mean by "GSS channel"?
> You may open BZ - which will get closed with next release of RHEL7.4
> (as you already confirmed upstream has resolved the issue).
>
> Zdenek
>
I'll surely do that.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8