linux-lvm.redhat.com archive mirror
 help / color / mirror / Atom feed
* [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
@ 2017-03-08 16:14 Gionatan Danti
  2017-03-08 18:55 ` Zdenek Kabelac
  0 siblings, 1 reply; 12+ messages in thread
From: Gionatan Danti @ 2017-03-08 16:14 UTC (permalink / raw)
  To: linux-lvm

Hi list,
I would like to understand whether this is an lvmthin metadata size bug or if 
I am simply missing something.

These are my system specs:
- CentOS 7.3 64 bit with kernel 3.10.0-514.6.1.el7
- LVM version 2.02.166-1.el7_3.2
- two Linux software RAID devices, md127 (root) and md126 (storage)

MD array specs (the interesting one is md126)
Personalities : [raid10]
md126 : active raid10 sdd2[3] sda3[0] sdb2[1] sdc2[2]
       557632000 blocks super 1.2 128K chunks 2 near-copies [4/4] [UUUU]
       bitmap: 1/5 pages [4KB], 65536KB chunk

md127 : active raid10 sdc1[2] sda2[0] sdd1[3] sdb1[1]
       67178496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
       bitmap: 0/1 pages [0KB], 65536KB chunk

As you can see, /dev/md126 has a 128 KB chunk size. I used this device to 
host a physical volume and volume group on which I created a 512 GB 
thinpool. Then I created a thin logical volume of the same size (512 GB) 
and started to fill it. Somewhere near (but not at) full capacity, the 
volume went offline due to metadata exhaustion.

Let's see how the logical volume was created and how it appears:
[root@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G; lvs -a -o +chunk_size
   Using default stripesize 64.00 KiB.
   Logical volume "thinpool" created.
   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                        0
   thinpool         vg_kvm    twi-a-tz-- 512.00g             0.00   0.83                          128.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 512.00g                                                        0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                        0
   root             vg_system -wi-ao----  50.00g                                                        0
   swap             vg_system -wi-ao----   7.62g                                                        0

The metadata volume is considerably smaller (~2x) than I expected, and 
not big enough to reach 100% data utilization. Indeed, thin_metadata_size 
shows a minimum metadata volume size of over 130 MB:

[root@blackhole ]# thin_metadata_size -b 128k -s 512g -m 1 -u m
thin_metadata_size - 130.04 mebibytes estimated metadata area size for 
"--block-size=128kibibytes --pool-size=512gibibytes --max-thins=1"
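The thin_metadata_size figure can be roughly cross-checked by hand: dm-thin keeps one btree mapping per data chunk. The ~32 bytes per mapping used below is an approximation inferred from the tool's output, not an exact on-disk constant (the real btree has extra overhead, hence 130.04 MiB rather than a flat 128 MiB):

```shell
# Back-of-envelope estimate of thin metadata needs for this pool.
pool_bytes=$((512 * 1024 * 1024 * 1024))   # 512 GiB data volume
chunk_bytes=$((128 * 1024))                # 128 KiB thin chunk size
chunks=$((pool_bytes / chunk_bytes))       # one btree mapping per chunk
echo "$chunks mappings -> ~$((chunks * 32 / 1024 / 1024)) MiB metadata"
# prints: 4194304 mappings -> ~128 MiB metadata
```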

Now, the interesting thing: by explicitly setting --chunksize=128, the 
metadata volume is 2x bigger (and in line with my expectations):
[root@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G --chunksize=128; lvs -a -o +chunk_size
   Using default stripesize 64.00 KiB.
   Logical volume "thinpool" created.
   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 256.00m                                                        0
   thinpool         vg_kvm    twi-a-tz-- 512.00g             0.00   0.42                          128.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 512.00g                                                        0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 256.00m                                                        0
   root             vg_system -wi-ao----  50.00g                                                        0
   swap             vg_system -wi-ao----   7.62g                                                        0

Why did I see two very different metadata volume sizes? The chunk size 
was 128 KB in both cases; the only difference is that I explicitly 
specified it on the command line...

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-08 16:14 [linux-lvm] Possible bug in thin metadata size with Linux MDRAID Gionatan Danti
@ 2017-03-08 18:55 ` Zdenek Kabelac
  2017-03-09 11:24   ` Gionatan Danti
  0 siblings, 1 reply; 12+ messages in thread
From: Zdenek Kabelac @ 2017-03-08 18:55 UTC (permalink / raw)
  To: LVM general discussion and development, Gionatan Danti

Dne 8.3.2017 v 17:14 Gionatan Danti napsal(a):
> Hi list,
> I would like to understand if this is a lvmthin metadata size bug of if I am
> simply missing something.
>
> These are my system specs:
> - CentOS 7.3 64 bit with kernel 3.10.0-514.6.1.el7
> - LVM version 2.02.166-1.el7_3.2
> - two linux software RAID device, md127 (root) and md126 (storage)
>
> MD array specs (the interesting one is md126)
> Personalities : [raid10]
> md126 : active raid10 sdd2[3] sda3[0] sdb2[1] sdc2[2]
>       557632000 blocks super 1.2 128K chunks 2 near-copies [4/4] [UUUU]
>       bitmap: 1/5 pages [4KB], 65536KB chunk
>
> md127 : active raid10 sdc1[2] sda2[0] sdd1[3] sdb1[1]
>       67178496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> As you can see, /dev/md126 has a 128KB chunk size. I used this device to host
> a physical volume and volume group on which I created a thinpool of 512GB.
> Then, I create a thin logical volume of the same size (512 GB) and started to
> fill it. Somewhere near (but not at) the full capacity, I saw the volume
> offline due to metadata exhaustion.
>
> Let see how the logical volume was created and how it appear:
> [root@blackhole ]# lvcreate --thin vg_kvm/thinpool -L 512G; lvs -a -o +chunk_size
>   Using default stripesize 64.00 KiB.
>   Logical volume "thinpool" created.
>   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                        0
>   thinpool         vg_kvm    twi-a-tz-- 512.00g             0.00   0.83                          128.00k
>   [thinpool_tdata] vg_kvm    Twi-ao---- 512.00g                                                        0
>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                        0
>   root             vg_system -wi-ao----  50.00g                                                        0
>   swap             vg_system -wi-ao----   7.62g                                                        0
>
> The metadata volume is quite smaller (~2x) than I expected, and not big enough
> to reach 100% data utilization. Indeed, thin_metadata_size show a minimum
> metadata volume size of over 130 MB:
>
> [root@blackhole ]# thin_metadata_size -b 128k -s 512g -m 1 -u m
> thin_metadata_size - 130.04 mebibytes estimated metadata area size for
> "--block-size=128kibibytes --pool-size=512gibibytes --max-thins=1"
>
> Now, the interesting thing: by explicitly setting --chunksize=128, the
> metadata volume is 2x bigger (and in line with my expectations):

Hi

If you do NOT specify any setting - lvm2 targets a 128M metadata size.

If you specify '--chunksize', lvm2 tries to find a better fit, and it happens
to be slightly better with a 256M metadata size.

Basically - you could specify everything down to the last bit - and if you 
don't, lvm2 does a little 'magic' and tries to come up with 'reasonable' 
defaults for the given kernel and time.

That said - I have some rework of this code in my git tree - mainly for 
better support of metadata profiles.
(And my git calculation gives me a 256K chunksize + 128M metadata size - so 
there was possibly something not completely right in version 166.)


> Why I saw two very different metadata volume sizes? Chunksize was 128 KB in
> both cases; the only difference is that I explicitly specified it on the
> command line...

You should NOT forget that using a 'thin-pool' without any monitoring and 
automatic resize is somewhat 'dangerous'.

So while lvm2 is not (ATM) enforcing automatic resize when data or metadata 
space has reached a predefined threshold - I'd highly recommend using it.
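For reference, the automatic extension mentioned above is controlled by the activation section of lvm.conf (these option names are standard; the values shown are illustrative, not recommendations):

```
# /etc/lvm/lvm.conf -- activation section
activation {
    # Autoextend the thin pool once it is 70% full...
    thin_pool_autoextend_threshold = 70
    # ...growing it by 20% of its current size each time.
    thin_pool_autoextend_percent = 20
}
```

With the threshold at its default of 100, autoextension is effectively disabled and dmeventd will only log warnings.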

The upcoming version 169 will even provide support for an 'external tool' to 
be called when threshold levels are surpassed, for even more advanced 
configuration options.


Regards

Zdenek


NB. metadata size is not related to mdraid in any way.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-08 18:55 ` Zdenek Kabelac
@ 2017-03-09 11:24   ` Gionatan Danti
  2017-03-09 11:53     ` Zdenek Kabelac
  0 siblings, 1 reply; 12+ messages in thread
From: Gionatan Danti @ 2017-03-09 11:24 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

On 08/03/2017 19:55, Zdenek Kabelac wrote:
>
> Hi
>
> If you do NOT specify any setting - lvm2 targets 128M metadata size.
>
> If you specify '--chunksize'  lvm2 tries to find better fit and it happens
> to be slightly better with 256M metadata size.
>
> Basically - you could specify anything to the last bit - and if you
> don't lvm2 does a little 'magic' and tries to come with 'reasonable'
> defaults for given kernel and time.
>
> That said - I've in my git tree some rework of this code - mainly for
> better support of metadata profiles.
> (And my git calculation gives me 256K chunksize + 128M metadata size -
> so there was possibly something not completely right in version 166)
>
>

A 256 KB chunksize would be perfectly reasonable.

>> Why I saw two very different metadata volume sizes? Chunksize was 128
>> KB in
>> both cases; the only difference is that I explicitly specified it on the
>> command line...
>
> You should NOT forget - that using 'thin-pool' without any monitoring
> and automatic resize is somewhat 'dangerous'.
>

True, but there should be no problem when not using snapshots or 
overprovisioning - i.e. when all data chunks are allocated (filesystem 
full) but nothing is overprovisioned. This time, however, the created 
metadata volume was *insufficient* to even address the provisioned data 
chunks.

> So while lvm2 is not (ATM) enforcing automatic resize when data or
> metadata space has reached predefined threshold  - I'd highly recommnend
> to use it.
>
> Upcoming version 169 will provide even support for 'external tool' to be
> called when threshold levels are surpassed for even more advanced
> configuration options.
>
>
> Regards
>
> Zdenek
>
>
> NB. metadata size is not related to mdraid in any way.
>
>
>

I am under the impression that the 128 KB chunk size was chosen because it 
matched the MD chunk size. Indeed, further tests seem to confirm this.

WITH 128 KB MD CHUNK SIZE:
[root@gdanti-laptop test]# mdadm --create md127 --level=raid10 --assume-clean --chunk=128 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

[root@gdanti-laptop test]# pvcreate /dev/md127; vgcreate vg_kvm /dev/md127; lvcreate --thin vg_kvm --name thinpool -L 500G

[root@gdanti-laptop test]# lvs -a -o +chunk_size
   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                        0
   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   0.80                          128.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                        0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                        0
   root             vg_system -wi-ao----  50.00g                                                        0
   swap             vg_system -wi-ao----   3.75g                                                        0


WITH 256 KB MD CHUNK SIZE:
[root@gdanti-laptop test]# mdadm --create md127 --level=raid10 --assume-clean --chunk=256 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

[root@gdanti-laptop test]# pvcreate /dev/md127; vgcreate vg_kvm /dev/md127; lvcreate --thin vg_kvm --name thinpool -L 500G

[root@gdanti-laptop test]# lvs -a -o +chunk_size
   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                        0
   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   0.42                          256.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                        0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                        0
   root             vg_system -wi-ao----  50.00g                                                        0
   swap             vg_system -wi-ao----   3.75g                                                        0


So it seems the MD chunk size has a strong influence on LVM's thin chunk size choice.

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-09 11:24   ` Gionatan Danti
@ 2017-03-09 11:53     ` Zdenek Kabelac
  2017-03-09 15:33       ` Gionatan Danti
  0 siblings, 1 reply; 12+ messages in thread
From: Zdenek Kabelac @ 2017-03-09 11:53 UTC (permalink / raw)
  To: LVM general discussion and development, Gionatan Danti

Dne 9.3.2017 v 12:24 Gionatan Danti napsal(a):
> On 08/03/2017 19:55, Zdenek Kabelac wrote:
>>
>> Hi
>>
>> If you do NOT specify any setting - lvm2 targets 128M metadata size.
>>
>> If you specify '--chunksize'  lvm2 tries to find better fit and it happens
>> to be slightly better with 256M metadata size.
>>
>> Basically - you could specify anything to the last bit - and if you
>> don't lvm2 does a little 'magic' and tries to come with 'reasonable'
>> defaults for given kernel and time.
>>
>> That said - I've in my git tree some rework of this code - mainly for
>> better support of metadata profiles.
>> (And my git calculation gives me 256K chunksize + 128M metadata size -
>> so there was possibly something not completely right in version 166)
>>
>>
>
> 256 KB chunksize would be perfectly reasonable
>
>>> Why I saw two very different metadata volume sizes? Chunksize was 128
>>> KB in
>>> both cases; the only difference is that I explicitly specified it on the
>>> command line...
>>
>> You should NOT forget - that using 'thin-pool' without any monitoring
>> and automatic resize is somewhat 'dangerous'.
>>
>
> True, but I should have no problem if not using snapshot or overprovisioning -
> ie when all data chunks are allocated (filesystem full) but no
> overprovisioned. This time, however, the created metadata pool was
> *insufficient* to even address the provisioned data chunks.

Hmm - it would be interesting to see your 'metadata' - 128M of metadata 
should still be quite a good fit for 512G when you are not using snapshots.

What was your actual test scenario? (Lots of LVs?)

But as said - there is no guarantee the size will fit every possible use 
case - the user is supposed to understand what kind of technology he is 
using, and when he 'opts out' of automatic resize - he needs to deploy his 
own monitoring.

Otherwise you would have to simply always create a 16G metadata LV if you do 
not want to run out of metadata space.


> I am under impression that 128 KB size was chosen because this was MD chunk
> size. Indeed further tests seem to confirm this.

Ahh yeah - there was a small issue - when the 'hint' for device geometry was 
used, it started from the 'default' 64K size - instead of the already 
computed 256K chunk size.
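The kind of heuristic being described can be sketched like this (this is an illustration of the general idea, not lvm2's actual code; the values are examples matching the thread):

```shell
# Sketch: pick a thin chunk size, preferring the device's reported
# I/O geometry hint (as MD exposes via sysfs) over a built-in default.
hint=$((128 * 1024))        # e.g. the MD chunk size reported by the device
candidate=$((64 * 1024))    # default starting chunk size
if [ "$hint" -gt 0 ] && [ $((hint % candidate)) -eq 0 ]; then
    chunk=$hint             # adopt the device geometry
else
    chunk=$candidate        # fall back to the default
fi
echo "$((chunk / 1024)) KiB"
# prints: 128 KiB
```

The bug described above amounts to the sizing calculation using `candidate` even after `chunk` had already been raised to match the geometry.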


Regards

Zdenek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-09 11:53     ` Zdenek Kabelac
@ 2017-03-09 15:33       ` Gionatan Danti
  2017-03-20  9:47         ` Gionatan Danti
  0 siblings, 1 reply; 12+ messages in thread
From: Gionatan Danti @ 2017-03-09 15:33 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

On 09/03/2017 12:53, Zdenek Kabelac wrote:
>
> Hmm - it would be interesting to see your 'metadata' -  it should be still
> quite good fit 128M of metadata for 512G  when you are not using snapshots.
>
> What's been your actual test scenario ?? (Lots of LVs??)
>

Nothing unusual - I had a single thinvol with an XFS filesystem, used to 
store an HDD image gathered with ddrescue.

Anyway, are you sure that a 128 MB metadata volume is "quite good" for a 
512 GB volume with 128 KB chunks? My testing suggests otherwise. For 
example, take a look at this empty thinpool/thinvol:

[root@gdanti-laptop test]# lvs -a -o +chunk_size
   LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                             0
   thinpool         vg_kvm    twi-aotz-- 500.00g                 0.00   0.81                            128.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                             0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                             0
   thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        0.00                                        0
   root             vg_system -wi-ao----  50.00g                                                             0
   swap             vg_system -wi-ao----   3.75g                                                             0

As you can see, since the volume is empty, metadata is at only 0.81%. 
Let's write 5 GB (1% of the thin data volume):

[root@gdanti-laptop test]# lvs -a -o +chunk_size
   LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                             0
   thinpool         vg_kvm    twi-aotz-- 500.00g                 1.00   1.80                            128.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                             0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                             0
   thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        1.00                                        0
   root             vg_system -wi-ao----  50.00g                                                             0
   swap             vg_system -wi-ao----   3.75g                                                             0

Metadata grew by the same 1%. Accounting for the initial 0.81% 
utilization, this means that a near-full data volume (with *no* 
overprovisioning nor snapshots) will exhaust its metadata *before* 
actually becoming 100% full.
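Extrapolating the two measurements above (0.81% metadata when empty, growing ~1% per 1% of data written) gives the fill level at which metadata runs out:

```shell
# Simple linear extrapolation of the lvs figures above:
# metadata% = 0.81 + 1.0 * data%, which hits 100% before data does.
awk 'BEGIN { printf "metadata full at ~%.2f%% data fill\n", 100 - 0.81 }'
# prints: metadata full at ~99.19% data fill
```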

While I can absolutely understand that this is expected behavior when 
using snapshots and/or overprovisioning, in this extremely simple case 
metadata should not be exhausted before data. In other words, the initial 
metadata creation process should *at least* consider that a plain volume 
can become 100% full, and allocate accordingly.

The interesting part is that when not using MD, everything works properly: 
metadata is about 2x its minimal value (as reported by 
thin_metadata_size), and this provides an ample buffer for 
snapshotting/overprovisioning. When using MD, the bad interaction between 
RAID chunks and thin metadata chunks ends with a too-small metadata volume.

This can become very bad. Look at what happens when creating a thin pool 
on an MD RAID whose chunks are 64 KB:

[root@gdanti-laptop test]# lvs -a -o +chunk_size
   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                        0
   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   1.58                           64.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                        0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                        0
   root             vg_system -wi-ao----  50.00g                                                        0
   swap             vg_system -wi-ao----   3.75g                                                        0

The thin chunk size is now 64 KB - with the *same* 128 MB metadata volume 
size. Now the metadata can only address ~50% of the thin volume's space.
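The ~50% figure follows from the per-chunk bookkeeping: assuming roughly 32 bytes of metadata per mapped chunk (an approximation inferred from thin_metadata_size output, not an exact on-disk constant), a fixed 128 MiB tmeta maps only about half the pool once chunks shrink to 64 KiB:

```shell
# How much data a 128 MiB tmeta can address with 64 KiB chunks,
# assuming ~32 bytes of metadata per mapped chunk.
meta_bytes=$((128 * 1024 * 1024))
chunk_bytes=$((64 * 1024))
mappable_gib=$((meta_bytes / 32 * chunk_bytes / 1024 / 1024 / 1024))
echo "~${mappable_gib} GiB addressable of a 500 GiB pool"
# prints: ~256 GiB addressable of a 500 GiB pool
```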

> But as said - there is no guarantee of the size to fit for any possible
> use case - user  is supposed to understand what kind of technology he is
> using,
> and when he 'opt-out' from automatic resize - he needs to deploy his own
> monitoring.

True, but this trivial case should really work without tuning/monitoring. 
In short, the simple case should fail gracefully...
>
> Otherwise you would have to simply always create 16G metadata LV if you
> do not want to run out of metadata space.
>
>

Absolutely true. I've written this email to report a bug, indeed ;)
Thank you all for this outstanding work.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-09 15:33       ` Gionatan Danti
@ 2017-03-20  9:47         ` Gionatan Danti
  2017-03-20  9:51           ` Zdenek Kabelac
  0 siblings, 1 reply; 12+ messages in thread
From: Gionatan Danti @ 2017-03-20  9:47 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

Hi all,
any comments on the report below?

Thanks.

On 09/03/2017 16:33, Gionatan Danti wrote:
> On 09/03/2017 12:53, Zdenek Kabelac wrote:
>>
>> Hmm - it would be interesting to see your 'metadata' -  it should be
>> still
>> quite good fit 128M of metadata for 512G  when you are not using
>> snapshots.
>>
>> What's been your actual test scenario ?? (Lots of LVs??)
>>
>
> Nothing unusual - I had a single thinvol with an XFS filesystem used to
> store an HDD image gathered using ddrescue.
>
> Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
> 512GB volume with 128 KB chunks? My testing suggests something
> different. For example, give it a look at this empty thinpool/thinvol:
>
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>   LV               VG        Attr       LSize   Pool     Origin Data%
> Meta%  Move Log Cpy%Sync Convert Chunk
>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m
>                                      0
>   thinpool         vg_kvm    twi-aotz-- 500.00g                 0.00
> 0.81                             128.00k
>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g
>                                      0
>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m
>                                      0
>   thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        0.00
>                                      0
>   root             vg_system -wi-ao----  50.00g
>                                      0
>   swap             vg_system -wi-ao----   3.75g
>                                      0
>
> As you can see, as it is a empty volume, metadata is at only 0.81% Let
> write 5 GB (1% of thin data volume):
>
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>   LV               VG        Attr       LSize   Pool     Origin Data%
> Meta%  Move Log Cpy%Sync Convert Chunk
>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m
>                                      0
>   thinpool         vg_kvm    twi-aotz-- 500.00g                 1.00
> 1.80                             128.00k
>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g
>                                      0
>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m
>                                      0
>   thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        1.00
>                                      0
>   root             vg_system -wi-ao----  50.00g
>                                      0
>   swap             vg_system -wi-ao----   3.75g
>                                      0
>
> Metadata grown by the same 1%. Accounting for the initial 0.81
> utilization, this means that a near full data volume (with *no*
> overprovisionig nor snapshots) will exhaust its metadata *before* really
> becoming 100% full.
>
> While I can absolutely understand that this is expected behavior when
> using snapshots and/or overprovisioning, in this extremely simple case
> metadata should not be exhausted before data. In other words, the
> initial metadata creation process should be *at least* consider that a
> plain volume can be 100% full, and allocate according.
>
> The interesting part is that when not using MD, all is working properly:
> metadata are about 2x their minimal value (as reported by
> thin_metadata_size), and this provide ample buffer for
> snapshotting/overprovisioning. When using MD, the bad iteration between
> RAID chunks and thin metadata chunks ends with a too small metadata volume.
>
> This can become very bad. Give a look at what happens when creating a
> thin pool on a MD raid whose chunks are at 64 KB:
>
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>   LV               VG        Attr       LSize   Pool Origin Data% Meta%
> Move Log Cpy%Sync Convert Chunk
>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m
>                                 0
>   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   1.58
>                             64.00k
>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g
>                                 0
>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m
>                                 0
>   root             vg_system -wi-ao----  50.00g
>                                 0
>   swap             vg_system -wi-ao----   3.75g
>                                 0
>
> Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
> volume size. Now metadata can only address ~50% of thin volume space.
>
>> But as said - there is no guarantee of the size to fit for any possible
>> use case - user  is supposed to understand what kind of technology he is
>> using,
>> and when he 'opt-out' from automatic resize - he needs to deploy his own
>> monitoring.
>
> True, but this trivial case should really works without
> tuning/monitoring. In short, let fail gracefully on a simple case...
>>
>> Otherwise you would have to simply always create 16G metadata LV if you
>> do not want to run out of metadata space.
>>
>>
>
> Absolutely true. I've written this email to report a bug, indeed ;)
> Thank you all for this outstanding work.
>

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-20  9:47         ` Gionatan Danti
@ 2017-03-20  9:51           ` Zdenek Kabelac
  2017-03-20 10:45             ` Gionatan Danti
  0 siblings, 1 reply; 12+ messages in thread
From: Zdenek Kabelac @ 2017-03-20  9:51 UTC (permalink / raw)
  To: LVM general discussion and development, Gionatan Danti

Dne 20.3.2017 v 10:47 Gionatan Danti napsal(a):
> Hi all,
> any comments on the report below?
>
> Thanks.

Please check the upstream behavior (git HEAD).
It will still take a while before the final release, so do not use it
regularly yet (as a few things may still change).

I'm not sure what other comment you are looking for.

Zdenek



>
> On 09/03/2017 16:33, Gionatan Danti wrote:
>> On 09/03/2017 12:53, Zdenek Kabelac wrote:
>>>
>>> Hmm - it would be interesting to see your 'metadata' -  it should be
>>> still
>>> quite good fit 128M of metadata for 512G  when you are not using
>>> snapshots.
>>>
>>> What's been your actual test scenario ?? (Lots of LVs??)
>>>
>>
>> Nothing unusual - I had a single thinvol with an XFS filesystem used to
>> store an HDD image gathered using ddrescue.
>>
>> Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
>> 512GB volume with 128 KB chunks? My testing suggests something
>> different. For example, give it a look at this empty thinpool/thinvol:
>>
>> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>>   LV               VG        Attr       LSize   Pool     Origin Data%
>> Meta%  Move Log Cpy%Sync Convert Chunk
>>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m
>>                                      0
>>   thinpool         vg_kvm    twi-aotz-- 500.00g                 0.00
>> 0.81                             128.00k
>>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g
>>                                      0
>>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m
>>                                      0
>>   thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        0.00
>>                                      0
>>   root             vg_system -wi-ao----  50.00g
>>                                      0
>>   swap             vg_system -wi-ao----   3.75g
>>                                      0
>>
>> As you can see, as it is a empty volume, metadata is at only 0.81% Let
>> write 5 GB (1% of thin data volume):
>>
>> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>>   LV               VG        Attr       LSize   Pool     Origin Data%
>> Meta%  Move Log Cpy%Sync Convert Chunk
>>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m
>>                                      0
>>   thinpool         vg_kvm    twi-aotz-- 500.00g                 1.00
>> 1.80                             128.00k
>>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g
>>                                      0
>>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m
>>                                      0
>>   thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool        1.00
>>                                      0
>>   root             vg_system -wi-ao----  50.00g
>>                                      0
>>   swap             vg_system -wi-ao----   3.75g
>>                                      0
>>
>> Metadata grown by the same 1%. Accounting for the initial 0.81
>> utilization, this means that a near full data volume (with *no*
>> overprovisionig nor snapshots) will exhaust its metadata *before* really
>> becoming 100% full.
>>
>> While I can absolutely understand that this is expected behavior when
>> using snapshots and/or overprovisioning, in this extremely simple case
>> metadata should not be exhausted before data. In other words, the
>> initial metadata creation process should be *at least* consider that a
>> plain volume can be 100% full, and allocate according.
>>
>> The interesting part is that when not using MD, all is working properly:
>> metadata are about 2x their minimal value (as reported by
>> thin_metadata_size), and this provide ample buffer for
>> snapshotting/overprovisioning. When using MD, the bad iteration between
>> RAID chunks and thin metadata chunks ends with a too small metadata volume.
>>
>> This can become very bad. Give a look at what happens when creating a
>> thin pool on a MD raid whose chunks are at 64 KB:
>>
>> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>>   LV               VG        Attr       LSize   Pool Origin Data% Meta%
>> Move Log Cpy%Sync Convert Chunk
>>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m
>>                                 0
>>   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   1.58
>>                             64.00k
>>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g
>>                                 0
>>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m
>>                                 0
>>   root             vg_system -wi-ao----  50.00g
>>                                 0
>>   swap             vg_system -wi-ao----   3.75g
>>                                 0
>>
>> Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
>> volume size. Now metadata can only address ~50% of thin volume space.
>>
>>> But as said - there is no guarantee of the size to fit for any possible
>>> use case - the user is supposed to understand what kind of technology he
>>> is using, and when he 'opts out' of automatic resize - he needs to deploy
>>> his own monitoring.
>>
>> True, but this trivial case should really work without
>> tuning/monitoring. In short, it should fail gracefully in the simple case...
>>>
>>> Otherwise you would have to simply always create 16G metadata LV if you
>>> do not want to run out of metadata space.
>>>
>>>
>>
>> Absolutely true. I've written this email to report a bug, indeed ;)
>> Thank you all for this outstanding work.
>>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-20  9:51           ` Zdenek Kabelac
@ 2017-03-20 10:45             ` Gionatan Danti
  2017-03-20 11:01               ` Zdenek Kabelac
  0 siblings, 1 reply; 12+ messages in thread
From: Gionatan Danti @ 2017-03-20 10:45 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

On 20/03/2017 10:51, Zdenek Kabelac wrote:
>
> Please check upstream behavior (git HEAD)
> It will still take a while before final release so do not use it
> regularly yet (as few things still may  change).

I will surely try with git head and report back here.

>
> Not sure for which other comment you look for.
>
> Zdenek
>
>
>

1. You suggested that a 128 MB metadata volume is "quite good" for a 
512GB volume and 128KB chunks. However, my tests show that a near-full 
data volume (with *no* overprovisioning nor snapshots) will exhaust its 
metadata *before* really becoming 100% full.
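Spelling out the arithmetic behind point 1 (a rough sketch; the ~32
bytes-per-mapping figure is an assumption inferred from the sizes quoted in
this thread, not an exact dm-thin constant):

```python
# Rough sizing check for point 1: a 512 GiB pool with 128 KiB chunks.
# Assumption (hedged): ~32 bytes of metadata per mapped data chunk, the
# effective figure implied by numbers elsewhere in this thread; the real
# dm-thin btree cost varies with how the mappings pack.

KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30
BYTES_PER_MAPPING = 32  # assumed, see above

def min_tmeta_bytes(pool_bytes, chunk_bytes):
    """Metadata needed to map every chunk of a completely full pool."""
    return (pool_bytes // chunk_bytes) * BYTES_PER_MAPPING

need = min_tmeta_bytes(512 * GiB, 128 * KiB)
print(need // MiB, "MiB")  # 128 -- the pool got the bare minimum, zero headroom
```

Under that assumption, the 128 MiB tmeta is exactly the bare minimum for a
fully mapped pool, with no headroom at all, which would explain exhaustion
just short of 100% data usage.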

2. On an MD RAID with a 64KB chunk size, things become much worse:
[root@gdanti-laptop test]# lvs -a -o +chunk_size
   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                    0
   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   1.58                            64.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                    0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                    0
   root             vg_system -wi-ao----  50.00g                                                    0
   swap             vg_system -wi-ao----   3.75g                                                    0

Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now metadata can only address ~50% of thin volume space.
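The ~50% figure follows from the same back-of-the-envelope arithmetic (again
assuming ~32 bytes of metadata per mapped chunk, a figure inferred from this
thread rather than from dm-thin internals):

```python
# Why 128 MiB of metadata covers only ~half of a 500 GiB pool at 64 KiB chunks.
# Assumption (hedged): ~32 bytes of metadata per mapped chunk, the effective
# figure implied by the sizes quoted in this thread (500 G needing ~258 M).

KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30
BYTES_PER_MAPPING = 32  # assumed effective cost, not an exact dm-thin constant

def addressable_bytes(tmeta_bytes, chunk_bytes):
    """Data a metadata volume of tmeta_bytes can map at the given chunk size."""
    return (tmeta_bytes // BYTES_PER_MAPPING) * chunk_bytes

covered = addressable_bytes(128 * MiB, 64 * KiB)
print(covered // GiB, "GiB mappable")         # 256
print(round(covered / (500 * GiB) * 100, 1))  # 51.2 -> the "~50%" above
```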

So, am I missing something, or does the RHEL 7.3-provided LVM have some 
serious problems identifying the correct metadata volume size when 
running on top of an MD RAID device?

Thanks.


-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-20 10:45             ` Gionatan Danti
@ 2017-03-20 11:01               ` Zdenek Kabelac
  2017-03-20 11:52                 ` Gionatan Danti
  0 siblings, 1 reply; 12+ messages in thread
From: Zdenek Kabelac @ 2017-03-20 11:01 UTC (permalink / raw)
  To: Gionatan Danti, LVM general discussion and development

Dne 20.3.2017 v 11:45 Gionatan Danti napsal(a):
> On 20/03/2017 10:51, Zdenek Kabelac wrote:
>>
>> Please check upstream behavior (git HEAD)
>> It will still take a while before final release so do not use it
>> regularly yet (as few things still may  change).
>
> I will surely try with git head and report back here.
>
>>
>> Not sure for which other comment you look for.
>>
>> Zdenek
>>
>>
>>
>
> 1. You suggested that a 128 MB metadata volume is "quite good" for a 512GB
> volume and 128KB chunks. However, my tests show that a near-full data volume
> (with *no* overprovisioning nor snapshots) will exhaust its metadata *before*
> really becoming 100% full.
>
> 2. On an MD RAID with a 64KB chunk size, things become much worse:
> [root@gdanti-laptop test]# lvs -a -o +chunk_size
>   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                    0
>   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   1.58                            64.00k
>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                    0
>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                    0
>   root             vg_system -wi-ao----  50.00g                                                    0
>   swap             vg_system -wi-ao----   3.75g                                                    0
>
> Thin metadata chunks are now at 64 KB - with the *same* 128 MB metadata
> volume size. Now metadata can only address ~50% of thin volume space.
>
> So, am I missing something, or does the RHEL 7.3-provided LVM have some serious
> problems identifying the correct metadata volume size when running on top of an
> MD RAID device?


As said - please try with HEAD - and report back if you still see a problem.
There were a couple of issues fixed along this path.

In my test it seems  500G needs at least 258M with 64K chunksize.

On the other hand - it has never been documented, AFAIK, that a thin-pool 
without monitoring is supposed to fit a single LV - the user basically needs 
to know what he is using when he uses thin-provisioning - but of course 
we continuously try to improve things to be more usable.
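For reference, the monitoring/automatic-resize behavior mentioned above is
configured in lvm.conf; a minimal sketch with illustrative values (not
recommendations):

```
# /etc/lvm/lvm.conf -- illustrative values only
activation {
    # Let dmeventd monitor thin pools so autoextend can kick in
    monitoring = 1
    # When pool data or metadata usage passes 70%...
    thin_pool_autoextend_threshold = 70
    # ...grow the pool by 20% of its current size
    thin_pool_autoextend_percent = 20
}
```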

Zdenek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-20 11:01               ` Zdenek Kabelac
@ 2017-03-20 11:52                 ` Gionatan Danti
  2017-03-20 13:57                   ` Zdenek Kabelac
  0 siblings, 1 reply; 12+ messages in thread
From: Gionatan Danti @ 2017-03-20 11:52 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

On 20/03/2017 12:01, Zdenek Kabelac wrote:
>
>
> As said - please try with HEAD - and report back if you still see a
> problem.
> There were a couple of issues fixed along this path.
>

Ok, I tried now with tools and library from git:

LVM version:     2.02.169(2)-git (2016-11-30)
Library version: 1.02.138-git (2016-11-30)
Driver version:  4.34.0

I can confirm that the thin chunk size is no longer bound (by default) to 
the MD RAID chunk size. For example, having created a ~500 GB MD RAID 10 
array with 64 KB chunks, creating a thinpool shows this:

[root@blackhole ~]# lvcreate --thinpool vg_kvm/thinpool -L 500G
[root@blackhole ~]# lvs -a -o +chunk_size
   WARNING: Failed to connect to lvmetad. Falling back to device scanning.
   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                    0
   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   0.42                            256.00k
   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                    0
   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                    0
   root             vg_system -wi-ao----  50.00g                                                    0
   swap             vg_system -wi-a-----   7.62g

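As a workaround on older packages, both parameters can also be pinned
explicitly at pool creation time instead of relying on the automatic
defaults; a command sketch with illustrative sizes (`--chunksize` and
`--poolmetadatasize` are standard lvcreate options):

```shell
# Pin the thin chunk size and metadata size explicitly
# (values are examples only -- size them for your own pool):
lvcreate --type thin-pool -L 500G \
         --chunksize 256k \
         --poolmetadatasize 256M \
         vg_kvm/thinpool

# Sanity-check what was actually allocated:
lvs -a -o name,size,chunk_size vg_kvm
```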
Should I open a bug against the RHEL-provided packages?
Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-20 11:52                 ` Gionatan Danti
@ 2017-03-20 13:57                   ` Zdenek Kabelac
  2017-03-20 14:25                     ` Gionatan Danti
  0 siblings, 1 reply; 12+ messages in thread
From: Zdenek Kabelac @ 2017-03-20 13:57 UTC (permalink / raw)
  To: Gionatan Danti, LVM general discussion and development

Dne 20.3.2017 v 12:52 Gionatan Danti napsal(a):
> On 20/03/2017 12:01, Zdenek Kabelac wrote:
>>
>>
>> As said - please try with HEAD - and report back if you still see a
>> problem.
>> There were a couple of issues fixed along this path.
>>
>
> Ok, I tried now with tools and library from git:
>
> LVM version:     2.02.169(2)-git (2016-11-30)
> Library version: 1.02.138-git (2016-11-30)
> Driver version:  4.34.0
>
> I can confirm that the thin chunk size is no longer bound (by default) to the MD
> RAID chunk size. For example, having created a ~500 GB MD RAID 10 array with 64 KB
> chunks, creating a thinpool shows this:
>
> [root@blackhole ~]# lvcreate --thinpool vg_kvm/thinpool -L 500G
> [root@blackhole ~]# lvs -a -o +chunk_size
>   WARNING: Failed to connect to lvmetad. Falling back to device scanning.
>   LV               VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
>   [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                    0
>   thinpool         vg_kvm    twi-a-tz-- 500.00g             0.00   0.42                            256.00k
>   [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                    0
>   [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                    0
>   root             vg_system -wi-ao----  50.00g                                                    0
>   swap             vg_system -wi-a-----   7.62g
>
> Should I open a bug against the RHEL-provided packages?


Well, if you want to get support for your existing packages, you would
need to go via the 'GSS' channel.

You may open a BZ - which will get closed with the next release, RHEL 7.4
(as you already confirmed that upstream has resolved the issue).

Zdenek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [linux-lvm] Possible bug in thin metadata size with Linux MDRAID
  2017-03-20 13:57                   ` Zdenek Kabelac
@ 2017-03-20 14:25                     ` Gionatan Danti
  0 siblings, 0 replies; 12+ messages in thread
From: Gionatan Danti @ 2017-03-20 14:25 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development


On 20/03/2017 14:57, Zdenek Kabelac wrote:
>
>
> Well if you want to get support for your existing packages - you would
> need to go via 'GSS' channel.
>

Sorry, but what do you mean by "GSS channel"?

> You may open a BZ - which will get closed with the next release, RHEL 7.4
> (as you already confirmed that upstream has resolved the issue).
>
> Zdenek
>

I'll surely do that.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-03-20 14:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-08 16:14 [linux-lvm] Possible bug in thin metadata size with Linux MDRAID Gionatan Danti
2017-03-08 18:55 ` Zdenek Kabelac
2017-03-09 11:24   ` Gionatan Danti
2017-03-09 11:53     ` Zdenek Kabelac
2017-03-09 15:33       ` Gionatan Danti
2017-03-20  9:47         ` Gionatan Danti
2017-03-20  9:51           ` Zdenek Kabelac
2017-03-20 10:45             ` Gionatan Danti
2017-03-20 11:01               ` Zdenek Kabelac
2017-03-20 11:52                 ` Gionatan Danti
2017-03-20 13:57                   ` Zdenek Kabelac
2017-03-20 14:25                     ` Gionatan Danti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).