linux-lvm.redhat.com archive mirror
* [linux-lvm] Lvm thin provisioning query
@ 2016-04-27 12:33 Bhasker C V
  2016-04-27 14:33 ` Zdenek Kabelac
  0 siblings, 1 reply; 11+ messages in thread
From: Bhasker C V @ 2016-04-27 12:33 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 3038 bytes --]

Hi,

 I am starting to investigate LVM thin provisioning
 (repeat post from
https://lists.debian.org/debian-user/2016/04/msg00852.html )
 (apologies for the HTML mail)

 I have done the following

1. Create a PV
vdb    252:16   0   10G  0 disk
├─vdb1 252:17   0  100M  0 part
└─vdb2 252:18   0  9.9G  0 part
root@vmm-deb:~# pvcreate /dev/vdb1
  Physical volume "/dev/vdb1" successfully created.
root@vmm-deb:~# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/vdb1       lvm2 ---  100.00m 100.00m

2. Create a VG
root@vmm-deb:~# vgcreate virtp /dev/vdb1
  Volume group "virtp" successfully created
root@vmm-deb:~# vgs
  VG    #PV #LV #SN Attr   VSize  VFree
  virtp   1   0   0 wz--n- 96.00m 96.00m

3. Create an LV pool and an over-provisioned volume inside it
root@vmm-deb:~# lvcreate -n virtpool -T virtp/virtpool -L40M
  Logical volume "virtpool" created.
root@vmm-deb:~# lvs
  LV       VG    Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  virtpool virtp twi-a-tz-- 40.00m             0.00   0.88

root@vmm-deb:~# lvcreate  -V1G -T virtp/virtpool -n vol01
  WARNING: Sum of all thin volume sizes (1.00 GiB) exceeds the size of thin
pool virtp/virtpool and the size of whole volume group (96.00 MiB)!
  For thin pool auto extension activation/thin_pool_autoextend_threshold
should be below 100.
  Logical volume "vol01" created.
root@vmm-deb:~# lvs
  LV       VG    Attr       LSize  Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert
  virtpool virtp twi-aotz-- 40.00m                 0.00   0.98
  vol01    virtp Vwi-a-tz--  1.00g virtpool        0.00


---------- Now the operations
# dd if=/dev/urandom of=./fil status=progress
90532864 bytes (91 MB, 86 MiB) copied, 6.00005 s, 15.1 MB/s^C
188706+0 records in
188705+0 records out
96616960 bytes (97 MB, 92 MiB) copied, 6.42704 s, 15.0 MB/s

# df -h .
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/virtp-vol01  976M   95M  815M  11% /tmp/x
# sync
# cd ..
root@vmm-deb:/tmp# umount x
root@vmm-deb:/tmp# fsck.ext4 -f -C0  /dev/virtp/vol01
e2fsck 1.43-WIP (15-Mar-2016)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/virtp/vol01: 12/65536 files (8.3% non-contiguous), 36544/262144 blocks


<mount>
# du -hs fil
93M    fil

# dd if=./fil of=/dev/null status=progress
188705+0 records in
188705+0 records out
96616960 bytes (97 MB, 92 MiB) copied, 0.149194 s, 648 MB/s


# vgs
  VG    #PV #LV #SN Attr   VSize  VFree
  virtp   1   2   0 wz--n- 96.00m 48.00m

Definitely the file is occupying 90+ MB.

What I expect is that, since the pool is 40M, the file must NOT exceed 40M.
Where does the file get 93M of space?
I know the VG is 96M, but the pool created was at most 40M (the VG also still says
48M free). Is the file exceeding its boundaries,
or am I doing something wrong?

[-- Attachment #2: Type: text/html, Size: 3705 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-04-27 12:33 [linux-lvm] Lvm thin provisioning query Bhasker C V
@ 2016-04-27 14:33 ` Zdenek Kabelac
  2016-04-28 14:36   ` Bhasker C V
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kabelac @ 2016-04-27 14:33 UTC (permalink / raw)
  To: linux-lvm

On 27.4.2016 14:33, Bhasker C V wrote:
> Hi,
>
>   I am starting with investigating about the lvm thin provisioning
>   (repeat post from https://lists.debian.org/debian-user/2016/04/msg00852.html )
>   (apologies for html mail)
>
>   I have done the following
>
> 1.Create a PV
> vdb    252:16   0   10G  0 disk
> ├─vdb1 252:17   0  100M  0 part
> └─vdb2 252:18   0  9.9G  0 part
> root@vmm-deb:~# pvcreate /dev/vdb1
>    Physical volume "/dev/vdb1" successfully created.
> root@vmm-deb:~# pvs
>    PV         VG   Fmt  Attr PSize   PFree
>    /dev/vdb1       lvm2 ---  100.00m 100.00m
>
> 2. create a VG
> root@vmm-deb:~# vgcreate virtp /dev/vdb1
>    Volume group "virtp" successfully created
> root@vmm-deb:~# vgs
>    VG    #PV #LV #SN Attr   VSize  VFree
>    virtp   1   0   0 wz--n- 96.00m 96.00m
>
> 3. create a lv pool  and a over-provisioned volume inside it
> root@vmm-deb:~# lvcreate -n virtpool -T virtp/virtpool -L40M
>    Logical volume "virtpool" created.
> root@vmm-deb:~# lvs
>    LV       VG    Attr       LSize  Pool Origin Data%  Meta%  Move Log
> Cpy%Sync Convert
>    virtpool virtp twi-a-tz-- 40.00m             0.00   0.88
> root@vmm-deb:~# lvcreate  -V1G -T virtp/virtpool -n vol01
>    WARNING: Sum of all thin volume sizes (1.00 GiB) exceeds the size of thin
> pool virtp/virtpool and the size of whole volume group (96.00 MiB)!
>    For thin pool auto extension activation/thin_pool_autoextend_threshold
> should be below 100.
>    Logical volume "vol01" created.
> root@vmm-deb:~# lvs
>    LV       VG    Attr       LSize  Pool     Origin Data%  Meta%  Move Log
> Cpy%Sync Convert
>    virtpool virtp twi-aotz-- 40.00m                 0.00   0.98
>    vol01    virtp Vwi-a-tz--  1.00g virtpool        0.00
>
> ---------- Now the operations
> # dd if=/dev/urandom of=./fil status=progress
> 90532864 bytes (91 MB, 86 MiB) copied, 6.00005 s, 15.1 MB/s^C
> 188706+0 records in
> 188705+0 records out
> 96616960 bytes (97 MB, 92 MiB) copied, 6.42704 s, 15.0 MB/s
>
> # df -h .
> Filesystem               Size  Used Avail Use% Mounted on
> /dev/mapper/virtp-vol01  976M   95M  815M  11% /tmp/x
> # sync
> # cd ..
> root@vmm-deb:/tmp# umount x
> root@vmm-deb:/tmp# fsck.ext4 -f -C0  /dev/virtp/vol01
> e2fsck 1.43-WIP (15-Mar-2016)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/virtp/vol01: 12/65536 files (8.3% non-contiguous), 36544/262144 blocks
>
> <mount>
> # du -hs fil
> 93M    fil
>
> # dd if=./fil of=/dev/null status=progress
> 188705+0 records in
> 188705+0 records out
> 96616960 bytes (97 MB, 92 MiB) copied, 0.149194 s, 648 MB/s
>
>
> # vgs
>    VG    #PV #LV #SN Attr   VSize  VFree
>    virtp   1   2   0 wz--n- 96.00m 48.00m
>
> Definitely the file is occupying 90+ MB.
>
> What i expect is that the pool is 40M and the file must NOT exceed 40M. Where
> does the file get 93M space ?
> I know the VG is 96M but the pool created was max 40M (also VG still says 48M
> free). Is the file exceeding the boundaries ?
> or am I doing anything wrong ?
>


Hi

The answer is simple ->  nowhere - they are simply lost - check your kernel dmesg 
log and you will spot a lot of lost async page write errors.
(page cache is tricky here...    - dd ends up just in the page cache, which is later 
asynchronously synced to disk)

There is also a 60s delay before the thin-pool target starts to error all queued 
write operations when there is not enough space in the pool.

So whenever you write something and you want to be 100% 'sure' it landed on 
disk, you have to 'sync' your writes.

i.e.
dd if=/dev/urandom of=./fil status=progress conv=fsync

and if you want to know 'exactly' where the error happens -

dd if=/dev/urandom of=./fil status=progress oflag=direct
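
(Editorial sketch, not part of the original mail: one way to confirm afterwards what actually reached the pool. The file path, mount point and VG name simply reuse the example earlier in this thread.)

    # dd if=/dev/urandom of=/tmp/x/fil bs=1M status=progress conv=fsync
    # dmesg | tail                                          <- shows the lost async page write errors
    # lvs -o lv_name,data_percent,metadata_percent virtp    <- how full the pool really is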

Regards

Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-04-27 14:33 ` Zdenek Kabelac
@ 2016-04-28 14:36   ` Bhasker C V
  2016-04-29  8:13     ` Zdenek Kabelac
  0 siblings, 1 reply; 11+ messages in thread
From: Bhasker C V @ 2016-04-28 14:36 UTC (permalink / raw)
  To: LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 4893 bytes --]

Zdenek,
 Thanks. Here I am just filling the volume with random data, so I am not
concerned about data integrity.
 You are right, I did get lost-page-write errors in the kernel log.

The question, however, is that even after a reboot and several fsck runs on the
ext4 fs, the file size "occupied" is more than the pool size. How is this?
I agree that the data may be corrupted, but there *is* some data and it must
be saved somewhere. Why does this "somewhere" exceed the pool size?


On Wed, Apr 27, 2016 at 4:33 PM, Zdenek Kabelac <zkabelac@redhat.com> wrote:

> On 27.4.2016 14:33, Bhasker C V wrote:
>
>> Hi,
>>
>>   I am starting with investigating about the lvm thin provisioning
>>   (repeat post from
>> https://lists.debian.org/debian-user/2016/04/msg00852.html )
>>   (apologies for html mail)
>>
>>   I have done the following
>>
>> 1.Create a PV
>> vdb    252:16   0   10G  0 disk
>> ├─vdb1 252:17   0  100M  0 part
>> └─vdb2 252:18   0  9.9G  0 part
>> root@vmm-deb:~# pvcreate /dev/vdb1
>>    Physical volume "/dev/vdb1" successfully created.
>> root@vmm-deb:~# pvs
>>    PV         VG   Fmt  Attr PSize   PFree
>>    /dev/vdb1       lvm2 ---  100.00m 100.00m
>>
>> 2. create a VG
>> root@vmm-deb:~# vgcreate virtp /dev/vdb1
>>    Volume group "virtp" successfully created
>> root@vmm-deb:~# vgs
>>    VG    #PV #LV #SN Attr   VSize  VFree
>>    virtp   1   0   0 wz--n- 96.00m 96.00m
>>
>> 3. create a lv pool  and a over-provisioned volume inside it
>> root@vmm-deb:~# lvcreate -n virtpool -T virtp/virtpool -L40M
>>    Logical volume "virtpool" created.
>> root@vmm-deb:~# lvs
>>    LV       VG    Attr       LSize  Pool Origin Data%  Meta%  Move Log
>> Cpy%Sync Convert
>>    virtpool virtp twi-a-tz-- 40.00m             0.00   0.88
>> root@vmm-deb:~# lvcreate  -V1G -T virtp/virtpool -n vol01
>>    WARNING: Sum of all thin volume sizes (1.00 GiB) exceeds the size of
>> thin
>> pool virtp/virtpool and the size of whole volume group (96.00 MiB)!
>>    For thin pool auto extension activation/thin_pool_autoextend_threshold
>> should be below 100.
>>    Logical volume "vol01" created.
>> root@vmm-deb:~# lvs
>>    LV       VG    Attr       LSize  Pool     Origin Data%  Meta%  Move Log
>> Cpy%Sync Convert
>>    virtpool virtp twi-aotz-- 40.00m                 0.00   0.98
>>    vol01    virtp Vwi-a-tz--  1.00g virtpool        0.00
>>
>> ---------- Now the operations
>> # dd if=/dev/urandom of=./fil status=progress
>> 90532864 bytes (91 MB, 86 MiB) copied, 6.00005 s, 15.1 MB/s^C
>> 188706+0 records in
>> 188705+0 records out
>> 96616960 bytes (97 MB, 92 MiB) copied, 6.42704 s, 15.0 MB/s
>>
>> # df -h .
>> Filesystem               Size  Used Avail Use% Mounted on
>> /dev/mapper/virtp-vol01  976M   95M  815M  11% /tmp/x
>> # sync
>> # cd ..
>> root@vmm-deb:/tmp# umount x
>> root@vmm-deb:/tmp# fsck.ext4 -f -C0  /dev/virtp/vol01
>> e2fsck 1.43-WIP (15-Mar-2016)
>> Pass 1: Checking inodes, blocks, and sizes
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>> /dev/virtp/vol01: 12/65536 files (8.3% non-contiguous), 36544/262144
>> blocks
>>
>> <mount>
>> # du -hs fil
>> 93M    fil
>>
>> # dd if=./fil of=/dev/null status=progress
>> 188705+0 records in
>> 188705+0 records out
>> 96616960 bytes (97 MB, 92 MiB) copied, 0.149194 s, 648 MB/s
>>
>>
>> # vgs
>>    VG    #PV #LV #SN Attr   VSize  VFree
>>    virtp   1   2   0 wz--n- 96.00m 48.00m
>>
>> Definitely the file is occupying 90+ MB.
>>
>> What i expect is that the pool is 40M and the file must NOT exceed 40M.
>> Where
>> does the file get 93M space ?
>> I know the VG is 96M but the pool created was max 40M (also VG still says
>> 48M
>> free). Is the file exceeding the boundaries ?
>> or am I doing anything wrong ?
>>
>>
>
> Hi
>
> Answer is simple ->  nowhere - they are simply lost - check your kernel
> dmesg log - you will spot lost of async write error.
> (page cache is tricky here...    - dd ends just in page-cache which is
> later asynchronously sync to disk)
>
> There is also 60s delay before thin-pool target starts to error all queued
> write operations if there is not enough space in pool.
>
> So whenever you write something and you want to be 100% 'sure' it landed
> on disk you have to 'sync'  your writes.
>
> i.e.
> dd if=/dev/urandom of=./fil status=progress  conv=fsync
>
> and if you want to know 'exactly' what's the error place -
>
> dd if=/dev/urandom of=./fil status=progress oflag=direct
>
> Regards
>
> Zdenek
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

[-- Attachment #2: Type: text/html, Size: 6147 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-04-28 14:36   ` Bhasker C V
@ 2016-04-29  8:13     ` Zdenek Kabelac
  2016-05-03  6:59       ` Bhasker C V
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kabelac @ 2016-04-29  8:13 UTC (permalink / raw)
  To: LVM general discussion and development

On 28.4.2016 16:36, Bhasker C V wrote:
> Zdenek,
>   Thanks. Here I am just filling it up with random data and so I am not
> concerned about data integrity
>   You are right, I did get page lost during write errors in the kernel
>
> The question however is even after reboot and doing several fsck of the ext4fs
> the file size "occupied" is more than the pool size. How is this ?
> I agree that data may be corrupted, but there *is* some data and this must be
> saved somewhere. Why is this "somewhere" exceeding the pool size ?

Hi

A few key principles -


1. You should always mount an extX fs with  errors=remount-ro  (tune2fs, mount;
    a minimal sketch follows after this list)

2. There are a few data={} modes ensuring various degrees of data integrity.
    In case you really care about data integrity here - switch to 'journal'
    mode at the price of lower speed. The default 'ordered' mode may show this behavior.
    (i.e. it's the very same behavior as you would have seen with a failing hdd)

3. Do not continue using a thin-pool when it's full :)

4. We do still miss more configurable policies for thin-pools,
    i.e. we do plan to instantiate an 'error' target for writes in the case the
    pool gets full - so ALL writes will be errored - as of now - writes
    to already-provisioned blocks may cause further filesystem confusion - that's
    why  'remount-ro' is rather mandatory - xfs is currently being enhanced
    to provide similar logic.
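
(Editorial sketch of points 1 and 2 above - the device and mount point names just reuse the earlier example from this thread; data=journal is optional and trades speed for integrity.)

    # tune2fs -e remount-ro /dev/virtp/vol01
    # mount -o errors=remount-ro,data=journal /dev/virtp/vol01 /tmp/x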


Regards


Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-04-29  8:13     ` Zdenek Kabelac
@ 2016-05-03  6:59       ` Bhasker C V
  2016-05-03  9:54         ` Zdenek Kabelac
  0 siblings, 1 reply; 11+ messages in thread
From: Bhasker C V @ 2016-05-03  6:59 UTC (permalink / raw)
  To: LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 2116 bytes --]

Does this mean that ext4 is showing wrong information? The file is reported
as being 90+ MB, but in actuality the size in the FS is less?
This is quite OK, because then it is just that filesystem being affected. I was,
however, concerned that the file in this FS might have overwritten other LV
data, since the file shows as bigger than the volume size.

I will try this using BTRFS.


On Fri, Apr 29, 2016 at 10:13 AM, Zdenek Kabelac <zkabelac@redhat.com>
wrote:

> On 28.4.2016 16:36, Bhasker C V wrote:
>
>> Zdenek,
>>   Thanks. Here I am just filling it up with random data and so I am not
>> concerned about data integrity
>>   You are right, I did get page lost during write errors in the kernel
>>
>> The question however is even after reboot and doing several fsck of the
>> ext4fs
>> the file size "occupied" is more than the pool size. How is this ?
>> I agree that data may be corrupted, but there *is* some data and this
>> must be
>> saved somewhere. Why is this "somewhere" exceeding the pool size ?
>>
>
> Hi
>
> Few key principles -
>
>
> 1. You should always mount extX fs with  errors=remount-ro  (tune2fs,mount)
>
> 2. There are few data={} modes ensuring various degree of data integrity,
>    An case you really care about data integrity here - switch to 'journal'
>    mode at price of lower speed. Default ordered mode might show this.
>    (i.e. it's the very same behavior as you would have seen with failing
> hdd)
>
> 3. Do not continue using thin-pool when it's full :)
>
> 4. We do miss more configurable policies with thin-pools.
>    i.e. do plan to instantiate 'error' target for writes in the case
>    pool gets full - so ALL writes will be errored - as of now - writes
>    to provisioned blocks may cause further filesystem confusion - that's
>    why  'remount-ro' is rather mandatory - xfs is recently being enhanced
>    to provide similar logic.
>
>
>
> Regards
>
>
> Zdenek
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>

[-- Attachment #2: Type: text/html, Size: 3088 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-05-03  6:59       ` Bhasker C V
@ 2016-05-03  9:54         ` Zdenek Kabelac
  2016-05-03 12:21           ` Bhasker C V
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kabelac @ 2016-05-03  9:54 UTC (permalink / raw)
  To: LVM general discussion and development

On 3.5.2016 08:59, Bhasker C V wrote:
> Does this mean the ext4 is showing wrong information. The file is reported
> being 90+MB but in actuality the size is less in the FS ?
> This is quite ok because it is just that file system being affected. I was
> however concerned that the file in this FS might have overwritten other LV
> data since the file is showing bigger than the volume size.
>

I've no idea what 'ext4' is showing you, but even if you have e.g. a 100M filesystem, 
you could still have a 1TB file on it. Experience the magic:

'truncate -s 1T myfirst1TBfile'

As you can see, 'ext4' does its own over-provisioning with 'hole' (sparse) files.
The only important bits are:
- is the filesystem consistent ?
- is 'fsck' not reporting any error ?

What's the 'real' size you get with 'du myfirst1TBfile', or with your suspect file ?

Somehow I don't believe you can get e.g. a 90+MB 'du' size with a 10MB 
filesystem size and have 'fsck' not report any problem.
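
(Editorial illustration of the sparse-file point above, using the file name from the example; 'ls' reports the apparent size while 'du' reports the blocks actually allocated.)

    # truncate -s 1T myfirst1TBfile
    # ls -lh myfirst1TBfile      <- apparent size: 1.0T
    # du -h myfirst1TBfile       <- allocated blocks: 0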

> I will try this using BTRFS.

For what exactly ??

Regards

Zdenek




>
>
> On Fri, Apr 29, 2016 at 10:13 AM, Zdenek Kabelac <zkabelac@redhat.com
> <mailto:zkabelac@redhat.com>> wrote:
>
>     On 28.4.2016 16:36, Bhasker C V wrote:
>
>         Zdenek,
>            Thanks. Here I am just filling it up with random data and so I am not
>         concerned about data integrity
>            You are right, I did get page lost during write errors in the kernel
>
>         The question however is even after reboot and doing several fsck of
>         the ext4fs
>         the file size "occupied" is more than the pool size. How is this ?
>         I agree that data may be corrupted, but there *is* some data and this
>         must be
>         saved somewhere. Why is this "somewhere" exceeding the pool size ?
>
>
>     Hi
>
>     Few key principles -
>
>
>     1. You should always mount extX fs with  errors=remount-ro  (tune2fs,mount)
>
>     2. There are few data={} modes ensuring various degree of data integrity,
>         An case you really care about data integrity here - switch to 'journal'
>         mode at price of lower speed. Default ordered mode might show this.
>         (i.e. it's the very same behavior as you would have seen with failing hdd)
>
>     3. Do not continue using thin-pool when it's full :)
>
>     4. We do miss more configurable policies with thin-pools.
>         i.e. do plan to instantiate 'error' target for writes in the case
>         pool gets full - so ALL writes will be errored - as of now - writes
>         to provisioned blocks may cause further filesystem confusion - that's
>         why  'remount-ro' is rather mandatory - xfs is recently being enhanced
>         to provide similar logic.
>
>
>
>     Regards
>
>
>     Zdenek
>
>     _______________________________________________
>     linux-lvm mailing list
>     linux-lvm@redhat.com <mailto:linux-lvm@redhat.com>
>     https://www.redhat.com/mailman/listinfo/linux-lvm
>     read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>
>
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-05-03  9:54         ` Zdenek Kabelac
@ 2016-05-03 12:21           ` Bhasker C V
  2016-05-03 14:49             ` Zdenek Kabelac
  0 siblings, 1 reply; 11+ messages in thread
From: Bhasker C V @ 2016-05-03 12:21 UTC (permalink / raw)
  To: LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 4642 bytes --]

Here are the answers to your questions

1. fsck does not report any error, and the file contained inside the FS is
definitely greater than the allocatable pool size
# fsck.ext4 -f -C0 /dev/virtp/vol01
e2fsck 1.43-WIP (15-Mar-2016)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/virtp/vol01: 12/65536 files (8.3% non-contiguous), 30492/262144 blocks

2. Size of the file

# du -hs fil
69M     fil

(Please note here that the LV's virtual size is 1G but the parent pool size
is just 40M; I expect the file not to exceed 40M at any cost.)

3. lvs
# lvs
  LV       VG    Attr       LSize  Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert
  virtpool virtp twi-aotzD- 40.00m                 100.00 1.37
  vol01    virtp Vwi-aotz--  1.00g virtpool


You can reproduce this on any virtual machine; I use qemu with a virtio back-end.


On Tue, May 3, 2016 at 11:54 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:

> On 3.5.2016 08:59, Bhasker C V wrote:
>
>> Does this mean the ext4 is showing wrong information. The file is reported
>> being 90+MB but in actuality the size is less in the FS ?
>> This is quite ok because it is just that file system being affected. I was
>> however concerned that the file in this FS might have overwritten other LV
>> data since the file is showing bigger than the volume size.
>>
>>
> I've no idea what 'ext4' is showing you, but if you have i.e. 100M
> filesystem size, you could still have there e.g. 1TB file. Experience the
> magic:
>
> 'truncate -s 1T myfirst1TBfile'
>
> As you can see 'ext4' is doing it's own over-provisioning with 'hole'
> files.
> The only important bits are:
> - is the filesystem consistent ?
> - is 'fsck' not reporting any error ?
>
> What's the 'real' size you get with 'du  myfirst1TBfile' or your wrong
> file ?
>
> Somehow I don't believe you can get  i.e.  90+MB 'du' size with 10MB
> filesystem size and 'fsck' would not report any problem.
>
> I will try this using BTRFS.
>>
>
> For what exactly ??
>
> Regard
>
> Zdenek
>
>
>
>
>
>>
>> On Fri, Apr 29, 2016 at 10:13 AM, Zdenek Kabelac <zkabelac@redhat.com
>> <mailto:zkabelac@redhat.com>> wrote:
>>
>>     On 28.4.2016 16:36, Bhasker C V wrote:
>>
>>         Zdenek,
>>            Thanks. Here I am just filling it up with random data and so I
>> am not
>>         concerned about data integrity
>>            You are right, I did get page lost during write errors in the
>> kernel
>>
>>         The question however is even after reboot and doing several fsck
>> of
>>         the ext4fs
>>         the file size "occupied" is more than the pool size. How is this ?
>>         I agree that data may be corrupted, but there *is* some data and
>> this
>>         must be
>>         saved somewhere. Why is this "somewhere" exceeding the pool size ?
>>
>>
>>     Hi
>>
>>     Few key principles -
>>
>>
>>     1. You should always mount extX fs with  errors=remount-ro
>> (tune2fs,mount)
>>
>>     2. There are few data={} modes ensuring various degree of data
>> integrity,
>>         An case you really care about data integrity here - switch to
>> 'journal'
>>         mode at price of lower speed. Default ordered mode might show
>> this.
>>         (i.e. it's the very same behavior as you would have seen with
>> failing hdd)
>>
>>     3. Do not continue using thin-pool when it's full :)
>>
>>     4. We do miss more configurable policies with thin-pools.
>>         i.e. do plan to instantiate 'error' target for writes in the case
>>         pool gets full - so ALL writes will be errored - as of now -
>> writes
>>         to provisioned blocks may cause further filesystem confusion -
>> that's
>>         why  'remount-ro' is rather mandatory - xfs is recently being
>> enhanced
>>         to provide similar logic.
>>
>>
>>
>>     Regards
>>
>>
>>     Zdenek
>>
>>     _______________________________________________
>>     linux-lvm mailing list
>>     linux-lvm@redhat.com <mailto:linux-lvm@redhat.com>
>>     https://www.redhat.com/mailman/listinfo/linux-lvm
>>     read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>>
>>
>>
>>
>> _______________________________________________
>> linux-lvm mailing list
>> linux-lvm@redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-lvm
>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>>
>>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>

[-- Attachment #2: Type: text/html, Size: 7354 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-05-03 12:21           ` Bhasker C V
@ 2016-05-03 14:49             ` Zdenek Kabelac
  2016-05-03 15:51               ` Xen
  2016-05-03 17:07               ` Gionatan Danti
  0 siblings, 2 replies; 11+ messages in thread
From: Zdenek Kabelac @ 2016-05-03 14:49 UTC (permalink / raw)
  To: LVM general discussion and development

On 3.5.2016 14:21, Bhasker C V wrote:
> Here are the answers to your questions
>
> 1. fsck does not report any error and the file contained inside the FS is
> definitely greater than the allocatable LV size
> # fsck.ext4 -f -C0 /dev/virtp/vol01
> e2fsck 1.43-WIP (15-Mar-2016)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/virtp/vol01: 12/65536 files (8.3% non-contiguous), 30492/262144 blocks
>
> 2. Size of the file
>
> # du -hs fil
> 69M     fil
>
> (please note here that the LV virtual size is 1G but the parent pool size is
> just 40M I expect the file not to exceed 40M at any cost.)
>
> 3. lvs
> # lvs
>    LV       VG    Attr       LSize  Pool     Origin Data%  Meta%  Move Log
> Cpy%Sync Convert
>    virtpool virtp twi-aotzD- 40.00m                 100.00 1.37
>    vol01    virtp Vwi-aotz--  1.00g virtpool
>
>
> You can do this on any virtual machine. I use qemu with virtio back-end.

But this is a VERY different case.

Your filesystem IS 1GB in size, and ext4 provisions nearly all of its 'metadata' 
during the first mount.

So the thin-pool usually has all of the filesystem's metadata space 'available' for 
updating, and if you use the mount option  data=ordered  (the default) - it 
happens that a 'write' to provisioned space is OK, while a write to 'data' space
ends up as a lost async page.

And this all depends on how you are willing to write your data.

Basically, if you use the page cache and ignore 'fdatasync()', you NEVER know what 
has been stored on disk (you are basically living in a dreamworld).
(i.e. closing your program/file descriptor DOES NOT flush)

When the thin-pool gets full and you have not managed to resize your data LV 
in time, various things may go wrong - this is fuzzy, tricky land.
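
(Editorial sketch of one way to avoid reaching that state, assuming lvm2's dmeventd monitoring is in use; the threshold numbers are only examples, and autoextension still needs free space in the VG.)

excerpt from /etc/lvm/lvm.conf:
    activation {
        thin_pool_autoextend_threshold = 70   # start autoextending at 70% data usage
        thin_pool_autoextend_percent   = 20   # grow the pool data LV by 20% each time
    }

or extend the pool by hand, reusing the names from this thread:
    # lvextend -L+40M virtp/virtpool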

Now, a few people (me included) believe a thin volume should error 'ANY' further 
write once there has been an overprovisioning error on the device, and I'm afraid this 
cannot be solved anywhere else than in the target driver.
ATM a thin volume puts the filesystem into a very complex situation which does 
not have a 'winning' scenario in a number of cases - so we need to define a number 
of policies.

BUT ATM we clearly communicate that when you run OUT of thin-pool space
it's a serious ADMIN failure - and we can only try to limit the damage.

An overfull thin-pool CANNOT be compared to writing to a full filesystem,
and there is absolutely no guarantee about the content of non-flushed files!

Expecting that you can run out of space in a thin-pool and nothing bad happens is 
naive ATM - we are cooperating at least with the XFS/ext4 developers to solve some 
corner cases, but there is still a lot of work to do, as we exercise quite 
unusual error paths for them.


Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-05-03 14:49             ` Zdenek Kabelac
@ 2016-05-03 15:51               ` Xen
  2016-05-03 16:27                 ` Zdenek Kabelac
  2016-05-03 17:07               ` Gionatan Danti
  1 sibling, 1 reply; 11+ messages in thread
From: Xen @ 2016-05-03 15:51 UTC (permalink / raw)
  To: LVM general discussion and development

Zdenek Kabelac schreef op 03-05-2016 16:49:

> Expecting you run out-of-space in thin-pool and nothing bad can
> happens is naive ATM - we are cooperating at least with XFS/ext4
> developers to solve some corner case, but there is still a lot of work
> to do as we exercise quite unusual error paths for them.

You also talked about seeing if you could have these filesystems work 
more in alignment with block (extent) boundaries, right?

I mean something that agrees more with allocation requests, so to speak.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-05-03 15:51               ` Xen
@ 2016-05-03 16:27                 ` Zdenek Kabelac
  0 siblings, 0 replies; 11+ messages in thread
From: Zdenek Kabelac @ 2016-05-03 16:27 UTC (permalink / raw)
  To: LVM general discussion and development

On 3.5.2016 17:51, Xen wrote:
> Zdenek Kabelac schreef op 03-05-2016 16:49:
>
>> Expecting you run out-of-space in thin-pool and nothing bad can
>> happens is naive ATM - we are cooperating at least with XFS/ext4
>> developers to solve some corner case, but there is still a lot of work
>> to do as we exercise quite unusual error paths for them.
>
> You also talked about seeing if you could have these filesystems work more in
> alignment with block (extent) boundaries, right?

Yes, it's mostly about 'space' efficiency.

i.e. it's inefficient to provision 1M thin-pool chunks when the filesystem then
uses just 1/2 of each provisioned chunk and allocates the next one.
The smaller the chunk, the better the space efficiency gets (also needed with 
snapshots), but small chunks may need lots of metadata and may cause fragmentation troubles.

ATM a thin-pool supports a single chunk size - so again it is up to the admin to pick the 
right one for their needs.

For read/write alignment, the physical geometry is still the limiting factor.
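
(Editorial sketch - the chunk size is chosen when the pool is created; 64k is only an example value, and the names reuse the thread's earlier setup.)

    # lvcreate -L40M --chunksize 64k -T virtp/virtpool
    # lvs -o lv_name,chunk_size virtp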


Zdenek

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [linux-lvm] Lvm thin provisioning query
  2016-05-03 14:49             ` Zdenek Kabelac
  2016-05-03 15:51               ` Xen
@ 2016-05-03 17:07               ` Gionatan Danti
  1 sibling, 0 replies; 11+ messages in thread
From: Gionatan Danti @ 2016-05-03 17:07 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Zdenek Kabelac

Il 03-05-2016 16:49 Zdenek Kabelac ha scritto:
> 
> Now few people (me included) believe  thin volume should error 'ANY'
> further write when there was an overprovisioning error on a device and
> I'm afraid this can't be solved elsewhere then in target driver.
> ATM this thin volume puts filesystem into very complex situation which
> does not have 'winning' scenario in number of cases - so we need to
> define number of policies.
> 
> BUT ATM we clearly communicate - when you run OUT of thin-pool space
> it's serious ADMIN failure - and we could only try to lower damage.
> 
> Thin-pool overfull CANNOT be compared to writing to a full filesystem
> and there is absolutely no guarantee about content of non-flushed 
> files!

True, but non-synced writes should always be treated as "this item can 
be lost if the power disappears / the system crashes" anyway. On the other 
hand, (f)synced writes should already fail immediately if no space can 
be allocated from the storage subsystem.

In other words, even with a full data pool, filesystem integrity by 
itself should be guaranteed (both by journaling and by fsync), while 
non-flushed writes "maybe" survive (if the required data segment was 
*already* allocated, the write completes; otherwise it fails as a lost 
async page).
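
(Editorial sketch of a way one might observe that distinction on a scratch thin volume; the path reuses the thread's mount point, and the error may only surface after the 60s queueing window mentioned earlier.)

    # dd if=/dev/zero of=/tmp/x/buffered bs=1M count=64                   <- may appear to succeed; errors show up later in dmesg
    # dd if=/dev/zero of=/tmp/x/synced bs=1M count=64 oflag=direct,dsync  <- returns an I/O error to dd once the pool cannot provision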

For a full tmeta, things are much worse, as it sometimes requires 
thin_repair. (PS: if you have two free minutes, please see my other 
email regarding a full tmeta. Thanks in advance.)

This is my current understanding; please correct me if I am wrong!

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-05-03 17:07 UTC | newest]

Thread overview: 11+ messages
2016-04-27 12:33 [linux-lvm] Lvm thin provisioning query Bhasker C V
2016-04-27 14:33 ` Zdenek Kabelac
2016-04-28 14:36   ` Bhasker C V
2016-04-29  8:13     ` Zdenek Kabelac
2016-05-03  6:59       ` Bhasker C V
2016-05-03  9:54         ` Zdenek Kabelac
2016-05-03 12:21           ` Bhasker C V
2016-05-03 14:49             ` Zdenek Kabelac
2016-05-03 15:51               ` Xen
2016-05-03 16:27                 ` Zdenek Kabelac
2016-05-03 17:07               ` Gionatan Danti
