* Re: [linux-lvm] LVM thin LV filesystem superblock corruption
From: Andres Toomsalu @ 2013-03-22 15:12 UTC (permalink / raw)
To: LVM general discussion and development
Update! The issue seems to occur only with the PERC H800 and MD1200 disks - local RAID on a PERC H700 with LVM thin LVs works fine and does not corrupt on reboot.
We stumbled on a strange filesystem corruption case with an LVM thinly provisioned LV - here are the steps that reproduce the issue:
lvcreate --thinpool pool -L 8T --poolmetadatasize 16G VolGroupL1
lvcreate -T VolGroupL1/pool -V 2T --name thin_storage
mkfs.ext4 /dev/VolGroupL1/thin_storage
mount /dev/VolGroupL1/thin_storage /storage/
reboot
# NB! without host reboot unmount/mount succeeds!
[root@node3 ~]# mount /dev/VolGroupL1/thin_storage /storage/
mount: you must specify the filesystem type
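To confirm the superblock is really gone (and not just a type misdetection), we can check it directly - a sketch with standard tools; the primary ext4 superblock sits at byte offset 1024:
[root@node3 ~]# blkid /dev/VolGroupL1/thin_storage        # no output = no recognizable fs signature
[root@node3 ~]# dumpe2fs -h /dev/VolGroupL1/thin_storage  # fails if the superblock is unreadable
[root@node3 ~]# dd if=/dev/VolGroupL1/thin_storage bs=1024 skip=1 count=1 2>/dev/null | hexdump -C
# all zeroes above would suggest the thin block holding the superblock came back unprovisioned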
We also tried setting poolmetadatasize to 2G, 14G, and 15G, and the pool size to 1T and 2T - no change; the corruption still happens.
Hardware setup:
* The underlying block device (sdb) is hosted by a PERC H800 controller, and the disks come from a SAS disk expansion box (DELL MD1200).
Some debug info:
[root@node3 ~]# lvs
  LV           VG         Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  lv_root      VolGroup   -wi-ao--  50.00g
  lv_swap      VolGroup   -wi-ao--   4.00g
  pool         VolGroupL1 twi-a-tz   1.00t              0.00
  thin_storage VolGroupL1 Vwi-a-tz 100.00g pool         0.00
[root@node3 ~]# lvdisplay /dev/VolGroupL1/thin_storage
  --- Logical volume ---
  LV Path                /dev/VolGroupL1/thin_storage
  LV Name                thin_storage
  VG Name                VolGroupL1
  LV UUID                qla8Zf-FOdU-WB0j-SSdv-Xzpk-c9MS-gc97fc
  LV Write Access        read/write
  LV Creation host, time node3.oncloud.int, 2013-03-22 15:38:08 +0200
  LV Pool name           pool
  LV Status              available
  # open                 0
  LV Size                100.00 GiB
  Mapped size            0.00%
  Current LE             800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:6
[root@node3 ~]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  VolGroup     1   2   0 wz--n-  3.27t 3.22t
  VolGroupL1   1   2   0 wz--n- 10.91t 9.91t
[root@node3 ~]# vgdisplay VolGroupL1
  --- Volume group ---
  VG Name               VolGroupL1
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  61
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               10.91 TiB
  PE Size               128.00 MiB
  Total PE              89399
  Alloc PE / Size       8208 / 1.00 TiB
  Free  PE / Size       81191 / 9.91 TiB
  VG UUID               2cHIOM-Rs9u-B5Mv-FaZv-KORq-mrTk-QIGfoG
[root@node3 ~]# pvs
  PV         VG         Fmt  Attr PSize  PFree
  /dev/sda2  VolGroup   lvm2 a--   3.27t 3.22t
  /dev/sdb   VolGroupL1 lvm2 a--  10.91t 9.91t
[root@node3 ~]# pvdisplay /dev/sdb
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               VolGroupL1
  PV Size               10.91 TiB / not usable 128.00 MiB
  Allocatable           yes
  PE Size               128.00 MiB
  Total PE              89399
  Free PE               81191
  Allocated PE          8208
  PV UUID               l3ROps-Aar9-wSUO-ypGj-Wwi1-G0Wu-VqDs1a
What could be the issue here?
regards,
--
----------------------------------------------
Andres Toomsalu, andres@active.ee
* Re: [linux-lvm] LVM thin LV filesystem superblock corruption
From: Mike Snitzer @ 2013-03-25 19:45 UTC (permalink / raw)
To: Andres Toomsalu; +Cc: LVM general discussion and development
On Fri, Mar 22 2013 at 11:12am -0400,
Andres Toomsalu <andres@active.ee> wrote:
> Update! The issue seems to occur only with the PERC H800 and MD1200 disks - local RAID on a PERC H700 with LVM thin LVs works fine and does not corrupt on reboot.
>
>
> We stumbled on a strange filesystem corruption case with an LVM thinly provisioned LV - here are the steps that reproduce the issue:
>
> lvcreate --thinpool pool -L 8T --poolmetadatasize 16G VolGroupL1
> lvcreate -T VolGroupL1/pool -V 2T --name thin_storage
> mkfs.ext4 /dev/VolGroupL1/thin_storage
> mount /dev/VolGroupL1/thin_storage /storage/
> reboot
A couple of things:
1) mkfs.ext4 does buffered IO, so there is no guarantee that the superblock,
   or any other block group descriptors, have actually been committed to
   non-volatile storage when mkfs.ext4 completes
2) the reboot sequence is very distro-specific; /storage may not have been
   unmounted before the reboot -- if it was unmounted, then all data
   should've been pushed out to non-volatile storage
So if you add this command before "reboot", do you no longer have
missing data after the system reboots?:
echo 3 > /proc/sys/vm/drop_caches
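Note that drop_caches only discards clean pages, so a sync is needed first
to get dirty data out. A fuller pre-reboot sequence, covering point 2 as
well, could look something like this sketch:
umount /storage                      # rule out the reboot scripts leaving it mounted
sync                                 # push dirty pages to stable storage first
echo 3 > /proc/sys/vm/drop_caches    # then drop the (now clean) caches
reboot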
> # NB! without host reboot unmount/mount succeeds!
>
> [root@node3 ~]# mount /dev/VolGroupL1/thin_storage /storage/
> mount: you must specify the filesystem type
>
> We also tried setting poolmetadatasize to 2G, 14G, and 15G, and the pool size to 1T and 2T - no change; the corruption still happens.
>
> Hardware setup:
> * The underlying block device (sdb) is hosted by a PERC H800 controller, and the disks come from a SAS disk expansion box (DELL MD1200).
...
> What could be the issue here?
I assume by "reboot" you mean the host (with the PERC card) never loses
power?
What layers of hardware writeback caching are in place in the
H800+MD1200 case vs H700+localraid?
* Re: [linux-lvm] LVM thin LV filesystem superblock corruption
From: Andres Toomsalu @ 2013-03-25 21:29 UTC (permalink / raw)
To: Mike Snitzer; +Cc: LVM general discussion and development
On 25.03.2013, at 21:45, Mike Snitzer wrote:
> On Fri, Mar 22 2013 at 11:12am -0400,
> Andres Toomsalu <andres@active.ee> wrote:
>
>> Update! The issue seems to occur only with the PERC H800 and MD1200 disks - local RAID on a PERC H700 with LVM thin LVs works fine and does not corrupt on reboot.
>>
>>
>> We stumbled on a strange filesystem corruption case with an LVM thinly provisioned LV - here are the steps that reproduce the issue:
>>
>> lvcreate --thinpool pool -L 8T --poolmetadatasize 16G VolGroupL1
>> lvcreate -T VolGroupL1/pool -V 2T --name thin_storage
>> mkfs.ext4 /dev/VolGroupL1/thin_storage
>> mount /dev/VolGroupL1/thin_storage /storage/
>> reboot
>
> A couple of things:
> 1) mkfs.ext4 does buffered IO, so there is no guarantee that the superblock,
>    or any other block group descriptors, have actually been committed to
>    non-volatile storage when mkfs.ext4 completes
I see. While this could have been true for the tests I ran later, after discovering the issue, it's unlikely to be the case for the first time the issue appeared - there was about a 24h window between mkfs and the host reboot then, and data had already been copied onto the new /storage LV.
One more strange thing about the issue: during the tests I repeatedly cycled through thin LV setups with different pool, pool metadata, and LV sizes - from 50G to 8TB - which did not seem to affect anything.
Once it failed, it failed repeatedly - until the moment it started to work again, and then it worked repeatedly. That behaviour could actually support the buffered IO theory…
Right now I can't reproduce the issue at will anymore - waiting for it to fail again.
> 2) the reboot sequence is very distro-specific; /storage may not have been
>    unmounted before the reboot -- if it was unmounted, then all data
>    should've been pushed out to non-volatile storage
The distro is CentOS 6.4 - it should unmount LVs correctly on shutdown, as far as I know.
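One way to verify the clean-unmount theory after a reboot that does not eat the filesystem would be a sketch like:
[root@node3 ~]# tune2fs -l /dev/VolGroupL1/thin_storage | grep -i 'filesystem state'
# "clean" here means the last unmount completed properly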
>
> So if you add this command before "reboot", do you no longer have
> missing data after the system reboots?:
>
> echo 3 > /proc/sys/vm/drop_caches
Will try that next time.
>
>> # NB! without host reboot unmount/mount succeeds!
>>
>> [root@node3 ~]# mount /dev/VolGroupL1/thin_storage /storage/
>> mount: you must specify the filesystem type
>>
>> We also tried setting poolmetadatasize to 2G, 14G, and 15G, and the pool size to 1T and 2T - no change; the corruption still happens.
>>
>> Hardware setup:
>> * The underlying block device (sdb) is hosted by a PERC H800 controller, and the disks come from a SAS disk expansion box (DELL MD1200).
> ...
>> What could be the issue here?
>
> I assume by "reboot" you mean the host (with the PERC card) never loses
> power?
Yes - soft reboot - no power cut.
>
> What layers of hardware writeback caching are in place in the
> H800+MD1200 case vs H700+localraid?
H800 has RAID10 array with cache set to 'writethrough'
H700 has RAID10 array with cache set to 'writeback'
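For reference, the cache policy per logical disk can be checked with LSI's MegaCli (the PERC H700/H800 are MegaRAID-based; the exact binary name and path vary by install - this is a sketch):
MegaCli64 -LDGetProp -Cache -LAll -aAll   # show the current cache policy for all logical disks on all adapters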