* Re: [linux-lvm] LVM thin LV filesystem superblock corruption
From: Andres Toomsalu @ 2013-03-22 15:12 UTC (permalink / raw)
To: LVM general discussion and development
Update! The issue seems to occur only with the PERC H800 and MD1200 disks - local RAID on a PERC H700 with LVM thin LVs works fine and does not corrupt on reboot.
We stumbled on a strange filesystem corruption case with an LVM thinly provisioned LV - here are the steps that reproduce the issue:
lvcreate --thinpool pool -L 8T --poolmetadatasize 16G VolGroupL1
lvcreate -T VolGroupL1/pool -V 2T --name thin_storage
mkfs.ext4 /dev/VolGroupL1/thin_storage
mount /dev/VolGroupL1/thin_storage /storage/
reboot
# NB! without host reboot unmount/mount succeeds!
[root@node3 ~]# mount /dev/VolGroupL1/thin_storage /storage/
mount: you must specify the filesystem type
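To confirm the superblock is really gone (and not just a type misdetection), we can check it directly - a sketch with standard tools; the primary ext4 superblock sits at byte offset 1024:
[root@node3 ~]# blkid /dev/VolGroupL1/thin_storage        # no output = no recognizable fs signature
[root@node3 ~]# dumpe2fs -h /dev/VolGroupL1/thin_storage  # fails if the superblock is unreadable
[root@node3 ~]# dd if=/dev/VolGroupL1/thin_storage bs=1024 skip=1 count=1 2>/dev/null | hexdump -C
# all zeroes above would suggest the thin block holding the superblock came back unprovisioned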
We also tried setting poolmetadatasize to 2G, 14G, and 15G, and the pool size to 1T and 2T - no change; the corruption still happens.
Hardware setup:
* The underlying block device (sdb) is hosted by a PERC H800 controller, and the disks come from a SAS disk expansion box (DELL MD1200).
Some debug info:
[root@node3 ~]# lvs
  LV           VG         Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  lv_root      VolGroup   -wi-ao--  50.00g
  lv_swap      VolGroup   -wi-ao--   4.00g
  pool         VolGroupL1 twi-a-tz   1.00t              0.00
  thin_storage VolGroupL1 Vwi-a-tz 100.00g pool         0.00
[root@node3 ~]# lvdisplay /dev/VolGroupL1/thin_storage
  --- Logical volume ---
  LV Path                /dev/VolGroupL1/thin_storage
  LV Name                thin_storage
  VG Name                VolGroupL1
  LV UUID                qla8Zf-FOdU-WB0j-SSdv-Xzpk-c9MS-gc97fc
  LV Write Access        read/write
  LV Creation host, time node3.oncloud.int, 2013-03-22 15:38:08 +0200
  LV Pool name           pool
  LV Status              available
  # open                 0
  LV Size                100.00 GiB
  Mapped size            0.00%
  Current LE             800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:6
[root@node3 ~]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  VolGroup     1   2   0 wz--n-  3.27t 3.22t
  VolGroupL1   1   2   0 wz--n- 10.91t 9.91t
[root@node3 ~]# vgdisplay VolGroupL1
  --- Volume group ---
  VG Name               VolGroupL1
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  61
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               10.91 TiB
  PE Size               128.00 MiB
  Total PE              89399
  Alloc PE / Size       8208 / 1.00 TiB
  Free  PE / Size       81191 / 9.91 TiB
  VG UUID               2cHIOM-Rs9u-B5Mv-FaZv-KORq-mrTk-QIGfoG
[root@node3 ~]# pvs
  PV         VG         Fmt  Attr PSize  PFree
  /dev/sda2  VolGroup   lvm2 a--   3.27t 3.22t
  /dev/sdb   VolGroupL1 lvm2 a--  10.91t 9.91t
[root@node3 ~]# pvdisplay /dev/sdb
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               VolGroupL1
  PV Size               10.91 TiB / not usable 128.00 MiB
  Allocatable           yes
  PE Size               128.00 MiB
  Total PE              89399
  Free PE               81191
  Allocated PE          8208
  PV UUID               l3ROps-Aar9-wSUO-ypGj-Wwi1-G0Wu-VqDs1a
What could be the issue here?
regards,
--
----------------------------------------------
Andres Toomsalu, andres@active.ee
* Re: [linux-lvm] LVM thin LV filesystem superblock corruption
From: Mike Snitzer @ 2013-03-25 19:45 UTC (permalink / raw)
To: Andres Toomsalu; +Cc: LVM general discussion and development
On Fri, Mar 22 2013 at 11:12am -0400,
Andres Toomsalu <andres@active.ee> wrote:
> Update! The issue seems to occur only with the PERC H800 and MD1200 disks - local RAID on a PERC H700 with LVM thin LVs works fine and does not corrupt on reboot.
>
>
> We stumbled on a strange filesystem corruption case with an LVM thinly provisioned LV - here are the steps that reproduce the issue:
>
> lvcreate --thinpool pool -L 8T --poolmetadatasize 16G VolGroupL1
> lvcreate -T VolGroupL1/pool -V 2T --name thin_storage
> mkfs.ext4 /dev/VolGroupL1/thin_storage
> mount /dev/VolGroupL1/thin_storage /storage/
> reboot
A couple of things:
1) mkfs.ext4 does buffered IO, so there is no guarantee that the superblock,
   or any other block group descriptors, have actually been committed to
   non-volatile storage when mkfs.ext4 completes
2) the reboot sequence is very distro-specific; /storage may not have been
   unmounted before the reboot -- if it was unmounted, then all data
   should've been pushed out to non-volatile storage
So if you add this command before "reboot", do you no longer have
missing data after the system reboots?:
echo 3 > /proc/sys/vm/drop_caches
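Note that drop_caches only discards clean pages, so a sync is needed first
to get dirty data out. A fuller pre-reboot sequence, covering point 2 as
well, could look something like this sketch:
umount /storage                      # rule out the reboot scripts leaving it mounted
sync                                 # push dirty pages to stable storage first
echo 3 > /proc/sys/vm/drop_caches    # then drop the (now clean) caches
reboot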
> # NB! without host reboot unmount/mount succeeds!
>
> [root@node3 ~]# mount /dev/VolGroupL1/thin_storage /storage/
> mount: you must specify the filesystem type
>
> We also tried setting poolmetadatasize to 2G, 14G, and 15G, and the pool size to 1T and 2T - no change; the corruption still happens.
>
> Hardware setup:
> * The underlying block device (sdb) is hosted by a PERC H800 controller, and the disks come from a SAS disk expansion box (DELL MD1200).
...
> What could be the issue here?
I assume by "reboot" you mean the host (with the PERC card) never loses
power?
What layers of hardware writeback caching are in place in the
H800+MD1200 case vs H700+localraid?
* Re: [linux-lvm] LVM thin LV filesystem superblock corruption
From: Andres Toomsalu @ 2013-03-25 21:29 UTC (permalink / raw)
To: Mike Snitzer; +Cc: LVM general discussion and development
On 25.03.2013, at 21:45, Mike Snitzer wrote:
> On Fri, Mar 22 2013 at 11:12am -0400,
> Andres Toomsalu <andres@active.ee> wrote:
>
>> Update! The issue seems to occur only with the PERC H800 and MD1200 disks - local RAID on a PERC H700 with LVM thin LVs works fine and does not corrupt on reboot.
>>
>>
>> We stumbled on a strange filesystem corruption case with an LVM thinly provisioned LV - here are the steps that reproduce the issue:
>>
>> lvcreate --thinpool pool -L 8T --poolmetadatasize 16G VolGroupL1
>> lvcreate -T VolGroupL1/pool -V 2T --name thin_storage
>> mkfs.ext4 /dev/VolGroupL1/thin_storage
>> mount /dev/VolGroupL1/thin_storage /storage/
>> reboot
>
> A couple of things:
> 1) mkfs.ext4 does buffered IO, so there is no guarantee that the superblock,
>    or any other block group descriptors, have actually been committed to
>    non-volatile storage when mkfs.ext4 completes
I see. While this could have been true for the tests I ran later, after discovering the issue, it's unlikely to be the case for the first time the issue appeared - there was about a 24h window between mkfs and the host reboot then, and data had already been copied onto the new /storage LV.
One more strange thing about the issue: during the tests I repeatedly cycled through thin LV setups with different pool, pool metadata, and LV sizes - from 50G to 8TB - which did not seem to affect anything.
Once it failed, it failed repeatedly - until the moment it started to work again, and then it worked repeatedly. That behaviour could actually support the buffered IO theory…
Right now I can't reproduce the issue at will anymore - waiting for it to fail again.
> 2) the reboot sequence is very distro-specific; /storage may not have been
>    unmounted before the reboot -- if it was unmounted, then all data
>    should've been pushed out to non-volatile storage
The distro is CentOS 6.4 - it should unmount LVs correctly on shutdown, as far as I know.
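One way to verify the clean-unmount theory after a reboot that does not eat the filesystem would be a sketch like:
[root@node3 ~]# tune2fs -l /dev/VolGroupL1/thin_storage | grep -i 'filesystem state'
# "clean" here means the last unmount completed properly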
>
> So if you add this command before "reboot", do you no longer have
> missing data after the system reboots?:
>
> echo 3 > /proc/sys/vm/drop_caches
Will try that next time.
>
>> # NB! without host reboot unmount/mount succeeds!
>>
>> [root@node3 ~]# mount /dev/VolGroupL1/thin_storage /storage/
>> mount: you must specify the filesystem type
>>
>> We also tried setting poolmetadatasize to 2G, 14G, and 15G, and the pool size to 1T and 2T - no change; the corruption still happens.
>>
>> Hardware setup:
>> * The underlying block device (sdb) is hosted by a PERC H800 controller, and the disks come from a SAS disk expansion box (DELL MD1200).
> ...
>> What could be the issue here?
>
> I assume by "reboot" you mean the host (with the PERC card) never loses
> power?
Yes - soft reboot - no power cut.
>
> What layers of hardware writeback caching are in place in the
> H800+MD1200 case vs H700+localraid?
H800 has RAID10 array with cache set to 'writethrough'
H700 has RAID10 array with cache set to 'writeback'
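For reference, the cache policy per logical disk can be checked with LSI's MegaCli (the PERC H700/H800 are MegaRAID-based; the exact binary name and path vary by install - this is a sketch):
MegaCli64 -LDGetProp -Cache -LAll -aAll   # show the current cache policy for all logical disks on all adapters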