public inbox for linux-kernel@vger.kernel.org
* Strange LVM2/DM data corruption with 2.6.11.12
@ 2005-09-08  9:58 Ludovic Drolez
  2005-09-08 14:02 ` Alexander Nyberg
  0 siblings, 1 reply; 3+ messages in thread
From: Ludovic Drolez @ 2005-09-08  9:58 UTC
  To: Linux Kernel Mailing List

Hi !

We are developing (GPLed) disk cloning software similar to partimage: it's an
intelligent 'dd' which backs up only the used sectors.

Recently I added LVM1/2 support to it, and sometimes we see LVM restorations
failing randomly (the disk images are not corrupted, but the restoration can
leave a corrupted filesystem). If a restoration fails, just run it again and
it will work...

How the restoration program works:
- I restore the LVM2 administrative data (384 sectors, most of the time),
- I run 'vgscan' and 'vgchange',
- open '/dev/dm-xxxx' for writing,
- read a compressed file over NFS,
- and put the sectors in place, so it's a succession of '_llseek()' and
'write()' calls to the DM device.
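
The seek-and-write step above can be sketched as follows. This is a minimal
Python illustration, not the tool's actual code (the message does not say what
language it is written in). The function name `restore_runs` and the
`(skip_bytes, data)` run format are hypothetical, chosen only to mirror the
relative `_llseek()` / `write()` pattern.

```python
import os

SECTOR_SIZE = 512  # bytes per sector, matching the 512-byte writes in the strace


def restore_runs(fd, runs):
    """Replay (skip_bytes, data) runs onto an open file descriptor.

    Each run first skips `skip_bytes` forward from the current position
    (the unused sectors), then writes `data` one sector at a time.
    Returns the final file position.
    """
    for skip_bytes, data in runs:
        if skip_bytes:
            # Relative seek, like _llseek(fd, skip, ..., SEEK_CUR).
            os.lseek(fd, skip_bytes, os.SEEK_CUR)
        for start in range(0, len(data), SECTOR_SIZE):
            chunk = data[start:start + SECTOR_SIZE]
            written = os.write(fd, chunk)
            if written != len(chunk):
                raise OSError("short write")
    # A zero-length relative seek reports the current position.
    return os.lseek(fd, 0, os.SEEK_CUR)
```

On a real restoration `fd` would come from opening the DM device for writing;
the same function can be tried safely against an ordinary temporary file.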

But *sometimes*, for example when the current seek position is at 9GB, some
data is written to sector 0! It happens randomly.

Here is a typical strace of a restoration:

write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
_llseek(5, 20963328, [37604032512], SEEK_CUR) = 0
_llseek(5, 0, [37604032512], SEEK_CUR)  = 0
_llseek(5, 2097152, [37606129664], SEEK_CUR) = 0
write(5, "\1\335E\0\f\0\1\2.\0\0\0\2\0\0\0\364\17\2\2..\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
_llseek(5, 88076288, [37694210048], SEEK_CUR) = 0
_llseek(5, 0, [37694210048], SEEK_CUR)  = 0
_llseek(5, 20971520, [37715181568], SEEK_CUR) = 0
write(5, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 512) = 512
....
....

As you can see, there are no seeks to sector 0, but something randomly writes
some data to sector 0!
I could reproduce these random problems on different kinds of PCs.
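
The offsets in the excerpt can be checked by hand. The following Python sketch
replays the calls and tracks the file position. The helper name
`replay_offsets` is hypothetical, and the call list is transcribed from the
strace above, starting at the position reported by the first `_llseek`.

```python
SECTOR_SIZE = 512


def replay_offsets(start, calls):
    """Track the file position over ("seek", delta) / ("write", nbytes) calls.

    Both call kinds advance the position by the given amount. Returns the
    list of positions after each call.
    """
    pos = start
    positions = []
    for kind, amount in calls:
        if kind not in ("seek", "write"):
            raise ValueError(f"unknown call kind: {kind}")
        pos += amount
        positions.append(pos)
    return positions


# Position reported by the first _llseek in the excerpt.
START = 37604032512

calls = (
    [("seek", 0), ("seek", 2097152)]
    + [("write", 512)] * 8
    + [("seek", 88076288), ("seek", 0), ("seek", 20971520)]
    + [("write", 512)]
)

positions = replay_offsets(START, calls)
lowest_sector = min(positions) // SECTOR_SIZE
```

Each computed position matches the value strace reports in brackets for the
corresponding `_llseek`, and the lowest position stays far above sector 0, so
the traced calls alone do not account for a write to sector 0.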

However, the strace above comes from an improved version, which aggregates
'_llseek's. A previous version, which did *many* 512-byte seeks, had many more
problems. Aggregating seeks made the corruption appear only very rarely... And
it is more likely to happen for a 40GB restoration than for a 10GB one.

So fewer system calls to the DM/LVM2 layer seem to give a lower probability of
corruption.
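
The message does not show how the seeks are aggregated. One possible approach,
sketched in Python under that assumption, is to merge consecutive relative
seeks into a single one before issuing any system call. The function name
`coalesce_seeks` and the `("seek", delta)` / `("write", data)` operation
format are hypothetical.

```python
def coalesce_seeks(ops):
    """Merge runs of consecutive relative seeks into a single seek.

    `ops` is a list of ("seek", delta) or ("write", data) tuples.
    Seeks that sum to zero are dropped entirely.
    """
    merged = []
    pending = 0  # accumulated seek distance not yet emitted
    for kind, arg in ops:
        if kind == "seek":
            pending += arg
            continue
        if pending:
            merged.append(("seek", pending))
            pending = 0
        merged.append((kind, arg))
    if pending:
        merged.append(("seek", pending))
    return merged
```

For example, two 512-byte seeks followed by a write become one 1024-byte seek
followed by the same write, which halves the number of seek calls reaching the
DM device for that run.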


Any ideas? Could newer kernel releases have fixed such a problem?


-- 
Ludovic DROLEZ                              Linbox / Free&ALter Soft
http://lrs.linbox.org - Linbox Rescue Server GPL edition


* Re: Strange LVM2/DM data corruption with 2.6.11.12
  2005-09-08  9:58 Strange LVM2/DM data corruption with 2.6.11.12 Ludovic Drolez
@ 2005-09-08 14:02 ` Alexander Nyberg
  2005-09-08 18:10   ` Chris Wright
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander Nyberg @ 2005-09-08 14:02 UTC
  To: Ludovic Drolez; +Cc: Linux Kernel Mailing List

On Thu, Sep 08, 2005 at 11:58:54AM +0200 Ludovic Drolez wrote:

> Hi !
> 
> We are developing (GPLed) disk cloning software similar to partimage: it's
> an intelligent 'dd' which backs up only the used sectors.
> 
> Recently I added LVM1/2 support to it, and sometimes we see LVM
> restorations failing randomly (the disk images are not corrupted, but the
> restoration can leave a corrupted filesystem). If a restoration fails,
> just run it again and it will work...
> 

Please upgrade to 2.6.12.6 (I don't remember exactly in which
2.6.12.x it went in), it contains a bugfix that should fix what
you are seeing. 2.6.13 also has this.


* Re: Strange LVM2/DM data corruption with 2.6.11.12
  2005-09-08 14:02 ` Alexander Nyberg
@ 2005-09-08 18:10   ` Chris Wright
  0 siblings, 0 replies; 3+ messages in thread
From: Chris Wright @ 2005-09-08 18:10 UTC
  To: Alexander Nyberg; +Cc: Ludovic Drolez, Linux Kernel Mailing List

* Alexander Nyberg (alexn@telia.com) wrote:
> Please upgrade to 2.6.12.6 (I don't remember exactly in which
> 2.6.12.x it went in), it contains a bugfix that should fix what
> you are seeing. 2.6.13 also has this.

Yep, that was 2.6.12.4, and here's the patch:

http://www.kernel.org/git/?p=linux/kernel/git/chrisw/stable-queue.git;a=blob_plain;h=6267a6b8da4b52eaf0fbddd9091a6e6ff2fe233c

thanks,
-chris
