* Data corruption after resizing partition, when using bitmaps
From: Jim Paris @ 2015-05-19 14:12 UTC (permalink / raw)
To: linux-raid
I had a raid1 mirror consisting of big partitions on two disks.
The first disk was 2TB, partitioned like this:
[--sda1(128M)--][-------sda2(~2T)--------------]
The second disk was 3TB, partitioned like this:
[--sdb1(128M)--][-------sdb2(~3T)------------------------------------]
sda2 and sdb2 were part of the array, which was only ~2TB in size due
to the smaller disk.
I realized that I needed to add a BIOS boot partition to the 3TB disk,
so I removed sdb2 from the array, and repartitioned sdb like this:
[--sdb1(128M)--][--sdb2(1M)--][-------sdb3(~3T)----------------------]
Then I added sdb3 to the array. And lost all my data. :(
What happened was that the last sector of the big partition did not
change location. So the metadata (0.90) at the end was still present.
Adding sdb3 to the array was considered a "re-add" because the UUID
and array sizes still matched the array, even though the partition
itself shrank. And the resync was thus guided by an out-of-date
bitmap, which caused very little data to actually be written to sdb3,
so half the reads from the array started returning junk. Once the
filesystem got involved, the result was rapid corruption.
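
A minimal sketch of why the old superblock was still found, assuming the usual 0.90 placement rule (the superblock lives in the last 64 KiB-aligned 64 KiB block of the device). The sector numbers below are hypothetical, not the real layout from this report; the only thing taken from the report is that the partition start moved by about 1 MiB while its end stayed put.

```python
# Sizes are in 512-byte sectors.
MD_RESERVED_SECTORS = 128  # 64 KiB reserved for the 0.90 superblock


def sb_offset_090(dev_sectors: int) -> int:
    """Offset of the 0.90 superblock from the start of the member device."""
    return (dev_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS


def sb_absolute(part_start: int, part_end: int) -> int:
    """Absolute on-disk sector of the superblock for a partition [start, end)."""
    return part_start + sb_offset_090(part_end - part_start)


# Hypothetical numbers: the end of the partition is unchanged, the start
# moves forward by 1 MiB (2048 sectors) to make room for the new partition.
old_start = 264192
part_end = 5860533168
shift = 2048

old_sb = sb_absolute(old_start, part_end)
new_sb = sb_absolute(old_start + shift, part_end)

# Because the shift is a multiple of 64 KiB, both partitions place the
# superblock on the same disk sector, so the stale metadata is still found.
print("same superblock sector:", old_sb == new_sb)
```

The data area, on the other hand, starts at the beginning of the partition, so every data offset now points 1 MiB further into the disk than it did before.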
If I had not been using write-intent bitmaps, everything would have
worked fine. I only recently started using bitmaps, and never had any
problems with adjusting partitions like this before that.
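
To illustrate the difference, here is a toy model (not md's actual code) of a full recovery versus a bitmap-guided one. The chunk count, the contents, and the single dirty chunk are all made up for the example.

```python
def resync(source, target, dirty_chunks=None):
    """Copy chunks from source to target and return how many were copied.

    dirty_chunks=None models a full recovery (no bitmap): copy everything.
    Otherwise only the chunks flagged dirty in the bitmap are copied.
    """
    chunks = range(len(source)) if dirty_chunks is None else sorted(dirty_chunks)
    for i in chunks:
        target[i] = source[i]
    return len(chunks)


def mismatches(a, b):
    """Indexes of chunks where the two mirrors disagree."""
    return [i for i in range(len(a)) if a[i] != b[i]]


NCHUNKS = 8
good = [f"data{i}" for i in range(NCHUNKS)]

# The re-added partition starts at a different place on disk, so what it
# holds at each offset is unrelated to what the array expects there.
shifted_full = [f"junk{i}" for i in range(NCHUNKS)]
shifted_bitmap = list(shifted_full)

# Without a bitmap the whole device is rewritten.
copied_full = resync(good, shifted_full)

# With a stale bitmap only the chunks written while the device was out of
# the array are copied (here: chunk 2, a hypothetical choice).
copied_bitmap = resync(good, shifted_bitmap, dirty_chunks={2})

print("full recovery:  copied", copied_full, "bad chunks", mismatches(good, shifted_full))
print("bitmap recovery: copied", copied_bitmap, "bad chunks", mismatches(good, shifted_bitmap))
```

In the bitmap case almost every chunk on the re-added mirror still disagrees with the good one, which matches the "half the reads return junk" behaviour described above.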
Perhaps mdadm can be more careful here -- for example, maybe checking
the actual device size and not just the "used dev size" when
determining whether to trust the bitmap.
I wrote a script (attached) to recreate what happened, using some loop
devices. It works fine if BITMAP=none, and fails with BITMAP=internal.
Jim
[-- Attachment #2: repro.sh --]
[-- Type: application/x-sh, Size: 3181 bytes --]
* Re: Data corruption after resizing partition, when using bitmaps
From: NeilBrown @ 2015-05-20 5:31 UTC (permalink / raw)
To: Jim Paris; +Cc: linux-raid
On Tue, 19 May 2015 10:12:40 -0400 Jim Paris <jim@jtan.com> wrote:
> I had a raid1 mirror consisting of big partitions on two disks.
> The first disk was 2TB, partitioned like this:
>
> [--sda1(128M)--][-------sda2(~2T)--------------]
>
> The second disk was 3TB, partitioned like this:
>
> [--sdb1(128M)--][-------sdb2(~3T)------------------------------------]
>
> sda2 and sdb2 were part of the array, which was only ~2TB in size due
> to the smaller disk.
>
> I realized that I needed to add a BIOS boot partition to the 3TB disk,
> so I removed sdb2 from the array, and repartitioned sdb like this:
>
> [--sdb1(128M)--][--sdb2(1M)--][-------sdb3(~3T)----------------------]
>
> Then I added sdb3 to the array. And lost all my data. :(
>
> What happened was that the last sector of the big partition did not
> change location. So the metadata (0.90) at the end was still present.
This is one of the big reasons why 1.x was invented.
> Adding sdb3 to the array was considered a "re-add" because the UUID
> and array sizes still matched the array, even though the partition
> itself shrank. And the resync was thus guided by an out-of-date
> bitmap, which caused very little data to actually be written to sdb3,
> so half the reads from the array started returning junk. Once the
> filesystem got involved, the result was rapid corruption.
>
> If I had not been using write-intent bitmaps, everything would have
> worked fine. I only recently started using bitmaps, and never had any
> problems with adjusting partitions like this before that.
>
> Perhaps mdadm can be more careful here -- for example, maybe checking
> the actual device size and not just the "used dev size" when
> determining whether to trust the bitmap.
It is perfectly acceptable to have the various devices in an array of
different sizes. Unfortunately I don't think there is anything that mdadm
can usefully do here.
Thanks for the report anyway,
NeilBrown
>
> I wrote a script (attached) to recreate what happened, using some loop
> devices. It works fine if BITMAP=none, and fails with BITMAP=internal.
>
> Jim
* Re: Data corruption after resizing partition, when using bitmaps
From: Jim Paris @ 2015-05-20 6:31 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown wrote:
> On Tue, 19 May 2015 10:12:40 -0400 Jim Paris <jim@jtan.com> wrote:
>
> > I had a raid1 mirror consisting of big partitions on two disks.
> > The first disk was 2TB, partitioned like this:
> >
> > [--sda1(128M)--][-------sda2(~2T)--------------]
> >
> > The second disk was 3TB, partitioned like this:
> >
> > [--sdb1(128M)--][-------sdb2(~3T)------------------------------------]
> >
> > sda2 and sdb2 were part of the array, which was only ~2TB in size due
> > to the smaller disk.
> >
> > I realized that I needed to add a BIOS boot partition to the 3TB disk,
> > so I removed sdb2 from the array, and repartitioned sdb like this:
> >
> > [--sdb1(128M)--][--sdb2(1M)--][-------sdb3(~3T)----------------------]
> >
> > Then I added sdb3 to the array. And lost all my data. :(
> >
> > What happened was that the last sector of the big partition did not
> > change location. So the metadata (0.90) at the end was still present.
>
> This is one of the big reasons why 1.x was invented.
>
> > Adding sdb3 to the array was considered a "re-add" because the UUID
> > and array sizes still matched the array, even though the partition
> > itself shrank. And the resync was thus guided by an out-of-date
> > bitmap, which caused very little data to actually be written to sdb3,
> > so half the reads from the array started returning junk. Once the
> > filesystem got involved, the result was rapid corruption.
> >
> > If I had not been using write-intent bitmaps, everything would have
> > worked fine. I only recently started using bitmaps, and never had any
> > problems with adjusting partitions like this before that.
> >
> > Perhaps mdadm can be more careful here -- for example, maybe checking
> > the actual device size and not just the "used dev size" when
> > determining whether to trust the bitmap.
>
> It is perfectly acceptable to have the various devices in an array of
> different sizes. Unfortunately I don't think there is anything that mdadm
> can usefully do here.
>
> Thanks for the report anyway,
> NeilBrown
Hi Neil,
Can we add u64 device_size to bitmap_super_t, and ensure that it
matches the actual current device size before trusting the bitmap?
Jim
>
>
> >
> > I wrote a script (attached) to recreate what happened, using some loop
> > devices. It works fine if BITMAP=none, and fails with BITMAP=internal.
> >
> > Jim
>
* Re: Data corruption after resizing partition, when using bitmaps
From: NeilBrown @ 2015-05-21 0:24 UTC (permalink / raw)
To: Jim Paris; +Cc: linux-raid
On Wed, 20 May 2015 02:31:50 -0400 Jim Paris <jim@jtan.com> wrote:
> NeilBrown wrote:
> > On Tue, 19 May 2015 10:12:40 -0400 Jim Paris <jim@jtan.com> wrote:
> >
> > > I had a raid1 mirror consisting of big partitions on two disks.
> > > The first disk was 2TB, partitioned like this:
> > >
> > > [--sda1(128M)--][-------sda2(~2T)--------------]
> > >
> > > The second disk was 3TB, partitioned like this:
> > >
> > > [--sdb1(128M)--][-------sdb2(~3T)------------------------------------]
> > >
> > > sda2 and sdb2 were part of the array, which was only ~2TB in size due
> > > to the smaller disk.
> > >
> > > I realized that I needed to add a BIOS boot partition to the 3TB disk,
> > > so I removed sdb2 from the array, and repartitioned sdb like this:
> > >
> > > [--sdb1(128M)--][--sdb2(1M)--][-------sdb3(~3T)----------------------]
> > >
> > > Then I added sdb3 to the array. And lost all my data. :(
> > >
> > > What happened was that the last sector of the big partition did not
> > > change location. So the metadata (0.90) at the end was still present.
> >
> > This is one of the big reasons why 1.x was invented.
> >
> > > Adding sdb3 to the array was considered a "re-add" because the UUID
> > > and array sizes still matched the array, even though the partition
> > > itself shrank. And the resync was thus guided by an out-of-date
> > > bitmap, which caused very little data to actually be written to sdb3,
> > > so half the reads from the array started returning junk. Once the
> > > filesystem got involved, the result was rapid corruption.
> > >
> > > If I had not been using write-intent bitmaps, everything would have
> > > worked fine. I only recently started using bitmaps, and never had any
> > > problems with adjusting partitions like this before that.
> > >
> > > Perhaps mdadm can be more careful here -- for example, maybe checking
> > > the actual device size and not just the "used dev size" when
> > > determining whether to trust the bitmap.
> >
> > It is perfectly acceptable to have the various devices in an array of
> > different sizes. Unfortunately I don't think there is anything that mdadm
> > can usefully do here.
> >
> > Thanks for the report anyway,
> > NeilBrown
>
> Hi Neil,
>
> Can we add u64 device_size to bitmap_super_t, and ensure that it
> matches the actual current device size before trusting the bitmap?
Well .... we could, but the bitmap_super is currently the same on all
devices. This would make it different.
And if we are going to change the metadata, why not just convert from 0.90 to
1.0?
mdadm --stop /dev/mdXX
mdadm --assemble /dev/mdXX --update=metadata /dev/...list-of-devices....
You might need to remove the bitmap first, and add it back afterwards.
NeilBrown
* Re: Data corruption after resizing partition, when using bitmaps
From: Jim Paris @ 2015-05-21 5:58 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown wrote:
> On Wed, 20 May 2015 02:31:50 -0400 Jim Paris <jim@jtan.com> wrote:
>
> > NeilBrown wrote:
> > > On Tue, 19 May 2015 10:12:40 -0400 Jim Paris <jim@jtan.com> wrote:
> > >
> > > > I had a raid1 mirror consisting of big partitions on two disks.
> > > > The first disk was 2TB, partitioned like this:
> > > >
> > > > [--sda1(128M)--][-------sda2(~2T)--------------]
> > > >
> > > > The second disk was 3TB, partitioned like this:
> > > >
> > > > [--sdb1(128M)--][-------sdb2(~3T)------------------------------------]
> > > >
> > > > sda2 and sdb2 were part of the array, which was only ~2TB in size due
> > > > to the smaller disk.
> > > >
> > > > I realized that I needed to add a BIOS boot partition to the 3TB disk,
> > > > so I removed sdb2 from the array, and repartitioned sdb like this:
> > > >
> > > > [--sdb1(128M)--][--sdb2(1M)--][-------sdb3(~3T)----------------------]
> > > >
> > > > Then I added sdb3 to the array. And lost all my data. :(
> > > >
> > > > What happened was that the last sector of the big partition did not
> > > > change location. So the metadata (0.90) at the end was still present.
> > >
> > > This is one of the big reasons why 1.x was invented.
> > >
> > > > Adding sdb3 to the array was considered a "re-add" because the UUID
> > > > and array sizes still matched the array, even though the partition
> > > > itself shrank. And the resync was thus guided by an out-of-date
> > > > bitmap, which caused very little data to actually be written to sdb3,
> > > > so half the reads from the array started returning junk. Once the
> > > > filesystem got involved, the result was rapid corruption.
> > > >
> > > > If I had not been using write-intent bitmaps, everything would have
> > > > worked fine. I only recently started using bitmaps, and never had any
> > > > problems with adjusting partitions like this before that.
> > > >
> > > > Perhaps mdadm can be more careful here -- for example, maybe checking
> > > > the actual device size and not just the "used dev size" when
> > > > determining whether to trust the bitmap.
> > >
> > > It is perfectly acceptable to have the various devices in an array of
> > > different sizes. Unfortunately I don't think there is anything that mdadm
> > > can usefully do here.
> > >
> > > Thanks for the report anyway,
> > > NeilBrown
> >
> > Hi Neil,
> >
> > Can we add u64 device_size to bitmap_super_t, and ensure that it
> > matches the actual current device size before trusting the bitmap?
>
> Well .... we could, but the bitmap_super is currently the same on all
> devices. This would make it different.
> And if we are going to change the metadata, why not just convert from 0.90 to
> 1.0?
My thinking was that the extra field could be added to bitmap_super
automatically -- just start writing it now, but only use it to
determine bitmap validity if the current value is non-zero. No
explicit user-visible conversion.
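
Roughly what I have in mind, as a sketch only. This is a Python model, not the real bitmap_super_t layout; the device_size field and the bitmap_trusted() helper are hypothetical names for the proposal, and the sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class BitmapSuperModel:
    """Toy stand-in for the bitmap superblock; only the proposed field is modelled."""
    device_size: int = 0  # hypothetical new field; 0 means "never recorded"


def bitmap_trusted(sb: BitmapSuperModel, actual_device_size: int) -> bool:
    """Decide whether a re-added device may use the bitmap to skip a full recovery."""
    if sb.device_size == 0:
        # Written before the field existed: keep today's behaviour.
        return True
    # Field present: the device must be exactly the size it was when recorded.
    return sb.device_size == actual_device_size


legacy = BitmapSuperModel()                      # old metadata, field unset
recorded = BitmapSuperModel(device_size=6000000)  # made-up size in sectors

print(bitmap_trusted(legacy, 5997952))    # old metadata: still trusted
print(bitmap_trusted(recorded, 6000000))  # size unchanged: trusted
print(bitmap_trusted(recorded, 5997952))  # partition shrank: force full recovery
```

Existing arrays would behave exactly as they do now until the field gets written once; after that, a resized member would fall back to a full recovery instead of a bitmap-guided one.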
I see your point; it's a bit strange to change outdated stuff. But I
also feel that if there's something mdadm could have done to prevent
my data loss, that's worth putting in there.
> mdadm --stop /dev/mdXX
> mdadm --assemble /dev/mdXX --update=metadata /dev/...list-of-devices....
>
> You might need to remove the bitmap first, and add it back afterwards.
Cool. Much simpler than what's currently listed in the wiki for that
conversion.
Thanks Neil.
Jim
>
> NeilBrown