Bug(?): btrfs carries on working if part of a device disappears

* Bug(?): btrfs carries on working if part of a device disappears
@ 2012-01-05 18:02 Maik Zumstrull
From: Maik Zumstrull @ 2012-01-05 18:02 UTC
  To: linux-btrfs

Hello list,

I hit a funny BIOS bug the other day where the BIOS suddenly sets an
HPA (Host Protected Area) on a random hard disk, leaving only the first
33 MB accessible.
That disk had one device of a multi-device btrfs on it in my case.
(With dm-crypt/LUKS in between, no partitioning or LVM.)

The reason I'm writing to you is that btrfs apparently didn't care at
all. It didn't complain, and it certainly didn't consider "Uhm, maybe
I should stop writing to a file system that mostly doesn't exist
anymore." The only errors I saw in dmesg were from the lower block
device level: someone trying to read or write beyond the end of a
device. An error btrfs apparently didn't mind. It took me a while to
figure out what had happened, during which time btrfsck and the btrfs
kernel part worked together to pretty much totally trash the fs. (I'm
still trying a few things, but I'm not hopeful. Hold the default
backup rant, I can in fact recover anything that was on this from
elsewhere, I think.)

So, I think during mount, btrfs should check the reported size of the
block device, and if it's significantly smaller than fs metadata
implies it must be, mount degraded or read-only or not at all. And
mostly, complain. Loudly.
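A minimal userspace sketch of the proposed check, assuming the size each device is expected to have has already been read from the filesystem metadata (that step is omitted here). `check_device_size`, `get_block_device_size` and the `SIZE_*` outcomes are hypothetical names for illustration, not btrfs code; `BLKGETSIZE64` is the Linux ioctl that reports a block device's size in bytes.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Hypothetical outcomes of a mount-time size check. */
enum size_check_result {
	SIZE_OK,        /* device is at least as large as the metadata expects */
	SIZE_TRUNCATED  /* device shrank: refuse, or mount degraded/read-only */
};

/*
 * Compare the size the block layer reports for a device with the size the
 * filesystem metadata says it should have (expected_bytes is assumed to be
 * supplied by the caller). Any shortfall is treated as fatal here, since
 * everything past the new end of the device is unreachable.
 */
enum size_check_result check_device_size(uint64_t reported_bytes,
					 uint64_t expected_bytes)
{
	if (reported_bytes < expected_bytes) {
		/* Complain loudly instead of carrying on. */
		fprintf(stderr,
			"device reports %llu bytes, metadata expects %llu\n",
			(unsigned long long)reported_bytes,
			(unsigned long long)expected_bytes);
		return SIZE_TRUNCATED;
	}
	return SIZE_OK;
}

/* Query the current size of an open block device; returns 0 on success. */
int get_block_device_size(int fd, uint64_t *bytes)
{
	return ioctl(fd, BLKGETSIZE64, bytes);
}
```

In the situation described above, a device suddenly reporting 33 MB where the metadata expects hundreds of gigabytes would come back as `SIZE_TRUNCATED`, and the caller could then refuse the mount or fall back to degraded or read-only mode. The strict comparison is a design choice of this sketch; a "significantly smaller" tolerance would be an additional, hypothetical parameter.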

This was on Debian's linux-image-3.1.0-1-amd64 at version 3.1.6-1.
Besides an HPA, LVM or partitioning changes could also cause this.

Maik

* Re: Bug(?): btrfs carries on working if part of a device disappears
@ 2012-01-13 12:07 Liu Bo
From: Liu Bo @ 2012-01-13 12:07 UTC
  To: Maik Zumstrull; +Cc: linux-btrfs

On 01/06/2012 02:02 AM, Maik Zumstrull wrote:
> Hello list,
> 
> I hit a funny BIOS bug the other day where the BIOS suddenly sets an
> HPA (Host Protected Area) on a random hard disk, leaving only the first
> 33 MB accessible.
> That disk had one device of a multi-device btrfs on it in my case.
> (With dm-crypt/LUKS in between, no partitioning or LVM.)
> 
> The reason I'm writing to you is that btrfs apparently didn't care at
> all. It didn't complain, and it certainly didn't consider "Uhm, maybe
> I should stop writing to a file system that mostly doesn't exist
> anymore." The only errors I saw in dmesg were from the lower block
> device level: someone trying to read or write beyond the end of a
> device. An error btrfs apparently didn't mind. It took me a while to
> figure out what had happened, during which time btrfsck and the btrfs
> kernel part worked together to pretty much totally trash the fs. (I'm
> still trying a few things, but I'm not hopeful. Hold the default
> backup rant, I can in fact recover anything that was on this from
> elsewhere, I think.)
> 
> So, I think during mount, btrfs should check the reported size of the
> block device, and if it's significantly smaller than fs metadata
> implies it must be, mount degraded or read-only or not at all. And
> mostly, complain. Loudly.
> 

I have also noticed this: when we run "mkfs.btrfs" with "-b fssize" and
"fssize" is larger than the device size, mkfs does not complain, and we
get "beyond the end" errors.

So maybe we should limit the mkfs size:

diff --git a/mkfs.c b/mkfs.c
index e3ced19..3ac8525 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1282,6 +1282,8 @@ int main(int ac, char **av)
 		ret = btrfs_prepare_device(fd, file, zero_end, &dev_block_count, &mixed);
 		if (block_count == 0)
 			block_count = dev_block_count;
+		if (block_count > dev_block_count)
+			block_count = dev_block_count;
 	} else {
 		ac = 0;
 		file = av[optind++];
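The clamping logic of the patch can be pulled out into a small standalone function to make its behaviour easy to check. `clamp_block_count` is a hypothetical helper, not part of mkfs.c; the `block_count == 0` meaning "use the whole device" mirrors the existing check in the diff. Note that the `if` must not be followed by a stray `;`: `if (cond);` has an empty body, so the assignment on the next line would run unconditionally and every `-b` value would be replaced by the device size.

```c
#include <stdint.h>

/*
 * Clamp a requested filesystem size to the device size.
 * block_count == 0 means "use the whole device".
 */
uint64_t clamp_block_count(uint64_t block_count, uint64_t dev_block_count)
{
	if (block_count == 0)
		block_count = dev_block_count;
	/* No ';' after the condition, or the clamp becomes unconditional. */
	if (block_count > dev_block_count)
		block_count = dev_block_count;
	return block_count;
}
```

A request smaller than the device is kept as-is; with the stray semicolon it would instead have been silently enlarged to the full device.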

thanks,
liubo

> This was on Debian's linux-image-3.1.0-1-amd64 at version 3.1.6-1.
> Besides an HPA, LVM or partitioning changes could also cause this.
> 
> 
> Maik
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* Re: Bug(?): btrfs carries on working if part of a device disappears
@ 2012-01-13 12:51 Ben Klein
From: Ben Klein @ 2012-01-13 12:51 UTC
  To: Liu Bo; +Cc: Maik Zumstrull, linux-btrfs

On 13 January 2012 23:07, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
> I have also noticed this: when we run "mkfs.btrfs" with "-b fssize" and
> "fssize" is larger than the device size, mkfs does not complain, and we
> get "beyond the end" errors.
>
> So maybe we should limit the mkfs size:
>
> diff --git a/mkfs.c b/mkfs.c
> index e3ced19..3ac8525 100644
> --- a/mkfs.c
> +++ b/mkfs.c
> @@ -1282,6 +1282,8 @@ int main(int ac, char **av)
>  		ret = btrfs_prepare_device(fd, file, zero_end, &dev_block_count, &mixed);
>  		if (block_count == 0)
>  			block_count = dev_block_count;
> +		if (block_count > dev_block_count)
> +			block_count = dev_block_count;
>  	} else {
>  		ac = 0;
>  		file = av[optind++];
>
> thanks,
> liubo

It might be a better idea to error out at this point. If the user is
asking for a filesystem larger than what is possible on the device, I
think the mkfs should fail completely.
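The alternative suggested here, failing instead of silently clamping, might look like the following sketch. `resolve_fs_size` is a hypothetical helper rather than btrfs-progs code, and returning `-EINVAL` for an oversized request is an assumed convention; `requested == 0` meaning "use the whole device" mirrors the `block_count == 0` check in the quoted patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Decide the filesystem size mkfs should use.
 * Returns 0 and stores the size in *out, or -EINVAL (with a message)
 * if the user asked for more than the device can hold.
 */
int resolve_fs_size(uint64_t requested, uint64_t dev_size, uint64_t *out)
{
	if (requested == 0) {
		/* No size given: use the whole device. */
		*out = dev_size;
		return 0;
	}
	if (requested > dev_size) {
		/* Refuse outright rather than shrinking the request. */
		fprintf(stderr,
			"requested size %llu exceeds device size %llu\n",
			(unsigned long long)requested,
			(unsigned long long)dev_size);
		return -EINVAL;
	}
	*out = requested;
	return 0;
}
```

Compared with clamping, this makes a mistyped `-b` value visible immediately instead of producing a filesystem of a different size than the one asked for.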