overlay file to test btrfs repairs

* overlay file to test btrfs repairs
From: Chris Murphy @ 2016-03-21  3:43 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi folks,

So I just ran into this:
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

This is a device mapper overlay file - not overlayfs.

For repairs where it's uncertain what the next step should be, maybe
this is a viable way to avoid changing the file system? I'm thinking
chunk-recover might take up too much space; I'm not sure how that one
works, whether chunks are just read, whether they have to be rewritten,
or whether only the chunk tree is touched. But for 'btrfs check' and
'btrfs rescue super-recover/zero-log' there should be very little being
written, so the overlay idea might be a good step?
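
For reference, the setup on the linked wiki page comes down to a sparse
file, a loop device on top of it, and a device-mapper "snapshot" target
that redirects all writes to the loop device. Below is a minimal Python
sketch that only builds the commands and does not run them. The function
name, overlay path, loop device and mapping name are placeholders, and
the chunk size of 8 sectors is simply the value the wiki page uses:

```python
import shlex


def overlay_commands(origin, sectors, overlay_file, overlay_size, loop_dev, name):
    """Return shell commands that put a copy-on-write overlay over `origin`.

    origin       -- block device to protect, e.g. /dev/sdb (never written to)
    sectors      -- size of origin in 512-byte sectors (blockdev --getsz)
    overlay_file -- sparse file that will absorb all writes
    overlay_size -- size passed to truncate, e.g. "10G"
    loop_dev     -- loop device to attach overlay_file to (assumed to be free)
    name         -- name of the resulting /dev/mapper/<name> device
    """
    # dm snapshot table: <start> <length> snapshot <origin> <cow dev> <P|N> <chunk>
    # "P" selects a persistent exception store, 8 is the chunk size in sectors.
    table = f"0 {sectors} snapshot {origin} {loop_dev} P 8"
    return [
        f"truncate -s {overlay_size} {shlex.quote(overlay_file)}",
        f"losetup {loop_dev} {shlex.quote(overlay_file)}",
        f"dmsetup create {name} --table {shlex.quote(table)}",
    ]


if __name__ == "__main__":
    # All values below are made up for illustration.
    for cmd in overlay_commands("/dev/sdb", 7814037168, "/tmp/overlay-sdb",
                                "10G", "/dev/loop0", "sdb-overlay"):
        print(cmd)
```

The repair tool would then be pointed at /dev/mapper/<name> instead of
the real device; removing the mapping with dmsetup remove and deleting
the overlay file discards the experiment.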

Opinions?

-- 
Chris Murphy


* Re: overlay file to test btrfs repairs
From: Duncan @ 2016-03-21  9:55 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Sun, 20 Mar 2016 21:43:52 -0600 as excerpted:

> Hi folks,
> 
> So I just ran into this:
> https://raid.wiki.kernel.org/index.php/
Recovering_a_failed_software_RAID#Making_the_harddisks_read-
only_using_an_overlay_file

[That's a single link, wrapped by my client.]
 
> This is a device mapper overlay file - not overlayfs.
> 
> For the repairs that are sometimes uncertain what's next, maybe this is
> a viable option to avoid changing the file system? I'm thinking
> chunk-recover might take up too much space, I'm not sure how that one
> works, if chunks are just being read or if they have to be rewritten or
> if it's just the chunk tree? But for 'btrfs check' and 'btrfs rescue
> super-recover/zero-log' there should be very little being written so the
> overlay idea might be a good step?
> 
> Opinions?

That's a creative and potentially quite useful possible solution to an 
often hairy problem.  Thanks for bringing it up. =:^)

Provided Hugo and the devs don't find major fault with the idea, linking 
that from appropriate locations (as a possible solution in the Problem 
FAQ is the first one that occurs to me) in the btrfs wiki could be quite 
useful, to many.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: overlay file to test btrfs repairs
From: Austin S. Hemmelgarn @ 2016-03-21 11:22 UTC (permalink / raw)
  To: linux-btrfs

On 2016-03-21 05:55, Duncan wrote:
> Chris Murphy posted on Sun, 20 Mar 2016 21:43:52 -0600 as excerpted:
>
>> Hi folks,
>>
>> So I just ran into this:
>> https://raid.wiki.kernel.org/index.php/
> Recovering_a_failed_software_RAID#Making_the_harddisks_read-
> only_using_an_overlay_file
>
> [That's a single link, wrapped by my client.]
>
>> This is a device mapper overlay file - not overlayfs.
>>
>> For the repairs that are sometimes uncertain what's next, maybe this is
>> a viable option to avoid changing the file system? I'm thinking
>> chunk-recover might take up too much space, I'm not sure how that one
>> works, if chunks are just being read or if they have to be rewritten or
>> if it's just the chunk tree? But for 'btrfs check' and 'btrfs rescue
>> super-recover/zero-log' there should be very little being written so the
>> overlay idea might be a good step?
>>
>> Opinions?
>
> That's a creative and potentially quite useful possible solution to an
> often hairy problem.  Thanks for bringing it up. =:^)
>
> Provided Hugo and the devs don't find major fault with the idea, linking
> that from appropriate locations (as a possible solution in the Problem
> FAQ is the first one that occurs to me) in the btrfs wiki could be quite
> useful, to many.
>
If we could find some way to have the programs themselves do this if the 
system supports it (and the user opts in, of course), it would be really 
helpful.  That said, I can see this possibly causing issues due to 
duplicate device UUIDs.


* Re: overlay file to test btrfs repairs
From: Chris Murphy @ 2016-03-21 17:13 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Btrfs BTRFS

On Mon, Mar 21, 2016 at 5:22 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-03-21 05:55, Duncan wrote:
>>
>> Chris Murphy posted on Sun, 20 Mar 2016 21:43:52 -0600 as excerpted:
>>
>>> Hi folks,
>>>
>>> So I just ran into this:
>>> https://raid.wiki.kernel.org/index.php/
>>
>> Recovering_a_failed_software_RAID#Making_the_harddisks_read-
>> only_using_an_overlay_file
>>
>> [That's a single link, wrapped by my client.]
>>
>>> This is a device mapper overlay file - not overlayfs.
>>>
>>> For the repairs that are sometimes uncertain what's next, maybe this is
>>> a viable option to avoid changing the file system? I'm thinking
>>> chunk-recover might take up too much space, I'm not sure how that one
>>> works, if chunks are just being read or if they have to be rewritten or
>>> if it's just the chunk tree? But for 'btrfs check' and 'btrfs rescue
>>> super-recover/zero-log' there should be very little being written so the
>>> overlay idea might be a good step?
>>>
>>> Opinions?
>>
>>
>> That's a creative and potentially quite useful possible solution to an
>> often hairy problem.  Thanks for bringing it up. =:^)
>>
>> Provided Hugo and the devs don't find major fault with the idea, linking
>> that from appropriate locations (as a possible solution in the Problem
>> FAQ is the first one that occurs to me) in the btrfs wiki could be quite
>> useful, to many.
>>
> If we could find some way to have the programs themselves do this if the
> system supports it (and the user opts in of course), it would be really
> helpful.  That said, I can see this possibly causing issues due to duplicate
> device UUID's.

I thought of this: a Btrfs seed device. The problem is that it has some
minimal requirements (that I don't understand) for file system
integrity, probably starting with all the superblocks being in a good
state. So literally leveraging a seed device is not possible, and it's
also non-obvious. Any repairs should be fail-safe or they're arguably
broken. But if there were a way to effectively set up a seed plus a
RAM- or file-based device behind the scenes so that repairs can be
tested, that might be useful. And it would be mountable, even rw, and
that too would be reversible.


-- 
Chris Murphy


* Re: overlay file to test btrfs repairs
From: Austin S. Hemmelgarn @ 2016-03-22 14:21 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 2016-03-21 13:13, Chris Murphy wrote:
> On Mon, Mar 21, 2016 at 5:22 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2016-03-21 05:55, Duncan wrote:
>>>
>>> Chris Murphy posted on Sun, 20 Mar 2016 21:43:52 -0600 as excerpted:
>>>
>>>> Hi folks,
>>>>
>>>> So I just ran into this:
>>>> https://raid.wiki.kernel.org/index.php/
>>>
>>> Recovering_a_failed_software_RAID#Making_the_harddisks_read-
>>> only_using_an_overlay_file
>>>
>>> [That's a single link, wrapped by my client.]
>>>
>>>> This is a device mapper overlay file - not overlayfs.
>>>>
>>>> For the repairs that are sometimes uncertain what's next, maybe this is
>>>> a viable option to avoid changing the file system? I'm thinking
>>>> chunk-recover might take up too much space, I'm not sure how that one
>>>> works, if chunks are just being read or if they have to be rewritten or
>>>> if it's just the chunk tree? But for 'btrfs check' and 'btrfs rescue
>>>> super-recover/zero-log' there should be very little being written so the
>>>> overlay idea might be a good step?
>>>>
>>>> Opinions?
>>>
>>>
>>> That's a creative and potentially quite useful possible solution to an
>>> often hairy problem.  Thanks for bringing it up. =:^)
>>>
>>> Provided Hugo and the devs don't find major fault with the idea, linking
>>> that from appropriate locations (as a possible solution in the Problem
>>> FAQ is the first one that occurs to me) in the btrfs wiki could be quite
>>> useful, to many.
>>>
>> If we could find some way to have the programs themselves do this if the
>> system supports it (and the user opts in of course), it would be really
>> helpful.  That said, I can see this possibly causing issues due to duplicate
>> device UUID's.
>
> I thought of this. Btrfs seed device. The problem is it has some
> minimal requirements (that I don't understand) for file system
> integrity, probably starting out with the superblocks all being in a
> good state. So literal leveraging of seed device is not possible, and
> it's also non-obvious. Any repairs should be fail safe or they're
> arguably broken. But if there were a way to effectively setup a seed +
> ram or file based device behind the scene so that repairs can be
> tested, that might be useful. And it would be mountable, even rw, and
> that too would be reversible.
>
OTOH, if we could add some way to tell the code (both userspace and 
in-kernel) to explicitly ignore specific devices when trying to assemble 
filesystems, that would allow us to use DM snapshots (or something 
similar) to do this, and would also allow people to work around the UUID 
issues when dealing with LVM snapshots (or similar situations).



* Re: overlay file to test btrfs repairs
From: Duncan @ 2016-03-22 17:34 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Tue, 22 Mar 2016 10:21:57 -0400 as
excerpted:

> OTOH, if we could add some way to tell the code (both userspace and
> in-kernel) to explicitly ignore specific devices when trying to assemble
> filesystems, that would allow us to use DM snapshots (or something
> similar) to do this, and would also allow people to work around the UUID
> issues when dealing with LVM snapshots (or similar situations).

That's a good idea, but one minor detail: it would need to resolve to a
specific block-device major:minor comparison; it couldn't be a simple
device-path blacklist, because device paths are routinely symlinked.

I guess that's obvious from a kernel dev perspective, but perhaps not so 
much from an admin-user perspective, where the device-path /is/ often 
considered the device.
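
To illustrate the difference, here is a small Python sketch that
compares devices by major:minor rather than by path. The function names
and the idea of an "ignore set" are hypothetical and not an existing
btrfs interface; /dev/null stands in for a real block device so the
example runs without root:

```python
import os
import stat


def device_number(path):
    """Return (major, minor) of the device node that `path` resolves to."""
    st = os.stat(path)  # follows symlinks, unlike os.lstat
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{path} is not a device node")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def is_ignored(path, ignored_numbers):
    """True if `path` names a device whose (major, minor) is in the ignore set."""
    return device_number(path) in ignored_numbers


if __name__ == "__main__":
    # Any symlink pointing at /dev/null would match this set as well,
    # which a plain string comparison of paths would miss.
    ignore = {device_number("/dev/null")}
    print(is_ignored("/dev/null", ignore))
```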

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: overlay file to test btrfs repairs
From: Henk Slager @ 2016-03-22 20:42 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Mon, Mar 21, 2016 at 4:43 AM, Chris Murphy <lists@colorremedies.com> wrote:
> Hi folks,
>
> So I just ran into this:
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>
> This is a device mapper overlay file - not overlayfs.
>
> For the repairs that are sometimes uncertain what's next, maybe this
> is a viable option to avoid changing the file system? I'm thinking
> chunk-recover might take up too much space, I'm not sure how that one
> works, if chunks are just being read or if they have to be rewritten
> or if it's just the chunk tree? But for 'btrfs check' and 'btrfs
> rescue super-recover/zero-log' there should be very little being
> written so the overlay idea might be a good step?

I used the info via this message:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/54178

to try to fix a RAID10 of 4x 4TB disks (some bad metadata, some nbytes
400 errors). I used AoE (instead of NBD) so that btrfs and the kernel
would not get confused by duplicate UUIDs.

I created 4x 10G sparse files, one for each bcached HDD. After the
--repair action had ended (apparently successfully), du reported only
50M on disk for each of the sparse files. The fix operation took about
1.5 hours. After mounting and unmounting the 'just repaired fs', a
subsequent btrfs check still reported the same errors, although in a
different order.
So the nbytes 400 errors did not actually get fixed (and there were
also other errors). This is in accordance with what Qu once noted,
though that was with older tools/kernel at the time.
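
The gap between the nominal 10G and what du reports is the usual sparse
file behaviour: only blocks that were actually written get allocated. A
minimal Python sketch of how to read both numbers (the helper name and
the file name are made up for this example):

```python
import os
import tempfile


def sparse_usage(path):
    """Return (apparent_size, allocated_bytes) for `path`."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units, independent of the fs block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "overlay.img")
        with open(path, "wb") as f:
            f.truncate(10 * 1024**3)  # 10 GiB apparent size, nothing written yet
        apparent, allocated = sparse_usage(path)
        print(f"apparent={apparent} allocated={allocated}")
```

The allocated figure corresponds to what du shows, so it gives a rough
measure of how much a repair run actually wrote to the overlay.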


* Re: overlay file to test btrfs repairs
From: Austin S. Hemmelgarn @ 2016-03-23 11:17 UTC (permalink / raw)
  To: Henk Slager, Chris Murphy; +Cc: Btrfs BTRFS

On 2016-03-22 16:42, Henk Slager wrote:
> On Mon, Mar 21, 2016 at 4:43 AM, Chris Murphy <lists@colorremedies.com> wrote:
>> Hi folks,
>>
>> So I just ran into this:
>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>>
>> This is a device mapper overlay file - not overlayfs.
>>
>> For the repairs that are sometimes uncertain what's next, maybe this
>> is a viable option to avoid changing the file system? I'm thinking
>> chunk-recover might take up too much space, I'm not sure how that one
>> works, if chunks are just being read or if they have to be rewritten
>> or if it's just the chunk tree? But for 'btrfs check' and 'btrfs
>> rescue super-recover/zero-log' there should be very little being
>> written so the overlay idea might be a good step?
>
> I used the info via this message:
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/54178
>
> to try to fix a 4x4TB disks RAID10 (some bad metadata, some nbytes 400 errors).
> I used AoE (instead of NBD) to avoid that btrfs+kernel might get
> confused by double UUID's.
>
> I created 4x 10G sparse files for each bcached HDD. After the --repair
> action had ended (apparently successful), du reported only 50M size on
> disk for each of the sparse files. The fix operation lasted about 1.5
> hours. After a mount and umount again of the 'just repaired fs', a
> subsequent btrfs check still reported the same errors, although
> reported in another sequence.
> So the nbytes 400 errors actually did not get fixed ( while there were
> also other errors; This in accordance to what Qu once noted, but at
> that time older tools/kernel).

I actually do something similar when I need to fix something other than
my root filesystem on my home server system.  I run Xen though, so
instead of using AoE, I just unmount the filesystem, set up the
snapshots, and then attach them directly, via Xen's virtual block device
protocol, to the VM I use to build updates for the other VMs, and work
with them from there.  Obviously not practical for most people, but it
works well for my setup.


* Re: overlay file to test btrfs repairs
From: Goffredo Baroncelli @ 2016-03-23 19:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Austin S. Hemmelgarn

On 2016-03-21 12:22, Austin S. Hemmelgarn wrote:
[...]
> If we could find some way to have the programs themselves do this if
> the system supports it (and the user opts in of course), it would be
> really helpful.  That said, I can see this possibly causing issues
> due to duplicate device UUID's. 

In the past I proposed (and implemented a prototype of) a mount.btrfs
helper [1]. The idea behind it was to collect all the actions needed to
prepare a filesystem in one place:

- collecting info about all the devices involved/needed
- deciding whether a degraded filesystem has to be mounted as degraded,
  whether an error has to be raised, or whether to keep waiting for a
  new device
- raising an error in case of conflicting UUIDs

It would also be simpler to implement the logic for using
"overlay device(s)": it could be done as an option! :-)
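
To make the decision steps above concrete, here is a rough Python
sketch. It is not the prototype from [1]; the Member record, the
exception and plan_mount() are hypothetical names, the device list is
assumed to have been collected already, and "keep waiting" is reduced
to raising LookupError so that a caller can retry:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    path: str   # device node
    fsid: str   # filesystem UUID
    devid: int  # btrfs device id within that filesystem


class ConflictingDevices(Exception):
    pass


def plan_mount(members, fsid, expected_devices, allow_degraded=False):
    """Decide how to mount `fsid`; returns ("normal" | "degraded", paths)."""
    by_devid = defaultdict(list)
    for m in members:
        if m.fsid == fsid:
            by_devid[m.devid].append(m.path)

    # The same devid seen twice means that e.g. a snapshot or overlay and
    # its origin are both visible; refuse instead of picking one.
    dupes = {d: p for d, p in by_devid.items() if len(p) > 1}
    if dupes:
        raise ConflictingDevices(f"duplicate devid(s) for {fsid}: {dupes}")

    paths = sorted(p[0] for p in by_devid.values())
    if len(paths) == expected_devices:
        return "normal", paths
    if allow_degraded and paths:
        return "degraded", paths
    # Caller may wait for more devices and call again.
    raise LookupError(f"{len(paths)}/{expected_devices} devices present for {fsid}")
```

An "overlay" option would fit in before the duplicate check, by
replacing each member path with its overlay device so that only one of
the two copies is ever considered.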

As things stand, implementing these points means changing several places
(kernel, udev rules, btrfs scan utility). Not to mention that several
seconds can pass between the first device discovery and the filesystem
mount because of the boot process.

BR
G.Baroncelli

[1]http://marc.info/?l=linux-btrfs&m=141736989508243&w=2

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
