* rw-mount-problem after raid1-failure
  From: Martin @ 2015-06-08 17:10 UTC
  To: linux-btrfs

Hello!

I have a btrfs raid1 system (kernel 3.19.0-18-generic, Ubuntu Vivid Vervet,
btrfs-tools 3.17-1.1). One disk failed some days ago, and I could remount the
remaining one with "-o degraded". After one day and some write operations
(with no errors) I had to reboot the system. Now I cannot mount "rw" anymore;
only "-o degraded,ro" is possible.

In the kernel log I found:

  BTRFS: too many missing devices, writeable mount is not allowed

I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594, but I did no
conversion to a single drive.

How can I mount the disk "rw" so that I can remove the "missing" drive and
add a new one? Because there are many snapshots on the filesystem, copying
the system would only be the last resort ;-)

Thanks
Martin
* Re: rw-mount-problem after raid1-failure
  From: Anand Jain @ 2015-06-10 1:19 UTC
  To: Martin, linux-btrfs

On 06/09/2015 01:10 AM, Martin wrote:
> I have a btrfs raid1 system (kernel 3.19.0-18-generic, Ubuntu Vivid
> Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago, and I could
> remount the remaining one with "-o degraded". [...] Now I cannot mount
> "rw" anymore; only "-o degraded,ro" is possible.

How many disks did you have in the RAID1? How many of them failed?

Thanks
Anand
* Re: rw-mount-problem after raid1-failure
  From: Duncan @ 2015-06-10 3:58 UTC
  To: linux-btrfs

Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:

> How many disks did you have in the RAID1? How many of them failed?

The answer is (a bit indirectly) in what you quoted. Repeating:

>> One disk failed[.] I could remount the remaining one[.]

So it was a two-device raid1: one failed device, one remaining, unfailed.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: rw-mount-problem after raid1-failure
  From: Anand Jain @ 2015-06-10 6:38 UTC
  To: Duncan, linux-btrfs

Ah, thanks Duncan. So it is a 2-disk RAID1.

Martin,

Disk pool error handling is primitive as of now: read-only is the only
action it takes, and the rest of the recovery is manual. That is
unacceptable for data center solutions, so I don't yet recommend the btrfs
volume manager (VM) side in production, but we are working to complete it.

For now, for your pool recovery, please try this:

 - Reboot.
 - Unload and reload the btrfs module (so that the kernel device list is
   empty).
 - mount -o degraded <good-disk>   <-- this should work.
 - btrfs fi show -m                <-- should show the device as missing;
                                       if it doesn't, let me know.
 - Do a replace of the missing disk, without reading the source disk.

Good luck.

Thanks, Anand

On 06/10/2015 11:58 AM, Duncan wrote:
> So it was a two-device raid1: one failed device, one remaining, unfailed.
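The steps Anand lists above map roughly onto the commands below. This is
only a sketch: the device (/dev/sdb2) and mount point (/backup2) are taken
from later messages in this thread, the module reload assumes btrfs is built
as a module and no btrfs filesystem is currently mounted, and option
spellings can vary between btrfs-progs versions.

  # Clear the kernel's notion of the filesystem's devices.
  modprobe -r btrfs
  modprobe btrfs
  btrfs device scan                     # re-register available devices

  # Degraded, writable mount of the surviving disk, then check that the
  # failed disk shows up as missing.
  mount -o degraded /dev/sdb2 /backup2
  btrfs filesystem show -m

The replace step itself is sketched after Anand's next message.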
* Re: rw-mount-problem after raid1-failure
  From: Martin @ 2015-06-10 6:58 UTC
  To: Anand Jain; Cc: Duncan, linux-btrfs

Hello Anand,

the

> mount -o degraded <good-disk>   <-- this should work

is exactly my problem. The first few times it worked, but then, after a
reboot, it suddenly started failing with the message "BTRFS: too many
missing devices, writeable mount is not allowed" in the kernel log.

"btrfs fi show /backup2" shows:

  Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512
          Total devices 2 FS bytes used 3.50TiB
          devid    4 size 7.19TiB used 4.02TiB path /dev/sdb2
          *** Some devices missing

I suppose there is a "marker" telling the system to mount only in ro mode?

Because of the ro mount I can't replace the missing device, since all the
btrfs commands need rw access ...

Martin
* Re: rw-mount-problem after raid1-failure
  From: Anand Jain @ 2015-06-10 7:46 UTC
  To: Martin; Cc: Duncan, linux-btrfs

On 06/10/2015 02:58 PM, Martin wrote:
> The first few times it worked, but then, after a reboot, it suddenly
> started failing with the message "BTRFS: too many missing devices,
> writeable mount is not allowed" in the kernel log.

Is the failed (or failing) disk still physically in the system?

When btrfs hits EIO on an intermittently failing disk, ro mode kicks in
(there are some opportunities for fixes here that I am working on). To
recover, the approach is to turn the failing disk into a missing disk
instead: pull the failing disk out of the system and boot. When the system
finds the disk missing (rather than getting EIO from it), it should mount
rw,degraded (from the volume-manager part at least), and then a replace
with a new disk should work.

Thanks, Anand
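Once a rw,degraded mount succeeds, replacing the missing device looks
roughly like this. A sketch only: <missing-devid> and <new-disk> are
placeholders to be read off "btrfs fi show", and /backup2 is the mount
point used elsewhere in this thread.

  btrfs filesystem show /backup2          # note the devid of the missing device
  btrfs replace start <missing-devid> /dev/<new-disk> /backup2
  btrfs replace status /backup2           # monitor the rebuild

Because the source device is missing, the replacement is rebuilt entirely
from the surviving raid1 copy. The older alternative with the same effect
is "btrfs device add" followed by "btrfs device delete missing".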
* Re: rw-mount-problem after raid1-failure
  From: Martin @ 2015-06-10 12:05 UTC
  To: Anand Jain, linux-btrfs

Hello Anand,

the failed disk has already been removed. My procedure was the following:

 - I found some write errors in the kernel log, so
 - I shut down the system,
 - removed the failed disk,
 - powered the system on again,
 - mounted the remaining disk degraded,rw (that worked),
 - the system ran and was rebooted a few times, and mounting degraded,rw
   kept working,
 - then suddenly mounting degraded,rw stopped working, and only degraded,ro
   still works.

Thanks, Martin
* Re: rw-mount-problem after raid1-failure
  From: Anand Jain @ 2015-06-11 0:04 UTC
  To: Martin; Cc: linux-btrfs@vger.kernel.org

> On 10 Jun 2015, at 5:35 pm, Martin <develop@imagmbh.de> wrote:
> - then suddenly mounting degraded,rw stopped working, and only
>   degraded,ro still works.

Are there any logs that say why? Or, if these stages are reproducible,
could you fetch fresh logs?

Thanks, Anand
* Re: rw-mount-problem after raid1-failure
  From: Martin @ 2015-06-11 13:03 UTC
  To: Anand Jain; Cc: linux-btrfs@vger.kernel.org

It is reproducible, but the logs don't say much:

dmesg:
  [151183.214355] BTRFS info (device sdb2): allowing degraded mounts
  [151183.214361] BTRFS info (device sdb2): disk space caching is enabled
  [151183.317719] BTRFS: bdev (null) errs: wr 7988389, rd 7707002, flush 150, corrupt 0, gen 0
  [151214.513046] BTRFS: too many missing devices, writeable mount is not allowed
  [151214.548566] BTRFS: open_ctree failed

Can I get more information out of the kernel module?

Thanks, Martin
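The read-only degraded mount still answers the usual space queries, so the
block-group profiles can at least be checked from that state. A common cause
of exactly this symptom on a two-device raid1 with kernels of this
generation (not confirmed for this particular filesystem) is that chunks
written while the filesystem was mounted degraded,rw were allocated with the
single profile; the mount-time check derives the number of tolerable missing
devices from the least redundant profile present, so one single chunk makes
a single missing device already "too many". A sketch, reusing the device and
mount point from earlier messages:

  mount -o degraded,ro /dev/sdb2 /backup2
  btrfs filesystem df /backup2
  # Data, RAID1 and Metadata, RAID1 lines are expected; any extra "single"
  # lines point at chunks that cannot tolerate a missing device.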
* Re: rw-mount-problem after raid1-failure
  From: Anand Jain @ 2015-06-12 10:38 UTC
  To: Martin; Cc: linux-btrfs@vger.kernel.org

On 06/11/2015 09:03 PM, Martin wrote:
> [151214.513046] BTRFS: too many missing devices, writeable mount is not allowed

Presumably only one disk is missing (we have not confirmed that from the
kernel's point of view?). If you still get this with one disk missing, it
means there is a block group profile in your disk pool that does not
tolerate even a single disk failure.

So how do we check all the group profiles while the filesystem is in an
unmountable state?

There is a patch that shows the device list via /proc/fs/btrfs/devlist;
that would have helped to debug this. I am fine with any other method you
can use to confirm it as well.

Thanks, Anand
* Re: rw-mount-problem after raid1-failure
  From: Martin @ 2015-06-14 18:24 UTC
  To: Anand Jain; Cc: linux-btrfs@vger.kernel.org

Do you know where I can find this kernel patch? I couldn't find it. Then I
will build the patched kernel and send the devlist output.

Thanks, Martin
* Re: rw-mount-problem after raid1-failure
  From: Anand Jain @ 2015-06-15 0:58 UTC
  To: Martin; Cc: linux-btrfs@vger.kernel.org

Martin,

The patch below will help to obtain the device list from the kernel:

  https://patchwork.kernel.org/patch/4996111/

(Just FYI, the above is v1, which should apply; its v2 would not apply
without the sysfs patches.)

However, the other data we need is the group profiles; collect them when
and if the device can be mounted. (Was there any patch that can obtain the
group profiles without mounting? I vaguely remember some efforts/comments
on that before.)

Thanks, Anand
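For completeness, the chunk (block group) profiles can also be read from an
unmounted device by dumping the chunk tree with the userspace tools, which
is the kind of information Anand is asking about here. This is a sketch
under some assumptions: newer btrfs-progs expose the dump as "btrfs
inspect-internal dump-tree", releases of the 3.x era ship a standalone
btrfs-debug-tree binary instead, and depending on the version the chunk
type may be printed symbolically (e.g. DATA|RAID1) or as a raw flags value.

  # newer btrfs-progs: dump only the chunk tree (tree 3)
  btrfs inspect-internal dump-tree -t chunk /dev/sdb2 | grep -A2 CHUNK_ITEM

  # btrfs-progs of the 3.x era: btrfs-debug-tree prints the same items
  # (without a tree selector this produces a lot of output)
  btrfs-debug-tree /dev/sdb2 | grep -A2 CHUNK_ITEM

The "type" field of each CHUNK_ITEM carries the profile of that chunk.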
end of thread

Thread overview: 12 messages

  2015-06-08 17:10  rw-mount-problem after raid1-failure  Martin
  2015-06-10  1:19  ` Anand Jain
  2015-06-10  3:58  ` Duncan
  2015-06-10  6:38  ` Anand Jain
  2015-06-10  6:58  ` Martin
  2015-06-10  7:46  ` Anand Jain
  2015-06-10 12:05  ` Martin
  2015-06-11  0:04  ` Anand Jain
  2015-06-11 13:03  ` Martin
  2015-06-12 10:38  ` Anand Jain
  2015-06-14 18:24  ` Martin
  2015-06-15  0:58  ` Anand Jain