linux-btrfs.vger.kernel.org archive mirror
* Can't replace a faulty disk of raid1
@ 2012-10-26 10:57 Lluís Batlle i Rossell
  2012-10-26 10:59 ` Lluís Batlle i Rossell
  2012-10-31 18:50 ` Lluís Batlle i Rossell
  0 siblings, 2 replies; 5+ messages in thread
From: Lluís Batlle i Rossell @ 2012-10-26 10:57 UTC (permalink / raw)
  To: Btrfs mailing list

Hello,

I had a raid1 btrfs (540GB) on vanilla 3.6.3. A disk failed, so I removed it
while powered off, plugged in a new one, partitioned it (to 110GB, by mistake),
and added it to the btrfs.

I tried to remove the missing device, and it said "Input/output error" after a
while. Next attempts simply gave "Invalid argument".

I repartitioned, rebooted the system, and grew the filesystem to fill the
partition: "btrfs fi resize 3:max /"

# btrfs fi show
Label: 'mainbtrfs'  uuid: 2ebf9e90-104c-47a4-adff-fada1ce3b682
    Total devices 3 FS bytes used 445.06GB
    devid    1 size 539.95GB used 539.95GB path /dev/sda5
    devid    3 size 539.95GB used 96.90GB path /dev/sdb1   <= New disk
    *** Some devices missing

The size looked fine (I checked it down to the byte, to make sure I had not
made it 4K smaller, for example). But trying 'btrfs device delete missing /'
again gave the same outcome.

I tried "btrfs balance start /", and after a while it also ended with
"Input/output error". In none of the cases above do I get an error message in
dmesg; dmesg only shows the usual 'relocating block...' and 'found 4 extents'.

I see that /dev/sdb1, whatever operation above I run, never goes beyond that
'used 96.90GB'. So I'm stuck on a degraded mount, unable to get back to raid1.

Some data:

# btrfs fi df /
Data, RAID1: total=507.62GB, used=417.08GB
Data: total=25.32GB, used=22.48GB
System, RAID1: total=32.00MB, used=92.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=19.97GB, used=5.50GB

Mount log:
[   10.939163] device label mainbtrfs devid 1 transid 194548 /dev/sda5
[   10.939856] btrfs: allowing degraded mounts
[   10.939939] btrfs: disk space caching is enabled
[   10.940652] warning devid 2 missing
[   10.987500] btrfs: bdev (null) errs: wr 6702, rd 2632, flush 312, corrupt 1970, gen 573
[   10.987636] btrfs: bdev /dev/sda5 errs: wr 52, rd 13, flush 0, corrupt 2, gen 8
[   14.391309] btrfs: unlinked 1 orphans
[   22.319849] btrfs: use lzo compression
[   22.319937] btrfs: disk space caching is enabled
[   27.481405] udevd[1451]: starting version 173
[   28.493786] device label mainbtrfs devid 3 transid 194549 /dev/sdb1
[   28.930870] device fsid 30781650-3053-4273-b640-ec86a442c945 devid 1 transid 2272 /dev/sda3
[   28.947632] device label mainbtrfs devid 1 transid 194549 /dev/sda5


Any help?

Thank you,
Lluís.

* Re: Can't replace a faulty disk of raid1
  2012-10-26 10:57 Can't replace a faulty disk of raid1 Lluís Batlle i Rossell
@ 2012-10-26 10:59 ` Lluís Batlle i Rossell
  2012-10-26 11:41   ` Goffredo Baroncelli
  2012-10-31 18:50 ` Lluís Batlle i Rossell
  1 sibling, 1 reply; 5+ messages in thread
From: Lluís Batlle i Rossell @ 2012-10-26 10:59 UTC (permalink / raw)
  To: Btrfs mailing list

Another topposting detail:

I've run "btrfs scrub start /", and it finished properly. So it seems my data is
still there:
scrub status for 2ebf9e90-104c-47a4-adff-fada1ce3b682
        scrub started at Fri Oct 26 10:13:21 2012, running for 7719 seconds
        total bytes scrubbed: 434.54GB with 0 errors

In case someone wonders what "Data: total=25.32GB, used=22.48GB" is, it may come
from a "balance start -dconvert=single /" that I started *before* I unplugged
the faulty disk but cancelled at some point.

Thank you in advance,
Lluís.

* Re: Can't replace a faulty disk of raid1
  2012-10-26 10:59 ` Lluís Batlle i Rossell
@ 2012-10-26 11:41   ` Goffredo Baroncelli
  2012-10-26 13:23     ` Lluís Batlle i Rossell
  0 siblings, 1 reply; 5+ messages in thread
From: Goffredo Baroncelli @ 2012-10-26 11:41 UTC (permalink / raw)
  To: Lluís Batlle i Rossell; +Cc: Btrfs mailing list

Hi Luis

On Fri, Oct 26, 2012 at 12:59 PM, Lluís Batlle i Rossell
<viric@viric.name> wrote:
[...]
>>
>> I tried to remove the missing device, and it said "Input/output error" after a
>> while. Next attempts simply gave "Invalid argument".

To date, BTRFS is not capable of removing (via btrfs device delete <path>)
a missing disk. The user has to unmount the filesystem and remount it
in "degraded" mode.
I.e.:

# umount /mnt/btrfs
# mount -o degraded /dev/sdX /mnt/btrfs


Now I notice that the filesystem is the root filesystem. This is
a bit more difficult to handle. I think that you have to pass the
right parameter to the boot loader to do that.

For example on debian, you must pass the following option to grub
(typically to the line of the kernel)

rootflags=degraded

I don't know if it is valid to do something like:

# mount -o remount,degraded /

BR
G.Baroncelli

* Re: Can't replace a faulty disk of raid1
  2012-10-26 11:41   ` Goffredo Baroncelli
@ 2012-10-26 13:23     ` Lluís Batlle i Rossell
  0 siblings, 0 replies; 5+ messages in thread
From: Lluís Batlle i Rossell @ 2012-10-26 13:23 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Btrfs mailing list

On Fri, Oct 26, 2012 at 01:41:58PM +0200, Goffredo Baroncelli wrote:
> Hi Luis
> 
> On Fri, Oct 26, 2012 at 12:59 PM, Lluís Batlle i Rossell
> <viric@viric.name> wrote:
> [...]
> >>
> >> I tried to remove the missing device, and it said "Input/output error" after a
> >> while. Next attempts simply gave "Invalid argument".
> 
> To date, BTRFS is not capable of removing (via btrfs device delete <path>)
> a missing disk. The user has to unmount the filesystem and remount it
> in "degraded" mode.

Of course, everything I reported was done with the filesystem mounted degraded.
The mount log showed that. :)

I simply did the proper degraded mount at initrd time.
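
A degraded mount can also be confirmed from the mount table rather than from dmesg. The following is only a sketch: `is_mounted_degraded` is a hypothetical helper name, and it assumes the running kernel lists `degraded` among the btrfs options in `/proc/mounts`, which the thread itself does not confirm for 3.6.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads /proc/mounts-style
# lines on stdin and succeeds if mount point "$1" lists the "degraded" option.
# Assumption: the kernel reports "degraded" in the options column.
is_mounted_degraded() {
    awk -v mp="$1" '
        $2 == mp {
            # Column 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "degraded")
                    found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

It could be used as `is_mounted_degraded / < /proc/mounts && echo "root is mounted degraded"`.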

Regards,
Lluís.

* Re: Can't replace a faulty disk of raid1
  2012-10-26 10:57 Can't replace a faulty disk of raid1 Lluís Batlle i Rossell
  2012-10-26 10:59 ` Lluís Batlle i Rossell
@ 2012-10-31 18:50 ` Lluís Batlle i Rossell
  1 sibling, 0 replies; 5+ messages in thread
From: Lluís Batlle i Rossell @ 2012-10-31 18:50 UTC (permalink / raw)
  To: Btrfs mailing list

On Fri, Oct 26, 2012 at 12:57:21PM +0200, Lluís Batlle i Rossell wrote:
> I had a raid1 btrfs (540GB) on vanilla 3.6.3. A disk failed, so I removed it
> while powered off, plugged in a new one, partitioned it (to 110GB, by mistake),
> and added it to the btrfs.
> 
> I tried to remove the missing device, and it said "Input/output error" after a
> while. Next attempts simply gave "Invalid argument".
> 
> Some data:
> 
> # btrfs fi df /
> Data, RAID1: total=507.62GB, used=417.08GB
> Data: total=25.32GB, used=22.48GB
> System, RAID1: total=32.00MB, used=92.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=19.97GB, used=5.50GB

For the sake of the mail archive and future searches: problem solved. There was
some 'single' data there (22.48GB). Removing the files containing that data
allowed "btrfs device delete missing /" to work.
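
The 'single' block groups mentioned above are visible in the "btrfs fi df" output as the lines with no profile name after the type. A minimal sketch of a filter for them follows; `list_single_block_groups` is a hypothetical name, and the assumed line format is the one shown in this thread, plus the ", single" spelling that some other btrfs-progs versions print.

```shell
#!/bin/sh
# Hypothetical helper: reads `btrfs fi df` output on stdin and prints the
# block-group lines that have no redundancy profile.
# Lines look like "Data, RAID1: total=..." (profile named) or
# "Data: total=..." (no profile named, i.e. single, as in this thread).
list_single_block_groups() {
    # Split on ":" so $1 is the "Type[, PROFILE]" part of each line.
    awk -F: 'NF > 1 && ($1 !~ /,/ || $1 ~ /, single$/) { print }'
}
```

For example, `btrfs fi df / | list_single_block_groups` on the output quoted above would print the "Data:" and "System:" lines only.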

As far as I know, 3.6 printed no error messages pointing to that. Also, scrub
does not check anything related to missing disks, so it reports that everything
is fine. And I think btrfsck does not report anything related to missing disks
either.

The way to find which files held the missing data was to run "tar c /" and wait
for it to report EIO errors. Then I removed those files.
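
An alternative to the "tar c /" pass is to read every file and list the ones that fail. This is a sketch of that idea, not the procedure used in this thread: `list_unreadable_files` is a hypothetical name, and it reports any read failure (permission problems included), not only EIO.

```shell
#!/bin/sh
# Hypothetical helper: prints the path of every regular file under "$1"
# that cannot be read to the end. -xdev keeps find on a single filesystem.
list_unreadable_files() {
    find "$1" -xdev -type f -exec sh -c '
        for f in "$@"; do
            # Read the whole file and discard it; report the path on failure.
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

Running `list_unreadable_files / > /tmp/unreadable.txt` gives a list to review before deleting anything; file names containing newlines would need extra care.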

Thanks a lot to Josef, who sent me patches adding error reports to get closer
to the source of the errors, and to the rest of the helpful people at #btrfs @
freenode.

Regards,
Lluís.
