* size 2.73TiB used 240.97GiB after balance
From: Hendrik Friedel @ 2015-07-06 19:20 UTC (permalink / raw)
To: linux-btrfs
Hello,
I started with a raid1:
devid 1 size 2.73TiB used 2.67TiB path /dev/sdd
devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
Then I added a third device, /dev/sdc1, and started a balance:
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/__Complete_Disk/
Now the file-system looks like this:
Total devices 3 FS bytes used 4.68TiB
devid 1 size 2.73TiB used 2.67TiB path /dev/sdd
devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
devid 3 size 2.73TiB used 240.97GiB path /dev/sdc1
I am surprised by the 240.97GiB...
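(In case the balance was simply still running: I assume something like the
following would show the conversion progress and the per-profile allocation;
mount point as above.)
btrfs balance status /mnt/__Complete_Disk/
btrfs fi df /mnt/__Complete_Disk/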
In the syslog and dmesg I find several:
[108274.415499] btrfs_dev_stat_print_on_error: 8 callbacks suppressed
[108279.840334] btrfs_dev_stat_print_on_error: 12 callbacks suppressed
What's wrong here?
Regards,
Hendrik
* Re: size 2.73TiB used 240.97GiB after balance
From: Hendrik Friedel @ 2015-07-06 19:44 UTC (permalink / raw)
To: linux-btrfs
Hello,
Ok, sdc seems to have failed. (Sorry, I checked only the sdd and sdb SMART
values, as sdc is brand new; maybe a bad assumption on my side.)
I have mounted the device
mount -o recovery,ro
So, what should I do now:
btrfs device delete /dev/sdc /mnt
or
mount -o degraded /dev/sdb /mnt
btrfs device delete missing /mnt
I do have a backup of the most valuable data.
But if you consider either of the above options risky, I had better get a
new drive first, though that might take a couple of days (during which sdc
could degrade further).
What is your recommendation?
Regards,
Hendrik
* Re: size 2.73TiB used 240.97GiB after balance
From: Hugo Mills @ 2015-07-06 19:49 UTC (permalink / raw)
To: Hendrik Friedel; +Cc: linux-btrfs
On Mon, Jul 06, 2015 at 09:44:53PM +0200, Hendrik Friedel wrote:
> Hello,
>
> ok, sdc seems to have failed (sorry, I checked only sdd and sdb
> SMART values, as sdc is brand new. Maybe a bad assumption, from my
> side.
>
> I have mounted the device
> mount -o recovery,ro
>
> So, what should I do now:
> btrfs device delete /dev/sdc /mnt
>
> or
>
> mount -o degraded /dev/sdb /mnt
> btrfs device delete missing /mnt
>
> I do have a backup of the most valuable data.
> But if you consider one of the above options risky, I might better
> get a new drive before, but this might take a couple of days (in
> which sdc could further degrade).
> What is your recommendation?
Physically remove the device from the array, mount with -o
degraded, optionally add the new device, and run a balance.
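Roughly, as an untested sketch (/dev/sdX stands for the optional new disk;
adjust the mount point):
# with the failed disk physically disconnected
mount -o degraded /dev/sdb /mnt/__Complete_Disk
btrfs device add /dev/sdX /mnt/__Complete_Disk   # optional, if you have a replacement
btrfs balance start /mnt/__Complete_Disk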
Hugo.
--
Hugo Mills | "I lost my leg in 1942. Some bastard stole it in a
hugo@... carfax.org.uk | pub in Pimlico."
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
* Re: size 2.73TiB used 240.97GiB after balance
From: Donald Pearson @ 2015-07-06 20:01 UTC (permalink / raw)
To: Hugo Mills, Hendrik Friedel, Btrfs BTRFS
Based on my experience, Hugo's advice is critical: get the bad drive
out of the pool when in raid56, and do not try to replace or delete it
while it's still attached and recognized.
If you add a new device, mount degraded and rebalance. If you don't,
mount degraded then device delete missing.
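As a rough sketch (device names are placeholders):
# with a new drive:
mount -o degraded /dev/sdb /mnt
btrfs device add /dev/sdX /mnt
btrfs balance start /mnt
# without a new drive:
mount -o degraded /dev/sdb /mnt
btrfs device delete missing /mnt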
On Mon, Jul 6, 2015 at 2:49 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Mon, Jul 06, 2015 at 09:44:53PM +0200, Hendrik Friedel wrote:
>> Hello,
>>
>> ok, sdc seems to have failed (sorry, I checked only sdd and sdb
>> SMART values, as sdc is brand new. Maybe a bad assumption, from my
>> side.
>>
>> I have mounted the device
>> mount -o recovery,ro
>>
>> So, what should I do now:
>> btrfs device delete /dev/sdc /mnt
>>
>> or
>>
>> mount -o degraded /dev/sdb /mnt
>> btrfs device delete missing /mnt
>>
>> I do have a backup of the most valuable data.
>> But if you consider one of the above options risky, I might better
>> get a new drive before, but this might take a couple of days (in
>> which sdc could further degrade).
>> What is your recommendation?
>
> Physically remove the device from the array, mount with -o
> degraded, optionally add the new device, and run a balance.
>
> Hugo.
>
> --
> Hugo Mills | "I lost my leg in 1942. Some bastard stole it in a
> hugo@... carfax.org.uk | pub in Pimlico."
> http://carfax.org.uk/ |
> PGP: E2AB1DE4 |
* Re: size 2.73TiB used 240.97GiB after balance
From: Omar Sandoval @ 2015-07-06 20:52 UTC (permalink / raw)
To: Donald Pearson, Hugo Mills, Hendrik Friedel, Btrfs BTRFS
On 07/06/2015 01:01 PM, Donald Pearson wrote:
> Based on my experience Hugo's advice is critical, get the bad drive
> out of the pool when in raid56 and do not try to replace or delete it
> while it's still attached and recognized.
>
> If you add a new device, mount degraded and rebalance. If you don't,
> mount degraded then device delete missing.
>
Watch out, replacing a missing device in RAID 5/6 currently doesn't work
and will cause a kernel BUG(). See my patch series here:
http://www.spinics.net/lists/linux-btrfs/msg44874.html
--
Omar
* Re: size 2.73TiB used 240.97GiB after balance
From: Hendrik Friedel @ 2015-07-06 21:12 UTC (permalink / raw)
To: Omar Sandoval, Donald Pearson, Hugo Mills, Btrfs BTRFS
Hello,
oh dear, I fear I am in trouble:
mounted with the recovery option, I tried to save some data, but the system hung.
So I re-booted and sdc is now physically disconnected.
Label: none uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
Total devices 3 FS bytes used 4.67TiB
devid 1 size 2.73TiB used 2.67TiB path /dev/sdc
devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
*** Some devices missing
I try to mount the rest again:
mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
root@homeserver:~# dmesg | tail
[ 447.059275] BTRFS info (device sdc): enabling auto recovery
[ 447.059280] BTRFS info (device sdc): disk space caching is enabled
[ 447.086844] BTRFS: failed to read chunk tree on sdc
[ 447.110588] BTRFS: open_ctree failed
[ 474.496778] BTRFS info (device sdc): enabling auto recovery
[ 474.496781] BTRFS info (device sdc): disk space caching is enabled
[ 474.519005] BTRFS: failed to read chunk tree on sdc
[ 474.540627] BTRFS: open_ctree failed
mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
That does work now, though.
So, how can I remove the reference to the failed disk and check the data
for consistency (scrub I suppose, but is it safe?)?
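(I suppose that would be something like the following, once mounted
read-write in degraded mode, but I'd rather have confirmation first:)
btrfs device delete missing /mnt/__Complete_Disk
btrfs scrub start /mnt/__Complete_Disk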
Regards,
Hendrik
On 06.07.2015 22:52, Omar Sandoval wrote:
> On 07/06/2015 01:01 PM, Donald Pearson wrote:
>> Based on my experience Hugo's advice is critical, get the bad drive
>> out of the pool when in raid56 and do not try to replace or delete it
>> while it's still attached and recognized.
>>
>> If you add a new device, mount degraded and rebalance. If you don't,
>> mount degraded then device delete missing.
>>
>
> Watch out, replacing a missing device in RAID 5/6 currently doesn't work
> and will cause a kernel BUG(). See my patch series here:
> http://www.spinics.net/lists/linux-btrfs/msg44874.html
>
--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363
* Re: size 2.73TiB used 240.97GiB after balance
From: Donald Pearson @ 2015-07-06 21:49 UTC (permalink / raw)
To: Hendrik Friedel; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
If you can mount it RO, first thing to do is back up any data that you
care about.
According to the bug that Omar posted you should not try a device
replace and you should not try a scrub with a missing device.
You may be able to just do a device delete missing, then separately do
a device add of a new drive, or rebalance back into raid1.
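As a sketch (the new drive name is a placeholder):
btrfs device delete missing /mnt
btrfs device add /dev/sdX /mnt
# or, instead of adding a third drive, convert back:
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt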
On Mon, Jul 6, 2015 at 4:12 PM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello,
>
> oh dear, I fear I am in trouble:
> recovery-mounted, I tried to save some data, but the system hung.
> So I re-booted and sdc is now physically disconnected.
>
> Label: none uuid: b4a6cce6-dc9c-4a13-80a4-ed6bc5b40bb8
> Total devices 3 FS bytes used 4.67TiB
> devid 1 size 2.73TiB used 2.67TiB path /dev/sdc
> devid 2 size 2.73TiB used 2.67TiB path /dev/sdb
> *** Some devices missing
>
> I try to mount the rest again:
> mount -o recovery,ro /dev/sdb /mnt/__Complete_Disk
> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
> missing codepage or helper program, or other error
> In some cases useful info is found in syslog - try
> dmesg | tail or so
>
> root@homeserver:~# dmesg | tail
> [ 447.059275] BTRFS info (device sdc): enabling auto recovery
> [ 447.059280] BTRFS info (device sdc): disk space caching is enabled
> [ 447.086844] BTRFS: failed to read chunk tree on sdc
> [ 447.110588] BTRFS: open_ctree failed
> [ 474.496778] BTRFS info (device sdc): enabling auto recovery
> [ 474.496781] BTRFS info (device sdc): disk space caching is enabled
> [ 474.519005] BTRFS: failed to read chunk tree on sdc
> [ 474.540627] BTRFS: open_ctree failed
>
>
> mount -o degraded,ro /dev/sdb /mnt/__Complete_Disk
> Does work now though.
>
> So, how can I remove the reference to the failed disk and check the data for
> consistency (scrub I suppose, but is it safe?)?
>
> Regards,
> Hendrik
>
>
>
>
> On 06.07.2015 22:52, Omar Sandoval wrote:
>>
>> On 07/06/2015 01:01 PM, Donald Pearson wrote:
>>>
>>> Based on my experience Hugo's advice is critical, get the bad drive
>>> out of the pool when in raid56 and do not try to replace or delete it
>>> while it's still attached and recognized.
>>>
>>> If you add a new device, mount degraded and rebalance. If you don't,
>>> mount degraded then device delete missing.
>>>
>>
>> Watch out, replacing a missing device in RAID 5/6 currently doesn't work
>> and will cause a kernel BUG(). See my patch series here:
>> http://www.spinics.net/lists/linux-btrfs/msg44874.html
>>
>
>
> --
> Hendrik Friedel
> Auf dem Brink 12
> 28844 Weyhe
> Tel. 04203 8394854
> Mobil 0178 1874363
>
>
* Re: size 2.73TiB used 240.97GiB after balance
From: Donald Pearson @ 2015-07-06 22:59 UTC (permalink / raw)
To: hendrik@friedels.name; +Cc: osandov, Hugo Mills, Btrfs BTRFS
Anything in dmesg?
On Mon, Jul 6, 2015 at 5:07 PM, hendrik@friedels.name
<hendrik@friedels.name> wrote:
> Hello,
>
> It seems that mounting works, but the system locks up completely soon
> after I start backing up.
>
>
> Greetings,
>
> Hendrik
>
>
> ------ Original message ------
> From: Donald Pearson
> Date: Mon, 6 July 2015 23:49
> To: Hendrik Friedel
> Cc: Omar Sandoval; Hugo Mills; Btrfs BTRFS
> Subject: Re: size 2.73TiB used 240.97GiB after balance
>
> [snip: quoted text from earlier in the thread trimmed]
* Re: size 2.73TiB used 240.97GiB after balance
From: Hendrik Friedel @ 2015-07-07 5:42 UTC (permalink / raw)
To: Donald Pearson; +Cc: osandov, Hugo Mills, Btrfs BTRFS
Hello,
while mounting works with the recovery option, the system locks up soon
after it starts reading.
dmesg shows:
[ 684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 684.258249] ata6.00: irq_stat 0x40000001
[ 684.258252] ata6.00: failed command: DATA SET MANAGEMENT
[ 684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26
dma 512 out
[ 684.258255] res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask
0x1 (device error)
[ 684.258256] ata6.00: status: { DRDY ERR }
[ 684.258258] ata6.00: error: { ABRT }
[ 684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
[current] [descriptor]
[ 684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command
[ 684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00
00 00 00 01 d3 80 00 00 00 80 00 00
So this drive is failing as well?!
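(I suppose the SMART data would tell more than I can; something like:)
smartctl -a /dev/sdd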
Regards,
Hendrik
On 07.07.2015 00:59, Donald Pearson wrote:
> Anything in dmesg?
>
> On Mon, Jul 6, 2015 at 5:07 PM, hendrik@friedels.name
> <hendrik@friedels.name> wrote:
>> Hello,
>>
>> It seems that mounting works, but the system locks up completely soon
>> after I start backing up.
>>
>> Greetings,
>>
>> Hendrik
>>
>> [snip: quoted text from earlier in the thread trimmed]
--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363
* Re: size 2.73TiB used 240.97GiB after balance
From: Donald Pearson @ 2015-07-07 13:14 UTC (permalink / raw)
To: Hendrik Friedel; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
That's what it looks like. You may want to try reseating cables, etc.
Instead of mounting and copying files, btrfs restore might be worth a
shot to recover what you can.
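Roughly, with the filesystem unmounted and a big enough place to copy into
(the target path is a placeholder):
btrfs restore -v /dev/sdb /mnt/some-other-disk/restore/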
On Tue, Jul 7, 2015 at 12:42 AM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello,
>
> while mounting works with the recovery option, the system locks after
> reading.
> dmesg shows:
> [ 684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 684.258249] ata6.00: irq_stat 0x40000001
> [ 684.258252] ata6.00: failed command: DATA SET MANAGEMENT
> [ 684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26 dma
> 512 out
> [ 684.258255] res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask 0x1
> (device error)
> [ 684.258256] ata6.00: status: { DRDY ERR }
> [ 684.258258] ata6.00: error: { ABRT }
> [ 684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [ 684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
> [current] [descriptor]
> [ 684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command
> [ 684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00 00
> 00 00 01 d3 80 00 00 00 80 00 00
>
>
> So, also this drive is failing?!
>
> Regards,
> Hendrik
>
>
> On 07.07.2015 00:59, Donald Pearson wrote:
>>
>> Anything in dmesg?
>>
>> On Mon, Jul 6, 2015 at 5:07 PM, hendrik@friedels.name
>> <hendrik@friedels.name> wrote:
>>>
>>> Hello,
>>>
>>> It seems that mounting works, but the system locks up completely soon
>>> after I start backing up.
>>>
>>> Greetings,
>>>
>>> Hendrik
>>>
>>> [snip: quoted text from earlier in the thread trimmed]
>
>
>
> --
> Hendrik Friedel
> Auf dem Brink 12
> 28844 Weyhe
> Tel. 04203 8394854
> Mobil 0178 1874363
>
* Re: size 2.73TiB used 240.97GiB after balance
From: Hendrik Friedel @ 2015-07-08 18:56 UTC (permalink / raw)
To: Donald Pearson; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
Hello,
yes, I will check the cables, thanks for the hint.
Before trying to recover the data, I would like to save the status quo.
I have two new drives. Is it advisable to dd-copy the data onto the new
drives and then try to recover?
I am asking because I suppose that dd will also copy the UUID, which
might confuse BTRFS (two drives with the same UUID attached)?
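(What I have in mind is something like this, per disk, with the copy kept
disconnected afterwards; the target device is a placeholder:)
dd if=/dev/sdb of=/dev/sdX bs=1M conv=noerror,sync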
And then I have a technical question on btrfs balance when converting to
raid5 (from raid1): does the balance create the parity information on
the newly-added (empty) drive, so that the data on the two original
disks is not touched at all?
Regards,
Hendrik
On 07.07.2015 15:14, Donald Pearson wrote:
> That's what it looks like. You may want to try reseating cables, etc.
>
> Instead of mounting and file copy, btrfs restore might be worth a shot
> to recover what you can.
>
> On Tue, Jul 7, 2015 at 12:42 AM, Hendrik Friedel <hendrik@friedels.name> wrote:
>> Hello,
>>
>> while mounting works with the recovery option, the system locks after
>> reading.
>> dmesg shows:
>> [ 684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> [ 684.258249] ata6.00: irq_stat 0x40000001
>> [ 684.258252] ata6.00: failed command: DATA SET MANAGEMENT
>> [ 684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26 dma
>> 512 out
>> [ 684.258255] res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask 0x1
>> (device error)
>> [ 684.258256] ata6.00: status: { DRDY ERR }
>> [ 684.258258] ata6.00: error: { ABRT }
>> [ 684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
>> driverbyte=DRIVER_SENSE
>> [ 684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
>> [current] [descriptor]
>> [ 684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write command
>> [ 684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00 00
>> 00 00 01 d3 80 00 00 00 80 00 00
>>
>>
>> So, also this drive is failing?!
>>
>> Regards,
>> Hendrik
>>
>>
>> On 07.07.2015 00:59, Donald Pearson wrote:
>>>
>>> Anything in dmesg?
>>>
>>> On Mon, Jul 6, 2015 at 5:07 PM, hendrik@friedels.name
>>> <hendrik@friedels.name> wrote:
>>>>
>>>> Hello,
>>>>
>>>> It seems that mounting works, but the system locks up completely soon
>>>> after I start backing up.
>>>>
>>>> Greetings,
>>>>
>>>> Hendrik
>>>>
>>>> [snip: quoted text from earlier in the thread trimmed]
>>
>>
>>
>> --
>> Hendrik Friedel
>> Auf dem Brink 12
>> 28844 Weyhe
>> Tel. 04203 8394854
>> Mobil 0178 1874363
>>
>>
--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363
* Re: size 2.73TiB used 240.97GiB after balance
From: Donald Pearson @ 2015-07-08 19:06 UTC (permalink / raw)
To: Hendrik Friedel; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
I wouldn't use dd.
I would use restore to get the data if at all possible; then you can
experiment with trying to fix the degraded condition live. If you have
any chance of getting data from the pool, you reduce that chance every
time you make a change.
If btrfs did the balance like you said, it wouldn't be raid5. What
you just described is raid4, where only one drive holds parity data. I
can't say that I actually know for a fact that btrfs doesn't do this,
but I'd be shocked, and some dev would need to eat their underwear, if
the balance job didn't distribute the parity as well.
On Wed, Jul 8, 2015 at 1:56 PM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello,
>
> yes, I will check the cables, thanks for the hint.
> Before trying to recover the data, I would like to save the status quo. I
> have two new drives? Is it advisable to dd-copy the data on the new drives
> and then to try to recover?
>
> I am asking, because I suppose that dd will also copy the UUID, which might
> confuse BTRFS (two drives with same UUID attached)?
>
> And then I have a technical question on btrfs balance when converting to
> raid5 (from raid1): does the balance create the parity information on the
> newly-added (empty) drive, so that the data on the two original disks is not
> touched at all?
>
> Regards,
> Hendrik
>
>
>
> On 07.07.2015 15:14, Donald Pearson wrote:
>>
>> That's what it looks like. You may want to try reseating cables, etc.
>>
>> Instead of mounting and file copy, btrfs restore might be worth a shot
>> to recover what you can.
>>
>> On Tue, Jul 7, 2015 at 12:42 AM, Hendrik Friedel <hendrik@friedels.name>
>> wrote:
>>>
>>> Hello,
>>>
>>> while mounting works with the recovery option, the system locks after
>>> reading.
>>> dmesg shows:
>>> [ 684.258246] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>>> [ 684.258249] ata6.00: irq_stat 0x40000001
>>> [ 684.258252] ata6.00: failed command: DATA SET MANAGEMENT
>>> [ 684.258255] ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 26
>>> dma
>>> 512 out
>>> [ 684.258255] res 51/04:01:01:00:00/00:00:00:00:00/a0 Emask 0x1
>>> (device error)
>>> [ 684.258256] ata6.00: status: { DRDY ERR }
>>> [ 684.258258] ata6.00: error: { ABRT }
>>> [ 684.258266] sd 5:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_OK
>>> driverbyte=DRIVER_SENSE
>>> [ 684.258268] sd 5:0:0:0: [sdd] tag#26 Sense Key : Illegal Request
>>> [current] [descriptor]
>>> [ 684.258270] sd 5:0:0:0: [sdd] tag#26 Add. Sense: Unaligned write
>>> command
>>> [ 684.258272] sd 5:0:0:0: [sdd] tag#26 CDB: Write same(16) 93 08 00 00
>>> 00
>>> 00 00 01 d3 80 00 00 00 80 00 00
>>>
>>>
>>> So, also this drive is failing?!
>>>
>>> Regards,
>>> Hendrik
>>>
>>>
>>> On 07.07.2015 00:59, Donald Pearson wrote:
>>>>
>>>>
>>>> Anything in dmesg?
>>>>
>>>> On Mon, Jul 6, 2015 at 5:07 PM, hendrik@friedels.name
>>>> <hendrik@friedels.name> wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> It seems that mounting works, but the system locks up completely soon
>>>>> after I start backing up.
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Hendrik
>>>>>
>>>>> [snip: quoted text from earlier in the thread trimmed]
>>>
>>>
>>>
>>>
>>> --
>>> Hendrik Friedel
>>> Auf dem Brink 12
>>> 28844 Weyhe
>>> Tel. 04203 8394854
>>> Mobil 0178 1874363
>>>
>>>
>
>
> --
> Hendrik Friedel
> Auf dem Brink 12
> 28844 Weyhe
> Tel. 04203 8394854
> Mobil 0178 1874363
>
>
* Re: size 2.73TiB used 240.97GiB after balance
From: Hendrik Friedel @ 2015-07-08 21:29 UTC (permalink / raw)
To: Donald Pearson; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
Hello Donald,
thanks for your reply. I appreciate your help.
> I would use recover to get the data if at all possible, then you can
> experiment with try to fix the degraded condition live. If you have
> any chance of getting data from the pool, you reduce that chance every
> time you make a change.
Ok, you assume that btrfs restore is the most likely way of recovering the
data. But if mounting degraded, scrubbing, btrfsck, ... turn out to be more
successful, your proposal is the riskier one, isn't it? With a dd image I
can always go back to today's status.
> If btrfs did the balance like you said, it wouldn't be raid5. What
> you just described is raid4 where only one drive holds parity data. I
> can't say that I actually know for a fact that btrfs doesn't do this,
> but I'd be shocked and some dev would need to eat their underware if
> the balance job didn't distribute the parity also.
Ok, I was not aware of the difference between raid4 and raid5.
So, I did try a btrfs restore:
warning devid 3 not found already
Check tree block failed, want=8300102483968, have=65536
Check tree block failed, want=8300102483968, have=65536
Check tree block failed, want=8300102483968, have=65536
read block failed check_tree_block
Couldn't setup extent tree
[it is still running]
btrfs-find-root gives me:
http://paste.ubuntu.com/11844005/
http://paste.ubuntu.com/11844009/
(on the two disks)
btrfs-show-super:
http://paste.ubuntu.com/11844016/
Greetings,
Hendrik
* Re: size 2.73TiB used 240.97GiB after balance
From: Donald Pearson @ 2015-07-08 22:16 UTC (permalink / raw)
To: Hendrik Friedel; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
Basically, I wouldn't trust a drive that's already showing signs of
failure to survive a dd. It isn't completely full, so a restore is less
load on it. That's just the way I see it. But I see your point about
taking drive images now to hedge against further failures.
Unfortunately those errors are over my head so hopefully someone else
has insights.
Also, the possessive "think's" at the end of those outputs made me chuckle.
On Wed, Jul 8, 2015 at 4:29 PM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello Donald,
>
> thanks for your reply. I appreciate your help.
>
>> I would use recover to get the data if at all possible, then you can
>>
>> experiment with try to fix the degraded condition live. If you have
>> any chance of getting data from the pool, you reduce that chance every
>> time you make a change.
>
>
> Ok, you assume that btrfs recover is the most likely way of recovering data.
> But if mounting degraded, scrubbing, btrfsck, ... are more successful, your
> proposal is more risky, isn't it? With a dd-image I can always go back to
> todays status.
>
>> If btrfs did the balance like you said, it wouldn't be raid5. What
>> you just described is raid4 where only one drive holds parity data. I
>> can't say that I actually know for a fact that btrfs doesn't do this,
>> but I'd be shocked and some dev would need to eat their underware if
>> the balance job didn't distribute the parity also.
>
>
> Ok, I was not aware of the difference between raid4&5.
>
> So, I did try a btrs-recover:
> warning devid 3 not found already
> Check tree block failed, want=8300102483968, have=65536
> Check tree block failed, want=8300102483968, have=65536
> Check tree block failed, want=8300102483968, have=65536
> read block failed check_tree_block
> Couldn't setup extent tree
> [it is still running]
>
> btrfs-find-root gives me:
> http://paste.ubuntu.com/11844005/
> http://paste.ubuntu.com/11844009/
> (on the two disks)
>
>
> btrfs-show-super:
> http://paste.ubuntu.com/11844016/
>
> Greetings,
> Hendrik
>
>
>
>
>
* Re: size 2.73TiB used 240.97GiB after balance
From: Austin S Hemmelgarn @ 2015-07-09 11:59 UTC (permalink / raw)
To: Donald Pearson, Hendrik Friedel; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
On 2015-07-08 15:06, Donald Pearson wrote:
> I wouldn't use dd.
>
> I would use recover to get the data if at all possible, then you can
> experiment with try to fix the degraded condition live. If you have
> any chance of getting data from the pool, you reduce that chance every
> time you make a change.
>
> If btrfs did the balance like you said, it wouldn't be raid5. What
> you just described is raid4 where only one drive holds parity data. I
> can't say that I actually know for a fact that btrfs doesn't do this,
> but I'd be shocked and some dev would need to eat their underware if
> the balance job didn't distribute the parity also.
>
That is correct, it does distribute the parity among all the member
drives. That said, it would still have to modify the existing drives
even if it did put the parity on just the new drive, because raid{4,5,6}
are defined as _striped_ data with parity, not mirrored (ie, if you just
removed the parity, you'd have a raid0, not a raid1).
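As a generic illustration of striped data with rotating parity on three
disks (D = data block, P = parity; not the exact btrfs on-disk layout):
stripe 0: disk1=D0 disk2=D1 disk3=P
stripe 1: disk1=D2 disk2=P  disk3=D3
stripe 2: disk1=P  disk2=D4 disk3=D5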
* Re: size 2.73TiB used 240.97GiB after balance
From: Austin S Hemmelgarn @ 2015-07-09 12:02 UTC (permalink / raw)
To: Donald Pearson, Hendrik Friedel; +Cc: Omar Sandoval, Hugo Mills, Btrfs BTRFS
On 2015-07-08 18:16, Donald Pearson wrote:
> Basically I wouldn't trust the drive that's already showing signs of
> failure to survive a dd. It isn't completely full, so the recover is
> less load. That's just the way I see it. But I see your point of
> trying to get drive images now to hedge against failures.
>
> Unfortunately those errors are over my head so hopefully someone else
> has insights.
>
A better option, if you want a block-level copy, would probably be
ddrescue (it's available in almost every distro in a package of the same
name); it's designed for recovering as much data as possible from failed
disks (and gives a much nicer status display than plain old dd). If you
do go for a block-level copy, however, make certain that no more than one
of the copies is visible to the system at any given time, especially
when the filesystem is mounted, otherwise things _WILL_ get
exponentially worse.
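For example (device names and the map file path are placeholders):
ddrescue -f /dev/sdb /dev/sdX /root/sdb-rescue.map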
Thread overview: 16+ messages
2015-07-06 19:20 size 2.73TiB used 240.97GiB after balance Hendrik Friedel
2015-07-06 19:44 ` Hendrik Friedel
2015-07-06 19:49 ` Hugo Mills
2015-07-06 20:01 ` Donald Pearson
2015-07-06 20:52 ` Omar Sandoval
2015-07-06 21:12 ` Hendrik Friedel
2015-07-06 21:49 ` Donald Pearson
[not found] <000f4242.05e425492a977c7b@friedels.name>
2015-07-06 22:59 ` Donald Pearson
2015-07-07 5:42 ` Hendrik Friedel
2015-07-07 13:14 ` Donald Pearson
2015-07-08 18:56 ` Hendrik Friedel
2015-07-08 19:06 ` Donald Pearson
2015-07-08 21:29 ` Hendrik Friedel
2015-07-08 22:16 ` Donald Pearson
2015-07-09 12:02 ` Austin S Hemmelgarn
2015-07-09 11:59 ` Austin S Hemmelgarn