Linux Btrfs filesystem development
From: Goffredo Baroncelli <kreijack@libero.it>
To: Joeri Vanthienen <mail@joerivanthienen.be>
Cc: kreijack@inwind.it, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS thinks device is busy [kernel 3.5.3]
Date: Wed, 05 Sep 2012 20:36:41 +0200
Message-ID: <50479BB9.3040209@libero.it>
In-Reply-To: <CAPsrAuD002esQ_bEMhWRSb7sd_ubvmDx2iqm2Q1FAXiBLb4cFQ@mail.gmail.com>

On 09/05/2012 08:06 PM, Joeri Vanthienen wrote:
> Hi,
>
> Thank you for your reply.
> I physically disconnected the device before the command "btrfs device
> delete missing".

OK. The point is that btrfs never noticed that the device had been
disconnected; it only saw some errors on it.

I think that "btrfs device delete missing" makes sense only after you
have (re)mounted the filesystem in degraded mode:

	mount -o degraded /dev/sdXX /mnt/mntpoint


However, I noticed that earlier you ran "btrfs device delete /dev/sdg
/btrfs/", which may in fact have succeeded.

> Maybe it was not wise to do that, but in a raid10 (both data and
> metadata) another disk holds the mirrored data of the disconnected
> and deleted disk, right?

Yes, the data should be safe.

>
> SANOS1:~ # btrfs filesystem df /btrfs/
> Data, RAID10: total=330.00GB, used=261.11GB
> Data: total=8.00MB, used=0.00
> System, RAID10: total=63.75MB, used=168.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID10: total=260.94GB, used=423.32MB
> Metadata: total=8.00MB, used=0.00
>
>
> After the "btrfs device delete missing", I reconnected the disk,
> but it appeared again in the "btrfs filesystem show" output.

Don't trust "btrfs filesystem show" too much. As I said, it reports
"Total devices 13" but lists 14 devices...
"btrfs filesystem show" dumps the on-disk contents, not the internal
(in-RAM) btrfs data structures. A disk that still contains old data
(i.e. an older generation number) is still considered valid.
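To illustrate the generation point: the dmesg output quoted in this thread shows /dev/sdg (devid 3) at transid 31490 while every other member of 'firstpool' is at a newer transid, yet it is still listed. Below is a minimal C sketch, with hypothetical names (struct scan_entry, parse_scan_line, report_stale), that parses scan messages of the form "device label <label> devid <n> transid <t> <path>" and flags devices whose transid lags the newest one seen for the same label. It assumes exactly the log format shown here (labels without spaces) and is only a diagnostic aid; it does not reproduce any check btrfs itself performs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "device label ... devid N transid T /dev/X"
 * scan message, as printed by the 3.5 kernel in this thread. */
struct scan_entry {
    char label[64];
    unsigned long long devid;
    unsigned long long transid;
    char path[64];
};

/* Parse one dmesg line (timestamp prefix allowed). Returns 1 on success,
 * 0 if the line is not a btrfs scan message in the assumed format. */
int parse_scan_line(const char *line, struct scan_entry *e)
{
    const char *p = strstr(line, "device label ");

    if (p == NULL)
        return 0;
    return sscanf(p, "device label %63s devid %llu transid %llu %63s",
                  e->label, &e->devid, &e->transid, e->path) == 4;
}

/* Print every device whose transid is older than the newest transid seen
 * for the same label; returns the number of stale devices found. */
int report_stale(const struct scan_entry *v, int n)
{
    int stale = 0;

    for (int i = 0; i < n; i++) {
        unsigned long long newest = 0;

        for (int j = 0; j < n; j++)
            if (strcmp(v[i].label, v[j].label) == 0 && v[j].transid > newest)
                newest = v[j].transid;
        if (v[i].transid < newest) {
            printf("stale: devid %llu %s transid %llu (newest %llu)\n",
                   v[i].devid, v[i].path, v[i].transid, newest);
            stale++;
        }
    }
    return stale;
}
```

Feeding it the lines for devid 1, 14 and 3 from the later dmesg dump reports only devid 3 (/dev/sdg) as stale, which is exactly the device that "btrfs filesystem show" keeps listing.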

>
> So now I'm searching for a way to add the device again without
> bringing the pool/volume offline or unmounting it, or at least to
> make the device busy error go away and scrub the volume.
>
> Since "btrfs device delete missing" could not zero out the superblock
> signature, would totally wiping the disk change this situation?
> The device busy error is still strange...

I checked the btrfs code. If a disk's superblock contains a valid
signature (remember that the disk was not zeroed) and its filesystem
UUID (aka fsid) equals that of a mounted filesystem, btrfs thinks that
the disk is already mounted.

So in my opinion zeroing the superblock should be enough to let you
re-add the device.
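The rule described above can be modelled in a few lines of C. This is a simplified sketch, not the kernel code: it assumes the on-disk layout of struct btrfs_super_block as I understand it (32-byte checksum at 0x00, 16-byte fsid at 0x20, 8-byte magic "_BHRfS_M" at 0x40, little-endian generation at 0x48), works on a 4 KiB buffer already read from the primary superblock at 64 KiB, and uses hypothetical helper names. It shows why zeroing the magic alone should be enough: once the magic is gone, the fsid comparison is never reached.

```c
#include <stdint.h>
#include <string.h>

/* Assumed offsets inside struct btrfs_super_block (little-endian fields). */
#define SB_FSID_OFF   0x20
#define SB_MAGIC_OFF  0x40
#define SB_GEN_OFF    0x48
#define FSID_SIZE     16

/* The magic 0x4D5F53665248425F stored little-endian reads as "_BHRfS_M". */
static const unsigned char sb_magic[8] = { '_', 'B', 'H', 'R', 'f', 'S', '_', 'M' };

/* Decode a little-endian 64-bit value. */
static uint64_t get_le64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 1 if the superblock buffer still carries the btrfs magic. */
int sb_has_magic(const unsigned char *sb)
{
    return memcmp(sb + SB_MAGIC_OFF, sb_magic, sizeof sb_magic) == 0;
}

/* Generation (transid) recorded in the superblock. */
uint64_t sb_generation(const unsigned char *sb)
{
    return get_le64(sb + SB_GEN_OFF);
}

/* Simplified model of the check described in this message: a valid magic
 * plus the fsid of a mounted filesystem means "already mounted", so the
 * scan is refused. Note the generation plays no part in the decision. */
int scan_looks_busy(const unsigned char *sb,
                    const unsigned char mounted_fsid[FSID_SIZE])
{
    return sb_has_magic(sb) &&
           memcmp(sb + SB_FSID_OFF, mounted_fsid, FSID_SIZE) == 0;
}
```

A superblock with the fsid of 'firstpool' and a stale generation such as 31490 still looks busy under this model; after its magic is zeroed it no longer does.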

What I am not sure about is whether the disk was really deleted from
the btrfs pool. My fear is that you may zero a "valid" disk. However,
the fact that "btrfs filesystem show" reports "Total devices 13" lets me
suppose that /dev/sdg really was removed from the pool.

Maybe when you ran "btrfs device delete /dev/sdg", the command
actually succeeded.
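As a sketch of what zeroing the superblock means in practice, assuming (as discussed above) that /dev/sdg really is no longer part of the pool: the function below, with the hypothetical name wipe_btrfs_magic, clears the 8-byte magic at offset 0x40 of each superblock copy that fits within the given device size. The copy offsets (64 KiB, 64 MiB, 256 GiB) are the same ones btrfs-progs pread()s in the strace output in this thread. It is destructive and illustrative only; in practice a dedicated tool such as wipefs from util-linux is the safer choice.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Superblock copies: primary at 64 KiB, mirrors at 64 MiB and 256 GiB.
 * The 8-byte magic lives at offset 0x40 inside each copy. */
static const uint64_t sb_offsets[] = { 65536ULL, 67108864ULL, 274877906944ULL };
#define SB_MAGIC_OFF 0x40

/* Hypothetical helper: zero the magic of every superblock copy that lies
 * within the first dev_size bytes of path. Returns the number of copies
 * wiped, or -errno on failure. DESTRUCTIVE: use only on a disk you are
 * sure is no longer a member of the pool. */
int wipe_btrfs_magic(const char *path, uint64_t dev_size)
{
    static const char magic[8] = { '_', 'B', 'H', 'R', 'f', 'S', '_', 'M' };
    static const char zeros[8];
    int wiped = 0, err = 0;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -errno;
    for (size_t i = 0; i < sizeof sb_offsets / sizeof sb_offsets[0]; i++) {
        uint64_t off = sb_offsets[i] + SB_MAGIC_OFF;
        char buf[8];
        ssize_t n;

        if (off + sizeof buf > dev_size)
            break;                      /* this copy lies past the end */
        n = pread(fd, buf, sizeof buf, (off_t)off);
        if (n != (ssize_t)sizeof buf) {
            err = n < 0 ? -errno : -EIO;
            break;
        }
        if (memcmp(buf, magic, sizeof buf) != 0)
            continue;                   /* no signature in this copy */
        n = pwrite(fd, zeros, sizeof zeros, (off_t)off);
        if (n != (ssize_t)sizeof zeros) {
            err = n < 0 ? -errno : -EIO;
            break;
        }
        wiped++;
    }
    if (err == 0 && fsync(fd) != 0)
        err = -errno;
    close(fd);
    return err != 0 ? err : wiped;
}
```

Trying it first on a scratch image file (for example a 128 KiB file with the magic written at 65536 + 0x40) is a cheap way to confirm it touches only the intended bytes before it is ever pointed at a real disk.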

>
>
> SANOS1:~ # btrfs filesystem sync /btrfs/
> FSSync '/btrfs/'
> SANOS1:~ # btrfs filesystem show
> Label: 'firstpool'  uuid: 517e8cfa-4275-4589-8da4-6a46ad613daa
>          Total devices 13 FS bytes used 242.82GB
>          devid    3 size 931.51GB used 90.28GB path /dev/sdg
>          devid   14 size 931.51GB used 91.33GB path /dev/sdr
>          devid   13 size 931.51GB used 90.50GB path /dev/sdq
>          devid   12 size 931.51GB used 90.50GB path /dev/sdp
>          devid   11 size 931.51GB used 90.50GB path /dev/sdo
>          devid   10 size 931.51GB used 90.50GB path /dev/sdn
>          devid    9 size 931.51GB used 90.50GB path /dev/sdm
>          devid    8 size 931.51GB used 90.50GB path /dev/sdl
>          devid    7 size 931.51GB used 91.50GB path /dev/sdk
>          devid    6 size 931.51GB used 91.49GB path /dev/sdj
>          devid    5 size 931.51GB used 91.33GB path /dev/sdi
>          devid    4 size 931.51GB used 91.50GB path /dev/sdh
>          devid    2 size 931.51GB used 91.33GB path /dev/sdf
>          devid    1 size 931.51GB used 90.52GB path /dev/sde
>
> =>  check dmesg output
> =>  indeed the transid is different for /dev/sdg, however it still
> appears in the list above

Those dmesg lines mean that btrfs examines the disk because it contains
a valid signature (no check on the generation is performed).

>
> [109624.549395] device label firstpool devid 1 transid 32208 /dev/sde
> [109624.549792] device label firstpool devid 2 transid 32208 /dev/sdf
> [109624.550073] device label firstpool devid 4 transid 32208 /dev/sdh
> [109624.550356] device label firstpool devid 5 transid 32208 /dev/sdi
> [109624.551712] device label firstpool devid 6 transid 32208 /dev/sdj
> [109624.552572] device label firstpool devid 7 transid 32208 /dev/sdk
> [109624.553360] device label firstpool devid 8 transid 32208 /dev/sdl
> [109624.553888] device label firstpool devid 9 transid 32208 /dev/sdm
> [109624.554183] device label firstpool devid 10 transid 32208 /dev/sdn
> [109624.554565] device label firstpool devid 11 transid 32208 /dev/sdo
> [109624.555265] device label firstpool devid 12 transid 32208 /dev/sdp
> [109624.555699] device label firstpool devid 13 transid 32208 /dev/sdq
> [109624.556111] device label firstpool devid 14 transid 32208 /dev/sdr
> [109624.592864] device label firstpool devid 3 transid 31490 /dev/sdg
>
>
>
>
> Please find below the strace output
> -------------------------------------------------
> strace btrfs device scan
> execve("/sbin/btrfs", ["btrfs", "device", "scan"], [/* 60 vars */]) = 0
> brk(0)                                  = 0x1956000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1cf0a7e000
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=85716, ...}) = 0
> mmap(NULL, 85716, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f1cf0a69000
> close(3)                                = 0
[...]
> lstat("/dev/sdg", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 96), ...}) = 0
> open("/dev/sdg", O_RDONLY)              = 4
> pread(4, "\v\\9\274\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3531, 65536) = 3531
> pread(4, "\253=\21r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3531, 67108864) = 3531
> pread(4, "V\272GC\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3531, 274877906944) = 3531
> open("/dev/btrfs-control", O_RDONLY)    = 5
> ioctl(5, 0x50009404, 0x7fff3e970be0)    = -1 EBUSY (Device or resource busy)
> write(2, "ERROR: unable to scan the device"..., 70ERROR: unable to scan the device '/dev/sdg' - Device or resource busy

Yes, the EBUSY is returned by the BTRFS_IOC_SCAN_DEV ioctl. That happens
when the user tries to add a device that carries the fsid of an already
mounted filesystem.

> ) = 70
> close(5)                                = 0
> close(4)                                = 0
> read(3, "", 1024)                       = 0
> close(3)                                = 0
> munmap(0x7f1cf0a7c000, 4096)            = 0
> open("/proc/partitions", O_RDONLY)      = 3
> fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1cf0a7c000
> read(3, "major minor  #blocks  name\n\n   8"..., 1024) = 700
> read(3, "", 1024)                       = 0
> close(3)                                = 0
> munmap(0x7f1cf0a7c000, 4096)            = 0
> exit_group(0)                           = ?
> +++ exited with 0 +++
>
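The strace above ends in ioctl(5, 0x50009404, ...) on /dev/btrfs-control returning EBUSY; 0x50009404 decodes to BTRFS_IOC_SCAN_DEV. The C sketch below mirrors, under local hypothetical names (struct sketch_vol_args, SKETCH_IOC_SCAN_DEV, scan_device), the struct btrfs_ioctl_vol_args layout and the _IOW(0x94, 4, ...) definition from the kernel's btrfs ioctl header as I recall them, and issues the same scan request that "btrfs device scan" makes. The 0x50009404 encoding assumes the x86 ioctl number layout; other architectures encode the direction bits differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local mirror of struct btrfs_ioctl_vol_args: a 64-bit fd followed by a
 * path buffer of BTRFS_PATH_NAME_MAX + 1 (4087 + 1) bytes, 4096 in total. */
#define SKETCH_PATH_NAME_MAX 4087
struct sketch_vol_args {
    int64_t fd;
    char name[SKETCH_PATH_NAME_MAX + 1];
};

/* BTRFS_IOC_SCAN_DEV = _IOW(BTRFS_IOCTL_MAGIC 0x94, 4, vol_args).
 * On x86: write bit (1 << 30) | size 0x1000 << 16 | 0x94 << 8 | 4. */
#define SKETCH_IOC_SCAN_DEV _IOW(0x94, 4, struct sketch_vol_args)

/* Ask the kernel, via control node ctl, to register device dev.
 * Returns 0 on success or -errno (for /dev/sdg here: -EBUSY). */
int scan_device(const char *ctl, const char *dev)
{
    struct sketch_vol_args args;
    int fd, ret = 0;

    if (strlen(dev) > SKETCH_PATH_NAME_MAX)
        return -ENAMETOOLONG;
    memset(&args, 0, sizeof args);
    strcpy(args.name, dev);             /* length checked above */

    fd = open(ctl, O_RDONLY);
    if (fd < 0)
        return -errno;
    if (ioctl(fd, SKETCH_IOC_SCAN_DEV, &args) < 0)
        ret = -errno;
    close(fd);
    return ret;
}
```

Calling scan_device("/dev/btrfs-control", "/dev/sdg") as root on the machine in this thread should reproduce the same -EBUSY, which is a quicker way to retest after zeroing the superblock than rerunning the full "btrfs device scan".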
> On Wed, Sep 5, 2012 at 7:28 PM, Goffredo Baroncelli <kreijack@libero.it> wrote:
>> Hi,
>>
>>
>> On 09/05/2012 03:29 PM, Joeri Vanthienen wrote:
>>>
>>> Hi,
>>> I'm running OpenSuse 12.2 with kernel 3.5.3
>>> HBA= LSI 1068e using the MPTSAS driver (patched)
>>> (https://patchwork.kernel.org/patch/1379181/)
>>>
>>> SANOS1:/media # uname -a
>>> Linux SANOS1 3.5.3 #3 SMP Sun Sep 2 18:44:37 CEST 2012 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> I've tried to simulate a disk replacement but it seems that now
>>> /dev/sdg is stuck in the btrfs pool (RAID10)
>>>
>>> SANOS1:/media # btrfs device scan
>>> Scanning for Btrfs filesystems
>>> ERROR: unable to scan the device '/dev/sdg' - Device or resource busy
>>
>>
>> Could you please send the strace output of the command above?
>>
>>
>>> I ran the btrfs device delete missing command before.
>>> /dev/sdg is connected, but not mounted, is not in use and there is no
>>> scrub running.
>>
>>
>> I am not sure I understood correctly: did you physically disconnect
>> the device before or after you ran "btrfs device delete ..."?
>>
>> When you run "btrfs dev rem", btrfs moves all the data to the other disks,
>> then zeroes the superblock signature, invalidating the device. To do that,
>> btrfs needs access to the device.
>>
>>
>>>
>>> SANOS1:/media # btrfs  device delete /dev/sdg /btrfs/
>>> ERROR: error removing the device '/dev/sdg' - No such file or directory
>>>
>>> SANOS1:/media # cat /etc/mtab /proc/mounts | grep btrfs
>>> /dev/sde /btrfs btrfs rw,noatime,space_cache,inode_cache 0 0
>>> /dev/sde /btrfs btrfs rw,noatime,space_cache,inode_cache 0 0
>>>
>>> SANOS1:/media # cat /etc/mtab /proc/mounts | grep /dev/sdg
>>> SANOS1:/media #
>>> SANOS1:/media # lsof /dev/sdg
>>> SANOS1:/media #
>>>
>>>
>>> SANOS1:/media # btrfs filesystem show
>>> Label: 'firstpool'  uuid: 517e8cfa-4275-4589-8da4-6a46ad613daa
>>>           Total devices 13 FS bytes used 242.82GB
>>>           devid    3 size 931.51GB used 90.28GB path /dev/sdg
>>>           devid   14 size 931.51GB used 91.33GB path /dev/sdr
>>>           devid   13 size 931.51GB used 90.50GB path /dev/sdq
>>>           devid   12 size 931.51GB used 90.50GB path /dev/sdp
>>>           devid   11 size 931.51GB used 90.50GB path /dev/sdo
>>>           devid   10 size 931.51GB used 90.50GB path /dev/sdn
>>>           devid    9 size 931.51GB used 90.50GB path /dev/sdm
>>>           devid    8 size 931.51GB used 90.50GB path /dev/sdl
>>>           devid    7 size 931.51GB used 91.50GB path /dev/sdk
>>>           devid    6 size 931.51GB used 91.49GB path /dev/sdj
>>>           devid    5 size 931.51GB used 91.33GB path /dev/sdi
>>>           devid    4 size 931.51GB used 91.50GB path /dev/sdh
>>>           devid    2 size 931.51GB used 91.33GB path /dev/sdf
>>>           devid    1 size 931.51GB used 90.52GB path /dev/sde
>>
>>
>> The output of the command above is wrong: 14 devices are listed, but btrfs
>> reports that only 13 devices are in use. Please run a sync before
>> "btrfs filesystem show".
>>
>>
>>
>>> I also tried physically removing the disk drive again, but the result
>>> is the same.
>>> dmesg:
>>> [92728.516346] device label firstpool devid 1 transid 31965 /dev/sde
>>> [92728.516378] device label firstpool devid 2 transid 31965 /dev/sdf
>>> [92728.516406] device label firstpool devid 4 transid 31965 /dev/sdh
>>> [92728.516432] device label firstpool devid 5 transid 31965 /dev/sdi
>>> [92728.516458] device label firstpool devid 6 transid 31965 /dev/sdj
>>> [92728.516484] device label firstpool devid 7 transid 31965 /dev/sdk
>>> [92728.516510] device label firstpool devid 8 transid 31965 /dev/sdl
>>> [92728.516535] device label firstpool devid 9 transid 31965 /dev/sdm
>>> [92728.516589] device label firstpool devid 10 transid 31965 /dev/sdn
>>> [92728.516617] device label firstpool devid 11 transid 31965 /dev/sdo
>>> [92728.516643] device label firstpool devid 12 transid 31965 /dev/sdp
>>> [92728.516669] device label firstpool devid 13 transid 31965 /dev/sdq
>>> [92728.516695] device label firstpool devid 14 transid 31965 /dev/sdr
>>> [92728.551786] device label firstpool devid 3 transid 31490 /dev/sdg
>>> [92750.177157]  end_device-4:0:19: mptsas: ioc0: removing sata device: fw_channel 0, fw_id 12, phy 12,sas_addr 0x50030480008a364c
>>> [92750.177163]  phy-4:0:20: mptsas: ioc0: delete phy 12, phy-obj (0xffff8803ab81d400)
>>> [92750.177170]  port-4:0:19: mptsas: ioc0: delete port 19, sas_addr (0x50030480008a364c)
>>> [92750.178149] sd 4:0:18:0: [sdg] Synchronizing SCSI cache
>>> [92750.178326] sd 4:0:18:0: [sdg]
>>> [92750.178331] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> [92750.178441] scsi target4:0:18: mptsas: ioc0: delete device: fw_channel 0, fw_id 12, phy 12, sas_addr 0x50030480008a364c
>>> [92766.761077] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 12, phy 12, sas_addr 0x50030480008a364c
>>> [92766.764242] scsi 4:0:19:0: Direct-Access     ATA      WDC WD1002FBYS-0 0C06 PQ: 0 ANSI: 5
>>> [92766.766302] sd 4:0:19:0: Attached scsi generic sg6 type 0
>>> [92766.769374] sd 4:0:19:0: [sdg] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
>>> [92766.778433] sd 4:0:19:0: [sdg] Write Protect is off
>>> [92766.778438] sd 4:0:19:0: [sdg] Mode Sense: 73 00 00 08
>>> [92766.780583] sd 4:0:19:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>> [92766.797777]  sdg:
>>> [92766.813296] sd 4:0:19:0: [sdg] Attached SCSI disk
>>> [92773.288107] device label singleBTRFS devid 1 transid 43 /dev/sdc
>>> [92773.288807] device label firstpool devid 1 transid 31967 /dev/sde
>>> [92773.288845] device label firstpool devid 2 transid 31967 /dev/sdf
>>> [92773.288877] device label firstpool devid 4 transid 31967 /dev/sdh
>>> [92773.288904] device label firstpool devid 5 transid 31967 /dev/sdi
>>> [92773.288927] device label firstpool devid 6 transid 31967 /dev/sdj
>>> [92773.288949] device label firstpool devid 7 transid 31967 /dev/sdk
>>> [92773.288971] device label firstpool devid 8 transid 31967 /dev/sdl
>>> [92773.288993] device label firstpool devid 9 transid 31967 /dev/sdm
>>> [92773.289014] device label firstpool devid 10 transid 31967 /dev/sdn
>>> [92773.289036] device label firstpool devid 11 transid 31967 /dev/sdo
>>> [92773.289058] device label firstpool devid 12 transid 31967 /dev/sdp
>>> [92773.289080] device label firstpool devid 13 transid 31967 /dev/sdq
>>> [92773.289102] device label firstpool devid 14 transid 31967 /dev/sdr
>>> [92773.313675] device label firstpool devid 3 transid 31490 /dev/sdg
>>>
>>> Can someone help me?
>>>
>>>
>>> It seems there is still some btrfs structure on the disk. Is this the
>>> cause of the error? Why can't BTRFS rebuild this "online"?
>>
>>
>> It seems that btrfs was never aware that /dev/sdg had been disconnected...
>>
>>
>>>
>>> SANOS1:/media # btrfs-find-root /dev/sdg | head
>>> ERROR: unable to scan the device '/dev/sdg' - Device or resource busy
>>> Well block 905192472576 seems great, but generation doesn't match, have=31490, want=32015
>>> Super think's the tree root is at 906491981824, chunk root 628100251648
>>> Generation: 31490 Root bytenr: 905192484864 Root objectid: 2
>>> Generation: 31490 Root bytenr: 905543114752 Root objectid: 4
>>> Generation: 31490 Root bytenr: 905641820160 Root objectid: 5
>>> Generation: 31490 Root bytenr: 905689354240 Root objectid: 7
>>> Generation: 31490 Root bytenr: 905688096768 Root objectid: 554
>>> Generation: 31490 Root bytenr: 905687691264 Root objectid: 561
>>> Generation: 31490 Root bytenr: 905642328064 Root objectid: 565
>>> Generation: 31490 Root bytenr: 905642332160 Root objectid: 566
>>> Generation: 31490 Root bytenr: 905678802944 Root objectid: 568
>>> Couldn't map the block 433225728
>>> Well block 905192542208 seems great, but generation doesn't match, have=31416, want=32015
>>
>>
>> Note that when a device is removed, its superblock signature is zeroed
>> to mark the device as no longer valid. So the generation of a removed
>> device is meaningless.
>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> .
>>>
>>
>


Thread overview: 4 messages
2012-09-05 13:29 BTRFS thinks device is busy [kernel 3.5.3] Joeri Vanthienen
2012-09-05 17:28 ` Goffredo Baroncelli
2012-09-05 18:06   ` Joeri Vanthienen
2012-09-05 18:36     ` Goffredo Baroncelli [this message]
