linux-btrfs.vger.kernel.org archive mirror
* device failure hangs the system
@ 2011-07-01  9:25 Anand Jain
  2011-07-01  9:42 ` Anand Jain
  2011-07-01 20:14 ` Josef Bacik
  0 siblings, 2 replies; 5+ messages in thread
From: Anand Jain @ 2011-07-01  9:25 UTC (permalink / raw)
  To: linux-btrfs


hi,

The following test case causes my remote system to hard-hang; it
stops responding to any keystrokes.

-----------
# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: 75ad3c9f-f661-498e-8c13-89d4e4c58312
	Total devices 3 FS bytes used 28.00KB
	devid    1 size 465.76GB used 2.02GB path /dev/sdb
	devid    3 size 465.76GB used 2.01GB path /dev/sdd
	devid    2 size 465.76GB used 1.01GB path /dev/sdc

Btrfs v0.19-35-g1b444cd
------------


Stopping the disk
------------
# echo 1 > /sys/block/sdd/device/delete
------------
------------
# dmesg | tail
sd 3:0:0:0: [sdd] Stopping disk
::
#
------------

------------
# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: 75ad3c9f-f661-498e-8c13-89d4e4c58312
	Total devices 3 FS bytes used 28.00KB
	devid    1 size 465.76GB used 2.02GB path /dev/sdb
	devid    2 size 465.76GB used 1.01GB path /dev/sdc
	*** Some devices missing

Btrfs v0.19-35-g1b444cd
------------

After that, the following command hangs the system:
-------------
# btrfs fi balance /btrfs
-------------
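
A check like the one below could be run before the balance to confirm that the filesystem is already degraded. It is only a sketch: `count_btrfs_devices` is a hypothetical helper, not part of btrfs-progs, and it assumes the `btrfs fi show` output format shown above with a single filesystem listed.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads `btrfs fi show`
# output on stdin and compares the "Total devices N" header with the
# number of "devid" lines actually listed.
# Assumption: the output describes exactly one filesystem.
count_btrfs_devices() {
    awk '
        $1 == "Total" && $2 == "devices" { expected = $3 }
        $1 == "devid"                    { present++ }
        END { printf "expected=%d present=%d\n", expected, present }
    '
}

# Example using the degraded output from this report.
count_btrfs_devices <<'EOF'
failed to read /dev/sr0
Label: none  uuid: 75ad3c9f-f661-498e-8c13-89d4e4c58312
	Total devices 3 FS bytes used 28.00KB
	devid    1 size 465.76GB used 2.02GB path /dev/sdb
	devid    2 size 465.76GB used 1.01GB path /dev/sdc
	*** Some devices missing
EOF
```

On a live system the same function would be fed with `btrfs fi show | count_btrfs_devices`; a `present` value lower than `expected` indicates a missing device.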

Any idea if this is a known issue? Or is there a better way
to fail a disk (or a loop disk) for testing?

(Something similar to cfgadm -c unconfigure in Solaris.)

Thanks
-Anand



* Re: device failure hangs the system
  2011-07-01  9:25 device failure hangs the system Anand Jain
@ 2011-07-01  9:42 ` Anand Jain
  2011-07-01  9:44   ` Roman Mamedov
  2011-07-01 20:14 ` Josef Bacik
  1 sibling, 1 reply; 5+ messages in thread
From: Anand Jain @ 2011-07-01  9:42 UTC (permalink / raw)
  To: linux-btrfs



  It looks like this is a panic, not a system hang.
  Any idea where to find the panic log after the system has been
  power-cycled? It is not in /var/log/messages or dmesg, and
  /var/crash is empty.

Thanks, Anand




* Re: device failure hangs the system
  2011-07-01  9:42 ` Anand Jain
@ 2011-07-01  9:44   ` Roman Mamedov
  0 siblings, 0 replies; 5+ messages in thread
From: Roman Mamedov @ 2011-07-01  9:44 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs


On Fri, 01 Jul 2011 17:42:33 +0800
Anand Jain <Anand.Jain@oracle.com> wrote:

>   Looks like there is a panic (not system hang).
>   any idea where is the panic log after the system has been
>   power-recycled. (its not in the /var/log/messages or dmesg or
>   /var/crash is empty)

You can get it by setting up netconsole logging to another machine and then reproducing the crash.
http://www.cyberciti.biz/tips/linux-netconsole-log-management-tutorial.html
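
As a rough illustration of that setup, the sketch below builds the `netconsole=` module parameter in the form `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`. The helper name `netconsole_param`, the addresses, ports and interface are all placeholders chosen for the example, not values from this thread.

```shell
#!/bin/sh
# Hypothetical helper: prints a netconsole parameter string of the form
#   src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
# Arguments: src_port src_ip dev tgt_port tgt_ip [tgt_mac]
# The target MAC is optional; when it is left empty the kernel falls
# back to the broadcast address.
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Example with placeholder values (crashing box 192.168.1.10 on eth0,
# log receiver 192.168.1.20 listening on UDP port 6666).
netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20

# On the crashing machine (as root), the string would be passed as:
#   modprobe netconsole netconsole="$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20)"
# On the receiver, any UDP listener on port 6666 can capture the log;
# the exact netcat options differ between netcat variants.
```

Once the module is loaded, the crash can be reproduced and the panic trace should arrive on the receiving machine even though nothing reaches the local disk.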

-- 
With respect,
Roman



* Re: device failure hangs the system
  2011-07-01  9:25 device failure hangs the system Anand Jain
  2011-07-01  9:42 ` Anand Jain
@ 2011-07-01 20:14 ` Josef Bacik
  2011-07-07  9:35   ` Anand Jain
  1 sibling, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2011-07-01 20:14 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On 07/01/2011 05:25 AM, Anand Jain wrote:
> 
> hi,
> 
> Following test case causes my remote system to hard-hang and does
> not respond to any key strokes.
> 
> -----------
> # btrfs fi show
> failed to read /dev/sr0
> Label: none  uuid: 75ad3c9f-f661-498e-8c13-89d4e4c58312
>     Total devices 3 FS bytes used 28.00KB
>     devid    1 size 465.76GB used 2.02GB path /dev/sdb
>     devid    3 size 465.76GB used 2.01GB path /dev/sdd
>     devid    2 size 465.76GB used 1.01GB path /dev/sdc
> 
> Btrfs v0.19-35-g1b444cd
> ------------
> 
> 
> Stopping the disk
> ------------
> # echo 1 > /sys/block/sdd/device/delete
> ------------

Well that's a neat trick, do you have a way to undo that action too?
Seems a rescan doesn't make it show back up.  Please try the patch I
just posted to the list to fix this problem.  Thanks,

Josef


* Re: device failure hangs the system
  2011-07-01 20:14 ` Josef Bacik
@ 2011-07-07  9:35   ` Anand Jain
  0 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2011-07-07  9:35 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs



Josef,


> Well that's a neat trick, do you have a way to undo that action too?
> Seems a rescan doesn't make it show back up.


Hope the following helps:
-------------
  # fdisk -l /dev/sdg | egrep "Disk /"
  Disk /dev/sdg: 4294 MB, 4294967296 bytes

  # x=`ls -l /sys/class/block/sdg | cut -d "/" -f12 | sed 's/:/ /g'`
  # echo "scsi remove-single-device ${x}" > /proc/scsi/scsi

  # fdisk -l /dev/sdg | egrep "Disk /"

  # echo "scsi add-single-device ${x}" > /proc/scsi/scsi

  # fdisk -l /dev/sdg | egrep "Disk /"
  Disk /dev/sdg: 4294 MB, 4294967296 bytes
-------------
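
The `cut -d "/" -f12` step above depends on how deep the sysfs path happens to be on a given machine. A less position-dependent variant might look like the sketch below; `hctl_from_sysfs_path` is a hypothetical helper, and the sample path is an invented example of what `readlink /sys/class/block/sdg` can return.

```shell
#!/bin/sh
# Hypothetical helper: extracts the SCSI host:channel:target:lun tuple
# from a sysfs block-device path and prints it space-separated, which is
# the argument form used by the /proc/scsi/scsi commands above.
# It matches the H:C:T:L component directly before "/block/" instead of
# relying on a fixed field position.
hctl_from_sysfs_path() {
    printf '%s\n' "$1" |
        sed -n 's|.*/\([0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*\)/block/.*|\1|p' |
        tr ':' ' '
}

# Example with an invented path.
hctl_from_sysfs_path "../../devices/pci0000:00/0000:00:1f.2/host3/target3:0:0/3:0:0:0/block/sdg"

# Intended use on a live system (as root), mirroring the steps above:
#   x=$(hctl_from_sysfs_path "$(readlink /sys/class/block/sdg)")
#   echo "scsi remove-single-device ${x}" > /proc/scsi/scsi
#   echo "scsi add-single-device ${x}" > /proc/scsi/scsi
```

The value has to be captured before the device is removed, since the sysfs entry disappears along with it.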


>  Please try the patch I just posted to the list to fix this problem.  Thanks,

  I am having some trouble upgrading my machine to 3.0.0-rc6, hence
  the delay.

  Thanks for the patch.

Anand

> Josef
