From: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
To: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Alex Turin <alextu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>,
	Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>,
	James Bottomley
	<James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 00/11] First pass at merging Bart's HA work
Date: Sat, 08 Dec 2012 12:15:58 +0100
Message-ID: <50C3216E.6020206@acm.org>
In-Reply-To: <50C263E2.1070805-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On 12/07/12 22:47, Vu Pham wrote:
> I applied your latest patch, "[PATCH for-next] IB/srp: Make SCSI error
> handling finish", and tested it.
>
> Let me capture what I'm seeing:
>
> The host has two paths (scsi_host 7 & 8) to the target through two
> physical ports, 1 & 2.
>
> [root@rsws42 ~]# multipath -l
> size=50G features='0' hwhandler='0' wp=rw
> |-+- policy='round-robin 0' prio=0 status=active
> | `- 7:0:0:11 sdb 8:16 active undef running
> `-+- policy='round-robin 0' prio=0 status=enabled
>   `- 8:0:0:11 sdc 8:32 active undef running
>
> Cable pull simulated by disabling port 1: I/Os fail over fine; the
> problem is the cleanup of scsi_host 7 on the failed path.
> The IB RC connection fails and SCSI error recovery kicks in.
> srp_reconnect_target() fails and srp_remove_target() runs to remove
> scsi_host 7; however, I think it gets stuck at device_del(dev) inside
> __scsi_remove_device(dev).
>
> Error recovery keeps happening again and again on scsi host 7 for
> 9-10 minutes.
> scsi_host 7 cannot be cleaned up: its sysfs entry is still there
> (/sys/class/scsi_host/host7) and its state is SHOST_CANCEL.
>
> I brought port 1 back online, but scsi_host 7 cannot reconnect to the
> target because its state is SRP_TARGET_REMOVED.
>
> The scsi_host 7 sysfs entry does not contain the target login info
> (ioc_guid, id_ext, dgid...).
> I think srp_daemon can reconnect to the target by creating a new path
> with a new scsi host; however, I cannot check because I currently don't
> have a working srp_daemon.
> I need to reconnect to the target manually with an echo command.
>
> Bottom line: I/Os can fail over and fail back; however, old scsi hosts
> cannot be removed (the sysfs entry is still there) and stay in state
> SHOST_CANCEL, and error recovery keeps happening on the old scsi hosts
> for 10-20 minutes.

(reduced CC list)

Hello Vu,

Please double check the kernel tree you have used in your test. The 
behavior you describe is the behavior that was fixed by the patch you 
mentioned. If I repeat your test with Roland's for-next tree (commit 
fb57e1d) with the "Make SCSI error handling finish" patch on top and on 
a system where srp_daemon is not running, this is what I see:
* About 60s after "ibportstate 1 1 disable" on the target, the message
   "scsi host7: SRP abort called" appears in the initiator kernel log.
* A few seconds later the following messages appear in the kernel log
   of the initiator:
   scsi host7: SRP reset_device called
   scsi host7: ib_srp: SRP reset_host called
   scsi host7: ib_srp: Got failed path rec status -110
   scsi host7: ib_srp: Path record query failed
   scsi host7: ib_srp: reconnect failed (-110), removing target port.
   sd 7:0:0:0: Device offlined - not ready after error recovery
   sd 7:0:0:0: alua: Detached
* A quick check in /sys on the initiator shows that the corresponding
   SCSI host has been removed correctly:
   # find /sys | grep host7
   # ls /sys/class/scsi_host/
   host0  host1  host10  host2  host3  host4  host5
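
The sysfs check above can also be scripted, which may help when repeating
the test many times. The following is only a minimal sketch, assuming that
a SCSI host that is still registered shows up as
/sys/class/scsi_host/host<N> and exposes a "state" attribute there. The
function name scsi_host_status and its sysfs_root parameter are
hypothetical names chosen for illustration; they are not part of ib_srp
or of any existing tool.

```python
from pathlib import Path


def scsi_host_status(host_no, sysfs_root="/sys/class/scsi_host"):
    """Report whether SCSI host <host_no> still has a sysfs entry.

    Returns None if the host directory is gone (the host was removed),
    the stripped contents of its 'state' file if that file is readable,
    and "unknown" if the directory exists but 'state' cannot be read.
    sysfs_root is a parameter only so the function can be pointed at a
    test directory; the default path is an assumption about the layout.
    """
    host_dir = Path(sysfs_root) / f"host{host_no}"
    if not host_dir.exists():
        return None
    try:
        return (host_dir / "state").read_text().strip()
    except OSError:
        return "unknown"


if __name__ == "__main__":
    # Host number 7 is the failed path in the test described above.
    status = scsi_host_status(7)
    if status is None:
        print("host7: no sysfs entry, host has been removed")
    else:
        print(f"host7: still present, state={status}")
```

A result of None corresponds to the empty "find /sys | grep host7" output
shown above; any other result means the host entry is still present and
its state is worth inspecting.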

Bart.
