From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hannes Reinecke
Subject: Re: [PATCHv5 00/36] asynchronous ALUA device handler
Date: Wed, 30 Sep 2015 15:21:32 +0200
Message-ID: <560BE1DC.9060600@suse.de>
References: <1443523658-87622-1-git-send-email-hare@suse.de> <560AD88B.9050902@sandisk.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Received: from mx2.suse.de ([195.135.220.15]:36786 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752419AbbI3NVe (ORCPT ); Wed, 30 Sep 2015 09:21:34 -0400
In-Reply-To: <560AD88B.9050902@sandisk.com>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Bart Van Assche, James Bottomley
Cc: "linux-scsi@vger.kernel.org", Christoph Hellwig, Ewan Milne, "Martin K. Petersen"

On 09/29/2015 08:29 PM, Bart Van Assche wrote:
> On 09/29/2015 03:47 AM, Hannes Reinecke wrote:
>> here is the next round of my update to the ALUA device handler.
>
> Hello Hannes,
>
> Sorry, but with this version I see an initiator kernel lockup
> shortly after the initiator system had been booted. I have attached
> the output of echo t > /proc/sysrq-trigger to this e-mail.
>
Hmm. Weird.

Everything seems to wait for alua_rtpg() to complete:

kworker/4:2     D ffff88045c64c380     0   203      2 0x00000000
Workqueue: kaluad_wq alua_rtpg_work [scsi_dh_alua]
 ffff88045d94f968 0000000000000086 ffff88047fd0dcc0 ffff88047fd15ad8
 ffff88045c64c380 ffff88044fc7c380 ffff88045d950000 ffff88047fd0dcc0
 ffff88047fd0dcc0 000000010001c779 0000000000000004 ffff88045d94f980
Call Trace:
 [] schedule+0x3a/0x90
 [] schedule_timeout+0x143/0x290
 [] ? ktime_get+0x7d/0x130
 [] ? init_timer_key+0x140/0x140
 [] io_schedule_timeout+0xa6/0x120
 [] ? trace_hardirqs_on+0xd/0x10
 [] wait_for_completion_io_timeout+0xdf/0x120
 [] ? wake_up_q+0x70/0x70
 [] blk_execute_rq+0xad/0x130
 [] ? bio_alloc_bioset+0x179/0x200
 [] ? bio_phys_segments+0x19/0x20
 [] ? blk_rq_bio_prep+0x63/0x80
 [] ? blk_rq_map_kern+0xb7/0x130
 [] scsi_execute+0xd3/0x160 [scsi_mod]
 [] scsi_execute_req_flags+0x8e/0xf0 [scsi_mod]
 [] alua_rtpg_work+0x2d0/0xc10 [scsi_dh_alua]

But this just seems to wait for a command completion, which
apparently doesn't arrive. Or not in time.

What's curious, though, is that there are several instances of
'srp_daemon', each trying to allocate and set up a new SRP device:

srp_daemon      D ffff88045ca2ad00     0   595    592 0x00000000
 ffff88043c3db960 0000000000000082 ffffffff810ba14d ffff88047fd55ad8
 ffff88045ca2ad00 ffff88043cf24380 ffff88043c3dc000 ffff880425ef6548
 ffff88042d5c3f78 ffff880425ef5968 ffff880425ef4dd0 ffff88043c3db978
Call Trace:
 [] ? trace_hardirqs_on+0xd/0x10
 [] schedule+0x3a/0x90
 [] blk_mq_freeze_queue_wait+0x56/0xb0
 [] ? prepare_to_wait_event+0xf0/0xf0
 [] blk_mq_update_tag_set_depth+0x41/0xb0
 [] blk_mq_init_allocated_queue+0x7c4/0x860
 [] blk_mq_init_queue+0x3a/0x60
 [] scsi_mq_alloc_queue+0x1c/0x50 [scsi_mod]
 [] scsi_alloc_sdev+0x331/0x3b0 [scsi_mod]
 [] scsi_probe_and_add_lun+0x884/0xd20 [scsi_mod]
 [] __scsi_scan_target+0x52b/0x5f0 [scsi_mod]

Unfortunately I cannot tell from the provided logs whether both
traces refer to the same device; if so, this would easily explain
the issue.
Can you check if there is some line-bouncing involved?
If a device were set up and torn down several times, that would
explain things.

However, the main point seems to be that we never get a completion
for the RTPG command, which might also be an issue with the srp
driver, as I've never seen this during my tests.

Is there a way I could try to reproduce it?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
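
For sorting through a full sysrq-t dump, a rough helper along these lines might make it easier to see which blocked tasks sit in which wait path. This is only a sketch: the names (blocked_tasks, waiting_in, the regexes) are made up for illustration, and it assumes the task header and frame format look like the traces quoted in this mail, optionally with a dmesg timestamp in front. It cannot tell whether two tasks refer to the same device, as that information is not in the traces.

```python
import re

# Optional dmesg timestamp such as "[  123.456789] " (assumed format).
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Task header, e.g. "kworker/4:2 D ffff88045c64c380 0 203 2 0x00000000".
TASK_RE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<state>[A-Z])\s+[0-9a-f]{16}\s+\d+\s+"
    r"(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
)
# Stack frame: optional "[...]" address, optional "?" marking an
# unreliable entry, then symbol+0xoffset/0xsize.
FRAME_RE = re.compile(
    r"^(?:\[[^\]]*\]\s*)?(?P<unreliable>\?\s*)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def blocked_tasks(dump):
    """Map (comm, pid) -> list of reliable frame symbols for D-state tasks."""
    tasks = {}
    current = None
    for raw in dump.splitlines():
        line = TIMESTAMP_RE.sub("", raw.strip())
        task = TASK_RE.match(line)
        if task:
            current = None  # frames of non-D tasks are ignored
            if task.group("state") == "D":
                key = (task.group("comm"), int(task.group("pid")))
                current = tasks.setdefault(key, [])
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and not frame.group("unreliable"):
            current.append(frame.group("sym"))
    return tasks


def waiting_in(tasks, symbol):
    """Return the sorted (comm, pid) keys whose trace contains symbol."""
    return sorted(key for key, frames in tasks.items() if symbol in frames)


# Abbreviated copy of the two traces quoted above.
SAMPLE = """
kworker/4:2 D ffff88045c64c380 0 203 2 0x00000000
Workqueue: kaluad_wq alua_rtpg_work [scsi_dh_alua]
Call Trace:
 [] schedule+0x3a/0x90
 [] ? ktime_get+0x7d/0x130
 [] wait_for_completion_io_timeout+0xdf/0x120
 [] blk_execute_rq+0xad/0x130
 [] alua_rtpg_work+0x2d0/0xc10 [scsi_dh_alua]
srp_daemon D ffff88045ca2ad00 0 595 592 0x00000000
Call Trace:
 [] ? trace_hardirqs_on+0xd/0x10
 [] schedule+0x3a/0x90
 [] blk_mq_freeze_queue_wait+0x56/0xb0
 [] scsi_alloc_sdev+0x331/0x3b0 [scsi_mod]
"""

if __name__ == "__main__":
    tasks = blocked_tasks(SAMPLE)
    print(waiting_in(tasks, "blk_execute_rq"))
    print(waiting_in(tasks, "blk_mq_freeze_queue_wait"))
```

Run against the complete dump, this would list every task stuck behind the RTPG completion next to every task stuck in the queue freeze, which is the comparison the two traces above invite.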