From: Mike Christie <michaelc@cs.wisc.edu>
To: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: "Shi, Harris" <Harris.Shi@lsi.com>,
	Hannes Reinecke <hare@suse.de>,
	"malahal@us.ibm.com" <malahal@us.ibm.com>,
	SCSI development list <linux-scsi@vger.kernel.org>
Subject: Re: question on block-layer timeout change
Date: Sun, 04 Jan 2009 11:12:00 -0600
Message-ID: <4960EDE0.70007@cs.wisc.edu>
In-Reply-To: <20081218092339.GB812@linux.vnet.ibm.com>

Mike Anderson wrote:
> Shi, Harris <Harris.Shi@lsi.com> wrote:
>> Information from /var/log/messages:
>> ===================================
>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
>> Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
>> Dec 17 15:58:25 timon kernel:  connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
>> Dec 17 15:58:25 timon kernel:  connection2:0: detected conn error (1011)
>> Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
>>
>>
>>
>> Information from Serial output:
>> ===============================
>> Oops: 0002 [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
>> Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>> *pdpt = 00000000319fe001 *pde = 0000000000000000
>> BUG: unable to handle kernel NULL pointer dereference at 00000086
>> IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
>> *pdpt = 0000000000546001 *pde = 0000000000000000
>>  ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]
>>
>> Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
>> EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
>> EIP is at __ticket_spin_lock+0x8/0x19
>> EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
>> ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
>>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
>> Stack:
>>  f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
>>  f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
>>  f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
>> Call Trace:
>>  [<c0328e0f>] ? _spin_lock+0x15/0x18
>>  [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
>>  [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
>>  [<c0212539>] ? blk_rq_timed_out+0xc/0x46
> 
> I could not match my listing exactly with this output, but it appears that
> the session is NULL when we call into iscsi_eh_cmd_timed_out. An addr2line
> would help verify the iscsi_eh_cmd_timed_out line.
> 
> I added Mike C to the email cc for possible comments on the error messages
> displayed above and if that would lead to cleanup of structures referenced
> in iscsi_eh_cmd_timed_out.
> 

Sorry for the late reply. I have been on vacation.

The iSCSI error message just indicates that the initiator sent an
iSCSI ping and did not get a response, so the initiator dropped the
session. The error was reported as a generic connection error (1011),
and when it fired the initiator was in the logged-in / full feature
phase (this basically means normal use and nothing special).
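
As a rough cross-check of the "ping timeout" line in the log, here is a
small sketch of the arithmetic. It assumes the three counters in that
line are jiffies and that the kernel was built with HZ=250; the log
states neither, and jiffies_to_secs is a made-up helper name.

```c
#include <assert.h>

/* Assumption: kernel tick rate. The log does not state it. */
#define ASSUMED_HZ 250UL

/* Hypothetical helper: whole seconds covered by a jiffies interval. */
static unsigned long jiffies_to_secs(unsigned long start, unsigned long end)
{
    return (end - start) / ASSUMED_HZ;
}

/* Values copied from the "ping timeout of 15 secs expired" log line. */
static const unsigned long last_rx = 19237UL;
static const unsigned long last_ping = 20487UL;
static const unsigned long now = 24237UL;
```

Under those assumptions, jiffies_to_secs(last_ping, now) gives 15, which
lines up with the 15 second ping timeout, and jiffies_to_secs(last_rx,
now) gives 20.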

Because SLES and the MPP driver are in use, I am not sure exactly what
is running, but for this code path the iscsi driver does this:

         cls_session = starget_to_session(scsi_target(scmd->device));
         session = cls_session->dd_data;

to get the session in iscsi_eh_cmd_timed_out (we do this in all kernels,
so that has not changed).
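
To make it easier to reason about which pointer could be NULL, here is
a user-space sketch of that lookup chain. The mock_* structs, the enum,
and walk_chain are all hypothetical stand-ins; the real kernel helpers
go through the device parent pointers and the real layouts differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel layouts. */
struct mock_session { int lock; };
struct mock_cls_session { void *dd_data; };
struct mock_target { struct mock_cls_session *cls_session; };
struct mock_device { struct mock_target *target; };
struct mock_cmd { struct mock_device *device; };

/* Names the first link in the chain that turned out to be NULL. */
enum chain_link {
    LINK_OK = 0,
    LINK_CMD,
    LINK_DEVICE,
    LINK_TARGET,
    LINK_CLS_SESSION,
    LINK_SESSION
};

/*
 * Follow scmd -> device -> target -> cls_session -> dd_data and stop at
 * the first NULL pointer instead of dereferencing it.
 */
static enum chain_link walk_chain(const struct mock_cmd *scmd,
                                  struct mock_session **out)
{
    *out = NULL;
    if (!scmd)
        return LINK_CMD;
    if (!scmd->device)
        return LINK_DEVICE;
    if (!scmd->device->target)
        return LINK_TARGET;
    if (!scmd->device->target->cls_session)
        return LINK_CLS_SESSION;
    *out = scmd->device->target->cls_session->dd_data;
    return *out ? LINK_OK : LINK_SESSION;
}
```

A NULL at any of these links would fault once the session lock is taken,
so a check like this (or the equivalent printk in the real function)
could help narrow down which pointer was bad.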

The session pointers are only changed when a session is destroyed, and 
that only happens if you do a logout of the session (iscsiadm -m .... 
-u), and at that time when the session is destroyed we should have 
flushed all IO.

Are you guys doing a logout of the session with iscsiadm at this time?

Does MPP clone commands, and is it doing something with the command's
pointers to the device?

Another possibility is that commands are not getting cleaned up
correctly. When you see "connection2:0: detected conn error (1011)",
the driver is going to kill all outstanding commands and call scsi_done
on them to requeue them with the scsi layer, so we should not be getting
any commands timed out after you see that message (maybe only in some
race case where the session's commands are getting flushed at the exact
same time the scsi eh was firing). There was no time stamp on the oops
output, but I doubt this happened. Did the oops happen after the conn
error message, though?
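
To illustrate the kind of race I mean (a flush and a timeout both trying
to finish the same command), here is a user-space sketch using C11
atomics. All names are made up, and this is not a claim about how
libiscsi or the block layer actually arbitrates it; it only shows the
"first one to claim the command wins" idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical labels for who finished the command. */
enum { BY_NONE = 0, BY_FLUSH = 1, BY_TIMEOUT = 2 };

/* Hypothetical per-command state; not a kernel structure. */
struct mock_cmd_state {
    atomic_flag claimed;   /* set by whichever path gets there first */
    int completed_by;      /* BY_FLUSH or BY_TIMEOUT once claimed */
};

/*
 * Try to claim the command for one completion path. Returns true for
 * the first caller only; any later caller must leave the command alone.
 */
static bool claim_cmd(struct mock_cmd_state *cmd, int who)
{
    if (atomic_flag_test_and_set(&cmd->claimed))
        return false;
    cmd->completed_by = who;
    return true;
}
```

If the flush path claims the command first, a timeout handler that runs
afterwards gets false back and must not touch the command's pointers.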
