From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mike Anderson
Subject: Re: [Fwd] further testing w/ multipath ... and bugs
Date: Thu, 23 Jun 2005 14:05:36 -0700
Message-ID: <20050623210536.GA20141@us.ibm.com>
References: <20050613123053.GB28625@averon.dyndns.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from e1.ny.us.ibm.com ([32.97.182.141]:34709 "EHLO e1.ny.us.ibm.com")
	by vger.kernel.org with ESMTP id S262676AbVFWVF2 (ORCPT );
	Thu, 23 Jun 2005 17:05:28 -0400
Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234])
	by e1.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j5NL5SBf016887
	for ; Thu, 23 Jun 2005 17:05:28 -0400
Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64])
	by d01relay02.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j5NL5SiO261352
	for ; Thu, 23 Jun 2005 17:05:28 -0400
Received: from d01av04.pok.ibm.com (loopback [127.0.0.1])
	by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j5NL5RAY001882
	for ; Thu, 23 Jun 2005 17:05:28 -0400
Content-Disposition: inline
In-Reply-To: <20050613123053.GB28625@averon.dyndns.org>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Christophe Varoqui, Andrew Vasquez, James.Smart@Emulex.Com
Cc: linux-scsi@vger.kernel.org

I did not see an answer to this issue. I am also hitting the problem
(i.e., devices being removed) during dm-mp port bounce testing. Is this
the intended behavior going forward for the FC transport?

I also see a difference in behavior between the lpfc and qla2xxx
drivers: lpfc does not remove the target even though the
"rport-5:0-5: blocked FC remote port time out: removing target" message
is printed. I can look into the difference myself, but I thought Andrew
or James S. might already know.

Christophe Varoqui [christophe.varoqui@free.fr] wrote:
> I should have posted this here in the first place.
> Seems related to the recent fc_remote_ports and qlogic work.
>
> Regards,
> cvaroqui
>
> ----- Forwarded message from Christophe Varoqui -----
>
> List-Id: device-mapper development
>
> Here is an additional one:
>
> At the end of the previous scenario, with a dd in D-state, I ran "dmsetup remove_all" ... and it does actually remove the maps. Running multipath again gives:
>
> [] end_that_request_last+0xcc/0x100
> [] scsi_end_request+0x9d/0xe0
> [] scsi_io_completion+0x155/0x500
> [] ip_rcv+0x3a3/0x560
> [] del_timer+0x5e/0x70
> [] sd_rw_intr+0x164/0x320
> [] mempool_free+0x81/0xa0
> [] qla2x00_process_response_queue+0x14d/0x1d0
> [] scsi_finish_command+0x96/0xe0
> [] tcp_write_timer+0x73/0xe0
> [] scsi_softirq+0xa6/0xe0
> [] __do_softirq+0x82/0x100
> [] do_softirq+0x35/0x40
> [] do_IRQ+0x3b/0x70
> [] common_interrupt+0x1a/0x20
> [] default_idle+0x0/0x30
> [] default_idle+0x23/0x30
> [] cpu_idle+0x64/0x80
> [] start_kernel+0x185/0x1d0
> [] unknown_bootoption+0x0/0x1e0
> Code: 90 80 3e 00 7e f9 fa eb e8 89 d8 8b 74 24 0c 8b 5c 24 08 83 c4 10 c3 c7 04 24 c4 6a 38 c0 8b 44 24 10 89 44 24 04 e8 6d 30 db ff <0f> 0b 95 00 c2 62 38 c0 eb bc 8d 76 00 53 83 ec 08 89 c3 fa 81
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>
> Regards,
> cvaroqui
>
> On Mon, Jun 13, 2005 at 10:11:54AM +0200, Christophe Varoqui wrote:
> > Hello,
> >
> > I'm testing Mike Christie's START_STOP hwhandler and discovered a bunch of new, interesting phenomena:
> >
> > A little context first:
> > o kernel 2.6.12-rc6 + qlogic discovery patch
> > o qla2342 (dual 2Gb)
> > o EVA5000, Solaris-tagged connections
> >
> > Here is a map created by multipath, fresh from boot:
> >
> > eva1_lun2 (3600508b400014ba7000120000cf00000)
> > [size=50 GB][features="1 queue_if_no_path"][hwhandler="1 hp_sw"]
> > \_ round-robin 0 [active][best]
> >   \_ 0:0:0:2 sdb 8:16  [ready ][active]
> >   \_ 1:0:0:2 sdf 8:80  [ready ][active]
> > \_ round-robin 0 [enabled]
> >   \_ 0:0:1:2 sdd 8:48  [faulty][active]
> >   \_ 1:0:1:2 sdh 8:112 [faulty][active]
> >
> > Start a background stream read with dd on that map.
> >
> > Disable the FC switch port connected to HBA 0.
> > At this point I consistently get the following in the logs:
> >
> > qla2300 0000:05:0d.0: LOOP DOWN detected.
> > Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
> > in_atomic():1, irqs_disabled():1
> > [] __might_sleep+0xa4/0xc0
> > [] device_for_each_child+0x26/0x80
> > [] target_block+0x0/0x30
> > [] fc_remote_port_block+0x2e/0x60
> > [] qla2x00_mark_all_devices_lost+0x55/0x60
> > [] qla2x00_async_event+0x83e/0xd60
> > [] find_busiest_group+0xbb/0x310
> > [] sd_rw_intr+0x164/0x320
> > [] qla2300_intr_handler+0x77/0x240
> > [] handle_IRQ_event+0x32/0x70
> > [] __do_IRQ+0xd7/0x140
> > [] do_IRQ+0x36/0x70
> > [] common_interrupt+0x1a/0x20
> > [] default_idle+0x0/0x30
> > [] default_idle+0x23/0x30
> > [] cpu_idle+0x64/0x80
> >
> > If I wait long enough, I then get the following:
> >
> > rport-0:0-0: blocked FC remote port time out: removing target
> > rport-0:0-1: blocked FC remote port time out: removing target
> >
> > ... which is rather new to me.
> >
> > As a side effect, all associated sd devices are removed and uevents are sent signaling that the disks are gone. In the current implementation, this triggers checker removal on the multipathd side.
> >
> > Then, when the port is re-enabled, the sd devices are registered again with different minors than before. Add uevents are sent, multipath reconfigures the maps and ...
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > printing eip:
> > f8b0d29f
> > *pde = 08e4d001
> > Oops: 0000 [#1]
> > SMP
> > Modules linked in: dm_round_robin dm_hp_sw dm_multipath md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc video button battery ac ohci_hcd tg3 floppy dm_mod qla6312
> > CPU: 2
> > EIP: 0060:[] Not tainted VLI
> > EFLAGS: 00010086 (2.6.12-rc6)
> > EIP is at rr_select_path+0xf/0x60 [dm_round_robin]
> > eax: f6a989cc ebx: 00000000 ecx: f6a978c0 edx: f7f1e77c
> > esi: f7f1e77c edi: 00000000 ebp: 00000001 esp: f65d1f00
> > ds: 007b es: 007b ss: 0068
> > Process kmpathd/2 (pid: 4564, threadinfo=f65d0000 task=f6708aa0)
> > Stack: f6a989c0 f7f1e740 f8ae3bc2 f7f1e740 f7f1e740 f8ae3c90 f7f1e740 f7f1e740
> >        00000000 f7f1e74c f8ae3f9c 00000286 00000000 f7f1e754 f7f1e740 f7f34100
> >        f7f1e790 00000282 c01339a2 00000000 000f42b4 f6cdfe5c f7f34128 f7f34110
> > Call Trace:
> > [] __choose_path_in_pg+0x12/0x40 [dm_multipath]
> > [] __choose_pgpath+0xa0/0xb0 [dm_multipath]
> > [] process_queued_ios+0x7c/0xf0 [dm_multipath]
> > [] worker_thread+0x1c2/0x250
> > [] process_queued_ios+0x0/0xf0 [dm_multipath]
> > [] default_wake_function+0x0/0x10
> > [] default_wake_function+0x0/0x10
> > [] worker_thread+0x0/0x250
> > [] kthread+0xa5/0xf0
> > [] kthread+0x0/0xf0
> > [] kernel_thread_helper+0x5/0x10
> > Code: 42 04 89 10 89 58 04 89 03 31 c0 5b c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 83 ec 08 89 74 24 04 89 d6 89 1c 24 8b 58 04 <8b> 03 39 d8 74 30 89 c1 8b 50 04 8b 00 85 c9 89 50 04 89 02 8b
> >
> > At this point dd is stuck in D-state.
> >
> > I will post more as I continue my hammering.
> >
> > Regards,
> > cvaroqui
> >
> ----- End forwarded message -----
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-andmike
--
Michael Anderson
andmike@us.ibm.com