All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Anderson <andmike@us.ibm.com>
To: Christophe Varoqui <christophe.varoqui@free.fr>,
	Andrew Vasquez <andrew.vasquez@qlogic.com>,
	James.Smart@Emulex.Com
Cc: linux-scsi@vger.kernel.org
Subject: Re: [Fwd] further testing w/ multipath ... and bugs
Date: Thu, 23 Jun 2005 14:05:36 -0700	[thread overview]
Message-ID: <20050623210536.GA20141@us.ibm.com> (raw)
In-Reply-To: <20050613123053.GB28625@averon.dyndns.org>

I did not see an answer to this issue. I am also hitting the problem
(i.e., devices being removed) during some dm-mp port bounce testing.
Is this the correct behavior going forward for the fc transport?

Also I see a difference in behavior between the lpfc and qla2xxx drivers
where the lpfc is not removing target even though the "rport-5:0-5: blocked
FC remote port time out: removing target" message is printed. I guess I
can look into the difference myself, but I thought Andrew or James S you
two would know.

Christophe Varoqui [christophe.varoqui@free.fr] wrote:
> I should have posted this here in the first place.
> Seems related to the recent fc_remote_ports and qlogic work.
> 
> Regards,
> cvaroqui
> 
> ----- Forwarded message from Christophe Varoqui <christophe.varoqui@free.fr> -----
> 
> List-Id: device-mapper development <dm-devel.redhat.com>
> 
> Here is an additional one :
> 
> When at the end of the previous scenario, with a dd in D-state, I "dmsetup remove_all" ... it effectively accept to remove the maps. Exec'ing multipath again gives :
> 
>  [<c027506c>] end_that_request_last+0xcc/0x100                                  
>  [<c02b19ed>] scsi_end_request+0x9d/0xe0                                    
>  [<c02b1d45>] scsi_io_completion+0x155/0x500                                  
>  [<c0327643>] ip_rcv+0x3a3/0x560                                            
>  [<c012c1de>] del_timer+0x5e/0x70                                               
>  [<c02cdce4>] sd_rw_intr+0x164/0x320                                         
>  [<c0149531>] mempool_free+0x81/0xa0                                          
>  [<c02c60cd>] qla2x00_process_response_queue+0x14d/0x1d0                    
>  [<c02ac946>] scsi_finish_command+0x96/0xe0                                   
>  [<c033f2e3>] tcp_write_timer+0x73/0xe0                                      
>  [<c02ac836>] scsi_softirq+0xa6/0xe0                                        
>  [<c01285b2>] __do_softirq+0x82/0x100                                      
>  [<c0128665>] do_softirq+0x35/0x40                                         
>  [<c010675b>] do_IRQ+0x3b/0x70                                                
>  [<c0104c1e>] common_interrupt+0x1a/0x20                           
>  [<c0102030>] default_idle+0x0/0x30                                
>  [<c0102053>] default_idle+0x23/0x30                               
>  [<c0102104>] cpu_idle+0x64/0x80                                   
>  [<c0462965>] start_kernel+0x185/0x1d0                             
>  [<c0462370>] unknown_bootoption+0x0/0x1e0                         
> Code: 90 80 3e 00 7e f9 fa eb e8 89 d8 8b 74 24 0c 8b 5c 24 08 83 c4 10 c3 c7 04
>  24 c4 6a 38 c0 8b 44 24 10 89 44 24 04 e8 6d 30 db ff <0f> 0b 95 00 c2 62 38 c0
>  eb bc 8d 76 00 53 83 ec 08 89 c3 fa 81                     
>  <0>Kernel panic - not syncing: Fatal exception in interrupt       
> 
> Regards,
> cvaroqui
> 
> On Mon, Jun 13, 2005 at 10:11:54AM +0200, Christophe Varoqui wrote:
> > Hello,
> > 
> > I'm testing Mike Christie's START_STOP hwhandler and discovered a bunch of new, interesting, phenomenons :
> > 
> > A little context first :
> > o kernel 2.6.12-rc6 + qlogic discovery patch
> > o qla2342 (dual 2GB)
> > o EVA5000, Solaris-tagged connections
> > 
> > Here is a map created by multipath, fresh from boot :
> > 
> > eva1_lun2 (3600508b400014ba7000120000cf00000)
> > [size=50 GB][features="1 queue_if_no_path"][hwhandler="1 hp_sw"]
> > \_ round-robin 0 [active][best]
> >   \_ 0:0:0:2 sdb  8:16    [ready ][active]
> >   \_ 1:0:0:2 sdf  8:80    [ready ][active]
> > \_ round-robin 0 [enabled]
> >   \_ 0:0:1:2 sdd  8:48    [faulty][active]
> >   \_ 1:0:1:2 sdh  8:112   [faulty][active]
> > 
> > Start a background stream read with dd on that map.
> > 
> > Do a port disable on the FC switch port connected to HBA 0
> > Consistently at this moment I get the following in the logs :
> > 
> > qla2300 0000:05:0d.0: LOOP DOWN detected.
> > Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
> > in_atomic():1, irqs_disabled():1
> >  [<c0120a74>] __might_sleep+0xa4/0xc0
> >  [<c026a466>] device_for_each_child+0x26/0x80
> >  [<c02b3180>] target_block+0x0/0x30
> >  [<c02bbdae>] fc_remote_port_block+0x2e/0x60
> >  [<c02bdbf5>] qla2x00_mark_all_devices_lost+0x55/0x60
> >  [<c02c597e>] qla2x00_async_event+0x83e/0xd60
> >  [<c011dd2b>] find_busiest_group+0xbb/0x310
> >  [<c02cdce4>] sd_rw_intr+0x164/0x320
> >  [<c02c4e37>] qla2300_intr_handler+0x77/0x240
> >  [<c0144882>] handle_IRQ_event+0x32/0x70
> >  [<c0144997>] __do_IRQ+0xd7/0x140
> >  [<c0106756>] do_IRQ+0x36/0x70
> >  [<c0104c1e>] common_interrupt+0x1a/0x20
> >  [<c0102030>] default_idle+0x0/0x30
> >  [<c0102053>] default_idle+0x23/0x30
> >  [<c0102104>] cpu_idle+0x64/0x80
> > 
> > If I wait long enough, I then get the following :
> > 
> >  rport-0:0-0: blocked FC remote port time out: removing target
> >  rport-0:0-1: blocked FC remote port time out: removing target
> > 
> > ... which is rather new to me.
> > 
> > As a side effect, all sd associated are removed, uevents are sent signaling the disks have gone. This triggers checker removal on multipathd side in the current implementation.
> > 
> > Then, upon port reenable, sd are registred again with different minor than before. uevent adds get sent, multipath reconfigures the maps and ...
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> >  printing eip:
> > f8b0d29f
> > *pde = 08e4d001
> > Oops: 0000 [#1]
> > SMP
> > Modules linked in: dm_round_robin dm_hp_sw dm_multipath md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc video button battery ac ohci_hcd tg3 floppy dm_mod qla6312
> > CPU:    2
> > EIP:    0060:[<f8b0d29f>]    Not tainted VLI
> > EFLAGS: 00010086   (2.6.12-rc6)
> > EIP is at rr_select_path+0xf/0x60 [dm_round_robin]
> > eax: f6a989cc   ebx: 00000000   ecx: f6a978c0   edx: f7f1e77c
> > esi: f7f1e77c   edi: 00000000   ebp: 00000001   esp: f65d1f00
> > ds: 007b   es: 007b   ss: 0068
> > Process kmpathd/2 (pid: 4564, threadinfo=f65d0000 task=f6708aa0)
> > Stack: f6a989c0 f7f1e740 f8ae3bc2 f7f1e740 f7f1e740 f8ae3c90 f7f1e740 f7f1e740
> >        00000000 f7f1e74c f8ae3f9c 00000286 00000000 f7f1e754 f7f1e740 f7f34100
> >        f7f1e790 00000282 c01339a2 00000000 000f42b4 f6cdfe5c f7f34128 f7f34110
> > Call Trace:
> >  [<f8ae3bc2>] __choose_path_in_pg+0x12/0x40 [dm_multipath]
> >  [<f8ae3c90>] __choose_pgpath+0xa0/0xb0 [dm_multipath]
> >  [<f8ae3f9c>] process_queued_ios+0x7c/0xf0 [dm_multipath]
> >  [<c01339a2>] worker_thread+0x1c2/0x250
> >  [<f8ae3f20>] process_queued_ios+0x0/0xf0 [dm_multipath]
> >  [<c011eaa0>] default_wake_function+0x0/0x10
> >  [<c011eaa0>] default_wake_function+0x0/0x10
> >  [<c01337e0>] worker_thread+0x0/0x250
> >  [<c0137c65>] kthread+0xa5/0xf0
> >  [<c0137bc0>] kthread+0x0/0xf0
> >  [<c0102445>] kernel_thread_helper+0x5/0x10
> > Code: 42 04 89 10 89 58 04 89 03 31 c0 5b c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 83 ec 08 89 74 24 04 89 d6 89 1c 24 8b 58 04 <8b> 03 39 d8 74 30 89 c1 8b 50 04 8b 00 85 c9 89 50 04 89 02 8b
> > 
> > Here dd is now stuck in D-state.
> > 
> > I Will post more as I continue my hammering.
> > 
> > Regards,
> > cvaroqui
> > 
> ----- End forwarded message -----
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-andmike
--
Michael Anderson
andmike@us.ibm.com


  reply	other threads:[~2005-06-23 21:05 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-06-13 12:30 [Fwd] further testing w/ multipath ... and bugs Christophe Varoqui
2005-06-23 21:05 ` Mike Anderson [this message]
  -- strict thread matches above, loose matches on Subject: below --
2005-06-24 14:14 James.Smart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050623210536.GA20141@us.ibm.com \
    --to=andmike@us.ibm.com \
    --cc=James.Smart@Emulex.Com \
    --cc=andrew.vasquez@qlogic.com \
    --cc=christophe.varoqui@free.fr \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.