From: Mike Anderson
Subject: Re: [RFC] aic94xx: attaching to the sas transport class
Date: Thu, 9 Mar 2006 10:05:21 -0800
Message-ID: <20060309180521.GB5498@us.ibm.com>
To: Alexis Bruemmer
Cc: James Bottomley, "Tarte, Robert", linux-scsi

Alexis Bruemmer wrote:
> > Which shows that the current scsi_flush_work() is in the wrong place.
> > If you move it out of sas_init.c and into aic94xx_init.c at this place,
> > I think you'll find everything now works for you.
>
> I tried your suggestion and moved the scsi_flush_work() from sas_init.c
> to aic94xx_init.c and, unfortunately, the discovery race condition still
> existed with this change (see dump below). This makes sense because
> where we are flushing the work queue we cannot guarantee that any work
> actually exists there yet.

I assume one explanation is that, without waiting for the event thread
to get past event_sema, there is no work to flush yet.

I have added below some notes I had on the different call chains.

-andmike
--
Michael Anderson
andmike@us.ibm.com

Moving scsi_flush_work() from sas_register_ha() into asd_pci_probe(),
just before asd_pci_probe() returns, may still miss an event. The
difference between this move not working and the patch that Alexis
posted working could be that Alexis's patch waited until
sas_discover_work_fn had called sas_process_events before indicating
that discovery was done.

1.) pci probe context

asd_pci_probe(...)
    asd_register_sas_ha(...)
        sas_register_ha(...)
            sas_start_event_thread(...)
                wait_for_completion(&event_th_comp);   <--
    asd_enable_phys(...)
    scsi_flush_work(sas_ha->core.shost);

2.) hw interrupt leading to event context

asd_hw_isr(...)
    asd_process_donelist_isr(...)
        asd_dl_tasklet_handler(...)
            asd_task_tasklet_complete(...)
            or control_phy_tasklet_complete(...)
            or escb_tasklet_complete(...)
                asd_bytes_dmaed_tasklet(...)
                    notify_port_event(...)
                        up(&ha->event_sema);

3.) event thread context

sas_event_thread(...)
    complete(&event_th_comp);
    down_interruptible(&sas_ha->event_sema);
    sas_process_events(...)
        sas_process_port_event(...)
            sas_porte_bytes_dmaed(...)
                sas_form_port(...)
                    sas_discover_event(...)
                        INIT_WORK(&port->work, sas_discover_work_fn, port);
                        scsi_queue_work(port->ha->core.shost, &port->work);

4.) work fn context

sas_discover_work_fn(...)
    sas_discover_domain(...)
        sas_discover_end_dev(...)
            sas_rphy_add(...)
                scsi_scan_target(...)