From: James Smart
Subject: Re: [RFC PATCH 2/4] scsi error: have scsi-ml call change_queue_depth to handle QUEUE_FULL
Date: Tue, 16 Jun 2009 09:16:15 -0400
Message-ID: <4A379B1F.2070708@emulex.com>
In-Reply-To: <4A368639.20701@cs.wisc.edu>
To: Mike Christie
Cc: Christof Schmitt, linux-scsi@vger.kernel.org, Andrew Vasquez

Mike Christie wrote:
>> This was called because of a "queue full" for one SCSI device. Why do
>> you decrement the queue depth for all SCSI devices on the same host
>> and not only for one device?
>>
>
> It should actually do it for only the devices on the same target where
> the problem occurred. I copied the code from lpfc and qla2xxx and cannot
> remember the reason why this is done now. I am ccing AndrewV and JamesS.
>

Agree that it should be localized to the target and not propagated to all targets. Our design issue was choosing how to apply the backoff: did a queue full on a single lun imply the entire target is full? That is, should we reduce all luns at that point, or only the lun that saw the queue full? It all depends on how fast you want to ramp down the overall situation and how biased things get across multiple luns. The same decision applies to the ramp up: do we raise everyone, or let the luns function independently?
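A minimal sketch of the two backoff scopes discussed above, in plain userspace C. Every name here (`struct lun_state`, `struct target_state`, `on_queue_full()`, `ramp_up()`, `BACKOFF_LUN_ONLY`, `BACKOFF_WHOLE_TARGET`) is hypothetical and is not the SCSI midlayer's or any driver's API. Halving the depth on QUEUE_FULL and stepping it up by one during ramp up are assumed policies chosen for illustration, not what lpfc or qla2xxx actually do.

```c
#include <assert.h>

#define MAX_LUNS 8

/* Hypothetical per-lun state (NOT the kernel's struct scsi_device). */
struct lun_state {
    int queue_depth; /* commands currently allowed outstanding */
    int max_depth;   /* ceiling that ramp up will not exceed */
};

/* Hypothetical target that owns a set of luns. */
struct target_state {
    struct lun_state luns[MAX_LUNS];
    int nr_luns;
};

enum backoff_scope {
    BACKOFF_LUN_ONLY,     /* a QUEUE_FULL says nothing about other luns */
    BACKOFF_WHOLE_TARGET  /* a QUEUE_FULL means the whole target is full */
};

/* Assumed ramp-down policy: halve the depth, never dropping below 1. */
static int ramp_down_depth(int depth)
{
    int d = depth / 2;
    return d < 1 ? 1 : d;
}

/* Assumed ramp-up policy: one step at a time, capped at max_depth. */
static void step_up(struct lun_state *l)
{
    if (l->queue_depth < l->max_depth)
        l->queue_depth++;
}

/* Called when lun 'idx' on target 't' reports QUEUE_FULL. */
void on_queue_full(struct target_state *t, int idx, enum backoff_scope scope)
{
    assert(idx >= 0 && idx < t->nr_luns);
    if (scope == BACKOFF_LUN_ONLY) {
        t->luns[idx].queue_depth = ramp_down_depth(t->luns[idx].queue_depth);
        return;
    }
    /* Whole-target scope: every lun on the target backs off together. */
    for (int i = 0; i < t->nr_luns; i++)
        t->luns[i].queue_depth = ramp_down_depth(t->luns[i].queue_depth);
}

/* Called after a quiet period on lun 'idx'; raises one lun or all of them. */
void ramp_up(struct target_state *t, int idx, enum backoff_scope scope)
{
    assert(idx >= 0 && idx < t->nr_luns);
    if (scope == BACKOFF_LUN_ONLY) {
        step_up(&t->luns[idx]);
        return;
    }
    for (int i = 0; i < t->nr_luns; i++)
        step_up(&t->luns[i]);
}
```

With `BACKOFF_WHOLE_TARGET`, one QUEUE_FULL throttles every lun at once, which drains the target fastest but also penalizes luns that were not busy. With `BACKOFF_LUN_ONLY`, the luns ramp independently, which implicitly assumes the lun rather than the shared target is the gating resource.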
Raising everyone too quickly re-triggers the queue full condition, and why should one hot lun steal capacity (queue depth) from a slow lun? There are a lot of assumptions built into this choice about what the gating resource is (such as the target's I/O capacity being shared equally by all luns).

--
james s