From mboxrd@z Thu Jan 1 00:00:00 1970
From: Doug Ledford
Subject: Re: [patch 2.5] ips queue depths
Date: Wed, 16 Oct 2002 16:15:31 -0400
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <20021016201531.GB8159@redhat.com>
References: <20021015194705.GD4391@redhat.com> <20021015130445.A829@eng2.beaverton.ibm.com> <20021015205218.GG4391@redhat.com> <20021015163057.A7687@eng2.beaverton.ibm.com> <20021016023231.GA4690@redhat.com> <20021016120436.A1598@eng2.beaverton.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path: 
Received: from flossy.devel.redhat.com (localhost.localdomain [127.0.0.1]) by flossy.devel.redhat.com (8.12.5/8.12.5) with ESMTP id g9GKFVfv008184 for ; Wed, 16 Oct 2002 16:15:31 -0400
Received: (from dledford@localhost) by flossy.devel.redhat.com (8.12.5/8.12.5/Submit) id g9GKFV2m008182 for linux-scsi@vger.kernel.org; Wed, 16 Oct 2002 16:15:31 -0400
Content-Disposition: inline
In-Reply-To: <20021016120436.A1598@eng2.beaverton.ibm.com>
List-Id: linux-scsi@vger.kernel.org
To: "'linux-scsi@vger.kernel.org'"

On Wed, Oct 16, 2002 at 12:04:36PM -0700, Patrick Mansfield wrote:
> On Tue, Oct 15, 2002 at 10:32:31PM -0400, Doug Ledford wrote:
> > On Tue, Oct 15, 2002 at 04:30:57PM -0700, Patrick Mansfield wrote:
> >
> > > If the adapter is setting queue depth based upon what
> > > the adapter knows, that is completely wrong
> >
> > I strongly disagree here.  Only the adapter driver knows if the card
> > itself has a hard limit of X commands at a time (so that deeper
> > queues are wasted).  Only the adapter driver knows if the particular
> > interconnect has any inherent queue limitations or speed limitations.
> > There are a thousand things only the adapter driver knows that should
> > be factored into a sane queue depth.
>
> OK, the adapter does not get it completely wrong, but it does not
> know about special scsi device limitations, block layer limits, usage
> patterns, or total number of scsi devices on the system.
All of these are special cases.

Scsi device limitations it will learn (that's the "go lower" part ;-).

Block layer limits need to be addressed separately.  My preferred way
of doing that is to teach the block layer about our queue depth
requirements: I would like to see the block layer adjust the request
queue depth whenever we adjust our queue depth, but I'm not sure what
that would take at this point in time.

Usage patterns are something only an admin could reasonably tell us.
"Hey, this disk is only used for temporary storage when streaming to
tape, so it only needs miniscule resources" is valid, but special, and
trying to write a default for that scenario isn't feasible.

Total number of scsi devices on the system is learnable, but I don't
think we should change our defaults over it, especially since hot plug
devices make this a dubious distinction at best.  I think the better
default is to assume that any machine with some huge number of drives
attached via SCSI is likely a huge machine, and that the relatively
small X kilobytes of data we allocate statically per drive is OK.  The
aic7xxx_old driver allocates roughly 1k for each command, with a maximum
of 255 commands per controller regardless of total device queue depth on
that controller, so a maximum of about 256K per controller right now.
100 drives on 10 controllers would therefore be a maximum of about 2.5MB
of RAM, and any machine with 100 disks shouldn't balk at 2.5MB of
aic7xxx_old data structs.  (How much data is allocated in the block
layer and mid layer is not a figure I have at hand, though.)

> Your changes (set high, adjust lower as we hit queue fulls) should
> work fine in most cases, and is much better than the previous state.

Good, then we agree ;-)

> Some example cases where we might want to lower queue depth:
>
> System with small amounts of memory compared to the number of devices.
> With many disks on a system, some with a very light load, it could be
> good to give the lightly loaded disks a lower queue depth so they use
> less memory, or so they can do less IO.

Both of these are strong candidates for proper admin setup IMHO.

> In a 2 node cluster with shared devices, the queue depth could be set
> to half of some hard limit on each node of the cluster, and avoid
> hitting any hard queue fulls.

Maybe, but what if you want each node to be able to reach maximum
performance under peak conditions and then fall back during more relaxed
times?  This is the same sort of thing all of our controllers have been
doing in the past by limiting the queue depth of all devices on a
controller to the controller's maximum depth, and I contend it's a false
optimization.  It permanently lowers the peak performance of each
machine for fear that you might have contention for the device's queue,
instead of simply reacting reasonably to QUEUE_FULL return codes.

> (It would be really nice if we could modify the number of struct
> request's allocated for little used or unused devices, or on character
> devices like tape that don't even use the requests.  Current block
> code allocates 2*128 of these on systems with lots of memory; this
> could save way more space than lowering the queue depth.)

This I agree with 1000%, and it is on my list of things to investigate.

> I was suggesting a common interface via a Scsi_Device device attribute
> so the default depth can be modified as needed, rather than a boot or
> module load option that is fixed (once the driver is loaded) and might
> be a different option for every adapter driver.
>
> It's too bad we can't modify new_queue_depth and have all the layers
> (well, mid and lower) adjust accordingly.

Well, I don't have any objection to being able to modify queue depths.
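(As an aside, the "set high, adjust lower as we hit queue fulls" policy
amounts to something like the following.  This is a hypothetical
userspace sketch with made-up names and numbers, not driver code: when a
target returns QUEUE_FULL, it has just proven how many commands it will
actually accept, so we clamp the depth to that count, with a floor.)

```c
#include <assert.h>

/* Hypothetical sketch of the "start high, go lower" queue depth policy.
 * All names (sketch_dev, QD_DEFAULT, QD_MIN) are invented for
 * illustration; this is not an existing kernel interface. */

#define QD_DEFAULT 64   /* optimistic starting depth */
#define QD_MIN      2   /* never go below this */

struct sketch_dev {
    int queue_depth;    /* currently allowed outstanding commands */
    int active;         /* commands outstanding right now */
};

void sketch_dev_init(struct sketch_dev *d)
{
    d->queue_depth = QD_DEFAULT;
    d->active = 0;
}

/* Called when the target returns QUEUE_FULL: it accepted active - 1
 * commands (the last one bounced), so clamp the depth to what the
 * device proved it can take.  Never raise the depth here. */
void sketch_queue_full(struct sketch_dev *d)
{
    int proven = d->active - 1;

    if (proven < QD_MIN)
        proven = QD_MIN;
    if (proven < d->queue_depth)
        d->queue_depth = proven;
}
```

Nothing here statically reserves the other node's share: each node
starts optimistic and backs off only when the device actually pushes
back, which is the "peak when you can, yield when you must" behavior
argued for above.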
However, the current problem is that none of the device drivers have the
ability to accept events telling them to change their queue depth on a
device.  Instead, they all want to tell the mid layer what queue depth
to use.  Because we can't just blindly shove more commands into a driver
than it's expecting, you can't raise the queue depth without telling the
driver (lowering it is probably safe, although it wastes allocated
structs in memory).  As it stands, the mid layer *would* honor a change
to new_queue_depth.

-- 
Doug Ledford  919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
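For concreteness, the event interface the drivers are missing might look
roughly like this.  This is purely a hypothetical userspace sketch, not
the 2.5 kernel API; the names (sk_host_template, sk_device,
sk_adjust_queue_depth) are invented for illustration.  The idea is a
per-host callback the mid layer invokes before honoring a depth change,
so the driver can clamp or refuse a raise it isn't prepared for:

```c
#include <stddef.h>

/* Hypothetical sketch of a depth-change event hook; names invented. */

struct sk_device {
    int new_queue_depth;    /* depth the mid layer will use */
    int driver_max;         /* hard per-device limit inside the driver */
};

struct sk_host_template {
    /* Driver returns the depth it actually accepted, so the mid layer
     * never queues more commands than the driver is expecting. */
    int (*change_queue_depth)(struct sk_device *dev, int depth);
};

/* Example driver implementation: clamp to the driver's hard limit. */
static int skdrv_change_queue_depth(struct sk_device *dev, int depth)
{
    if (depth > dev->driver_max)
        depth = dev->driver_max;
    if (depth < 1)
        depth = 1;
    dev->new_queue_depth = depth;
    return depth;
}

struct sk_host_template skdrv_template = {
    .change_queue_depth = skdrv_change_queue_depth,
};

/* Mid-layer side: with no hook, lowering is safe but raising is not,
 * exactly the asymmetry described above. */
int sk_adjust_queue_depth(struct sk_host_template *ht,
                          struct sk_device *dev, int depth)
{
    if (ht->change_queue_depth)
        return ht->change_queue_depth(dev, depth);
    if (depth < dev->new_queue_depth)
        dev->new_queue_depth = depth;
    return dev->new_queue_depth;
}
```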