From mboxrd@z Thu Jan 1 00:00:00 1970
From: Patrick Mansfield
Subject: Re: [PATCH] scsi-misc-2.5 user per-device spare command
Date: Fri, 25 Apr 2003 09:37:18 -0700
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <20030425093718.A8776@beaverton.ibm.com>
References: <20030424100229.A32098@beaverton.ibm.com> <20030424100317.A32134@beaverton.ibm.com> <20030425111227.B28577@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path: 
Received: from 216-99-218-173.dsl.aracnet.com ([216.99.218.173]:5617 "EHLO dyn9-47-17-132.beaverton.ibm.com") by vger.kernel.org with ESMTP id S263459AbTDYQ3Q (ORCPT ); Fri, 25 Apr 2003 12:29:16 -0400
Content-Disposition: inline
In-Reply-To: <20030425111227.B28577@infradead.org>; from hch@infradead.org on Fri, Apr 25, 2003 at 11:12:27AM +0100
List-Id: linux-scsi@vger.kernel.org
To: Christoph Hellwig 
Cc: James Bottomley , linux-scsi@vger.kernel.org

Christoph -

On Fri, Apr 25, 2003 at 11:12:27AM +0100, Christoph Hellwig wrote:
> On Thu, Apr 24, 2003 at 10:03:17AM -0700, Patrick Mansfield wrote:
> > Patch against scsi-misc-2.5
> >
> > Use a per-device spare command rather than a per-host spare.
>
> Why? This means we'll have a much bigger number of spare commands
> around.

So we are guaranteed to be able to have at least one IO in flight for all
devices.

With a per-host spare, some device(s) might have to wait (generally
polling based on an interval dependent on whatever blk_plug_device gives
us) for a command to become available, and it is not clear what the IO
behaviour - especially for swap - will be in such cases.

It simplifies the code, and allows removal of a lock from the IO
completion path.

[I'm not sure what happens for swap striping or md. Is swap stripe size
always a multiple of the page size? So one IO completion to a swap device
(striped or not) is guaranteed to free up at least one page? What about
users putting swap on top of md, RAID-0 or RAID-5?
Is this not allowed, or considered a dumb configuration?]

For systems with one disk we are better off (than with a per-host spare) -
we have one spare, we always use it, and we don't allocate or use the
per-host free_list_lock. For larger numbers of disks, if we normally have
IO in flight to all devices, there is not much difference. Generally, we
have one extra command per device that does not have IO in flight.

Ideally we should have some sysfs attribute to specify whether a spare (or
spares) is needed or not, default to no spare, and then have some userland
method to normally add a spare for any scsi swap devices (maybe swapon?).

But a sysfs attribute (one inode) would use almost as much memory as the
spare command itself (340 bytes, though that should be smaller), and it
adds a bit of code to the put case. (We could save almost as much memory
as we are allocating for the spare just by removing, for example, the scsi
"rev" sysfs attribute, or even more by exposing all of the inquiry data as
a binary file.)

If the sysfs attributes were lighter, something lighter becomes available,
or the general sentiment is to just go ahead and use a sysfs attribute, I
can change the code to use it. (I don't think having more than one spare
per device is useful.)

> > +	/*
> > +	 * Use any spare command first.
> > +	 */
> > +	cmd = sdev->spare_cmd;
> > +	if (cmd != NULL)
> > +		sdev->spare_cmd = NULL;
> > +	else
> > +		cmd = kmem_cache_alloc(sdev->host->cmd_pool->slab,
> > +				gfp_mask | sdev->host->cmd_pool->gfp_mask);
>
> This logic is flawed. We don't need a spare command if we always use
> it first. In addition the sdev->spare_cmd access is racy.

I had originally coded it to only use the spare on failure of the cache
allocation, but then we would normally never use the spare (similar to our
per-host spare today).

The sdev->sdev_lock (i.e. the queue_lock) protects access to
sdev->spare_cmd.

-- 
Patrick Mansfield
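
[Editorial note: the get/put pairing being discussed can be sketched in
plain userspace C. This is only an illustration of the pattern - the spare
is handed out first and refilled on completion, so a device with its spare
in flight can always issue at least one IO - and not the kernel code:
struct sdev, get_command, put_command, and the calloc-based fallback below
are hypothetical stand-ins for struct scsi_device, scsi_get_command,
scsi_put_command, and kmem_cache_alloc, and the sdev_lock that serializes
spare_cmd access in the real patch is omitted because this sketch is
single-threaded.]

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical stand-ins for struct scsi_cmnd / struct scsi_device. */
struct cmd { int in_use; };
struct sdev {
	struct cmd *spare_cmd;	/* pre-allocated spare, NULL while handed out */
};

/*
 * Get a command: hand out the spare first so it is exercised on every
 * allocation, falling back to the general allocator (which may fail
 * under memory pressure; alloc_fails models that failure).
 */
static struct cmd *get_command(struct sdev *sdev, int alloc_fails)
{
	struct cmd *cmd = sdev->spare_cmd;

	if (cmd != NULL)
		sdev->spare_cmd = NULL;
	else if (!alloc_fails)
		cmd = calloc(1, sizeof(*cmd));	/* stands in for kmem_cache_alloc() */
	return cmd;
}

/*
 * Put a command back on completion: refill the empty spare slot first,
 * otherwise release it to the allocator.
 */
static void put_command(struct sdev *sdev, struct cmd *cmd)
{
	if (sdev->spare_cmd == NULL)
		sdev->spare_cmd = cmd;
	else
		free(cmd);
}
```

Because the spare slot is refilled on completion rather than only consumed
on allocation failure, the spare path is exercised constantly instead of
almost never - which is the behaviour difference from the per-host spare
described above.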