From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Bottomley
Subject: Re: scsi command slab allocation under memory pressure
Date: 29 Jan 2003 17:53:15 -0500
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <1043880795.1775.2.camel@mulgrave>
References: <20030129104731.A2811@beaverton.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
Received: (from root@localhost)
	by pogo.mtv1.steeleye.com (8.9.3/8.9.3) id OAA13845
	for ; Wed, 29 Jan 2003 14:53:22 -0800
In-Reply-To: <20030129104731.A2811@beaverton.ibm.com>
List-Id: linux-scsi@vger.kernel.org
To: Patrick Mansfield
Cc: SCSI Mailing List

On Wed, 2003-01-29 at 13:47, Patrick Mansfield wrote:
> The linux-scsi.bkbits.net scsi-kmem_alloc-2.5 and scsi-combined-2.5 tree
> include the scsi command slab allocation (Luben's patch).
>
> How does the use of a single slab for all hosts and all devices allow for
> IO while under memory pressure?

In essence, all we really need to guarantee under memory pressure is
that I/O which is being used to clear memory (i.e. for the swap device)
will eventually proceed.  This is the weakest necessary assumption for
the system to make forward progress.

Having a single command per device (or even just a single available
command) guarantees this since if it is outstanding, it will eventually
return and be re-used for clearing memory, which is all that is
required.

With a single command per host, there is a starvation issue if you have
heavy I/O to a device whose controller also contains the swap.  However,
it would have to be fairly pathological conditions to continue doing
heavy I/O under memory pressure while starving the swap device.

> There is one extra scsi command pre-allocated per host, but don't we
> require at least one (and ideally maybe more) per device? The pre-slab
> (current mainline kernel) command allocation always had at least one
> command per device available, and usually more (because we allocated more
> commands during the scan and upper level init).

Now we get into tuning: even if the system is making forward progress,
it might be doing it erratically, so how best do we ensure that the
memory clearing I/O proceeds?

> That is - if we have swap on a separate disk and our command pool is small
> enough, IO to another disk could use the single per-host command under
> memory pressure, and we can fail to get a scsi command in order to write
> to the swap disk.

Right: a single command reserved per swap device would be sufficient to
assure a steady stream of memory clearing I/O, which is probably
sufficient for most purposes.

> scsi_put_command() re-fills the host->free_list if it is empty, but under
> high (or higher) IO loads, the disk/device that generated the
> scsi_put_command will immediately issue a scsi_get_command for the same
> device.

That's true, but again, it's a system tuning issue.  The optimal thing
to do for SCSI is to issue a new command for a device that just returned
one because we know it has all the resources to hand.  What we do under
memory pressure needs to be separated from what we do ordinarily.

> If all command allocations are failing for a particular device (i.e.
> swap), we will wait a bit (device_blocked and device_busy == 0) and try
> again, we will not retry based on a scsi_put_command(). Even if we did
> retry based on a scsi_put_command, we can race with the
> scsi_put_command caller.

This is theoretically possible, but unlikely: all of the allocated
commands must eventually return.  I can't think of any non-pathological
load scenarios where we can load up the command queues so completely
from userland as to cause complete starvation of the swap devices...but
doubtless somebody will come up with one.

James
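
For illustration only, the per-host reserve behaviour discussed above can be modelled in a few lines of userspace C. This is a rough sketch, not the kernel code or Luben's patch: `struct cmd`, `struct host_model`, `simulate_pressure`, `pool_alloc()`, `host_init()`, `get_command()` and `put_command()` are all hypothetical names, with `reserve` standing in for host->free_list and `calloc()` standing in for the shared slab.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
struct cmd {
    struct cmd *next;       /* link used only while parked on the reserve */
};

struct host_model {
    struct cmd *reserve;    /* stands in for host->free_list (one entry) */
};

/* Set to true to simulate the shared allocator failing under pressure. */
static bool simulate_pressure = false;

/* Stand-in for the single slab shared by all hosts and devices. */
static struct cmd *pool_alloc(void)
{
    if (simulate_pressure)
        return NULL;
    return calloc(1, sizeof(struct cmd));
}

/* Pre-allocate the one reserved command for this host. */
static bool host_init(struct host_model *h)
{
    h->reserve = calloc(1, sizeof(struct cmd));
    return h->reserve != NULL;
}

/* Try the shared pool first; fall back to the host's reserved command. */
static struct cmd *get_command(struct host_model *h)
{
    struct cmd *c = pool_alloc();

    if (c == NULL && h->reserve != NULL) {
        c = h->reserve;
        h->reserve = c->next;
        c->next = NULL;
    }
    return c;               /* NULL: caller has to wait and retry */
}

/* Refill the reserve if it is empty, otherwise give the command back. */
static void put_command(struct host_model *h, struct cmd *c)
{
    assert(c != NULL);
    if (h->reserve == NULL) {
        c->next = NULL;
        h->reserve = c;
    } else {
        free(c);
    }
}
```

In this model, with `simulate_pressure` set, the first `get_command()` on a host succeeds from the reserve and a second one fails until `put_command()` refills it. Nothing ties the refilled command to the swap device, so whichever device asks next gets it, which is the starvation window Patrick describes.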