From: Werner Almesberger
Subject: Prioritized disk IO BOF: summary
Date: Thu, 29 Jul 2004 00:13:09 -0300
Message-ID: <20040729001309.A6857@almesberger.net>
To: linux-fsdevel@vger.kernel.org
Cc: abiss-general@lists.sourceforge.net

Here's a brief summary of the BOF we had last Saturday at OLS.

1st presentation: "Active Block I/O Scheduling System" by Benno van den Brink. (Slides at http://abiss.sourceforge.net/)

There were a few questions related to understanding how things are arranged. An interesting suggestion was to move the entire prefetching mechanism to user space (with direct and asynchronous IO), which unfortunately would give the kernel less control over what the application is doing. Nevertheless, this approach may be worth considering for other scenarios.

One issue that came up later was whether our user space daemon could deadlock in a resource conflict with requests (no, at least not in any particularly nasty way, because it is only used when enabling the ABISS service after opening the file, but not for handling the actual IO requests).

Kurt Garloff also remarked that CKRM could be useful as a configuration interface. (Which indeed seems likely, for CKRM seems very nice and flexible.)

2nd presentation: "The ABISS Elevator" by Werner Almesberger (Slides also at http://abiss.sourceforge.net/)

The audience wasn't so happy about the elevator "fixing" the ordering of overlapping writes, because we may not want to give such guarantees, or may even be unable to, in other cases, where the system is more complex than just a disk with its elevator (RAID comes to mind).
Either way, this doesn't affect the implementation of the ABISS elevator that much, because - in order to be allowed to reorder reads across barriers - it needs to handle some overlaps anyway. To make it not care about overlapping writes is just a matter of removing one call (a bit more than a one-line change), which will make it marginally faster.

Question to the list: is guaranteeing that overlapping requests will be processed in FIFO order if and only if they're separated by a barrier something we should assume as a general property of elevators? (This only affects direct IO, never buffered IO.)

Regarding barriers, there was general uncertainty about whether the semantics we currently have actually make sense. Different barrier semantics may also make life easier for the elevator. We've had some discussion on barrier semantics a while ago, so I think it would be good to continue this. For now, I'm assuming that compatibility with what the current elevators do when they see a barrier is a good thing.

The next issue was whether ELEVATOR_INSERT_FRONT should act as a FIFO or as a LIFO if multiple requests are added this way. Unfortunately, Jens Axboe didn't make it to the BOF, and we couldn't quite puzzle out whether such a case could really occur, and what behaviour(s) would be right then. In any case, the ABISS elevator does what the noop elevator does, and implements a strict LIFO.

Question to the list: if we enqueue A and B, both with ELEVATOR_INSERT_FRONT, must they come out as B, then A, or can they also be delivered as A, then B, as long as they come before anything else?

Another question (by Daniel Phillips or Andrew Morton, I think) was whether the ABISS elevator really needs the FIFO queues, or whether they could be implemented as priority queues. I promised to think about this.
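To make the ELEVATOR_INSERT_FRONT question above a bit more concrete, here is a minimal user-space sketch of the two candidate behaviours. It is not kernel code and not taken from the ABISS or noop elevators: `struct toy_queue` and all `toy_*` functions are hypothetical names invented for this illustration, and requests are reduced to single letters.

```c
#include <stddef.h>
#include <string.h>

#define TOY_QUEUE_MAX 16

/* Hypothetical toy dispatch queue (an assumption, not a kernel structure). */
struct toy_queue {
    char items[TOY_QUEUE_MAX]; /* items[0] is dispatched next */
    size_t len;                /* number of queued requests */
    size_t front_run;          /* front-inserted requests still at the head */
};

static void toy_init(struct toy_queue *q)
{
    q->len = 0;
    q->front_run = 0;
}

/* Insert req at position pos, shifting later entries back by one. */
static int toy_insert_at(struct toy_queue *q, size_t pos, char req)
{
    if (q->len == TOY_QUEUE_MAX || pos > q->len)
        return -1;
    memmove(&q->items[pos + 1], &q->items[pos], q->len - pos);
    q->items[pos] = req;
    q->len++;
    return 0;
}

/* Ordinary insertion at the tail of the queue. */
static int toy_insert_back(struct toy_queue *q, char req)
{
    return toy_insert_at(q, q->len, req);
}

/* Strict LIFO: every front insertion goes to the very head. */
static int toy_insert_front_lifo(struct toy_queue *q, char req)
{
    if (toy_insert_at(q, 0, req) != 0)
        return -1;
    q->front_run++;
    return 0;
}

/*
 * FIFO among front insertions: the request goes behind earlier
 * front-inserted requests, but still ahead of everything else.
 */
static int toy_insert_front_fifo(struct toy_queue *q, char req)
{
    if (toy_insert_at(q, q->front_run, req) != 0)
        return -1;
    q->front_run++;
    return 0;
}

/* Remove and return the head request, or -1 if the queue is empty. */
static int toy_dispatch(struct toy_queue *q)
{
    char req;

    if (q->len == 0)
        return -1;
    req = q->items[0];
    memmove(&q->items[0], &q->items[1], q->len - 1);
    q->len--;
    if (q->front_run > 0)
        q->front_run--;
    return req;
}
```

With an ordinary request X already queued, front-inserting A and then B through `toy_insert_front_lifo` dispatches B, A, X, while `toy_insert_front_fifo` dispatches A, B, X. In both cases the front-inserted requests come before anything else, so the question is only which of the two orders callers are allowed to rely on.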
My conclusion is that such a change would probably make the elevator more complicated, because the request selection algorithm is different (the sort queues are read through a cursor implementing a single-sweep elevator, while the FIFO queues are read at the list head), and because the requests in the queues have different properties (e.g. the ones in the FIFO queue are not registered in any of the RB or RPST trees).

Another issue that was raised was support for barriers and priorities at layers below the elevator, e.g. in "intelligent" storage systems. None of this seems to be overly hard to add (famous last words, I know :-), once the general infrastructure is in place. (Barriers should already be there, in some cases.)

Daniel Phillips (who also had lots of good questions during the BOF) later remarked that ensuring that IO operations can be processed within predictable time is hard, because memory allocations have an unpredictable and sometimes surprisingly large cost, which is something we've observed in ABISS when going from 2.6.5 to 2.6.6. (Measured worst-case delays for read operations went up from about 2ms to 100ms or more, all due to memory allocations. That was on a 1200 MHz Duron with 256 MB or 128 MB of RAM - not exactly a race horse, but still ...)

These delays seem excessive, so I'll have a look at whether they're still happening in more recent kernels.

Unfortunately, not much was said about similar projects or APIs. So this was quite an ABISS-only BOF, with the main focus on the elevator. (Which is one piece of common infrastructure everybody needs to agree on.)

Future directions: I've discussed some of the elevator issues with Jens before the BOF, and he's planning to add some of the functionality needed for this kind of priorities soon.
This will reduce the set of patches we currently need for ABISS, and I plan to eventually merge the overlap handling into CFQ, so that, provided that Jens likes the changes, we can get rid of the ABISS elevator entirely. (The higher level parts of ABISS, such as the prefetching, will still be needed, of course. ABISS also has the concept of "upgrading" a request, which may currently not be suitable for the mainstream, but should be easy enough to maintain as add-on functionality for a while.)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/