* Prioritized disk IO BOF: summary
From: Werner Almesberger @ 2004-07-29  3:13 UTC
  To: linux-fsdevel; +Cc: abiss-general

Here's a brief summary of the BOF we had last Saturday at OLS.


1st presentation: "Active Block I/O Scheduling System" by Benno
van den Brink. (Slides at http://abiss.sourceforge.net/)

There were a few questions about how the various parts of ABISS fit
together. An interesting suggestion was to move the entire
prefetching mechanism to user space (using direct and asynchronous
IO). Unfortunately, this would give the kernel less control over
what the application is doing. Nevertheless, the approach may be
worth considering for other scenarios.
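For illustration, a user-space prefetcher along those lines might
look roughly like the following sketch. It uses POSIX AIO
(aio_read, aio_error, aio_suspend, aio_return); the helper names
open_for_prefetch, prefetch_start and prefetch_wait are
hypothetical, and the direct IO part (O_DIRECT, which needs aligned
buffers) is only mentioned in a comment, not implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper. A real direct-IO prefetcher would add O_DIRECT
 * here (requires _GNU_SOURCE and suitably aligned buffers). */
int open_for_prefetch(const char *path)
{
    return open(path, O_RDONLY);
}

/* Start an asynchronous read of len bytes at offset into buf, so the
 * data is (hopefully) there before the application needs it. cb and
 * buf must stay valid until the request completes. */
int prefetch_start(struct aiocb *cb, int fd, void *buf, size_t len,
                   off_t offset)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = offset;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE; /* no completion signal */
    return aio_read(cb); /* 0 if queued, -1 with errno set otherwise */
}

/* Block until the prefetch has finished; returns bytes read or -1. */
ssize_t prefetch_wait(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };

    while (aio_error(cb) == EINPROGRESS) {
        if (aio_suspend(list, 1, NULL) == -1 && errno != EINTR)
            return -1;
    }
    return aio_return(cb);
}
```

The application would call prefetch_start well ahead of its actual
read and prefetch_wait only when it needs the data, which is exactly
where the kernel loses visibility into what will be read next.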

One issue that came up later was whether our user space daemon
could deadlock in a resource conflict with requests (no, at least
not in any particularly nasty way, because it is only used when
enabling the ABISS service after opening the file, but not for
handling the actual IO requests).

Kurt Garloff also remarked that CKRM could be useful as a
configuration interface. (This indeed seems likely, since CKRM
looks very nice and flexible.)


2nd presentation: "The ABISS Elevator" by Werner Almesberger
(Slides also at http://abiss.sourceforge.net/)

The audience wasn't so happy about the elevator "fixing" the
ordering of overlapping writes, because we may not want to or
even be unable to give such guarantees in other cases, where the
system is more complex than just a disk with its elevator (RAID
comes to mind).

Either way, this doesn't affect the implementation of the ABISS
elevator that much, because - in order to be allowed to reorder
reads across barriers - it needs to handle some overlaps anyway.
Making it ignore overlapping writes is just a matter of removing
one call (slightly more than a one-line change), which will also
make it marginally faster.
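To illustrate what handling overlaps involves, here is a minimal
sketch using a hypothetical, simplified request descriptor (not the
ABISS or kernel struct request). It assumes the rule is that a read
may be moved ahead of a barrier only if it does not overlap any
write queued before that barrier.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long sector_t; /* stand-in for the kernel type */

/* Hypothetical, simplified request descriptor. */
struct req {
    sector_t sector;     /* first sector */
    unsigned nr_sectors; /* length in sectors */
    bool is_write;
};

/* Half-open ranges [a, a+n) and [b, b+m) intersect iff each one
 * starts before the other one ends. */
bool reqs_overlap(const struct req *a, const struct req *b)
{
    return a->sector < b->sector + b->nr_sectors &&
           b->sector < a->sector + a->nr_sectors;
}

/* Assumed rule: read rd may pass the barrier only if it overlaps none
 * of the writes queued before it (overlapping reads are harmless). */
bool read_may_pass_barrier(const struct req *rd,
                           const struct req *before, size_t n_before)
{
    for (size_t i = 0; i < n_before; i++)
        if (before[i].is_write && reqs_overlap(rd, &before[i]))
            return false;
    return true;
}
```

Dropping the write/write case only removes checks between two
writes; the read/write check above is still needed to reorder reads
safely.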

Question to the list: should we assume, as a general property of
elevators, that overlapping requests are guaranteed to be processed
in FIFO order if and only if they are separated by a barrier? (This
only affects direct IO, never buffered IO.)

Regarding barriers, there was general uncertainty about whether
the semantics we currently have actually make sense. Different
barrier semantics may also make life easier for the elevator.
We had some discussion on barrier semantics a while ago, and I
think it would be good to continue that discussion. For now, I'm
assuming that compatibility with what the current elevators do
when they see a barrier is a good thing.
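One common reading of those barrier semantics is that the elevator
may sort freely between barriers but never move a request across
one. The sketch below assumes exactly that, using a hypothetical
queue entry type. It also shows the ordering question raised above:
overlapping requests in the same segment may swap places when sorted
by sector, while a barrier between them keeps them in FIFO order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue entry: either a request or a barrier marker. */
struct entry {
    bool barrier;
    unsigned long long sector; /* ignored for barriers */
};

/* Stable insertion sort of q[lo, hi) by start sector. */
static void sort_segment(struct entry *q, size_t lo, size_t hi)
{
    for (size_t i = lo + 1; i < hi; i++) {
        struct entry e = q[i];
        size_t j = i;

        while (j > lo && q[j - 1].sector > e.sector) {
            q[j] = q[j - 1];
            j--;
        }
        q[j] = e;
    }
}

/* Reorder freely between barriers, but never across one. */
void order_with_barriers(struct entry *q, size_t n)
{
    size_t start = 0;

    for (size_t i = 0; i <= n; i++) {
        if (i == n || q[i].barrier) {
            sort_segment(q, start, i);
            start = i + 1;
        }
    }
}
```

Note that a request queued after the barrier stays there even if its
sector is lower than everything before the barrier.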

The next issue was whether ELEVATOR_INSERT_FRONT should act as a
FIFO or as a LIFO if multiple requests are added this way.
Unfortunately, Jens Axboe didn't make it to the BOF, and we
couldn't quite puzzle out whether such a case could really occur,
and what behaviour(s) would be right then. In any case, the ABISS
elevator does what the noop elevator does, and implements a
strict LIFO.
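The strict LIFO behaviour for front insertions can be sketched with a
simple singly linked dispatch list. This is a hypothetical stand-in,
not the kernel's list_head based code; insert_front, insert_back and
next_request are illustrative names.

```c
#include <stddef.h>

/* Hypothetical request and dispatch list, read from the head. */
struct rq {
    int id;
    struct rq *next;
};

struct dispatch_list {
    struct rq *head;
    struct rq *tail;
};

/* Strict LIFO front insertion: the newest front insertion is
 * dispatched first. */
void insert_front(struct dispatch_list *l, struct rq *r)
{
    r->next = l->head;
    l->head = r;
    if (!l->tail)
        l->tail = r;
}

/* Normal insertion at the end of the list. */
void insert_back(struct dispatch_list *l, struct rq *r)
{
    r->next = NULL;
    if (l->tail)
        l->tail->next = r;
    else
        l->head = r;
    l->tail = r;
}

/* Take the request at the head, or NULL if the list is empty. */
struct rq *next_request(struct dispatch_list *l)
{
    struct rq *r = l->head;

    if (r) {
        l->head = r->next;
        if (!l->head)
            l->tail = NULL;
    }
    return r;
}
```

Inserting A and then B at the front dispatches B, then A, both ahead
of anything inserted at the back. A FIFO variant would need an extra
pointer marking the end of the run of front-inserted requests.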

Question to the list: if we enqueue A and B, both with
ELEVATOR_INSERT_FRONT, must they come out as B, then A, or can
they also be delivered as A, then B, as long as they come before
anything else?

Another question (by Daniel Phillips or Andrew Morton, I think)
was whether the ABISS elevator really needs the FIFO queues, or
whether they could be implemented as priority queues. I promised
to think about this. My conclusion is that such a change would
probably make the elevator more complicated, because the request
selection algorithm is different (the sort queues are read
through a cursor implementing a single-sweep elevator, while the
FIFO queues are read at the list head), and because the requests
in the queues have different properties (e.g. the ones in the
FIFO queue are not registered in any of the RB or RPST trees).
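The difference in request selection can be sketched as follows,
assuming "single-sweep" means a one-directional sweep that wraps
around to the lowest sector once it passes the last request (an
assumption, not a description of the actual ABISS code). The
function name sweep_pick is hypothetical. A FIFO queue, by contrast,
simply takes the request at the list head.

```c
#include <stddef.h>

/* Sort queue: given request start sectors sorted in ascending order,
 * pick the first one at or beyond the cursor. If the sweep has passed
 * the last request, wrap around and start a new sweep at the lowest
 * sector. Returns the index, or -1 if the queue is empty. */
long sweep_pick(const unsigned long long *sorted, size_t n,
                unsigned long long cursor)
{
    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (sorted[i] >= cursor)
            return (long)i;
    return 0; /* nothing ahead of the cursor: begin a new sweep */
}
```

Folding the FIFO queues into priority queues would mean supporting
both this cursor-based lookup and plain head-of-list dispatch in the
same structure.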

Another issue that was raised was support for barriers and
priorities at layers below the elevator, e.g. in "intelligent"
storage systems. None of this seems to be overly hard to add
(famous last words, I know :-), once the general infrastructure
is in place. (Barriers should already be there, in some cases.)

Daniel Phillips (who also had lots of good questions during the
BOF) later remarked that ensuring that IO operations can be
processed within predictable time is hard, because memory
allocations have an unpredictable and sometimes surprisingly
large cost, which is something we've observed in ABISS when
going from 2.6.5 to 2.6.6. (Measured worst-case delays for read
operations went up from about 2ms to 100ms or more, all due to
memory allocations. That was on a 1200 MHz Duron with 256 MB or
128 MB of RAM - not exactly a race horse, but still ...)
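As a rough illustration of how such worst-case figures can be
collected from user space, the following sketch times each read()
with CLOCK_MONOTONIC and keeps the maximum. It is hypothetical
(worst_read_latency_ns is an illustrative name) and not the setup
used for the ABISS measurements.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Issue count sequential reads of len bytes from fd and return the
 * largest single-read latency in nanoseconds, or -1 on a read error. */
int64_t worst_read_latency_ns(int fd, void *buf, size_t len, int count)
{
    int64_t worst = 0;

    for (int i = 0; i < count; i++) {
        int64_t t0 = now_ns();
        ssize_t r = read(fd, buf, len);
        int64_t dt = now_ns() - t0;

        if (r < 0)
            return -1;
        if (dt > worst)
            worst = dt;
    }
    return worst;
}
```

Tracking the maximum rather than the average is what exposes rare
stalls such as the allocation delays described above.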

These delays seem excessive, so I'll have a look at whether
they're still happening in more recent kernels.


Unfortunately, not much was said about similar projects or APIs.
So this was quite an ABISS-only BOF, with the main focus on the
elevator. (Which is one piece of common infrastructure everybody
needs to agree on.)


Future directions: I've discussed some of the elevator issues
with Jens before the BOF, and he's planning to add some of the
functionality needed for this kind of prioritization soon. This will
reduce the set of patches we currently need for ABISS, and I plan
to eventually merge the overlap handling into CFQ, so that,
provided that Jens likes the changes, we can get rid of the ABISS
elevator entirely. (The higher level parts of ABISS, such as the
prefetching, will still be needed, of course. ABISS also has the
concept of "upgrading" a request, which may currently not be
suitable for the mainstream, but should be easy enough to
maintain as add-on functionality for a while.)

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         wa@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
