From: Neil Conway <nconway.list@ukaea.org.uk>
To: Jens Axboe <axboe@suse.de>
Cc: Martin Dalecki <dalecki@evision-ventures.com>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] 2.5.15 IDE 61
Date: Wed, 15 May 2002 12:37:48 +0100 [thread overview]
Message-ID: <3CE2488C.AF2AFB9A@ukaea.org.uk> (raw)
In-Reply-To: <E177dYp-00083c-00@the-village.bc.nu> <3CE11F90.5070701@evision-ventures.com> <3CE13943.FBD5B1D6@ukaea.org.uk> <20020514163241.GR17509@suse.de> <3CE13F99.5BDED3DF@ukaea.org.uk> <20020514165113.GT17509@suse.de>
Jens Axboe wrote:
> On Tue, May 14 2002, Neil Conway wrote:
> > > To really serialize operations the queue _must_ be shared with whoever
> > > requires serialiation.
> > Why will this help? The hardware can still be doing DMA on hda while
> > the queue's request_fn is called quite legitimately for a hdb request -
> > and the IDE code MUST impose the serialization here to avoid hitting the
> > cable with commands destined for hdb. (For example, by waiting for
> > !channel->busy.)
>
> Current IDE code leaves a request on the list until it has completed
> (this is ignoring TCQ of course), so there's no way that you could start
> serving a second request before the first one completes.
I didn't understand why this was the case, so I've been reading source
code (and sleeping for a few minutes too ;-)). Wow, is it hard for a
newbie to follow - if I didn't know better I'd swear the kernel was
obfuscated.
I _think_ I now understand what's going on here: you guys already know
this stuff but perhaps others can learn from my slog through the code
(?) (and/or spot the dumb mistakes). (I did notice what could be a
potential problem too, skip to the end if you want to find it.)
On a "totally" idle system, if a process decides to read from a file,
the sequence appears to be (with minor simplifications for clarity):
sys_read(), file->fops->read(), usually then into generic_file_read(),
page_cache_readahead(), and then the important one:
do_page_cache_readahead(). This does a couple of seriously important
things. (The name implies it's only used for readahead but the comments
show this isn't the case (if one reads them!)).
Firstly, do_page_cache_readahead() invokes the call chain:
a_ops->readpage(), block_read_full_page(), submit_bh(), submit_bio(),
generic_make_request() (this one finds the right queue for the I/O) and
down into __make_request_fcn(q,bio). This routine is also key: it adds
requests to the device queue, and a little more: if the queue is empty
it "plugs" the device (before adding the request), ("plugging" the
device refers to preventing I/O while a request queue is populated), and
also queues a task to the tq_disk task-queue which will unplug the
device at some later time. The "unplug" routine is central: see below.
Control returns to do_page_cache_readahead() after the call to
readpage(). It then calls run_task_queue(&tq_disk), thus starting the
call chain: generic_unplug_device(),..., q->request_fn(). This is the
end of the line for the block-layer: request_fn() is part of the device
code - for IDE it's do_ide_request().
So this call to request_fn()/do_ide_request() is the first time the IDE
code is really involved in the loop.
The key bit to notice here is that request_fn() is only called if the
device was plugged. This in turn only happens (well almost, see below)
if the queue was empty. Thus one concludes that if do_ide_request() is
busy servicing requests and thus the queue is non-empty, the block layer
will never again call do_ide_request(); it will be allowed to get on
with things in its own time.
This I believe, is what you meant Jens when you said "so there's no way
that you could start serving a second request before the first one
completes". Am I right so far?
Now a possible fly in the ointment: __make_request_fcn() will actually
plug a non-empty queue if BIO_RW_BARRIER is set. This appears right now
to be impossible because I can't find any code in the entire kernel that
uses bio_barrier() or any similar construct. But if it's a work in
progress and this bit is set one day, then it seems to me that the block
layer could plug a non-empty queue, and then subsequently any call to
run_task_queue(&tq_disk) would call the block-device's request_fn().
This would violate the assumption that request_fn() is never called
twice without the queue emptying in between. The current IDE code would
survive this because it checks for hwgroup->busy.
Anyway, IFF the BARRIER stuff is not an issue, then I guess the
block-layer really can do all the serialization we need if we set things
up right. I now probably have to retract my assertions that we vitally
need hwgroup->busy (or equiv) because there really does appear to be no
route into request_fn() from the block layer other than if the queue was
empty... In the real world though, as you suggested, it's probably
worth having.
all the best,
Neil
PS: Errors and Omissions Expected.
next prev parent reply other threads:[~2002-05-15 11:38 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-05-14 9:49 [PATCH] 2.5.15 IDE 61 Neil Conway
2002-05-14 8:52 ` Martin Dalecki
2002-05-14 10:12 ` Neil Conway
2002-05-14 9:30 ` Martin Dalecki
2002-05-14 11:10 ` Neil Conway
2002-05-14 10:21 ` Martin Dalecki
2002-05-14 11:38 ` Russell King
2002-05-14 10:49 ` Martin Dalecki
2002-05-14 12:10 ` Alan Cox
2002-05-14 11:11 ` Martin Dalecki
2002-05-14 12:47 ` Alan Cox
2002-05-14 12:30 ` Martin Dalecki
2002-05-15 14:43 ` Pavel Machek
2002-05-14 12:00 ` Russell King
2002-05-14 11:03 ` Martin Dalecki
2002-05-14 13:03 ` Neil Conway
2002-05-14 13:27 ` Andre Hedrick
2002-05-14 14:45 ` Alan Cox
2002-05-14 14:30 ` Martin Dalecki
2002-05-14 16:20 ` Neil Conway
2002-05-14 16:32 ` Jens Axboe
2002-05-14 16:47 ` Neil Conway
2002-05-14 16:51 ` Jens Axboe
2002-05-15 11:37 ` Neil Conway [this message]
2002-05-14 22:51 ` Mike Fedyk
2002-05-14 16:26 ` Jens Axboe
2002-05-14 19:34 ` Anton Altaparmakov
2002-05-15 6:16 ` Jens Axboe
2002-05-15 8:32 ` Anton Altaparmakov
2002-05-15 9:42 ` Martin Dalecki
2002-05-15 9:32 ` Martin Dalecki
2002-05-15 11:44 ` Neil Conway
2002-05-15 11:02 ` Martin Dalecki
2002-05-15 13:10 ` Alan Cox
2002-05-15 13:34 ` Neil Conway
2002-05-15 13:04 ` Martin Dalecki
2002-05-15 14:08 ` benh
2002-05-15 16:40 ` Denis Vlasenko
2002-05-15 11:55 ` Neil Conway
2002-05-17 7:07 ` Mike Fedyk
2002-05-17 11:06 ` Neil Conway
2002-05-17 10:12 ` Martin Dalecki
2002-05-14 16:03 ` Neil Conway
2002-05-14 16:46 ` Alan Cox
2002-05-14 12:52 ` Daniela Engert
-- strict thread matches above, loose matches on Subject: below --
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
2002-05-13 9:48 ` [PATCH] 2.5.15 IDE 61 Martin Dalecki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3CE2488C.AF2AFB9A@ukaea.org.uk \
--to=nconway.list@ukaea.org.uk \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=axboe@suse.de \
--cc=dalecki@evision-ventures.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.