Re: 2.[45] fixes for design locking bug in wait_on_page/wait_on_buffer/get_request_wait

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@digeo.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Linus Torvalds <torvalds@transmeta.com>,
	Marcelo Tosatti <marcelo@conectiva.com.br>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.[45] fixes for design locking bug in  wait_on_page/wait_on_buffer/get_request_wait
Date: Tue, 12 Nov 2002 00:50:46 -0800	[thread overview]
Message-ID: <3DD0C0E6.4A8035A4@digeo.com> (raw)
In-Reply-To: 20021112035723.GA17642@dualathlon.random

Andrea Arcangeli wrote:
> 
> the race looks like this:
> 
>         CPU0                    CPU1
>         -----------------       ------------------------
>         reiserfs_writepage
>         lock_buffer()
>                                 fsync_buffers_list() under lock_super()
>                                 wait_on_buffer()
>                                 run_task_queue(&tq_disk) -> noop
>                                 schedule() <- hang with lock_super acquired
>         submit_bh()
>         /* don't unplug here */
> 

Or, more simply:

	lock_buffer()
				while (buffer_locked()) {
					blk_run_queues();	/* Nothing happens */
					if (buffer_locked(bh))
						schedule();
	submit_bh();	/* No unplug */

The fix seems reasonable to me.  It would perhaps be better to just do:

+       if (waitqueue_active(wqh))
+               blk_run_queues();

in submit_bh().  To save the context switch.

Moving the blk_run_queues() inside the TASK_UNINTERRUPTIBLE region
is something which always worried me, because if something down
there sets TASK_RUNNING, we end up in a busy wait.  But that's OK
for 2.5 and may be OK for 2.4's run_task_queue() - I haven't checked...

The multipage stuff in 2.5 does its own blk_run_queues() and looks to be
OK, which I assume is why you didn't touch that.

The little single-page reads like do_generic_mapping_read() look to be
OK because the process whcih waits is the one which submitted the IO.

wrt the get_request_wait changes: I never bothered about the barrier
because we know that there are tons of requests outstanding, and if
we don't do a wakeup the next guy will.  Plus *this* request has to
be put back sometime too, which will deliver a wakeup.  But whatever;
it's not exactly a fastpath.

However the function is still not watertight:

static struct request *get_request_wait(request_queue_t *q, int rw)
{
	DEFINE_WAIT(wait);
	struct request_list *rl = &q->rq[rw];
	struct request *rq;

	spin_lock_prefetch(q->queue_lock);

	generic_unplug_device(q);
	do {
		prepare_to_wait_exclusive(&rl->wait, &wait,
					TASK_UNINTERRUPTIBLE);
		if (!rl->count)
			io_schedule();
		finish_wait(&rl->wait, &wait);
		spin_lock_irq(q->queue_lock);
		rq = get_request(q, rw);
		spin_unlock_irq(q->queue_lock);
	} while (rq == NULL);
	return rq;
}

If someone has taken *all* the requests and hasn't submitted any of them
yet, there is nothing to unplug.   We go to sleep, all the requests are
submitted (behind a plug) and it's game over.  Could happen if the device
has a teeny queue...

I dunno.  I bet there are still more holes, and I for one am heartily sick
of unplug bugs.  Why not make the damn queue unplug itself after ten
milliseconds or 16 requests?  I bet that would actually increase throughput,
especially in the presence of kernel preemption and scheduling points.

next prev parent reply	other threads:[~2002-11-12  8:44 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-11-12  3:57 2.[45] fixes for design locking bug in wait_on_page/wait_on_buffer/get_request_wait Andrea Arcangeli
2002-11-12  8:50 ` Andrew Morton [this message]
2002-11-12 18:15   ` Andrea Arcangeli
2002-11-12 19:46     ` Andrew Morton
2002-11-13  0:33       ` Andrea Arcangeli
2002-11-13  1:01         ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2002-11-16 16:24 Marc-Christian Petersen
2002-11-16 16:59 ` Andrea Arcangeli
2002-11-16 17:23   ` Marc-Christian Petersen
2002-11-16 17:32     ` Andrea Arcangeli
2002-11-16 17:43       ` Marc-Christian Petersen
2002-11-16 18:46         ` Andrea Arcangeli
2002-11-16 18:58 Marc-Christian Petersen
2002-11-16 21:55 ` Marc-Christian Petersen
2002-11-17 17:37   ` Marc-Christian Petersen
2002-11-17 19:19     ` Folkert van Heusden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3DD0C0E6.4A8035A4@digeo.com \
    --to=akpm@digeo.com \
    --cc=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcelo@conectiva.com.br \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.