public inbox for linux-kernel@vger.kernel.org
* Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
       [not found] <20010128024635.C31648@suse.de>
@ 2001-01-28  2:03 ` Linus Torvalds
  2001-01-28  2:17   ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2001-01-28  2:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Lorenzo Allegrucci, Kernel Mailing List



On Sun, 28 Jan 2001, Jens Axboe wrote:
> > 
> > So what happens is that somebody takes a page fault (and gets the mm
> > lock), tries to read something in, and never gets anything back, thus
> > leaving the MM locked.
> 
> What was the trace of this? Just curious, the below case outlined by
> Linus should be pretty generic, but I'd still like to know what
> can lead to this condition.

It was posted on linux-kernel - I don't save the dang things because I
have too much in my "archives" as is ;)

> > Lorenzo, does the problem go away for you if you remove the
> > 
> > 	if (!list_empty(&q->request_freelist[rw])) {
> > 		...
> > 	}
> > 
> > code from blkdev_release_request() in drivers/block/ll_rw_block.c?
> 
> Good spotting. Actually I see one more problem with it too. If
> we've started batching (under heavy I/O of course), we could
> splice the pending list and wake up X number of sleepers, but
> there's a) no guarantee that these sleepers will actually get
> the requests if new ones keep flooding in

(a) is ok. They'll go back to sleep - it's a loop waiting for requests..

>				 and b) no guarantee
> that X sleepers require X request slots.

Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
them), they _will_ use a request. I don't think we have to worry about
that. At most we will wake up "too many" - we'll wake up processes even
though they end up not being able to get a request anyway because somebody
else got to it first. And that's ok. It's the "wake up too few" that
causes trouble, and I think that will be fixed by my suggestion.

Now, I'd be worried if somebody wants several requests at the same time, and
doesn't feed them to the IO layer until it has gotten all of them. In that
case, you can get starvation with many people having "reserved" their
requests, and there not being enough free requests around to actually ever
wake anybody up again. But the regular IO paths do not do this: they will
all allocate a request and just submit it immediately, no "reservation".

(Obviously, _submitting_ the request doesn't mean that we'd actually start
processing it, but if somebody ends up waiting for requests they'll do the
unplug that does start it all going, so effectively we can think of it as
a logical "start this request now" thing even if it gets delayed in order
to coalesce IO).

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
  2001-01-28  2:17   ` Jens Axboe
@ 2001-01-28  2:04     ` Marcelo Tosatti
  0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2001-01-28  2:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Lorenzo Allegrucci, Kernel Mailing List


On Sun, 28 Jan 2001, Jens Axboe wrote:

> On Sat, Jan 27 2001, Linus Torvalds wrote:
> > > What was the trace of this? Just curious, the below case outlined by
> > > Linus should be pretty generic, but I'd still like to know what
> > > can lead to this condition.
> > 
> > It was posted on linux-kernel - I don't save the dang things because I
> > have too much in my "archives" as is ;)
> 
> Ok I see it now, confused wrt the different threads...
> 
> > > Good spotting. Actually I see one more problem with it too. If
> > > we've started batching (under heavy I/O of course), we could
> > > splice the pending list and wake up X number of sleepers, but
> > > there's a) no guarantee that these sleepers will actually get
> > > the requests if new ones keep flooding in
> > 
> > (a) is ok. They'll go back to sleep - it's a loop waiting for requests..
> 
> My point is not that it's broken, but that it will favor newcomers
> over tasks already blocked waiting for a free slot. So it
> would still be nice to get right.
> 
> > >				 and b) no guarantee
> > > that X sleepers require X request slots.
> > 
> > Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> > them), they _will_ use a request. I don't think we have to worry about
> > that. At most we will wake up "too many" - we'll wake up processes even
> > though they end up not being able to get a request anyway because somebody
> > else got to it first. And that's ok. It's the "wake up too few" that
> > causes trouble, and I think that will be fixed by my suggestion.
> 
> Yes, they may end up sleeping right away again as per the above a) case
> for instance. The logic now is 'we have X free slots now, wake up
> X sleepers' where it instead should be 'we have X free slots now,
> wake up people until the free list is exhausted'.
> 
> > Now, I'd be worried if somebody wants several requests at the same time, and
> > doesn't feed them to the IO layer until it has gotten all of them. In that
> > case, you can get starvation with many people having "reserved" their
> > requests, and there not being enough free requests around to actually ever
> > wake anybody up again. But the regular IO paths do not do this: they will
> > all allocate a request and just submit it immediately, no "reservation".
> 
> Right, the I/O path doesn't do this and it would seem more appropriate
> to have such users use their own requests instead of eating from
> the internal pool.
> 
> -- 
> * Jens Axboe <axboe@suse.de>
> * SuSE Labs


* Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
  2001-01-28  2:03 ` Linus Torvalds
@ 2001-01-28  2:17   ` Jens Axboe
  2001-01-28  2:04     ` Marcelo Tosatti
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2001-01-28  2:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Lorenzo Allegrucci, Kernel Mailing List

On Sat, Jan 27 2001, Linus Torvalds wrote:
> > What was the trace of this? Just curious, the below case outlined by
> > Linus should be pretty generic, but I'd still like to know what
> > can lead to this condition.
> 
> It was posted on linux-kernel - I don't save the dang things because I
> have too much in my "archives" as is ;)

Ok I see it now, confused wrt the different threads...

> > Good spotting. Actually I see one more problem with it too. If
> > we've started batching (under heavy I/O of course), we could
> > splice the pending list and wake up X number of sleepers, but
> > there's a) no guarantee that these sleepers will actually get
> > the requests if new ones keep flooding in
> 
> (a) is ok. They'll go back to sleep - it's a loop waiting for requests..

My point is not that it's broken, but that it will favor newcomers
over tasks already blocked waiting for a free slot. So it
would still be nice to get right.

> >				 and b) no guarantee
> > that X sleepers require X request slots.
> 
> Well, IF they are sleeping (and thus, if the wake_up_nr() will trigger on
> them), they _will_ use a request. I don't think we have to worry about
> that. At most we will wake up "too many" - we'll wake up processes even
> though they end up not being able to get a request anyway because somebody
> else got to it first. And that's ok. It's the "wake up too few" that
> causes trouble, and I think that will be fixed by my suggestion.

Yes, they may end up sleeping right away again as per the above a) case
for instance. The logic now is 'we have X free slots now, wake up
X sleepers' where it instead should be 'we have X free slots now,
wake up people until the free list is exhausted'.

> Now, I'd be worried if somebody wants several requests at the same time, and
> doesn't feed them to the IO layer until it has gotten all of them. In that
> case, you can get starvation with many people having "reserved" their
> requests, and there not being enough free requests around to actually ever
> wake anybody up again. But the regular IO paths do not do this: they will
> all allocate a request and just submit it immediately, no "reservation".

Right, the I/O path doesn't do this and it would seem more appropriate
to have such users use their own requests instead of eating from
the internal pool.

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs

* Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
       [not found] ` <3.0.6.32.20010127220102.01367ce0@pop.tiscalinet.it>
@ 2001-01-28  9:48   ` Lorenzo Allegrucci
  2001-01-28 18:23     ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Lorenzo Allegrucci @ 2001-01-28  9:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Jens Axboe

At 15.40 27/01/01 -0800, you wrote:
>
>
>On Sat, 27 Jan 2001, Lorenzo Allegrucci wrote:
>> 
>> A trivial "while(1) fork()" is enough to trigger it.
>> "mem=32M" by lilo, ulimit -u is 1024.
>
>Hmm.. This does not look like a VM deadlock - it looks like some IO
>request is waiting forever on "__get_request_wait()". In fact, it looks
>like a _lot_ of people are waiting for requests.
>
>So what happens is that somebody takes a page fault (and gets the mm
>lock), tries to read something in, and never gets anything back, thus
>leaving the MM locked.
>
>Jens: this looks suspiciously like somebody isn't waking things up when
>they add requests back to the request lists. Alternatively, maybe the
>unplugging isn't properly done, so that we have a lot of pending IO that
>doesn't get started..
>
>Ho humm. Jens: imagine that you have more people waiting for requests than
>"batchcount". Further, imagine that you have multiple requests finishing
>at the same time. Not unlikely. Now, imagine that one request finishes,
>and causes "batchcount" users to wake up, and immediately another request
>finishes but THAT one doesn't wake anybody up because it notices that the
>freelist isn't empty - so it thinks that it doesn't need to wake anybody.
>
>Lorenzo, does the problem go away for you if you remove the
>
>	if (!list_empty(&q->request_freelist[rw])) {
>		...
>	}
>
>code from blkdev_release_request() in drivers/block/ll_rw_block.c?

Yes, it does.

--
Lorenzo

* Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
  2001-01-28  9:48   ` 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10) Lorenzo Allegrucci
@ 2001-01-28 18:23     ` Jens Axboe
  2001-01-28 18:29       ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2001-01-28 18:23 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: Linus Torvalds, linux-kernel

On Sun, Jan 28 2001, Lorenzo Allegrucci wrote:
> >Ho humm. Jens: imagine that you have more people waiting for requests than
> >"batchcount". Further, imagine that you have multiple requests finishing
> >at the same time. Not unlikely. Now, imagine that one request finishes,
> >and causes "batchcount" users to wake up, and immediately another request
> >finishes but THAT one doesn't wake anybody up because it notices that the
> >freelist isn't empty - so it thinks that it doesn't need to wake anybody.
> >
> >Lorenzo, does the problem go away for you if you remove the
> >
> >	if (!list_empty(&q->request_freelist[rw])) {
> >		...
> >	}
> >
> >code from blkdev_release_request() in drivers/block/ll_rw_block.c?
> 
> Yes, it does.

How about this instead?

--- /opt/kernel/linux-2.4.1-pre10/drivers/block/ll_rw_blk.c	Thu Jan 25 19:15:12 2001
+++ drivers/block/ll_rw_blk.c	Sun Jan 28 19:22:20 2001
@@ -633,6 +634,8 @@
 		if (!list_empty(&q->request_freelist[rw])) {
 			blk_refill_freelist(q, rw);
 			list_add(&req->table, &q->request_freelist[rw]);
+			if (waitqueue_active(&q->wait_for_request))
+				wake_up_nr(&q->wait_for_request, 2);
 			return;
 		}
 

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs

* Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
  2001-01-28 18:23     ` Jens Axboe
@ 2001-01-28 18:29       ` Linus Torvalds
  2001-01-28 18:39         ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2001-01-28 18:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Lorenzo Allegrucci, linux-kernel



On Sun, 28 Jan 2001, Jens Axboe wrote:
> 
> How about this instead?

I really don't like this one. It will basically re-introduce the old
behaviour of waking people up in a trickle, as far as I can tell. The
reason we want the batching is to make people have more requests to sort
in the elevator, and as far as I can tell this will just hurt that.

Are there any downsides to just _always_ batching, regardless of whether
the request freelist is empty or not? Sure, it will make the "effective"
size of the freelist a bit smaller, but that's probably not actually
noticeable under any load except for the one that empties the freelist (in
which case the old code would have triggered the batching anyway).

Performance numbers?

		Linus


* Re: 2.4.1-pre10 deadlock (Re: ps hang in 241-pre10)
  2001-01-28 18:29       ` Linus Torvalds
@ 2001-01-28 18:39         ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2001-01-28 18:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Lorenzo Allegrucci, linux-kernel

On Sun, Jan 28 2001, Linus Torvalds wrote:
> On Sun, 28 Jan 2001, Jens Axboe wrote:
> > 
> > How about this instead?
> 
> I really don't like this one. It will basically re-introduce the old
> behaviour of waking people up in a trickle, as far as I can tell. The
> reason we want the batching is to make people have more requests to sort
> in the elevator, and as far as I can tell this will just hurt that.
> 
> Are there any downsides to just _always_ batching, regardless of whether
> the request freelist is empty or not? Sure, it will make the "effective"
> size of the freelist a bit smaller, but that's probably not actually
> noticeable under any load except for the one that empties the freelist (in
> which case the old code would have triggered the batching anyway).

The problem with removing the !list_empty test like you suggested
is that batching is no longer controlled. If we start
batching once the lists are empty and start wakeups once batch_requests
has been reached, we know we'll give the elevator enough to work
with to be effective. With !list_empty removed, batch_requests is no
longer a measure of how many requests we want to batch. Always
batching is not a problem in itself; the smaller effective freelist
should be negligible.

The patch I sent will only trickle wakeups when batching is already
in effect but the batch_requests wakeups were not enough to deplete
the freelist again. At least that was the intended effect :-)

> Performance numbers?

Don't have any right now, will test a bit later.

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs
