PF_MEMALLOC in 2.6

public inbox for linux-kernel@vger.kernel.org

* PF_MEMALLOC in 2.6
@ 2004-08-19  6:55 Pete Zaitcev
  2004-08-19  6:59 ` William Lee Irwin III
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Pete Zaitcev @ 2004-08-19  6:55 UTC (permalink / raw)
  To: arjanv; +Cc: alan, greg, linux-kernel, zaitcev, riel, sct

PF_MEMALLOC is required on usb-storage threads in 2.4, because ext3
will deadlock or otherwise misbehave when it tries to write out
dirty pages under memory pressure.

I received a bug report today from an FC3T1 user with the same symptoms
as on 2.4. But I'm entirely clueless about the way the VM operates. Comments?

-- Pete

--- linux-2.6.8-rc4-mm1/drivers/usb/storage/usb.c	2004-08-16 12:13:06.000000000 -0700
+++ linux-2.6.8-rc4-mm1-ub/drivers/usb/storage/usb.c	2004-08-18 23:48:09.335107648 -0700
@@ -285,7 +285,7 @@ static int usb_stor_control_thread(void 
 	 */
 	daemonize("usb-storage");
 
-	current->flags |= PF_NOFREEZE;
+	current->flags |= PF_NOFREEZE|PF_MEMALLOC;
 
 	unlock_kernel();
 


* Re: PF_MEMALLOC in 2.6
  2004-08-19  6:55 PF_MEMALLOC in 2.6 Pete Zaitcev
@ 2004-08-19  6:59 ` William Lee Irwin III
  2004-08-19  8:46 ` Stephen C. Tweedie
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: William Lee Irwin III @ 2004-08-19  6:59 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: arjanv, alan, greg, linux-kernel, riel, sct

On Wed, Aug 18, 2004 at 11:55:23PM -0700, Pete Zaitcev wrote:
> PF_MEMALLOC is required on usb-storage threads in 2.4, because ext3
> will deadlock or otherwise misbehave when it tries to write out
> dirty pages under memory pressure.
> I received a bug report today from an FC3T1 user with the same symptoms
> as on 2.4. But I'm entirely clueless about the way the VM operates. Comments?

I suspect this describes it adequately. If the shoe fits...


-- wli


* Re: PF_MEMALLOC in 2.6
  2004-08-19  6:55 PF_MEMALLOC in 2.6 Pete Zaitcev
  2004-08-19  6:59 ` William Lee Irwin III
@ 2004-08-19  8:46 ` Stephen C. Tweedie
  2004-08-19  8:59 ` Oliver Neukum
  2004-08-19 12:41 ` Hugh Dickins
  3 siblings, 0 replies; 25+ messages in thread
From: Stephen C. Tweedie @ 2004-08-19  8:46 UTC (permalink / raw)
  To: Pete Zaitcev
  Cc: Arjan van de Ven, Alan Cox, Greg KH, linux-kernel, Rik van Riel,
	Stephen Tweedie

Hi,

On Thu, 2004-08-19 at 07:55, Pete Zaitcev wrote:
> PF_MEMALLOC is required on usb-storage threads in 2.4, because ext3
> will deadlock or otherwise misbehave when it tries to write out
> dirty pages under memory pressure.

> I received a bug report today from an FC3T1 user with the same symptoms
> as on 2.4. But I'm entirely clueless about the way the VM operates. Comments?


> @@ -285,7 +285,7 @@ static int usb_stor_control_thread(void 
> -	current->flags |= PF_NOFREEZE;
> +	current->flags |= PF_NOFREEZE|PF_MEMALLOC;

Looks entirely reasonable to me.

--Stephen



* Re: PF_MEMALLOC in 2.6
  2004-08-19  6:55 PF_MEMALLOC in 2.6 Pete Zaitcev
  2004-08-19  6:59 ` William Lee Irwin III
  2004-08-19  8:46 ` Stephen C. Tweedie
@ 2004-08-19  8:59 ` Oliver Neukum
  2004-08-19 12:41 ` Hugh Dickins
  3 siblings, 0 replies; 25+ messages in thread
From: Oliver Neukum @ 2004-08-19  8:59 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: arjanv, alan, greg, linux-kernel, riel, sct

Am Donnerstag, 19. August 2004 08:55 schrieb Pete Zaitcev:
> PF_MEMALLOC is required on usb-storage threads in 2.4, because ext3
> will deadlock or otherwise misbehave when it tries to write out
> dirty pages under memory pressure.

Can you tell where it hangs? 2.6 passes GFP_NOIO around. If we
have a bug there somewhere, we need to find it, because
it may also affect the error handlers, which do not operate in that
context.

	Regards
		Oliver


* Re: PF_MEMALLOC in 2.6
  2004-08-19  6:55 PF_MEMALLOC in 2.6 Pete Zaitcev
                   ` (2 preceding siblings ...)
  2004-08-19  8:59 ` Oliver Neukum
@ 2004-08-19 12:41 ` Hugh Dickins
  2004-08-19 18:25   ` Oliver Neukum
  2004-08-20 10:31   ` Stephen C. Tweedie
  3 siblings, 2 replies; 25+ messages in thread
From: Hugh Dickins @ 2004-08-19 12:41 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: arjanv, alan, greg, linux-kernel, riel, sct

On Wed, 18 Aug 2004, Pete Zaitcev wrote:

> PF_MEMALLOC is required on usb-storage threads in 2.4, because ext3
> will deadlock or otherwise misbehave when it tries to write out
> dirty pages under memory pressure.
> 
> I received a bug report today from an FC3T1 user with the same symptoms
> as on 2.4. But I'm entirely clueless about the way the VM operates. Comments?
> 
> --- linux-2.6.8-rc4-mm1/drivers/usb/storage/usb.c	2004-08-16 12:13:06.000000000 -0700
> +++ linux-2.6.8-rc4-mm1-ub/drivers/usb/storage/usb.c	2004-08-18 23:48:09.335107648 -0700
> @@ -285,7 +285,7 @@ static int usb_stor_control_thread(void 
>  	 */
>  	daemonize("usb-storage");
>  
> -	current->flags |= PF_NOFREEZE;
> +	current->flags |= PF_NOFREEZE|PF_MEMALLOC;
>  
>  	unlock_kernel();

It seems I'm in a minority, and this is certainly beyond my expertise,
but I'm very suspicious of this, though I see mtd_blktrans_thread
already does the same.  I bet it'll fix problems in some workloads,
and it's a very easy change to make, but I doubt it's right.

PF_MEMALLOC entitles the thread to dip into emergency memory reserves
(and stops it from descending into try_to_free_pages; that may be okay,
though as Oliver notes, allocations from here ought to be GFP_NOIO,
which is more appropriate).
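
To make "dipping into the reserves" concrete, here is a deliberately simplified user-space model of a watermark check. It is a sketch, not the kernel's __alloc_pages(): `struct zone_model`, `model_alloc()`, the made-up flag values and the "__GFP_HIGH callers may use half of pages_min" rule are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real kernel constants differ. */
#define MODEL_PF_MEMALLOC	0x1u	/* task flag */
#define MODEL_GFP_HIGH		0x1u	/* allocation flag, as in GFP_ATOMIC */

struct zone_model {
	long free_pages;
	long pages_min;		/* emergency reserve watermark */
};

/*
 * Simplified watermark test: ordinary allocations must leave pages_min
 * free, __GFP_HIGH callers may dig down to half of it, and PF_MEMALLOC
 * tasks ignore the watermark entirely.
 */
static bool model_alloc(struct zone_model *z, unsigned int task_flags,
			unsigned int gfp, long nr_pages)
{
	long min = z->pages_min;

	assert(nr_pages > 0);
	if (task_flags & MODEL_PF_MEMALLOC)
		min = 0;
	else if (gfp & MODEL_GFP_HIGH)
		min /= 2;

	if (z->free_pages - nr_pages < min)
		return false;	/* caller would have to reclaim or fail */
	z->free_pages -= nr_pages;
	return true;
}
```

In this model a task that is permanently PF_MEMALLOC can drive the zone all the way to zero, leaving nothing for the GFP_ATOMIC callers that the half-watermark rule was meant to protect.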

Fine for it to dip into those reserves when acting on behalf of something
already PF_MEMALLOC (i.e. try_to_free_pages itself), but not fine for it
to do so as a matter of course: in the worst case, doing readahead could
easily exhaust the reserves.  Or is this thread only used for writing?
That wouldn't be so bad if so.

I'd have thought the right 2.6 answer would be for it to have mempools
for whatever it might need to get the I/O done; but I haven't a clue
in this area, and can easily believe that would be difficult to implement.
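
A rough user-space sketch of the mempool pattern: preallocate a guaranteed minimum of elements, try the normal allocator first, and fall back to the reserve only when that fails. `struct pool` and its functions are hypothetical and only loosely follow the shape of the kernel's mempool_alloc()/mempool_free(); in particular, the kernel version can sleep waiting for an element to be returned, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define POOL_MAX 16

struct pool {
	void *elem[POOL_MAX];	/* preallocated reserve elements */
	int curr_nr;		/* elements currently in the reserve */
	int min_nr;		/* reserve size guaranteed at creation */
	size_t size;
	bool simulate_oom;	/* test hook: pretend malloc() fails */
};

static bool pool_init(struct pool *p, int min_nr, size_t size)
{
	assert(min_nr >= 0 && min_nr <= POOL_MAX);
	p->curr_nr = 0;
	p->min_nr = min_nr;
	p->size = size;
	p->simulate_oom = false;
	while (p->curr_nr < min_nr) {
		void *e = malloc(size);

		if (!e) {
			while (p->curr_nr > 0)
				free(p->elem[--p->curr_nr]);
			return false;
		}
		p->elem[p->curr_nr++] = e;
	}
	return true;
}

/* Try the normal allocator first; fall back to the reserve only on failure. */
static void *pool_alloc(struct pool *p)
{
	void *e = p->simulate_oom ? NULL : malloc(p->size);

	if (e)
		return e;
	if (p->curr_nr > 0)
		return p->elem[--p->curr_nr];
	return NULL;	/* a sleeping mempool would wait for a freed element */
}

/* Freed elements top the reserve back up before going back to the heap. */
static void pool_free(struct pool *p, void *e)
{
	if (p->curr_nr < p->min_nr)
		p->elem[p->curr_nr++] = e;
	else
		free(e);
}

static void pool_destroy(struct pool *p)
{
	while (p->curr_nr > 0)
		free(p->elem[--p->curr_nr]);
}
```

The point of the design is that forward progress is guaranteed for min_nr in-flight operations without touching the global PF_MEMALLOC reserve at all.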

Or would it solve the problem at hand, if it made itself PF_MEMALLOC
just while servicing a request from a PF_MEMALLOC?
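
One way that suggestion might look, as a hypothetical sketch: the worker borrows PF_MEMALLOC only for the duration of a request whose submitter had it, then restores its own state. `struct task_model`, `struct io_request`, `service_request()` and the flag value are invented for illustration; Stephen's reply in this thread explains why the submitter's flags may not be the right thing to key on.

```c
#include <assert.h>

#define MODEL_PF_MEMALLOC 0x1u	/* illustrative value only */

struct task_model {
	unsigned int flags;
};

struct io_request {
	unsigned int submitter_flags;	/* snapshot taken at submit time */
	int reserve_allowed;		/* recorded by the worker for inspection */
};

static void service_request(struct task_model *worker, struct io_request *rq)
{
	unsigned int saved = worker->flags & MODEL_PF_MEMALLOC;

	/* Inherit the reserve entitlement only from a PF_MEMALLOC submitter. */
	worker->flags |= rq->submitter_flags & MODEL_PF_MEMALLOC;
	rq->reserve_allowed = !!(worker->flags & MODEL_PF_MEMALLOC);

	/* ... allocate buffers and issue the transfer here ... */

	/* Restore the worker's original PF_MEMALLOC state. */
	worker->flags = (worker->flags & ~MODEL_PF_MEMALLOC) | saved;
}
```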

Hugh


* Re: PF_MEMALLOC in 2.6
  2004-08-19 12:41 ` Hugh Dickins
@ 2004-08-19 18:25   ` Oliver Neukum
  2004-08-20  2:37     ` Nick Piggin
  2004-08-20 10:31   ` Stephen C. Tweedie
  1 sibling, 1 reply; 25+ messages in thread
From: Oliver Neukum @ 2004-08-19 18:25 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Pete Zaitcev, arjanv, alan, greg, linux-kernel, riel, sct

Am Donnerstag, 19. August 2004 14:41 schrieb Hugh Dickins:
> Fine for it to dip into those reserves when acting on behalf of something
> already PF_MEMALLOC (i.e. try_to_free_pages itself), but not fine for it
> to do so as a matter of course: in the worst case, doing readahead could
> easily exhaust the reserves.  Or is this thread only used for writing?
> That wouldn't be so bad if so.

All IO going to the actual disk uses the thread. However, we usually
don't want to fail IO requests due to low memory.

	Regards
		Oliver


* Re: PF_MEMALLOC in 2.6
  2004-08-19 18:25   ` Oliver Neukum
@ 2004-08-20  2:37     ` Nick Piggin
  2004-08-20  7:56       ` Oliver Neukum
  0 siblings, 1 reply; 25+ messages in thread
From: Nick Piggin @ 2004-08-20  2:37 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hugh Dickins, Pete Zaitcev, arjanv, alan, greg, linux-kernel,
	riel, sct

Oliver Neukum wrote:
> Am Donnerstag, 19. August 2004 14:41 schrieb Hugh Dickins:
> 
>>Fine for it to dip into those reserves when acting on behalf of something
>>already PF_MEMALLOC (i.e. try_to_free_pages itself), but not fine for it
>>to do so as a matter of course: in the worst case, doing readahead could
>>easily exhaust the reserves.  Or is this thread only used for writing?
>>That wouldn't be so bad if so.
> 
> 
> All IO going to the actual disk uses the thread. However, we usually
> don't want to fail IO requests due to low memory.
> 

I'm with Hugh on this one. You only want to be PF_MEMALLOC when
you are in the process of cleaning some memory so it can be freed.
(Perhaps it would be more logical if it were called PF_MEMFREE and
set in mm/vmscan.c; however, the end result is the same.)

So if this thing allocates memory on behalf of a read request, then
it is basically a bug. In practice you could probably get away with
servicing all writes with PF_MEMALLOC, however that could still lead
to situations where it consumes all your low memory on behalf of
highmem IO (though perhaps this won't deadlock if that memory is
going to be released as a matter of course?)

Another thing, having it always use PF_MEMALLOC means it can easily
wipe out the GFP_ATOMIC reserve.

So I'd say try to find a way to only use PF_MEMALLOC on behalf of
a PF_MEMALLOC thread or use a mempool or something.


* Re: PF_MEMALLOC in 2.6
  2004-08-20  2:37     ` Nick Piggin
@ 2004-08-20  7:56       ` Oliver Neukum
  2004-08-20  8:06         ` Nick Piggin
  0 siblings, 1 reply; 25+ messages in thread
From: Oliver Neukum @ 2004-08-20  7:56 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hugh Dickins, Pete Zaitcev, arjanv, alan, greg, linux-kernel,
	riel, sct

Am Freitag, 20. August 2004 04:37 schrieb Nick Piggin:
> So if this thing allocates memory on behalf of a read request, then
> it is basically a bug. In practice you could probably get away with
> servicing all writes with PF_MEMALLOC, however that could still lead
> to situations where it consumes all your low memory on behalf of
> highmem IO (though perhaps this won't deadlock if that memory is
> going to be released as a matter of course?)
> 
> Another thing, having it always use PF_MEMALLOC means it can easily
> wipe out the GFP_ATOMIC reserve.
> 
> So I'd say try to find a way to only use PF_MEMALLOC on behalf of
> a PF_MEMALLOC thread or use a mempool or something.

Then the SCSI layer should pass down the flag.

	Regards
		Oliver


* Re: PF_MEMALLOC in 2.6
  2004-08-20  7:56       ` Oliver Neukum
@ 2004-08-20  8:06         ` Nick Piggin
  2004-08-20  8:40           ` Pete Zaitcev
  2004-08-20  8:52           ` Oliver Neukum
  0 siblings, 2 replies; 25+ messages in thread
From: Nick Piggin @ 2004-08-20  8:06 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hugh Dickins, Pete Zaitcev, arjanv, alan, greg, linux-kernel,
	riel, sct

Oliver Neukum wrote:
> Am Freitag, 20. August 2004 04:37 schrieb Nick Piggin:
> 
>>So if this thing allocates memory on behalf of a read request, then
>>it is basically a bug. In practice you could probably get away with
>>servicing all writes with PF_MEMALLOC, however that could still lead
>>to situations where it consumes all your low memory on behalf of
>>highmem IO (though perhaps this won't deadlock if that memory is
>>going to be released as a matter of course?)
>>
>>Another thing, having it always use PF_MEMALLOC means it can easily
>>wipe out the GFP_ATOMIC reserve.
>>
>>So I'd say try to find a way to only use PF_MEMALLOC on behalf of
>>a PF_MEMALLOC thread or use a mempool or something.
> 
> 
> Then the SCSI layer should pass down the flag.
> 

It would be ideal from the memory allocator's point of view to do it
on a per-request basis like that.

When the rubber hits the road, I think it is probably going to be very
troublesome to do it right that way. For example, what happens when
your usb-thingy-thread blocks on a memory allocation while handling a
read request, then the system gets low on memory and someone tries to
free some by submitting a write request to the USB device?

I don't know anything about how the usb thread works so I'm not sure.

The mempool model seems to work well for requests in the block layer;
making a completely uneducated guess, I'd say it could be a good
option to investigate.


* Re: PF_MEMALLOC in 2.6
  2004-08-20  8:06         ` Nick Piggin
@ 2004-08-20  8:40           ` Pete Zaitcev
  2004-08-20 14:50             ` Oliver Neukum
  2004-08-20  8:52           ` Oliver Neukum
  1 sibling, 1 reply; 25+ messages in thread
From: Pete Zaitcev @ 2004-08-20  8:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Oliver Neukum, Hugh Dickins, arjanv, alan, greg, linux-kernel,
	riel, sct, zaitcev

On Fri, 20 Aug 2004 18:06:41 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> >>So I'd say try to find a way to only use PF_MEMALLOC on behalf of
> >>a PF_MEMALLOC thread or use a mempool or something.
> > 
> > Then the SCSI layer should pass down the flag.
> 
> It would be ideal from the memory allocator's point of view to do it
> on a per-request basis like that.
> 
> When the rubber hits the road, I think it is probably going to be very
> troublesome to do it right that way. For example, what happens when
> your usb-thingy-thread blocks on a memory allocation while handling a
> read request, then the system gets low on memory and someone tries to
> free some by submitting a write request to the USB device?

If you let me gloat for a little bit, ub makes this discussion moot
because it has no helper thread. But getting back to usb-storage,
here's the actual bug:
 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=130326

As you can see, it's muddled considerably. The original report
has no useful information, but if you look into an obscure but hefty
attachment, you see this:

Aug 18 20:23:47 trilobite kernel: kjournald starting.  Commit interval 5 seconds
Aug 18 20:23:47 trilobite kernel: EXT3 FS on sdc5, internal journal
Aug 18 20:23:47 trilobite kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 18 20:30:54 trilobite kernel: SCSI error : <2 0 0 0> return code = 0x70000
Aug 18 20:30:54 trilobite kernel: end_request: I/O error, dev sdc, sector 37625666
Aug 18 20:30:54 trilobite kernel: Buffer I/O error on device sdc5, logical block 1532371
Aug 18 20:30:54 trilobite kernel: lost page write due to I/O error on sdc5
Aug 18 20:31:04 trilobite kernel: SCSI error : <2 0 0 0> return code = 0x70000
Aug 18 20:31:04 trilobite kernel: end_request: I/O error, dev sdc, sector 37625674
Aug 18 20:31:04 trilobite kernel: Buffer I/O error on device sdc5, logical block 1532372
Aug 18 20:31:04 trilobite kernel: lost page write due to I/O error on sdc5
....
Aug 18 20:32:23 trilobite kernel: scsi2 (0:0): rejecting I/O to dead device
Aug 18 20:32:23 trilobite last message repeated 1723 times
 <----- which is basically what happens when the EH thread loses patience

This is what made me suspect that it's the dirty memory writeout problem.
It's just like how it was on 2.4 before Alan added PF_MEMALLOC.

-- Pete


* Re: PF_MEMALLOC in 2.6
  2004-08-20  8:06         ` Nick Piggin
  2004-08-20  8:40           ` Pete Zaitcev
@ 2004-08-20  8:52           ` Oliver Neukum
  2004-08-20  9:06             ` Nick Piggin
  2004-08-26 21:16             ` Zephaniah E. Hull
  1 sibling, 2 replies; 25+ messages in thread
From: Oliver Neukum @ 2004-08-20  8:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hugh Dickins, Pete Zaitcev, arjanv, alan, greg, linux-kernel,
	riel, sct

Am Freitag, 20. August 2004 10:06 schrieb Nick Piggin:
> >>So I'd say try to find a way to only use PF_MEMALLOC on behalf of
> >>a PF_MEMALLOC thread or use a mempool or something.
> > 
> > 
> > Then the SCSI layer should pass down the flag.
> > 
> 
> It would be ideal from the memory allocator's point of view to do it
> on a per-request basis like that.
> 
> When the rubber hits the road, I think it is probably going to be very
> troublesome to do it right that way. For example, what happens when
> your usb-thingy-thread blocks on a memory allocation while handling a
> read request, then the system gets low on memory and someone tries to
> free some by submitting a write request to the USB device?

The write request will have to wait. Storage cannot do concurrent IO.
But all memory allocated in the read request will be GFP_NOIO or
GFP_ATOMIC, so completing the memory allocation should not have to
wait for IO. Either it fails and we report that to the SCSI layer, or it
completes and the write is serviced in turn.
At least that's the intent.
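
The intended behaviour can be modelled as a single-threaded loop: commands are serviced strictly one at a time, and an allocation that may not start I/O either succeeds at once or fails, in which case that command completes with an error instead of blocking the queue. `struct command`, `try_alloc_noio()` and the per-command memory_low array are hypothetical stand-ins, not usb-storage code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cmd_status { CMD_PENDING, CMD_OK, CMD_ENOMEM };

struct command {
	bool is_write;		/* informational: read or write */
	enum cmd_status status;
};

/* Stand-in for a GFP_NOIO/GFP_ATOMIC allocation: it never waits on I/O. */
static bool try_alloc_noio(bool memory_low)
{
	return !memory_low;
}

/* Service queued commands one at a time; only one is ever in flight. */
static void service_queue(struct command *q, size_t n, const bool *memory_low)
{
	for (size_t i = 0; i < n; i++) {
		if (!try_alloc_noio(memory_low[i])) {
			q[i].status = CMD_ENOMEM;	/* reported to the SCSI layer */
			continue;			/* the next command still runs */
		}
		q[i].status = CMD_OK;
	}
}
```

The model only holds if the allocation really does fail instead of looping, which is the detail Nick raises in his reply.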

	Regards
		Oliver




* Re: PF_MEMALLOC in 2.6
  2004-08-20  8:52           ` Oliver Neukum
@ 2004-08-20  9:06             ` Nick Piggin
  2004-08-26 21:16             ` Zephaniah E. Hull
  1 sibling, 0 replies; 25+ messages in thread
From: Nick Piggin @ 2004-08-20  9:06 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hugh Dickins, Pete Zaitcev, arjanv, alan, greg, linux-kernel,
	riel, sct

Oliver Neukum wrote:
> Am Freitag, 20. August 2004 10:06 schrieb Nick Piggin:
> 
>>>>So I'd say try to find a way to only use PF_MEMALLOC on behalf of
>>>>a PF_MEMALLOC thread or use a mempool or something.
>>>
>>>
>>>Then the SCSI layer should pass down the flag.
>>>
>>
>>It would be ideal from the memory allocator's point of view to do it
>>on a per-request basis like that.
>>
>>When the rubber hits the road, I think it is probably going to be very
>>troublesome to do it right that way. For example, what happens when
>>your usb-thingy-thread blocks on a memory allocation while handling a
>>read request, then the system gets low on memory and someone tries to
>>free some by submitting a write request to the USB device?
> 
> 
> The write request will have to wait. Storage cannot do concurrent IO.
> But all memory allocated in the read request will be GFP_NOIO or
> GFP_ATOMIC, so completing the memory allocation should not have to
> wait for IO. Either it fails and we report that to the SCSI layer, or it
> completes and the write is serviced in turn.
> At least that's the intent.
> 

In that case, having the SCSI layer pass down the flag may be a viable
option.

Just FYI, non-atomic allocations need __GFP_NORETRY, otherwise they
won't fail (unless order >= 3). I suspect this detail is fairly important.
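
A minimal sketch of the retry decision being described, assuming the 2.6-era slow path roughly works like this: __GFP_NORETRY always gives up, __GFP_NOFAIL always retries, and otherwise small orders (or __GFP_REPEAT) keep looping. The flag values are made up, and the order cutoff of 3 is an assumption rather than a quote of mm/page_alloc.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's real __GFP_* constants. */
#define M_GFP_REPEAT	0x1u
#define M_GFP_NOFAIL	0x2u
#define M_GFP_NORETRY	0x4u

#define M_COSTLY_ORDER	3	/* assumed cutoff for "small" allocations */

/* Should a failed, sleeping allocation loop back and try reclaim again? */
static bool should_retry(unsigned int gfp, unsigned int order)
{
	if (gfp & M_GFP_NORETRY)
		return false;
	if (gfp & M_GFP_NOFAIL)
		return true;
	return order <= M_COSTLY_ORDER || (gfp & M_GFP_REPEAT);
}
```

Under this model a single-page GFP_NOIO allocation without __GFP_NORETRY keeps looping instead of returning NULL to the driver.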


* Re: PF_MEMALLOC in 2.6
  2004-08-19 12:41 ` Hugh Dickins
  2004-08-19 18:25   ` Oliver Neukum
@ 2004-08-20 10:31   ` Stephen C. Tweedie
  2004-08-20 15:34     ` Oliver Neukum
  1 sibling, 1 reply; 25+ messages in thread
From: Stephen C. Tweedie @ 2004-08-20 10:31 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Pete Zaitcev, Arjan van de Ven, Alan Cox, Greg KH, linux-kernel,
	Rik van Riel, Stephen Tweedie

Hi,

On Thu, 2004-08-19 at 13:41, Hugh Dickins wrote:

> Or would it solve the problem at hand, if it made itself PF_MEMALLOC
> just while servicing a request from a PF_MEMALLOC?

It's not the PF_* state of the caller who submitted the IO that matters,
though --- it's the state of all threads _waiting_ on the IO, which may
be different, and which can change even after the IO has begun.  

Eg. kswapd does a writepage, the writepage needs to allocate disk space,
and in doing so tries to access a metadata block which is already
undergoing IO from a different thread altogether.
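
The point can be illustrated with a small hypothetical model in which whether reclaim depends on an in-flight request is derived from whoever is currently waiting on it, not from its submitter. `struct inflight_io` and its helpers are invented for illustration; the kernel does not track waiters this way.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PF_MEMALLOC 0x1u	/* illustrative value only */
#define MAX_WAITERS 8

struct inflight_io {
	unsigned int submitter_flags;		/* fixed at submission */
	unsigned int waiter_flags[MAX_WAITERS];	/* grows while I/O is in flight */
	int nr_waiters;
};

static void add_waiter(struct inflight_io *io, unsigned int task_flags)
{
	assert(io->nr_waiters < MAX_WAITERS);
	io->waiter_flags[io->nr_waiters++] = task_flags;
}

/* True if any current waiter is reclaiming memory (PF_MEMALLOC). */
static bool reclaim_is_waiting(const struct inflight_io *io)
{
	for (int i = 0; i < io->nr_waiters; i++)
		if (io->waiter_flags[i] & MODEL_PF_MEMALLOC)
			return true;
	return false;
}
```

In the kswapd example the metadata read was submitted by an ordinary thread, so a decision keyed on submitter_flags would already have been made before kswapd started waiting.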

--Stephen


* Re: PF_MEMALLOC in 2.6
  2004-08-20  8:40           ` Pete Zaitcev
@ 2004-08-20 14:50             ` Oliver Neukum
  2004-08-20 15:02               ` Alan Cox
  0 siblings, 1 reply; 25+ messages in thread
From: Oliver Neukum @ 2004-08-20 14:50 UTC (permalink / raw)
  To: Pete Zaitcev
  Cc: Nick Piggin, Hugh Dickins, arjanv, alan, greg, linux-kernel, riel,
	sct


> If you let me gloat for a little bit, ub makes this discussion moot
> because it has no helper thread. But getting back to usb-storage,

But ub supports only a subset of storage devices, doesn't it?

[..] 
> This is what made me suspect that it's the dirty memory writeout problem.
> It's just like how it was on 2.4 before Alan added PF_MEMALLOC.

If we add PF_MEMALLOC, do we solve the issue or make it only less
likely? Isn't there a need to limit users of the reserves in number?

	Regards
		Oliver


* Re: PF_MEMALLOC in 2.6
  2004-08-20 14:50             ` Oliver Neukum
@ 2004-08-20 15:02               ` Alan Cox
  2004-08-20 16:04                 ` Rik van Riel
  2004-08-21  2:03                 ` Nick Piggin
  0 siblings, 2 replies; 25+ messages in thread
From: Alan Cox @ 2004-08-20 15:02 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Pete Zaitcev, Nick Piggin, Hugh Dickins, arjanv, alan, greg,
	linux-kernel, riel, sct

On Fri, Aug 20, 2004 at 04:50:07PM +0200, Oliver Neukum wrote:
> > This is what made me suspect that it's the dirty memory writeout problem.
> > It's just like how it was on 2.4 before Alan added PF_MEMALLOC.
> 
> If we add PF_MEMALLOC, do we solve the issue or make it only less
> likely? Isn't there a need to limit users of the reserves in number?

PF_MEMALLOC won't recurse. You might run out of memory however. The old
world scsi drivers run in the thread of the I/O so are protected already
by PF_MEMALLOC in those cases; it's the thread nature of the USB driver which
makes it more fun. Unless 2.6 vm is radically different I think PF_MEMALLOC
is the right thing to set although it would always eventually be better to
find out who is guilty of the blocking allocation that recurses.
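
Why PF_MEMALLOC "won't recurse": the allocator only enters reclaim when the caller is not already PF_MEMALLOC, and reclaim sets the flag on itself, so any allocation made while reclaiming goes to the reserves instead of re-entering reclaim. A toy model with a depth counter follows; every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PF_MEMALLOC 0x1u	/* illustrative value only */

static unsigned int current_flags;	/* stands in for current->flags */
static int reclaim_depth, max_reclaim_depth;

static bool model_alloc_page(bool below_min);

/* Reclaim may itself need memory, e.g. to write dirty pages out. */
static void model_reclaim(void)
{
	unsigned int saved = current_flags & MODEL_PF_MEMALLOC;

	current_flags |= MODEL_PF_MEMALLOC;
	if (++reclaim_depth > max_reclaim_depth)
		max_reclaim_depth = reclaim_depth;
	model_alloc_page(true);		/* nested allocation while reclaiming */
	reclaim_depth--;
	current_flags = (current_flags & ~MODEL_PF_MEMALLOC) | saved;
}

static bool model_alloc_page(bool below_min)
{
	if (!below_min || (current_flags & MODEL_PF_MEMALLOC))
		return true;	/* fast path, or allowed into the reserves */
	model_reclaim();	/* only non-PF_MEMALLOC callers get here */
	return true;		/* assume reclaim freed a page */
}
```

The nested allocation is served from the reserves unconditionally, which is the "you might run out of memory" half of the trade-off.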

Are any of the VM guys considering PF_LOGALLOC so you can trace it down 8)


* Re: PF_MEMALLOC in 2.6
  2004-08-20 10:31   ` Stephen C. Tweedie
@ 2004-08-20 15:34     ` Oliver Neukum
  0 siblings, 0 replies; 25+ messages in thread
From: Oliver Neukum @ 2004-08-20 15:34 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Hugh Dickins, Pete Zaitcev, Arjan van de Ven, Alan Cox, Greg KH,
	linux-kernel, Rik van Riel

Am Freitag, 20. August 2004 12:31 schrieb Stephen C. Tweedie:
> Hi,
> 
> On Thu, 2004-08-19 at 13:41, Hugh Dickins wrote:
> 
> > Or would it solve the problem at hand, if it made itself PF_MEMALLOC
> > just while servicing a request from a PF_MEMALLOC?
> 
> It's not the PF_* state of the caller who submitted the IO that matters,
> though --- it's the state of all threads _waiting_ on the IO, which may
> be different, and which can change even after the IO has begun.  
> 
> Eg. kswapd does a writepage, the writepage needs to allocate disk space,
> and in doing so tries to access a metadata block which is already
> undergoing IO from a different thread altogether.

Then how do the current SCSI drivers work?

	Regards
			Oliver


* Re: PF_MEMALLOC in 2.6
  2004-08-20 15:02               ` Alan Cox
@ 2004-08-20 16:04                 ` Rik van Riel
  2004-08-20 16:06                   ` Arjan van de Ven
  2004-08-21  2:03                 ` Nick Piggin
  1 sibling, 1 reply; 25+ messages in thread
From: Rik van Riel @ 2004-08-20 16:04 UTC (permalink / raw)
  To: Alan Cox
  Cc: Oliver Neukum, Pete Zaitcev, Nick Piggin, Hugh Dickins, arjanv,
	greg, linux-kernel, sct

On Fri, 20 Aug 2004, Alan Cox wrote:

> PF_MEMALLOC won't recurse. You might run out of memory however.

> Are any of the VM guys considering PF_LOGALLOC so you can trace it down 8)

No, but this thread does make me consider PF_NOIO ;)

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan



* Re: PF_MEMALLOC in 2.6
  2004-08-20 16:04                 ` Rik van Riel
@ 2004-08-20 16:06                   ` Arjan van de Ven
  2004-08-20 16:10                     ` Alan Cox
  0 siblings, 1 reply; 25+ messages in thread
From: Arjan van de Ven @ 2004-08-20 16:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Alan Cox, Oliver Neukum, Pete Zaitcev, Nick Piggin, Hugh Dickins,
	greg, linux-kernel, sct


On Fri, Aug 20, 2004 at 12:04:51PM -0400, Rik van Riel wrote:
> On Fri, 20 Aug 2004, Alan Cox wrote:
> 
> > PF_MEMALLOC won't recurse. You might run out of memory however.
> 
> > Are any of the VM guys considering PF_LOGALLOC so you can trace it down 8)
> 
> No, but this thread does make me consider PF_NOIO ;)

given that the task of this thread is to DO io ... ;)




* Re: PF_MEMALLOC in 2.6
  2004-08-20 16:06                   ` Arjan van de Ven
@ 2004-08-20 16:10                     ` Alan Cox
  2004-08-20 16:14                       ` Rik van Riel
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Cox @ 2004-08-20 16:10 UTC (permalink / raw)
  To: Arjan van de Ven, y
  Cc: Rik van Riel, Alan Cox, Oliver Neukum, Pete Zaitcev, Nick Piggin,
	Hugh Dickins, greg, linux-kernel, sct

On Fri, Aug 20, 2004 at 06:06:05PM +0200, Arjan van de Ven wrote:
> > > Are any of the VM guys considering PF_LOGALLOC so you can trace it down 8)
> > No, but this thread does make me consider PF_NOIO ;)
> given that the task of this thread is to DO io ... ;)

But not to cause I/O. What are the semantics of PF_NOIO?



* Re: PF_MEMALLOC in 2.6
  2004-08-20 16:10                     ` Alan Cox
@ 2004-08-20 16:14                       ` Rik van Riel
  0 siblings, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2004-08-20 16:14 UTC (permalink / raw)
  To: Alan Cox
  Cc: Arjan van de Ven, y, Oliver Neukum, Pete Zaitcev, Nick Piggin,
	Hugh Dickins, greg, linux-kernel, sct

On Fri, 20 Aug 2004, Alan Cox wrote:
> On Fri, Aug 20, 2004 at 06:06:05PM +0200, Arjan van de Ven wrote:

> > > No, but this thread does make me consider PF_NOIO ;)
> > given that the task of this thread is to DO io ... ;)
> 
> But not to cause I/O. What are the semantics of PF_NOIO?

Any gfp_mask of this process gets GFP_NOIO set and other
appropriate bits cleared, so it will never cause IO but
only reclaim clean pages.

It should also not loop (too often) in alloc_pages and
try_to_free_pages...
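
Those semantics amount to filtering every allocation mask through the task flags. In this sketch PF_NOIO is the hypothetical flag under discussion, and the bit values are only modelled on 2.6-era __GFP_WAIT/__GFP_IO/__GFP_FS; treat all of them as illustrative.

```c
#include <assert.h>

typedef unsigned int gfp_model_t;

/* Illustrative bits modelled on __GFP_WAIT, __GFP_IO and __GFP_FS. */
#define M_GFP_WAIT	0x10u
#define M_GFP_IO	0x40u
#define M_GFP_FS	0x80u
#define M_GFP_NOIO	(M_GFP_WAIT)
#define M_GFP_KERNEL	(M_GFP_WAIT | M_GFP_IO | M_GFP_FS)

#define MODEL_PF_NOIO	0x1u	/* hypothetical task flag */

/* Strip the bits that would let reclaim start I/O or call into a filesystem. */
static gfp_model_t effective_gfp(unsigned int task_flags, gfp_model_t gfp)
{
	if (task_flags & MODEL_PF_NOIO)
		gfp &= ~(M_GFP_IO | M_GFP_FS);
	return gfp;
}
```

With the mask forced down to the GFP_NOIO equivalent, reclaim on the thread's behalf can only drop clean pages; the "should not loop" part would need something extra, such as __GFP_NORETRY.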

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan



* Re: PF_MEMALLOC in 2.6
  2004-08-20 15:02               ` Alan Cox
  2004-08-20 16:04                 ` Rik van Riel
@ 2004-08-21  2:03                 ` Nick Piggin
  1 sibling, 0 replies; 25+ messages in thread
From: Nick Piggin @ 2004-08-21  2:03 UTC (permalink / raw)
  To: Alan Cox
  Cc: Oliver Neukum, Pete Zaitcev, Hugh Dickins, arjanv, greg,
	linux-kernel, riel, sct

Alan Cox wrote:
> On Fri, Aug 20, 2004 at 04:50:07PM +0200, Oliver Neukum wrote:
> 
>>>This is what made me suspect that it's the dirty memory writeout problem.
>>>It's just like how it was on 2.4 before Alan added PF_MEMALLOC.
>>
>>If we add PF_MEMALLOC, do we solve the issue or make it only less
>>likely? Isn't there a need to limit users of the reserves in number?
> 
> 
> PF_MEMALLOC won't recurse. You might run out of memory however. The old
> world scsi drivers run in the thread of the I/O so are protected already
> by PF_MEMALLOC in those cases; it's the thread nature of the USB driver which
> makes it more fun. Unless 2.6 vm is radically different I think PF_MEMALLOC
> is the right thing to set although it would always eventually be better to
> find out who is guilty of the blocking allocation that recurses.
> 
> Are any of the VM guys considering PF_LOGALLOC so you can trace it down 8)
> 
> 

The problem isn't necessarily a recursing allocation, although that
wouldn't help. The main thing is an inversion in the PF_MEMALLOC
reserve logic.

Memory drops below pages_min; thread A, in the allocator, sets
PF_MEMALLOC and tries to clean some pages. The USB thread then can't
allocate the memory it needs to service these writeout requests,
because it is not PF_MEMALLOC.

If you make the USB thread PF_MEMALLOC, you solve this problem at the
cost of making the PF_MEMALLOC reserve more fragile. If you're pretty
sure that it only allocates a small, bounded amount of memory then that
may be a good enough fix for now.
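
The inversion can be traced in a few lines. In this hypothetical model, thread A has already pushed the zone below pages_min, entered reclaim with PF_MEMALLOC set, and handed a dirty page to the USB worker; whether the writeout can proceed depends only on the worker's own flag. All names and values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PF_MEMALLOC 0x1u	/* illustrative value only */

/* Can a task with these flags allocate one page given the zone state? */
static bool may_allocate(long free_pages, long pages_min, unsigned int flags)
{
	return (flags & MODEL_PF_MEMALLOC) || free_pages > pages_min;
}

/*
 * Thread A (PF_MEMALLOC, reclaiming) queued a write; the USB worker must
 * allocate a buffer before the page can be cleaned and freed.
 */
static bool writeout_progresses(long free_pages, long pages_min,
				unsigned int worker_flags)
{
	assert(free_pages <= pages_min);	/* we are already below the min */
	return may_allocate(free_pages, pages_min, worker_flags);
}
```

Setting the flag unconditionally turns on the second case for every request the worker handles, reads included, which is the extra strain on the reserve described above.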


* Re: PF_MEMALLOC in 2.6
  2004-08-20  8:52           ` Oliver Neukum
  2004-08-20  9:06             ` Nick Piggin
@ 2004-08-26 21:16             ` Zephaniah E. Hull
  2004-08-26 22:04               ` Oliver Neukum
  2004-08-26 23:41               ` Mikulas Patocka
  1 sibling, 2 replies; 25+ messages in thread
From: Zephaniah E. Hull @ 2004-08-26 21:16 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Nick Piggin, Hugh Dickins, Pete Zaitcev, arjanv, alan, greg,
	linux-kernel, riel, sct


On Fri, Aug 20, 2004 at 10:52:51AM +0200, Oliver Neukum wrote:
> Am Freitag, 20. August 2004 10:06 schrieb Nick Piggin:
> > >>So I'd say try to find a way to only use PF_MEMALLOC on behalf of
> > >>a PF_MEMALLOC thread or use a mempool or something.
> > > 
> > > 
> > > Then the SCSI layer should pass down the flag.
> > > 
> > 
> > It would be ideal from the memory allocator's point of view to do it
> > on a per-request basis like that.
> > 
> > When the rubber hits the road, I think it is probably going to be very
> > troublesome to do it right that way. For example, what happens when
> > your usb-thingy-thread blocks on a memory allocation while handling a
> > read request, then the system gets low on memory and someone tries to
> > free some by submitting a write request to the USB device?
> 
> The write request will have to wait.

> Storage cannot do concurrent IO.

I'm going to jump in here and ask a simple question, what is the
blocking point that stops writes happening concurrent with reads?

-- 
	1024D/E65A7801 Zephaniah E. Hull <warp@babylon.d2dc.net>
	   92ED 94E4 B1E6 3624 226D  5727 4453 008B E65A 7801
	    CCs of replies from mailing lists are requested.

It was then I realized how dire my medical situation was.  Here I was,
a network admin, unable to leave, and here was someone with a broken
network.  And they didn't ask me to fix it.  They didn't even try to
casually pry a hint out of me.
  -- Ryan Tucker in the SDM.



* Re: PF_MEMALLOC in 2.6
  2004-08-26 21:16             ` Zephaniah E. Hull
@ 2004-08-26 22:04               ` Oliver Neukum
       [not found]                 ` <20040827032554.GB30820@babylon.d2dc.net>
  2004-08-26 23:41               ` Mikulas Patocka
  1 sibling, 1 reply; 25+ messages in thread
From: Oliver Neukum @ 2004-08-26 22:04 UTC (permalink / raw)
  To: Zephaniah E. Hull
  Cc: Nick Piggin, Hugh Dickins, Pete Zaitcev, arjanv, alan, greg,
	linux-kernel, riel, sct


> > Storage cannot do concurrent IO.
> 
> I'm going to jump in here and ask a simple question, what is the
> blocking point that stops writes happening concurrent with reads?

The protocol on USB allows one command at a time only.

	Regards
		Oliver


* Re: PF_MEMALLOC in 2.6
  2004-08-26 21:16             ` Zephaniah E. Hull
  2004-08-26 22:04               ` Oliver Neukum
@ 2004-08-26 23:41               ` Mikulas Patocka
  1 sibling, 0 replies; 25+ messages in thread
From: Mikulas Patocka @ 2004-08-26 23:41 UTC (permalink / raw)
  To: Zephaniah E. Hull
  Cc: Oliver Neukum, Nick Piggin, Hugh Dickins, Pete Zaitcev, arjanv,
	alan, greg, linux-kernel, riel, sct



On Thu, 26 Aug 2004, Zephaniah E. Hull wrote:

> On Fri, Aug 20, 2004 at 10:52:51AM +0200, Oliver Neukum wrote:
> > Am Freitag, 20. August 2004 10:06 schrieb Nick Piggin:
> > > >>So I'd say try to find a way to only use PF_MEMALLOC on behalf of
> > > >>a PF_MEMALLOC thread or use a mempool or something.
> > > >
> > > >
> > > > Then the SCSI layer should pass down the flag.
> > > >
> > >
> > > It would be ideal from the memory allocator's point of view to do it
> > > on a per-request basis like that.
> > >
> > > When the rubber hits the road, I think it is probably going to be very
> > > troublesome to do it right that way. For example, what happens when
> > > your usb-thingy-thread blocks on a memory allocation while handling a
> > > read request, then the system gets low on memory and someone tries to
> > > free some by submitting a write request to the USB device?
> >
> > The write request will have to wait.
>
> > Storage cannot do concurrent IO.
>
> I'm going to jump in here and ask a simple question, what is the
> blocking point that stops writes happening concurrent with reads?

If the writing process can't allocate a request because there isn't enough
memory, it waits synchronously for some other request to complete. This
is true for all block devices.
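A minimal userspace sketch of that blocking point, assuming a block queue behaves like a fixed pool of request slots: a submitter that finds the pool empty sleeps until a completion returns a slot. The names (`struct req_pool`, `pool_get`, `pool_put`, `writer`, `POOL_SIZE`) are hypothetical and this models the idea with pthreads; it is not the kernel's block-layer or mempool API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical pool of request slots standing in for a queue's request
 * freelist. Illustrative userspace model only, not kernel code. */
#define POOL_SIZE 4

struct req_pool {
	pthread_mutex_t lock;
	pthread_cond_t  freed;      /* signalled whenever a request completes */
	int             free_slots; /* request slots currently available */
};

void pool_init(struct req_pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->freed, NULL);
	p->free_slots = POOL_SIZE;
}

/* Allocate a request slot. If none is free, sleep synchronously until
 * some other request completes and gives its slot back. */
void pool_get(struct req_pool *p)
{
	pthread_mutex_lock(&p->lock);
	while (p->free_slots == 0)
		pthread_cond_wait(&p->freed, &p->lock);
	p->free_slots--;
	pthread_mutex_unlock(&p->lock);
}

/* Completion path: return the slot and wake one waiting submitter. */
void pool_put(struct req_pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->free_slots++;
	pthread_cond_signal(&p->freed);
	pthread_mutex_unlock(&p->lock);
}

/* Read the number of free slots, used only to observe the pool's state. */
int pool_free(struct req_pool *p)
{
	int n;

	pthread_mutex_lock(&p->lock);
	n = p->free_slots;
	pthread_mutex_unlock(&p->lock);
	return n;
}

/* A writer submitting one request; blocks while the pool is empty. */
void *writer(void *arg)
{
	pool_get(arg);
	return NULL;
}
```

With all slots in flight, a new `writer` thread sleeps in `pool_get` until `pool_put` runs on some completion. In this model the deadlock under discussion is the case where the only thread that would ever call `pool_put` is itself stuck waiting for memory.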

Mikulas


* Re: PF_MEMALLOC in 2.6
       [not found]                 ` <20040827032554.GB30820@babylon.d2dc.net>
@ 2004-08-27  9:15                   ` Oliver Neukum
  0 siblings, 0 replies; 25+ messages in thread
From: Oliver Neukum @ 2004-08-27  9:15 UTC (permalink / raw)
  To: Zephaniah E. Hull
  Cc: Nick Piggin, Hugh Dickins, Pete Zaitcev, arjanv, alan, greg,
	linux-kernel, riel, sct

Am Freitag, 27. August 2004 05:25 schrieb Zephaniah E. Hull:
> On Fri, Aug 27, 2004 at 12:04:15AM +0200, Oliver Neukum wrote:
> > 
> > > > Storage cannot do concurrent IO.
> > > 
> > > I'm going to jump in here and ask a simple question, what is the
> > > blocking point that stops writes happening concurrent with reads?
> > 
> > The protocol on USB allows one command at a time only.
> 
> Are you sure about that?  Before some of the locking changes it was
> possible with usbfs to issue a bulk request that may block on the
> device, then issue a bulk write before it finished. (Sometimes the write
> tells the other end to send stuff with the read.)
> 
> Was this violating the spec or just an odd corner case?

Section 6.2.1 of the Bulk-Only Transport specification:
The device shall consider the CBW valid when:
- The CBW was received when the device had sent a CSW or after a reset

Sending two requests without reading the CSW in between is illegal,
so a storage device can only execute one command at a time.
As you have to evaluate the CSW anyway, just queuing the requests buys you
little, and in terms of memory allocation it is worse.
Besides, this applies only to the bulk-only protocol variant, and the storage
driver has to be universal.
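The rule above can be modelled as a two-state machine: the device accepts a CBW only when idle (after it has sent a CSW, or after a reset), and the CSW echoes the tag of the command it completes. This is a toy sketch with hypothetical names (`struct bot_dev`, `bot_receive_cbw`, `bot_send_csw`); it omits the spec's other validity checks and does not model what a real device does on receiving an invalid CBW.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a Bulk-Only Transport device's command cycle.
 * Names are illustrative, not taken from usb-storage. */
enum bot_state {
	BOT_IDLE,   /* CSW sent or device reset: ready for a new CBW */
	BOT_BUSY    /* CBW accepted, data/status phase still pending */
};

struct bot_dev {
	enum bot_state state;
	unsigned int   tag;   /* tag of the command in progress */
};

void bot_reset(struct bot_dev *d)
{
	d->state = BOT_IDLE;
	d->tag = 0;
}

/* Returns true if the device would consider this CBW valid, i.e. no
 * other command is outstanding. */
bool bot_receive_cbw(struct bot_dev *d, unsigned int tag)
{
	if (d->state != BOT_IDLE)
		return false;   /* second command before the CSW: invalid */
	d->state = BOT_BUSY;
	d->tag = tag;
	return true;
}

/* Device finishes the command: returns the tag echoed in the CSW and
 * becomes ready for the next CBW. */
unsigned int bot_send_csw(struct bot_dev *d)
{
	assert(d->state == BOT_BUSY);
	d->state = BOT_IDLE;
	return d->tag;
}
```

Because the host must collect each CSW before the next CBW is valid, a host-side driver ends up serializing commands however many requests are queued above it.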

	Regards
		Oliver


end of thread

Thread overview: 25+ messages
2004-08-19  6:55 PF_MEMALLOC in 2.6 Pete Zaitcev
2004-08-19  6:59 ` William Lee Irwin III
2004-08-19  8:46 ` Stephen C. Tweedie
2004-08-19  8:59 ` Oliver Neukum
2004-08-19 12:41 ` Hugh Dickins
2004-08-19 18:25   ` Oliver Neukum
2004-08-20  2:37     ` Nick Piggin
2004-08-20  7:56       ` Oliver Neukum
2004-08-20  8:06         ` Nick Piggin
2004-08-20  8:40           ` Pete Zaitcev
2004-08-20 14:50             ` Oliver Neukum
2004-08-20 15:02               ` Alan Cox
2004-08-20 16:04                 ` Rik van Riel
2004-08-20 16:06                   ` Arjan van de Ven
2004-08-20 16:10                     ` Alan Cox
2004-08-20 16:14                       ` Rik van Riel
2004-08-21  2:03                 ` Nick Piggin
2004-08-20  8:52           ` Oliver Neukum
2004-08-20  9:06             ` Nick Piggin
2004-08-26 21:16             ` Zephaniah E. Hull
2004-08-26 22:04               ` Oliver Neukum
     [not found]                 ` <20040827032554.GB30820@babylon.d2dc.net>
2004-08-27  9:15                   ` Oliver Neukum
2004-08-26 23:41               ` Mikulas Patocka
2004-08-20 10:31   ` Stephen C. Tweedie
2004-08-20 15:34     ` Oliver Neukum
