public inbox for linux-scsi@vger.kernel.org
* GFP_ATOMIC allocations...
@ 2002-08-29  0:25 Doug Ledford
  2002-08-29  0:47 ` Patrick Mansfield
  0 siblings, 1 reply; 13+ messages in thread
From: Doug Ledford @ 2002-08-29  0:25 UTC (permalink / raw)
  To: linux-scsi

I'm looking through some of the new code that has been put in the kernel 
scsi subsystem lately and I'm seeing a few of the same mistakes that the 
old code made.  So, I'm pointing out one of them here.  There are lots of 
places in the SCSI mid layer that are called outside of any spinlocks and 
not from interrupt context where there is absolutely no need to use 
GFP_ATOMIC allocations for memory.  In general, if you don't need to use 
GFP_ATOMIC, then don't use it.  You're more likely to fail the allocation 
and therefore fail the operation.  So, if you are working on a section of 
code, please think about this when doing allocations and do whatever is 
right for the code snippet you are currently hacking.  Fixing places in 
the existing code that use GFP_ATOMIC needlessly is a plus ;-)
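The rule of thumb above can be sketched in illustrative kernel-style C. This is not a patch and will not compile outside a kernel tree; `scan_alloc_buffer()` and `irq_alloc_buffer()` are invented names, and the only point is the gfp flag choice:

```c
/* Illustrative kernel-style C, not a patch; the function names are
 * invented.  The only point here is which gfp flag fits which context. */
#include <linux/slab.h>

/* Process context (e.g. a host scan): no locks held, not in an
 * interrupt, so we are allowed to sleep -> GFP_KERNEL. */
static void *scan_alloc_buffer(size_t len)
{
	return kmalloc(len, GFP_KERNEL);
}

/* Interrupt context or under a spinlock: we must not sleep, so
 * GFP_ATOMIC; the caller must cope with a NULL return. */
static void *irq_alloc_buffer(size_t len)
{
	return kmalloc(len, GFP_ATOMIC);
}
```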

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: GFP_ATOMIC allocations...
  2002-08-29  0:25 Doug Ledford
@ 2002-08-29  0:47 ` Patrick Mansfield
  2002-08-29  1:42   ` Doug Ledford
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Mansfield @ 2002-08-29  0:47 UTC (permalink / raw)
  To: linux-scsi

On Wed, Aug 28, 2002 at 08:25:34PM -0400, Doug Ledford wrote:
> I'm looking through some of the new code that has been put in the kernel 
> scsi subsystem lately and I'm seeing a few of the same mistakes that the 
> old code made.  So, I'm pointing out one of them here.  There are lots of 
> places in the SCSI mid layer that are called outside of any spinlocks and 
> not from interrupt context where there is absolutely no need to use 
> GFP_ATOMIC allocations for memory.  In general, if you don't need to use 
> GFP_ATOMIC, then don't use it.  You're more likely to fail the allocation 
> and therefore fail the operation.  So, if you are working on a section of 
> code, please think about this when doing allocations and do whatever is 
> right for the code snippet you are currently hacking.  Fixing places in 
> the existing code that use GFP_ATOMIC needlessly is a plus ;-)

So, you think all (or most) of the GFP_ATOMIC's in scsi_scan.c should
be GFP_KERNEL? All the kmalloc calls should be at boot time, or via
insmod.

I was wondering about them, but left them to match the previous
scsi_scan.c code.

Do GFP_KERNEL alloc failures during boot time just return NULL?
(Given that there is really no memory left.) I'd hope so, but was
never sure.

I suppose I could change them and boot with mem=something-low and
see what happens.

-- Patrick


* Re: GFP_ATOMIC allocations...
  2002-08-29  0:47 ` Patrick Mansfield
@ 2002-08-29  1:42   ` Doug Ledford
  2002-08-29 10:47     ` Alan Cox
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Doug Ledford @ 2002-08-29  1:42 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

On Wed, Aug 28, 2002 at 05:47:37PM -0700, Patrick Mansfield wrote:
> So, you think all (or most) of the GFP_ATOMIC's in scsi_scan.c should
> be GFP_KERNEL?

Yep.  Until such time as a LLDD actually calls scsi_scan from an interrupt 
handler to handle a fiber fabric that has just come back up, it is safe 
for this code to use GFP_KERNEL.  As for the fiber channel issue, I 
actually think that it needs to notify the eh thread of the change in loop 
state and let the eh thread do the rescan from the eh thread context 
instead of interrupt context, so this won't be an issue when we support 
fiber loop transitions either.

> All the kmalloc calls should be at boot time, or via
> insmod.

Right, process context with no locks held.

> I was wondering about them, but left them to match the previous
> scsi_scan.c code.

I figured.  That's why I pointed it out ;-)

> Do GFP_KERNEL alloc failures during boot time just return NULL?

GFP_ATOMIC returns NULL on failure, GFP_KERNEL *should* never fail until 
we get to a true OOM condition because it will block and wait for the vm 
subsystem to free us up some space.  Under true OOM conditions it will 
return NULL AFAIK.

> (Given that there is really no memory left.) I'd hope so, but was
> never sure.
> 
> I suppose I could change them and boot with mem=something-low and
> see what happens.

I've known people who had actual, real problems because of the GFP_ATOMIC 
allocations in this code area.  It was related to bringing up a fiber 
controller that had 100s of disks hanging off of it and therefore wanted 
to do a *LOT* of atomic allocations in a short period of time, but it was 
a real problem nonetheless.  Making the allocations non-atomic would 
allow the scanning to block waiting on more RAM instead of bailing on 
devices.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: GFP_ATOMIC allocations...
  2002-08-29  1:42   ` Doug Ledford
@ 2002-08-29 10:47     ` Alan Cox
  2002-08-29 15:58       ` Doug Ledford
  2002-08-29 17:10     ` Patrick Mansfield
  2002-08-30 16:22     ` Patrick Mansfield
  2 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2002-08-29 10:47 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Patrick Mansfield, linux-scsi

On Thu, 2002-08-29 at 02:42, Doug Ledford wrote:
> > Do GFP_KERNEL alloc failures during boot time just return NULL?
> 
> GFP_ATOMIC returns NULL on failure, GFP_KERNEL *should* never fail until 
> we get to a true OOM condition because it will block and wait for the vm 
> subsystem to free us up some space.  Under true OOM conditions it will 
> return NULL AFAIK.

Or maybe deadlock if it's happening due to I/O. If it wants to flush a
page out to that scsi device and the scsi device is doing the allocation,
life gets very unpleasant. SCSI may not be coming from block layer
access with PF_MEM* set either; think about /proc and scsi-generic.

"Here be dragons"



* Re: GFP_ATOMIC allocations...
  2002-08-29 10:47     ` Alan Cox
@ 2002-08-29 15:58       ` Doug Ledford
  0 siblings, 0 replies; 13+ messages in thread
From: Doug Ledford @ 2002-08-29 15:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Patrick Mansfield, linux-scsi

On Thu, Aug 29, 2002 at 11:47:55AM +0100, Alan Cox wrote:
> On Thu, 2002-08-29 at 02:42, Doug Ledford wrote:
> > > Do GFP_KERNEL alloc failures during boot time just return NULL?
> > 
> > GFP_ATOMIC returns NULL on failure, GFP_KERNEL *should* never fail until 
> > we get to a true OOM condition because it will block and wait for the vm 
> > subsystem to free us up some space.  Under true OOM conditions it will 
> > return NULL AFAIK.
> 
> Or maybe deadlock if it's happening due to I/O.

That would be a trick indeed, since Patrick and I were referring to the
scanning code, so the device isn't even configured for kernel use yet :-P

> If it wants to flush a
> page out to that scsi device and the scsi device is doing the allocation
> life gets very unpleasant.

Of course.  This is why the mid layer doesn't typically allocate anything 
at all once initial setup of a device is complete.  No allocations once 
setup is complete means no deadlock.

> SCSI may not be coming from block layer
> access with PF_MEM* set either, think about /proc and scsi-generic
> 
> "Here be dragons"

That's why I recommended people think each use through instead of blindly 
changing things ;-)

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: GFP_ATOMIC allocations...
@ 2002-08-29 15:59 Bryan Henderson
  2002-08-29 16:25 ` Doug Ledford
  2002-08-29 16:50 ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: Bryan Henderson @ 2002-08-29 15:59 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-scsi, patman


>GFP_KERNEL *should* never fail until
>we get to a true OOM condition because it will block and wait for the vm
>subsystem to free us up some space.  Under true OOM conditions it will
>return NULL AFAIK.

That (the NULL case) doesn't sound right to me.  The purpose of the OOM
Killer is to choose wisely which processes should die when there's simply
not enough memory to go around.  And it does it by terminating a process
directly, not by affecting the return value of kmalloc().  So if
kmalloc(GFP_KERNEL) returns NULL when the system is out of memory, the
system is arbitrarily killing whoever is the next guy to ask for memory,
thus defeating the purpose of the OOM Killer.

Incidentally, I've had to explain GFP_ATOMIC to many people, and I believe
the reason for its incorrect usage is that people know it's supposed to be
called by processes that can't wait, and they think that means "I can't
wait for this memory, so be sure you give it to me _now_."  Simple
explanations of GFP_ATOMIC give you the impression that it's just a higher
priority request than GFP_KERNEL.

One thing about GFP_ATOMIC that is often overlooked is that what it really
means is that the caller, instead of VM, is responsible for doing the
waiting and retrying and whatever to deal with memory allocation failure.
Failing some operation permanently because a kmalloc(GFP_ATOMIC) it did
failed is usually the wrong thing to do.
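That division of labour can be sketched in illustrative kernel-style C. The driver structure and helper names below are invented, and this will not compile outside a kernel tree; the point is only that on GFP_ATOMIC failure the caller, not the VM, arranges the retry:

```c
/* Illustrative kernel-style C; my_dev, my_use_buffer() and the
 * refill_work member are invented names.  On GFP_ATOMIC failure the
 * caller arranges the retry, here by deferring to process context. */
static void my_irq_handler_alloc(struct my_dev *dev)
{
	void *buf = kmalloc(BUF_SIZE, GFP_ATOMIC);

	if (!buf) {
		/* We can't wait here in interrupt context; remember the
		 * shortfall and let a process-context worker retry the
		 * allocation later with GFP_KERNEL. */
		dev->need_refill = 1;
		schedule_work(&dev->refill_work);
		return;
	}
	my_use_buffer(dev, buf);
}
```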

Again, many people have just the opposite impression.  One person told me
he used GFP_ATOMIC instead of GFP_KERNEL because he needed to be guaranteed
the allocation wouldn't fail.





* Re: GFP_ATOMIC allocations...
  2002-08-29 15:59 GFP_ATOMIC allocations Bryan Henderson
@ 2002-08-29 16:25 ` Doug Ledford
  2002-08-29 16:50 ` Alan Cox
  1 sibling, 0 replies; 13+ messages in thread
From: Doug Ledford @ 2002-08-29 16:25 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-scsi, patman

On Thu, Aug 29, 2002 at 08:59:50AM -0700, Bryan Henderson wrote:
> 
> Incidentally, I've had to explain GFP_ATOMIC to many people, and I believe
> the reason for its incorrect usage is that people know it's supposed to be
> called by processes that can't wait, and they think that means "I can't
> wait for this memory, so be sure you give it to me _now_."  Simple
> explanations of GFP_ATOMIC give you the impression that it's just a higher
> priority request than GFP_KERNEL.

OK, then clarification time ;-)  GFP_KERNEL means give me memory that is 
kernel addressable, and we are allowed to block while writing stuff out to 
disk if necessary.  GFP_ATOMIC doesn't really mean give me memory now, but 
it does mean make your decision now and either give me my memory or give 
me an error condition immediately because I can't be blocked.  Of note, 
GFP_KERNEL allocations are not allowed to succeed if the amount of free 
RAM falls below the free pages low watermark.  Instead they must block 
until we have freed up some RAM (enough to get the number of free pages 
back above free pages high).  Atomic allocations will steal memory out of 
the machine well below free pages low.  In fact, it's specifically for 
these atomic allocations that we keep that free pages high amount of free 
memory around, and this pool is used to satisfy atomic allocations long 
after normal GFP_KERNEL allocations have started to block.

> One thing about GFP_ATOMIC that is often overlooked is that what it really
> means is that the caller, instead of VM, is responsible for doing the
> waiting and retrying and whatever to deal with memory allocation failure.
> Failing some operation permanently because a kmalloc(GFP_ATOMIC) it did
> failed is usually the wrong thing to do.

This depends on context.  Failing an atomic allocation in your interrupt 
handler shouldn't result in waiting or retries.  And, since we can't allow 
ourselves to block in an interrupt handler, atomic allocations are the 
only correct method of allocating in an interrupt handler (the scsi 
devices don't typically do this, but lots of network drivers do).  

Now, additionally, if we are failing an allocation in an interrupt 
handler, then often the only thing we can do is bail on the operation.  
So, IMHO, when you see places that truly need atomic allocations, you 
should also see a place where, if the allocation fails, we bail 
immediately instead of blocking on RAM.

> Again, many people have just the opposite impression.  One person told me
> he used GFP_ATOMIC instead of GFP_KERNEL because he needed to be guaranteed
> the allocation wouldn't fail.

GFP_KERNEL == most likely to succeed, eventually
GFP_ATOMIC == most likely to succeed, right now.  

In fact, atomic allocations *will* succeed as long as there are enough
free pages to satisfy the request.  Regardless of the result, the call 
returns without any delay.  No sleeping is allowed in these allocations.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: GFP_ATOMIC allocations...
  2002-08-29 15:59 GFP_ATOMIC allocations Bryan Henderson
  2002-08-29 16:25 ` Doug Ledford
@ 2002-08-29 16:50 ` Alan Cox
  1 sibling, 0 replies; 13+ messages in thread
From: Alan Cox @ 2002-08-29 16:50 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: Doug Ledford, linux-scsi, patman

On Thu, 2002-08-29 at 16:59, Bryan Henderson wrote:
> That (the NULL case) doesn't sound right to me.  The purpose of the OOM
> Killer is to choose wisely which processes should die when there's simply
> not enough memory to go around. 

It can return NULL in such cases. It shouldn't in the current case ever
do this for an allocation exceeding 4K unless you are in an I/O path.
However it's a good idea to program for the can-always-happen case.



* Re: GFP_ATOMIC allocations...
  2002-08-29  1:42   ` Doug Ledford
  2002-08-29 10:47     ` Alan Cox
@ 2002-08-29 17:10     ` Patrick Mansfield
  2002-08-30 16:22     ` Patrick Mansfield
  2 siblings, 0 replies; 13+ messages in thread
From: Patrick Mansfield @ 2002-08-29 17:10 UTC (permalink / raw)
  To: linux-scsi

On Wed, Aug 28, 2002 at 09:42:15PM -0400, Doug Ledford wrote:

> I've known people that had actual, real problems because of the GFP_ATOMIC 
> allocations in this code area.  It was related to bringing up a fiber 
> controller that had 100s of disks hanging off of it and therefore wanted 
> to do a *LOT* of atomic allocations in a short period of time, but it was 
> a real problem none the less.  Making the allocations non-atomic would 
> allow the scanning to block waiting on more ram instead of bailing on 
> devices.
> 

There are some bad allocations in sd.c - there's now a (single) GFP_DMA
buffer for test unit ready, spinup, and the write protect check, where we
used scsi_malloc() in 2.4.

There are a bunch of vmalloc's; most of them should be kmalloc's.

Here's the sd.c code that does the vmalloc:

#define init_mem_lth(x,n)	x = vmalloc((n) * sizeof(*x))
#define zero_mem_lth(x,n)	memset(x, 0, (n) * sizeof(*x))

	init_mem_lth(sd_dsk_arr, sd_template.dev_max);
	if (sd_dsk_arr) {
		zero_mem_lth(sd_dsk_arr, sd_template.dev_max);
		for (k = 0; k < sd_template.dev_max; ++k) {
			sdkp = vmalloc(sizeof(Scsi_Disk));
			if (NULL == sdkp)
				goto cleanup_mem;
			memset(sdkp, 0, sizeof(Scsi_Disk));
			sd_dsk_arr[k] = sdkp;
		}
	}
	init_mem_lth(sd_disks, sd_template.dev_max);
	if (sd_disks)
		zero_mem_lth(sd_disks, sd_template.dev_max);
	init_mem_lth(sd, maxparts);

	if (!sd_dsk_arr || !sd || !sd_disks)
		goto cleanup_mem;

	zero_mem_lth(sd, maxparts);
	return 0;

#undef init_mem_lth
#undef zero_mem_lth

The sdkp = vmalloc() certainly should be kmalloc().

Each of the init_mem_lth and associated variable:

	static Scsi_Disk ** sd_dsk_arr;
	init_mem_lth(sd_dsk_arr, sd_template.dev_max);

	static struct gendisk **sd_disks;
	init_mem_lth(sd_disks, sd_template.dev_max);

	struct hd_struct *sd;
	init_mem_lth(sd, maxparts);

dev_max's maximum is N_SD_MAJORS * SCSI_DISKS_PER_MAJOR, right now 128
(8 * 16); even doubling that won't hit kmalloc limits if we want an array
of pointers (256 * 4 or 256 * 8).

So only the last one might need a vmalloc(), since maxparts is max_dev * 16,
and hd_structs are larger.

The only use of sd is in sd_attach to set the part - can we now just
allocate a hd_struct as needed in sd_attach, as part of the struct *p
data? Also it looks like there is never a kfree(p) for the normal
non-error case.

Or, can the size be checked and vmalloc() only called when necessary,
or called if kmalloc() fails?

-- Patrick Mansfield


* Re: GFP_ATOMIC allocations...
  2002-08-29  1:42   ` Doug Ledford
  2002-08-29 10:47     ` Alan Cox
  2002-08-29 17:10     ` Patrick Mansfield
@ 2002-08-30 16:22     ` Patrick Mansfield
  2002-08-30 16:46       ` James Bottomley
  2002-09-03 16:07       ` Pete Zaitcev
  2 siblings, 2 replies; 13+ messages in thread
From: Patrick Mansfield @ 2002-08-30 16:22 UTC (permalink / raw)
  To: linux-scsi, zaitcev

On Wed, Aug 28, 2002 at 09:42:15PM -0400, Doug Ledford wrote:
> On Wed, Aug 28, 2002 at 05:47:37PM -0700, Patrick Mansfield wrote:
> > So, you think all (or most) of the GFP_ATOMIC's in scsi_scan.c should
> > be GFP_KERNEL?
> 

> 
> > All the kmalloc calls should be at boot time, or via
> > insmod.
> 
> Right, process context with no locks held.
> 

There was past discussion about calling into scsi_build_commandblocks()
while not in user context (via usb etc.), this implies that we could
scan while not in user context.

I don't see how that would work, since the scan sleeps waiting for IO. 

Do we really call scan_scsis and scsi_build_commandblocks while not in user
context? Who is the caller (to scsi_register_host)?

Here is Pete's response saying he found such a case, but not where:

http://marc.theaimsgroup.com/?l=linux-scsi&m=101678634608966&w=2

-- Patrick Mansfield


* Re: GFP_ATOMIC allocations...
  2002-08-30 16:22     ` Patrick Mansfield
@ 2002-08-30 16:46       ` James Bottomley
  2002-08-30 18:58         ` Doug Ledford
  2002-09-03 16:07       ` Pete Zaitcev
  1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2002-08-30 16:46 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi, zaitcev

patmans@us.ibm.com said:
> There was past discussion about calling into scsi_build_commandblocks()
>  while not in user context (via usb etc.), this implies that we could
> scan while not in user context.

> I don't see how that would work, since the scan sleeps waiting for IO.
> 

> Do we really call scan_scsis and scsi_build_commandblocks while not in
> user context? Who is the caller (to scsi_register_host)?

> Here is Pete's response saying he found such a case, but not where:

> http://marc.theaimsgroup.com/?l=linux-scsi&m=101678634608966&w=2 

I agree we shouldn't be in anything other than user context here.

What about, instead of Pete's patch, changing the GFP_ATOMIC to GFP_KERNEL and 
adding a BUG_ON(in_interrupt()) at the top and see who sends back bug 
reports...
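A sketch of that suggestion, in illustrative kernel-style C with an invented wrapper name (not compilable outside a kernel tree):

```c
/* Sketch of the suggestion above; scan_alloc() is an invented name.
 * GFP_KERNEL plus the BUG_ON makes any interrupt-context caller
 * announce itself loudly instead of silently pressuring GFP_ATOMIC. */
static void *scan_alloc(size_t len)
{
	BUG_ON(in_interrupt());
	return kmalloc(len, GFP_KERNEL);
}
```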

James




* Re: GFP_ATOMIC allocations...
  2002-08-30 16:46       ` James Bottomley
@ 2002-08-30 18:58         ` Doug Ledford
  0 siblings, 0 replies; 13+ messages in thread
From: Doug Ledford @ 2002-08-30 18:58 UTC (permalink / raw)
  To: James Bottomley; +Cc: Patrick Mansfield, linux-scsi, zaitcev

On Fri, Aug 30, 2002 at 11:46:02AM -0500, James Bottomley wrote:
> What about, instead of Pete's patch, changing the GFP_ATOMIC to GFP_KERNEL and 
> adding a BUG_ON(in_interrupt()) at the top and see who sends back bug 
> reports...

Works for me.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  


* Re: GFP_ATOMIC allocations...
  2002-08-30 16:22     ` Patrick Mansfield
  2002-08-30 16:46       ` James Bottomley
@ 2002-09-03 16:07       ` Pete Zaitcev
  1 sibling, 0 replies; 13+ messages in thread
From: Pete Zaitcev @ 2002-09-03 16:07 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi, zaitcev

> Date: Fri, 30 Aug 2002 09:22:57 -0700
> From: Patrick Mansfield <patmans@us.ibm.com>

>[...]
> Do we really call scan_scsis and scsi_build_commandblocks while not in user
> context? Who is the caller (to scsi_register_host)?
> 
> Here is Pete's response saying he found such a case, but not where:

I did a call graph, but I have since discarded it. IIRC it showed
some paths, but perhaps I made a mistake. Please ignore it.

The "fix" did not go anywhere, actually. Arjan explained that
if a box runs out of atomics, it's bad anyway. We reduced the
queue size in the driver that the guy used.

-- Pete

