public inbox for linux-kernel@vger.kernel.org
* page allocation failure
@ 2004-01-19 11:36 Oliver Kiddle
  2004-01-19 14:54 ` Mike Fedyk
  2004-01-20  3:38 ` Andrew Morton
  0 siblings, 2 replies; 13+ messages in thread
From: Oliver Kiddle @ 2004-01-19 11:36 UTC
  To: linux-kernel

There seems to be a problem with 2.6.1 on my machine. It will be fine
for a matter of a few days and then this error will appear on the
console. The message then appears repeatedly and continuously. The
first I know is that my remote login shell ceases to respond. About the
only thing I can do is switch between virtual consoles (until I hit the
reset button).

/var/log/messages shows:
kernel: cat: page allocation failure. order:0, mode:0x20

Then the same for lots of other processes (pdflush, syslogd, klogd,
kswapd0, nfsd to name a few). I expect that after a point it is unable
to even log stuff so syslog is quiet after a while.
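
In a line such as the one above, "order" is the base-2 logarithm of the
number of contiguous pages requested and "mode" is the GFP flag mask. The
following is a minimal sketch of a decoder, not kernel code. The 4 KiB page
size (i386) and the flag values are assumptions recalled from the 2.6-era
include/linux/gfp.h and should be checked against the tree in use;
decode_failure is a hypothetical helper name.

```python
import re

PAGE_SIZE = 4096  # assumption: i386 with 4 KiB pages

# Assumed 2.6-era GFP bit values; verify against include/linux/gfp.h.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

LINE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:0x(?P<mode>[0-9a-fA-F]+)"
)


def decode_failure(line):
    """Return the process name, request size and flag names, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    order = int(m.group("order"))
    mode = int(m.group("mode"), 16)
    # Report flags in ascending bit order so the output is stable.
    flags = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    return {
        "comm": m.group("comm"),
        "order": order,
        "bytes": PAGE_SIZE << order,
        "mode": mode,
        "flags": flags,
    }
```

If those flag values hold, mode:0x20 is __GFP_HIGH alone, which is what
GFP_ATOMIC expanded to in that era: a request from a context that cannot
sleep, so it fails rather than waiting for memory to be reclaimed.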

It has happened three times now and on all occasions, I was untarring a
huge file on an XFS partition. I assume the problem is something to do
with VM. The machine has 1GB of RAM which should be plenty. For the
most part it is just serving NFS and NIS (to no more than about 10
clients).

The hardware is a Dell PowerEdge 600SC. It's a new machine that never
ran 2.4 before. I can supply any other information that might help in
diagnosing the problem. I don't subscribe so please CC me in any reply
(but I'll keep an eye on the archives).

If anyone can suggest any /proc variables I might change to reduce the
risk of it doing this again, I would appreciate it. I tried increasing
/proc/sys/vm/min_free_kbytes after the first time this happened. Not
that I understand what that does: I searched the archives and it was
mentioned in a vaguely relevant looking post.

Cheers

Oliver Kiddle


* Re: page allocation failure
  2004-01-19 11:36 page allocation failure Oliver Kiddle
@ 2004-01-19 14:54 ` Mike Fedyk
  2004-01-19 17:29   ` Oliver Kiddle
  2004-01-20  3:38 ` Andrew Morton
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Fedyk @ 2004-01-19 14:54 UTC
  To: Oliver Kiddle; +Cc: linux-kernel

On Mon, Jan 19, 2004 at 12:36:02PM +0100, Oliver Kiddle wrote:
> If anyone can suggest any /proc variables I might change to reduce the
> risk of it doing this again, I would appreciate it. I tried increasing
> /proc/sys/vm/min_free_kbytes after the first time this happened. Not
> that I understand what that does: I searched the archives and it was
> mentioned in a vaguely relevant looking post.

Try running "vmstat 1" and output that to a file, and post your /proc/meminfo.

Do you start getting the error before the couple of days are up, or is it
just that you can't log in after that amount of time?


* Re: page allocation failure
  2004-01-19 14:54 ` Mike Fedyk
@ 2004-01-19 17:29   ` Oliver Kiddle
  2004-01-19 18:12     ` Mike Fedyk
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Kiddle @ 2004-01-19 17:29 UTC
  To: linux-kernel

Mike Fedyk wrote:
> 
> Try running "vmstat 1" and output that to a file, and post your /proc/meminfo.
> 
> Do you start getting the error before the couple of days are up, or is it
> just that you can't log in after that amount of time?

I can't log in immediately following the first occurrence of the error.
I can type in a username at the login prompt but nothing happens after
pressing enter. Two days was just a rough idea of how long the system
could be up before going down. It has gone down twice since I posted
earlier so it wasn't even vaguely an accurate figure. On both
occasions, there has not been a "page allocation failure" error though.

These last two times, I was running xfsdump along with some nfsd activity.
I had the following, possibly unrelated messages on the console.

st0: Block limits 1 - 16777215 bytes.
spurious 8259A interrupt IRQ7

I've put /proc/meminfo below though that is from the beginning while
everything is still fine. The vmstat output is more interesting and I
have it captured for the period when it went down.

vmstat output starts off like this:
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 947908   5792  37128    0    0    54    49 1072   121  0  2 96  2

The free column then slowly drops.

Shortly before the end comes this sequence:

 2  1      0  57036   2412  62044    0    0  2224   512 1950   188  1 70 20  9
 0  0      0  55104   1284  64096    0    0  2204   320 1663   154  0 51 42  7
 2  1      0  53048     44  67168    0    0  3080     0 1939    32  0 59 38  3
 2  0   1388  49748     56  69592    0 1388  2796  1393 1909   161  1 64 15 19
 3  2   1928  45828     60  72376   64 1208  3056  1208 2146   184  3 70  2 25
 1  4   1464  94700     60  22088    0  808  3428   828 1873   213  1 58  0 41
 0  1   1176  93716     60  23060  356  316  1596   429 2079   342  0 56  4 40
 3  3   1176  94116     64  22368  144    0  1124   311 6419  1369  0  6  1 93
 1  2   1176 109176     36   7360    0    0   828   159 29189  7978  0  1  0 99

This is the first time the swpd column is non-zero. The figures don't
change a vast amount after that and only 25 samples later, the very last
sample I got looked like this:

 0  1   1176 109248     40   7364    0    0     0     0 1009    25  0  1  0 99

I can send you the full output if you want (70kb compressed).
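
The point where swpd first becomes non-zero can be located mechanically in
a long capture. A minimal sketch, assuming the 16-column layout shown in
the header above (other vmstat versions print different columns);
parse_vmstat and first_swap_use are hypothetical helper names.

```python
# Column names as printed in the header of the capture above.
VMSTAT_COLS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat(text):
    """Return one dict per numeric sample line, skipping header lines."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Header and blank lines fail either the length or the digit check.
        if len(fields) != len(VMSTAT_COLS):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        samples.append(dict(zip(VMSTAT_COLS, map(int, fields))))
    return samples


def first_swap_use(samples):
    """Index of the first sample with swpd > 0, or None if swap stays unused."""
    for i, sample in enumerate(samples):
        if sample["swpd"] > 0:
            return i
    return None
```

Printing the few samples on either side of that index shows how free,
buff and cache moved just before swap came into use.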

/proc/meminfo:
MemTotal:      1034796 kB
MemFree:        884620 kB
Buffers:         14768 kB
Cached:          61192 kB
SwapCached:          0 kB
Active:          51972 kB
Inactive:        35992 kB
HighTotal:      131008 kB
HighFree:        57148 kB
LowTotal:       903788 kB
LowFree:        827472 kB
SwapTotal:      996020 kB
SwapFree:       996020 kB
Dirty:              24 kB
Writeback:           0 kB
Mapped:          16772 kB
Slab:            31064 kB
Committed_AS:    24876 kB
PageTables:        536 kB
VmallocTotal:   114680 kB
VmallocUsed:       692 kB
VmallocChunk:   113988 kB
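
To compare a healthy snapshot like this one with one taken closer to a
hang, the figures can be pulled into a dictionary. A minimal sketch;
parse_meminfo and low_memory_used_kb are hypothetical helper names, and the
LowTotal/LowFree fields are assumed to be present, as they are in the
output above.

```python
def parse_meminfo(text):
    """Map each '/proc/meminfo' field name to its value in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only 'Name:   <number> kB' style lines.
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def low_memory_used_kb(info):
    """Low (directly mapped) memory currently in use, in kB."""
    return info["LowTotal"] - info["LowFree"]
```

For the snapshot above, HighTotal plus LowTotal adds up to MemTotal, and
about 76 MB of low memory is in use.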

I have /tmp mounted using tmpfs if that is in any way significant.

Thanks

Oliver


* Re: page allocation failure
  2004-01-19 17:29   ` Oliver Kiddle
@ 2004-01-19 18:12     ` Mike Fedyk
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Fedyk @ 2004-01-19 18:12 UTC
  To: Oliver Kiddle; +Cc: linux-kernel

On Mon, Jan 19, 2004 at 06:29:03PM +0100, Oliver Kiddle wrote:
> could be up before going down. It has gone down twice since I posted
> earlier so it wasn't even vaguely an accurate figure. On both
> occasions, there has not been a "page allocation failure" error though.

Ok, turn on the nmi_watchdog, and see if you get any traces...


* Re: page allocation failure
  2004-01-19 11:36 page allocation failure Oliver Kiddle
  2004-01-19 14:54 ` Mike Fedyk
@ 2004-01-20  3:38 ` Andrew Morton
  2004-01-20  6:00   ` Nathan Scott
  2004-01-20 17:08   ` Oliver Kiddle
  1 sibling, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2004-01-20  3:38 UTC
  To: Oliver Kiddle; +Cc: linux-kernel

Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
>
> There seems to be a problem with 2.6.1 on my machine. It will be fine
>  for a matter of a few days and then this error will appear on the
>  console. The message then appears repeatedly and continuously. The
>  first I know is that my remote login shell ceases to respond. About the
>  only thing I can do is switch between virtual consoles (until I hit the
>  reset button).
> 
>  /var/log/messages shows:
>  kernel: cat: page allocation failure. order:0, mode:0x20
> 
>  Then the same for lots of other processes (pdflush, syslogd, klogd,
>  kswapd0, nfsd to name a few). I expect that after a point it is unable
>  to even log stuff so syslog is quiet after a while.
> 
>  It has happened three times now and on all occasions, I was untarring a
>  huge file on an XFS partition. I assume the problem is something to do
>  with VM. The machine has 1GB of RAM which should be plenty. For the
>  most part it is just serving NFS and NIS (to no more than about 10
>  clients).

Does the machine actually recover, or does it grind to a halt and need
resetting?

Is there much network receive happening at the time?

Are you using gig-E with large MTUs?

> If anyone can suggest any /proc variables I might change to reduce the
> risk of it doing this again, I would appreciate it. I tried increasing
> /proc/sys/vm/min_free_kbytes after the first time this happened. Not
> that I understand what that does: I searched the archives and it was
> mentioned in a vaguely relevant looking post.

Yup, min_free_kbytes is the right thing to increase.  Try it again, perhaps
increasing it by more - to 10000 or something like that.

min_free_kbytes will increase the amount of memory which the VM keeps in
reserve to satisfy interrupt-time memory allocation attempts - most
especially network receive.
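
A minimal sketch of checking and raising the setting from a script. The
path is the one named above; the helper names are hypothetical, and writing
to the real /proc file needs root.

```python
MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current reserve in kB."""
    with open(path) as f:
        return int(f.read().strip())


def raise_min_free_kbytes(target_kb, path=MIN_FREE_PATH):
    """Raise the reserve to target_kb if it is lower; never lower it.

    Returns the value in effect afterwards.
    """
    current = read_min_free_kbytes(path)
    if current < target_kb:
        with open(path, "w") as f:
            f.write(f"{target_kb}\n")
        return target_kb
    return current
```

For example, raise_min_free_kbytes(10000) applies the figure suggested
above. A value written this way does not survive a reboot; a
vm.min_free_kbytes line in /etc/sysctl.conf is the usual way to make it
persistent.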


You probably should apply this patch to tell us where the allocation
failures are coming from.  Make sure that CONFIG_KALLSYMS is enabled in
kernel config.


diff -puN mm/page_alloc.c~a mm/page_alloc.c
--- 25/mm/page_alloc.c~a	Mon Jan 19 19:34:09 2004
+++ 25-akpm/mm/page_alloc.c	Mon Jan 19 19:34:21 2004
@@ -674,6 +674,7 @@ nopage:
 		printk("%s: page allocation failure."
 			" order:%d, mode:0x%x\n",
 			p->comm, order, gfp_mask);
+		dump_stack();
 	}
 	return NULL;
 got_pg:

_



* Re: page allocation failure
  2004-01-20  3:38 ` Andrew Morton
@ 2004-01-20  6:00   ` Nathan Scott
  2004-01-20 17:08   ` Oliver Kiddle
  1 sibling, 0 replies; 13+ messages in thread
From: Nathan Scott @ 2004-01-20  6:00 UTC
  To: Oliver Kiddle, Andrew Morton, hch; +Cc: linux-kernel

On Mon, Jan 19, 2004 at 07:38:37PM -0800, Andrew Morton wrote:
> Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
> >
> >  It has happened three times now and on all occasions, I was untarring a
> >  huge file on an XFS partition. I assume the problem is something to do
> >  with VM. The machine has 1GB of RAM which should be plenty. For the
> ...
> You probably should apply this patch to tell us where the allocation
> failures are coming from.  Make sure that CONFIG_KALLSYMS is enabled in
> kernel config.

We do have known issues in XFS on 2.6 with handling certain VM
allocation failures -- maybe hitting that here.  Christoph has
been looking at making XFS do a better job there; __GFP_NOFAIL
allocations failing seem to be the worst issue for us - on the
occasions I've hit that though, it's always immediately fatal.

cheers.

-- 
Nathan


* Re: page allocation failure
  2004-01-20  3:38 ` Andrew Morton
  2004-01-20  6:00   ` Nathan Scott
@ 2004-01-20 17:08   ` Oliver Kiddle
  2004-01-20 18:35     ` Mike Fedyk
  1 sibling, 1 reply; 13+ messages in thread
From: Oliver Kiddle @ 2004-01-20 17:08 UTC
  To: linux-kernel

Andrew Morton wrote:
> 
> Does the machine actually recover, or does it grind to a halt and need
> resetting?

It needs resetting.

Today, I noticed that I could still ping it, though. I also had the
magic sysrq key stuff in the kernel and did a showTasks, Sync, Unmount
and kIll. That allowed me to briefly log in as root and save the output
of dmesg before an attempt to run vi caused it to die again.

> Is there much network receive happening at the time?

Not a lot I don't think. Is there anything like vmstat for measuring
network activity?

> Are you using gig-E with large MTUs?

No. 100Mbps full duplex, e1000 driver.

Again this time, I didn't get the "page allocation failure" message so
your patch couldn't print anything. The console was blank apart from
the message about the tape device. As suggested by Mike Fedyk, I had
the nmi_watchdog stuff enabled. Didn't see any output from it though.
Would that have displayed its output to the console?

I've put a few chunks of the saved dmesg output below in case they're
useful. All I have is some of the sysrq showTasks output. xfsdump seems
to be a reliable way to trigger the problem (perhaps once the tape
fills up) and I had run patch immediately before it died. Most other
processes seem to be in schedule_timeout.
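
Checking which tasks mention a given function is easier once the trace
lines are split into fields. A minimal sketch, assuming the
"[<address>] symbol+0xoffset/0xsize" layout produced when CONFIG_KALLSYMS
is enabled; parse_frames and mentions are hypothetical helper names.
Traces of this kind are typically built by scanning the stack, so some
entries may be stale rather than part of the live call chain.

```python
import re

# One frame: [<c0138240>] try_to_free_pages+0x9f/0x15f
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return (address, symbol, offset, size) tuples in the order printed."""
    return [
        (int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in FRAME_RE.findall(trace_text)
    ]


def mentions(trace_text, symbol):
    """True if any frame in the trace belongs to the given symbol."""
    return any(sym == symbol for _, sym, _, _ in parse_frames(trace_text))
```

Splitting the saved dmesg on blank lines and applying
mentions(block, "schedule_timeout") to each block gives a quick count of
how many tasks have that function somewhere on their stack.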

I'm away next week and as other people use this machine I'll have to
switch it to 2.4. I'll still have opportunities to reboot to 2.6 and
try to find out what's going on, though.

Oliver

patch         D 00000000     0 11620    966                     (NOTLB)
d7bb7aa8 00000082 c03676c0 00000000 0000000c 00000050 00000000 c03677d4 
       c0138240 00001fec 536f5e97 00000bc6 f6d0f2e0 f6d0f4a0 00c029db d7bb7abc 
       c041dedc d7bb7ae8 c0121b9c d7bb7abc 00c029db 0000007b c040e460 f6c41920 
Call Trace:
 [<c0138240>] try_to_free_pages+0x9f/0x15f
 [<c0121b9c>] schedule_timeout+0x63/0xb7
 [<c0121b30>] process_timeout+0x0/0x9
 [<c01186f4>] io_schedule_timeout+0x11/0x19
 [<c024e343>] blk_congestion_wait+0x7e/0x8d
 [<c0118cd6>] autoremove_wake_function+0x0/0x4f
 [<c0118cd6>] autoremove_wake_function+0x0/0x4f
 [<c0132cd1>] __alloc_pages+0x294/0x319
 [<c012f3d5>] find_or_create_page+0xa0/0xaa
 [<c0214daf>] _pagebuf_lookup_pages+0x2fa/0x398
 [<c0215131>] pagebuf_get+0xba/0x135
 [<c0208a70>] xfs_trans_read_buf+0x32f/0x38b
 [<c01d8df1>] xfs_da_do_buf+0x6b1/0x9a6
 [<c01d9198>] xfs_da_read_buf+0x57/0x5b
 [<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
 [<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
 [<c01dce71>] xfs_dir2_block_lookup+0x2f/0xc8
 [<c01db3a8>] xfs_dir2_lookup+0xc4/0x13b
 [<c01db401>] xfs_dir2_lookup+0x11d/0x13b
 [<c0209afe>] xfs_dir_lookup_int+0x4c/0x12b
 [<c020f3d6>] xfs_lookup+0x50/0x88
 [<c021cae0>] linvfs_lookup+0x67/0x9f
 [<c0152ea2>] real_lookup+0xc8/0xea
 [<c01530f3>] do_lookup+0x96/0xa1
 [<c0153508>] link_path_walk+0x40a/0x7db
 [<c0154116>] open_namei+0x83/0x3e1
 [<c0146b0f>] filp_open+0x3e/0x64
 [<c0146e98>] sys_open+0x5b/0x8b
 [<c0108ab7>] syscall_call+0x7/0xb


xfsdump       D 9855CCF6  2760   676    673                     (NOTLB)
ece09b04 00000082 f7f9e080 9855ccf6 00000baf 00000000 9855ccf6 00000baf 
       f7f9e080 000012f8 9855d254 00000baf f51106a0 f5110860 f7ffe760 00000000 
       f7ffe778 ece09b0c c01186db ece09b3c c0131c2b 00000000 00000000 00000000 
Call Trace:
 [<c01186db>] io_schedule+0xe/0x16
 [<c0131c2b>] mempool_alloc+0xfa/0x117
 [<c0118cd6>] autoremove_wake_function+0x0/0x4f
 [<c021789b>] linvfs_get_block_core+0x87/0x2ab
 [<c0118cd6>] autoremove_wake_function+0x0/0x4f
 [<c0139296>] __blk_queue_bounce+0x1a1/0x232
 [<c013935d>] blk_queue_bounce+0x36/0x4d
 [<c024e510>] __make_request+0x4c/0x537
 [<c0162ff5>] do_mpage_readpage+0x1a9/0x32a
 [<c024eb04>] generic_make_request+0x109/0x18a
 [<c022765b>] radix_tree_node_alloc+0x1f/0x5a
 [<c02277f4>] radix_tree_insert+0x82/0xb8
 [<c024ebc2>] submit_bio+0x3d/0x6b
 [<c0162d26>] mpage_bio_submit+0x23/0x32
 [<c016324b>] mpage_readpages+0xd5/0x162
 [<c0217abf>] linvfs_get_block+0x0/0x43
 [<c0134460>] read_pages+0x134/0x13d
 [<c0217abf>] linvfs_get_block+0x0/0x43
 [<c0132ae3>] __alloc_pages+0xa6/0x319
 [<c010adcd>] do_IRQ+0xc4/0xdf
 [<c0109424>] common_interrupt+0x18/0x20
 [<c013469c>] do_page_cache_readahead+0xbf/0x109
 [<c0134852>] page_cache_readahead+0x16c/0x198
 [<c012f8d7>] do_generic_mapping_read+0x3c1/0x3d3
 [<c012f8e9>] file_read_actor+0x0/0xec
 [<c012fb91>] __generic_file_aio_read+0x1bc/0x1ee
 [<c012f8e9>] file_read_actor+0x0/0xec
 [<c021dcb2>] xfs_read+0x15a/0x26c
 [<c0117d92>] wait_for_completion+0x65/0x95
 [<c02181af>] linvfs_read_invis+0x90/0xa2
 [<c0147549>] do_sync_read+0x8b/0xb7
 [<c01217fa>] update_process_times+0x46/0x52
 [<c0121674>] update_wall_time+0xd/0x36
 [<c0121a6c>] do_timer+0xdf/0xe4
 [<c0147625>] vfs_read+0xb0/0x119
 [<c01478a0>] sys_read+0x42/0x63
 [<c0108ab7>] syscall_call+0x7/0xb



* Re: page allocation failure
  2004-01-20 17:08   ` Oliver Kiddle
@ 2004-01-20 18:35     ` Mike Fedyk
  2004-01-22  9:29       ` Oliver Kiddle
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Fedyk @ 2004-01-20 18:35 UTC
  To: Oliver Kiddle; +Cc: linux-kernel

On Tue, Jan 20, 2004 at 06:08:34PM +0100, Oliver Kiddle wrote:
> the message about the tape device. As suggested by Mike Fedyk, I had
> the nmi_watchdog stuff enabled. Didn't see any output from it though.
> Would that have displayed its output to the console?

It should have.  Run cat /proc/interrupts and again a few seconds later; does
the NMI: number change?
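
The two-sample comparison can be scripted. A minimal sketch that takes the
text of two /proc/interrupts readings; nmi_count and nmi_is_ticking are
hypothetical helper names, and the "NMI:" label with one count per CPU is
assumed to match the running kernel's layout.

```python
def nmi_count(interrupts_text):
    """Sum the per-CPU counts on the 'NMI:' line, or None if it is absent."""
    for line in interrupts_text.splitlines():
        label, sep, rest = line.strip().partition(":")
        if sep and label == "NMI":
            # Skip any trailing description; only numeric columns are counts.
            return sum(int(tok) for tok in rest.split() if tok.isdigit())
    return None


def nmi_is_ticking(before_text, after_text):
    """True if the NMI count increased between the two readings."""
    before = nmi_count(before_text)
    after = nmi_count(after_text)
    return before is not None and after is not None and after > before
```

Reading the file, sleeping a few seconds, and reading it again supplies the
two arguments. A count that never moves suggests the watchdog is not
active.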

> 
> patch         D 00000000     0 11620    966                     (NOTLB)
> d7bb7aa8 00000082 c03676c0 00000000 0000000c 00000050 00000000 c03677d4 
>        c0138240 00001fec 536f5e97 00000bc6 f6d0f2e0 f6d0f4a0 00c029db d7bb7abc 
>        c041dedc d7bb7ae8 c0121b9c d7bb7abc 00c029db 0000007b c040e460 f6c41920 

There should be some lines above this in your log...

BTW, the NMI watchdog is supposed to oops when your system hangs so we can
see where the hang is coming from.  What was running when this oops happened?

> Call Trace:
>  [<c0138240>] try_to_free_pages+0x9f/0x15f
>  [<c0121b9c>] schedule_timeout+0x63/0xb7
>  [<c0121b30>] process_timeout+0x0/0x9
>  [<c01186f4>] io_schedule_timeout+0x11/0x19
>  [<c024e343>] blk_congestion_wait+0x7e/0x8d
>  [<c0118cd6>] autoremove_wake_function+0x0/0x4f
>  [<c0118cd6>] autoremove_wake_function+0x0/0x4f
>  [<c0132cd1>] __alloc_pages+0x294/0x319
>  [<c012f3d5>] find_or_create_page+0xa0/0xaa
>  [<c0214daf>] _pagebuf_lookup_pages+0x2fa/0x398
>  [<c0215131>] pagebuf_get+0xba/0x135
>  [<c0208a70>] xfs_trans_read_buf+0x32f/0x38b
>  [<c01d8df1>] xfs_da_do_buf+0x6b1/0x9a6
>  [<c01d9198>] xfs_da_read_buf+0x57/0x5b
>  [<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
>  [<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
>  [<c01dce71>] xfs_dir2_block_lookup+0x2f/0xc8
>  [<c01db3a8>] xfs_dir2_lookup+0xc4/0x13b
>  [<c01db401>] xfs_dir2_lookup+0x11d/0x13b
>  [<c0209afe>] xfs_dir_lookup_int+0x4c/0x12b
>  [<c020f3d6>] xfs_lookup+0x50/0x88
>  [<c021cae0>] linvfs_lookup+0x67/0x9f
>  [<c0152ea2>] real_lookup+0xc8/0xea
>  [<c01530f3>] do_lookup+0x96/0xa1
>  [<c0153508>] link_path_walk+0x40a/0x7db
>  [<c0154116>] open_namei+0x83/0x3e1
>  [<c0146b0f>] filp_open+0x3e/0x64
>  [<c0146e98>] sys_open+0x5b/0x8b
>  [<c0108ab7>] syscall_call+0x7/0xb
> 
> 
> xfsdump       D 9855CCF6  2760   676    673                     (NOTLB)
> ece09b04 00000082 f7f9e080 9855ccf6 00000baf 00000000 9855ccf6 00000baf 
>        f7f9e080 000012f8 9855d254 00000baf f51106a0 f5110860 f7ffe760 00000000 
>        f7ffe778 ece09b0c c01186db ece09b3c c0131c2b 00000000 00000000 00000000 
> Call Trace:
>  [<c01186db>] io_schedule+0xe/0x16
>  [<c0131c2b>] mempool_alloc+0xfa/0x117
>  [<c0118cd6>] autoremove_wake_function+0x0/0x4f
>  [<c021789b>] linvfs_get_block_core+0x87/0x2ab
>  [<c0118cd6>] autoremove_wake_function+0x0/0x4f
>  [<c0139296>] __blk_queue_bounce+0x1a1/0x232
>  [<c013935d>] blk_queue_bounce+0x36/0x4d
>  [<c024e510>] __make_request+0x4c/0x537
>  [<c0162ff5>] do_mpage_readpage+0x1a9/0x32a
>  [<c024eb04>] generic_make_request+0x109/0x18a
>  [<c022765b>] radix_tree_node_alloc+0x1f/0x5a
>  [<c02277f4>] radix_tree_insert+0x82/0xb8
>  [<c024ebc2>] submit_bio+0x3d/0x6b
>  [<c0162d26>] mpage_bio_submit+0x23/0x32
>  [<c016324b>] mpage_readpages+0xd5/0x162
>  [<c0217abf>] linvfs_get_block+0x0/0x43
>  [<c0134460>] read_pages+0x134/0x13d
>  [<c0217abf>] linvfs_get_block+0x0/0x43
>  [<c0132ae3>] __alloc_pages+0xa6/0x319
>  [<c010adcd>] do_IRQ+0xc4/0xdf
>  [<c0109424>] common_interrupt+0x18/0x20
>  [<c013469c>] do_page_cache_readahead+0xbf/0x109
>  [<c0134852>] page_cache_readahead+0x16c/0x198
>  [<c012f8d7>] do_generic_mapping_read+0x3c1/0x3d3
>  [<c012f8e9>] file_read_actor+0x0/0xec
>  [<c012fb91>] __generic_file_aio_read+0x1bc/0x1ee
>  [<c012f8e9>] file_read_actor+0x0/0xec
>  [<c021dcb2>] xfs_read+0x15a/0x26c
>  [<c0117d92>] wait_for_completion+0x65/0x95
>  [<c02181af>] linvfs_read_invis+0x90/0xa2
>  [<c0147549>] do_sync_read+0x8b/0xb7
>  [<c01217fa>] update_process_times+0x46/0x52
>  [<c0121674>] update_wall_time+0xd/0x36
>  [<c0121a6c>] do_timer+0xdf/0xe4
>  [<c0147625>] vfs_read+0xb0/0x119
>  [<c01478a0>] sys_read+0x42/0x63
>  [<c0108ab7>] syscall_call+0x7/0xb
> 


* Re: page allocation failure
  2004-01-20 18:35     ` Mike Fedyk
@ 2004-01-22  9:29       ` Oliver Kiddle
  2004-01-22  9:59         ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Kiddle @ 2004-01-22  9:29 UTC
  To: linux-kernel

Mike Fedyk wrote:
> On Tue, Jan 20, 2004 at 06:08:34PM +0100, Oliver Kiddle wrote:
> > the message about the tape device. As suggested by Mike Fedyk, I had
> > the nmi_watchdog stuff enabled. Didn't see any output from it though.
> > Would that have displayed its output to the console?
> 
> It should have.  Run cat /proc/interrupts and again a few seconds later; does
> the NMI: number change?

Yes, the number changes. Still haven't seen any output from it though.

> There should be some lines above this in your log...

Only the trace for other processes. Any initial part was lost, probably
because the task list overflowed the dmesg buffer. I didn't see anything
on the console though.

I got a few page allocation errors yesterday. As they now include
dump_stack() output, I have attached them below. This time, the system
kept going for a few minutes after these error messages. Again, when it
locked up, killing all processes with the sysrq key got things temporarily
back. I have the full dmesg output if anyone wants.

Oliver

st0: Block limits 1 - 16777215 bytes.
xfsdump: page allocation failure. order:9, mode:0xd0
Call Trace:
 [<c0132d18>] __alloc_pages+0x2db/0x319
 [<c02a5dc9>] enlarge_buffer+0xcf/0x182
 [<c02a6cd9>] st_map_user_pages+0x37/0x88
 [<c02a2909>] setup_buffering+0xf3/0x127
 [<c02a3690>] st_read+0xe0/0x3d1
 [<c0147625>] vfs_read+0xb0/0x119
 [<c01478a0>] sys_read+0x42/0x63
 [<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:8, mode:0xd0
Call Trace:
 [<c0132d18>] __alloc_pages+0x2db/0x319
 [<c02a5dc9>] enlarge_buffer+0xcf/0x182
 [<c02a6cd9>] st_map_user_pages+0x37/0x88
 [<c02a2909>] setup_buffering+0xf3/0x127
 [<c02a3690>] st_read+0xe0/0x3d1
 [<c0147625>] vfs_read+0xb0/0x119
 [<c01478a0>] sys_read+0x42/0x63
 [<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:7, mode:0xd0
Call Trace:
 [<c0132d18>] __alloc_pages+0x2db/0x319
 [<c02a5dc9>] enlarge_buffer+0xcf/0x182
 [<c02a6cd9>] st_map_user_pages+0x37/0x88
 [<c02a2909>] setup_buffering+0xf3/0x127
 [<c02a3690>] st_read+0xe0/0x3d1
 [<c0147625>] vfs_read+0xb0/0x119
 [<c01478a0>] sys_read+0x42/0x63
 [<c0108ab7>] syscall_call+0x7/0xb

st0: Incorrect block size.
xfsdump: page allocation failure. order:9, mode:0xd0
Call Trace:
 [<c0132d18>] __alloc_pages+0x2db/0x319
 [<c02a5dc9>] enlarge_buffer+0xcf/0x182
 [<c02a6cd9>] st_map_user_pages+0x37/0x88
 [<c02a2909>] setup_buffering+0xf3/0x127
 [<c02a2b86>] st_write+0x20c/0x7e7
 [<c0115ecb>] do_page_fault+0x120/0x501
 [<c02a297a>] st_write+0x0/0x7e7
 [<c01477f5>] vfs_write+0xb0/0x119
 [<c0147903>] sys_write+0x42/0x63
 [<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:8, mode:0xd0
Call Trace:
 [<c0132d18>] __alloc_pages+0x2db/0x319
 [<c02a5dc9>] enlarge_buffer+0xcf/0x182
 [<c02a6cd9>] st_map_user_pages+0x37/0x88
 [<c02a2909>] setup_buffering+0xf3/0x127
 [<c02a2b86>] st_write+0x20c/0x7e7
 [<c0115ecb>] do_page_fault+0x120/0x501
 [<c02a297a>] st_write+0x0/0x7e7
 [<c01477f5>] vfs_write+0xb0/0x119
 [<c0147903>] sys_write+0x42/0x63
 [<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:7, mode:0xd0
Call Trace:
 [<c0132d18>] __alloc_pages+0x2db/0x319
 [<c02a5dc9>] enlarge_buffer+0xcf/0x182
 [<c02a6cd9>] st_map_user_pages+0x37/0x88
 [<c02a2909>] setup_buffering+0xf3/0x127
 [<c02a2b86>] st_write+0x20c/0x7e7
 [<c0115ecb>] do_page_fault+0x120/0x501
 [<c02a297a>] st_write+0x0/0x7e7
 [<c01477f5>] vfs_write+0xb0/0x119
 [<c0147903>] sys_write+0x42/0x63
 [<c0108ab7>] syscall_call+0x7/0xb



* Re: page allocation failure
  2004-01-22  9:29       ` Oliver Kiddle
@ 2004-01-22  9:59         ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2004-01-22  9:59 UTC
  To: Oliver Kiddle; +Cc: linux-kernel

Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
>
> st0: Block limits 1 - 16777215 bytes.
>  xfsdump: page allocation failure. order:9, mode:0xd0
>  Call Trace:
>   [<c0132d18>] __alloc_pages+0x2db/0x319
>   [<c02a5dc9>] enlarge_buffer+0xcf/0x182
>   [<c02a6cd9>] st_map_user_pages+0x37/0x88
>   [<c02a2909>] setup_buffering+0xf3/0x127
>   [<c02a3690>] st_read+0xe0/0x3d1
>   [<c0147625>] vfs_read+0xb0/0x119
>   [<c01478a0>] sys_read+0x42/0x63
>   [<c0108ab7>] syscall_call+0x7/0xb

This one's actually somewhat OK.  The tape driver is simply trying to
allocate a huge buffer and is falling back if it fails.
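
The try-big-then-fall-back pattern can be sketched in a few lines. This is
a simplified illustration, not the driver's code: the real enlarge_buffer()
builds the buffer out of several segments, whereas this version returns the
first segment that succeeds. try_alloc is a hypothetical callback standing
in for the page allocator, and the 4 KiB page size is an assumption.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def alloc_with_fallback(try_alloc, first_order, min_order=0):
    """Request 2**order pages, halving the request after each failure.

    try_alloc(order) returns a buffer, or None when the allocation fails.
    Returns (order, buffer) for the first success, or (None, None).
    """
    for order in range(first_order, min_order - 1, -1):
        buf = try_alloc(order)
        if buf is not None:
            return order, buf
    return None, None
```

With an allocator that can only satisfy order 6 or smaller, a call starting
at order 9 tries orders 9, 8 and 7 before succeeding. That matches the
sequence of order:9, order:8, order:7 messages in the log, with each failed
attempt printing one warning.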

This will shut up the debugging code:

--- 25/drivers/scsi/osst.c~osst-warning-fix	2004-01-22 01:57:35.000000000 -0800
+++ 25-akpm/drivers/scsi/osst.c	2004-01-22 01:57:59.000000000 -0800
@@ -5106,6 +5106,8 @@ static int enlarge_buffer(OSST_buffer *S
 	if (need_dma)
 		priority |= GFP_DMA;
 
+	priority |= __GFP_NOWARN;
+
 	/* Try to allocate the first segment up to OS_DATA_SIZE and the others
 	   big enough to reach the goal (code assumes no segments in place) */
 	for (b_size = OS_DATA_SIZE, order = OSST_FIRST_ORDER; b_size >= PAGE_SIZE; order--, b_size /= 2) {

_



* page allocation failure
@ 2004-08-24 20:05 Dominik Karall
  2004-08-24 20:05 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Dominik Karall @ 2004-08-24 20:05 UTC
  To: Linux Kernel ML; +Cc: Andrew Morton

hi,

Is this a kernel bug or an smbd failure? I think it could be the kernel
running short of memory, since the machine has only 56MB of RAM. But IMHO
the kernel shouldn't handle it this way. Running 2.6.8-rc4-mm1.

best regards,
dominik

syslog:
Aug 24 15:27:24 debian kernel: cupsd: page allocation failure. order:3, mode:0x20
Aug 24 15:27:24 debian kernel: Stack pointer is garbage, not printing trace
Aug 24 15:27:24 debian kernel: smbd: page allocation failure. order:3, mode:0x20
Aug 24 15:27:24 debian kernel:  [__alloc_pages+477/784] __alloc_pages+0x1dd/0x310
Aug 24 15:27:24 debian kernel:  [__get_free_pages+24/64] __get_free_pages+0x18/0x40
Aug 24 15:27:24 debian kernel:  [kmem_getpages+25/176] kmem_getpages+0x19/0xb0
Aug 24 15:27:24 debian kernel:  [cache_grow+182/400] cache_grow+0xb6/0x190
Aug 24 15:27:24 debian kernel:  [cache_alloc_refill+531/592] cache_alloc_refill+0x213/0x250
Aug 24 15:27:24 debian kernel:  [__kmalloc+92/96] __kmalloc+0x5c/0x60
Aug 24 15:27:24 debian kernel:  [alloc_skb+65/240] alloc_skb+0x41/0xf0
Aug 24 15:27:24 debian kernel:  [skb_copy+40/192] skb_copy+0x28/0xc0
Aug 24 15:27:24 debian kernel:  [skb_checksum_help+86/368] skb_checksum_help+0x56/0x170
Aug 24 15:27:24 debian kernel:  [pg0+71789271/1069305856] ip_nat_fn+0x177/0x2a0 [iptable_nat]
Aug 24 15:27:24 debian kernel:  [pg0+71789854/1069305856] ip_nat_local_fn+0x6e/0xb0 [iptable_nat]
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [nf_iterate+85/160] nf_iterate+0x55/0xa0
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [nf_hook_slow+120/288] nf_hook_slow+0x78/0x120
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [ip_queue_xmit+781/1328] ip_queue_xmit+0x30d/0x530
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [recalc_task_prio+187/432] recalc_task_prio+0xbb/0x1b0
Aug 24 15:27:24 debian kernel:  [do_IRQ+343/416] do_IRQ+0x157/0x1a0
Aug 24 15:27:24 debian kernel:  [common_interrupt+24/32] common_interrupt+0x18/0x20
Aug 24 15:27:24 debian kernel:  [tcp_v4_send_check+3/272] tcp_v4_send_check+0x3/0x110
Aug 24 15:27:24 debian kernel:  [tcp_transmit_skb+984/1760] tcp_transmit_skb+0x3d8/0x6e0
Aug 24 15:27:24 debian kernel:  [buffered_rmqueue+259/528] buffered_rmqueue+0x103/0x210
Aug 24 15:27:24 debian kernel:  [sk_stream_wait_memory+373/496] sk_stream_wait_memory+0x175/0x1f0
Aug 24 15:27:24 debian kernel:  [tcp_write_xmit+326/704] tcp_write_xmit+0x146/0x2c0
Aug 24 15:27:24 debian kernel:  [tcp_sendmsg+4088/4176] tcp_sendmsg+0xff8/0x1050
Aug 24 15:27:24 debian kernel:  [recalc_task_prio+187/432] recalc_task_prio+0xbb/0x1b0
Aug 24 15:27:24 debian kernel:  [inet_sendmsg+74/112] inet_sendmsg+0x4a/0x70
Aug 24 15:27:24 debian kernel:  [sock_sendmsg+190/240] sock_sendmsg+0xbe/0xf0
Aug 24 15:27:24 debian kernel:  [update_atime+71/176] update_atime+0x47/0xb0
Aug 24 15:27:24 debian kernel:  [do_generic_mapping_read+688/1088] do_generic_mapping_read+0x2b0/0x440
Aug 24 15:27:24 debian kernel:  [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Aug 24 15:27:24 debian kernel:  [file_read_actor+0/240] file_read_actor+0x0/0xf0
Aug 24 15:27:24 debian kernel:  [sockfd_lookup+22/144] sockfd_lookup+0x16/0x90
Aug 24 15:27:24 debian kernel:  [sys_sendto+216/272] sys_sendto+0xd8/0x110
Aug 24 15:27:24 debian kernel:  [do_select+684/720] do_select+0x2ac/0x2d0
Aug 24 15:27:24 debian kernel:  [__pollwait+0/192] __pollwait+0x0/0xc0
Aug 24 15:27:24 debian kernel:  [sys_send+51/64] sys_send+0x33/0x40
Aug 24 15:27:24 debian kernel:  [sys_socketcall+322/592] sys_socketcall+0x142/0x250
Aug 24 15:27:24 debian kernel:  [do_gettimeofday+26/192] do_gettimeofday+0x1a/0xc0
Aug 24 15:27:24 debian kernel:  [sys_time+22/80] sys_time+0x16/0x50
Aug 24 15:27:24 debian kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Aug 24 15:27:24 debian kernel: smbd: page allocation failure. order:3, mode:0x20
Aug 24 15:27:24 debian kernel:  [__alloc_pages+477/784] __alloc_pages+0x1dd/0x310
Aug 24 15:27:24 debian kernel:  [__get_free_pages+24/64] __get_free_pages+0x18/0x40
Aug 24 15:27:24 debian kernel:  [kmem_getpages+25/176] kmem_getpages+0x19/0xb0
Aug 24 15:27:24 debian kernel:  [cache_grow+182/400] cache_grow+0xb6/0x190
Aug 24 15:27:24 debian kernel:  [cache_alloc_refill+531/592] cache_alloc_refill+0x213/0x250
Aug 24 15:27:24 debian kernel:  [__kmalloc+92/96] __kmalloc+0x5c/0x60
Aug 24 15:27:24 debian kernel:  [alloc_skb+65/240] alloc_skb+0x41/0xf0
Aug 24 15:27:24 debian kernel:  [skb_copy+40/192] skb_copy+0x28/0xc0
Aug 24 15:27:24 debian kernel:  [skb_checksum_help+86/368] skb_checksum_help+0x56/0x170
Aug 24 15:27:24 debian kernel:  [pg0+71789271/1069305856] ip_nat_fn+0x177/0x2a0 [iptable_nat]
Aug 24 15:27:24 debian kernel:  [pg0+71789854/1069305856] ip_nat_local_fn+0x6e/0xb0 [iptable_nat]
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [nf_iterate+85/160] nf_iterate+0x55/0xa0
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [nf_hook_slow+120/288] nf_hook_slow+0x78/0x120
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [ip_queue_xmit+781/1328] ip_queue_xmit+0x30d/0x530
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [preempt_schedule+37/64] preempt_schedule+0x25/0x40
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [nf_hook_slow+266/288] nf_hook_slow+0x10a/0x120
Aug 24 15:27:24 debian kernel:  [tcp_transmit_skb+984/1760] tcp_transmit_skb+0x3d8/0x6e0
Aug 24 15:27:24 debian kernel:  [tcp_ack_saw_tstamp+24/64] tcp_ack_saw_tstamp+0x18/0x40
Aug 24 15:27:24 debian kernel:  [tcp_write_xmit+326/704] tcp_write_xmit+0x146/0x2c0
Aug 24 15:27:24 debian kernel:  [__tcp_data_snd_check+78/240] __tcp_data_snd_check+0x4e/0xf0
Aug 24 15:27:24 debian kernel:  [__kfree_skb+168/320] __kfree_skb+0xa8/0x140
Aug 24 15:27:24 debian kernel:  [tcp_rcv_established+1029/1968] tcp_rcv_established+0x405/0x7b0
Aug 24 15:27:24 debian kernel:  [tcp_v4_do_rcv+240/256] tcp_v4_do_rcv+0xf0/0x100
Aug 24 15:27:24 debian kernel:  [__release_sock+82/128] __release_sock+0x52/0x80
Aug 24 15:27:24 debian kernel:  [release_sock+72/128] release_sock+0x48/0x80
Aug 24 15:27:24 debian kernel:  [tcp_sendmsg+1927/4176] tcp_sendmsg+0x787/0x1050
Aug 24 15:27:24 debian kernel:  [recalc_task_prio+187/432] recalc_task_prio+0xbb/0x1b0
Aug 24 15:27:24 debian kernel:  [inet_sendmsg+74/112] inet_sendmsg+0x4a/0x70
Aug 24 15:27:24 debian kernel:  [sock_sendmsg+190/240] sock_sendmsg+0xbe/0xf0
Aug 24 15:27:24 debian kernel:  [update_atime+71/176] update_atime+0x47/0xb0
Aug 24 15:27:24 debian kernel:  [do_generic_mapping_read+688/1088] do_generic_mapping_read+0x2b0/0x440
Aug 24 15:27:24 debian kernel:  [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Aug 24 15:27:24 debian kernel:  [file_read_actor+0/240] file_read_actor+0x0/0xf0
Aug 24 15:27:24 debian kernel:  [sockfd_lookup+22/144] sockfd_lookup+0x16/0x90
Aug 24 15:27:24 debian kernel:  [sys_sendto+216/272] sys_sendto+0xd8/0x110
Aug 24 15:27:24 debian kernel:  [do_select+684/720] do_select+0x2ac/0x2d0
Aug 24 15:27:24 debian kernel:  [__pollwait+0/192] __pollwait+0x0/0xc0
Aug 24 15:27:24 debian kernel:  [sys_send+51/64] sys_send+0x33/0x40
Aug 24 15:27:24 debian kernel:  [sys_socketcall+322/592] 
sys_socketcall+0x142/0x250
Aug 24 15:27:24 debian kernel:  [do_gettimeofday+26/192] 
do_gettimeofday+0x1a/0xc0
Aug 24 15:27:24 debian kernel:  [sys_time+22/80] sys_time+0x16/0x50
Aug 24 15:27:24 debian kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Aug 24 15:27:24 debian kernel: smbd: page allocation failure. order:3, mode:0x20
Aug 24 15:27:24 debian kernel:  [__alloc_pages+477/784] 
__alloc_pages+0x1dd/0x310
Aug 24 15:27:24 debian kernel:  [__get_free_pages+24/64] 
__get_free_pages+0x18/0x40
Aug 24 15:27:24 debian kernel:  [kmem_getpages+25/176] kmem_getpages+0x19/0xb0
Aug 24 15:27:24 debian kernel:  [cache_grow+182/400] cache_grow+0xb6/0x190
Aug 24 15:27:24 debian kernel:  [cache_alloc_refill+531/592] 
cache_alloc_refill+0x213/0x250
Aug 24 15:27:24 debian kernel:  [__kmalloc+92/96] __kmalloc+0x5c/0x60
Aug 24 15:27:24 debian kernel:  [alloc_skb+65/240] alloc_skb+0x41/0xf0
Aug 24 15:27:24 debian kernel:  [skb_copy+40/192] skb_copy+0x28/0xc0
Aug 24 15:27:24 debian kernel:  [skb_checksum_help+86/368] 
skb_checksum_help+0x56/0x170
Aug 24 15:27:24 debian kernel:  [pg0+71789271/1069305856] 
ip_nat_fn+0x177/0x2a0 [iptable_nat]
Aug 24 15:27:24 debian kernel:  [pg0+71789854/1069305856] 
ip_nat_local_fn+0x6e/0xb0 [iptable_nat]
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [nf_iterate+85/160] nf_iterate+0x55/0xa0
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [nf_hook_slow+120/288] nf_hook_slow+0x78/0x120
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [ip_queue_xmit+781/1328] 
ip_queue_xmit+0x30d/0x530
Aug 24 15:27:24 debian kernel:  [dst_output+0/32] dst_output+0x0/0x20
Aug 24 15:27:24 debian kernel:  [tcp_transmit_skb+984/1760] 
tcp_transmit_skb+0x3d8/0x6e0
Aug 24 15:27:24 debian kernel:  [tcp_ack_saw_tstamp+24/64] 
tcp_ack_saw_tstamp+0x18/0x40
Aug 24 15:27:24 debian kernel:  [smp_apic_timer_interrupt+42/160] 
smp_apic_timer_interrupt+0x2a/0xa0
Aug 24 15:27:24 debian kernel:  [apic_timer_interrupt+26/32] 
apic_timer_interrupt+0x1a/0x20
Aug 24 15:27:24 debian kernel:  [tcp_v4_send_check+177/272] 
tcp_v4_send_check+0xb1/0x110
Aug 24 15:27:24 debian kernel:  [tcp_cwnd_restart+35/256] 
tcp_cwnd_restart+0x23/0x100
Aug 24 15:27:24 debian kernel:  [tcp_transmit_skb+984/1760] 
tcp_transmit_skb+0x3d8/0x6e0
Aug 24 15:27:24 debian kernel:  [buffered_rmqueue+259/528] 
buffered_rmqueue+0x103/0x210
Aug 24 15:27:24 debian kernel:  [tcp_write_xmit+326/704] 
tcp_write_xmit+0x146/0x2c0
Aug 24 15:27:24 debian kernel:  [tcp_sendmsg+4088/4176] 
tcp_sendmsg+0xff8/0x1050
Aug 24 15:27:24 debian kernel:  [inet_sendmsg+74/112] inet_sendmsg+0x4a/0x70
Aug 24 15:27:24 debian kernel:  [sock_sendmsg+190/240] sock_sendmsg+0xbe/0xf0
Aug 24 15:27:24 debian kernel:  [__copy_to_user_ll+56/112] 
__copy_to_user_ll+0x38/0x70
Aug 24 15:27:24 debian kernel:  [update_atime+155/176] update_atime+0x9b/0xb0
Aug 24 15:27:24 debian kernel:  [do_generic_mapping_read+688/1088] 
do_generic_mapping_read+0x2b0/0x440
Aug 24 15:27:24 debian kernel:  [autoremove_wake_function+0/80] 
autoremove_wake_function+0x0/0x50
Aug 24 15:27:24 debian kernel:  [file_read_actor+0/240] 
file_read_actor+0x0/0xf0
Aug 24 15:27:24 debian kernel:  [sockfd_lookup+22/144] sockfd_lookup+0x16/0x90
Aug 24 15:27:24 debian kernel:  [sys_sendto+216/272] sys_sendto+0xd8/0x110
Aug 24 15:27:24 debian kernel:  [do_select+684/720] do_select+0x2ac/0x2d0
Aug 24 15:27:24 debian kernel:  [__pollwait+0/192] __pollwait+0x0/0xc0
Aug 24 15:27:24 debian kernel:  [sys_send+51/64] sys_send+0x33/0x40
Aug 24 15:27:24 debian kernel:  [sys_socketcall+322/592] 
sys_socketcall+0x142/0x250
Aug 24 15:27:24 debian kernel:  [do_gettimeofday+26/192] 
do_gettimeofday+0x1a/0xc0
Aug 24 15:27:24 debian kernel:  [sys_time+22/80] sys_time+0x16/0x50
Aug 24 15:27:24 debian kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Aug 24 15:27:24 debian kernel: smbd: page allocation failure. order:3, mode:0x20
Aug 24 15:27:24 debian kernel: Stack pointer is garbage, not printing trace


* Re: page allocation failure
  2004-08-24 20:05 Dominik Karall
@ 2004-08-24 20:05 ` Andrew Morton
  2004-08-24 22:57   ` David S. Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2004-08-24 20:05 UTC (permalink / raw)
  To: Dominik Karall; +Cc: linux-kernel

Dominik Karall <dominik.karall@gmx.net> wrote:
>
> Is this a kernel bug or an smbd failure? I think it could be caused by the
>  kernel running low on memory, because the machine has only 56MB of RAM. But
>  IMHO the kernel shouldn't handle it this way. Running 2.6.8-rc4-mm1.

It's networking trying to allocate eight physically-contiguous pages with
GFP_ATOMIC.  Can you say "snowball's chance in hell"?

Probably we should kill off those noisy printk's, or make them dependent on
some debugging option.  But we keep on finding quite serious cases, such as
this one.

Sure, networking will recover from memory allocation failures - presumably
by dropping a packet.  But if it's doing frequent atomic order-3 allocations
then it will end up dropping a *lot* of packets, and performance will suffer.
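
For readers decoding the log line: the "order:3" in the failure message is
the page allocator's order, so the request is for 2^3 = 8 physically
contiguous pages. A minimal sketch of that arithmetic, assuming 4 KiB pages
as on i386 (the helper names and ASSUMED_PAGE_SIZE are illustrative only,
not kernel APIs):

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on i386; other architectures may differ. */
#define ASSUMED_PAGE_SIZE 4096UL

/* An order-N request asks for 2^N physically contiguous pages. */
unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}

/* Size in bytes of an order-N request under the assumed page size. */
unsigned long bytes_for_order(unsigned int order)
{
    return pages_for_order(order) * ASSUMED_PAGE_SIZE;
}
```

Under these assumptions, pages_for_order(3) is 8 and bytes_for_order(3) is
32768, i.e. one 32 KiB contiguous block, whereas the order:0 failures
reported earlier in the thread are single-page requests.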


* Re: page allocation failure
  2004-08-24 20:05 ` Andrew Morton
@ 2004-08-24 22:57   ` David S. Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David S. Miller @ 2004-08-24 22:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dominik.karall, linux-kernel

On Tue, 24 Aug 2004 13:05:31 -0700
Andrew Morton <akpm@osdl.org> wrote:

> It's networking trying to allocate eight physically-contiguous pages with
> GFP_ATOMIC.  Can you say "snowball's chance in hell"?

It's netfilter's ip_nat_fn() calling skb_checksum_help() which does
an skb_copy() if the skb is either shared or cloned.

This patch from Herbert Xu, which I may integrate tonight, should
help with this case.

===== net/core/dev.c 1.155 vs edited =====
--- 1.155/net/core/dev.c	2004-07-31 07:23:04 +10:00
+++ edited/net/core/dev.c	2004-08-24 20:59:57 +10:00
@@ -1144,16 +1144,10 @@
 		goto out;
 	}
 
-	if (skb_shared(*pskb)  || skb_cloned(*pskb)) {
-		struct sk_buff *newskb = skb_copy(*pskb, GFP_ATOMIC);
-		if (!newskb) {
-			ret = -ENOMEM;
+	if (skb_cloned(*pskb)) {
+		ret = pskb_expand_head(*pskb, 0, 0, GFP_ATOMIC);
+		if (ret)
 			goto out;
-		}
-		if ((*pskb)->sk)
-			skb_set_owner_w(newskb, (*pskb)->sk);
-		kfree_skb(*pskb);
-		*pskb = newskb;
 	}
 
 	if (offset > (int)(*pskb)->len)



Thread overview: 13+ messages
2004-01-19 11:36 page allocation failure Oliver Kiddle
2004-01-19 14:54 ` Mike Fedyk
2004-01-19 17:29   ` Oliver Kiddle
2004-01-19 18:12     ` Mike Fedyk
2004-01-20  3:38 ` Andrew Morton
2004-01-20  6:00   ` Nathan Scott
2004-01-20 17:08   ` Oliver Kiddle
2004-01-20 18:35     ` Mike Fedyk
2004-01-22  9:29       ` Oliver Kiddle
2004-01-22  9:59         ` Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2004-08-24 20:05 Dominik Karall
2004-08-24 20:05 ` Andrew Morton
2004-08-24 22:57   ` David S. Miller
