From: Stefan Schmidt <zaphodb@zaphods.net>
To: Andrew Morton <akpm@osdl.org>
Cc: marcelo.tosatti@cyclades.com, linux-kernel@vger.kernel.org,
piggin@cyberone.com.au
Subject: 2.6.10-rc1-mm4 -1 EAGAIN after allocation failure was: Re: Kernel 2.6.9 Multiple Page Allocation Failures
Date: Wed, 10 Nov 2004 02:27:34 +0100
Message-ID: <20041110012733.GD20754@zaphods.net>
In-Reply-To: <20041109235201.GC20754@zaphods.net>
Alright, I have a funny thing here that I suspect is an(other?) VM issue:
We are running third-party closed-source software that handles many TCP
sessions and reads and writes their data to/from several disks/partitions.
With 2.6.10-rc1-mm4 we notice for the first time that, right after the kernel
logs a "swapper: page allocation failure" trace (just like the ones you already
know), the interrupt rate, connection count and traffic steadily decrease.
Here is part of a vmstat 10:
procs -----------memory----------- --swap- -----io----- ---system--- ------cpu------
 r  b   swpd   free  buff    cache  si  so    bi     bo     in    cs  us  sy  id  wa
 1  0  11312  19404   268  1896896   0   0  1091  17578  25463  1225   7  38  37  18
 0  0  11312  26372   180  1892836   0   0  1182  21433  25576  1216   7  38  31  24
 1  2  11308  23784   608  1890168   0   0  1252  20667  25532  1243   7  40  24  29
 0  2  11308  23304   428  1890552   0   0  1174  20363  25948  1332   7  40  32  21
 1  1  11304  18496   444  1893328   0   0  1630  20506  25840  1322   7  38  30  26
 1  1  11304   8712   232  1905508   0   0  1528  19662  26245  1305   7  40  25  28
 1  0  11304  18952   180  1894000   0   0  1595  19680  26275  1215   7  38  27  28
 1  0  11304  22404   132  1896632   0   0   369  17724  24072  1045   7  37  49   7
 1  0  11304  23956   492  1899876   0   0   504  19609  20829  1151   9  34  49   7
 1  0  11304  25380   460  1908340   0   0   424  17983  16964   927   9  28  55   8
 1  0  11304  18244   464  1922140   0   0   309  14431  13417   836  10  27  60   3
 0  0  11304  17720   472  1928388   0   0   224  11868   9933   607  11  23  63   3
 1  0  11304  25720   476  1924440   0   0   133   7663   6780   504  10  20  68   2
 1  0  11304  24156   488  1928168   0   0   107   6244   5011   315   8  18  73   1
 0  0  11304  19544   712  1934268   0   0    76   3191   4464   299   8  18  73   1
 0  0  11304  19248   728  1936564   0   0    23   1802   4002   249   7  17  76   0
 1  0  11304  27092   736  1929892   0   0    16   1336   3655   284   6  16  78   0
 0  0  11304  26472   752  1931984   0   0    19   1508   3408   248   5  16  78   1
 0  0  11304  19000   768  1940944   0   0    20   1398   3195   226   5  14  81   0
 1  0  11304  21460   776  1938896   0   0    14   1084   3057   241   5  14  82   0
 0  0  11304  26268   848  1934608   0   0    12    927   2906   218   5  13  82   0
 1  1  11304  22076   900  1939860   0   0    18    679   2897   215   5  11  84   1
 0  0  11304  25880   952  1936748   0   0    17    653   2713   251   4  13  82   1
 0  0  11304  20436   976  1942368   0   0     8   1117   2703   229   5  11  83   1
...
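The decay can be quantified directly from the vmstat rows. A minimal Python
sketch, assuming the column order shown in the vmstat header; only the first
and last rows of the excerpt are copied in, and the helper name parse_row is
made up for illustration:

```python
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

def parse_row(row):
    """Map one whitespace-separated vmstat row onto the column names."""
    values = [int(v) for v in row.split()]
    if len(values) != len(HEADER):
        raise ValueError("unexpected column count: %d" % len(values))
    return dict(zip(HEADER, values))

# First and last rows of the vmstat excerpt, copied verbatim.
first = parse_row("1 0 11312 19404 268 1896896 0 0 1091 17578 25463 1225 7 38 37 18")
last = parse_row("0 0 11304 20436 976 1942368 0 0 8 1117 2703 229 5 11 83 1")

# Compare the rate columns between the start and the end of the excerpt.
for column in ("in", "bo", "bi", "cs"):
    ratio = last[column] / first[column]
    print("%-2s %6d -> %6d (%.0f%% of the initial rate)"
          % (column, first[column], last[column], 100 * ratio))
```

Interrupts end at roughly a tenth of their initial rate and block writes
lower still, while the swpd and free columns show no comparable trend.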
strace shows:
01:38:50.316041 gettimeofday({1100047130, 316054}, NULL) = 0
01:38:50.316188 poll([{fd=5671, events=POLLIN}, {fd=2727, events=POLLIN}, {fd=6663, events=POLLIN}, {fd=197, events=POLLIN}, {fd=3978, events=POLLIN}, {fd=779, events=POLLIN}, ...{line continues like this}...
...
01:38:50.328056 accept(5, 0xbffd4ab8, [16]) = -1 EAGAIN (Resource temporarily unavailable) ...{an awful lot of these}...
...
01:38:50.329585 futex(0xaf1a698, FUTEX_WAIT, 92828, {0, 9964000}) = -1 ETIMEDOUT (Connection timed out) ...{some of these}...
...
Application says:
"n.n.n.n:p Client closed connection in body read"
To me it seems that all those open sockets suddenly become 'temporarily
unavailable' to the application, and so the connections time out.
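For reference, accept() on a non-blocking listening socket returns EAGAIN
whenever no completed connection is waiting in the accept queue, so the
errors in the strace may simply mean that no new connections are arriving
rather than that the existing sockets are failing. A minimal Python sketch
of that behaviour (not the application's code; the loopback address, port 0
and the function name are arbitrary choices for illustration):

```python
import errno
import socket

def accept_would_block():
    """Return the errno raised by accept() on an idle non-blocking listener."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(16)
        listener.setblocking(False)
        try:
            conn, _ = listener.accept()  # nobody has connected, so this cannot succeed
        except BlockingIOError as exc:
            return exc.errno
        conn.close()
        return 0
    finally:
        listener.close()

if __name__ == "__main__":
    err = accept_would_block()
    print(errno.errorcode.get(err, err))
```

If this reading applies here, the interesting question is why incoming
connections stop reaching the accept queue, not why accept() fails.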
I have not (yet?) seen this behaviour on 2.6.9, 2.6.9-mm1, 2.6.10-rc1-bk12 or
2.6.10-rc1-mm3.
I can reproduce the behaviour by starting iptraf or tethereal under the same
load. (At first I thought it might be because of promiscuous mode.)
This seems to be what _might_ have triggered it, although it was logged
5 minutes before the traffic decay:
printk: 36 messages suppressed.
swapper: page allocation failure. order:0, mode:0x20
[__alloc_pages+525/912] __alloc_pages+0x20d/0x390
[__get_free_pages+24/48] __get_free_pages+0x18/0x30
[kmem_getpages+24/192] kmem_getpages+0x18/0xc0
[cache_grow+157/304] cache_grow+0x9d/0x130
[cache_alloc_refill+380/576] cache_alloc_refill+0x17c/0x240
[__kmalloc+122/144] __kmalloc+0x7a/0x90
[alloc_skb+50/208] alloc_skb+0x32/0xd0
[tg3_alloc_rx_skb+112/304] tg3_alloc_rx_skb+0x70/0x130
[tg3_rx+518/944] tg3_rx+0x206/0x3b0
[tg3_poll+139/336] tg3_poll+0x8b/0x150
[net_rx_action+125/288] net_rx_action+0x7d/0x120
[__do_softirq+184/208] __do_softirq+0xb8/0xd0
[do_softirq+45/48] do_softirq+0x2d/0x30
[do_IRQ+30/48] do_IRQ+0x1e/0x30
[common_interrupt+26/32] common_interrupt+0x1a/0x20
[default_idle+0/64] default_idle+0x0/0x40
[default_idle+44/64] default_idle+0x2c/0x40
[cpu_idle+51/64] cpu_idle+0x33/0x40
[start_kernel+331/368] start_kernel+0x14b/0x170
[unknown_bootoption+0/432] unknown_bootoption+0x0/0x1b0
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
Free pages: 4616kB (1600kB HighMem)
Active:504159 inactive:454759 dirty:20020 writeback:115 unstable:0 free:1154 slab:50758 mapped:489095 pagetables:1222
DMA free:56kB min:144kB low:288kB high:432kB active:1936kB inactive:4932kB present:16384kB pages_scanned:32 all_unreclaimable? no
protections[]: 0 0 0
Normal free:2960kB min:8044kB low:16088kB high:24132kB active:492320kB inactive:166992kB present:901120kB pages_scanned:62 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:1600kB min:512kB low:1024kB high:1536kB active:1522380kB inactive:1647112kB present:3178432kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2960kB
HighMem: 6*4kB 3*8kB 41*16kB 0*32kB 6*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1600kB
Swap cache: add 154147, delete 151810, find 29532/39794, race 0+0
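The dump is internally consistent: the per-order free lists add up to the
reported totals, and both lowmem zones (DMA and Normal) sit below their min
watermark. HighMem still has free pages, but kmalloc() memory for an skb
has to come from lowmem, so that does not help the tg3 receive path. A
minimal Python sketch of the arithmetic, with all numbers copied from the
dump; the helper names are made up for illustration, and reading mode:0x20
as an atomic (non-sleeping) allocation is an assumption based on the
softirq call path rather than something the log states:

```python
import re

# Per-zone free-list lines copied from the dump.
FREE_LISTS = [
    "DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB "
    "0*1024kB 0*2048kB 0*4096kB = 56kB",
    "Normal: 0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 0*128kB 1*256kB 1*512kB "
    "0*1024kB 1*2048kB 0*4096kB = 2960kB",
    "HighMem: 6*4kB 3*8kB 41*16kB 0*32kB 6*64kB 0*128kB 0*256kB 1*512kB "
    "0*1024kB 0*2048kB 0*4096kB = 1600kB",
]

# (free, min) in kB, copied from the per-zone summary lines.
WATERMARKS = {"DMA": (56, 144), "Normal": (2960, 8044), "HighMem": (1600, 512)}

def buddy_total_kb(line):
    """Return (zone, summed kB, reported kB) for one free-list line."""
    zone, rest = line.split(":", 1)
    terms, reported = rest.rsplit("=", 1)
    total = sum(int(count) * int(size)
                for count, size in re.findall(r"(\d+)\*(\d+)kB", terms))
    return zone.strip(), total, int(reported.strip()[:-2])  # drop "kB"

def zones_below_min(watermarks):
    """List the zones whose free memory is below the min watermark."""
    return [zone for zone, (free, minimum) in watermarks.items() if free < minimum]

for line in FREE_LISTS:
    zone, total, reported = buddy_total_kb(line)
    print(zone, total, reported, "ok" if total == reported else "MISMATCH")
print("below min:", zones_below_min(WATERMARKS))
```

With lowmem this far below min, an allocation that cannot sleep and wait
for reclaim has very little to draw on, which would be consistent with the
order:0 failure in the receive path.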
Stefan