2.6.10: kswapd spins like crazy


* 2.6.10: kswapd spins like crazy
@ 2005-02-03 10:29 Terje Fåberg
  2005-02-03 10:47 ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Terje Fåberg @ 2005-02-03 10:29 UTC (permalink / raw)
  To: linux-kernel

I recently upgraded my desktop from 2.4.28 to
2.6.10. Even under moderate memory pressure kswapd
regularly eats almost all available cpu time 
whenever there is a little more IO throughput,
like copying large files. The system is extremely
sluggish during this. The system load goes up to 
7.5 or more.

This is a Pentium3-866 with 768MB RAM, 2x1GB 
swap partitions, vanilla 2.6.10. The strange 
behaviour starts at about 200 MB of swap in use.
2.4.28 masters the same workload without any
problems.

vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 6  1 428012   4868  33236 347184   10    7   147   120  108   111 19 10 68  3

Is there anything I can do to track this down?

Regards, 
Terje


* Re: 2.6.10: kswapd spins like crazy
  2005-02-03 10:29 2.6.10: kswapd spins like crazy Terje Fåberg
@ 2005-02-03 10:47 ` Nick Piggin
  2005-02-03 11:54   ` Terje Fåberg
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2005-02-03 10:47 UTC (permalink / raw)
  To: Terje Fåberg; +Cc: linux-kernel

On Thu, 2005-02-03 at 11:29 +0100, Terje Fåberg wrote:
> I recently upgraded my desktop from 2.4.28 to
> 2.6.10. Even under moderate memory pressure kswapd
> regularly eats almost all available cpu time 
> whenever there is a little more IO throughput,
> like copying large files. The system is extremely
> sluggish during this. The system load goes up to 
> 7.5 or more.
>  
> This is a Pentium3-866 with 768MB RAM, 2x1GB 
> swap partitions, vanilla 2.6.10. The strange 
> behaviour starts at about 200 MB of swap in use.
> 2.4.28 masters the same workload without any
> problems.
> 
> vmstat:
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  6  1 428012   4868  33236 347184   10    7   147   120  108   111 19 10 68  3
> 
> Is there anything I can do to track this down?
> 

Can you post about 10 seconds of `vmstat 1` output
while this is happening?

Also:
`cat /proc/vmstat > pre ; sleep 10 ; cat /proc/vmstat > post`
while this is happening, and send the pre and post files.

cat /proc/meminfo also might be helpful.

And compile a kernel with "magic sysrq" support, and get a
couple of Alt+SysRq+M dumps (the output will be in dmesg).

Thanks,
Nick
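
The pre/post capture requested above is easiest to read as a per-counter difference. A minimal sketch of that comparison follows; the helper names `parse_vmstat` and `vmstat_delta` are made up for illustration, and the only assumption about the input is the plain "name value" line format that /proc/vmstat uses.

```python
def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <unsigned integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def vmstat_delta(pre_text, post_text):
    """Return {counter: post - pre} for every counter whose value changed."""
    pre = parse_vmstat(pre_text)
    post = parse_vmstat(post_text)
    return {
        name: post[name] - pre[name]
        for name in post
        if name in pre and post[name] != pre[name]
    }


# Tiny illustrative snapshots (hypothetical values, not from the report).
sample_pre = "pgscan_kswapd_dma 100\npgsteal_dma 7\n"
sample_post = "pgscan_kswapd_dma 350\npgsteal_dma 7\n"
print(vmstat_delta(sample_pre, sample_post))
```

To use it on real captures, read the contents of the `pre` and `post` files and pass the two strings to `vmstat_delta`; counters that did not move are left out so the busy ones stand out.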





* Re: 2.6.10: kswapd spins like crazy
  2005-02-03 10:47 ` Nick Piggin
@ 2005-02-03 11:54   ` Terje Fåberg
  2005-02-03 19:50     ` Terje Fåberg
  0 siblings, 1 reply; 13+ messages in thread
From: Terje Fåberg @ 2005-02-03 11:54 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 616 bytes --]

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Can you post about 10 seconds of `vmstat 1` output
> while this is happening?
> 
> Also:
> `cat /proc/vmstat > pre ; sleep 10 ; cat
> /proc/vmstat > post`
> while this is happening, and send the pre and post
> files.
> 
> cat /proc/meminfo also might be helpful.

You will find those attached.

> And compile a kernel with "magic sysrq" support, 
> and get a couple of Alt+SysRq+M dumps (the output
> will be in dmesg).

The kernel is compiling right now, but I cannot 
reboot this machine until six or seven o'clock
tonight (CET). I will report then.

Regards,
Terje

[-- Attachment #2: stat --]
[-- Type: text/plain, Size: 3238 bytes --]


galileo:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  5 428692   4964 118492 320196   84   72  6560   696 1561  8993 40 60  0  0
 5  3 428884   3392 118516 318960  140  368  6832  1172 1517  8563 40 60  0  0
 5  3 429908   4888 120548 318092  108 1844  5812  2020 1498  7842 53 47  0  0
 4  4 430076   3296 121876 318472  340  184  6900   604 1396  8502 43 57  0  0
 5  3 430120   4776 121748 316820   80  440  6780   440 1391  8360 34 66  0  0
 4  4 430112   4916 123016 317304  376   68  7056   440 1293  8852 23 77  0  0
 4  7 430096   4916 123576 316468  348   60  7324   204 1233  8290 21 79  0  0
 5  3 430032  14084 129040 316960  192    0  6664   464 1380  8403 24 76  0  0
 4  4 430032   7044 135060 317516  244    0  6424     0 1166  8217 17 83  0  0
 5  3 430064   4548 138072 317388  172  216  6364   216 1176  8312 17 83  0  0
 2  3 430132   4856 139000 316860  252  156  6656   872 1311  8125 19 81  0  0
^C

galileo:~# cat /proc/meminfo
MemTotal:       646052 kB
MemFree:          3296 kB
Buffers:        156912 kB
Cached:         314876 kB
SwapCached:      47524 kB
Active:          92792 kB
Inactive:       447588 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       646052 kB
LowFree:          3296 kB
SwapTotal:     2101056 kB
SwapFree:      1661600 kB
Dirty:           12088 kB
Writeback:           0 kB
Mapped:         103476 kB
Slab:            30032 kB
CommitLimit:   2424080 kB
Committed_AS:  3125208 kB
PageTables:       8440 kB
VmallocTotal:   384980 kB
VmallocUsed:      7392 kB
VmallocChunk:   377500 kB

galileo:~# cat /proc/vmstat > pre ; sleep 10 ; cat /proc/vmstat > post

galileo:~# cat pre
nr_dirty 61
nr_writeback 138
nr_unstable 0
nr_page_table_pages 2118
nr_mapped 24903
nr_slab 7494
pgpgin 40072965
pgpgout 32683347
pswpin 707678
pswpout 491400
pgalloc_high 0
pgalloc_normal 289749372
pgalloc_dma 5185962
pgfree 294936222
pgactivate 7678427
pgdeactivate 7086934
pgfault 76930918
pgmajfault 422426
pgrefill_high 0
pgrefill_normal 63766162
pgrefill_dma 3133019
pgsteal_high 0
pgsteal_normal 11946755
pgsteal_dma 855413
pgscan_kswapd_high 0
pgscan_kswapd_normal 31430190
pgscan_kswapd_dma 2037500863
pgscan_direct_high 0
pgscan_direct_normal 1083423
pgscan_direct_dma 89251
pginodesteal 0
slabs_scanned 15591040
kswapd_steal 12527148
kswapd_inodesteal 2803439
pageoutrun 3511541
allocstall 6111
pgrotated 719114

galileo:~# cat post
nr_dirty 504
nr_writeback 38
nr_unstable 0
nr_page_table_pages 2093
nr_mapped 25652
nr_slab 7488
pgpgin 40106505
pgpgout 32695255
pswpin 710721
pswpout 491907
pgalloc_high 0
pgalloc_normal 289790611
pgalloc_dma 5185979
pgfree 294977468
pgactivate 7680721
pgdeactivate 7089056
pgfault 76933748
pgmajfault 423145
pgrefill_high 0
pgrefill_normal 63776342
pgrefill_dma 3133311
pgsteal_high 0
pgsteal_normal 11957164
pgsteal_dma 855422
pgscan_kswapd_high 0
pgscan_kswapd_normal 31443126
pgscan_kswapd_dma 2038597486
pgscan_direct_high 0
pgscan_direct_normal 1100385
pgscan_direct_dma 90604
pginodesteal 0
slabs_scanned 15596032
kswapd_steal 12531233
kswapd_inodesteal 2803526
pageoutrun 3511829
allocstall 6272
pgrotated 719582


* Re: 2.6.10: kswapd spins like crazy
  2005-02-03 11:54   ` Terje Fåberg
@ 2005-02-03 19:50     ` Terje Fåberg
  2005-02-04  0:12       ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Terje Fåberg @ 2005-02-03 19:50 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]

Terje Fåberg <terje_fb@yahoo.no> wrote:

> The kernel is compiling right now, but I cannot 
> reboot this machine until six or seven o'clock
> tonight (CET). I will report then.

Well, I rebooted into the same kernel, now with
MAGIC-SYSRQ enabled.  At first the kswapd effect
wouldn't show up, but now the picture is much clearer
than before: kswapd constantly eats 95% CPU time while
the system is "idle".

The system is quite sluggish. Switching between
applications takes ages. After Eclipse has been active
for a few minutes, it takes 45 seconds until enough
of Mozilla is swapped back in and Mozilla has redrawn
its window.

Complete info including SysRq-Meminfo is attached.

Regards,
Terje

[-- Attachment #2: stat2 --]
[-- Type: text/plain, Size: 4946 bytes --]


galileo:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0 286576   4984   8712 352232    0    0     0    88 1015  4288  5 95  0  0
 1  0 286576   4984   8712 352232    0    0     0     0 1002  4278  5 95  0  0
 1  0 286576   4984   8712 352232   32    0    32     0 1003  4365  7 93  0  0
 1  0 286576   4984   8736 352256    0    0    40     0 1010  4296  6 94  0  0
 2  1 286696   4856   8756 352328    0  120   920   120 1081  4406  4 96  0  0
 1  0 287068   4936   8832 352496  104  448   588   568 1072  4422  7 93  0  0
 1  0 287068   4936   8832 352496    0    0     0     0 1002  4275  5 95  0  0
 1  0 287068   4936   8832 352496    0    0     0     0 1002  4289  6 94  0  0
 1  0 287068   4936   8832 352496    0    0     0     0 1002  4324  6 94  0  0
 1  0 287068   4936   8832 352496    0    0     0     0 1002  4285  5 95  0  0
 1  0 287068   4936   8832 352496    0    0     0     0 1002  4271  5 95  0  0
 1  0 287068   4936   8832 352496    0    0     0     0 1002  4335  6 94  0  0
 1  0 287068   4936   8832 352496    0    0     0     0 1002  4297  5 95  0  0

galileo:~# cat /proc/vmstat > pre ; sleep 10 ; cat /proc/vmstat > post

galileo:~# cat pre
nr_dirty 201
nr_writeback 0
nr_unstable 0
nr_page_table_pages 1667
nr_mapped 113889
nr_slab 4289
pgpgin 1653048
pgpgout 532204
pswpin 67956
pswpout 84224
pgalloc_high 0
pgalloc_normal 6255968
pgalloc_dma 91765
pgfree 6350163
pgactivate 381383
pgdeactivate 364613
pgfault 2110239
pgmajfault 36305
pgrefill_high 0
pgrefill_normal 4903463
pgrefill_dma 116195
pgsteal_high 0
pgsteal_normal 366259
pgsteal_dma 17568
pgscan_kswapd_high 0
pgscan_kswapd_normal 2504667
pgscan_kswapd_dma 615532032
pgscan_direct_high 0
pgscan_direct_normal 60489
pgscan_direct_dma 11979
pginodesteal 0
slabs_scanned 510336
kswapd_steal 364044
kswapd_inodesteal 99903
pageoutrun 105762
allocstall 435
pgrotated 77400

galileo:~# cat post
nr_dirty 31
nr_writeback 0
nr_unstable 0
nr_page_table_pages 1667
nr_mapped 113890
nr_slab 4285
pgpgin 1653308
pgpgout 532340
pswpin 67956
pswpout 84224
pgalloc_high 0
pgalloc_normal 6290302
pgalloc_dma 91765
pgfree 6384390
pgactivate 381413
pgdeactivate 364613
pgfault 2110638
pgmajfault 36308
pgrefill_high 0
pgrefill_normal 4903463
pgrefill_dma 116195
pgsteal_high 0
pgsteal_normal 366259
pgsteal_dma 17568
pgscan_kswapd_high 0
pgscan_kswapd_normal 2504667
pgscan_kswapd_dma 649881006
pgscan_direct_high 0
pgscan_direct_normal 60489
pgscan_direct_dma 11979
pginodesteal 0
slabs_scanned 514944
kswapd_steal 364044
kswapd_inodesteal 99903
pageoutrun 111269
allocstall 435
pgrotated 77400

galileo:~# cat /proc/meminfo
MemTotal:       645976 kB
MemFree:          8776 kB
Buffers:          9228 kB
Cached:         350380 kB
SwapCached:      74776 kB
Active:         443452 kB
Inactive:        97500 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       645976 kB
LowFree:          8776 kB
SwapTotal:     2101056 kB
SwapFree:      1812348 kB
Dirty:              56 kB
Writeback:           0 kB
Mapped:         455596 kB
Slab:            17124 kB
CommitLimit:   2424044 kB
Committed_AS:  1216312 kB
PageTables:       6668 kB
VmallocTotal:   384980 kB
VmallocUsed:     16420 kB
VmallocChunk:   367568 kB

galileo:~# uname -a
Linux galileo 2.6.10-4 #7 Thu Feb 3 16:34:30 CET 2005 i686 GNU/Linux

galileo:~# uptime
 20:39:55 up 50 min,  2 users,  load average: 4.54, 3.05, 2.25

galileo:~# ps aux | grep kswapd
root       105 34.5  0.0     0    0 ?        R    19:49  17:27 [kswapd0]
root      8111  0.0  0.0  1548  444 pts/4    S+   20:39   0:00 grep kswapd

galileo:~# dmesg 
[...]
SysRq : Show Memory
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty

Free pages:        7872kB (0kB HighMem)
Active:48698 inactive:86241 dirty:0 writeback:0 unstable:0 free:1968 slab:4509 mapped:50560 pagetables:1717
DMA free:80kB min:80kB low:100kB high:120kB active:0kB inactive:11716kB present:16384kB pages_scanned:123 all_unreclaimable? no
protections[]: 0 0 0
Normal free:7792kB min:3152kB low:3940kB high:4728kB active:194792kB inactive:333248kB present:638976kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 0*4kB 0*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80kB
Normal: 590*4kB 153*8kB 59*16kB 4*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 7792kB
HighMem: empty
Swap cache: add 173594, delete 157410, find 30045/43843, race 0+0
Free swap:       1763412kB
163840 pages of RAM
0 pages of HIGHMEM
9692 reserved pages
156561 pages shared
16184 pages swap cached



* Re: 2.6.10: kswapd spins like crazy
  2005-02-03 19:50     ` Terje Fåberg
@ 2005-02-04  0:12       ` Nick Piggin
  2005-02-04  0:56         ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2005-02-04  0:12 UTC (permalink / raw)
  To: Terje Fåberg; +Cc: linux-kernel, Andrew Morton, Linus Torvalds

Terje Fåberg wrote:
> Terje Fåberg <terje_fb@yahoo.no> wrote:
> 
> 
>>The kernel is compiling right now, but I cannot 
>>reboot this machine until six or seven o'clock
>>tonight (CET). I will report then.
> 
> 
> Well, I rebooted into the same kernel, now with
> MAGIC-SYSRQ enabled.  At first the kswapd effect
> wouldn't show up, but now the picture is much clearer
> than before: kswapd constantly eats 95% CPU time while
> the system is "idle".
> 
> The system is quite sluggish. Switching between
> applications takes ages. After Eclipse has been active
> for a few minutes, it takes 45 seconds until enough
> of Mozilla is swapped back in and Mozilla has redrawn
> its window.
> 
> Complete info including SysRq-Meminfo is attached.
> 

Thanks very much, this is a good help.

> galileo:~# cat /proc/vmstat > pre ; sleep 10 ; cat /proc/vmstat > post
> 
> galileo:~# cat pre
...
> pgscan_kswapd_high 0
> pgscan_kswapd_normal 2504667
> pgscan_kswapd_dma 615532032
...
> 
> galileo:~# cat post
...
> pgscan_kswapd_high 0
> pgscan_kswapd_normal 2504667
> pgscan_kswapd_dma 649881006
...

So we can see it is trying to scan the DMA zone.
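
To put numbers on that, here is a small worked calculation using the counters from the two attached pre/post captures. It is only a sketch: it assumes the `sleep 10` interval was exactly 10 seconds, and it uses pgsteal_dma as a rough measure of pages actually reclaimed from the DMA zone. The function name `summarize` is invented for this example.

```python
INTERVAL_S = 10  # assumed: the "sleep 10" between the pre and post snapshots

# pgscan_kswapd_dma and pgsteal_dma values copied from the two attachments.
captures = {
    "first capture (heavy I/O)": {
        "scan_pre": 2037500863, "scan_post": 2038597486,
        "steal_pre": 855413, "steal_post": 855422,
    },
    "second capture (idle)": {
        "scan_pre": 615532032, "scan_post": 649881006,
        "steal_pre": 17568, "steal_post": 17568,
    },
}


def summarize(capture, interval_s=INTERVAL_S):
    """Return (pages scanned, pages stolen, pages scanned per second)."""
    scanned = capture["scan_post"] - capture["scan_pre"]
    stolen = capture["steal_post"] - capture["steal_pre"]
    return scanned, stolen, scanned // interval_s


for name, capture in captures.items():
    scanned, stolen, rate = summarize(capture)
    print(f"{name}: scanned {scanned} DMA pages, stole {stolen}, ~{rate} scans/s")
```

In the idle capture kswapd scanned roughly 34 million DMA-zone pages in the interval without reclaiming a single one, while pgscan_kswapd_normal did not move at all.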

> galileo:~# dmesg 
> [...]
> SysRq : Show Memory
> Mem-info:
> DMA per-cpu:
> cpu 0 hot: low 2, high 6, batch 1
> cpu 0 cold: low 0, high 2, batch 1
> Normal per-cpu:
> cpu 0 hot: low 32, high 96, batch 16
> cpu 0 cold: low 0, high 32, batch 16
> HighMem per-cpu: empty
> 
> Free pages:        7872kB (0kB HighMem)
> Active:48698 inactive:86241 dirty:0 writeback:0 unstable:0 free:1968 slab:4509 mapped:50560 pagetables:1717
> DMA free:80kB min:80kB low:100kB high:120kB active:0kB inactive:11716kB present:16384kB pages_scanned:123 all_unreclaimable? no
> protections[]: 0 0 0

This is the reason why: DMA has only 80K free, and kswapd won't stop until either 120K
is free or all_unreclaimable gets switched on.

Now clearly all_unreclaimable should be getting set if nothing can be reclaimed (although
it is possible that non-pagecache allocating and freeing can mess it up, that's unlikely).

Hmm, your DMA zone has no active pages, and pages_scanned (which triggers all_unreclaimable)
is only incremented when scanning the active list. But I wonder, if the pages can't be
freed, why aren't they being put on the active list?

Nick

PS. let's not release 2.6.11 just yet :\
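
The stop condition described above can be written down as a toy predicate. This is a simplified model of the rule as stated in this mail (keep reclaiming until free memory reaches the high watermark or the zone is flagged all_unreclaimable), not the kernel's actual code; the function name `kswapd_keeps_going` is hypothetical and the numbers come from the SysRq-M dump.

```python
def kswapd_keeps_going(free_kb, high_kb, all_unreclaimable):
    """Simplified model: a zone still needs reclaim while it is below its
    high watermark and has not been marked all_unreclaimable."""
    return free_kb < high_kb and not all_unreclaimable


# Values taken from the "Show Memory" output in the report.
dma = {"free_kb": 80, "high_kb": 120, "all_unreclaimable": False}
normal = {"free_kb": 7792, "high_kb": 4728, "all_unreclaimable": False}

print("DMA zone keeps kswapd busy:   ", kswapd_keeps_going(**dma))
print("Normal zone keeps kswapd busy:", kswapd_keeps_going(**normal))
```

Under this model only the DMA zone keeps kswapd running, and it can never satisfy either exit condition as long as nothing is freed and all_unreclaimable stays off.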



* Re: 2.6.10: kswapd spins like crazy
  2005-02-04  0:12       ` Nick Piggin
@ 2005-02-04  0:56         ` Nick Piggin
  2005-02-04  1:16           ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2005-02-04  0:56 UTC (permalink / raw)
  To: Terje Fåberg; +Cc: linux-kernel, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 336 bytes --]

Nick Piggin wrote:

> Hmm, your DMA zone has no active pages, and pages_scanned (which 
> triggers all_unreclaimable)
> is only incremented when scanning the active list. But I wonder, if the 
> pages can't be
> freed, why aren't they being put on the active list?

Oh, attached should be a minimal fix if you would like to try it out.

[-- Attachment #2: vmscan-minfix.patch --]
[-- Type: text/plain, Size: 491 bytes --]




---

 linux-2.6-npiggin/mm/vmscan.c |    1 +
 1 files changed, 1 insertion(+)

diff -puN mm/vmscan.c~vmscan-minfix mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vmscan-minfix	2005-02-04 11:52:37.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2005-02-04 11:53:32.000000000 +1100
@@ -575,6 +575,7 @@ static void shrink_cache(struct zone *zo
 			nr_taken++;
 		}
 		zone->nr_inactive -= nr_taken;
+		zone->pages_scanned += nr_scan;
 		spin_unlock_irq(&zone->lru_lock);
 
 		if (nr_taken == 0)

_
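
The effect of the one-line change can be illustrated with a toy simulation. This is a sketch under loudly stated assumptions, not the kernel's logic: the scan batch of 32 pages and the "pages_scanned >= 4 x LRU pages" threshold for setting all_unreclaimable are assumed values chosen for illustration, the function name `run_kswapd` is invented, and the model assumes that no inactive page is ever freed or activated. The zone size is the 11716kB of inactive DMA pages from the dump (2929 pages of 4kB, with an empty active list).

```python
def run_kswapd(inactive_pages, count_inactive_scans,
               max_passes=1000, batch=32, threshold_factor=4):
    """Toy model of kswapd looping over a zone whose inactive pages can
    neither be freed nor activated.

    Returns (passes_used, gave_up); gave_up means the zone would have been
    marked all_unreclaimable in this model.
    """
    pages_scanned = 0
    for passes in range(1, max_passes + 1):
        nr_scan = min(batch, inactive_pages)
        if count_inactive_scans:
            # Models the added line: zone->pages_scanned += nr_scan;
            pages_scanned += nr_scan
        # Assumed threshold: give up after scanning the LRU several times over.
        if pages_scanned >= inactive_pages * threshold_factor:
            return passes, True
    return max_passes, False


DMA_INACTIVE_PAGES = 11716 // 4  # 11716kB inactive, 4kB pages

print("without the change:", run_kswapd(DMA_INACTIVE_PAGES, False))
print("with the change:   ", run_kswapd(DMA_INACTIVE_PAGES, True))
```

Without the accounting, pages_scanned never grows because the active list is empty, so the loop only ends at the artificial pass limit. With it, the model gives up on the zone after a bounded number of passes.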


* Re: 2.6.10: kswapd spins like crazy
  2005-02-04  0:56         ` Nick Piggin
@ 2005-02-04  1:16           ` Andrew Morton
  2005-02-04  1:19             ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2005-02-04  1:16 UTC (permalink / raw)
  To: Nick Piggin; +Cc: terje_fb, linux-kernel, torvalds

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> Oh, attached should be a minimal fix if you would like to try it out.
> 
> 
> ...
> --- linux-2.6/mm/vmscan.c~vmscan-minfix	2005-02-04 11:52:37.000000000 +1100
> +++ linux-2.6-npiggin/mm/vmscan.c	2005-02-04 11:53:32.000000000 +1100
> @@ -575,6 +575,7 @@ static void shrink_cache(struct zone *zo
>  			nr_taken++;
>  		}
>  		zone->nr_inactive -= nr_taken;
> +		zone->pages_scanned += nr_scan;
>  		spin_unlock_irq(&zone->lru_lock);
>  
>  		if (nr_taken == 0)
> 

Any theories as to why these pages aren't being activated and aren't being
reclaimed?



* Re: 2.6.10: kswapd spins like crazy
  2005-02-04  1:16           ` Andrew Morton
@ 2005-02-04  1:19             ` Nick Piggin
  2005-02-04 10:26               ` Terje Fåberg
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2005-02-04  1:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: terje_fb, linux-kernel, torvalds


Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>>Oh, attached should be a minimal fix if you would like to try it out.
>>
>>
>>...
>>--- linux-2.6/mm/vmscan.c~vmscan-minfix	2005-02-04 11:52:37.000000000 +1100
>>+++ linux-2.6-npiggin/mm/vmscan.c	2005-02-04 11:53:32.000000000 +1100
>>@@ -575,6 +575,7 @@ static void shrink_cache(struct zone *zo
>> 			nr_taken++;
>> 		}
>> 		zone->nr_inactive -= nr_taken;
>>+		zone->pages_scanned += nr_scan;
>> 		spin_unlock_irq(&zone->lru_lock);
>> 
>> 		if (nr_taken == 0)
>>
> 
> 
> Any theories as to why these pages aren't being activated and aren't being
> reclaimed?
> 
> 

No, none yet, and that is what we should get to the bottom of. I must be
overlooking something, but the only causes I can see are transient
conditions like the page being locked or under writeback. laptop_mode?

Terje, what is /proc/sys/vm/laptop_mode set to?



* Re: 2.6.10: kswapd spins like crazy
  2005-02-04  1:19             ` Nick Piggin
@ 2005-02-04 10:26               ` Terje Fåberg
  2005-02-04 17:26                 ` Terje Fåberg
  0 siblings, 1 reply; 13+ messages in thread
From: Terje Fåberg @ 2005-02-04 10:26 UTC (permalink / raw)
  To: Nick Piggin, Andrew Morton; +Cc: terje_fb, linux-kernel, torvalds

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> No none yet, which is what we should get to the
> bottom of. I must be overlooking something, but the
> only ways I can see should be due to transient 
> conditions like page locked or under writeback. 
> laptop_mode?
> 
> Terje, what is /proc/sys/vm/laptop_mode set to?

0. I didn't touch any vm-specific options at all.

I just rebooted with your patch, and so far I can _not_
reproduce the problem. So far so good. But yesterday I
couldn't reproduce it straight away either.

I'll continue to do the same things I did yesterday
before kswapd started to spin. 

Regards,
Terje



* Re: 2.6.10: kswapd spins like crazy
  2005-02-04 10:26               ` Terje Fåberg
@ 2005-02-04 17:26                 ` Terje Fåberg
  2005-02-04 22:18                   ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Terje Fåberg @ 2005-02-04 17:26 UTC (permalink / raw)
  To: Nick Piggin, Andrew Morton; +Cc: terje_fb, linux-kernel, torvalds

Terje Fåberg <terje_fb@yahoo.no> wrote:

> I'll continue to do the same things I did yesterday
> before kswapd started to spin. 

Looks very good so far. I am unable to reproduce the
bad kswapd behaviour with your patch, Nick.

To double-check I booted into the old kernel an hour
ago and I _could_ reproduce the bad behaviour within a
few minutes. 

Looks like your patch fixes it for my workload.

Thanks a lot,
Terje


* Re: 2.6.10: kswapd spins like crazy
  2005-02-04 17:26                 ` Terje Fåberg
@ 2005-02-04 22:18                   ` Nick Piggin
  2005-02-05  7:12                     ` Terje Fåberg
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2005-02-04 22:18 UTC (permalink / raw)
  To: Terje Fåberg; +Cc: Andrew Morton, linux-kernel, torvalds

Terje Fåberg wrote:
> Terje Fåberg <terje_fb@yahoo.no> wrote:
> 
> 
>>I'll continue to do the same things I did yesterday
>>before kswapd started to spin. 
> 
> 
> Looks very good so far. I am unable to reproduce the
> bad kswapd behaviour with your patch, Nick.
> 
> To double-check I booted into the old kernel an hour
> ago and I _could_ reproduce the bad behaviour within a
> few minutes. 
> 
> Looks like your patch fixes it for my workload.
> 

OK that's good to know. At this stage it is only working
around the intermediate symptoms, and we might want a
different fix for 2.6.11...

So hopefully you'll be able to test a patch or two if
you get time.

Thanks,
Nick



* Re: 2.6.10: kswapd spins like crazy
  2005-02-04 22:18                   ` Nick Piggin
@ 2005-02-05  7:12                     ` Terje Fåberg
  0 siblings, 0 replies; 13+ messages in thread
From: Terje Fåberg @ 2005-02-05  7:12 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, torvalds

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> OK that's good to know. At this stage it is only
> working around the intermediate symptoms, and we
> might want a different fix for 2.6.11...
> 
> So hopefully you'll be able to test a patch or two
> if you get time.

Sure. Just drop me a mail.
I'm glad if I can help.

Regards,
Terje


* RE: 2.6.10: kswapd spins like crazy
@ 2005-02-04 16:16 Weathers, Norman R.
  0 siblings, 0 replies; 13+ messages in thread
From: Weathers, Norman R. @ 2005-02-04 16:16 UTC (permalink / raw)
  To: linux-kernel



We have had a similar problem with all kernels since 2.6.8.1.  It has
gotten so bad that we had to drop back to 2.6.7 with some extra patches
to get our systems working.  Our situation is a little bit different.

We are using SMP Opteron boxes as NFS servers.  Under almost any load at
all, kswapd goes nuts, taking up 99% of the CPU cycles for long periods
of time.  With 2.6.7, this has not been as bad: just periods of about
3 - 5 seconds at 10 - 35% utilization, then off for a few seconds, then
back again.  Sometimes kswapd lingers longer as the most aggressive
process in top, but with 2.6.7 the nfsd's are the most prevalent.

Also, we have noticed something else.  Our servers have dual Broadcom
gigabit nics (Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
(rev 03)).  We have bonded both NICS back to our core switch, both
running at gigabit speed.  Under different loads, we start to get call
traces in dmesg and the syslog.  An excerpt follows:


<Jan/06 03:50 pm>Call Trace:<IRQ> <ffffffff80158fa0>{__alloc_pages+816}
<ffffffff8013ffd3>{del_timer+115}
<Jan/06 03:50 pm>       <ffffffff80158fe0>{__get_free_pages+16}
<ffffffff8015c886>{kmem_getpages+38}
<Jan/06 03:50 pm>       <ffffffff8015d8be>{cache_grow+190}
<ffffffff8015db16>{cache_alloc_refill+422}
<Jan/06 03:50 pm>       <ffffffff8015de06>{kmem_cache_alloc+54}
<ffffffff802d5eaf>{dst_alloc+47}
<Jan/06 03:50 pm>       <ffffffff802e3d17>{ip_route_input_slow+1639}
<ffffffff803085bb>{udp_rcv+267}
<Jan/06 03:50 pm>       <ffffffff802e612e>{ip_rcv+526}
<ffffffff802d297d>{netif_receive_skb+477}
<Jan/06 03:50 pm>
<ffffffffa0120fe8>{:bcm5700:MM_IndicateRxPackets+920}
<Jan/06 03:50 pm>       <ffffffffa011c9fe>{:bcm5700:bcm5700_poll+158}
<ffffffff802d2b94>{net_rx_action+132}
<Jan/06 03:50 pm>       <ffffffff8013c4b1>{__do_softirq+113}
<ffffffff8013c565>{do_softirq+53}
<Jan/06 03:50 pm>       <ffffffff80113baf>{do_IRQ+335}
<ffffffff80111001>{ret_from_intr+0}
<Jan/06 03:50 pm>        <EOI> <ffffffff8031e419>{thread_return+41}
<ffffffff8010eb20>{default_idle+0}
<Jan/06 03:50 pm>       <ffffffff8010eb44>{default_idle+36}
<ffffffff8010ebdc>{cpu_idle+44}
<Jan/06 03:50 pm>       <ffffffff80517885>{start_kernel+453}
<Jan/06 03:50 pm>swapper: page allocation failure. order:0, mode:0x20

<Jan/06 03:50 pm>Call Trace:<IRQ> <ffffffff80158fa0>{__alloc_pages+816}
<ffffffff801158b4>{end_8259A_irq+100}
<Jan/06 03:50 pm>       <ffffffff80158fe0>{__get_free_pages+16}
<ffffffff8015c886>{kmem_getpages+38}
<Jan/06 03:50 pm>       <ffffffff8015d8be>{cache_grow+190}
<ffffffff8015db16>{cache_alloc_refill+422}
<Jan/06 03:50 pm>       <ffffffff8015de06>{kmem_cache_alloc+54}
<ffffffff802d5eaf>{dst_alloc+47}
<Jan/06 03:50 pm>       <ffffffff802e3d17>{ip_route_input_slow+1639}
<ffffffff802e612e>{ip_rcv+526}
<Jan/06 03:50 pm>       <ffffffff80131b2b>{try_to_wake_up+523}
<ffffffff802d297d>{netif_receive_skb+477}
<Jan/06 03:50 pm>
<ffffffffa0120fe8>{:bcm5700:MM_IndicateRxPackets+920}
<Jan/06 03:50 pm>       <ffffffffa011c9fe>{:bcm5700:bcm5700_poll+158}
<ffffffff802d2b94>{net_rx_action+132}
<Jan/06 03:50 pm>       <ffffffff8013c4b1>{__do_softirq+113}
<ffffffff8013c565>{do_softirq+53}
<Jan/06 03:50 pm>       <ffffffff80113baf>{do_IRQ+335}
<ffffffff80111001>{ret_from_intr+0}
<Jan/06 03:50 pm>        <EOI> <ffffffff8031e419>{thread_return+41}
<ffffffff8010eb20>{default_idle+0}
<Jan/06 03:50 pm>       <ffffffff8010eb44>{default_idle+36}
<ffffffff8010ebdc>{cpu_idle+44}
<Jan/06 03:50 pm>       <ffffffff80517885>{start_kernel+453}
<Jan/06 03:50 pm>swapper: page allocation failure. order:0, mode:0x20

<Jan/06 03:50 pm>Call Trace:<IRQ> <ffffffff80158fa0>{__alloc_pages+816}
<ffffffff801158b4>{end_8259A_irq+100}
<Jan/06 03:50 pm>       <ffffffff80158fe0>{__get_free_pages+16}
<ffffffff8015c886>{kmem_getpages+38}
<Jan/06 03:50 pm>       <ffffffff8015d8be>{cache_grow+190}
<ffffffff8015db16>{cache_alloc_refill+422}
<Jan/06 03:50 pm>       <ffffffff8015de06>{kmem_cache_alloc+54}
<ffffffff802d5eaf>{dst_alloc+47}
<Jan/06 03:50 pm>       <ffffffff802e3d17>{ip_route_input_slow+1639}
<ffffffff803085bb>{udp_rcv+267}
<Jan/06 03:50 pm>       <ffffffff802e612e>{ip_rcv+526}
<ffffffff802d297d>{netif_receive_skb+477}
<Jan/06 03:50 pm>
<ffffffffa0120fe8>{:bcm5700:MM_IndicateRxPackets+920}
<Jan/06 03:50 pm>       <ffffffffa011c9fe>{:bcm5700:bcm5700_poll+158}
<ffffffff802d2b94>{net_rx_action+132}
<Jan/06 03:50 pm>       <ffffffff8013c4b1>{__do_softirq+113}
<ffffffff8013c565>{do_softirq+53}
<Jan/06 03:50 pm>       <ffffffff80113baf>{do_IRQ+335}
<ffffffff80111001>{ret_from_intr+0}
<Jan/06 03:50 pm>        <EOI> <ffffffff8031e419>{thread_return+41}
<ffffffff8010eb20>{default_idle+0}
<Jan/06 03:50 pm>       <ffffffff8010eb44>{default_idle+36}
<ffffffff8010ebdc>{cpu_idle+44}
<Jan/06 03:50 pm>       <ffffffff80517885>{start_kernel+453}
<Jan/06 03:50 pm>swapper: page allocation failure. order:0, mode:0x20

<Jan/06 03:50 pm>Call Trace:<IRQ> <ffffffff80158fa0>{__alloc_pages+816}
<ffffffff801158b4>{end_8259A_irq+100}
<Jan/06 03:50 pm>       <ffffffff80158fe0>{__get_free_pages+16}
<ffffffff8015c886>{kmem_getpages+38}
<Jan/06 03:50 pm>       <ffffffff8015d8be>{cache_grow+190}
<ffffffff8015db16>{cache_alloc_refill+422}
<Jan/06 03:50 pm>       <ffffffff8015de06>{kmem_cache_alloc+54}
<ffffffff802d5eaf>{dst_alloc+47}
<Jan/06 03:50 pm>       <ffffffff802e3d17>{ip_route_input_slow+1639}
<ffffffff802e612e>{ip_rcv+526}
<Jan/06 03:50 pm>       <ffffffff802d297d>{netif_receive_skb+477}
<ffffffffa0120fe8>{:bcm5700:MM_IndicateRxPackets+920}
<Jan/06 03:50 pm>       <ffffffffa011c9fe>{:bcm5700:bcm5700_poll+158}
<ffffffff802d2b94>{net_rx_action+132}
<Jan/06 03:50 pm>       <ffffffff8013c4b1>{__do_softirq+113}
<ffffffff8013c565>{do_softirq+53}
<Jan/06 03:50 pm>       <ffffffff80113baf>{do_IRQ+335}
<ffffffff80111001>{ret_from_intr+0}
<Jan/06 03:50 pm>        <EOI> <ffffffff8031e419>{thread_return+41}
<ffffffff8010eb20>{default_idle+0}
<Jan/06 03:50 pm>       <ffffffff8010eb44>{default_idle+36}
<ffffffff8010ebdc>{cpu_idle+44}
<Jan/06 03:50 pm>       <ffffffff80517885>{start_kernel+453}
<Jan/06 03:50 pm>swapper: page allocation failure. order:0, mode:0x20

<Jan/06 03:50 pm>Call Trace:<IRQ> <ffffffff80158fa0>{__alloc_pages+816}
<ffffffff80158fe0>{__get_free_pages+16}
<Jan/06 03:50 pm>       <ffffffff8015c886>{kmem_getpages+38}
<ffffffff8015d8be>{cache_grow+190}
<Jan/06 03:50 pm>       <ffffffff8015db16>{cache_alloc_refill+422}
<ffffffff8015de06>{kmem_cache_alloc+54}
<Jan/06 03:50 pm>       <ffffffff802d5eaf>{dst_alloc+47}
<ffffffff802e3d17>{ip_route_input_slow+1639}
<Jan/06 03:50 pm>       <ffffffff802e612e>{ip_rcv+526}
<ffffffff802d297d>{netif_receive_skb+477}

This was just a partial listing from one of our servers.  I had read in
several lists that this was not considered fatal.  The problem is that
with our setup, it has turned fatal, to the point of locking out the
system remotely, so that only a reset at the machine itself works
(it didn't even honor the sysrq-b combo at the console).

Has anyone else run into this?  I can get this kind of error using about
20 clients (100 MB connected) hitting one server (dual gigabit bonded).
With 2.6.8.1 and newer, the errors are reproducible, but I can't exactly
tell when they happen (either write or read).  I think I have seen them
happen in both writes and reads.  And the kswapd problems happened
during writes and reads both as well.

I can also get the kswapd going crazy with a local set of disk I/O
tests.

Any information needed, please ask.  Any help would be appreciated.

Thanks,
Norman Weathers




