public inbox for linux-kernel@vger.kernel.org
* 2.5.41-mm3
@ 2002-10-11  9:28 Andrew Morton
  2002-10-11 10:58 ` 2.5.41-mm3 Henrik Storner
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Morton @ 2002-10-11  9:28 UTC (permalink / raw)
  To: lkml


url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.41/2.5.41-mm3/

. Merged up John's latest 2.5 oprofile, so folks will have to wean
  themselves off the crufty old one.

  You'll have to grab the userspace tools from
  http://oprofile.sourceforge.net/oprofile-2.5.html

  Use:

	mkdir /dev/oprofile	# John forgot this
	./configure --with-kernel-support
	make install

  Or just do what I didn't do and read the web page.

  Quite a few things seem to have changed with oprofile.  A typical
  profiling cycle would now be:

	rm -rf /var/lib/oprofile
	op_start --ctr0-count=50000 --ctr0-event=CPU_CLK_UNHALTED \
		--vmlinux=/path/to/vmlinux 
	<run test>
	op_stop
	sleep 3
	oprofpp -l -i /boot/vmlinux
	kill $(cat /var/lib/oprofile/lock)

  You must kill the daemon by hand before you can run op_start
  again.

. I've dropped the 512-byte O_DIRECT alignment patch for now. It's
  over in the experimental directory.  I'd like to get a decent round
  of testing with the bio_add_page fix so we can get that into Linus
  and get direct-io generally stabilised again before moving on.

. We've had some encouraging performance test results on the
  shared pagetable code, but also a couple of crashes.  The people
  who are monitoring performance may want to try that out.  It is
  selectable in config.

. Turns out that the idea of unmapped mapped pagecache a little earlier
  than swapping out anon memory was a poor one.  Changed the VM so that
  we treat these types of pages the same.

  It would be really appreciated if people who are interested in "the
  desktop experience" could give this patchset a try.  It's working
  well for me; but that's not a large sample...


-guruhugh.patch
-pte-highmem-warning.patch
-raw-use-o_direct.patch
-remove-radix_tree_reserve.patch
-ext3-yield.patch
-readv-writev-check-fix.patch

 Merged

+kgdb.patch

 Make things simpler for myself

+oprofile-25.patch

 Latest version

+hugetlb-meminfo.patch

 Change the layout of the hugetlbpage info in /proc/meminfo

+dio-bio-add-fix-1.patch

 Direct-io fixes

+net-loopback.patch

 Davem's patch to make the loopback device save a copy.  Doesn't seem
 to affect anything really.

-dio-fine-alignment.patch

 Moved to ../experimental for now

+blkdev-o_direct-short-read.patch

 Fix O_DIRECT-read-past-EOF for blockdevs

+msync-correctness.patch

 msync() standards fix

+page_reserved-accounting.patch

 Global accounting for PageReserved pages

+use-page_reserved_accounting.patch

 Use the above in a couple of VM decision-making places

+shpte-ifdef.patch

 Reduce shpte ifdeffery a little

+shpte-mprotect-fix.patch

 Shared pagetable mprotect fix.




linus.patch
  cset-1.573.100.12-to-1.738.txt.gz

kgdb.patch

oprofile-25.patch

misc.patch
  misc

swsusp-feature.patch
  add shrink_all_memory() for swsusp

hugetlb-meminfo.patch
  change hugetlbpage info in /proc/meminfo

dio-bio-add-fix-1.patch
  Fix direct-io for bio_add_page()

net-loopback.patch
  Disable second copy in the network loopback driver

large-queue-throttle.patch
  Improve writer throttling for small machines

exit-page-referenced.patch
  Propagate pte referenced bit into pagecache during unmap

swappiness.patch
  swappiness control

mapped-start-active.patch
  start anonymous pages on the active list

rename-dirty_async_ratio.patch
  rename dirty_async_ratio to dirty_ratio

auto-dirty-memory.patch
  adaptive dirty-memory thresholding

batched-slab-asap.patch
  batched slab shrinking and shrinker callback API

blkdev-o_direct-short-read.patch
  Fix O_DIRECT blockdev reads at end-of-device

orlov-allocator.patch

lseek-ext2_readdir.patch
  remove lock_kernel() from ext2_readdir()

msync-correctness.patch
  msync correctness fix

write-deadlock.patch
  Fix the generic_file_write-from-same-mmapped-page deadlock

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

hugetlb-prefault.patch
  hugetlbpages: factor out some code for hugetlbfs

ramfs-aops.patch
  Move ramfs address_space ops into libfs

hugetlb-header-split.patch
  Move hugetlb declarations into their own header

hugetlbfs.patch
  hugetlbfs file system

hugetlb-shm.patch
  hugetlbfs backing for SYSV shared memory

page_reserved-accounting.patch
  Global PageReserved accounting

use-page_reserved_accounting.patch
  Use PG_reserved accounting in the VM

ramfs-prepare-write-speedup.patch
  correctness fixes in libfs address_space ops

akpm-deadline.patch
  deadline scheduler tweaks

intel-user-copy.patch
  Faster copy_*_user for Intel ia32 CPUs

raid0-fix.patch
  RAID0 fix

rmqueue_bulk.patch
  bulk page allocator

free_pages_bulk.patch
  Bulk page freeing function

hot_cold_pages.patch
  Hot/Cold pages and zone->lock amortisation

readahead-cold-pages.patch
  Use cache-cold pages for pagecache reads.

pagevec-hot-cold-hint.patch
  hot/cold hints for truncate and page reclaim

page-reservation.patch
  Page reservation API

slab-split-01-rename.patch
  slab cleanup: rename static functions

slab-split-02-SMP.patch
  slab: enable the cpu arrays on uniprocessor

slab-split-03-tail.patch
  slab: reduced internal fragmentation

slab-split-04-drain.patch
  slab: take the spinlock in the drain function.

slab-split-05-name.patch
  slab: remove spaces from /proc identifiers

slab-split-06-mand-cpuarray.patch
  slab: cleanups and speedups

slab-split-07-inline.patch
  slab: uninline poisoning checks

slab-split-08-reap.patch
  slab: reap timers

cpucache_init-fix.patch
  cpucache_init fix

slab-split-10-list_for_each_fix.patch
  slab: fix a list walking bug

shpte.patch

shpte-ifdef.patch
  reduced ifdeffery in the shared pagetable code

shpte-mprotect-fix.patch
  fix shared pagetable handling of mprotect

read_barrier_depends.patch
  extended barrier primitives

rcu_ltimer.patch
  RCU core

dcache_rcu.patch
  Use RCU for dcache


* Re: 2.5.41-mm3
@ 2002-10-11 10:26 Con Kolivas
  2002-10-11 11:40 ` 2.5.41-mm3 Anton Blanchard
  0 siblings, 1 reply; 10+ messages in thread
From: Con Kolivas @ 2002-10-11 10:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux kernel mailing list

Compile failure:

  gcc -Wp,-MD,fs/smbfs/.inode.o.d -D__KERNEL__ -Iinclude -Wall 
-Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -pipe 
-mpreferred-stack-boundary=2 -march=i686 -Iarch/i386/mach-generic 
-fomit-frame-pointer -nostdinc -iwithprefix include  -DSMBFS_PARANOIA  
-DKBUILD_BASENAME=inode   -c -o fs/smbfs/inode.o fs/smbfs/inode.c
fs/smbfs/inode.c: In function `smb_show_options':
fs/smbfs/inode.c:436: `CONFIG_NLS_DEFAULT' undeclared (first use in this 
function)
fs/smbfs/inode.c:436: (Each undeclared identifier is reported only once
fs/smbfs/inode.c:436: for each function it appears in.)
fs/smbfs/inode.c: In function `smb_fill_super':
fs/smbfs/inode.c:536: `CONFIG_NLS_DEFAULT' undeclared (first use in this 
function)



* Re: 2.5.41-mm3
  2002-10-11  9:28 2.5.41-mm3 Andrew Morton
@ 2002-10-11 10:58 ` Henrik Storner
  2002-10-11 13:38 ` 2.5.41-mm3 William Lee Irwin III
  2002-10-12  2:29 ` 2.5.41-mm3 Daniel Phillips
  2 siblings, 0 replies; 10+ messages in thread
From: Henrik Storner @ 2002-10-11 10:58 UTC (permalink / raw)
  To: linux-kernel

>url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.41/2.5.41-mm3/

Won't build:
In file included from arch/i386/kernel/timers/timer_pit.c:14:
arch/i386/mach-generic/do_timer.h: In function `do_timer_interrupt_hook':
arch/i386/mach-generic/do_timer.h:25: `using_apic_timer' undeclared (first use in this function)
arch/i386/mach-generic/do_timer.h:25: (Each undeclared identifier is reported only once
arch/i386/mach-generic/do_timer.h:25: for each function it appears in.)
arch/i386/mach-generic/do_timer.h:26: warning: implicit declaration of function `smp_local_timer_interrupt'
make[2]: *** [arch/i386/kernel/timers/timer_pit.o] Error 1

UP config with IO APIC enabled.


* Re: 2.5.41-mm3
  2002-10-11 10:26 2.5.41-mm3 Con Kolivas
@ 2002-10-11 11:40 ` Anton Blanchard
  0 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2002-10-11 11:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, linux kernel mailing list


Hi,

> Compile failure:
> 
>   gcc -Wp,-MD,fs/smbfs/.inode.o.d -D__KERNEL__ -Iinclude -Wall 
> -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -pipe 
> -mpreferred-stack-boundary=2 -march=i686 -Iarch/i386/mach-generic 
> -fomit-frame-pointer -nostdinc -iwithprefix include  -DSMBFS_PARANOIA  
> -DKBUILD_BASENAME=inode   -c -o fs/smbfs/inode.o fs/smbfs/inode.c
> fs/smbfs/inode.c: In function `smb_show_options':
> fs/smbfs/inode.c:436: `CONFIG_NLS_DEFAULT' undeclared (first use in this 
> function)
> fs/smbfs/inode.c:436: (Each undeclared identifier is reported only once
> fs/smbfs/inode.c:436: for each function it appears in.)
> fs/smbfs/inode.c: In function `smb_fill_super':
> fs/smbfs/inode.c:536: `CONFIG_NLS_DEFAULT' undeclared (first use in this 
> function)

There is a space missing in fs/nls/Config.in:

===== fs/nls/Config.in 1.7 vs edited =====
--- 1.7/fs/nls/Config.in	Fri Oct 11 05:16:07 2002
+++ edited/fs/nls/Config.in	Fri Oct 11 10:25:51 2002
@@ -12,7 +12,7 @@
 # msdos and Joliet want NLS
 if [ "$CONFIG_JOLIET" = "y" -o "$CONFIG_FAT_FS" != "n" \
 	-o "$CONFIG_NTFS_FS" != "n" -o "$CONFIG_NCPFS_NLS" = "y" \
-	-o "$CONFIG_SMB_NLS" = "y" -o "$CONFIG_JFS_FS" != "n" -o "$CONFIG_CIFS" != "n"]; then
+	-o "$CONFIG_SMB_NLS" = "y" -o "$CONFIG_JFS_FS" != "n" -o "$CONFIG_CIFS" != "n" ]; then
   define_bool CONFIG_NLS y
 else
   define_bool CONFIG_NLS n


* Re: 2.5.41-mm3
       [not found] <3DA683F4.944DFC11@digeo.com>
@ 2002-10-11 12:37 ` Ed Tomlinson
  2002-10-12 14:21   ` 2.5.41-mm3 Ed Tomlinson
  0 siblings, 1 reply; 10+ messages in thread
From: Ed Tomlinson @ 2002-10-11 12:37 UTC (permalink / raw)
  To: Andrew Morton, lkml, linux-mm@kvack.org

Hi,

I get this oops just after boot - the box was sitting waiting for me to log in
and start X.  Nothing unusual in the boot log - same config I have been using.

-------------
oscar login: Unable to handle kernel paging request at virtual address 8978408f
 printing eip:
c012b364
*pde = 00000000
Oops: 0002
af_packet snd-seq-midi snd-seq-oss snd-seq-midi-event snd-seq snd-pcm-oss snd-mixer-oss snd-cs46xx snd-pcm snd-timer snd-rawmidi snd-seq-device snd-ac97-codec snd soundcore gameport softdog matroxfb_base matroxfb_g450 matroxfb_DAC1064 g450_pll matroxfb_accel matroxfb_misc fbcon-cfb16 fbcon-cfb8 fbcon-cfb24 fbcon-cfb32 mga agpgart pppoe pppox ipchains msdos fat sd_mod floppy dummy bsd_comp ppp_generic slhc parport_pc lp parport ipip smbfs binfmt_aout autofs4 cdrom via-rhine mii tulip crc32 usb-storage scsi_mod hid pl2303 usbserial  
CPU:    0
EIP:    0060:[<c012b364>]    Not tainted
EFLAGS: 00010012
EIP is at free_block+0x50/0xe4
eax: 8978408b   ebx: dc2ad240   ecx: dc2bd000   edx: 558ba445
esi: dffec21c   edi: 00000004   ebp: dffec228   esp: c0295eec
ds: 0068   es: 0068   ss: 0068
Process swapper (pid: 0, threadinfo=c0294000 task=c02596c0)
Stack: 00000008 c173a400 c173a410 dffec21c c0295f18 c173a420 c012b86e dffec21c 
       c173a410 00000008 c0353b1c c0294000 c02ab480 00000000 dffec408 c0294000 
       dffec288 c011b6ef 00000000 00000000 c032fc60 fffffffe c032fc60 c012b7ec 
Call Trace:
 [<c012b86e>] reap_timer_fnc+0x82/0x478
 [<c011b6ef>] run_timer_tasklet+0xe7/0x130
 [<c012b7ec>] reap_timer_fnc+0x0/0x478
 [<c01187e8>] tasklet_hi_action+0x3c/0x60
 [<c011860b>] do_softirq+0x5b/0xac
 [<c0108560>] do_IRQ+0xfc/0x114
 [<c01052e0>] default_idle+0x0/0x28
 [<c0105000>] stext+0x0/0x50
 [<c01070e8>] common_interrupt+0x18/0x20
 [<c01052e0>] default_idle+0x0/0x28
 [<c0105000>] stext+0x0/0x50
 [<c0105303>] default_idle+0x23/0x28
 [<c0105374>] cpu_idle+0x28/0x38
 [<c010504d>] stext+0x4d/0x50

Code: 89 50 04 89 02 2b 59 0c 89 d8 31 d2 f7 76 30 89 c3 8b 41 14 
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Hope this helps,
Ed Tomlinson
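
A reading of the report above, offered as a sketch rather than a diagnosis. Assuming the usual i386 meaning of the low three error-code bits, "Oops: 0002" is a kernel-mode write to a page that is not present. Assuming also that the Code: bytes start at EIP, "89 50 04" decodes as a store of %edx to 4(%eax), and eax + 4 matches the reported fault address. The names `fault_info`, `decode_fault` and `store_target` are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low three bits of the i386 page-fault error code (the value after "Oops:"). */
struct fault_info {
	bool protection;	/* bit 0: set = protection fault, clear = page not present */
	bool write;		/* bit 1: set = write access, clear = read */
	bool user;		/* bit 2: set = fault in user mode, clear = kernel mode */
};

static struct fault_info decode_fault(unsigned int error_code)
{
	struct fault_info f;

	f.protection = (error_code & 0x1) != 0;
	f.write = (error_code & 0x2) != 0;
	f.user = (error_code & 0x4) != 0;
	return f;
}

/* Address written by "mov %reg, disp(%base)" on a 32-bit machine. */
static uint32_t store_target(uint32_t base, uint32_t disp)
{
	assert(disp <= UINT32_MAX - base);	/* this sketch does not model wraparound */
	return base + disp;
}
```

With eax = 8978408b from the register dump, store_target(0x8978408b, 4) gives 8978408f, the address in the first line of the oops. A store to an offset of what looks like a garbage pointer inside free_block() would be consistent with a corrupted list entry, but the report alone does not prove that.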
 




* Re: 2.5.41-mm3
  2002-10-11  9:28 2.5.41-mm3 Andrew Morton
  2002-10-11 10:58 ` 2.5.41-mm3 Henrik Storner
@ 2002-10-11 13:38 ` William Lee Irwin III
  2002-10-12  2:29 ` 2.5.41-mm3 Daniel Phillips
  2 siblings, 0 replies; 10+ messages in thread
From: William Lee Irwin III @ 2002-10-11 13:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml

On Fri, Oct 11, 2002 at 02:28:10AM -0700, Andrew Morton wrote:
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.41/2.5.41-mm3/

This paride driver seems to have been missed with all the sector_t
and sector_div() business.

--- akpm-2.5.41-3/drivers/block/paride/pf.c	2002-10-11 06:09:31.000000000 -0700
+++ wli-2.5.41-3/drivers/block/paride/pf.c	2002-10-11 06:38:16.000000000 -0700
@@ -369,11 +369,11 @@
 		return -EINVAL;
 	capacity = get_capacity(pf->disk);
 	if (capacity < PF_FD_MAX) {
-		g.cylinders = capacity / (PF_FD_HDS * PF_FD_SPT);
+		g.cylinders = sector_div(capacity, PF_FD_HDS * PF_FD_SPT);
 		g.heads = PF_FD_HDS;
 		g.sectors = PF_FD_SPT;
 	} else {
-		g.cylinders = capacity / (PF_HD_HDS * PF_HD_SPT);
+		g.cylinders = sector_div(capacity, PF_HD_HDS * PF_HD_SPT);
 		g.heads = PF_HD_HDS;
 		g.sectors = PF_HD_SPT;
 	}
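
A note on the sector_div() calls in the patch above. As far as I know, sector_div() behaves like do_div(): it divides its first argument in place and evaluates to the remainder. If that holds for this tree, the assignments above would store the remainder in g.cylinders instead of the quotient. The user-space sketch below illustrates those semantics; `sector_div_sketch` and `cylinders_for` are hypothetical names, and the geometry values in the usage note are only illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the kernel's sector_div():
 * divides *n in place and returns the remainder.
 */
static uint32_t sector_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Cylinder count for a capacity given in sectors. */
static uint64_t cylinders_for(uint64_t capacity, uint32_t heads, uint32_t spt)
{
	/* Discard the remainder; capacity now holds the quotient. */
	sector_div_sketch(&capacity, heads * spt);
	return capacity;
}
```

For example, cylinders_for(2880, 2, 18) yields 80, whereas using the return value of the division directly would yield 0 for that capacity.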


* Re: 2.5.41-mm3
  2002-10-11  9:28 2.5.41-mm3 Andrew Morton
  2002-10-11 10:58 ` 2.5.41-mm3 Henrik Storner
  2002-10-11 13:38 ` 2.5.41-mm3 William Lee Irwin III
@ 2002-10-12  2:29 ` Daniel Phillips
  2002-10-12  2:31   ` 2.5.41-mm3 William Lee Irwin III
  2002-10-12  3:26   ` 2.5.41-mm3 Andrew Morton
  2 siblings, 2 replies; 10+ messages in thread
From: Daniel Phillips @ 2002-10-12  2:29 UTC (permalink / raw)
  To: Andrew Morton, lkml

On Friday 11 October 2002 11:28, Andrew Morton wrote:
> . Turns out that the idea of unmapped mapped pagecache a little earlier
>   than swapping out anon memory was a poor one.

Translation into English?

-- 
Daniel


* Re: 2.5.41-mm3
  2002-10-12  2:29 ` 2.5.41-mm3 Daniel Phillips
@ 2002-10-12  2:31   ` William Lee Irwin III
  2002-10-12  3:26   ` 2.5.41-mm3 Andrew Morton
  1 sibling, 0 replies; 10+ messages in thread
From: William Lee Irwin III @ 2002-10-12  2:31 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Andrew Morton, lkml

On Friday 11 October 2002 11:28, Andrew Morton wrote:
>> . Turns out that the idea of unmapped mapped pagecache a little earlier
>>   than swapping out anon memory was a poor one.

On Sat, Oct 12, 2002 at 04:29:20AM +0200, Daniel Phillips wrote:
> Translation into English?

Priority paging blows goats.


* Re: 2.5.41-mm3
  2002-10-12  2:29 ` 2.5.41-mm3 Daniel Phillips
  2002-10-12  2:31   ` 2.5.41-mm3 William Lee Irwin III
@ 2002-10-12  3:26   ` Andrew Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2002-10-12  3:26 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: lkml

Daniel Phillips wrote:
> 
> On Friday 11 October 2002 11:28, Andrew Morton wrote:
> > . Turns out that the idea of unmapped mapped pagecache a little earlier
> >   than swapping out anon memory was a poor one.
> 
> Translation into English?
> 

Sorry.  Twas a strategic typo.  "unmapping mapped pagecache".

What I was doing was to reclaim file-backed mmapped data a little
more eagerly than to start swapping anonymous memory.  Under the
theory that program text can be reestablished with one IO, but swapout
needs two.

Disabling that cool idea made things heaps better, so....

I assume what was happening was that we'd reach the "reclaim mapped
pagecache" level, reclaim a ton of memory and then never start
swapping.

Actually, the thing which throws the spanner in the works, again and
again and again and again, is having lots of dirty pagecache around.
The more stuff I put in over there, and the more the kernel clamps
down on the heavy writers and makes them write back their own data
(actually their own spindle....), the better things get.

Here's the diff.  It is really just 2.4 in disguise.



 include/linux/swap.h   |    1 
 include/linux/sysctl.h |    1 
 kernel/sysctl.c        |    3 +
 mm/vmscan.c            |  103 +++++++++++++++++++++++++++++++++++++++----------
 4 files changed, 89 insertions(+), 19 deletions(-)

--- 2.5.41/mm/vmscan.c~swappiness	Fri Oct 11 11:18:11 2002
+++ 2.5.41-akpm/mm/vmscan.c	Fri Oct 11 11:18:47 2002
@@ -35,13 +35,18 @@
 #include <linux/swapops.h>
 
 /*
- * The "priority" of VM scanning is how much of the queues we
- * will scan in one go. A value of 12 for DEF_PRIORITY implies
- * that we'll scan 1/4096th of the queues ("queue_length >> 12")
- * during a normal aging round.
+ * The "priority" of VM scanning is how much of the queues we will scan in one
+ * go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
+ * queues ("queue_length >> 12") during an aging round.
  */
 #define DEF_PRIORITY 12
 
+/*
+ * From 0 .. 100.  Higher means more swappy.
+ */
+int vm_swappiness = 60;
+static long total_memory;
+
 #ifdef ARCH_HAS_PREFETCH
 #define prefetch_prev_lru_page(_page, _base, _field)			\
 	do {								\
@@ -101,7 +106,6 @@ static inline int is_page_cache_freeable
 	return page_count(page) - !!PagePrivate(page) == 2;
 }
 
-
 /*
  * shrink_list returns the number of reclaimed pages
  */
@@ -439,7 +443,8 @@ done:
  * But we had to alter page->flags anyway.
  */
 static /* inline */ void
-refill_inactive_zone(struct zone *zone, const int nr_pages_in)
+refill_inactive_zone(struct zone *zone, const int nr_pages_in,
+			struct page_state *ps, int priority)
 {
 	int pgdeactivate = 0;
 	int nr_pages = nr_pages_in;
@@ -448,6 +453,10 @@ refill_inactive_zone(struct zone *zone, 
 	LIST_HEAD(l_active);	/* Pages to go onto the active_list */
 	struct page *page;
 	struct pagevec pvec;
+	int reclaim_mapped = 0;
+	long mapped_ratio;
+	long distress;
+	long swap_tendency;
 
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
@@ -469,6 +478,37 @@ refill_inactive_zone(struct zone *zone, 
 	}
 	spin_unlock_irq(&zone->lru_lock);
 
+	/*
+	 * `distress' is a measure of how much trouble we're having reclaiming
+	 * pages.  0 -> no problems.  100 -> great trouble.
+	 */
+	distress = 100 >> priority;
+
+	/*
+	 * The point of this algorithm is to decide when to start reclaiming
+	 * mapped memory instead of just pagecache.  Work out how much memory
+	 * is mapped.
+	 */
+	mapped_ratio = (ps->nr_mapped * 100) / total_memory;
+
+	/*
+	 * Now decide how much we really want to unmap some pages.  The mapped
+	 * ratio is downgraded - just because there's a lot of mapped memory
+	 * doesn't necessarily mean that page reclaim isn't succeeding.
+	 *
+	 * The distress ratio is important - we don't want to start going oom.
+	 *
+	 * A 100% value of vm_swappiness overrides this algorithm altogether.
+	 */
+	swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
+
+	/*
+	 * Now use this metric to decide whether to start moving mapped memory
+	 * onto the inactive list.
+	 */
+	if (swap_tendency >= 100)
+		reclaim_mapped = 1;
+
 	while (!list_empty(&l_hold)) {
 		page = list_entry(l_hold.prev, struct page, lru);
 		list_del(&page->lru);
@@ -480,6 +520,10 @@ refill_inactive_zone(struct zone *zone, 
 				continue;
 			}
 			pte_chain_unlock(page);
+			if (!reclaim_mapped) {
+				list_add(&page->lru, &l_active);
+				continue;
+			}
 		}
 		/*
 		 * FIXME: need to consider page_count(page) here if/when we
@@ -546,7 +590,7 @@ refill_inactive_zone(struct zone *zone, 
  */
 static /* inline */ int
 shrink_zone(struct zone *zone, int max_scan, unsigned int gfp_mask,
-		const int nr_pages, int *nr_mapped)
+	const int nr_pages, int *nr_mapped, struct page_state *ps, int priority)
 {
 	unsigned long ratio;
 
@@ -563,11 +607,23 @@ shrink_zone(struct zone *zone, int max_s
 	ratio = (unsigned long)nr_pages * zone->nr_active /
 				((zone->nr_inactive | 1) * 2);
 	atomic_add(ratio+1, &zone->refill_counter);
-	while (atomic_read(&zone->refill_counter) > SWAP_CLUSTER_MAX) {
-		atomic_sub(SWAP_CLUSTER_MAX, &zone->refill_counter);
-		refill_inactive_zone(zone, SWAP_CLUSTER_MAX);
+	if (atomic_read(&zone->refill_counter) > SWAP_CLUSTER_MAX) {
+		int count;
+
+		/*
+		 * Don't try to bring down too many pages in one attempt.
+		 * If this fails, the caller will increase `priority' and
+		 * we'll try again, with an increased chance of reclaiming
+		 * mapped memory.
+		 */
+		count = atomic_read(&zone->refill_counter);
+		if (count > SWAP_CLUSTER_MAX * 4)
+			count = SWAP_CLUSTER_MAX * 4;
+		atomic_sub(count, &zone->refill_counter);
+		refill_inactive_zone(zone, count, ps, priority);
 	}
-	return shrink_cache(nr_pages, zone, gfp_mask, max_scan, nr_mapped);
+	return shrink_cache(nr_pages, zone, gfp_mask,
+				max_scan, nr_mapped);
 }
 
 /*
@@ -603,7 +659,8 @@ static void shrink_slab(int total_scanne
  */
 static int
 shrink_caches(struct zone *classzone, int priority, int *total_scanned,
-		int gfp_mask, const int nr_pages, int order)
+		int gfp_mask, const int nr_pages, int order,
+		struct page_state *ps)
 {
 	struct zone *first_classzone;
 	struct zone *zone;
@@ -630,7 +687,7 @@ shrink_caches(struct zone *classzone, in
 		if (max_scan < to_reclaim * 2)
 			max_scan = to_reclaim * 2;
 		ret += shrink_zone(zone, max_scan, gfp_mask,
-				to_reclaim, &nr_mapped);
+				to_reclaim, &nr_mapped, ps, priority);
 		*total_scanned += max_scan;
 		*total_scanned += nr_mapped;
 		if (ret >= nr_pages)
@@ -666,12 +723,14 @@ try_to_free_pages(struct zone *classzone
 
 	inc_page_state(pageoutrun);
 
-	for (priority = DEF_PRIORITY; priority; priority--) {
+	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		int total_scanned = 0;
+		struct page_state ps;
 
+		get_page_state(&ps);
 		nr_reclaimed += shrink_caches(classzone, priority,
 					&total_scanned, gfp_mask,
-					nr_pages, order);
+					nr_pages, order, &ps);
 		if (nr_reclaimed >= nr_pages)
 			return 1;
 		if (total_scanned == 0)
@@ -704,7 +763,7 @@ try_to_free_pages(struct zone *classzone
  *
  * Returns the number of pages which were actually freed.
  */
-static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
+static int balance_pgdat(pg_data_t *pgdat, int nr_pages, struct page_state *ps)
 {
 	int to_free = nr_pages;
 	int priority;
@@ -729,7 +788,7 @@ static int balance_pgdat(pg_data_t *pgda
 			if (max_scan < to_reclaim * 2)
 				max_scan = to_reclaim * 2;
 			to_free -= shrink_zone(zone, max_scan, GFP_KSWAPD,
-					to_reclaim, &nr_mapped);
+					to_reclaim, &nr_mapped, ps, priority);
 			shrink_slab(max_scan + nr_mapped, GFP_KSWAPD);
 		}
 		if (success)
@@ -778,12 +837,15 @@ int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC|PF_KSWAPD;
 
 	for ( ; ; ) {
+		struct page_state ps;
+
 		if (current->flags & PF_FREEZE)
 			refrigerator(PF_IOTHREAD);
 		prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
 		schedule();
 		finish_wait(&pgdat->kswapd_wait, &wait);
-		balance_pgdat(pgdat, 0);
+		get_page_state(&ps);
+		balance_pgdat(pgdat, 0, &ps);
 		blk_run_queues();
 	}
 }
@@ -801,8 +863,10 @@ int shrink_all_memory(int nr_pages)
 
 	for_each_pgdat(pgdat) {
 		int freed;
+		struct page_state ps;
 
-		freed = balance_pgdat(pgdat, nr_to_free);
+		get_page_state(&ps);
+		freed = balance_pgdat(pgdat, nr_to_free, &ps);
 		ret += freed;
 		nr_to_free -= freed;
 		if (nr_to_free <= 0)
@@ -819,6 +883,7 @@ static int __init kswapd_init(void)
 	swap_setup();
 	for_each_pgdat(pgdat)
 		kernel_thread(kswapd, pgdat, CLONE_KERNEL);
+	total_memory = nr_free_pagecache_pages();
 	return 0;
 }
 
--- 2.5.41/kernel/sysctl.c~swappiness	Fri Oct 11 11:18:11 2002
+++ 2.5.41-akpm/kernel/sysctl.c	Fri Oct 11 11:18:12 2002
@@ -308,6 +308,9 @@ static ctl_table vm_table[] = {
 	{ VM_NR_PDFLUSH_THREADS, "nr_pdflush_threads",
 	  &nr_pdflush_threads, sizeof nr_pdflush_threads,
 	  0444 /* read-only*/, NULL, &proc_dointvec},
+	{VM_SWAPPINESS, "swappiness", &vm_swappiness, sizeof(vm_swappiness),
+	 0644, NULL, &proc_dointvec_minmax, &sysctl_intvec, NULL, &zero,
+	 &one_hundred },
 #ifdef CONFIG_HUGETLB_PAGE
 	 {VM_HUGETLB_PAGES, "nr_hugepages", &htlbpage_max, sizeof(int), 0644, NULL, 
 	  &proc_dointvec},
--- 2.5.41/include/linux/sysctl.h~swappiness	Fri Oct 11 11:18:11 2002
+++ 2.5.41-akpm/include/linux/sysctl.h	Fri Oct 11 11:18:12 2002
@@ -152,6 +152,7 @@ enum
 	VM_OVERCOMMIT_RATIO=16, /* percent of RAM to allow overcommit in */
 	VM_PAGEBUF=17,		/* struct: Control pagebuf parameters */
 	VM_HUGETLB_PAGES=18,	/* int: Number of available Huge Pages */
+	VM_SWAPPINESS=19,	/* Tendency to steal mapped memory */
 };
 
 
--- 2.5.41/include/linux/swap.h~swappiness	Fri Oct 11 11:18:11 2002
+++ 2.5.41-akpm/include/linux/swap.h	Fri Oct 11 11:18:12 2002
@@ -164,6 +164,7 @@ extern void swap_setup(void);
 /* linux/mm/vmscan.c */
 extern int try_to_free_pages(struct zone *, unsigned int, unsigned int);
 int shrink_all_memory(int nr_pages);
+extern int vm_swappiness;
 
 /* linux/mm/page_io.c */
 int swap_readpage(struct file *file, struct page *page);

.
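
The decision added to refill_inactive_zone() in the diff above can be tried out in user space. This is a minimal sketch that copies only the arithmetic from the patch; the function name `reclaim_mapped_sketch` and the idea of passing the counters in as plain arguments are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors the swap_tendency arithmetic from the patch:
 *   distress      = 100 >> priority
 *   mapped_ratio  = nr_mapped * 100 / total_memory
 *   swap_tendency = mapped_ratio / 2 + distress + swappiness
 * Mapped pages are deactivated once swap_tendency reaches 100.
 */
static bool reclaim_mapped_sketch(int priority, long nr_mapped,
				  long total_memory, int swappiness)
{
	long distress;
	long mapped_ratio;
	long swap_tendency;

	assert(priority >= 0 && priority <= 12);	/* DEF_PRIORITY is 12 */
	assert(total_memory > 0);

	distress = 100 >> priority;
	mapped_ratio = (nr_mapped * 100) / total_memory;
	swap_tendency = mapped_ratio / 2 + distress + swappiness;
	return swap_tendency >= 100;
}
```

With the default swappiness of 60 and priority 12 (distress 0), mapped pages are only touched once 80% of memory is mapped. By priority 1 the distress term alone is 50, so mapped memory is always eligible. A swappiness of 100 makes the test pass at every priority, which matches the comment in the patch.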


* Re: 2.5.41-mm3
  2002-10-11 12:37 ` 2.5.41-mm3 Ed Tomlinson
@ 2002-10-12 14:21   ` Ed Tomlinson
  0 siblings, 0 replies; 10+ messages in thread
From: Ed Tomlinson @ 2002-10-12 14:21 UTC (permalink / raw)
  To: Andrew Morton, lkml, linux-mm@kvack.org

On October 11, 2002 08:37 am, Ed Tomlinson wrote:
> Hi,
>
> I get this oops just after boot - the box was sitting waiting for me to
> log in and start X.  Nothing unusual in the boot log - same config I have
> been using.

Note that 2.5.42-mm2 is starting correctly.  Not sure what happened here..

Ed

PS. email problems at this end...


> -------------
> oscar login: Unable to handle kernel paging request at virtual address 8978408f
>  printing eip:
> c012b364
> *pde = 00000000
> Oops: 0002
> af_packet snd-seq-midi snd-seq-oss snd-seq-midi-event snd-seq snd-pcm-oss
> snd-mixer-oss snd-cs46xx snd-pcm snd-timer snd-rawmidi snd-seq-device
> snd-ac97-codec snd soundcore gameport softdog matroxfb_base matroxfb_g450
> matroxfb_DAC1064 g450_pll matroxfb_accel matroxfb_misc fbcon-cfb16
> fbcon-cfb8 fbcon-cfb24 fbcon-cfb32 mga agpgart pppoe pppox ipchains msdos
> fat sd_mod floppy dummy bsd_comp ppp_generic slhc parport_pc lp parport
> ipip smbfs binfmt_aout autofs4 cdrom via-rhine mii tulip crc32 usb-storage
> scsi_mod hid pl2303 usbserial
> CPU:    0
> EIP:    0060:[<c012b364>]    Not tainted
> EFLAGS: 00010012
> EIP is at free_block+0x50/0xe4
> eax: 8978408b   ebx: dc2ad240   ecx: dc2bd000   edx: 558ba445
> esi: dffec21c   edi: 00000004   ebp: dffec228   esp: c0295eec
> ds: 0068   es: 0068   ss: 0068
> Process swapper (pid: 0, threadinfo=c0294000 task=c02596c0)
> Stack: 00000008 c173a400 c173a410 dffec21c c0295f18 c173a420 c012b86e dffec21c
>        c173a410 00000008 c0353b1c c0294000 c02ab480 00000000 dffec408 c0294000
>        dffec288 c011b6ef 00000000 00000000 c032fc60 fffffffe c032fc60 c012b7ec
> Call Trace:
>  [<c012b86e>] reap_timer_fnc+0x82/0x478
>  [<c011b6ef>] run_timer_tasklet+0xe7/0x130
>  [<c012b7ec>] reap_timer_fnc+0x0/0x478
>  [<c01187e8>] tasklet_hi_action+0x3c/0x60
>  [<c011860b>] do_softirq+0x5b/0xac
>  [<c0108560>] do_IRQ+0xfc/0x114
>  [<c01052e0>] default_idle+0x0/0x28
>  [<c0105000>] stext+0x0/0x50
>  [<c01070e8>] common_interrupt+0x18/0x20
>  [<c01052e0>] default_idle+0x0/0x28
>  [<c0105000>] stext+0x0/0x50
>  [<c0105303>] default_idle+0x23/0x28
>  [<c0105374>] cpu_idle+0x28/0x38
>  [<c010504d>] stext+0x4d/0x50
>
> Code: 89 50 04 89 02 2b 59 0c 89 d8 31 d2 f7 76 30 89 c3 8b 41 14
>  <0>Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing
>
> Hope this helps,
> Ed Tomlinson



end of thread, other threads:[~2002-10-15 11:05 UTC | newest]

Thread overview: 10+ messages
2002-10-11  9:28 2.5.41-mm3 Andrew Morton
2002-10-11 10:58 ` 2.5.41-mm3 Henrik Storner
2002-10-11 13:38 ` 2.5.41-mm3 William Lee Irwin III
2002-10-12  2:29 ` 2.5.41-mm3 Daniel Phillips
2002-10-12  2:31   ` 2.5.41-mm3 William Lee Irwin III
2002-10-12  3:26   ` 2.5.41-mm3 Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2002-10-11 10:26 2.5.41-mm3 Con Kolivas
2002-10-11 11:40 ` 2.5.41-mm3 Anton Blanchard
     [not found] <3DA683F4.944DFC11@digeo.com>
2002-10-11 12:37 ` 2.5.41-mm3 Ed Tomlinson
2002-10-12 14:21   ` 2.5.41-mm3 Ed Tomlinson
