Linux-mm Archive on lore.kernel.org

* [patch] fix extra page ref count in follow_hugetlb_page
From: Chen, Kenneth W @ 2006-03-30 23:19 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, 'Adam Litke'

"[PATCH] optimize follow_hugetlb_page" breaks mlock on hugepage areas.

I misinterpreted the "pages" argument and made get_page() unconditional.  It
should only take a ref count when the "pages" argument is non-null.
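
For illustration only (not part of the patch), here is a minimal user-space
sketch of that rule. The names toy_page, toy_get_page() and toy_follow_pages()
are made up for this example and merely stand in for the kernel's struct page,
get_page() and follow_hugetlb_page(); the real function does considerably more.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; only the reference count matters here. */
struct toy_page {
	int refcount;
};

/* Hypothetical analogue of get_page(): take one reference. */
static void toy_get_page(struct toy_page *page)
{
	page->refcount++;
}

/*
 * Visit n entries backed by one page. A reference is taken only when the
 * caller supplied a "pages" array to receive the pointers. A caller that
 * passes NULL never sees the pages, so it has no chance to drop a
 * reference later; taking one for it would leak.
 * Returns the number of entries visited.
 */
static size_t toy_follow_pages(struct toy_page *head,
			       struct toy_page **pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages) {
			toy_get_page(head);
			pages[i] = head;
		}
	}
	return i;
}
```

With pages == NULL the count stays where it was; with a real array it rises by
one per entry returned, which the caller is then expected to release.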

Credit goes to Adam Litke who spotted the bug.


Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Adam Litke <agl@us.ibm.com>


--- ./mm/hugetlb.c.orig	2006-03-30 15:54:20.000000000 -0800
+++ ./mm/hugetlb.c	2006-03-30 15:54:56.000000000 -0800
@@ -555,9 +555,10 @@ int follow_hugetlb_page(struct mm_struct
 		pfn_offset = (vaddr & ~HPAGE_MASK) >> PAGE_SHIFT;
 		page = pte_page(*pte);
 same_page:
-		get_page(page);
-		if (pages)
+		if (pages) {
+			get_page(page);
 			pages[i] = page + pfn_offset;
+		}
 
 		if (vmas)
 			vmas[i] = vma;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Rafael J. Wysocki @ 2006-03-30 20:57 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Nick Piggin, linux list, ck list, Andrew Morton, Pavel Machek,
	linux-mm
In-Reply-To: <200603310638.23873.kernel@kolivas.org>

On Thursday 30 March 2006 22:38, Con Kolivas wrote:
> On Friday 31 March 2006 03:12, Rafael J. Wysocki wrote:
> > OK, I have the following observations:
> 
> Thanks.
> >
> > 1) The patch generally causes more memory to be freed during suspend than
> > the unpatched code (good).
> 
> Yes I know you meant less, that's good.
> 
> > 2) However, if more than 50% of RAM is used by application data, it causes
> > the swap prefetch to trigger during resume (that's an impression; anyway
> > the system swaps in a lot at that time), which takes some time (generally
> > it makes resume 5-10s longer on my box).
> 
> Is that with this "swsusp shrink_all_memory tweaks" patch alone? It doesn't 
> touch swap prefetch.

Still swap prefetch is present in -mm so it can be triggered incidentally
I think.
 
> > 3) The problem with returning zero prematurely has not been entirely
> > eliminated.  It's happened for me only once, though.
> 
> Probably hard to say, but is the system in any better state after resume has 
> completed?

It seems so, but it also depends on the (actual) image size, memory usage
before suspend etc.  Well ...

> That was one of the aims. Also a major part of this patch is a cleanup of
> the hot balance_pgdat function as well, which suspend no longer touches with
> this patch.

I think the patch is a good idea overall, but it needs some more testing.
I'll try to figure out a way to measure its performance, so we have some
hard data to discuss.

Greetings,
Rafael

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Con Kolivas @ 2006-03-30 20:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nick Piggin, linux list, ck list, Andrew Morton, Pavel Machek,
	linux-mm
In-Reply-To: <200603301912.32204.rjw@sisk.pl>

On Friday 31 March 2006 03:12, Rafael J. Wysocki wrote:
> OK, I have the following observations:

Thanks.
>
> 1) The patch generally causes more memory to be freed during suspend than
> the unpatched code (good).

Yes I know you meant less, that's good.

> 2) However, if more than 50% of RAM is used by application data, it causes
> the swap prefetch to trigger during resume (that's an impression; anyway
> the system swaps in a lot at that time), which takes some time (generally
> it makes resume 5-10s longer on my box).

Is that with this "swsusp shrink_all_memory tweaks" patch alone? It doesn't 
touch swap prefetch.

> 3) The problem with returning zero prematurely has not been entirely
> eliminated.  It's happened for me only once, though.

Probably hard to say, but is the system in any better state after resume has 
completed? That was one of the aims. Also a major part of this patch is a 
cleanup of the hot balance_pgdat function as well, which suspend no longer 
touches with this patch.

Cheers,
Con

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Rafael J. Wysocki @ 2006-03-30 18:37 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Nick Piggin, linux list, ck list, Andrew Morton, Pavel Machek,
	linux-mm
In-Reply-To: <200603301912.32204.rjw@sisk.pl>

[update]

On Thursday 30 March 2006 19:12, Rafael J. Wysocki wrote:
> On Friday 24 March 2006 17:14, Rafael J. Wysocki wrote:
> > On Friday 24 March 2006 16:30, Con Kolivas wrote:
> > > On Saturday 25 March 2006 02:16, Rafael J. Wysocki wrote:
> > > > On Friday 24 March 2006 08:07, Con Kolivas wrote:
> > > > > On Tuesday 21 March 2006 05:46, Rafael J. Wysocki wrote:
> > > > > > swsusp_shrink_memory() is still wrong, because it will always fail for
> > > > > > image_size = 0.  My bad, sorry.
> > > > > >
> > > > > > The appended patch (on top of yours) should fix that (hope I did it
> > > > > > right this time).
> > > > >
> > > > > Well I discovered that if all the necessary memory is freed in one call
> > > > > to shrink_all_memory we don't get the nice updating printout from
> > > > >  swsusp_shrink_memory telling us we're making progress. So instead of
> > > > >  modifying the function to call shrink_all_memory with the full amount
> > > > > (and since we've botched swsusp_shrink_memory a few times between us), we
> > > > > should limit it to a max of SHRINK_BITEs instead.
> > > > >
> > > > >  This patch is fine standalone.
> > > > >
> > > > >  Rafael, Pavel what do you think of this one?
> > > >
> > > > In principle it looks good to me, but when I tested the previous one I
> > > > noticed shrink_all_memory() tended to return 0 prematurely (ie. when it was
> > > > possible to free some more pages).  It only happened if more than 50% of
> > > > memory was occupied by application data.
> > > >
> > > > Unfortunately I couldn't find the reason.
> > > 
> > > Perhaps it was just trying to free up too much in one go. There are a number 
> > > of steps a mapped page needs to go through before being finally swapped and 
> > > there are a limited number of iterations over it. Limiting it to SHRINK_BITEs 
> > > at a time will probably improve that.
> > 
> > OK [I'll be testing it for the next couple of days.]
> 
> OK, I have the following observations:
> 
> 1) The patch generally causes more memory to be freed during suspend than
> the unpatched code (good).

Oops.  s/more/less/

By which I mean with the patch applied the actual image size is usually closer
to the value of image_size.

> 2) However, if more than 50% of RAM is used by application data, it causes
> the swap prefetch to trigger during resume (that's an impression; anyway
> the system swaps in a lot at that time), which takes some time (generally
> it makes resume 5-10s longer on my box).
> 3) The problem with returning zero prematurely has not been entirely
> eliminated.  It's happened for me only once, though.

Greetings,
Rafael

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Rafael J. Wysocki @ 2006-03-30 17:12 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Nick Piggin, linux list, ck list, Andrew Morton, Pavel Machek,
	linux-mm
In-Reply-To: <200603241714.48909.rjw@sisk.pl>

Hi,

On Friday 24 March 2006 17:14, Rafael J. Wysocki wrote:
> On Friday 24 March 2006 16:30, Con Kolivas wrote:
> > On Saturday 25 March 2006 02:16, Rafael J. Wysocki wrote:
> > > On Friday 24 March 2006 08:07, Con Kolivas wrote:
> > > > On Tuesday 21 March 2006 05:46, Rafael J. Wysocki wrote:
> > > > > swsusp_shrink_memory() is still wrong, because it will always fail for
> > > > > image_size = 0.  My bad, sorry.
> > > > >
> > > > > The appended patch (on top of yours) should fix that (hope I did it
> > > > > right this time).
> > > >
> > > > Well I discovered that if all the necessary memory is freed in one call
> > > > to shrink_all_memory we don't get the nice updating printout from
> > > >  swsusp_shrink_memory telling us we're making progress. So instead of
> > > >  modifying the function to call shrink_all_memory with the full amount
> > > > (and since we've botched swsusp_shrink_memory a few times between us), we
> > > > should limit it to a max of SHRINK_BITEs instead.
> > > >
> > > >  This patch is fine standalone.
> > > >
> > > >  Rafael, Pavel what do you think of this one?
> > >
> > > In principle it looks good to me, but when I tested the previous one I
> > > noticed shrink_all_memory() tended to return 0 prematurely (ie. when it was
> > > possible to free some more pages).  It only happened if more than 50% of
> > > memory was occupied by application data.
> > >
> > > Unfortunately I couldn't find the reason.
> > 
> > Perhaps it was just trying to free up too much in one go. There are a number 
> > of steps a mapped page needs to go through before being finally swapped and 
> > there are a limited number of iterations over it. Limiting it to SHRINK_BITEs 
> > at a time will probably improve that.
> 
> OK [I'll be testing it for the next couple of days.]
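
[Illustration, not from the thread's patch: a rough user-space sketch of the
"free in SHRINK_BITE-sized chunks" idea quoted above. TOY_SHRINK_BITE, its
value, the callback type and both functions are assumptions made up for this
example; the kernel's shrink_all_memory() has a different interface.]

```c
#include <assert.h>
#include <stdio.h>

/* Assumed chunk size; the thread names the constant but not its value. */
#define TOY_SHRINK_BITE 1000UL

/* Hypothetical reclaim callback: asked for nr pages, returns pages freed. */
typedef unsigned long (*toy_reclaim_fn)(unsigned long nr, void *ctx);

/*
 * Free up to "target" pages, never asking for more than one bite per call,
 * and report progress after every bite. Stops early if a call frees
 * nothing (the "returns 0 prematurely" case discussed in the thread).
 * Returns the total number of pages freed.
 */
static unsigned long toy_shrink_in_bites(unsigned long target,
					 toy_reclaim_fn reclaim, void *ctx)
{
	unsigned long freed = 0;

	while (freed < target) {
		unsigned long want = target - freed;
		unsigned long got;

		if (want > TOY_SHRINK_BITE)
			want = TOY_SHRINK_BITE;
		got = reclaim(want, ctx);
		if (got == 0)
			break;
		freed += got;
		printf("freed %lu of %lu pages\n", freed, target);
	}
	return freed;
}

/* Toy reclaimer: hands out pages from a fixed pool until it is empty. */
static unsigned long toy_pool_reclaim(unsigned long nr, void *ctx)
{
	unsigned long *pool = ctx;
	unsigned long got = nr < *pool ? nr : *pool;

	*pool -= got;
	return got;
}
```

[Because each call asks for at most one bite, the caller gets a chance to
print progress between calls, and a single call is never asked to reclaim
the whole amount in one go.]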

OK, I have the following observations:

1) The patch generally causes more memory to be freed during suspend than
the unpatched code (good).
2) However, if more than 50% of RAM is used by application data, it causes
the swap prefetch to trigger during resume (that's an impression; anyway
the system swaps in a lot at that time), which takes some time (generally
it makes resume 5-10s longer on my box).
3) The problem with returning zero prematurely has not been entirely
eliminated.  It's happened for me only once, though.

Greetings,
Rafael

* Re: [PATCH 2.6.16] mm: POSIX Memory Lock
From: Andrew Morton @ 2006-03-30  0:29 UTC (permalink / raw)
  To: Stone Wang; +Cc: nickpiggin, linux-mm, linux-kernel
In-Reply-To: <bc56f2f0603290531v2680a403tb30ad1bf94cc1d68@mail.gmail.com>

"Stone Wang" <pwstone@gmail.com> wrote:
>
> Currently, Linux's mlock family of calls (memory lock/unlock) may fail after
> doing part of the job, leaving programmers unsure which parts of memory are
> locked and which are not.
> 
> A better implementation is a transaction-like POSIX memory lock.
> 
> POSIX mlock/munlock :
> 
> http://www.opengroup.org/onlinepubs/009695399/functions/mlock.html
> 
> RETURN VALUE
> 
>    Upon successful completion, the mlock() and munlock() functions
> shall return a value of zero. Otherwise, no change is made to any
> locks in the address space of the process, and the function shall
> return a value of -1 and set errno to indicate the error.
> 
> POSIX mlockall/munlockall :
> 
> http://www.opengroup.org/onlinepubs/009695399/functions/mlockall.html
> 
> RETURN VALUE
> 
>    Upon successful completion, the mlockall() function shall return a
> value of zero.  Otherwise, no additional memory shall be locked, and
> the function shall return a value of -1 and set errno to indicate the
> error. The effect of failure of mlockall() on previously existing
> locks in the address space is unspecified.
> 
>    If it is supported by the implementation, the munlockall() function
> shall always return a value of zero. Otherwise, the function shall
> return a value of -1 and set errno to indicate the error.
> 
> 
> This patch tries to fix this; tests showed that it works.
> 
> Nick Piggin suggested that the patch be submitted on its own, and that it use
> 1 bit of vm_flags instead of adding 1 member to vm_area_struct. Special thanks
> to him. Besides, the patch has been largely rewritten to make it clearer.
> 

Thanks.  This will take about an hour to review :( VMA merging and
splitting aren't the simplest things in the world.

Anyway, I'll queue it up for some testing - but I'm not sure when I (or
anyone else) will have the bandwidth for a line-by-line review, and that's
what it needs.

The mlockall/munlockall approach is nice.

* [PATCH 2.6.16] mm: POSIX Memory Lock
From: Stone Wang @ 2006-03-29 13:31 UTC (permalink / raw)
  To: akpm; +Cc: Nick Piggin, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 11681 bytes --]

Currently, Linux's mlock family of calls (memory lock/unlock) may fail after
doing part of the job, leaving programmers unsure which parts of memory are
locked and which are not.

A better implementation is a transaction-like POSIX memory lock.

POSIX mlock/munlock :

http://www.opengroup.org/onlinepubs/009695399/functions/mlock.html

RETURN VALUE

   Upon successful completion, the mlock() and munlock() functions
shall return a value of zero. Otherwise, no change is made to any
locks in the address space of the process, and the function shall
return a value of -1 and set errno to indicate the error.
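
As an illustration of what this wording means for callers, here is a minimal
user-space sketch. The helpers lock_range() and unlock_range() are hypothetical
wrappers invented for this example; only mlock() and munlock() themselves are
real interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Lock [addr, addr + len). Returns 0 on success or the errno value on
 * failure. Under the POSIX wording quoted above, a failure means no lock
 * state changed, so the caller has nothing to undo.
 */
static int lock_range(const void *addr, size_t len)
{
	if (mlock(addr, len) == 0)
		return 0;
	return errno;
}

/* Unlock [addr, addr + len). Returns 0 on success or the errno value. */
static int unlock_range(const void *addr, size_t len)
{
	if (munlock(addr, len) == 0)
		return 0;
	return errno;
}
```

A caller can then treat a non-zero return as "nothing was locked" and simply
retry or give up, instead of having to work out which pages ended up locked.
Whether the call succeeds at all depends on the process's locked-memory limit.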

POSIX mlockall/munlockall :

http://www.opengroup.org/onlinepubs/009695399/functions/mlockall.html

RETURN VALUE

   Upon successful completion, the mlockall() function shall return a
value of zero.  Otherwise, no additional memory shall be locked, and
the function shall return a value of -1 and set errno to indicate the
error. The effect of failure of mlockall() on previously existing
locks in the address space is unspecified.

   If it is supported by the implementation, the munlockall() function
shall always return a value of zero. Otherwise, the function shall
return a value of -1 and set errno to indicate the error.


This patch tries to fix this; tests showed that it works.

Nick Piggin suggested that the patch be submitted on its own, and that it use
1 bit of vm_flags instead of adding 1 member to vm_area_struct. Special thanks
to him. Besides, the patch has been largely rewritten to make it clearer.
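
To make the marker-bit idea easier to follow, here is a simplified user-space
sketch of the mark-then-commit-or-rollback pattern. Everything in it
(toy_region, the TOY_* flags, toy_lock_all() and the simulated failure) is
made up for illustration; it leaves out VMA splitting and merging entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LOCKED	0x1u	/* region is locked */
#define TOY_CHANGELOCK	0x2u	/* region was changed by the current call */

struct toy_region {
	unsigned int flags;
	int fail;	/* simulate a failure while faulting this region in */
};

/*
 * Try to lock regions [0, n). The first pass marks every region it newly
 * locks with TOY_CHANGELOCK and stops at the first failure. The second
 * pass clears the marker and, if the first pass failed, also clears
 * TOY_LOCKED on exactly the regions this call had changed.
 * Returns 0 on success, -1 on failure.
 */
static int toy_lock_all(struct toy_region *r, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		if (r[i].flags & TOY_LOCKED)
			continue;	/* already locked: leave untouched */
		r[i].flags |= TOY_LOCKED | TOY_CHANGELOCK;
		if (r[i].fail) {
			ret = -1;
			break;
		}
	}

	for (i = 0; i < n; i++) {
		if (!(r[i].flags & TOY_CHANGELOCK))
			continue;
		r[i].flags &= ~TOY_CHANGELOCK;
		if (ret)
			r[i].flags &= ~TOY_LOCKED;
	}
	return ret;
}
```

Regions that were already locked before the call never get the marker, so a
rollback leaves them locked, which is the "no change is made to any locks"
behaviour the POSIX text asks for.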

Signed-off-by: Shaoping Wang <pwstone@gmail.com>

-----

 include/linux/mm.h |    1
 mm/mlock.c         |  283 ++++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 196 insertions(+), 88 deletions(-)


diff -urNp linux-2.6.16/include/linux/mm.h linux-2.6.16-release/include/linux/mm.h
--- linux-2.6.16/include/linux/mm.h	2006-03-28 02:38:07.000000000 -0500
+++ linux-2.6.16-posixmlock/include/linux/mm.h	2006-03-29 05:40:55.000000000 -0500
@@ -166,6 +166,7 @@ extern unsigned int kobjsize(const void
 #define VM_NONLINEAR	0x00800000	/* Is non-linear (remap_file_pages) */
 #define VM_MAPPED_COPY	0x01000000	/* T if mapped copy of data (nommu mmap) */
 #define VM_INSERTPAGE	0x02000000	/* The vma has had "vm_insert_page()" done on it */
+#define VM_CHANGELOCK	0x04000000	/* The vma just has VM_LOCKED bit changed */

 #ifndef VM_STACK_DEFAULT_FLAGS		/* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
diff -urNp linux-2.6.16/mm/mlock.c linux-2.6.16-release/mm/mlock.c
--- linux-2.6.16/mm/mlock.c	2006-03-28 02:38:07.000000000 -0500
+++ linux-2.6.16-posixmlock/mm/mlock.c	2006-03-29 05:38:59.000000000 -0500
@@ -3,6 +3,7 @@
  *
  *  (C) Copyright 1995 Linus Torvalds
  *  (C) Copyright 2002 Christoph Hellwig
+ *  (C) Copyright 2006 Peter Wang
  */

 #include <linux/capability.h>
@@ -11,72 +12,120 @@
 #include <linux/mempolicy.h>
 #include <linux/syscalls.h>

-
-static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
-	unsigned long start, unsigned long end, unsigned int newflags)
+static int do_mlock(unsigned long start, size_t len,unsigned int jump_hole)
 {
-	struct mm_struct * mm = vma->vm_mm;
-	pgoff_t pgoff;
-	int pages;
+	unsigned long  end = 0, vmoff = 0;
+	unsigned long  pages = 0;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct * vma, *prev, **pprev,*next;
 	int ret = 0;

-	if (newflags == vma->vm_flags) {
-		*prev = vma;
-		goto out;
-	}
+	len = PAGE_ALIGN(len);
+	end = start + len;
+	if (end < start)
+		return -EINVAL;
+	if (end == start)
+		return 0;

-	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
-	*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
-			  vma->vm_file, pgoff, vma_policy(vma));
-	if (*prev) {
-		vma = *prev;
-		goto success;
-	}
+	vma = find_vma_prev(current->mm, start, &prev);
+	if (!vma || vma->vm_start > start)
+		return -ENOMEM;

-	*prev = vma;
+	while (vma->vm_start < end) {
+		if (vma->vm_flags & VM_LOCKED) {
+			if (vma->vm_end < end)
+				goto next;
+			else
+				break;
+		} else {
+			if (vma->vm_start < start) {
+				prev = vma;
+				ret = split_vma(mm, prev, start, 0);
+				if (!ret) {
+					vma = prev->vm_next;
+					vmoff = vma->vm_end;
+				}
+				else		
+					break;
+			}
+			if (vma->vm_end > end) {
+				ret = split_vma(mm, vma, end, 0);
+   				if (!ret)
+					vmoff = vma->vm_end;
+				else
+					break;
+			}
+		}

-	if (start != vma->vm_start) {
-		ret = split_vma(mm, vma, start, 1);
-		if (ret)
-			goto out;
-	}
+		pages += (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;

-	if (end != vma->vm_end) {
-		ret = split_vma(mm, vma, end, 0);
-		if (ret)
-			goto out;
+		vma->vm_flags |= VM_LOCKED;
+		vma->vm_flags |= VM_CHANGELOCK;
+		vmoff = vma->vm_end;
+		if (!(vma->vm_flags & VM_IO)) {
+   			ret = make_pages_present(vma->vm_start, vma->vm_end);
+			if (ret)
+				break;
+		}
+next:
+		if (vma->vm_end == end)
+			break;
+		prev = vma;
+		vma = vma->vm_next;
+		
+		/* If called from do_mlockall,
+		 * we may jump over holes.
+		 */
+		if (jump_hole) {
+			if (vma)
+				continue;
+			else
+				break;
+		}
+		else if (!vma || vma->vm_start != prev->vm_end) {
+			ret = -ENOMEM;
+			break;
+		}
 	}

-success:
-	/*
-	 * vm_flags is protected by the mmap_sem held in write mode.
-	 * It's okay if try_to_unmap_one unmaps a page just after we
-	 * set VM_LOCKED, make_pages_present below will bring it back.
-	 */
-	vma->vm_flags = newflags;
+	pprev = &prev;
+	vma = find_vma_prev(mm, start, pprev);

-	/*
-	 * Keep track of amount of locked VM.
+	/*
+	 * Try to merge the vmas.
+	 * If error happened, rollback vmas to original status.
 	 */
-	pages = (end - start) >> PAGE_SHIFT;
-	if (newflags & VM_LOCKED) {
-		pages = -pages;
-		if (!(newflags & VM_IO))
-			ret = make_pages_present(start, end);
+	while (vma && vma->vm_end <= vmoff ) {
+		if (vma->vm_flags & VM_CHANGELOCK) {
+			vma->vm_flags &= ~VM_CHANGELOCK;
+			if (ret)
+				vma->vm_flags &= ~VM_LOCKED;
+		}
+		next = vma->vm_next;
+		if (next && (next->vm_flags & VM_CHANGELOCK)) {
+			next->vm_flags &= ~VM_CHANGELOCK;
+			if (ret)
+				next->vm_flags &= ~VM_LOCKED;
+		}
+		*pprev = vma_merge(mm, *pprev, vma->vm_start, vma->vm_end, vma->vm_flags,
+					vma->anon_vma,vma->vm_file, vma->vm_pgoff, vma_policy(vma));
+		if (*pprev)
+			vma = *pprev;
+		vma = vma->vm_next;
 	}

-	vma->vm_mm->locked_vm -= pages;
-out:
-	if (ret == -ENOMEM)
-		ret = -EAGAIN;
+	if (!ret)
+		mm->locked_vm += pages;
 	return ret;
 }

-static int do_mlock(unsigned long start, size_t len, int on)
+static int do_munlock(unsigned long start, size_t len, unsigned int jump_hole)
 {
-	unsigned long nstart, end, tmp;
-	struct vm_area_struct * vma, * prev;
-	int error;
+	unsigned long  end = 0,vmoff = 0;
+	unsigned long  pages = 0;
+	struct mm_struct *mm=current->mm;
+	struct vm_area_struct * vma, *prev, **pprev, *next;
+	int ret = 0;

 	len = PAGE_ALIGN(len);
 	end = start + len;
@@ -88,37 +137,86 @@ static int do_mlock(unsigned long start,
 	if (!vma || vma->vm_start > start)
 		return -ENOMEM;

-	if (start > vma->vm_start)
-		prev = vma;
-
-	for (nstart = start ; ; ) {
-		unsigned int newflags;
+	while (vma->vm_start < end) {
+		if (!(vma->vm_flags & VM_LOCKED)) {
+			if(vma->vm_end < end)
+				goto next;
+			else
+				break;
+		} else {
+			if (vma->vm_start < start) {
+				prev = vma;
+				ret = split_vma(mm, prev, start, 0);
+				if (!ret) {
+					vma = prev->vm_next;
+					vmoff = vma->vm_end;
+				}
+				else
+					break;
+			}
+			if (vma->vm_end > end) {
+				ret = split_vma(mm, vma, end, 0);
+				if (!ret)
+					vmoff = vma->vm_end;
+				else
+					break;
+			}
+		}

-		/* Here we know that  vma->vm_start <= nstart < vma->vm_end. */
+		/* Delay clearing VM_LOCKED bit here,
+		 * thus make the possibly rollback easy.
+		 */
+		vma->vm_flags |= VM_CHANGELOCK;
+		vmoff = vma->vm_end;
+		pages += (vma->vm_end -vma->vm_start) >> PAGE_SHIFT;

-		newflags = vma->vm_flags | VM_LOCKED;
-		if (!on)
-			newflags &= ~VM_LOCKED;
-
-		tmp = vma->vm_end;
-		if (tmp > end)
-			tmp = end;
-		error = mlock_fixup(vma, &prev, nstart, tmp, newflags);
-		if (error)
-			break;
-		nstart = tmp;
-		if (nstart < prev->vm_end)
-			nstart = prev->vm_end;
-		if (nstart >= end)
+next:
+		if (vma->vm_end == end)
 			break;
+		prev = vma;
+		vma = vma->vm_next;

-		vma = prev->vm_next;
-		if (!vma || vma->vm_start != nstart) {
-			error = -ENOMEM;
+		/* If called from munlockall,
+		 * we may jump over holes.
+		 */
+		if (jump_hole) {
+			if (!vma)
+				break;
+			else
+				continue;
+		}
+		else if (!vma || (vma->vm_start != prev->vm_end)) {
+			ret = -ENOMEM;
 			break;
 		}
 	}
-	return error;
+
+	pprev = &prev;
+	vma = find_vma_prev(current->mm, start, pprev);
+
+	while (vma && vma->vm_end <= vmoff) {
+		if (vma->vm_flags & VM_CHANGELOCK) {
+			vma->vm_flags &= ~VM_CHANGELOCK;
+			if (!ret)
+				vma->vm_flags &=~VM_LOCKED;
+		}
+		next = vma->vm_next;
+		if (next && (next->vm_flags & VM_CHANGELOCK)) {
+			next->vm_flags &= ~VM_CHANGELOCK;
+			if (!ret)
+				next->vm_flags &= ~VM_LOCKED;
+		}
+		*pprev = vma_merge(mm, *pprev, vma->vm_start, vma->vm_end, vma->vm_flags,
+				vma->anon_vma, vma->vm_file, vma->vm_pgoff, vma_policy(vma));
+		if (*pprev)
+			vma = *pprev;
+		vma = vma->vm_next;
+	}
+
+	if (!ret)
+		mm->locked_vm -= pages;
+	
+	return ret;
 }

 asmlinkage long sys_mlock(unsigned long start, size_t len)
@@ -142,7 +240,7 @@ asmlinkage long sys_mlock(unsigned long

 	/* check against resource limits */
 	if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
-		error = do_mlock(start, len, 1);
+		error = do_mlock(start, len, 0);
 	up_write(&current->mm->mmap_sem);
 	return error;
 }
@@ -154,34 +252,43 @@ asmlinkage long sys_munlock(unsigned lon
 	down_write(&current->mm->mmap_sem);
 	len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
 	start &= PAGE_MASK;
-	ret = do_mlock(start, len, 0);
+	ret = do_munlock(start, len, 0);
 	up_write(&current->mm->mmap_sem);
 	return ret;
 }

 static int do_mlockall(int flags)
 {
-	struct vm_area_struct * vma, * prev = NULL;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct * vma;
 	unsigned int def_flags = 0;
+	unsigned long start;
+	int ret = 0;

 	if (flags & MCL_FUTURE)
 		def_flags = VM_LOCKED;
-	current->mm->def_flags = def_flags;
+	mm->def_flags = def_flags;
 	if (flags == MCL_FUTURE)
 		goto out;
+	vma = mm->mmap;
+	start = vma->vm_start;
+	ret = do_mlock(start, TASK_SIZE, 1);
+out:
+	return ret;
+}

-	for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
-		unsigned int newflags;
+static int do_munlockall(void)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct * vma;
+	unsigned long start;
+	int ret;

-		newflags = vma->vm_flags | VM_LOCKED;
-		if (!(flags & MCL_CURRENT))
-			newflags &= ~VM_LOCKED;
+	vma = mm->mmap;
+	start = vma->vm_start;
+	ret = do_munlock(start, TASK_SIZE, 1);

-		/* Ignore errors */
-		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
-	}
-out:
-	return 0;
+	return ret;
 }

 asmlinkage long sys_mlockall(int flags)
@@ -215,7 +322,7 @@ asmlinkage long sys_munlockall(void)
 	int ret;

 	down_write(&current->mm->mmap_sem);
-	ret = do_mlockall(0);
+	ret = do_munlockall();
 	up_write(&current->mm->mmap_sem);
 	return ret;
 }

+static int do_munlockall(void)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct * vma;
+	unsigned long start;
+	int ret;
 
-		newflags = vma->vm_flags | VM_LOCKED;
-		if (!(flags & MCL_CURRENT))
-			newflags &= ~VM_LOCKED;
+	vma = mm->mmap;
+	start = vma->vm_start;
+	ret = do_munlock(start, TASK_SIZE, 1);
 
-		/* Ignore errors */
-		mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
-	}
-out:
-	return 0;
+	return ret;
 }
 
 asmlinkage long sys_mlockall(int flags)
@@ -215,7 +322,7 @@ asmlinkage long sys_munlockall(void)
 	int ret;
 
 	down_write(&current->mm->mmap_sem);
-	ret = do_mlockall(0);
+	ret = do_munlockall();
 	up_write(&current->mm->mmap_sem);
 	return ret;
 }

^ permalink raw reply

* Setting the PSE bit
From: VASM @ 2006-03-29  6:46 UTC (permalink / raw)
  To: linux-kernel, linux-mm

Hi,

I need some help with my project. I have 1024 contiguous 4 KB pages in
memory (aligned to a 4 MB boundary) and I want to convert them into one
4 MB page. I have written code in do_anonymous_page() and have trapped
my test program (which makes an mmap call for anonymous memory) inside
this function; I want this to work for this test process only.

AFAIK the changes that need to be made are: add a new mk_pte_large that
sets the PSE bit, and then use set_pte. Is there anything else that
needs to be done? Do we need to set the PSE bit in the pgd, and if yes,
how?

I am working on a 32-bit Intel platform. I have read somewhere that a
bit in cr4 also needs to be set; is that already done, or will I have to
do it myself?

working on 2.4.32
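
For reference, a minimal user-space sketch of the bit layout the question is
about, assuming classic non-PAE 32-bit paging. On that architecture the
page-size (PS/PSE) bit is bit 7 of the page directory entry, not of a PTE, and
4 MB pages must also be enabled through bit 4 of CR4. The helper name
make_large_pde and the macro names below are hypothetical and are not the
kernel's own; whether a given 2.4 kernel has already set CR4.PSE at boot
should be checked in its paging setup code rather than assumed.

```c
#include <stdint.h>

/* Bit positions for non-PAE 32-bit paging on IA-32. */
#define PDE_PRESENT  0x001u  /* bit 0: entry is valid */
#define PDE_RW       0x002u  /* bit 1: writable */
#define PDE_USER     0x004u  /* bit 2: accessible from user mode */
#define PDE_PSE      0x080u  /* bit 7: this directory entry maps a 4 MB page */
#define CR4_PSE      0x010u  /* bit 4 of CR4: enables 4 MB pages */

#define LARGE_PAGE_SIZE  (1u << 22)                /* 4 MB */
#define LARGE_PAGE_MASK  (~(LARGE_PAGE_SIZE - 1))  /* keeps bits 31..22 */

/*
 * Hypothetical helper: build a page directory entry that maps one 4 MB
 * page at physical address phys. Returns 0 (a non-present entry) if
 * phys is not 4 MB aligned.
 */
uint32_t make_large_pde(uint32_t phys, uint32_t prot)
{
	if (phys & ~LARGE_PAGE_MASK)
		return 0;
	/* Only the low 12 bits carry protection flags in this sketch. */
	return (phys & LARGE_PAGE_MASK) | (prot & 0xfffu) | PDE_PSE | PDE_PRESENT;
}
```

With a two-level page table the entry built this way replaces the pointer to
a page table, so the 1024 small PTEs underneath are no longer consulted.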

--
Vasm

^ permalink raw reply

* Re: [PATCH 00/34] mm: Page Replacement Policy Framework
From: Elladan @ 2006-03-28 23:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Marcelo Tosatti, Andrew Morton, Peter Zijlstra, linux-mm,
	linux-kernel, bob.picco, iwamoto, christoph, wfg, npiggin, riel
In-Reply-To: <Pine.LNX.4.64.0603231003390.26286@g5.osdl.org>

On Thu, Mar 23, 2006 at 10:15:47AM -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 23 Mar 2006, Marcelo Tosatti wrote:
> > 
> > IMHO the page replacement framework intent is wider than fixing the     
> > currently known performance problems.
> > 
> > It allows easier implementation of new algorithms, which are being
> > invented/adapted over time as necessity appears.
> 
> Yes and no.
> 
> It smells wonderful for a pluggable page replacement standpoint, but 
> here's a couple of observations/questions:
>  a) the current one actually seems to have beaten the on-comers (except 
>     for loads that were actually made up to try to defeat LRU)
>  b) is page replacement actually a huge issue?
> 
> Now, the reason I ask about (b) is that these days, you buy a Mac Mini, 
> and it comes with half a gig of RAM, and some apple users seem to worry 
> that the UMA graphics taking 50MB or something of that is a problem.

Data point:

I run into swap all the time on my 1gig machine.  There are a few reasons for
this.

* Applications are incredibly bloated.  Just running a bunch of gnome apps
  sucks down 1000 megs almost instantly.  However, these apps don't seem to use
  most of the space they bloat into, so after a bit of fighting for VM the
  chaff gets forced out and they run fine.

* Apps are also incredibly buggy.  Eg. Firefox seems to leak up to 50 megs per
  second in some workloads, so I run it for a day or two and my machine tends
  to go heavily into swap.

* VM system prefers disk cache over applications.  Eg. updated runs at 
  3am and indexes all my files.  Since the applications were idle, the 
  VM decides to page out all my executables and fill my ram with page 
  cache which is only used once.  In the morning, my machine spends a few
  minutes paging everything back in.

* Similarly, I have a 2gig machine available, and it's also showing about 512MB
  swapped out and also 500MB free.

-J

^ permalink raw reply

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Pavel Machek @ 2006-03-27 12:24 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Rafael J. Wysocki, Nick Piggin, linux list, ck list,
	Andrew Morton, linux-mm
In-Reply-To: <200603241807.41175.kernel@kolivas.org>

Hi!

> > swsusp_shrink_memory() is still wrong, because it will always fail for
> > image_size = 0.  My bad, sorry.
> >
> > The appended patch (on top of yours) should fix that (hope I did it right
> > this time).
> 
> Well I discovered that if all the necessary memory is freed in one call to
>  shrink_all_memory we don't get the nice updating printout from
>  swsusp_shrink_memory telling us we're making progress. So instead of
>  modifying the function to call shrink_all_memory with the full amount (and
>  since we've botched swsusp_shrink_memory a few times between us), we should
>  limit it to a max of SHRINK_BITEs instead.
> 
>  This patch is fine standalone.
> 
>  Rafael, Pavel what do you think of this one? 

Looks good to me (but I'm not a mm expert).
									Pavel

-- 
Picture of sleeping (Linux) penguin wanted...

^ permalink raw reply

* Re: Add gfp flag __GFP_POLICY to control policies and cpusets redirection of allocations
From: Paul Jackson @ 2006-03-27  7:29 UTC (permalink / raw)
  To: Paul Jackson; +Cc: akpm, clameter, ak, linux-mm, linux-kernel
In-Reply-To: <20060324174448.0ac4a520.pj@sgi.com>

  (Executive, aka Andrew, summary: no action items here yet ...)

Christoph sent me some corrections offline to my previous post.

I (pj) had written:
> This patch does not always fix the problem that first motivated it of
> failed memory migrations,

I had misunderstood Christoph's patch.  He never intended to fix the
cpuset induced failure of memory migration.  He intended to restore 
proper behavior of the slab allocator and other kernel subsystems.

Part of my confusion arose from the fact that he took the occasion of
his patch to ask Andrew to drop an earlier patch of ours that -had-
intended, in part, to fix this cpuset-migration interaction.

And part of my confusion was just plain old confusion on my part.

>      If I get the chance this weekend, I will at least try to
>      write up an lkml post describing some of the '(mis)features' we
>      observed during our analysis of this area, under some such Subject
>      as "Misfeatures of the kernel allocators and memory policy."

I won't get that far.  I'm still working with Christoph offline to make
sense of this.  Hopefully I won't drive him to drink first ;-).

I still hope to have a much improved, agreed to by Christoph, patch to
fix the cpuset-migration interaction, posted to lkml in a day or two.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply

* Re: Lockless pagecache perhaps for 2.6.18?
From: Nigel Cunningham @ 2006-03-27  0:54 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Nick Piggin, Andrew Morton, Hugh Dickins, Andrea Arcangeli,
	Linux Memory Management List, Ingo Molnar
In-Reply-To: <4427353A.6060905@yahoo.com.au>

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]

Hi Nick.

On Monday 27 March 2006 10:43, Nick Piggin wrote:
> Nigel Cunningham wrote:
> > Can I get a pointer to the patches and any docs please? Since I save the
> > page cache separately, I'd need a good understanding of the implications
> > of the changes.
>
> Hi Nigel,
>
> http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/lockless/2.6.16-rc5/
>
> There are some patches... a lot of them, but only the last 5 in the series
> matter (the rest are pretty much in 2.6.16-head).
>
> There is also a small doc on the lockless radix-tree in that directory. I'm
> in the process of writing some documentation on the lockless pagecache
> itself...
>
> You probably don't need to worry too much unless you are testing
> page_count() under the tree_lock, held for writing, expecting that to
> stabilise page_count. In which case I could have a look at your code and
> see if it would be a problem.

Thanks.

I'm not far from head now, so guess I have no problems with the rest.

From what you say about the other patches, I think I'm fine as far as the rest 
go too. I was mostly concerned that the modifications might make it possible 
for the lru to start changing while the image is being written. It looks to 
me now like I was being too paranoid (which isn't necessarily a bad thing, is 
it?).

Regards,

Nigel

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: Lockless pagecache perhaps for 2.6.18?
From: Nick Piggin @ 2006-03-27  0:43 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Nick Piggin, Andrew Morton, Hugh Dickins, Andrea Arcangeli,
	Linux Memory Management List, Ingo Molnar
In-Reply-To: <200603262021.46276.ncunningham@cyclades.com>

Nigel Cunningham wrote:

> Can I get a pointer to the patches and any docs please? Since I save the page 
> cache separately, I'd need a good understanding of the implications of the 
> changes.
> 

Hi Nigel,

http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/lockless/2.6.16-rc5/

There are some patches... a lot of them, but only the last 5 in the series
matter (the rest are pretty much in 2.6.16-head).

There is also a small doc on the lockless radix-tree in that directory. I'm in
the process of writing some documentation on the lockless pagecache itself...

You probably don't need to worry too much unless you are testing page_count()
under the tree_lock, held for writing, expecting that to stabilise page_count.
In which case I could have a look at your code and see if it would be a
problem.

Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: Lockless pagecache perhaps for 2.6.18?
From: Nigel Cunningham @ 2006-03-26 10:21 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli,
	Linux Memory Management List, Ingo Molnar
In-Reply-To: <20060323081100.GE26146@wotan.suse.de>

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

Hi Nick.

On Thursday 23 March 2006 18:11, Nick Piggin wrote:
> Hi,
>
> Would there be any objection to having my lockless pagecache patches
> merged into -mm, for a possible mainline merge after 2.6.17 (ie. if/
> when the mm hackers feel comfortable with it).
>
> There are now just 3 patches: 15 files, 312 insertions, 81 deletions
> for the core changes, including RCU radix-tree. (not counting those
> last two I just sent you Andrew (VM_BUG_ON, find_trylock_page))
>
> It is fairly well commented, and not overly complex (IMO) compared
> with other lockless stuff in the tree now.
>
> My main motivation is to get more testing and more serious reviews,
> rather than trying to clear a fast path into mainline.
>
> Nick

Can I get a pointer to the patches and any docs please? Since I save the page 
cache separately, I'd need a good understanding of the implications of the 
changes.

Regards,

Nigel

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: Add gfp flag __GFP_POLICY to control policies and cpusets redirection of allocations
From: Paul Jackson @ 2006-03-25  1:44 UTC (permalink / raw)
  To: akpm; +Cc: Christoph Lameter, ak, linux-mm, linux-kernel
In-Reply-To: <Pine.LNX.4.64.0603221342170.24959@schroedinger.engr.sgi.com>

Andrew,

I am NAQ'ing this patch, aka:

  add-gfp-flag-__gfp_policy-to-control-policies-and-cpusets-redirection.patch added to -mm tree

This patch does not always fix the problem that first motivated it of
failed memory migrations, and it changes the semantics of the
interaction of the kernel page allocators with the cpuset and mempolicy
memory policies in ways that, in my view, need more analysis first.

I intend to send a patch with a different solution on about Monday
three days from now, hopefully with Christoph's review and ACK.

Details ... for the curious:

  We have two sets of problems here.

  1) Invoking memory migration via the cpuset interface 'memory_migrate'
     would fail (do nothing, without complaint or explanation) if
     the task invoking the migration was not in the target cpuset of
     the migration.  This caused much confusion and befuddlement of
     Christoph, myself and our test engineers.

     The key problem was that we are trying to allocate the new pages
     to receive the migration in the context of the task invoking the
     migration.  If that task's cpuset (or mbind mempolicy) doesn't allow
     allocation on those nodes, the migration will move the target
     task to some nodes that are in the invoking task's cpuset instead.

     This needs fixing sooner rather than later.  The ordinary user
     of memory migration will often find it broken until we fix this.

     My next attempt to fix this will have the kernel migration code
     temporarily and silently and automatically put the invoking task
     in the necessary cpuset, so that the migration code can allocate
     the new pages on the requested nodes.  I hope to prepare this
     patch this weekend, so Christoph can review it Monday, and we
     can submit it then.

  2) The GFP flags and the interaction with various kernel allocators
     and the cpuset and mm/mempolicy memory policies have some strange
     '(mis)features'.  In the normal case, when there is enough memory
     where asked for, they are ok.

     Or, at least, no one has actually noticed the breakage, even
     though much of it has been there for over a year.

     The 2 patches that Christoph and I sent so far (the above
     NAQ'd patch and its predecessor) both addressed some of these
     '(mis)features', with the side effect of fixing (most of the time,
     not all cases) the failed migrations of problem (1) above.

     But both patches were partial bandaids.

     More thought will be required before we offer up solutions for
     (2).  If I get the chance this weekend, I will at least try to
     write up an lkml post describing some of the '(mis)features' we
     observed during our analysis of this area, under some such Subject
     as "Misfeatures of the kernel allocators and memory policy."
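
The fix proposed under (1) can be pictured with a small user-space model.
This is only a sketch under stated assumptions, not the kernel's cpuset
code: nodemask_t is modelled as a 64-bit mask, and struct task, alloc_node
and migrate_alloc are hypothetical names invented for illustration. It shows
why an allocation made in the invoker's context misses the target nodes, and
how temporarily widening the invoker's allowed nodes and then restoring them
avoids that.

```c
#include <stdint.h>

typedef uint64_t nodemask_t;        /* assumption: one bit per memory node */

struct task {
	nodemask_t mems_allowed;    /* nodes this task may allocate on */
};

/*
 * Hypothetical allocator: returns the lowest-numbered node that is both
 * wanted and allowed for the task, or -1 if there is none.
 */
int alloc_node(const struct task *t, nodemask_t wanted)
{
	nodemask_t ok = t->mems_allowed & wanted;
	int node;

	for (node = 0; node < 64; node++)
		if (ok & ((nodemask_t)1 << node))
			return node;
	return -1;
}

/*
 * Allocate on the migration target nodes on behalf of the invoking task:
 * save its allowed nodes, switch to the target set for the duration of
 * the allocation, then restore the saved set.
 */
int migrate_alloc(struct task *invoker, nodemask_t target)
{
	nodemask_t saved = invoker->mems_allowed;
	int node;

	invoker->mems_allowed = target;
	node = alloc_node(invoker, target);
	invoker->mems_allowed = saved;
	return node;
}
```

In this model a task allowed on nodes 0-1 cannot allocate on nodes 2-3
directly, but migrate_alloc succeeds and leaves the task's own mask
unchanged afterwards.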

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply

* Re: [PATCH][5/8] proc: export mlocked pages info through "/proc/meminfo: Wired"
From: Nick Piggin @ 2006-03-24 18:25 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Stone Wang, akpm, linux-kernel, linux-mm
In-Reply-To: <Pine.LNX.4.63.0603241319130.30426@cuia.boston.redhat.com>

Rik van Riel wrote:
> On Sat, 25 Mar 2006, Nick Piggin wrote:
> 
>>Rik van Riel wrote:
>>
>>>On Wed, 22 Mar 2006, Nick Piggin wrote:
>>>
>>>
>>>>Why would you want to ever do something like that though? I don't think
>>>>you should use this name "just in case", unless you have some really good
>>>>potential usage in mind.
>>>
>>>ramfs
>>
>>Why would ramfs want its pages in this wired list? (I'm not so
>>familiar with it but I can't think of a reason).
> 
> 
> Because ramfs pages cannot be paged out, which makes them locked
> into memory the same way mlocked pages are.
> 

I don't understand why they need to be on any list though,
that isn't an internal ramfs specific structure (ie. not
the just-in-case wired list).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH][5/8] proc: export mlocked pages info through "/proc/meminfo: Wired"
From: Rik van Riel @ 2006-03-24 18:19 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Stone Wang, akpm, linux-kernel, linux-mm
In-Reply-To: <442420A2.80807@yahoo.com.au>

On Sat, 25 Mar 2006, Nick Piggin wrote:
> Rik van Riel wrote:
> > On Wed, 22 Mar 2006, Nick Piggin wrote:
> > 
> > > Why would you want to ever do something like that though? I don't think
> > > you should use this name "just in case", unless you have some really good
> > > potential usage in mind.
> > 
> > ramfs
> 
> Why would ramfs want its pages in this wired list? (I'm not so
> familiar with it but I can't think of a reason).

Because ramfs pages cannot be paged out, which makes them locked
into memory the same way mlocked pages are.

-- 
All Rights Reversed

^ permalink raw reply

* Re: [PATCH][0/8] (Targeting 2.6.17) Posix memory locking and balanced mlock-LRU semantic
From: Nick Piggin @ 2006-03-24 16:57 UTC (permalink / raw)
  To: Stone Wang; +Cc: akpm, linux-kernel, linux-mm
In-Reply-To: <bc56f2f0603240705y3b4abe3ej@mail.gmail.com>

Stone Wang wrote:
> 2006/3/21, Nick Piggin <nickpiggin@yahoo.com.au>:

>>In what way are we not posix compliant now?
> 
> 
> Currently, Linux's mlock for example, may fail with  only part of its
> task finished.
> 
> While according to the POSIX definition:
> 
> man mlock(2)
> 
> "
> RETURN VALUE
>        On success, mlock returns zero.  On error, -1 is returned, errno is set
>        appropriately, and no changes are made to  any  locks  in  the  address
>        space of the process.
> "
> 

Looks like you're right, so good catch. You should probably try to submit your
posix mlock patch by itself then. Make sure you look at the coding standards
though, and try to _really_ follow coding conventions of the file you're
modifying.

You also should make sure the patch works standalone (ie. not just as part of
a set). Oh, and introducing a new field in vma for a flag is probably not the
best option if you still have room in the vm_flags field.

And the patch changelog should contain the actual problem, and quote the
relevant part of the POSIX definition, if applicable.
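
The all-or-nothing behaviour quoted above ("no changes are made to any
locks" on error) can be sketched as a mark-then-commit walk, using a spare
flag bit as suggested rather than a new field. This is a simplified
user-space model, not the posted patch: struct region, R_LOCKED,
R_CHANGELOCK and lock_all_or_nothing are hypothetical names, and the only
failure modelled is a hole between adjacent regions.

```c
#include <errno.h>
#include <stddef.h>

#define R_LOCKED      0x1u  /* region is locked */
#define R_CHANGELOCK  0x2u  /* scratch bit: lock state is about to change */

struct region {
	unsigned long start, end;  /* covers [start, end) */
	unsigned int flags;
};

/*
 * Lock regs[0..n). Returns 0 on success. If a hole is found between two
 * neighbouring regions, returns -ENOMEM and leaves every flag as it was.
 */
int lock_all_or_nothing(struct region *regs, size_t n)
{
	size_t i, marked = 0;
	int ret = 0;

	/* Pass 1: only mark regions; nothing is locked yet. */
	for (i = 0; i < n; i++) {
		if (i > 0 && regs[i].start != regs[i - 1].end) {
			ret = -ENOMEM;
			break;
		}
		regs[i].flags |= R_CHANGELOCK;
		marked++;
	}

	/* Pass 2: drop the marks; commit the lock only if pass 1 succeeded. */
	for (i = 0; i < marked; i++) {
		regs[i].flags &= ~R_CHANGELOCK;
		if (!ret)
			regs[i].flags |= R_LOCKED;
	}
	return ret;
}
```

Because the visible lock bit is only written in the second pass, an error in
the first pass needs no undo step beyond clearing the scratch bit.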

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH][5/8] proc: export mlocked pages info through "/proc/meminfo: Wired"
From: Nick Piggin @ 2006-03-24 16:38 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Stone Wang, akpm, linux-kernel, linux-mm
In-Reply-To: <Pine.LNX.4.63.0603241133550.30426@cuia.boston.redhat.com>

Rik van Riel wrote:
> On Wed, 22 Mar 2006, Nick Piggin wrote:
> 
> 
>>Why would you want to ever do something like that though? I don't think 
>>you should use this name "just in case", unless you have some really 
>>good potential usage in mind.
> 
> 
> ramfs
> 

Why would ramfs want its pages in this wired list? (I'm not so
familiar with it but I can't think of a reason).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH][5/8] proc: export mlocked pages info through "/proc/meminfo: Wired"
From: Rik van Riel @ 2006-03-24 16:34 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Stone Wang, akpm, linux-kernel, linux-mm
In-Reply-To: <442098B6.5000607@yahoo.com.au>

On Wed, 22 Mar 2006, Nick Piggin wrote:

> Why would you want to ever do something like that though? I don't think 
> you should use this name "just in case", unless you have some really 
> good potential usage in mind.

ramfs

-- 
All Rights Reversed

^ permalink raw reply

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Rafael J. Wysocki @ 2006-03-24 16:14 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Nick Piggin, linux list, ck list, Andrew Morton, Pavel Machek,
	linux-mm
In-Reply-To: <200603250230.08140.kernel@kolivas.org>

On Friday 24 March 2006 16:30, Con Kolivas wrote:
> On Saturday 25 March 2006 02:16, Rafael J. Wysocki wrote:
> > On Friday 24 March 2006 08:07, Con Kolivas wrote:
> > > On Tuesday 21 March 2006 05:46, Rafael J. Wysocki wrote:
> > > > swsusp_shrink_memory() is still wrong, because it will always fail for
> > > > image_size = 0.  My bad, sorry.
> > > >
> > > > The appended patch (on top of yours) should fix that (hope I did it
> > > > right this time).
> > >
> > > Well I discovered that if all the necessary memory is freed in one call
> > > to shrink_all_memory we don't get the nice updating printout from
> > >  swsusp_shrink_memory telling us we're making progress. So instead of
> > >  modifying the function to call shrink_all_memory with the full amount
> > > (and since we've botched swsusp_shrink_memory a few times between us), we
> > > should limit it to a max of SHRINK_BITEs instead.
> > >
> > >  This patch is fine standalone.
> > >
> > >  Rafael, Pavel what do you think of this one?
> >
> > In principle it looks good to me, but when I tested the previous one I
> > noticed shrink_all_memory() tended to return 0 prematurely (ie. when it was
> > possible to free some more pages).  It only happened if more than 50% of
> > memory was occupied by application data.
> >
> > Unfortunately I couldn't find the reason.
> 
> Perhaps it was just trying to free up too much in one go. There are a number 
> of steps a mapped page needs to go through before being finally swapped and 
> there are a limited number of iterations over it. Limiting it to SHRINK_BITEs 
> at a time will probably improve that.

OK [I'll be testing it for the next couple of days.]

Greetings,
Rafael

^ permalink raw reply

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Con Kolivas @ 2006-03-24 15:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nick Piggin, linux list, ck list, Andrew Morton, Pavel Machek,
	linux-mm
In-Reply-To: <200603241616.06687.rjw@sisk.pl>

On Saturday 25 March 2006 02:16, Rafael J. Wysocki wrote:
> On Friday 24 March 2006 08:07, Con Kolivas wrote:
> > On Tuesday 21 March 2006 05:46, Rafael J. Wysocki wrote:
> > > swsusp_shrink_memory() is still wrong, because it will always fail for
> > > image_size = 0.  My bad, sorry.
> > >
> > > The appended patch (on top of yours) should fix that (hope I did it
> > > right this time).
> >
> > Well I discovered that if all the necessary memory is freed in one call
> > to shrink_all_memory we don't get the nice updating printout from
> >  swsusp_shrink_memory telling us we're making progress. So instead of
> >  modifying the function to call shrink_all_memory with the full amount
> > (and since we've botched swsusp_shrink_memory a few times between us), we
> > should limit it to a max of SHRINK_BITEs instead.
> >
> >  This patch is fine standalone.
> >
> >  Rafael, Pavel what do you think of this one?
>
> In principle it looks good to me, but when I tested the previous one I
> noticed shrink_all_memory() tended to return 0 prematurely (ie. when it was
> possible to free some more pages).  It only happened if more than 50% of
> memory was occupied by application data.
>
> Unfortunately I couldn't find the reason.

Perhaps it was just trying to free up too much in one go. There are a number 
of steps a mapped page needs to go through before being finally swapped and 
there are a limited number of iterations over it. Limiting it to SHRINK_BITEs 
at a time will probably improve that.
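
A rough user-space sketch of the "limit it to SHRINK_BITEs" idea, to make
the control flow concrete. The chunk size of 10000 pages, the fake_shrink
stand-in for shrink_all_memory, and the shrink_in_bites name are all
assumptions made for illustration and are not taken from the patch under
discussion.

```c
#define SHRINK_BITE 10000UL  /* assumed chunk size, in pages */

/*
 * Hypothetical stand-in for shrink_all_memory(): frees up to nr pages
 * from a pool of reclaimable pages and returns how many were freed.
 */
unsigned long fake_shrink(unsigned long *pool, unsigned long nr)
{
	unsigned long freed = nr < *pool ? nr : *pool;

	*pool -= freed;
	return freed;
}

/*
 * Free up to want pages, never asking for more than SHRINK_BITE in one
 * call. *calls counts the requests made; a caller could update a
 * progress indicator once per request. Stops early when a request frees
 * nothing. Returns the total number of pages freed.
 */
unsigned long shrink_in_bites(unsigned long *pool, unsigned long want,
			      unsigned long *calls)
{
	unsigned long total = 0;

	*calls = 0;
	while (total < want) {
		unsigned long bite = want - total;
		unsigned long freed;

		if (bite > SHRINK_BITE)
			bite = SHRINK_BITE;
		freed = fake_shrink(pool, bite);
		(*calls)++;
		if (!freed)
			break;
		total += freed;
	}
	return total;
}
```

Freeing 25000 pages this way takes three requests instead of one, which is
what gives the caller a chance to report progress between them.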

Cheers,
Con

^ permalink raw reply

* Re: [PATCH] mm: swsusp shrink_all_memory tweaks
From: Rafael J. Wysocki @ 2006-03-24 15:16 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Nick Piggin, linux list, ck list, Andrew Morton, Pavel Machek,
	linux-mm
In-Reply-To: <200603241807.41175.kernel@kolivas.org>

On Friday 24 March 2006 08:07, Con Kolivas wrote:
> On Tuesday 21 March 2006 05:46, Rafael J. Wysocki wrote:
> > swsusp_shrink_memory() is still wrong, because it will always fail for
> > image_size = 0.  My bad, sorry.
> >
> > The appended patch (on top of yours) should fix that (hope I did it right
> > this time).
> 
> Well I discovered that if all the necessary memory is freed in one call to
>  shrink_all_memory we don't get the nice updating printout from
>  swsusp_shrink_memory telling us we're making progress. So instead of
>  modifying the function to call shrink_all_memory with the full amount (and
>  since we've botched swsusp_shrink_memory a few times between us), we should
>  limit it to a max of SHRINK_BITEs instead.
> 
>  This patch is fine standalone.
> 
>  Rafael, Pavel what do you think of this one? 

In principle it looks good to me, but when I tested the previous one I noticed
shrink_all_memory() tended to return 0 prematurely (ie. when it was possible
to free some more pages).  It only happened if more than 50% of memory was
occupied by application data.

Unfortunately I couldn't find the reason.

Greetings,
Rafael

^ permalink raw reply
