Linux-mm Archive on lore.kernel.org
* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Andrew Morton @ 2006-05-12  4:30 UTC (permalink / raw)
  To: Nick Piggin
  Cc: a.p.zijlstra, clameter, torvalds, ak, rohitseth, mbligh, hugh,
	riel, andrea, arjan, apw, mel, marcelo, anton, paulmck, linux-mm
In-Reply-To: <4463EA16.5090208@cyberone.com.au>

Nick Piggin <piggin@cyberone.com.au> wrote:
>
>  >So let's see.  We take a write fault, we mark the page dirty then we return
>  >to userspace which will proceed with the write and will mark the pte dirty.
>  >
>  >Later, the VM will write the page out.
>  >
>  >Later still, the pte will get cleaned by reclaim or by munmap or whatever
>  >and the page will be marked dirty and the page will again be written out. 
>  >Potentially needlessly.
>  >
> 
>  page_wrprotect also marks the page clean,

Oh.  I missed that when reading the comment which describes
page_wrprotect() (I do go on).

> so this window is very small.
>  The window is that the fault path might set_page_dirty, then throttle
>  on writeout, and the page gets written out before it really gets dirtied
>  by the application (which then has to fault again).

: int test_clear_page_dirty(struct page *page)
: {
: 	struct address_space *mapping = page_mapping(page);
: 	unsigned long flags;
: 
: 	if (mapping) {
: 		write_lock_irqsave(&mapping->tree_lock, flags);
: 		if (TestClearPageDirty(page)) {
: 			radix_tree_tag_clear(&mapping->page_tree,
: 						page_index(page),
: 						PAGECACHE_TAG_DIRTY);
: 			write_unlock_irqrestore(&mapping->tree_lock, flags);
: 			/*
: 			 * We can continue to use `mapping' here because the
: 			 * page is locked, which pins the address_space
: 			 */

So if userspace modifies the page right here, and marks the pte dirty.

: 			if (mapping_cap_account_dirty(mapping)) {
: 				page_wrprotect(page);

We just lost that pte dirty bit, and hence the user's data.

: 				dec_page_state(nr_dirty);
: 			}
: 			return 1;
: 		}
: 		write_unlock_irqrestore(&mapping->tree_lock, flags);
: 		return 0;
: 	}
: 	return TestClearPageDirty(page);
: }
: 

Which is just the sort of subtle and nasty problem I was referring to...

If that's correct then I guess we need the

                if (ptep_clear_flush_dirty(vma, addr, pte) ||
                                page_test_and_clear_dirty(page))
                        ret += set_page_dirty(page);

treatment in page_wrprotect().

Now I suppose it's not really a dataloss race, because in practice the
kernel is about to write this page to backing store anyway.  I guess.  I
cannot immediately think of any clear_page_dirty() callers for whom that
won't be true.

Someone please convince me that this has all been thought about and is solid
as a rock.
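
The interleaving described above can be replayed in a small user-space
model. This is only a sketch under stated assumptions, not kernel code:
struct model_page, model_user_write(), wrprotect_lossy() and
wrprotect_safe() are hypothetical stand-ins for the page flag, the pte
dirty bit and two possible behaviours of page_wrprotect(). It assumes a
single mapping and ignores locking and TLB flushing entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one page mapped by exactly one pte. Not kernel code. */
struct model_page {
	bool page_dirty;	/* stands in for the page dirty flag */
	bool pte_dirty;		/* stands in for the pte dirty bit */
	bool pte_write;		/* pte currently allows writes */
};

/* Stand-in for TestClearPageDirty(): clear the flag, return the old value. */
static bool model_test_clear_dirty(struct model_page *p)
{
	bool was_dirty = p->page_dirty;

	p->page_dirty = false;
	return was_dirty;
}

/* A userspace store through a writable pte sets the pte dirty bit. */
static void model_user_write(struct model_page *p)
{
	if (p->pte_write)
		p->pte_dirty = true;
}

/* Variant 1: write-protect and discard whatever the pte dirty bit said. */
static void wrprotect_lossy(struct model_page *p)
{
	p->pte_write = false;
	p->pte_dirty = false;
}

/* Variant 2: move a set pte dirty bit back onto the page before dropping it. */
static void wrprotect_safe(struct model_page *p)
{
	p->pte_write = false;
	if (p->pte_dirty) {
		p->pte_dirty = false;
		p->page_dirty = true;
	}
}

/*
 * Replay the ordering: clear the page dirty flag, let userspace write in
 * the window, then write-protect. Returns true if the late write is still
 * recorded somewhere afterwards.
 */
static bool late_write_survives(void (*wrprotect)(struct model_page *))
{
	struct model_page p = {
		.page_dirty = true, .pte_dirty = false, .pte_write = true,
	};

	assert(wrprotect);
	model_test_clear_dirty(&p);
	model_user_write(&p);	/* the window between the two steps */
	wrprotect(&p);
	return p.page_dirty || p.pte_dirty;
}
```

In this model late_write_survives(wrprotect_lossy) is false and
late_write_survives(wrprotect_safe) is true, which is the difference the
ptep_clear_flush_dirty()/set_page_dirty() treatment is meant to make.
Whether the real code path behaves like the lossy variant is exactly the
open question in this mail.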

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Status and the future of page migration
From: Christoph Lameter @ 2006-05-12  3:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue
In-Reply-To: <20060512110825.7a49f17d.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:

> Hmm... it seems the kernel drivers assume that pages will not be moved if
> VM_LOCKED is set. I'm not sure which is better: replacing VM_LOCKED with
> VM_DONTMOVE in all drivers, or adding VM_KEEPONMEMORY for the mlock() code
> and modifying only the kernel core.

We could add a MCL_DONTMOVE to mlockall() because we need also some way 
for user space to pin pages and then add a VM_DONTMOVE to the vm 
flags. Then do a global search through the kernel source and replace 
VM_LOCKED in the drivers with VM_DONTMOVE. 
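
The difference between the two policies can be written down as a pair of
predicates. This is a rough sketch only: the SKETCH_* bit values and both
helper functions are hypothetical and chosen for illustration, and
VM_DONTMOVE exists here only as the proposal made in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, used by this sketch only. */
#define SKETCH_VM_LOCKED	0x1UL	/* pages must stay resident */
#define SKETCH_VM_DONTMOVE	0x2UL	/* pages must keep their physical address */

/* Behaviour as described in the thread: VM_LOCKED doubles as "don't migrate". */
static bool may_migrate_current(unsigned long vm_flags)
{
	return !(vm_flags & SKETCH_VM_LOCKED);
}

/* Proposed behaviour: only the separate flag blocks migration. */
static bool may_migrate_proposed(unsigned long vm_flags)
{
	/* The two properties must be independent bits for this to work. */
	assert((SKETCH_VM_LOCKED & SKETCH_VM_DONTMOVE) == 0);
	return !(vm_flags & SKETCH_VM_DONTMOVE);
}
```

With this split an mlock()ed vma becomes migratable, while a driver that
relies on a stable physical address would set the second flag instead.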


* Re: Status and the future of page migration
From: KAMEZAWA Hiroyuki @ 2006-05-12  2:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue
In-Reply-To: <Pine.LNX.4.64.0605111841060.17334@schroedinger.engr.sgi.com>

On Thu, 11 May 2006 18:43:13 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> > > You are right but there may be system components (such as device drivers) 
> > > that require the page not to be moved. Without page migration VM_LOCKED 
> > > implies that the physical address stays the same. Kernel code may assume 
> > > that VM_LOCKED -> dont migrate.
> > > 
> > Hmm.. I think such pages should have extra refcnt to prevent migration.
> 
> refcnts are for temporary use. An extra refcnt will make page migration 
> retry until it gives up. It should not try to migrate an unmovable page.
> 
Hmm... it seems the kernel drivers assume that pages will not be moved if
VM_LOCKED is set. I'm not sure which is better: replacing VM_LOCKED with
VM_DONTMOVE in all drivers, or adding VM_KEEPONMEMORY for the mlock() code
and modifying only the kernel core.

-Kame


* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Nick Piggin @ 2006-05-12  1:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, clameter, torvalds, ak, rohitseth, mbligh, hugh,
	riel, andrea, arjan, apw, mel, marcelo, anton, paulmck, linux-mm
In-Reply-To: <20060511080220.48688b40.akpm@osdl.org>


Andrew Morton wrote:

>Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
>>
>>From: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>
>>People expressed the need to track dirty pages in shared mappings.
>>
>>Linus outlined the general idea of doing that through making clean
>>writable pages write-protected and taking the write fault.
>>
>>This patch does exactly that, it makes pages in a shared writable
>>mapping write-protected. On write-fault the pages are marked dirty and
>>made writable. When the pages get synced with their backing store, the
>>write-protection is re-instated.
>>
>>It survives a simple test and shows the dirty pages in /proc/vmstat.
>>
>
>It'd be nice to have more than a "simple test" done.  Bugs in this area
>will be subtle and will manifest in unpleasant ways.  That goes for both
>correctness and performance bugs.
>
>
>>Index: linux-2.6/mm/memory.c
>>===================================================================
>>--- linux-2.6.orig/mm/memory.c	2006-05-08 18:49:39.000000000 +0200
>>+++ linux-2.6/mm/memory.c	2006-05-09 09:15:11.000000000 +0200
>>@@ -49,6 +49,7 @@
>> #include <linux/module.h>
>> #include <linux/init.h>
>> #include <linux/mm_page_replace.h>
>>+#include <linux/backing-dev.h>
>> 
>> #include <asm/pgalloc.h>
>> #include <asm/uaccess.h>
>>@@ -2077,6 +2078,7 @@ static int do_no_page(struct mm_struct *
>> 	unsigned int sequence = 0;
>> 	int ret = VM_FAULT_MINOR;
>> 	int anon = 0;
>>+	struct page *dirty_page = NULL;
>> 
>> 	pte_unmap(page_table);
>> 	BUG_ON(vma->vm_flags & VM_PFNMAP);
>>@@ -2150,6 +2152,11 @@ retry:
>> 		entry = mk_pte(new_page, vma->vm_page_prot);
>> 		if (write_access)
>> 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>>+		else if (VM_SharedWritable(vma)) {
>>+			struct address_space *mapping = page_mapping(new_page);
>>+			if (mapping && mapping_cap_account_dirty(mapping))
>>+				entry = pte_wrprotect(entry);
>>+		}
>> 		set_pte_at(mm, address, page_table, entry);
>> 		if (anon) {
>> 			inc_mm_counter(mm, anon_rss);
>>@@ -2159,6 +2166,10 @@ retry:
>> 		} else {
>> 			inc_mm_counter(mm, file_rss);
>> 			page_add_file_rmap(new_page);
>>+			if (write_access) {
>>+				dirty_page = new_page;
>>+				get_page(dirty_page);
>>+			}
>>
>
>So let's see.  We take a write fault, we mark the page dirty then we return
>to userspace which will proceed with the write and will mark the pte dirty.
>
>Later, the VM will write the page out.
>
>Later still, the pte will get cleaned by reclaim or by munmap or whatever
>and the page will be marked dirty and the page will again be written out. 
>Potentially needlessly.
>

page_wrprotect also marks the page clean, so this window is very small.
The window is that the fault path might set_page_dirty, then throttle
on writeout, and the page gets written out before it really gets dirtied
by the application (which then has to fault again).

>
>How much extra IO will we be doing because of this change?
>

Of course it can do potentially quite a lot more IO in some cases, if
an application likes to dirty a working set larger than the writeout
thresholds... the same scenario as write(2) has now.


* Re: Status and the future of page migration
From: Christoph Lameter @ 2006-05-12  1:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue
In-Reply-To: <20060512103553.fafce5b2.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:

> > What precise information would be needed? We could return the current node 
> > information in a status array. Right I forgot to include the status array 
> > that returns success / or failure of the call. The status array would 
> > allow to find out the failure reason for each page.
> > 
> I'm sorry I missed "F.e. user space..."
> BTW, we can get statistics of off-node-access for each vma now ?

You can do that by programming the PMU (IA64) to notify you on each long 
latency memory access.

> > You are right but there may be system components (such as device drivers) 
> > that require the page not to be moved. Without page migration VM_LOCKED 
> > implies that the physical address stays the same. Kernel code may assume 
> > that VM_LOCKED -> dont migrate.
> > 
> Hmm.. I think such pages should have extra refcnt to prevent migration.

refcnts are for temporary use. An extra refcnt will make page migration 
retry until it gives up. It should not try to migrate an unmovable page.
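
The cost of using a reference count as a pin can be illustrated with a toy
retry loop. This is a sketch under assumptions, not the migration code:
SKETCH_MAX_PASSES, struct sketch_page and sketch_migrate() are hypothetical,
and a temporary reference is modelled as being dropped once per pass.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_PASSES 10	/* assumed retry limit, for illustration */

struct sketch_page {
	int  extra_refs;	/* references beyond those migration expects */
	bool unmovable;		/* hypothetical explicit "do not migrate" marker */
};

/*
 * Try to migrate one page. Returns the number of passes spent on it and
 * stores the outcome in *migrated.
 */
static int sketch_migrate(struct sketch_page *p, bool pin_is_permanent,
			  bool *migrated)
{
	int passes = 0;

	assert(p && migrated);
	*migrated = false;
	if (p->unmovable)
		return 0;	/* skipped up front, no retries wasted */

	while (passes < SKETCH_MAX_PASSES) {
		passes++;
		if (p->extra_refs == 0) {
			*migrated = true;
			break;
		}
		if (!pin_is_permanent)
			p->extra_refs--;	/* a temporary holder lets go */
	}
	return passes;
}
```

A page with two temporary references is migrated on the third pass, a page
pinned by a permanent reference burns all ten passes and still fails, and a
page marked unmovable costs no passes at all.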


* Re: Status and the future of page migration
From: KAMEZAWA Hiroyuki @ 2006-05-12  1:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue
In-Reply-To: <Pine.LNX.4.64.0605111758400.17334@schroedinger.engr.sgi.com>

On Thu, 11 May 2006 18:06:20 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:
> 
> > > 4. A new system call for the migration of lists of pages (incomplete
> > >    implementation!)
> > > 
> > >    sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
> > >    		int *nodes, unsigned int flags);
> > > 
> > >    This function would migrate individual pages of a process to specific nodes.
> > >    F.e. user space tools exist that can provide off node access statistics
> > >    that show from what node a pages is most frequently accessed.
> > >    Additional code could then use this new system call to migrate the lists
> > >    of pages to the more advantageous location. Automatic page migration
> > >    could be implemented in user space. Many of us remain unconvinced that
> > >    automatic page migration can provide a consistent benefit.
> > >    This API would allow the implementation of various automatic migration
> > >    methods without changes to the kernel.
> > > 
> > Maybe implementing the interface to show necessary information to do this is
> > necessary before doing this. A user process can get enough precise information now ?
> 
> What precise information would be needed? We could return the current node 
> information in a status array. Right I forgot to include the status array 
> that returns success / or failure of the call. The status array would 
> allow to find out the failure reason for each page.
> 
I'm sorry, I missed "F.e. user space..."
BTW, can we get off-node access statistics for each vma now?



> > > - Implement the migration of mlocked pages. This would mean to ignore
> > >   VM_LOCKED in try_to_unmap. Currently VM_LOCKED can be used to prevent the
> > >   migration of pages. If we allow the migration of mlocked pages then we
> > >   would need to introduce some alternate means of being able to declare a
> > >   page not migratable (VM_DONTMIGRATE?).
> > >   Not sure if this should be done at all.
> > > 
> > I think VM_LOCKED just means the address has the physical page. So I think
> > migration is Okay. But I don't think VM_DONTMIGRATE is necessary..
> 
> You are right but there may be system components (such as device drivers) 
> that require the page not to be moved. Without page migration VM_LOCKED 
> implies that the physical address stays the same. Kernel code may assume 
> that VM_LOCKED -> dont migrate.
> 
Hmm.. I think such pages should have an extra refcnt to prevent migration.


-Kame


* Re: Status and the future of page migration
From: Christoph Lameter @ 2006-05-12  1:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue
In-Reply-To: <20060512095614.7f3d2047.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, 12 May 2006, KAMEZAWA Hiroyuki wrote:

> > 4. A new system call for the migration of lists of pages (incomplete
> >    implementation!)
> > 
> >    sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
> >    		int *nodes, unsigned int flags);
> > 
> >    This function would migrate individual pages of a process to specific nodes.
> >    F.e. user space tools exist that can provide off node access statistics
> >    that show from what node a pages is most frequently accessed.
> >    Additional code could then use this new system call to migrate the lists
> >    of pages to the more advantageous location. Automatic page migration
> >    could be implemented in user space. Many of us remain unconvinced that
> >    automatic page migration can provide a consistent benefit.
> >    This API would allow the implementation of various automatic migration
> >    methods without changes to the kernel.
> > 
> Maybe implementing the interface to show necessary information to do this is
> necessary before doing this. A user process can get enough precise information now ?

What precise information would be needed? We could return the current node
information in a status array. Right, I forgot to include the status array
that returns the success or failure of the call. The status array would
make it possible to find out the failure reason for each page.

> > 5. vma migration hooks
> >    Adds a new function call "migrate" to the vm_operations structure. The
> >    vm_ops migration method may be used by vmas without page structs (PFN_MAP?)
> >    to implement their own migration schemes. Currently there is no user of
> >    such functionality. The uncached allocator for IA64 could potentially use
> >    such vma migration hooks.
> > 
> uncached allocator doesn't use struct address_space ?

Right.

> > - Implement the migration of mlocked pages. This would mean to ignore
> >   VM_LOCKED in try_to_unmap. Currently VM_LOCKED can be used to prevent the
> >   migration of pages. If we allow the migration of mlocked pages then we
> >   would need to introduce some alternate means of being able to declare a
> >   page not migratable (VM_DONTMIGRATE?).
> >   Not sure if this should be done at all.
> > 
> I think VM_LOCKED just means the address has the physical page. So I think
> migration is Okay. But I don't think VM_DONTMIGRATE is necessary..

You are right but there may be system components (such as device drivers) 
that require the page not to be moved. Without page migration VM_LOCKED 
implies that the physical address stays the same. Kernel code may assume 
that VM_LOCKED -> dont migrate.


* Re: Status and the future of page migration
From: KAMEZAWA Hiroyuki @ 2006-05-12  0:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, ak, pj, kravetz, marcelo.tosatti, taka,
	lee.schermerhorn, haveblue
In-Reply-To: <Pine.LNX.4.64.0605111703020.17098@schroedinger.engr.sgi.com>

On Thu, 11 May 2006 17:06:31 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> Some additional patches for page migration are at
> ftp://ftp.kernel.org/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc3-mm1/.
> These are in testing and need work. Feedback on these would be useful.
> 
Thank you for the clarification and, of course, for your work.

> 1. Restructure migrate_pages() so that the current goto mess is avoided. This
>    extracts two functions from migrate pages that deal with either taking the
>    page lock for the source or destination page.
> 
> 2. Dispose of migrated pages immediately. Moves the recycling of migrated
>    pages into migrate_pages(). Callers only have to deal with pages that
>    are still candidates for still could be repeated. This simplifies handling
>    but prevents potential necessary post processing of migrated pages.
>    Should we do this at all?
> 
I don't think this is necessary now.
Some code may want to use the migrated pages, I think. For example, migrating
pages to create hugepage-sized contiguous regions. But this will not come in
the near future ;)

> 3. Uses arrays to pass list of pages to migrate_pages().
>    Doing so will make a 1-1 association possible between the pages to be
>    migrated. If we have this 1-1 association then we can accurately allocate
>    pages for MPOL_INTERLEAVE during migration. Specifying
>    MPOL_INTERLEAVE|MPOL_MF_MOVE to mbind() could move all pages so that they
>    follow the best interleave pattern accurately.
> 
I like this. 

> 4. A new system call for the migration of lists of pages (incomplete
>    implementation!)
> 
>    sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
>    		int *nodes, unsigned int flags);
> 
>    This function would migrate individual pages of a process to specific nodes.
>    F.e. user space tools exist that can provide off node access statistics
>    that show from what node a pages is most frequently accessed.
>    Additional code could then use this new system call to migrate the lists
>    of pages to the more advantageous location. Automatic page migration
>    could be implemented in user space. Many of us remain unconvinced that
>    automatic page migration can provide a consistent benefit.
>    This API would allow the implementation of various automatic migration
>    methods without changes to the kernel.
> 
It may be necessary to implement an interface that exposes the required
information before doing this. Can a user process get sufficiently precise
information now?


> 5. vma migration hooks
>    Adds a new function call "migrate" to the vm_operations structure. The
>    vm_ops migration method may be used by vmas without page structs (PFN_MAP?)
>    to implement their own migration schemes. Currently there is no user of
>    such functionality. The uncached allocator for IA64 could potentially use
>    such vma migration hooks.
> 
The uncached allocator doesn't use struct address_space?

> Potential future work:
> 
> - Implement the migration of mlocked pages. This would mean to ignore
>   VM_LOCKED in try_to_unmap. Currently VM_LOCKED can be used to prevent the
>   migration of pages. If we allow the migration of mlocked pages then we
>   would need to introduce some alternate means of being able to declare a
>   page not migratable (VM_DONTMIGRATE?).
>   Not sure if this should be done at all.
> 
I think VM_LOCKED just means that the address is backed by a physical page,
so I think migration is okay. But I don't think VM_DONTMIGRATE is necessary.

> - Migration of pages outside of a process context.
>   Currently page migration requires that a read lock on mmap_sem is held to
>   prevent the anonymous vmas from vanishing while we migrate pages.
>   If page migration would be used to remove all pages from a zone (like needed
>   by the memory hotplug project) then we would need to first find a way
>   to insure that the anon_vmas do not vanish under us.
>   We could f.e. take a read_lock on the one of the mm_structs that may be
>   discovered via the reverse maps.
> 
I think taking anon_vma->lock during migration is one way. But this will make
try_to_unmap() dirtier...

Thanks,
-Kame


* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Linus Torvalds @ 2006-05-12  0:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: clameter, a.p.zijlstra, piggin, ak, rohitseth, mbligh, hugh, riel,
	andrea, arjan, apw, mel, marcelo, anton, paulmck, linux-mm
In-Reply-To: <20060511164448.4686a2bd.akpm@osdl.org>


On Thu, 11 May 2006, Andrew Morton wrote:
> 
> I think that was me, back in my programming days.

How time flies ;)

> http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz

usemem was one of the tools I was thinking of, so this may well be it. 

		Linus


* Status and the future of page migration
From: Christoph Lameter @ 2006-05-12  0:06 UTC (permalink / raw)
  To: linux-mm
  Cc: ak, pj, kravetz, marcelo.tosatti, kamezawa.hiroyu, taka,
	lee.schermerhorn, haveblue

The current page migration in Linus' tree uses swap entries to track unmapped
anonymous pages and has the side effect of removing all references to file
backed pages. If multiple migrations run concurrently then we typically are
limited by contention around the tree_lock for swap space. We see migration
rates of around 600-900 MB/sec for a single migration and around 250 MB/sec
for 4 concurrent migrations.

The code in Andrew's tree uses migration entries, restores ptes
to file backed pages and preserves the write enable bit. This means
that a process can be repeatedly migrated without losing
the file backed pages that were not referenced in the intermediate
period. Also we avoid useless COW faults. The contention around
the swap tree_lock has been removed and so we see increased
migration rates for a single process of around 800 MB/sec to 1 GB/sec that
then only slightly degrade for 4 concurrent processes.

I would like to keep the features of page migration as they are right now
in Andrew's tree until the patches have made it into Linus' tree.

Some additional patches for page migration are at
ftp://ftp.kernel.org/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc3-mm1/.
These are in testing and need work. Feedback on these would be useful.

1. Restructure migrate_pages() so that the current goto mess is avoided. This
   extracts two functions from migrate pages that deal with either taking the
   page lock for the source or destination page.

2. Dispose of migrated pages immediately. Moves the recycling of migrated
   pages into migrate_pages(). Callers only have to deal with pages that
   are still candidates for migration, i.e. for which the attempt could be
   repeated. This simplifies handling but prevents potentially necessary
   post processing of migrated pages.
   Should we do this at all?

3. Uses arrays to pass list of pages to migrate_pages().
   Doing so will make a 1-1 association possible between the pages to be
   migrated. If we have this 1-1 association then we can accurately allocate
   pages for MPOL_INTERLEAVE during migration. Specifying
   MPOL_INTERLEAVE|MPOL_MF_MOVE to mbind() could move all pages so that they
   follow the best interleave pattern accurately.

4. A new system call for the migration of lists of pages (incomplete
   implementation!)

   sys_move_pages([int pid,?] int nr_pages, unsigned long *addresses,
   		int *nodes, unsigned int flags);

   This function would migrate individual pages of a process to specific nodes.
   F.e. user space tools exist that can provide off node access statistics
   that show from what node a page is most frequently accessed.
   Additional code could then use this new system call to migrate the lists
   of pages to the more advantageous location. Automatic page migration
   could be implemented in user space. Many of us remain unconvinced that
   automatic page migration can provide a consistent benefit.
   This API would allow the implementation of various automatic migration
   methods without changes to the kernel.

5. vma migration hooks
   Adds a new function call "migrate" to the vm_operations structure. The
   vm_ops migration method may be used by vmas without page structs (PFN_MAP?)
   to implement their own migration schemes. Currently there is no user of
   such functionality. The uncached allocator for IA64 could potentially use
   such vma migration hooks.
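
For item 4, Linux later gained a move_pages(2) system call. Its argument
list differs from the draft above: it takes a count, an array of page
addresses, an optional array of target nodes, a status array and flags.
The following is a minimal sketch of the query-only use (nodes == NULL),
which moves nothing and just reports the node currently backing each page.
sketch_query_nodes() and sketch_demo() are hypothetical helper names, and
the call is expected to fail with errno set on kernels or sandboxes without
NUMA migration support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Ask the kernel which node backs each of `count` pages starting at `base`.
 * Returns the raw syscall result: 0 on success, -1 with errno set on error.
 */
static long sketch_query_nodes(void *base, unsigned long count, int *status)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void **pages;
	unsigned long i;
	long rc;
	int saved;

	assert(base && status && count > 0);
	pages = calloc(count, sizeof(*pages));
	if (!pages) {
		errno = ENOMEM;
		return -1;
	}
	for (i = 0; i < count; i++)
		pages[i] = (char *)base + i * (unsigned long)page_size;

#ifdef SYS_move_pages
	/* pid 0 means the calling process; a NULL node list means query only. */
	rc = syscall(SYS_move_pages, 0, count, pages, (const int *)NULL,
		     status, 0);
#else
	errno = ENOSYS;
	rc = -1;
#endif
	saved = errno;
	free(pages);
	errno = saved;
	return rc;
}

/* Touch four pages so they are populated, then query their nodes. */
static long sketch_demo(int *first_status)
{
	enum { NPAGES = 4 };
	long page_size = sysconf(_SC_PAGESIZE);
	int status[NPAGES];
	void *buf = NULL;
	long rc;
	int saved;
	int err;

	assert(first_status);
	err = posix_memalign(&buf, (size_t)page_size,
			     NPAGES * (size_t)page_size);
	if (err) {
		errno = err;
		return -1;
	}
	memset(buf, 1, NPAGES * (size_t)page_size);
	rc = sketch_query_nodes(buf, NPAGES, status);
	saved = errno;
	if (rc == 0)
		*first_status = status[0];
	free(buf);
	errno = saved;
	return rc;
}
```

A user-space migration policy of the kind described in item 4 would pair
such a query with access statistics and then issue a second call with a
non-NULL node list to actually move the pages.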

Potential future work:

- Implement the migration of mlocked pages. This would mean to ignore
  VM_LOCKED in try_to_unmap. Currently VM_LOCKED can be used to prevent the
  migration of pages. If we allow the migration of mlocked pages then we
  would need to introduce some alternate means of being able to declare a
  page not migratable (VM_DONTMIGRATE?).
  Not sure if this should be done at all.

- Migration of pages outside of a process context.
  Currently page migration requires that a read lock on mmap_sem is held to
  prevent the anonymous vmas from vanishing while we migrate pages.
  If page migration would be used to remove all pages from a zone (like needed
  by the memory hotplug project) then we would need to first find a way
  to ensure that the anon_vmas do not vanish under us. We could f.e. take
  a read_lock on one of the mm_structs that may be discovered via the
  reverse maps.

Did I miss anything?


* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Andrew Morton @ 2006-05-11 23:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: clameter, a.p.zijlstra, piggin, ak, rohitseth, mbligh, hugh, riel,
	andrea, arjan, apw, mel, marcelo, anton, paulmck, linux-mm
In-Reply-To: <Pine.LNX.4.64.0605111616490.3866@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> wrote:
>
> What happened to the VM stress-test programs that we used to test the 
> page-out with? I forget who kept a collection of them around, but they did 
> things like trying to cause MM problems on purpose.

I think that was me, back in my programming days.

> And I'm pretty sure 
> some of the nastiest ones used shared mappings, exactly because we've had 
> problems with the virtual scanning.

http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz

run-bash-shared-mapping.sh is a good stress-tester and deadlock-finder.

Running fsx-linux (in mmap-read and mmap-write and read and write mode) in
combination with memory pressure is a good correctness-tester.  Needs to be
run on various filesystems too.


* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Linus Torvalds @ 2006-05-11 23:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Peter Zijlstra, piggin, ak, rohitseth, mbligh,
	hugh, riel, andrea, arjan, apw, mel, marcelo, anton, paulmck,
	linux-mm
In-Reply-To: <Pine.LNX.4.64.0605111546480.16571@schroedinger.engr.sgi.com>


On Thu, 11 May 2006, Christoph Lameter wrote:
> On Thu, 11 May 2006, Andrew Morton wrote:
> >
> > It'd be nice to have more than a "simple test" done.  Bugs in this area
> > will be subtle and will manifest in unpleasant ways.  That goes for both
> > correctness and performance bugs.
> 
> Standard tests such as AIM7 will not trigger these paths. It is rather
> unusual for small unix processes to have a shared writable mapping and 
> therefore I doubt that the typical benchmarks may show much of a 
> difference. These  types of mappings are more typical for large or 
> specialized apps. Be sure that the tests actually do dirty 
> pages in shared writeable mappings.

What happened to the VM stress-test programs that we used to test the 
page-out with? I forget who kept a collection of them around, but they did 
things like trying to cause MM problems on purpose. And I'm pretty sure 
some of the nastiest ones used shared mappings, exactly because we've had 
problems with the virtual scanning.

I have a very distinct memory of somebody (I'd like to say Con, but that's 
probably bogus) collecting a few programs that were known to cause nasty 
problems (like the system just becoming totally unresponsive). For
checking that things degraded reasonably before getting killed by OOM.

I'm talking the 2.4.x timeframe, so it's a few years ago. It might not be 
a real _benchmark_ per se, but I think it would be an interesting 
data-point whether the system acts "better" with some of those tests..

		Linus


* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Christoph Lameter @ 2006-05-11 22:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, piggin, torvalds, ak, rohitseth, mbligh, hugh,
	riel, andrea, arjan, apw, mel, marcelo, anton, paulmck, linux-mm
In-Reply-To: <20060511080220.48688b40.akpm@osdl.org>

On Thu, 11 May 2006, Andrew Morton wrote:

> > It survives a simple test and shows the dirty pages in /proc/vmstat.
> 
> It'd be nice to have more than a "simple test" done.  Bugs in this area
> will be subtle and will manifest in unpleasant ways.  That goes for both
> correctness and performance bugs.

Standard tests such as AIM7 will not trigger these paths. It is rather
unusual for small unix processes to have a shared writable mapping and 
therefore I doubt that the typical benchmarks will show much of a
difference. These types of mappings are more typical for large or
specialized apps. Be sure that the tests actually do dirty
pages in shared writeable mappings.
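
A test that really does dirty pages through a shared writable mapping can
be quite small. The following is a minimal sketch, not one of the existing
stress tools: sketch_dirty_shared() is a hypothetical helper, the /tmp path
template is an assumption, and it only checks that the data reaches the
file, not how the kernel accounts the dirty pages.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Dirty `npages` pages of a temporary file through a MAP_SHARED writable
 * mapping, flush them with msync(), and verify the bytes via pread().
 * Returns the number of pages verified, or -1 on error.
 */
static long sketch_dirty_shared(size_t npages)
{
	char path[] = "/tmp/shared-dirty-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	size_t len = npages * (size_t)page;
	long verified = -1;
	size_t i;
	int fd;

	assert(npages > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives on until fd is closed */

	if (ftruncate(fd, (off_t)len) == 0) {
		unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
					  MAP_SHARED, fd, 0);

		if (map != MAP_FAILED) {
			/* One store per page dirties every page in the mapping. */
			for (i = 0; i < npages; i++)
				map[i * (size_t)page] = (unsigned char)(i + 1);

			if (msync(map, len, MS_SYNC) == 0) {
				verified = 0;
				for (i = 0; i < npages; i++) {
					unsigned char b = 0;

					if (pread(fd, &b, 1,
						  (off_t)(i * (size_t)page)) == 1 &&
					    b == (unsigned char)(i + 1))
						verified++;
				}
			}
			munmap(map, len);
		}
	}
	close(fd);
	return verified;
}
```

Running this with a large page count on a file-backed filesystem while
watching the dirty counters in /proc/vmstat would show whether stores
through the mapping are being accounted before msync() is called.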

> > +int page_wrprotect(struct page *page)
> > +{
> > +	int ret = 0;
> > +
> > +	BUG_ON(!PageLocked(page));
> 
> hm.  So clear_page_dirty() and clear_page_dirty_for_io() are only ever
> called against a locked page?  I guess that makes sense, but it's not a
> guarantee which we had in the past.  It really _has_ to be true, because
> lock_page() is the only thing which can protect the address_space from
> memory reclaim in those two functions.

If that is true then we can get rid of atomic ops in both functions.


^ permalink raw reply

* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Andy Whitcroft @ 2006-05-11 16:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, clameter, piggin, torvalds, ak, rohitseth, mbligh,
	hugh, riel, andrea, arjan, mel, marcelo, anton, paulmck, linux-mm
In-Reply-To: <20060511080220.48688b40.akpm@osdl.org>

Andrew Morton wrote:
> Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
>>
>>From: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>
>>People expressed the need to track dirty pages in shared mappings.
>>
>>Linus outlined the general idea of doing that through making clean
>>writable pages write-protected and taking the write fault.
>>
>>This patch does exactly that, it makes pages in a shared writable
>>mapping write-protected. On write-fault the pages are marked dirty and
>>made writable. When the pages get synced with their backing store, the
>>write-protection is re-instated.
>>
>>It survives a simple test and shows the dirty pages in /proc/vmstat.
> 
> 
> It'd be nice to have more than a "simple test" done.  Bugs in this area
> will be subtle and will manifest in unpleasant ways.  That goes for both
> correctness and performance bugs.

I'll kick off some testing of this stack and see what occurs.  Should
appear on t.k.o in due time.

Cheers.

-apw


^ permalink raw reply

* Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
From: Andrew Morton @ 2006-05-11 15:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: clameter, piggin, torvalds, ak, rohitseth, mbligh, hugh, riel,
	andrea, arjan, apw, mel, marcelo, anton, paulmck, linux-mm
In-Reply-To: <1147207458.27680.19.camel@lappy>

Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> 
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> People expressed the need to track dirty pages in shared mappings.
> 
> Linus outlined the general idea of doing that through making clean
> writable pages write-protected and taking the write fault.
> 
> This patch does exactly that, it makes pages in a shared writable
> mapping write-protected. On write-fault the pages are marked dirty and
> made writable. When the pages get synced with their backing store, the
> write-protection is re-instated.
> 
> It survives a simple test and shows the dirty pages in /proc/vmstat.

It'd be nice to have more than a "simple test" done.  Bugs in this area
will be subtle and will manifest in unpleasant ways.  That goes for both
correctness and performance bugs.

> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c	2006-05-08 18:49:39.000000000 +0200
> +++ linux-2.6/mm/memory.c	2006-05-09 09:15:11.000000000 +0200
> @@ -49,6 +49,7 @@
>  #include <linux/module.h>
>  #include <linux/init.h>
>  #include <linux/mm_page_replace.h>
> +#include <linux/backing-dev.h>
>  
>  #include <asm/pgalloc.h>
>  #include <asm/uaccess.h>
> @@ -2077,6 +2078,7 @@ static int do_no_page(struct mm_struct *
>  	unsigned int sequence = 0;
>  	int ret = VM_FAULT_MINOR;
>  	int anon = 0;
> +	struct page *dirty_page = NULL;
>  
>  	pte_unmap(page_table);
>  	BUG_ON(vma->vm_flags & VM_PFNMAP);
> @@ -2150,6 +2152,11 @@ retry:
>  		entry = mk_pte(new_page, vma->vm_page_prot);
>  		if (write_access)
>  			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +		else if (VM_SharedWritable(vma)) {
> +			struct address_space *mapping = page_mapping(new_page);
> +			if (mapping && mapping_cap_account_dirty(mapping))
> +				entry = pte_wrprotect(entry);
> +		}
>  		set_pte_at(mm, address, page_table, entry);
>  		if (anon) {
>  			inc_mm_counter(mm, anon_rss);
> @@ -2159,6 +2166,10 @@ retry:
>  		} else {
>  			inc_mm_counter(mm, file_rss);
>  			page_add_file_rmap(new_page);
> +			if (write_access) {
> +				dirty_page = new_page;
> +				get_page(dirty_page);
> +			}

So let's see.  We take a write fault, we mark the page dirty then we return
to userspace which will proceed with the write and will mark the pte dirty.

Later, the VM will write the page out.

Later still, the pte will get cleaned by reclaim or by munmap or whatever
and the page will be marked dirty and the page will again be written out. 
Potentially needlessly.

How much extra IO will we be doing because of this change?
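
The sequence described above can be walked through with a toy model of one
page mapped by one pte. This is only a sketch: every toy_* name is
hypothetical and none of it is kernel code. The wrprotect flag models
writeback also write-protecting and cleaning the pte, which is what the
pte_mkclean(pte_wrprotect(*pte)) line in page_wrprotect_one() below does.
It ignores any race between clearing the page dirty bit and cleaning the
pte.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: one page, one pte.  Not kernel structures. */
struct toy_page {
	bool page_dirty;	/* stands in for PG_dirty */
	bool pte_write;		/* pte permits writes */
	bool pte_dirty;		/* dirty bit set by the hardware on a store */
	int writeouts;		/* times the page went to backing store */
};

/* A userspace store: fault first if write-protected, then dirty the pte. */
static void toy_store(struct toy_page *p)
{
	if (!p->pte_write) {		/* write fault */
		p->page_dirty = true;	/* fault path marks the page dirty */
		p->pte_write = true;
	}
	p->pte_dirty = true;
}

/* Writeback: clean the page, optionally wrprotect and clean the pte too. */
static void toy_writeback(struct toy_page *p, bool wrprotect)
{
	if (!p->page_dirty)
		return;
	p->page_dirty = false;
	if (wrprotect) {
		p->pte_write = false;
		p->pte_dirty = false;
	}
	p->writeouts++;
}

/* pte teardown (reclaim, munmap): pte dirty bit moves to the page. */
static void toy_unmap(struct toy_page *p)
{
	if (p->pte_dirty)
		p->page_dirty = true;
	p->pte_dirty = false;
	p->pte_write = false;
}

/* One store, writeback, teardown, writeback; returns the writeout count. */
static int toy_scenario(bool wrprotect)
{
	struct toy_page p = { false, false, false, 0 };

	toy_store(&p);
	toy_writeback(&p, wrprotect);
	toy_unmap(&p);
	toy_writeback(&p, wrprotect);
	assert(!p.page_dirty);
	return p.writeouts;
}
```

In this model toy_scenario(false) counts two writeouts for a single store,
which is the needless IO being asked about, and toy_scenario(true) counts
one.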

>  		return 0;
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c	2006-05-08 18:49:39.000000000 +0200
> +++ linux-2.6/mm/rmap.c	2006-05-08 18:53:34.000000000 +0200
> @@ -478,6 +478,72 @@ int page_referenced(struct page *page, i
>  	return referenced;
>  }
>  
> +static int page_wrprotect_one(struct page *page, struct vm_area_struct *vma)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	unsigned long address;
> +	pte_t *pte, entry;
> +	spinlock_t *ptl;
> +	int ret = 0;
> +
> +	address = vma_address(page, vma);
> +	if (address == -EFAULT)
> +		goto out;
> +
> +	pte = page_check_address(page, mm, address, &ptl);
> +	if (!pte)
> +		goto out;
> +
> +	if (!pte_write(*pte))
> +		goto unlock;
> +
> +	entry = pte_mkclean(pte_wrprotect(*pte));
> +	ptep_establish(vma, address, pte, entry);
> +	update_mmu_cache(vma, address, entry);
> +	lazy_mmu_prot_update(entry);
> +	ret = 1;
> +
> +unlock:
> +	pte_unmap_unlock(pte, ptl);
> +out:
> +	return ret;
> +}
> +
> +static int page_wrprotect_file(struct page *page)
> +{
> +	struct address_space *mapping = page->mapping;
> +	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> +	struct vm_area_struct *vma;
> +	struct prio_tree_iter iter;
> +	int ret = 0;
> +
> +	BUG_ON(PageAnon(page));
> +
> +	spin_lock(&mapping->i_mmap_lock);
> +
> +	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> +		if (VM_SharedWritable(vma))
> +			ret += page_wrprotect_one(page, vma);
> +	}
> +
> +	spin_unlock(&mapping->i_mmap_lock);
> +	return ret;
> +}
> +
> +int page_wrprotect(struct page *page)
> +{
> +	int ret = 0;
> +
> +	BUG_ON(!PageLocked(page));

hm.  So clear_page_dirty() and clear_page_dirty_for_io() are only ever
called against a locked page?  I guess that makes sense, but it's not a
guarantee which we had in the past.  It really _has_ to be true, because
lock_page() is the only thing which can protect the address_space from
memory reclaim in those two functions.

Oh well.  We'll find out if people's machines start to go BUG.

> +	if (page_mapped(page) && page->mapping) {

umm, afaict this function can be called for swapcache pages and Bad Things
will happen.  I think we need page_mapping(page) here?



^ permalink raw reply

* Re: [PATCH 0/3] Zone boundry alignment fixes
From: Andrew Morton @ 2006-05-11  7:59 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: nickpiggin, haveblue, bob.picco, mingo, mbligh, ak, linux-kernel,
	linux-mm
In-Reply-To: <exportbomb.1147172704@pinky>

Andy Whitcroft <apw@shadowen.org> wrote:
>
> Ok.  Finally got my test bed working and got this lot tested.
> 
> To summarise the problem, the buddy allocator currently requires
> that the boundries between zones occur at MAX_ORDER boundries.
> The specific case where we were tripping up on this was in x86 with
> NUMA enabled.  There we try to ensure that each node's struct pages
> are in node local memory, in order to allow them to be virtually
> mapped we have to reduce the size of ZONE_NORMAL.  Here we are
> rounding the remap space up to a large page size to allow large
> page TLB entries to be used.  However, these are smaller than
> MAX_ORDER.  This can lead to bad buddy merges.  With VM_DEBUG enabled
> we detect the attempts to merge across this boundry and panic.
> 
> We have two basic options: we can either apply the appropriate
> alignment when we make the NUMA remap space, or we can 'fix'
> the assumption in the buddy allocator.  The fix for the buddy
> allocator involves adding conditionals to the free fast path and
> so it seems reasonable to at least favor realigning the remap space.
> 
> Following this email are 3 patches:
> 
> zone-init-check-and-report-unaligned-zone-boundries -- introduces
>   a zone alignement helper, and uses it to add a check to zone
>   initialisation for unaligned zone boundries,
> 
> x86-align-highmem-zone-boundries-with-NUMA -- uses the zone alignment
>   helper to align the end of ZONE_NORMAL after the remap space has
>   been reserved, and
> 
> zone-allow-unaligned-zone-boundries -- modifies the buddy allocator
>   so that we can allow unaligned zone boundries.  A new configuration
>   option is added to enable this functionality.
> 
> The first two are the fixes for alignement in x86, these fix the
> panics thrown when VM_DEBUG is enabled.
> 
> The last is a patch to support unaligned zone boundries.  As this
> (re)introduces a zone check into the free hot path it seems
> reasonable to only enable this should it be needed; for example
> we never need this if we have a single zone.  I have tested the
> failing system with this patch enabled and it also fixes the panic.
> I am inclined to suggest that it be included as it very clearly
> documents the alignment requirements for the buddy allocator.

There's some possibility here of interaction with Mel's "patchset to size
zones and memory holes in an architecture-independent manner." I jammed
them together - let's see how it goes.

I also fixed the spelling of "boundary" in about 1.5 zillion places ;)


^ permalink raw reply

* Question: what happens if writeing back to swap ends in error
From: KAMEZAWA Hiroyuki @ 2006-05-11  1:49 UTC (permalink / raw)
  To: Linux-MM

Hi,

What happens when an I/O request from swap_writepage() ends in an I/O error?

swap_writepage() (in mm/page_io.c) sets bio->bi_end_io to end_swap_bio_write().
After the I/O ends, bio_endio()->end_swap_bio_write() is called, I think.

If that writeback ends in error, BIO_UPTODATE in bio->bi_flags is cleared.
Then the page is marked with PG_error.
==
static int end_swap_bio_write(struct bio *bio, unsigned int bytes_done, int err)
{
        const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
        struct page *page = bio->bi_io_vec[0].bv_page;

        if (bio->bi_size)
                return 1;

        if (!uptodate)
                SetPageError(page);
        end_page_writeback(page);
        bio_put(bio);
        return 0;
}
==
But here, PG_writeback is cleared, anyway.

Now, shrink_list() doesn't handle PG_error.
If the page is not accessed, page's state is !PageDirty() && !PageWriteback()
and SwapCache and on LRU.
Finally, page marked with PG_error is freed by shrink_list() and data in the
page will be lost.

correct ?
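
The flag transitions described above can be sketched as a small model. This
is an illustration only: the TOY_* bits and toy_* functions are
hypothetical, and the claim that reclaim looks only at the dirty and
writeback bits is the assumption made in the message, not something
verified here against shrink_list(). The dirty-to-writeback transition at
the start of writeout is also an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag bits, stand-ins for PG_dirty, PG_writeback and PG_error. */
enum {
	TOY_DIRTY     = 1u << 0,
	TOY_WRITEBACK = 1u << 1,
	TOY_ERROR     = 1u << 2,
};

/* Assumed start of swap writeout: dirty is cleared, writeback is set. */
static unsigned toy_start_swap_write(unsigned flags)
{
	assert(flags & TOY_DIRTY);
	return (flags & ~TOY_DIRTY) | TOY_WRITEBACK;
}

/*
 * Mirrors the quoted end_swap_bio_write(): set the error bit on failure,
 * and clear writeback in either case.
 */
static unsigned toy_end_swap_write(unsigned flags, bool uptodate)
{
	if (!uptodate)
		flags |= TOY_ERROR;
	return flags & ~TOY_WRITEBACK;
}

/* Assumption from the message: reclaim ignores the error bit. */
static bool toy_reclaim_would_free(unsigned flags)
{
	return !(flags & (TOY_DIRTY | TOY_WRITEBACK));
}
```

In this model a page whose swap write failed ends up with only TOY_ERROR
set, and toy_reclaim_would_free() still returns true for it, which is the
data-loss case being asked about.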

-Kame




^ permalink raw reply

* Re: [RFC] Hugetlb demotion for x86
From: Christoph Lameter @ 2006-05-10 23:42 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, linux-kernel
In-Reply-To: <1147287400.24029.81.camel@localhost.localdomain>

Seems that the code is not modifying x86 code but all code. 

An app should be getting an out of memory error and not a SIGBUS when 
running out of memory.

I thought we fixed the SIGBUS problems and were now reporting out of 
memory? If there still is an issue then we better fix out of memory 
handling. Provide a way for the app to trap OOM conditions?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] mm: cleanup swap unused warning
From: Christoph Lameter @ 2006-05-10 23:04 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux list, linux-mm, Andrew Morton
In-Reply-To: <200605102132.41217.kernel@kolivas.org>

On Wed, 10 May 2006, Con Kolivas wrote:

> Are there any users of swp_entry_t when CONFIG_SWAP is not defined?

Yes, a migration entry is a form of swap entry.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 0/2][RFC] New version of shared page tables
From: Brian Twichell @ 2006-05-10 19:45 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hugh Dickins, Dave McCracken, Linux Memory Management,
	Linux Kernel
In-Reply-To: <44600F9B.1060207@yahoo.com.au>

Nick Piggin wrote:

> Brian Twichell wrote:
>
>>
>> If we had to choose between pagetable sharing for small pages and 
>> hugepages, we would be in favor of retaining pagetable sharing for 
>> small pages.  That is where the discernable benefit is for customers 
>> that run with "out-of-the-box" settings.  Also, there is still some 
>> benefit there on x86-64 for customers that use hugepages for the 
>> bufferpools.
>
>
> Of course if it was free performance then we'd want it. The downsides 
> are that it is a significant complexity for a pretty small (3%) 
> performance gain for your apparent target workload, which is pretty 
> uncommon among all Linux users.

Our performance data demonstrated that the potential gain for the 
non-hugepage case is much higher than 3%.

>
> Ignoring the complexity, it is still not free. Sharing data across 
> processes adds to synchronisation overhead and hurts scalability. Some 
> of these page fault scalability scenarios have shown to be important 
> enough that we have introduced complexity _there_.

True, but this needs to be balanced against the fact that pagetable 
sharing will reduce the number of page faults when it is achieved.  
Let's say you have N processes which touch all the pages in an M page 
shared memory region.  Without shared pagetables this requires N*M page 
faults; if pagetable sharing is achieved, only M pagefaults are required.
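
A small worked example of the N*M versus M argument. The function names and
the sample figures (100 processes, a 1 GiB region of 4 KiB pages) are
illustrative assumptions. The shared case is the best case, where sharing
is achieved for the whole region.

```c
#include <assert.h>

/* Each of n processes faults in each of m pages on its own. */
static unsigned long faults_unshared(unsigned long n, unsigned long m)
{
	assert(m == 0 || n <= (unsigned long)-1 / m);	/* no overflow */
	return n * m;
}

/* Best case with shared pagetables: each page is faulted in once. */
static unsigned long faults_shared(unsigned long n, unsigned long m)
{
	return n ? m : 0;
}
```

With n = 100 and m = 262144 (1 GiB / 4 KiB) that is 26,214,400 faults
without sharing against 262,144 with it.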

>
> And it seems customers running "out-of-the-box" settings really want 
> to start using hugepages if they're interested in getting the most 
> performance possible, no?

My perspective is that, once the customer is required to invoke "echo 
XXX > /proc/sys/vm/nr_hugepages" they've left the "out-of-the-box" 
domain, and entered the domain of hoping that the number of hugepages is 
sufficient, because if it's not, they'll probably need to reboot, which 
can be pretty inconvenient for a production transaction-processing 
application.

Cheers,
Brian




^ permalink raw reply

* Re: [PATCH] mm: cleanup swap unused warning
From: Daniel Walker @ 2006-05-10 18:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, linux-kernel, linux-mm
In-Reply-To: <20060510043834.70f40ddc.akpm@osdl.org>

On Wed, 2006-05-10 at 04:38 -0700, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> >
> > Are there any users of swp_entry_t when CONFIG_SWAP is not defined?
> 
> Well there shouldn't be.  Making accesses to swp_entry_t.val fail to
> compile if !CONFIG_SWAP might be useful.

In mm/vmscan.c line 387 it defines a swp_entry_t and sets val regardless
of CONFIG_SWAP, but the value never really gets used. It showed up in my
warning reviews.

Daniel


^ permalink raw reply

* Re: [PATCH] mm: cleanup swap unused warning
From: Con Kolivas @ 2006-05-10 11:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, linux-mm
In-Reply-To: <200605102146.26080.kernel@kolivas.org>

On Wednesday 10 May 2006 21:46, Con Kolivas wrote:
> On Wednesday 10 May 2006 21:38, Andrew Morton wrote:
> > We have __attribute_used__, which hides a gcc oddity.
>
> I tried that.
>
> In file included from arch/i386/mm/pgtable.c:11:
> include/linux/swap.h:82: warning: ‘__used__’ attribute ignored
> In file included from include/linux/suspend.h:8,
>                  from init/do_mounts.c:7:
> include/linux/swap.h:82: warning: ‘__used__’ attribute ignored
> In file included from arch/i386/mm/init.c:22:
> include/linux/swap.h:82: warning: ‘__used__’ attribute ignored
>   AS      arch/i386/kernel/vsyscall-sysenter.o
>
> etc..
>
> and doesn't fix the warning in vmscan.c. __attribute_used__ is handled
> differently by gcc4 it seems (this is 4.1.0)

in compiler-gcc3.h
#if __GNUC_MINOR__ >= 3
# define __attribute_used__     __attribute__((__used__))
#else
# define __attribute_used__     __attribute__((__unused__))
#endif

and in compiler-gcc4.h
#define __attribute_used__      __attribute__((__used__))

it looks like the pre gcc3.3 version is suited here or I'm misusing the 
__attribute_used__ extension somehow.

-- 
-ck


^ permalink raw reply

* Re: [PATCH] mm: cleanup swap unused warning
From: Con Kolivas @ 2006-05-10 11:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm
In-Reply-To: <20060510043834.70f40ddc.akpm@osdl.org>

On Wednesday 10 May 2006 21:38, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > Are there any users of swp_entry_t when CONFIG_SWAP is not defined?
>
> Well there shouldn't be.  Making accesses to swp_entry_t.val fail to
> compile if !CONFIG_SWAP might be useful.
>
> > +/*
> > + * A swap entry has to fit into a "unsigned long", as
> > + * the entry is hidden in the "index" field of the
> > + * swapper address space.
> > + */
> > +#ifdef CONFIG_SWAP
> >  typedef struct {
> >  	unsigned long val;
> >  } swp_entry_t;
> > +#else
> > +typedef struct {
> > +	unsigned long val;
> > +} swp_entry_t __attribute__((__unused__));
> > +#endif
>
> We have __attribute_used__, which hides a gcc oddity.

I tried that.

In file included from arch/i386/mm/pgtable.c:11:
include/linux/swap.h:82: warning: ‘__used__’ attribute ignored
In file included from include/linux/suspend.h:8,
                 from init/do_mounts.c:7:
include/linux/swap.h:82: warning: ‘__used__’ attribute ignored
In file included from arch/i386/mm/init.c:22:
include/linux/swap.h:82: warning: ‘__used__’ attribute ignored
  AS      arch/i386/kernel/vsyscall-sysenter.o

etc..

and doesn't fix the warning in vmscan.c. __attribute_used__ is handled 
differently by gcc4 it seems (this is 4.1.0)

-- 
-ck

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] mm: cleanup swap unused warning
From: Pekka Enberg @ 2006-05-10 11:42 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux list, linux-mm, Andrew Morton
In-Reply-To: <200605102132.41217.kernel@kolivas.org>

On 5/10/06, Con Kolivas <kernel@kolivas.org> wrote:
> +/*
> + * A swap entry has to fit into a "unsigned long", as
> + * the entry is hidden in the "index" field of the
> + * swapper address space.
> + */
> +#ifdef CONFIG_SWAP
>  typedef struct {
>         unsigned long val;
>  } swp_entry_t;
> +#else
> +typedef struct {
> +       unsigned long val;
> +} swp_entry_t __attribute__((__unused__));
> +#endif

Or we could make swap_free() an empty static inline function for the
non-CONFIG_SWAP case.

                                                     Pekka
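
A minimal sketch of the empty static inline suggested above. It assumes the
unused-variable warning comes from a !CONFIG_SWAP swap_free() that discards
its argument before the compiler sees it; an inline function still consumes
the argument, so the caller's variable counts as used. toy_caller() and the
call counter are hypothetical, and the CONFIG_SWAP branch is only a
stand-in for the real function.

```c
#include <assert.h>

typedef struct {
	unsigned long val;
} swp_entry_t;

static int toy_swap_free_calls;	/* hypothetical: counts real frees */

#ifdef CONFIG_SWAP
/* Stand-in for the real swap_free(); only counts calls here. */
static void swap_free(swp_entry_t entry)
{
	(void)entry;
	toy_swap_free_calls++;
}
#else
/* Empty stub: does nothing, but keeps type checking and "uses" entry. */
static inline void swap_free(swp_entry_t entry)
{
	(void)entry;
}
#endif

/* Hypothetical caller in the style of the vmscan.c code being discussed. */
static int toy_caller(void)
{
	swp_entry_t swap = { .val = 42 };

	assert(swap.val == 42);
	swap_free(swap);	/* no unused-variable warning either way */
	return toy_swap_free_calls;
}
```

Built without CONFIG_SWAP defined, toy_caller() returns 0 and the swap
variable needs no attribute on the typedef.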

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] mm: cleanup swap unused warning
From: Andrew Morton @ 2006-05-10 11:38 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, linux-mm
In-Reply-To: <200605102132.41217.kernel@kolivas.org>

Con Kolivas <kernel@kolivas.org> wrote:
>
> Are there any users of swp_entry_t when CONFIG_SWAP is not defined?

Well there shouldn't be.  Making accesses to swp_entry_t.val fail to
compile if !CONFIG_SWAP might be useful.

> +/*
> + * A swap entry has to fit into a "unsigned long", as
> + * the entry is hidden in the "index" field of the
> + * swapper address space.
> + */
> +#ifdef CONFIG_SWAP
>  typedef struct {
>  	unsigned long val;
>  } swp_entry_t;
> +#else
> +typedef struct {
> +	unsigned long val;
> +} swp_entry_t __attribute__((__unused__));
> +#endif

We have __attribute_used__, which hides a gcc oddity.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

