* [PATCH 04/31] mm: tag reserve pages
From: Suresh Jayaraman @ 2009-10-01 14:05 UTC
To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
Peter Zijlstra, trond.myklebust, Suresh Jayaraman
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tag pages allocated from the reserves with a non-zero page->reserve.
This allows us to distinguish and account reserve pages.
Since low-memory situations are transient and unrelated to the actual
page (any page can be on the freelist when we run low), don't mark the
page in any permanent way - just pass along the information to the
allocatee.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
include/linux/mm_types.h | 1 +
mm/page_alloc.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
Index: mmotm/include/linux/mm_types.h
===================================================================
--- mmotm.orig/include/linux/mm_types.h
+++ mmotm/include/linux/mm_types.h
@@ -77,6 +77,7 @@ struct page {
union {
pgoff_t index; /* Our offset within mapping. */
void *freelist; /* SLUB: freelist req. slab lock */
+ int reserve; /* page_alloc: page is a reserve page */
};
struct list_head lru; /* Pageout list, eg. active_list
* protected by zone->lru_lock !
Index: mmotm/mm/page_alloc.c
===================================================================
--- mmotm.orig/mm/page_alloc.c
+++ mmotm/mm/page_alloc.c
@@ -1501,8 +1501,10 @@ zonelist_scan:
try_this_zone:
page = buffered_rmqueue(preferred_zone, zone, order,
gfp_mask, migratetype);
- if (page)
+ if (page) {
+ page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
break;
+ }
this_zone_full:
if (NUMA_BUILD)
zlc_mark_zone_full(zonelist, z);
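A minimal user-space sketch of what the patch does, assuming nothing beyond the diff itself: `struct model_page`, `tag_page()` and `page_is_reserve()` are hypothetical names, and the flag values are placeholders rather than the real constants in mm/page_alloc.c. It shows the `!!` idiom collapsing the masked flag bit to exactly 0 or 1, and that `reserve` shares storage with `index` and `freelist`, which is why the tag can only be a transient hint to the allocatee, as the changelog says.

```c
#include <stdbool.h>

/* Placeholder flag values for this model only; the real ALLOC_* constants
 * are private to mm/page_alloc.c and may differ. */
#define ALLOC_WMARK_LOW      0x01
#define ALLOC_NO_WATERMARKS  0x04

/* Hypothetical cut-down stand-in for struct page. As in the patch,
 * 'reserve' shares storage with 'index' and 'freelist', so it stays
 * meaningful only until the new owner starts using one of those fields. */
struct model_page {
	union {
		unsigned long index;
		void *freelist;
		int reserve;
	};
};

/* Mirror of the patched hunk: record whether the attempt that produced
 * this page was allowed to ignore watermarks. '!!' turns the masked bit
 * (0x04 here) into exactly 0 or 1. */
static void tag_page(struct model_page *page, int alloc_flags)
{
	page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
}

/* Hypothetical consumer-side check used for accounting reserve pages. */
static bool page_is_reserve(const struct model_page *page)
{
	return page->reserve != 0;
}
```

Because of the union, a consumer has to read the tag before it reuses the page's `index` or `freelist`; the sketch does not try to enforce that ordering.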
* Re: [PATCH 04/31] mm: tag reserve pages
From: David Rientjes @ 2009-10-01 21:09 UTC
To: Suresh Jayaraman
Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm, netdev,
Neil Brown, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
trond.myklebust
On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
> Index: mmotm/mm/page_alloc.c
> ===================================================================
> --- mmotm.orig/mm/page_alloc.c
> +++ mmotm/mm/page_alloc.c
> @@ -1501,8 +1501,10 @@ zonelist_scan:
> try_this_zone:
> page = buffered_rmqueue(preferred_zone, zone, order,
> gfp_mask, migratetype);
> - if (page)
> + if (page) {
> + page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> break;
> + }
> this_zone_full:
> if (NUMA_BUILD)
> zlc_mark_zone_full(zonelist, z);
page->reserve won't necessarily indicate that access to reserves was
_necessary_ for the allocation to succeed, though. This will mark any
page being allocated under PF_MEMALLOC as reserve when all zones may be
well above their min watermarks.
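The objection separates the privilege a caller holds from whether the allocation actually needed it. The sketch below computes both for a single hypothetical zone; `struct model_zone`, `reserve_tag()`, `reserve_needed()` and `above_min_after_alloc()` are illustrative names, the flag value is a placeholder, and the watermark test is a simplified order-based check, loosely modeled on zone_watermark_ok(), that ignores lowmem_reserve and the per-order free-area checks the kernel also performs.

```c
#include <stdbool.h>

#define ALLOC_NO_WATERMARKS 0x04  /* placeholder value for this model */

/* Hypothetical single-zone model: free pages and the min watermark only. */
struct model_zone {
	long free_pages;
	long wmark_min;
};

/* Would an order-'order' allocation leave the zone at or above its min
 * watermark? Simplified: no lowmem_reserve, no per-order checks. */
static bool above_min_after_alloc(const struct model_zone *z, unsigned int order)
{
	return z->free_pages - (1L << order) >= z->wmark_min;
}

/* What the patch records: the privilege the caller held. */
static int reserve_tag(int alloc_flags)
{
	return !!(alloc_flags & ALLOC_NO_WATERMARKS);
}

/* What the objection asks for: did the allocation have to go below min? */
static int reserve_needed(const struct model_zone *z, unsigned int order)
{
	return !above_min_after_alloc(z, order);
}
```

When the attempt that succeeds carries the no-watermark flag in a zone with plenty of free pages, `reserve_tag()` returns 1 while `reserve_needed()` returns 0, which is the mismatch described above.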
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 04/31] mm: tag reserve pages
From: Neil Brown @ 2009-10-02 4:43 UTC
To: David Rientjes
Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
trond.myklebust
On Thursday October 1, rientjes@google.com wrote:
> On Thu, 1 Oct 2009, Suresh Jayaraman wrote:
>
> > Index: mmotm/mm/page_alloc.c
> > ===================================================================
> > --- mmotm.orig/mm/page_alloc.c
> > +++ mmotm/mm/page_alloc.c
> > @@ -1501,8 +1501,10 @@ zonelist_scan:
> > try_this_zone:
> > page = buffered_rmqueue(preferred_zone, zone, order,
> > gfp_mask, migratetype);
> > - if (page)
> > + if (page) {
> > + page->reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
> > break;
> > + }
> > this_zone_full:
> > if (NUMA_BUILD)
> > zlc_mark_zone_full(zonelist, z);
>
> page->reserve won't necessarily indicate that access to reserves was
> _necessary_ for the allocation to succeed, though. This will mark any
> page being allocated under PF_MEMALLOC as reserve when all zones may be
> well above their min watermarks.
Normally if zones are above their watermarks, page->reserve will not
be set.
This is because __alloc_pages_nodemask (which seems to be the main
non-inline entry point) first calls get_page_from_freelist with
alloc_flags set to ALLOC_WMARK_LOW|ALLOC_CPUSET.
Only if this fails does __alloc_pages_nodemask call
__alloc_pages_slowpath which potentially sets ALLOC_NO_WATERMARKS in
alloc_flags.
So page->reserve being set actually tells us:
PF_MEMALLOC or GFP_MEMALLOC were used, and
a WMARK_LOW allocation attempt failed very recently
which is close enough to "the emergency reserves were used" I think.
Thanks,
NeilBrown
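The ordering Neil describes can be sketched as a two-step allocator over one hypothetical zone. This is a simplified model, not the kernel's code path: the real slow path does more work (such as waking kswapd and reclaiming) before it resorts to ALLOC_NO_WATERMARKS, and `model_alloc()`, `try_freelist()`, `struct model_zone` and the flag values are illustrative assumptions. Only the flag names come from the thread, and only order-0 allocations are modeled.

```c
#include <stdbool.h>

/* Placeholder flag values; the names follow the thread. */
#define ALLOC_WMARK_LOW     0x01
#define ALLOC_CPUSET        0x02
#define ALLOC_NO_WATERMARKS 0x04

struct model_zone {
	long free_pages;
	long wmark_low;
};

/* Stand-in for one order-0 get_page_from_freelist() attempt on a single
 * zone. On success it takes a page and sets *reserve like the patch. */
static bool try_freelist(struct model_zone *z, int alloc_flags, int *reserve)
{
	bool ok = (alloc_flags & ALLOC_NO_WATERMARKS) ||
		  z->free_pages - 1 >= z->wmark_low;

	if (!ok || z->free_pages == 0)
		return false;
	z->free_pages--;
	*reserve = !!(alloc_flags & ALLOC_NO_WATERMARKS);
	return true;
}

/* Fast path at the low watermark first; only if that fails, and only for
 * an entitled caller (PF_MEMALLOC or GFP_MEMALLOC in the series), retry
 * with ALLOC_NO_WATERMARKS. Intermediate slow-path steps are omitted. */
static bool model_alloc(struct model_zone *z, bool memalloc, int *reserve)
{
	if (try_freelist(z, ALLOC_WMARK_LOW | ALLOC_CPUSET, reserve))
		return true;
	if (memalloc && try_freelist(z, ALLOC_NO_WATERMARKS, reserve))
		return true;
	return false;
}
```

With a healthy zone the fast path succeeds and the page is untagged even for an entitled caller; a tagged page appears only after the low-watermark attempt has failed, which is the property the argument relies on.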
* Re: [PATCH 04/31] mm: tag reserve pages
From: David Rientjes @ 2009-10-02 9:50 UTC
To: Neil Brown
Cc: Suresh Jayaraman, Linus Torvalds, Andrew Morton, linux-kernel,
linux-mm, netdev, Miklos Szeredi, Wouter Verhelst, Peter Zijlstra,
trond.myklebust
On Fri, 2 Oct 2009, Neil Brown wrote:
> Normally if zones are above their watermarks, page->reserve will not
> be set.
> This is because __alloc_pages_nodemask (which seems to be the main
> non-inline entry point) first calls get_page_from_freelist with
> alloc_flags set to ALLOC_WMARK_LOW|ALLOC_CPUSET.
> Only if this fails does __alloc_pages_nodemask call
> __alloc_pages_slowpath which potentially sets ALLOC_NO_WATERMARKS in
> alloc_flags.
>
> So page->reserve being set actually tells us:
> PF_MEMALLOC or GFP_MEMALLOC were used, and
> a WMARK_LOW allocation attempt failed very recently
>
> which is close enough to "the emergency reserves were used" I think.
>
There are a couple of corner cases for GFP_ATOMIC, though:
- it isn't restricted by cpuset, so ALLOC_CPUSET is never set for its
slowpath allocations, which may very well allow the allocation to succeed
in zones far above their min watermark.
- it allows for allocating beyond the min watermark in allowed zones
simply by setting ALLOC_HARDER; these types of "reserve" allocations
wouldn't be marked as page->reserve with your patches if
ALLOC_NO_WATERMARKS wasn't set because of the allocation context.
It is debatable whether the second one fits your definition of reserve,
but if it doesn't, there is an inconsistency: the allocation may succeed
in "no watermark" context (for example, in hard irq context) even though
that privilege wasn't necessary to allocate successfully; perhaps it only
needed ALLOC_HARDER.
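The second corner case can be illustrated by modeling how caller privileges lower the effective min watermark. This sketch assumes a GFP_ATOMIC-style caller carries both ALLOC_HIGH and ALLOC_HARDER, and that, as in zone_watermark_ok() of that era, ALLOC_HIGH halves the min watermark and ALLOC_HARDER removes a further quarter; treat those factors, the flag values and every function name here as assumptions. The point is that such an allocation can land below the unmodified min watermark while `reserve_tag()` still reports 0.

```c
#include <stdbool.h>

/* Placeholder flag values; ALLOC_HIGH is an assumption not named in the
 * thread, the other names come from it. */
#define ALLOC_NO_WATERMARKS 0x04
#define ALLOC_HIGH          0x08
#define ALLOC_HARDER        0x10

/* Effective min watermark after the caller's privileges are applied.
 * Assumed scaling: ALLOC_HIGH halves it, ALLOC_HARDER takes off a
 * further quarter of what remains. */
static long effective_min(long wmark_min, int alloc_flags)
{
	long min = wmark_min;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	return min;
}

/* Order-0 watermark check in the style of zone_watermark_ok(). */
static bool watermark_ok(long free_pages, long wmark_min, int alloc_flags)
{
	if (alloc_flags & ALLOC_NO_WATERMARKS)
		return true;
	return free_pages - 1 >= effective_min(wmark_min, alloc_flags);
}

/* The patch's tag looks only at ALLOC_NO_WATERMARKS. */
static int reserve_tag(int alloc_flags)
{
	return !!(alloc_flags & ALLOC_NO_WATERMARKS);
}

/* True when an order-0 allocation eats into pages below the unmodified
 * min watermark, i.e. what a reader might call "used the reserves". */
static bool dipped_below_min(long free_pages, long wmark_min)
{
	return free_pages - 1 < wmark_min;
}
```

With a min watermark of 128 pages and 100 pages free, the assumed GFP_ATOMIC-style flags lower the effective minimum to 48, so the allocation succeeds below the real min watermark yet is left untagged, while an unprivileged caller would fail the same check.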