public inbox for linux-kernel@vger.kernel.org
From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Pavel Emelianov <xemul@openvz.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)
Date: Wed, 03 Oct 2007 13:43:06 +0530
Message-ID: <47034F12.8020505@linux.vnet.ibm.com>
In-Reply-To: <Pine.LNX.4.64.0710021604260.4916@blonde.wat.veritas.com>

Hugh Dickins wrote:
> On Tue, 2 Oct 2007, Balbir Singh wrote:
>> Andrew Morton wrote:
>>> memory-controller-add-documentation.patch
>>> ...
>>> kswapd-should-only-wait-on-io-if-there-is-io.patch
>>>
>>>   Hold.  This needs a serious going-over by page reclaim people.
>> I mostly agree with your decision. I am a little concerned however
>> that as we develop and add more features (a.k.a better statistics/
>> forced reclaim), which are very important; the code base gets larger,
>> the review takes longer :)
> 
> I agree with putting the memory controller stuff on hold from 2.6.24.
> 
> Sorry, Balbir, I've failed to get back to you, still attending to
> priorities.  Let me briefly summarize my issue with the mem controller:
> you've not yet given enough attention to swap.
>

I am open to suggestions and ways and means of making swap control
complete and more usable.

> I accept that full swap control is something you're intending to add
> incrementally later; but the current state doesn't make sense to me.
> 
> The problems are swapoff and swapin readahead.  These pull pages into
> the swap cache, which are assigned to the cgroup (or the whatever-we-
> call-the-remainder-outside-all-the-cgroups) which is running swapoff
> or faulting in its own page; yet they very clearly don't (in general)
> belong to that cgroup, but to other cgroups which will be discovered
> later.
> 

I understand what you're trying to say, but in the several approaches
we tried in the past, we found caches the hardest to account
accurately. IIRC, with readahead, we don't even know whether all the
pages read ahead will be used; that's why we charge everything to the
cgroup that added the page to the cache.
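
To illustrate the policy described above, here is a minimal user-space
sketch in C. It is not kernel code: struct cg_counter and all three
functions are hypothetical names invented for this example. It only
shows that charging a whole readahead window to the group that added
the pages can leave that group charged for pages it never uses.

```c
#include <assert.h>

/* Hypothetical per-cgroup counters; not the kernel's data structures. */
struct cg_counter {
    long charged;  /* pages charged to this group */
    long used;     /* pages the group actually touched */
};

/* Readahead brings `window` pages into the cache and charges all of
 * them to the group that triggered it, whether or not they are used. */
static void readahead_charge(struct cg_counter *adder, long window)
{
    adder->charged += window;
}

/* The group later touches `n` of the pages it was charged for. */
static void touch_pages(struct cg_counter *cg, long n)
{
    cg->used += n;
}

/* Pages charged but never used: the accounting error of this policy. */
static long overcharge(const struct cg_counter *cg)
{
    return cg->charged - cg->used;
}
```

With a window of 8 pages of which only one is touched, the group stays
charged for 7 pages it did not use.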

> I did try removing the cgroup mods to mm/swap_state.c, so swap pages
> get assigned to a cgroup only once it's really known; but that's not
> enough by itself, because cgroup RSS reclaim doesn't touch those
> pages, so the cgroup can easily OOM much too soon.  I was thinking
> that you need a "limbo" cgroup for these pages, which can be attacked
> for reclaim along with any cgroup being reclaimed, but from which
> pages are readily migrated to their real cgroup once that's known.
> 

Is migrating the charge to the real cgroup really required?
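
For reference, the "limbo" idea quoted above could be modelled roughly
as follows. This is a user-space sketch in C under stated assumptions,
not a proposal for the actual patch: struct group, struct page_rec,
the limbo instance and every function name are hypothetical. Pages
start out charged to limbo, reclaim for any group may also look at
limbo's pages, and the charge moves once the real owner is known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accounting group; usage is counted in pages. */
struct group {
    const char *name;
    long usage;
};

/* Hypothetical per-page record of which group holds the charge. */
struct page_rec {
    struct group *owner;
};

/* Holding group for swap cache pages whose real owner is unknown. */
static struct group limbo = { "limbo", 0 };

/* Swapoff or swapin readahead pulled the page in: park it in limbo. */
static void charge_to_limbo(struct page_rec *p)
{
    p->owner = &limbo;
    limbo.usage++;
}

/* The real owner has been discovered: move the charge out of limbo.
 * Returns 1 if the charge moved, 0 if the page was not in limbo. */
static int migrate_charge(struct page_rec *p, struct group *real)
{
    if (p->owner != &limbo)
        return 0;
    limbo.usage--;
    real->usage++;
    p->owner = real;
    return 1;
}

/* Reclaim on behalf of `g` may attack g's own pages and limbo's. */
static long reclaim_candidates(const struct group *g)
{
    return g->usage + limbo.usage;
}
```

Whether the migration step is worth its cost is exactly the open
question here; the sketch only makes the bookkeeping concrete.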

> But I had to switch over to other work before trying that out:
> perhaps the idea doesn't really fly at all.  And it might well
> be no longer needed once full mem+swap control is there.
> 
> So in the current memory controller, that unuse_pte mem charge I was
> originally worried about failing (I hadn't at that point delved in
> to see how it tries to reclaim) actually never fails (and never
> does anything): the page is already assigned to some cgroup-or-
> whatever and is never charged to vma->vm_mm at that point.
> 

Excellent!

> And small point: once that is sorted out and the page is properly
> assigned in unuse_pte, you'll be needing to pte_unmap_unlock and
> pte_offset_map_lock around the mem_cgroup_charge call there -
> you're right to call it with GFP_KERNEL, but cannot do so while
> holding the page table locked and mapped.  (But because the page
> lock is held, there shouldn't be any raciness to dropping and
> retaking the ptl.)
> 

Good catch! I'll fix that.
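
The shape of that fix, as I understand the suggestion, is sketched
below as a single-threaded user-space simulation in C. Nothing here is
a kernel API: struct fake_slot, slot_lock(), slot_unlock(),
charge_may_sleep() and unuse_slot() are hypothetical stand-ins for the
page table lock, pte_offset_map_lock, pte_unmap_unlock, a GFP_KERNEL
mem_cgroup_charge call and unuse_pte. The recheck after retaking the
lock is a defensive assumption of the sketch; the quoted text argues
that holding the page lock already prevents a race.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mapped, lockable page table slot. */
struct fake_slot {
    bool locked;          /* plays the role of the page table lock */
    unsigned long entry;  /* plays the role of the pte contents */
};

static void slot_lock(struct fake_slot *s)
{
    assert(!s->locked);
    s->locked = true;
}

static void slot_unlock(struct fake_slot *s)
{
    assert(s->locked);
    s->locked = false;
}

/* Stand-in for a charge that may sleep: it must never be called with
 * the slot locked, which the assertion enforces in this simulation. */
static int charge_may_sleep(const struct fake_slot *s)
{
    assert(!s->locked);
    return 0;  /* 0 means the charge succeeded */
}

/* Returns 1 if the entry was replaced, 0 if it no longer matched,
 * -1 if the charge failed. */
static int unuse_slot(struct fake_slot *s, unsigned long expected,
                      unsigned long new_entry)
{
    int done = 0;

    slot_lock(s);
    if (s->entry != expected)
        goto out;

    /* Drop the lock around the call that may sleep, then retake it. */
    slot_unlock(s);
    if (charge_may_sleep(s) != 0)
        return -1;
    slot_lock(s);

    /* Defensive recheck; real code would also undo the charge here. */
    if (s->entry != expected)
        goto out;

    s->entry = new_entry;
    done = 1;
out:
    slot_unlock(s);
    return done;
}
```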


> Hugh


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
