linuxppc-dev.lists.ozlabs.org archive mirror
From: Alexander Graf <agraf@suse.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"paulus@samba.org" <paulus@samba.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	"kvm-ppc@vger.kernel.org" <kvm-ppc@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Date: Tue, 06 May 2014 09:21:45 +0200
Message-ID: <53688D89.1070201@suse.de>
In-Reply-To: <1399360775.20388.112.camel@pasglop>


On 06.05.14 09:19, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
>> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
>>> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>>>> Isn't this a greater problem? We should start swapping before we hit
>>>> the point where non-movable kernel allocations fail, no?
>>> Possibly but the fact remains, this can be avoided by making sure that
>>> if we create a CMA reserve for KVM, then it uses it rather than using
>>> the rest of main memory for hash tables.
>> So why were we preferring non-CMA memory before? Considering that Aneesh
>> introduced that logic in fa61a4e3 I suppose this was just a mistake?
> I assume so.
>
>>>> The fact that KVM uses a good number of normal kernel pages is maybe
>>>> suboptimal, but shouldn't be a critical problem.
>>> The point is that we explicitly reserve those pages in CMA for use
>>> by KVM for that specific purpose, but the current code tries first
>>> to get them out of the normal pool.
>>>
>>> This is not optimal behaviour and is what Aneesh's patches are
>>> trying to fix.
>> I agree, and I agree that it's worth it to make better use of our
>> resources. But we still shouldn't crash.
> Well, Linux hitting out of memory conditions has never been a happy
> story :-)
>
>> However, reading through this thread I think I've slowly grasped what
>> the problem is. The hugetlbfs size calculation.
> Not really.
>
>> I guess something in your stack overreserves huge pages because it
>> doesn't account for the fact that some part of system memory is already
>> reserved for CMA.
> Either that or simply Linux runs out because we dirty too fast...
> really, Linux has never been good at dealing with OOM situations,
> especially when things like network drivers and filesystems try to do
> ATOMIC or NOIO allocs...
>
>> So the underlying problem is something completely orthogonal. The patch
>> body as is is fine, but the patch description should simply say that we
>> should prefer the CMA region because it's already reserved for us for
>> this purpose and we make better use of our available resources that way.
> No.
>
> We give a chunk of memory to hugetlbfs, it's all good and fine.
>
> Whatever remains is split between CMA and the normal page allocator.
>
> Without Aneesh's latest patch, when creating guests, KVM starts allocating
> its hash tables from the latter instead of CMA (we never allocate from
> hugetlb pool afaik, only guest pages do that, not hash tables).
>
> So we exhaust the page allocator and get Linux into OOM conditions
> while there's plenty of space in CMA. But the kernel cannot use CMA for
> its own allocations, only to back user pages, which we don't care about
> because our guest pages are covered by our hugetlb reserve :-)

Yes. Write that in the patch description and I'm happy ;).


Alex
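
The allocation-order argument quoted above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not the kernel code:
the pool sizes, the per-guest hash table size, and the names Pool, alloc_hpt
and simulate are all hypothetical and chosen purely for illustration. It
shows that trying the normal page allocator first drains it while the CMA
reserve sits unused, whereas trying CMA first leaves the page allocator
untouched.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A pool of free pages (hypothetical model, not a kernel structure)."""
    name: str
    free: int

    def alloc(self, pages: int) -> bool:
        # Succeed only if the whole request fits in this pool.
        if pages > self.free:
            return False
        self.free -= pages
        return True


def alloc_hpt(pages, page_allocator, cma, prefer_cma):
    """Allocate one guest hash page table; return the pool name used, or None."""
    order = (cma, page_allocator) if prefer_cma else (page_allocator, cma)
    for pool in order:
        if pool.alloc(pages):
            return pool.name
    return None


def simulate(prefer_cma, guests=4, hpt_pages=64):
    """Create several guests and return (page allocator free, CMA free)."""
    # Arbitrary sizes: whatever is left after the hugetlbfs reserve is
    # split between the normal page allocator and the CMA region.
    page_allocator = Pool("page_allocator", 256)
    cma = Pool("cma", 512)
    for _ in range(guests):
        alloc_hpt(hpt_pages, page_allocator, cma, prefer_cma)
    return page_allocator.free, cma.free


if __name__ == "__main__":
    print("page allocator first:", simulate(prefer_cma=False))
    print("CMA first:           ", simulate(prefer_cma=True))
```

In this model the "page allocator first" policy ends with no free pages in
the normal pool even though the CMA pool is still full, which mirrors the
OOM scenario described in the quoted explanation.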

Thread overview: 11+ messages
2014-05-04 17:25 [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table Aneesh Kumar K.V
2014-05-05 11:26 ` Alexander Graf
2014-05-05 14:35   ` Aneesh Kumar K.V
2014-05-05 15:16     ` Alexander Graf
2014-05-05 15:40       ` Aneesh Kumar K.V
2014-05-06  0:06       ` Benjamin Herrenschmidt
2014-05-06  7:05         ` Alexander Graf
2014-05-06  7:19           ` Benjamin Herrenschmidt
2014-05-06  7:21             ` Alexander Graf [this message]
2014-05-06 14:20               ` Aneesh Kumar K.V
2014-05-06 14:25                 ` Alexander Graf
