From: Alexander Graf <agraf@suse.de>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"paulus@samba.org" <paulus@samba.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
"kvm-ppc@vger.kernel.org" <kvm-ppc@vger.kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Date: Tue, 06 May 2014 07:21:45 +0000
Message-ID: <53688D89.1070201@suse.de>
In-Reply-To: <1399360775.20388.112.camel@pasglop>
On 06.05.14 09:19, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
>> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
>>> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>>>> Isn't this a greater problem? We should start swapping before we hit
>>>> the point where non-movable kernel allocations fail, no?
>>> Possibly but the fact remains, this can be avoided by making sure that
>>> if we create a CMA reserve for KVM, then it uses it rather than using
>>> the rest of main memory for hash tables.
>> So why were we preferring non-CMA memory before? Considering that Aneesh
>> introduced that logic in fa61a4e3 I suppose this was just a mistake?
> I assume so.
>
>>>> The fact that KVM uses a good number of normal kernel pages is maybe
>>>> suboptimal, but shouldn't be a critical problem.
>>> The point is that we explicitly reserve those pages in CMA for use
>>> by KVM for that specific purpose, but the current code tries first
>>> to get them out of the normal pool.
>>>
>>> This is not optimal behaviour and is what Aneesh's patches are
>>> trying to fix.
>> I agree, and I agree that it's worth it to make better use of our
>> resources. But we still shouldn't crash.
> Well, Linux hitting out of memory conditions has never been a happy
> story :-)
>
>> However, reading through this thread I think I've slowly grasped what
>> the problem is. The hugetlbfs size calculation.
> Not really.
>
>> I guess something in your stack overreserves huge pages because it
>> doesn't account for the fact that some part of system memory is already
>> reserved for CMA.
> Either that or simply Linux runs out because we dirty too fast...
> really, Linux has never been good at dealing with OOM situations,
> especially when things like network drivers and filesystems try to do
> ATOMIC or NOIO allocs...
>
>> So the underlying problem is something completely orthogonal. The patch
>> body as is is fine, but the patch description should simply say that we
>> should prefer the CMA region because it's already reserved for us for
>> this purpose and we make better use of our available resources that way.
> No.
>
> We give a chunk of memory to hugetlbfs, it's all good and fine.
>
> Whatever remains is split between CMA and the normal page allocator.
>
> Without Aneesh's latest patch, when creating guests, KVM starts allocating
> its hash tables from the latter instead of CMA (we never allocate from
> the hugetlb pool afaik, only guest pages do that, not hash tables).
>
> So we exhaust the page allocator and get Linux into OOM conditions
> while there's plenty of space in CMA. But the kernel cannot use CMA for
> its own allocations, only to back user pages, which we don't care about
> because our guest pages are covered by our hugetlb reserve :-)
Yes. Write that in the patch description and I'm happy ;).
Alex
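
To make the allocation order discussed above concrete, here is a toy model in Python. It is only a sketch, not kernel code: the pool sizes, the hash table size, and every name in it (Pool, alloc_hpt, create_guests, prefer_cma) are hypothetical and chosen for illustration. It compares trying the normal page allocator first with trying the CMA reserve first when several guests each need one hash page table.

```python
# Toy model only: page counts and all names are illustrative assumptions,
# not taken from the kernel or from the patch under discussion.

class Pool:
    """A pool of free pages, standing in for CMA or the normal allocator."""

    def __init__(self, name, pages):
        self.name = name
        self.free = pages

    def alloc(self, pages):
        """Take pages from the pool; return True on success."""
        if pages > self.free:
            return False
        self.free -= pages
        return True


def alloc_hpt(order_of_preference, pages):
    """Allocate one hash page table from the first pool that has room."""
    for pool in order_of_preference:
        if pool.alloc(pages):
            return pool.name
    return None


def create_guests(count, hpt_pages, prefer_cma):
    """Create guests and return (normal pages free, CMA pages free)."""
    normal = Pool("normal", 1000)
    cma = Pool("cma", 1000)
    order = [cma, normal] if prefer_cma else [normal, cma]
    for _ in range(count):
        alloc_hpt(order, hpt_pages)
    return normal.free, cma.free


# Normal allocator tried first: it is drained while CMA sits unused.
print(create_guests(4, 250, prefer_cma=False))  # (0, 1000)
# CMA tried first: the normal allocator stays available to the kernel.
print(create_guests(4, 250, prefer_cma=True))   # (1000, 0)
```

In the first case the normal pool ends up empty although the CMA pool is untouched, which mirrors the OOM scenario described in the quoted text. The model leaves out hugetlbfs entirely, since guest pages are said to come from that separate reserve.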