From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Vlastimil Babka <vbabka@suse.cz>, linux-mm@kvack.org
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Gibson <david@gibson.dropbear.id.au>,
Johannes Weiner <hannes@cmpxchg.org>,
Joonsoo Kim <js1304@gmail.com>, Mel Gorman <mgorman@suse.de>,
Michal Hocko <mhocko@suse.cz>, Paul Mackerras <paulus@samba.org>,
Sasha Levin <sasha.levin@oracle.com>,
linux-kernel@vger.kernel.org,
Alex Williamson <alex.williamson@redhat.com>,
Alexander Graf <agraf@suse.de>,
Paolo Bonzini <pbonzini@redhat.com>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH kernel vfio] mm: vfio: Move pages out of CMA before pinning
Date: Mon, 17 Aug 2015 19:11:01 +1000 [thread overview]
Message-ID: <55D1A525.5090706@ozlabs.ru> (raw)
In-Reply-To: <55D1910C.7070006@suse.cz>
On 08/17/2015 05:45 PM, Vlastimil Babka wrote:
> On 08/05/2015 10:08 AM, Alexey Kardashevskiy wrote:
>> This is about VFIO aka PCI passthrough used from QEMU.
>> KVM is irrelevant here.
>>
>> QEMU is a machine emulator. It allocates guest RAM from anonymous memory
>> and these pages are movable which is ok. They may happen to be allocated
>> from the contiguous memory allocation zone (CMA). Which is also ok as
>> long they are movable.
>>
>> However if the guest starts using VFIO (which can be hotplugged into
>> the guest), in most cases it involves DMA which requires guest RAM pages
>> to be pinned and not move once their addresses are programmed to
>> the hardware for DMA.
>>
>> So we end up in a situation when quite many pages in CMA are not movable
>> anymore. And we get bunch of these:
>>
>> [77306.513966] alloc_contig_range: [1f3800, 1f78c4) PFNs busy
>> [77306.514448] alloc_contig_range: [1f3800, 1f78c8) PFNs busy
>> [77306.514927] alloc_contig_range: [1f3800, 1f78cc) PFNs busy
>
> IIRC CMA was for mobile devices and their camera/codec drivers and you
> don't use QEMU on those? What do you need CMA for in your case?
I do not want QEMU to get memory from CMA, this is my point. It just
happens sometime that the kernel allocates movable pages from there.
>
>> This is a very rough patch to start the conversation about how to move
>> pages properly. mm/page_alloc.c does this and
>> arch/powerpc/mm/mmu_context_iommu.c exploits it.
>
> OK such conversation should probably start by mentioning the VM_PINNED
> effort by Peter Zijlstra: https://lkml.org/lkml/2014/5/26/345
>
> It's more general approach to dealing with pinned pages, and moving them
> out of CMA area (and compacting them in general) prior pinning is one of
> the things that should be done within that framework.
And I assume these patches did not go anywhere, right?...
> Then there's the effort to enable migrating pages other than LRU during
> compaction (and thus CMA allocation): https://lwn.net/Articles/650864/
> I don't know if that would be applicable in your use case, i.e. are the
> pins for DMA short-lived and can the isolation/migration code wait a bit
> for the transfer to finish so it can grab them, or something?
Pins for DMA are long-lived, pretty much as long as the guest is running.
So this "compaction" is too late.
>>
>> Please do not comment on the style and code placement,
>> this is just to give some context :)
>>
>> Obviously, this does not work well - it manages to migrate only few pages
>> and crashes as it is missing locks/disabling interrupts and I probably
>> should not just remove pages from LRU list (normally, I guess, only these
>> can migrate) and a million of other things.
>>
>> The questions are:
>>
>> - what is the correct way of telling if the page is in CMA?
>> is (get_pageblock_migratetype(page) == MIGRATE_CMA) good enough?
>
> Should be.
>
>> - how to tell MM to move page away? I am calling migrate_pages() with
>> an get_new_page callback which allocates a page with GFP_USER but without
>> GFP_MOVABLE which should allocate new page out of CMA which seems ok but
>> there is a little convern that we might want to add MOVABLE back when
>> VFIO device is unplugged from the guest.
>
> Hmm, once the page is allocated, then the migratetype is not tracked
> anywhere (except in page_owner debug data). But the unmovable allocations
> might exhaust available unmovable pageblocks and lead to fragmentation. So
> "add MOVABLE back" would be too late. Instead we would need to tell the
>allocator somehow to give us movable page but outside of CMA.
It is it movable, why do we care if it is in CMA or not?
> CMA's own
> __alloc_contig_migrate_range() avoids this problem by allocating movable
> pages, but the range has been already page-isolated and thus the allocator
> won't see the pages there.You obviously can't take this approach and
> isolate all CMA pageblocks like that. That smells like a new __GFP_FLAG, meh.
I understood (more or less) all of it except the __GFP_FLAG - when/what
would use it?
>> - do I need to isolate pages by using isolate_migratepages_range,
>> reclaim_clean_pages_from_list like __alloc_contig_migrate_range does?
>> I dropped them for now and the patch uses only @migratepages from
>> the compact_control struct.
>
> You don't have to do reclaim_clean_pages_from_list(), but the isolation has
> to be careful, yeah.
The isolation here means the whole CMA zone isolation which I "obviously
can't take this approach"? :)
>> - are there any flags in madvise() to address this (could not
>> locate any relevant)?
>
> AFAIK there's no madvise(I_WILL_BE_PINNING_THIS_RANGE)
>
>> - what else is missing? disabled interrupts? locks?
>
> See what isolate_migratepages_block() does.
Thanks for the pointers! I'll have a closer look at Peter's patchset.
--
Alexey
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-08-17 9:11 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-05 8:08 [RFC PATCH kernel vfio] mm: vfio: Move pages out of CMA before pinning Alexey Kardashevskiy
2015-08-15 10:50 ` Alexey Kardashevskiy
2015-08-17 7:45 ` Vlastimil Babka
2015-08-17 9:11 ` Alexey Kardashevskiy [this message]
2015-08-17 9:53 ` Vlastimil Babka
2015-08-17 9:58 ` Benjamin Herrenschmidt
2015-08-17 9:57 ` Benjamin Herrenschmidt
2015-08-31 13:48 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55D1A525.5090706@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=agraf@suse.de \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=alexander.h.duyck@redhat.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=benh@kernel.crashing.org \
--cc=david@gibson.dropbear.id.au \
--cc=hannes@cmpxchg.org \
--cc=js1304@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=paulus@samba.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=sasha.levin@oracle.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).