linux-mm.kvack.org archive mirror
From: John Donnelly <John.p.donnelly@oracle.com>
To: Baoquan He <bhe@redhat.com>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	luto@kernel.org, peterz@infradead.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hch@lst.de,
	robin.murphy@arm.com, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com, vbabka@suse.cz,
	m.szyprowski@samsung.com, kexec@lists.infradead.org,
	rppt@linux.ibm.com
Subject: Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages
Date: Mon, 6 Dec 2021 22:03:59 -0600
Message-ID: <01b4831f-7136-80af-a6cb-93698cb31fc4@oracle.com>
In-Reply-To: <20211207031631.GA5604@MiWiFi-R3L-srv>

On 12/6/21 9:16 PM, Baoquan He wrote:
> Sorry, I forgot to add the x86 and x86/mm maintainers.

Hi,

   These commits need to be applied to Linux-5.15.0 (LTS) too, since it has
the original regression:

  1d659236fb43 ("dma-pool: scale the default DMA coherent pool
size with memory capacity")

Maybe add "Fixes:" tags to the other commits?


> 
> On 12/07/21 at 11:07am, Baoquan He wrote:
>> ***Problem observed:
>> On x86_64, when a crash is triggered and the system enters the kdump
>> kernel, a page allocation failure is always seen.
>>
>>   ---------------------------------
>>   DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
>>   swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>   CPU: 0 PID: 1 Comm: swapper/0
>>   Call Trace:
>>    dump_stack+0x7f/0xa1
>>    warn_alloc.cold+0x72/0xd6
>>    ......
>>    __alloc_pages+0x24d/0x2c0
>>    ......
>>    dma_atomic_pool_init+0xdb/0x176
>>    do_one_initcall+0x67/0x320
>>    ? rcu_read_lock_sched_held+0x3f/0x80
>>    kernel_init_freeable+0x290/0x2dc
>>    ? rest_init+0x24f/0x24f
>>    kernel_init+0xa/0x111
>>    ret_from_fork+0x22/0x30
>>   Mem-Info:
>>   ------------------------------------
>>
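The "order:5" and "mode:0xcc1" in the log above can be decoded with a little arithmetic. Below is a minimal userspace sketch, not kernel code. It assumes 4 KiB pages, and the SKETCH_* flag values are my reading of the 5.15-era include/linux/gfp.h; the names order_for_size() and SKETCH_* are made up for illustration.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages, as on x86_64. */
#define SKETCH_PAGE_SHIFT 12

/* Assumed flag bits (5.15-era values, stated here as an assumption). */
#define SKETCH_GFP_DMA            0x001u
#define SKETCH_GFP_IO             0x040u
#define SKETCH_GFP_FS             0x080u
#define SKETCH_GFP_DIRECT_RECLAIM 0x400u
#define SKETCH_GFP_KSWAPD_RECLAIM 0x800u
#define SKETCH_GFP_KERNEL (SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_KSWAPD_RECLAIM | \
                           SKETCH_GFP_IO | SKETCH_GFP_FS)

/* Smallest order such that (1 << order) pages cover `size` bytes. */
static unsigned int order_for_size(size_t size)
{
    size_t page_size = (size_t)1 << SKETCH_PAGE_SHIFT;
    size_t pages = (size + page_size - 1) >> SKETCH_PAGE_SHIFT;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* True if the allocation mode asks for memory from the DMA zone. */
static int mode_wants_dma(unsigned int mode)
{
    return (mode & SKETCH_GFP_DMA) != 0;
}
```

With these assumptions, a 128 KiB request is 32 pages, i.e. order 5, and GFP_KERNEL|GFP_DMA comes out as 0xcc1, which matches the failing allocation in the log.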
>> ***Root cause:
>> The current kernel assumes that the DMA zone always has managed pages,
>> and tries to request pages from it whenever CONFIG_ZONE_DMA is enabled.
>> This is not always true. E.g. in the x86_64 kdump kernel, only the low
>> 1M is present and it is locked down at a very early stage of boot, so
>> this low 1M is never added to the buddy allocator as managed pages of
>> the DMA zone. As a result, any page request from the DMA zone always
>> fails.
>>
>> ***Investigation:
>> This failure has occurred since the commits below were merged into Linus's tree.
>>    1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
>>    23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
>>    f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
>>    7c321eb2b843 x86/kdump: Remove the backup region handling
>>    6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified
>>
>> Before them, on x86_64, the low 640K area was reused by the kdump kernel.
>> The content of the low 640K area was copied into a backup region for
>> dumping before jumping into the kdump kernel. Then, except for the
>> firmware reserved regions in [0, 640K], the remaining area was added to
>> the buddy allocator to become available managed pages of the DMA zone.
>>
>> However, with the above commits applied, in the x86_64 kdump kernel the
>> low 1M is reserved by memblock and never released to the buddy allocator.
>> So any later page allocation requested from the DMA zone fails.
>>
>> This low 1M lock down is needed because AMD SME encrypts memory, making
>> the old backup region mechanism impossible when switching into the kdump
>> kernel. An Intel engineer also mentioned that their TDX (Trusted Domain
>> Extensions) support, which is under development in the kernel, needs to
>> lock down the low 1M as well. So we can't simply revert the above commits
>> to fix the page allocation failure from the DMA zone, as someone suggested.
>>
>> ***Solution:
>> Currently, only the DMA atomic pool and dma-kmalloc initialize and
>> request page allocation with GFP_DMA during bootup. So only initialize
>> them when the DMA zone has available managed pages; otherwise just skip
>> the initialization. Judging from testing and the code, skipping it is
>> harmless. In the x86_64 kdump kernel, the page allocation failure
>> disappears.
>>
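The "skip initialization when the DMA zone has no managed pages" idea can be sketched in plain userspace C as below. This is only a toy model under stated assumptions, not the code from the patches: struct zone_model, has_managed_dma_model() and dma_pool_init_model() are hypothetical names, and the page counts used with it are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a memory zone. */
struct zone_model {
    const char *name;
    unsigned long present_pages;  /* pages that exist in the zone's range */
    unsigned long managed_pages;  /* pages handed to the buddy allocator */
};

enum init_result { INIT_SKIPPED, INIT_DONE };

/* True if some zone named "DMA" has at least one managed page. */
static bool has_managed_dma_model(const struct zone_model *zones, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(zones[i].name, "DMA") == 0 && zones[i].managed_pages > 0)
            return true;
    }
    return false;
}

/*
 * Model of a boot-time initializer that needs GFP_DMA pages: it bails
 * out early instead of attempting an allocation that is bound to fail.
 */
static enum init_result dma_pool_init_model(const struct zone_model *zones,
                                            size_t n)
{
    if (!has_managed_dma_model(zones, n))
        return INIT_SKIPPED;
    /* A real initializer would allocate from the DMA zone here. */
    return INIT_DONE;
}
```

In this model a kdump-like layout (DMA zone present but with zero managed pages) yields INIT_SKIPPED, while a normal boot with managed DMA pages yields INIT_DONE.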
>> ***Further thinking
>> x86_64 consistently puts [0, 16M] into ZONE_DMA and (16M, 4G] into
>> ZONE_DMA32 by default. The DMA zone covering the low 16M exists to
>> take care of antique ISA devices. In fact, a 64-bit system rarely
>> needs ZONE_DMA (the low 16M) to support almost extinct ISA devices.
>> However, some components treat DMA as a generic concept, e.g.
>> kmalloc-dma: the slab allocator initializes it for any later DMA-related
>> buffer allocation, not only for ISA DMA.
>>
>> On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
>> are enabled, ZONE_DMA covers the low 4G area and ZONE_DMA32 is empty.
>> The exception is specific platforms (e.g. 30-bit on Raspberry Pi 4),
>> where zone DMA covers the first 1G area and zone DMA32 covers the rest
>> of the 32-bit addressable memory.
>>
>> I am wondering if we could also make the sizes of ZONE_DMA and ZONE_DMA32
>> dynamically adjustable, just as arm64 does. On x86_64, we could make
>> zone DMA cover the 32-bit addressable memory and leave zone DMA32 empty
>> by default. Once ISA_DMA_API is enabled, we go back to making zone DMA
>> cover the low 16M area and zone DMA32 cover the rest of the 32-bit
>> addressable memory. (I am not familiar with ISA_DMA_API; will it require
>> 24-bit addressable memory when enabled?)
>>
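The zone split discussed above boils down to one tunable: how many address bits the DMA zone covers. A minimal sketch, assuming that framing; zone_for_phys() and the enum are hypothetical names, dma_bits must be at most 32, and the boundary is treated as exclusive for simplicity.

```c
#include <stdint.h>

enum zone_kind { ZONE_KIND_DMA, ZONE_KIND_DMA32, ZONE_KIND_NORMAL };

/*
 * Classify a physical address given the width of the DMA zone:
 *   dma_bits = 24: x86_64 default (DMA is the low 16M)
 *   dma_bits = 30: e.g. Raspberry Pi 4 on arm64 (DMA is the low 1G)
 *   dma_bits = 32: arm64 default, and the x86_64 idea floated above;
 *                  DMA32 then ends up empty.
 * Assumption: dma_bits <= 32.
 */
static enum zone_kind zone_for_phys(uint64_t phys, unsigned int dma_bits)
{
    if (phys < ((uint64_t)1 << dma_bits))
        return ZONE_KIND_DMA;
    if (phys < ((uint64_t)1 << 32))
        return ZONE_KIND_DMA32;
    return ZONE_KIND_NORMAL;
}
```

With dma_bits = 32 no address can land in ZONE_KIND_DMA32, which is the "empty zone DMA32" case described for arm64.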
>> Change history:
>>
>> v2 post:
>> https://lore.kernel.org/all/20210810094835.13402-1-bhe@redhat.com/T/#u
>>
>> v1 post:
>> https://lore.kernel.org/all/20210624052010.5676-1-bhe@redhat.com/T/#u
>>
>> v2->v2 RESEND:
>>   John pinged to push the repost of this patchset. So fix one typo in the
>>   subject of patch 3/5; fix a build error caused by a mixed declaration in
>>   patch 5/5. Both of them were found by John during his testing.
>>
>> v1->v2:
>>   Change to check whether a managed DMA zone exists. If the DMA zone has
>>   managed pages, go on to request pages from the DMA zone to initialize.
>>   Otherwise, just skip initializing the things which need pages from the
>>   DMA zone.
>>
>> Baoquan He (5):
>>    docs: kernel-parameters: Update to reflect the current default size of
>>      atomic pool
>>    dma-pool: allow user to disable atomic pool
>>    mm_zone: add function to check if managed dma zone exists
>>    dma/pool: create dma atomic pool only if dma zone has managed pages
>>    mm/slub: do not create dma-kmalloc if no managed pages in DMA zone
>>
>>   .../admin-guide/kernel-parameters.txt         |  5 ++++-
>>   include/linux/mmzone.h                        | 21 +++++++++++++++++++
>>   kernel/dma/pool.c                             | 11 ++++++----
>>   mm/page_alloc.c                               | 11 ++++++++++
>>   mm/slab_common.c                              |  9 ++++++++
>>   5 files changed, 52 insertions(+), 5 deletions(-)
>>
>> -- 
>> 2.17.2
>>
> 




Thread overview: 29+ messages
2021-12-07  3:07 [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages Baoquan He
2021-12-07  3:07 ` [PATCH RESEND v2 1/5] docs: kernel-parameters: Update to reflect the current default size of atomic pool Baoquan He
2021-12-07  3:53   ` John Donnelly
2021-12-07  3:07 ` [PATCH RESEND v2 2/5] dma-pool: allow user to disable " Baoquan He
2021-12-07  3:53   ` John Donnelly
2021-12-13  7:44   ` Christoph Hellwig
2021-12-13  8:16     ` Baoquan He
2021-12-07  3:07 ` [PATCH RESEND v2 3/5] mm_zone: add function to check if managed dma zone exists Baoquan He
2021-12-07  3:53   ` John Donnelly
2021-12-07 11:23   ` David Hildenbrand
2021-12-09 13:02     ` Baoquan He
2021-12-09 13:10       ` David Hildenbrand
2021-12-09 13:23         ` Baoquan He
2021-12-07  3:07 ` [PATCH RESEND v2 4/5] dma/pool: create dma atomic pool only if dma zone has managed pages Baoquan He
2021-12-07  3:54   ` John Donnelly
2021-12-07  3:07 ` [PATCH RESEND v2 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone Baoquan He
2021-12-07  3:54   ` John Donnelly
2021-12-07  3:16 ` [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages Baoquan He
2021-12-07  4:03   ` John Donnelly [this message]
2021-12-08  4:33     ` Andrew Morton
2021-12-08  4:56       ` John Donnelly
2021-12-13  3:54     ` Baoquan He
     [not found]   ` <YbdJ00wRFvi0aqze@zn.tnic>
2021-12-13 14:03     ` Baoquan He
2021-12-07  8:05 ` Christoph Lameter
2021-12-09  8:05   ` Baoquan He
2021-12-09 12:59     ` Christoph Lameter
2021-12-13  7:39       ` Baoquan He
2021-12-13  7:49         ` Christoph Hellwig
2021-12-13  7:47   ` Christoph Hellwig