From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuahkhan@gmail.com>, Yinghai Lu <yinghai@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
"H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>,
Borislav Petkov <bp@alien8.de>, Jan Kiszka <jan.kiszka@web.de>,
Jason Wessel <jason.wessel@windriver.com>,
linux-kernel@vger.kernel.org, Joerg Roedel <joro@8bytes.org>
Subject: Re: [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it
Date: Mon, 7 Jan 2013 21:48:30 -0500 [thread overview]
Message-ID: <20130108024830.GE9865@phenom.dumpdata.com> (raw)
In-Reply-To: <8762382z2c.fsf@xmission.com>
On Mon, Jan 07, 2013 at 06:22:51PM -0800, Eric W. Biederman wrote:
> Shuah Khan <shuahkhan@gmail.com> writes:
>
> > On Mon, Jan 7, 2013 at 8:26 AM, Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> >> On Fri, Jan 04, 2013 at 02:10:25PM -0800, Yinghai Lu wrote:
> >>> On Fri, Jan 4, 2013 at 1:02 PM, Shuah Khan <shuahkhan@gmail.com> wrote:
> >>> > Pani'cing the system doesn't sound like a good option to me in this
> >>> > case. This change to disable swiotlb is made for kdump. However, with
> >>> > this change several system fail to boot, unless crashkernel_low=72M is
> >>> > specified.
> >>>
> >>> this patchset is new feature to put second kdump kernel above 4G.
> >>>
> >>> >
> >>> > I would the say the right approach to solve this would be to not
> >>> > change the current pci_swiotlb_detect_override() behavior and treat
> >>> > swiotlb =1 upon entry equivalent to swiotlb_force set.
> >>>
> >>> that will make intel system have to take crashkernel_low=72M too.
> >>> otherwise intel system will get panic during swiotlb allocation.
> >>
> >> Two things:
> >>
> >> 1). You need to wrap the 'is_enough_..' in CONFIG_KEXEC, which means
> >> that the function needs to go in a header file.
> >> 2). The check for 1MB is suspect. Why only 1MB? You mentioned it is
> >> b/c of crashkernel_low=72M (which I am not seeing in v3.8 kernel-parameters.txt?
> >> Is that part of your mega-patchset?). Anyhow, there seems to be a disconnect -
> >> what if the user supplied crashkernel_low=27M? Perhaps the 'is_enough'
> >> should also parse the bootparams to double-check that there is enough
> >> low-mem space? But then if the kernel grows then 72M might not be enough -
> >> you might need 82M with 3.9.
> >>
> >> Perhaps a better way for this is to do:
> >> 1). Change 'is_enough' to check only for 4MB.
> >> 2). When booting as kexec, the SWIOTLB would only use 4MB instead of 64MB?
> >>
> >> Or, we could also use the post-late SWIOTLB initialization similiary to how it was
> >> done on ia64. This would mean that the AMD VI code would just call the
> >> .. something like this - NOT tested or even compile tested:
> >>
> >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> >> index c1c74e0..e7fa8f7 100644
> >> --- a/drivers/iommu/amd_iommu.c
> >> +++ b/drivers/iommu/amd_iommu.c
> >> @@ -3173,6 +3173,24 @@ int __init amd_iommu_init_dma_ops(void)
> >> if (unhandled && max_pfn > MAX_DMA32_PFN) {
> >> /* There are unhandled devices - initialize swiotlb for them */
> >> swiotlb = 1;
> >> + /* Late (so no bootmem allocator) usage and only if the early SWIOTLB
> >> + * hadn't been allocated (which can happen on kexec kernels booted
> >> + * above 4GB). */
> >> + if (!swiotlb_nr_tbl()) {
> >> + int retry = 3;
> >> + int mb_size = 64;
> >> + int rc = 0;
> >> +retry_me:
> >> + if (retry < 0)
> >> + panic("We tried setting %dMB for SWIOTLB but got -ENOMEM", mb_size << 1);
> >> + rc = swiotlb_late_init_with_default_size(mb_size * (1<<20));
> >> + if (rc) {
> >> + retry --;
> >> + mb_size >> 1;
> >> + goto retry_me;
> >> + }
> >> + dma_ops = &swiotlb_dma_ops;
> >> + }
> >> }
> >>
> >> amd_iommu_stats_init();
> >>
> >> And then the early SWIOTLB initialization for 64MB can fail and we are still OK.
> >>>
> >
> > Yinghai/Konrad,
> >
> > Did more testing. btw this patch depends on your [v7u1,25/31]
> > memblock: add memblock_mem_size(). Here are the test results:
> >
> > 1. When there is not enough memory: (enough_mem_for_swiotlb() returns false)
> > system will panic in amd_iommu_init_dma_ops().
> >
> > 2. When there is enough memory: (enough_mem_for_swiotlb() returns true):
> > swiotlb is reserved
> > pci_swiotlb_late_init() leaves the buffer allocated since swiotlb=1
> > with that getting changed in amd_iommu_init_dma_ops().
> >
> > I agree with Konrad that the logic should be wrapped in CONFIG_KEXEC.
>
> If enough_mem_for_swiotlb needs to be conditional on CONFIG_KEXEC the
> code is architected wrong. None of this logic has anything to do with
> kexec except that the kexec path is one way to get this condition to
> happen. Especially since the kexec'd kernel where this condition occurs
> does not need kexec support built in.
Fair enough - with the 'memmap' command line options one can trigger
this.
>
> Yinghai I sat down and read your patch and the approach you are taking
> is totally wrong.
>
> The problem is that swiotlb_init() in lib/swiotlb.c does not know how to
> fail without panic'ing the system.
>
> Which leaves two valid approaches.
> - Create a variant of swiotlb_init that can fail for use on x86 and
> handle the failure.
As an safe-fail step we could retry with an smaller size until a fit is found.
> - Delay initializing the swiotlb until someone actually needs a mapping
> from it.
So late init the SWIOTLB and perhaps have multiple "segments" of 4MB
of SWIOTLB that can grow as we exhaust its memory. Could work.
>
> Delaying the initialization of the swiotlb is out because the code
> needs an early memory allocation to get a large chunk of contiguous
> memory for the bounce buffers.
Or it can use the late init, but with a smaller chunk of memory.
>
> Which means the panics that occurr in swiotlb_init() need to be delayed
> until someone something actually needs bounce buffers from the swiotlb.
>
> Although arguably what should actually happen instead of panic() is that
> swiotlb_map_single should simply fail early when it was not possible to
> preallocate bounce buffers.
This sounds like a Catch-22. Fail early implies that it would have to do
this when using the bootmem allocator. But the swiotlb_map_single is not
called at that time - it is called _after_ the bootmem allocator has been
de-activated. Actually it is called pretty late - when built-in PCI devices
start off or when 'udev' starts scanning the PCI bus and loading modules.
I think I am misunderstanding you - could you clarify please?
next prev parent reply other threads:[~2013-01-08 2:49 UTC|newest]
Thread overview: 199+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-04 0:48 [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 01/31] x86, mm: Fix page table early allocation offset checking Yinghai Lu
2013-01-04 7:17 ` Borislav Petkov
2013-01-04 21:50 ` Yinghai Lu
2013-01-05 13:05 ` Borislav Petkov
2013-01-15 12:27 ` Stefano Stabellini
2013-01-04 0:48 ` [PATCH v7u1 02/31] x86, 64bit, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 03/31] x86, realmode: set real_mode permissions early Yinghai Lu
2013-01-04 20:15 ` Borislav Petkov
2013-01-04 20:58 ` Yinghai Lu
2013-01-04 21:04 ` Borislav Petkov
2013-01-04 22:13 ` Yinghai Lu
2013-01-05 13:25 ` Borislav Petkov
2013-01-07 12:40 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 04/31] x86, 64bit, mm: add generic kernel/ident mapping helper Yinghai Lu
2013-01-04 21:19 ` Borislav Petkov
2013-01-04 22:19 ` Yinghai Lu
2013-01-05 13:21 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 05/31] x86, 64bit: copy zero-page early Yinghai Lu
2013-01-07 15:53 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 06/31] x86, 64bit, realmode: use init_level4_pgt to set trapmoline_pgt directly Yinghai Lu
2013-01-04 17:18 ` Sakkinen, Jarkko
2013-01-04 22:01 ` Yinghai Lu
2013-01-05 9:59 ` Sakkinen, Jarkko
2013-01-07 15:54 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 07/31] x86, realmode: Separate real_mode reserve and setup Yinghai Lu
2013-01-04 17:18 ` Sakkinen, Jarkko
2013-01-07 15:54 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 08/31] x86, 64bit: early #PF handler set page table Yinghai Lu
2013-01-07 15:55 ` Borislav Petkov
2013-01-10 1:56 ` Yinghai Lu
2013-01-10 12:19 ` Borislav Petkov
2013-01-10 17:05 ` Yinghai Lu
2013-01-10 20:27 ` Borislav Petkov
2013-01-12 22:04 ` H. Peter Anvin
2013-01-04 0:48 ` [PATCH v7u1 09/31] x86, 64bit: #PF handler set page to cover 2M only Yinghai Lu
2013-01-09 22:57 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 10/31] x86, 64bit: Don't set max_pfn_mapped wrong value early on native path Yinghai Lu
2013-01-11 12:13 ` Borislav Petkov
2013-01-11 16:42 ` Yinghai Lu
2013-01-11 16:52 ` Borislav Petkov
2013-01-15 13:48 ` Stefano Stabellini
2013-01-15 15:22 ` Konrad Rzeszutek Wilk
2013-01-15 15:59 ` Stefano Stabellini
2013-01-15 16:37 ` Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 11/31] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 12/31] x86: add get_ramdisk_image/size() Yinghai Lu
2013-01-07 15:56 ` Borislav Petkov
2013-01-10 1:53 ` Yinghai Lu
2013-01-10 12:13 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 13/31] x86, boot: add get_cmd_line_ptr() Yinghai Lu
2013-01-07 15:56 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 14/31] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
2013-01-07 16:00 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 15/31] x86, boot: pass cmd_line_ptr with unsigned long instead Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 16/31] x86, boot: move verify_cpu.S and no_longmode down Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 17/31] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 18/31] x86, kexec: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 19/31] x86, kexec: set ident mapping for kernel that is above max_pfn Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 20/31] x86, kexec: replace ident_mapping_init and init_level4_page Yinghai Lu
2013-01-04 21:01 ` Borislav Petkov
2013-01-04 22:04 ` Yinghai Lu
2013-01-05 13:24 ` Borislav Petkov
2013-01-10 1:26 ` Yinghai Lu
2013-01-10 11:59 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 21/31] x86, kexec: only set ident mapping for ram Yinghai Lu
2013-01-13 12:56 ` Borislav Petkov
2013-01-14 5:46 ` Yinghai Lu
2013-01-14 9:53 ` Borislav Petkov
2013-01-14 18:17 ` Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 22/31] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
2013-01-13 21:41 ` Borislav Petkov
2013-01-14 5:37 ` Yinghai Lu
2013-01-14 9:43 ` Borislav Petkov
2013-01-14 23:06 ` Yinghai Lu
2013-01-14 17:49 ` H. Peter Anvin
2013-01-14 18:57 ` Yinghai Lu
2013-01-14 18:59 ` H. Peter Anvin
2013-01-14 19:19 ` Yinghai Lu
2013-01-14 19:50 ` Yinghai Lu
2013-01-14 19:56 ` H. Peter Anvin
2013-01-14 20:05 ` Yinghai Lu
2013-01-15 6:17 ` Yinghai Lu
2013-01-15 15:50 ` Borislav Petkov
2013-01-15 16:03 ` Yinghai Lu
2013-01-15 16:48 ` Borislav Petkov
2013-01-15 18:43 ` Yinghai Lu
2013-01-15 19:49 ` Borislav Petkov
2013-01-15 20:16 ` Yinghai Lu
2013-01-15 20:28 ` Borislav Petkov
2013-01-14 20:05 ` Borislav Petkov
2013-01-14 20:14 ` Yinghai Lu
2013-01-14 20:26 ` Borislav Petkov
2013-01-14 22:38 ` Yinghai Lu
2013-01-14 23:11 ` Borislav Petkov
2013-01-15 1:04 ` Yinghai Lu
2013-01-14 23:10 ` H. Peter Anvin
2013-01-14 23:21 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 23/31] x86, boot: update comments about entries for 64bit image Yinghai Lu
2013-01-14 11:20 ` Borislav Petkov
2013-01-14 18:35 ` Yinghai Lu
2013-01-14 18:37 ` Yinghai Lu
2013-01-14 18:46 ` Borislav Petkov
2013-01-14 20:01 ` Yinghai Lu
2013-01-14 18:43 ` Borislav Petkov
2013-01-04 0:48 ` [PATCH v7u1 24/31] x86, boot: Not need to check setup_header version for setup_data Yinghai Lu
2013-01-14 11:26 ` Borislav Petkov
2013-01-14 17:37 ` H. Peter Anvin
2013-01-14 18:04 ` Borislav Petkov
2013-01-14 18:42 ` H. Peter Anvin
2013-01-04 0:48 ` [PATCH v7u1 25/31] memblock: add memblock_mem_size() Yinghai Lu
2013-01-14 20:42 ` H. Peter Anvin
2013-01-14 22:28 ` Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 26/31] x86: Don't enable swiotlb if there is not enough ram for it Yinghai Lu
2013-01-04 16:05 ` Konrad Rzeszutek Wilk
2013-01-04 19:57 ` Yinghai Lu
2013-01-04 17:50 ` Shuah Khan
2013-01-04 20:34 ` Yinghai Lu
2013-01-04 21:02 ` Shuah Khan
2013-01-04 22:10 ` Yinghai Lu
2013-01-04 22:26 ` Shuah Khan
2013-01-04 22:34 ` Yinghai Lu
2013-01-04 22:47 ` Eric W. Biederman
2013-01-04 22:56 ` Shuah Khan
2013-01-04 23:00 ` Yinghai Lu
2013-01-04 23:21 ` Shuah Khan
2013-01-04 23:55 ` Yinghai Lu
2013-01-05 2:02 ` Shuah Khan
2013-01-05 4:10 ` Yinghai Lu
2013-01-05 22:04 ` Shuah Khan
2013-01-04 22:58 ` Yinghai Lu
2013-01-07 15:26 ` Konrad Rzeszutek Wilk
2013-01-07 17:02 ` Shuah Khan
2013-01-07 19:29 ` Konrad Rzeszutek Wilk
2013-01-08 2:22 ` Eric W. Biederman
2013-01-08 2:48 ` Konrad Rzeszutek Wilk [this message]
2013-01-08 3:03 ` Eric W. Biederman
2013-01-08 3:01 ` Yinghai Lu
2013-01-08 3:13 ` Eric W. Biederman
2013-01-08 3:50 ` Yinghai Lu
2013-01-08 23:40 ` Yinghai Lu
2013-01-09 0:04 ` Eric W. Biederman
2013-01-09 0:43 ` Konrad Rzeszutek Wilk
2013-01-09 0:56 ` Yinghai Lu
2013-01-09 0:58 ` Eric W. Biederman
2013-01-09 1:07 ` Yinghai Lu
2013-01-09 1:12 ` Yinghai Lu
2013-01-09 2:31 ` Eric W. Biederman
2013-01-09 13:24 ` Konrad Rzeszutek Wilk
2013-01-09 17:27 ` Yinghai Lu
2013-01-09 18:01 ` Shuah Khan
2013-01-09 19:13 ` Yinghai Lu
2013-01-09 21:00 ` Eric W. Biederman
2013-01-09 21:15 ` Yinghai Lu
2013-01-10 23:07 ` Yinghai Lu
2013-01-10 23:15 ` Eric W. Biederman
2013-01-10 23:55 ` Yinghai Lu
2013-01-11 16:35 ` Konrad Rzeszutek Wilk
2013-01-11 16:52 ` Yinghai Lu
2013-01-11 17:49 ` Yinghai Lu
2013-01-15 6:19 ` Yinghai Lu
2013-01-18 15:55 ` Konrad Rzeszutek Wilk
2013-01-24 15:39 ` Konrad Rzeszutek Wilk
2013-01-24 16:51 ` Shuah Khan
2013-01-24 19:22 ` Shuah Khan
2013-01-24 21:50 ` Yinghai Lu
2013-01-29 2:27 ` Shuah Khan
2013-01-29 3:44 ` Yinghai Lu
2013-01-31 19:28 ` Shuah Khan
2013-01-31 19:35 ` H. Peter Anvin
2013-01-09 13:12 ` Konrad Rzeszutek Wilk
2013-01-07 20:32 ` Yinghai Lu
2013-01-07 21:30 ` Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 27/31] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
2013-01-14 15:43 ` Borislav Petkov
2013-01-14 18:18 ` Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 28/31] x86: add Crash kernel low reservation Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 29/31] x86: Merge early kernel reserve for 32bit and 64bit Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 30/31] x86, 64bit, mm: Mark data/bss/brk to nx Yinghai Lu
2013-01-04 0:48 ` [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init Yinghai Lu
2013-01-04 11:43 ` Rafael J. Wysocki
2013-01-04 21:59 ` Yinghai Lu
2013-01-04 22:07 ` Rafael J. Wysocki
2013-01-04 7:09 ` [PATCH v7u1 00/31] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Borislav Petkov
2013-01-04 21:44 ` Yinghai Lu
2013-01-14 20:45 ` H. Peter Anvin
2013-01-14 22:44 ` Yinghai Lu
2013-01-14 23:16 ` H. Peter Anvin
2013-01-14 23:39 ` David Woodhouse
2013-01-14 23:50 ` H. Peter Anvin
2013-01-15 0:12 ` David Woodhouse
2013-01-15 12:19 ` Stefano Stabellini
2013-01-15 16:43 ` Yinghai Lu
2013-01-15 19:28 ` Yinghai Lu
2013-01-16 11:32 ` Stefano Stabellini
2013-01-16 17:31 ` Yinghai Lu
2013-01-16 17:38 ` H. Peter Anvin
2013-01-16 18:20 ` Yinghai Lu
2013-01-17 2:35 ` Yinghai Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130108024830.GE9865@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=ebiederm@xmission.com \
--cc=hpa@zytor.com \
--cc=jan.kiszka@web.de \
--cc=jason.wessel@windriver.com \
--cc=joro@8bytes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=shuahkhan@gmail.com \
--cc=tglx@linutronix.de \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.