* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
[not found] ` <20190215102458.GD10433-Jj63ApZU6fQ@public.gmane.org>
@ 2019-02-18 1:48 ` Dave Young
[not found] ` <20190218014820.GA10711-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2019-02-20 8:32 ` Borislav Petkov
0 siblings, 2 replies; 18+ messages in thread
From: Dave Young @ 2019-02-18 1:48 UTC (permalink / raw)
To: Borislav Petkov
Cc: Randy Dunlap, konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
x86-DgEjT+Ai2ygdnm+yROfE0A,
kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jerry Hoemann,
Pingfan Liu, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mike Rapoport,
Andrew Morton, yinghai-DgEjT+Ai2ygdnm+yROfE0A,
vgoyal-H+wXaHxf7aLQT0dZR+AlfA
On 02/15/19 at 11:24am, Borislav Petkov wrote:
> On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > Even we make it automatic in kernel, but we have to have some default
> > value for swiotlb in case crashkernel can not find a free region under 4G.
> > So this default value can not work for every use cases, people need
> > manually use crashkernel=,low and crashkernel=,high in case
> > crashkernel=X does not work.
>
> Why would the user need to find swiotlb range? The kernel has all the
> information it requires at its finger tips in order to decide properly.
>
> The user wants a crashkernel range, the kernel tries the low range =>
> no workie, then it tries the next range => workie but needs to allocate
> swiotlb range so that DMA can happen too. Doh, then the kernel does
> allocate that too.
It would be ideal if the kernel could do it automatically, but I'm not
sure the kernel can predict the swiotlb reserved size automatically.
Let's add more people and ask for comments.
>
> Why would the user need to do anything here?!
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
[not found] ` <20190218014820.GA10711-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
@ 2019-02-20 7:38 ` Pingfan Liu
0 siblings, 0 replies; 18+ messages in thread
From: Pingfan Liu @ 2019-02-20 7:38 UTC (permalink / raw)
To: Dave Young
Cc: Randy Dunlap, Baoquan He, konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
x86-DgEjT+Ai2ygdnm+yROfE0A,
kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jerry Hoemann, LKML,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mike Rapoport,
Borislav Petkov, Andrew Morton, Yinghai Lu,
vgoyal-H+wXaHxf7aLQT0dZR+AlfA
On Mon, Feb 18, 2019 at 9:48 AM Dave Young <dyoung-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> On 02/15/19 at 11:24am, Borislav Petkov wrote:
> > On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > > Even we make it automatic in kernel, but we have to have some default
> > > value for swiotlb in case crashkernel can not find a free region under 4G.
> > > So this default value can not work for every use cases, people need
> > > manually use crashkernel=,low and crashkernel=,high in case
> > > crashkernel=X does not work.
> >
> > Why would the user need to find swiotlb range? The kernel has all the
> > information it requires at its finger tips in order to decide properly.
> >
> > The user wants a crashkernel range, the kernel tries the low range =>
> > no workie, then it tries the next range => workie but needs to allocate
> > swiotlb range so that DMA can happen too. Doh, then the kernel does
> > allocate that too.
>
> It is ideal if kernel can do it automatically, but I'm not sure if
> kernel can predict the swiotlb reserved size automatically.
>
Agreed, I think it is hard to decide the reserved size automatically.
We do not know the memory requirement of ZONE_DMA32 at boot time.
It depends on how many DMA32 devices there are and on their dynamic
payload.
> Let's add more people to seek for comments.
>
> >
> > Why would the user need to do anything here?!
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-18 1:48 ` [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr Dave Young
[not found] ` <20190218014820.GA10711-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
@ 2019-02-20 8:32 ` Borislav Petkov
2019-02-20 9:41 ` Dave Young
1 sibling, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2019-02-20 8:32 UTC (permalink / raw)
To: Dave Young
Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal, iommu,
konrad.wilk
On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> It is ideal if kernel can do it automatically, but I'm not sure if
> kernel can predict the swiotlb reserved size automatically.
Do you see how even more absurd this gets?
If the kernel cannot know the swiotlb reserved size automatically, how
is the normal user even supposed to know?!
I see swiotlb_size_or_default() so we have a sane default which we fall
back to. Now where's the problem with that?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-20 8:32 ` Borislav Petkov
@ 2019-02-20 9:41 ` Dave Young
2019-02-20 12:51 ` Pingfan Liu
2019-02-21 17:13 ` Borislav Petkov
0 siblings, 2 replies; 18+ messages in thread
From: Dave Young @ 2019-02-20 9:41 UTC (permalink / raw)
To: Borislav Petkov
Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal, iommu,
konrad.wilk, Joerg Roedel
On 02/20/19 at 09:32am, Borislav Petkov wrote:
> On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > It is ideal if kernel can do it automatically, but I'm not sure if
> > kernel can predict the swiotlb reserved size automatically.
>
> Do you see how even more absurd this gets?
>
> If the kernel cannot know the swiotlb reserved size automatically, how
> is the normal user even supposed to know?!
>
> I see swiotlb_size_or_default() so we have a sane default which we fall
> back to. Now where's the problem with that?
Good question, I expect some answer from people who know more about the
background. It would be good to have some actual test results, Pingfan
is trying to do some tests.
Previously Joerg posted below patch, maybe he has some idea. Joerg?
commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4
Author: Joerg Roedel <jroedel@suse.de>
Date: Wed Jun 10 17:49:42 2015 +0200
x86/crash: Allocate enough low memory when crashkernel=high
Thanks
Dave
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-20 9:41 ` Dave Young
@ 2019-02-20 12:51 ` Pingfan Liu
2019-02-21 17:13 ` Borislav Petkov
1 sibling, 0 replies; 18+ messages in thread
From: Pingfan Liu @ 2019-02-20 12:51 UTC (permalink / raw)
To: Dave Young
Cc: Borislav Petkov, Baoquan He, Jerry Hoemann, x86, Randy Dunlap,
kexec, LKML, Mike Rapoport, Andrew Morton, Yinghai Lu, vgoyal,
iommu, konrad.wilk, Joerg Roedel
On Wed, Feb 20, 2019 at 5:41 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 02/20/19 at 09:32am, Borislav Petkov wrote:
> > On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > > It is ideal if kernel can do it automatically, but I'm not sure if
> > > kernel can predict the swiotlb reserved size automatically.
> >
> > Do you see how even more absurd this gets?
> >
> > If the kernel cannot know the swiotlb reserved size automatically, how
> > is the normal user even supposed to know?!
> >
I think swiotlb is a bounce buffer; if we enlarge it, we can get better
performance. The default size should be enough for the platform to work.
But when reserving low memory for the crashkernel, things are different.
The reserved low memory = swiotlb_size_or_default() + DMA32 memory for
devices. The second term on the right-hand side of the equation varies
with the machine type and the dynamic payload.
> > I see swiotlb_size_or_default() so we have a sane default which we fall
> > back to. Now where's the problem with that?
>
> Good question, I expect some answer from people who know more about the
> background. It would be good to have some actual test results, Pingfan
> is trying to do some tests.
>
I am not sure I follow the idea; I do not think the following test result
tells us much. (We need various types of machines to get a final result.)
I did a quick test on "HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10":
the command line "crashkernel=180M,high crashkernel=64M,low" works for
the 2nd kernel, although it complained about a memory shortage:
[ 7.655591] fbcon: mgadrmfb (fb0) is primary device
[ 7.655639] Console: switching to colour frame buffer device 128x48
[ 7.660609] systemd-udevd: page allocation failure: order:0, mode:0x280d4
[ 7.660611] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted 3.10.0-957.el7.x86_64 #1
[ 7.660612] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/20/2018
[ 7.660612] Call Trace:
[ 7.660621] [<ffffffff81761dc1>] dump_stack+0x19/0x1b
[ 7.660625] [<ffffffff811bc830>] warn_alloc_failed+0x110/0x180
[ 7.660628] [<ffffffff8175d3ce>] __alloc_pages_slowpath+0x6b6/0x724
[ 7.660631] [<ffffffff811c0e95>] __alloc_pages_nodemask+0x405/0x420
[ 7.660633] [<ffffffff8120dcf8>] alloc_pages_current+0x98/0x110
[ 7.660638] [<ffffffffc00c8622>] ttm_pool_populate+0x3d2/0x4b0 [ttm]
[ 7.660641] [<ffffffffc00bf1cd>] ttm_tt_populate+0x7d/0x90 [ttm]
[ 7.660644] [<ffffffffc00c3c74>] ttm_bo_kmap+0x124/0x240 [ttm]
[ 7.660648] [<ffffffff810cecbf>] ? __wake_up_sync_key+0x4f/0x60
[ 7.660650] [<ffffffffc012677e>] mga_dirty_update+0x25e/0x310 [mgag200]
[ 7.660653] [<ffffffffc012685f>] mga_imageblit+0x2f/0x40 [mgag200]
[ 7.660657] [<ffffffff813f97ca>] soft_cursor+0x1ba/0x260
[ 7.660659] [<ffffffff813f8f53>] bit_cursor+0x663/0x6a0
[ 7.660662] [<ffffffff81098739>] ? console_trylock+0x19/0x70
[ 7.660664] [<ffffffff813f514d>] fbcon_cursor+0x13d/0x1c0
[ 7.660665] [<ffffffff813f88f0>] ? bit_clear+0x120/0x120
[ 7.660668] [<ffffffff8146af2e>] hide_cursor+0x2e/0xa0
[ 7.660669] [<ffffffff8146d4e8>] redraw_screen+0x188/0x270
[ 7.660671] [<ffffffff8146e086>] do_bind_con_driver+0x316/0x340
[ 7.660672] [<ffffffff8146e5e9>] do_take_over_console+0x49/0x60
[ 7.660674] [<ffffffff813f24c3>] do_fbcon_takeover+0x63/0xd0
[ 7.660675] [<ffffffff813f808d>] fbcon_event_notify+0x61d/0x730
[ 7.660678] [<ffffffff8176fb0f>] notifier_call_chain+0x4f/0x70
[ 7.660681] [<ffffffff810c7f6d>] __blocking_notifier_call_chain+0x4d/0x70
[ 7.660683] [<ffffffff810c7fa6>] blocking_notifier_call_chain+0x16/0x20
[ 7.660684] [<ffffffff813e8b9b>] fb_notifier_call_chain+0x1b/0x20
[ 7.660686] [<ffffffff813e9e46>] register_framebuffer+0x1f6/0x340
[ 7.660690] [<ffffffffc01027e2>] __drm_fb_helper_initial_config_and_unlock+0x252/0x3e0 [drm_kms_helper]
[ 7.660694] [<ffffffffc01029ae>] drm_fb_helper_initial_config+0x3e/0x50 [drm_kms_helper]
[ 7.660697] [<ffffffffc01269d3>] mgag200_fbdev_init+0xe3/0x100 [mgag200]
[ 7.660699] [<ffffffffc01254f4>] mgag200_modeset_init+0x154/0x1d0 [mgag200]
[ 7.660701] [<ffffffffc012157d>] mgag200_driver_load+0x41d/0x5b0 [mgag200]
[ 7.660708] [<ffffffffc005ba4f>] drm_dev_register+0x15f/0x1f0 [drm]
[ 7.660711] [<ffffffff813c3518>] ? pci_enable_device_flags+0xe8/0x140
[ 7.660718] [<ffffffffc005d0da>] drm_get_pci_dev+0x8a/0x1a0 [drm]
[ 7.660720] [<ffffffffc012626b>] mga_pci_probe+0x9b/0xc0 [mgag200]
[ 7.660722] [<ffffffff813c4aca>] local_pci_probe+0x4a/0xb0
[ 7.660723] [<ffffffff813c6209>] pci_device_probe+0x109/0x160
[ 7.660726] [<ffffffff814a8285>] driver_probe_device+0xc5/0x3e0
[ 7.660727] [<ffffffff814a8683>] __driver_attach+0x93/0xa0
[ 7.660728] [<ffffffff814a85f0>] ? __device_attach+0x50/0x50
[ 7.660730] [<ffffffff814a5e25>] bus_for_each_dev+0x75/0xc0
[ 7.660731] [<ffffffff814a7bfe>] driver_attach+0x1e/0x20
[ 7.660733] [<ffffffff814a76a0>] bus_add_driver+0x200/0x2d0
[ 7.660734] [<ffffffff814a8d14>] driver_register+0x64/0xf0
[ 7.660735] [<ffffffff813c5a45>] __pci_register_driver+0xa5/0xc0
[ 7.660737] [<ffffffffc012d000>] ? 0xffffffffc012cfff
[ 7.660739] [<ffffffffc012d039>] mgag200_init+0x39/0x1000 [mgag200]
[ 7.660742] [<ffffffff8100210a>] do_one_initcall+0xba/0x240
[ 7.660745] [<ffffffff81118f8c>] load_module+0x272c/0x2bc0
[ 7.660748] [<ffffffff813a3030>] ? ddebug_proc_write+0x100/0x100
[ 7.660750] [<ffffffff8111950f>] SyS_init_module+0xef/0x140
[ 7.660752] [<ffffffff81774ddb>] system_call_fastpath+0x22/0x27
[ 7.660753] Mem-Info:
[ 7.660756] active_anon:3364 inactive_anon:6661 isolated_anon:0
[ 7.660756] active_file:0 inactive_file:0 isolated_file:0
[ 7.660756] unevictable:0 dirty:0 writeback:0 unstable:0
[ 7.660756] slab_reclaimable:1492 slab_unreclaimable:3116
[ 7.660756] mapped:1223 shmem:8449 pagetables:179 bounce:0
[ 7.660756] free:20626 free_pcp:0 free_cma:0
[ 7.660761] Node 0 DMA free:0kB min:4kB low:4kB high:4kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:564kB
managed:448kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 7.660762] lowmem_reserve[]: 0 0 152 152
[ 7.660766] Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65536kB
managed:0kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 7.660767] lowmem_reserve[]: 0 0 152 152
[ 7.660771] Node 0 Normal free:82504kB min:1572kB low:1964kB
high:2356kB active_anon:13456kB inactive_anon:26644kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:183740kB managed:158716kB mlocked:0kB
dirty:0kB writeback:0kB mapped:4892kB shmem:33796kB
slab_reclaimable:5968kB slab_unreclaimable:12464kB kernel_stack:784kB
pagetables:716kB unstable:0kB bounce:0kB free_pcp:[ 8.722693]
Microsemi PQI Driver (v1.1.4-115)
> Previously Joerg posted below patch, maybe he has some idea. Joerg?
>
> commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4
> Author: Joerg Roedel <jroedel@suse.de>
> Date: Wed Jun 10 17:49:42 2015 +0200
>
> x86/crash: Allocate enough low memory when crashkernel=high
>
> Thanks
> Dave
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-20 9:41 ` Dave Young
2019-02-20 12:51 ` Pingfan Liu
@ 2019-02-21 17:13 ` Borislav Petkov
2019-02-22 2:11 ` Dave Young
1 sibling, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2019-02-21 17:13 UTC (permalink / raw)
To: Dave Young
Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal, iommu,
konrad.wilk, Joerg Roedel
On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote:
> Previously Joerg posted below patch, maybe he has some idea. Joerg?
Isn't it clear from the commit message?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-21 17:13 ` Borislav Petkov
@ 2019-02-22 2:11 ` Dave Young
2019-02-22 8:42 ` Joerg Roedel
0 siblings, 1 reply; 18+ messages in thread
From: Dave Young @ 2019-02-22 2:11 UTC (permalink / raw)
To: Borislav Petkov
Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal, iommu,
konrad.wilk, Joerg Roedel
On 02/21/19 at 06:13pm, Borislav Petkov wrote:
> On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote:
> > Previously Joerg posted below patch, maybe he has some idea. Joerg?
>
> Isn't it clear from the commit message?
Then, does it answer your question?
256M is set as the default value in the patch, but it is not guaranteed to
satisfy all use cases. From the description it is also possible that some
people run out of the 256M, so the ,low and ,high formats are still
necessary even if we make crashkernel=X fall back to allocating in high
automatically when the low area fails.
crashkernel=X: allocate in low first; if that is not possible, allocate in
high.
If people have a lot of devices that need more swiotlb, they can manually
set ,high together with ,low.
What's your suggestion then? Remove ,low and ,high and increase the default
256M whenever we get a failure bug report?
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-22 2:11 ` Dave Young
@ 2019-02-22 8:42 ` Joerg Roedel
2019-02-22 13:00 ` Borislav Petkov
0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2019-02-22 8:42 UTC (permalink / raw)
To: Dave Young
Cc: Borislav Petkov, bhe, Jerry Hoemann, x86, Randy Dunlap, kexec,
linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
vgoyal, iommu, konrad.wilk
On Fri, Feb 22, 2019 at 10:11:01AM +0800, Dave Young wrote:
> In case people have a lot of devices need more swiotlb, then he manually
> set the ,high with ,low together.
The options to specify the high and low values for the crashkernel are
important for certain machines. The point is that swiotlb already
allocates 64MB of low memory by default. But that memory is only used
for 32bit DMA-mask devices that want to DMA into high memory. There are
drivers just allocating GFP_DMA32 memory, which also ends up in the low
region (but not swiotlb), that is why the previous default of 72MB low
memory was not enough, it only left 8MB of GFP_DMA32 memory. The current
default of 256MB was found by experiments on a bigger number of
machines, to create a reasonable default that is at least likely to be
sufficient for an average machine.
There is no way today for the kernel to find an optimum value for the
amount of low memory required to successfully create a crash dump. It
depends on the number of devices in the system and how the drivers
for them are written. The drivers have no way to report back their
requirements, and even if they had, at the time the allocation happens
no driver is loaded yet.
So it is up to the system administrator to find workable values for the
high and low memory requirements, even using experiments as a last
resort.
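The arithmetic behind those defaults can be sketched in a few lines of C. This is only an illustration of the numbers quoted above (64MB swiotlb, 72MB old default, 256MB current default); the macro and function names are made up for this sketch and are not kernel symbols.

```c
#include <stdio.h>

/* Illustrative values in MB, taken from the figures quoted in the text.
 * All names here are hypothetical, not kernel identifiers. */
#define SWIOTLB_DEFAULT_MB   64UL
#define OLD_LOW_DEFAULT_MB   72UL
#define NEW_LOW_DEFAULT_MB  256UL

/* Low memory left for plain GFP_DMA32 allocations once swiotlb
 * has taken its default share of the low reservation. */
unsigned long dma32_left_mb(unsigned long low_mb)
{
	return low_mb > SWIOTLB_DEFAULT_MB ? low_mb - SWIOTLB_DEFAULT_MB : 0;
}

/* Print the budget for the old and the current default. */
void print_low_budget(void)
{
	printf("%lu MB low: %lu MB left for GFP_DMA32\n",
	       OLD_LOW_DEFAULT_MB, dma32_left_mb(OLD_LOW_DEFAULT_MB));
	printf("%lu MB low: %lu MB left for GFP_DMA32\n",
	       NEW_LOW_DEFAULT_MB, dma32_left_mb(NEW_LOW_DEFAULT_MB));
}
```

How much of that remainder a given machine actually needs still depends on its drivers, which is why the sketch cannot replace experiments.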
Regards,
Joerg
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-22 8:42 ` Joerg Roedel
@ 2019-02-22 13:00 ` Borislav Petkov
2019-02-24 13:25 ` Pingfan Liu
2019-02-25 11:00 ` Joerg Roedel
0 siblings, 2 replies; 18+ messages in thread
From: Borislav Petkov @ 2019-02-22 13:00 UTC (permalink / raw)
To: Joerg Roedel, Dave Young
Cc: bhe, Jerry Hoemann, x86, Randy Dunlap, kexec, linux-kernel,
Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai, vgoyal, iommu,
konrad.wilk
On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> The current default of 256MB was found by experiments on a bigger
> number of machines, to create a reasonable default that is at least
> likely to be sufficient of an average machine.
Exactly, and this is what makes sense.
The code should try the requested reservation and if it fails, it should
try high allocation with default swiotlb size because we need to reserve
*some* range.
If that reservation succeeds, we should say something along the lines of
"... requested range failed, reserved <X> range instead."
And then in Documentation/admin-guide/kernel-parameters.txt above the
crashkernel= explanations, the allocation strategy of best effort should
be explained in short. That the kernel will try to allocate high if the
requested allocation didn't succeed and that the user can tweak the
allocation with the below options.
Bottom line is: the kernel should assist the user and try harder to
allocate *some* range for a crash kernel when there's no detailed
specification what that range should be.
*If* the user adds ,low, ,high, then the kernel should try only that
specified range because the assumption is that the user knows what she's
doing.
But if the user simply wants a range for a crash kernel without stating
where that range should be in particular and its placement is a don't
care - as long as there is a range - then the kernel should simply try
high, etc.
Makes sense?
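The policy described above could be modelled roughly like this. It is a user-space sketch under stated assumptions, not the setup.c code: the struct, enum and function names are hypothetical, and whether a free region exists is passed in as a flag instead of being looked up in memblock.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the proposed policy; not kernel code. */
enum ck_result { CK_FAILED, CK_LOW, CK_HIGH };

struct ck_request {
	bool explicit_high;  /* user passed crashkernel=X,high */
	bool low_has_room;   /* a free region of size X exists below 4G */
	bool high_has_room;  /* a free region of size X exists above 4G */
};

enum ck_result ck_reserve(const struct ck_request *req)
{
	/* The user stated the placement: try only that range. */
	if (req->explicit_high)
		return req->high_has_room ? CK_HIGH : CK_FAILED;

	/* Plain crashkernel=X: placement is a don't-care, so try harder. */
	if (req->low_has_room)
		return CK_LOW;

	if (req->high_has_room) {
		/* Tell the user that a different range was picked. */
		printf("crashkernel: low reservation failed, reserved high range instead\n");
		return CK_HIGH;
	}

	return CK_FAILED;
}
```

In the fallback case the real code would also have to reserve the default low range for swiotlb; the sketch leaves that out.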
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-22 13:00 ` Borislav Petkov
@ 2019-02-24 13:25 ` Pingfan Liu
2019-02-25 1:53 ` Dave Young
2019-02-25 9:39 ` Borislav Petkov
2019-02-25 11:00 ` Joerg Roedel
1 sibling, 2 replies; 18+ messages in thread
From: Pingfan Liu @ 2019-02-24 13:25 UTC (permalink / raw)
To: Borislav Petkov
Cc: Joerg Roedel, Dave Young, Baoquan He, Jerry Hoemann, x86,
Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
Yinghai Lu, vgoyal, iommu, konrad.wilk
On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > The current default of 256MB was found by experiments on a bigger
> > number of machines, to create a reasonable default that is at least
> > likely to be sufficient of an average machine.
>
> Exactly, and this is what makes sense.
>
> The code should try the requested reservation and if it fails, it should
> try high allocation with default swiotlb size because we need to reserve
> *some* range.
>
> If that reservation succeeds, we should say something along the lines of
>
> "... requested range failed, reserved <X> range instead."
>
Maybe I misunderstood you, but does "requested range failed" mean that
the user specified the range? If yes, then it should be the duty of the
user, as you said later, not the duty of the kernel.
> And then in Documentation/admin-guide/kernel-parameters.txt above the
> crashkernel= explanations, the allocation strategy of best effort should
> be explained in short. That the kernel will try to allocate high if the
> requested allocation didn't succeed and that the user can tweak the
> allocation with the below options.
>
Yes, it should be improved.
> Bottom line is: the kernel should assist the user and try harder to
> allocate *some* range for a crash kernel when there's no detailed
> specification what that range should be.
>
> *If* the user adds ,low, high, then the kernel should try only that
> specified range because the assumption is that the user knows what she's
> doing.
>
> But if the user simply wants a range for a crash kernel without stating
> where that range should be in particular and it's placement is a don't
> care - as long as there is a range - then the kernel should simply try
> high, etc.
>
We do not know the memory layout of a system; it may have less than
4GB of memory. So it is better to try the whole range of system
memory.
Thanks,
Pingfan
> Makes sense?
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-24 13:25 ` Pingfan Liu
@ 2019-02-25 1:53 ` Dave Young
2019-02-25 9:39 ` Borislav Petkov
1 sibling, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-02-25 1:53 UTC (permalink / raw)
To: Pingfan Liu
Cc: Borislav Petkov, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
Yinghai Lu, vgoyal, iommu, konrad.wilk
On 02/24/19 at 09:25pm, Pingfan Liu wrote:
> On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> >
> > Exactly, and this is what makes sense.
> >
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
> >
> > If that reservation succeeds, we should say something along the lines of
> >
> > "... requested range failed, reserved <X> range instead."
> >
> Maybe I misunderstood you, but does "requested range failed" mean that
> user specify the range? If yes, then it should be the duty of user as
> you said later, not the duty of kernel"
If you go with the changes in your current patch, it needs to say
something like:
"crashkernel: can not find free memory under 4G, reserve XM@.. instead"
It also needs to print the reserved low memory area when ,high is used.
But for 896M -> 4G, the 896M failure does not need to be shown in dmesg;
it is just in-kernel logic.
Thanks
Dave
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-24 13:25 ` Pingfan Liu
2019-02-25 1:53 ` Dave Young
@ 2019-02-25 9:39 ` Borislav Petkov
1 sibling, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2019-02-25 9:39 UTC (permalink / raw)
To: Pingfan Liu
Cc: Joerg Roedel, Dave Young, Baoquan He, Jerry Hoemann, x86,
Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
Yinghai Lu, vgoyal, iommu, konrad.wilk
On Sun, Feb 24, 2019 at 09:25:18PM +0800, Pingfan Liu wrote:
> Maybe I misunderstood you, but does "requested range failed" mean that
> user specify the range? If yes, then it should be the duty of user as
> you said later, not the duty of kernel"
No, it should say that it selected a different range only when the user
didn't specify it. Which would mean that the user didn't care about the
range - she/he only wanted to have *any* crashkernel range reserved.
I.e., crashkernel=X invocation.
> We do not know the memory layout of a system, maybe a system with
> memory less than 4GB. So it is better to try all the range of system
> memory.
Ok. If 4G fails, you set high and then try again.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-22 13:00 ` Borislav Petkov
2019-02-24 13:25 ` Pingfan Liu
@ 2019-02-25 11:00 ` Joerg Roedel
2019-02-25 11:12 ` Dave Young
1 sibling, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2019-02-25 11:00 UTC (permalink / raw)
To: Borislav Petkov
Cc: Dave Young, bhe, Jerry Hoemann, x86, Randy Dunlap, kexec,
linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
vgoyal, iommu, konrad.wilk
On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote:
> On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > The current default of 256MB was found by experiments on a bigger
> > number of machines, to create a reasonable default that is at least
> > likely to be sufficient of an average machine.
>
> Exactly, and this is what makes sense.
>
> The code should try the requested reservation and if it fails, it should
> try high allocation with default swiotlb size because we need to reserve
> *some* range.
Right, makes sense. While at it, maybe it is time to move the default
allocation policy to 'high' again. The change was reverted six years ago
because it broke old kexec tools, but those are probably out-of-service
now. I think this change would make the whole crashdump allocation
process less fragile.
Regards,
Joerg
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-25 11:00 ` Joerg Roedel
@ 2019-02-25 11:12 ` Dave Young
[not found] ` <20190225111216.GA9276-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Dave Young @ 2019-02-25 11:12 UTC (permalink / raw)
To: Joerg Roedel
Cc: Borislav Petkov, bhe, Jerry Hoemann, x86, Randy Dunlap, kexec,
linux-kernel, Pingfan Liu, Mike Rapoport, Andrew Morton, yinghai,
vgoyal, iommu, konrad.wilk
On 02/25/19 at 12:00pm, Joerg Roedel wrote:
> On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote:
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> >
> > Exactly, and this is what makes sense.
> >
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
>
> Right, makes sense. While at it, maybe it is time to move the default
> allocation policy to 'high' again. The change was reverted six years ago
> because it broke old kexec tools, but those are probably out-of-service
> now. I think this change would make the whole crashdump allocation
> process less fragile.
One concern about this is that, in the average case, one does not need so
much memory for kdump. For example, in RHEL we use crashkernel=auto to
automatically reserve kdump kernel memory, and for x86 the reserved size
currently looks like this:
1G-64G:160M,64G-1T:256M,1T-:512M
That means for a machine with less than 64G of memory we only allocate
160M, and it works for most machines in our lab.
If we move to high as the default, it will allocate 160M high + 256M low.
That is too much for people who are fine with the default 160M, especially
for virtual machines with less memory (but > 4G).
To make the process less fragile, maybe we can remove the 896M limitation
and only try <4G, then go to high.
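To make the numbers concrete, here is a small sketch of how the range table above maps system RAM to a reservation, and what the total would be with an extra 256M low reservation. The names are hypothetical, and treating each range as start-inclusive and end-exclusive is an assumption of this sketch.

```c
#include <stddef.h>

/* One "start-end:size" entry; sizes in GB and MB for readability.
 * Names are illustrative, not taken from any kernel or distro source. */
struct ck_range {
	unsigned long long start_gb;  /* inclusive (assumed) */
	unsigned long long end_gb;    /* exclusive (assumed); 0 means open-ended */
	unsigned int size_mb;
};

/* 1G-64G:160M,64G-1T:256M,1T-:512M */
static const struct ck_range ck_auto_table[] = {
	{    1,   64, 160 },
	{   64, 1024, 256 },
	{ 1024,    0, 512 },
};

#define DEFAULT_LOW_MB 256U  /* default low reservation discussed in this thread */

/* Reservation size for a machine with ram_gb of memory, 0 if no range matches. */
unsigned int ck_auto_size_mb(unsigned long long ram_gb)
{
	for (size_t i = 0; i < sizeof(ck_auto_table) / sizeof(ck_auto_table[0]); i++) {
		const struct ck_range *r = &ck_auto_table[i];

		if (ram_gb >= r->start_gb && (r->end_gb == 0 || ram_gb < r->end_gb))
			return r->size_mb;
	}
	return 0;
}

/* Total reserved if the same size went high and the default low range were added. */
unsigned int ck_total_if_high_mb(unsigned long long ram_gb)
{
	unsigned int high_mb = ck_auto_size_mb(ram_gb);

	return high_mb ? high_mb + DEFAULT_LOW_MB : 0;
}
```

For a 32G machine this gives 160M today versus 416M with a high-by-default policy, which is the overhead the concern is about.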
Thanks
Dave
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
[not found] ` <20190225111216.GA9276-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
@ 2019-02-25 11:30 ` Borislav Petkov
2019-03-01 3:04 ` Pingfan Liu
0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2019-02-25 11:30 UTC (permalink / raw)
To: Dave Young
Cc: Joerg Roedel, bhe-H+wXaHxf7aLQT0dZR+AlfA,
konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA, x86-DgEjT+Ai2ygdnm+yROfE0A,
kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jerry Hoemann,
Pingfan Liu, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mike Rapoport,
Randy Dunlap, Andrew Morton, yinghai-DgEjT+Ai2ygdnm+yROfE0A,
vgoyal-H+wXaHxf7aLQT0dZR+AlfA
On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> If we move to high as default, it will allocate 160M high + 256M low. It
We won't move to high by default - we will *fall* back to high if the
default allocation fails.
> To make the process less fragile maybe we can remove the 896M limitation
> and only try <4G then go to high.
Sure, the more robust for the user, the better.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-02-25 11:30 ` Borislav Petkov
@ 2019-03-01 3:04 ` Pingfan Liu
2019-03-01 3:19 ` Pingfan Liu
0 siblings, 1 reply; 18+ messages in thread
From: Pingfan Liu @ 2019-03-01 3:04 UTC (permalink / raw)
To: Borislav Petkov
Cc: Dave Young, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
Yinghai Lu, vgoyal, iommu, konrad.wilk
Hi Borislav,
Do you think the following patch is good at present?
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 81f9d23..9213073 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -460,7 +460,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
# define CRASH_ADDR_LOW_MAX (512 << 20)
# define CRASH_ADDR_HIGH_MAX (512 << 20)
#else
-# define CRASH_ADDR_LOW_MAX (896UL << 20)
+# define CRASH_ADDR_LOW_MAX (1ULL << 32)
# define CRASH_ADDR_HIGH_MAX MAXMEM
#endif
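A note on writing a 4G limit as a shift: in C the type of the shifted operand decides the result. A plain 1 is an int, and shifting a 32-bit int by 32 bits is undefined behavior, so the operand needs to be 64 bits wide. A minimal sketch, assuming hypothetical TOY_* macro names and a hypothetical fits_below() helper (neither is kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit operands, so the shifts are evaluated in 64 bits. */
#define TOY_ADDR_896M	(896ULL << 20)
#define TOY_ADDR_4G	(1ULL << 32)

/* Return 1 if [base, base + size) lies entirely below limit, else 0. */
static int fits_below(uint64_t base, uint64_t size, uint64_t limit)
{
	assert(size > 0);
	/* Written as a subtraction so base + size cannot wrap around. */
	return base <= limit && size <= limit - base;
}
```

For example, a 160M region starting at 800M does not fit below the 896M limit but does fit below 4G.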
For documentation, I will send another patch to improve the description.
Thanks,
Pingfan
On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > If we move to high as default, it will allocate 160M high + 256M low. It
>
> We won't move to high by default - we will *fall* back to high if the
> default allocation fails.
>
> > To make the process less fragile maybe we can remove the 896M limitation
> > and only try <4G then go to high.
>
> Sure, the more robust for the user, the better.
>
> --
> Regards/Gruss,
> Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-03-01 3:04 ` Pingfan Liu
@ 2019-03-01 3:19 ` Pingfan Liu
2019-03-22 8:22 ` Dave Young
0 siblings, 1 reply; 18+ messages in thread
From: Pingfan Liu @ 2019-03-01 3:19 UTC (permalink / raw)
To: Borislav Petkov
Cc: Dave Young, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
Yinghai Lu, vgoyal, iommu, konrad.wilk
On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> Hi Borislav,
>
> Do you think the following patch is good at present?
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 81f9d23..9213073 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -460,7 +460,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
> # define CRASH_ADDR_LOW_MAX (512 << 20)
> # define CRASH_ADDR_HIGH_MAX (512 << 20)
> #else
> -# define CRASH_ADDR_LOW_MAX (896UL << 20)
> +# define CRASH_ADDR_LOW_MAX (1ULL << 32)
> # define CRASH_ADDR_HIGH_MAX MAXMEM
> #endif
>
Or the patch looks like:
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..ed0def5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -459,7 +459,7 @@ static void __init
memblock_x86_reserve_range_setup_data(void)
# define CRASH_ADDR_LOW_MAX (512 << 20)
# define CRASH_ADDR_HIGH_MAX (512 << 20)
#else
-# define CRASH_ADDR_LOW_MAX (896UL << 20)
+# define CRASH_ADDR_LOW_MAX (1ULL << 32)
# define CRASH_ADDR_HIGH_MAX MAXMEM
#endif
@@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
high ? CRASH_ADDR_HIGH_MAX
: CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+ /*
+ * crashkernel=X reserve below 4G fails? Try MAXMEM
+ */
+ if (!high && !crash_base)
+ crash_base = memblock_find_in_range(CRASH_ALIGN,
+ CRASH_ADDR_HIGH_MAX,
+ crash_size, CRASH_ALIGN);
+#endif
which tries 0-4G, then falls back to above 4G
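The "try below 4G, then fall back to high memory" order can be modelled in user space. This is only a sketch under stated assumptions: toy_find_in_range() is a hypothetical stand-in for the memblock search (top-down, returning 0 on failure), struct free_region is an invented free-memory map, and the TOY_* constants are illustrative values rather than the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ALIGN	(16ULL << 20)	/* stand-in for CRASH_ALIGN */
#define TOY_LOW_MAX	(1ULL << 32)	/* 4G */
#define TOY_HIGH_MAX	(64ULL << 40)	/* stand-in for MAXMEM */

struct free_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Find the highest aligned base inside [start, end) where size bytes fit
 * in one free region. Returns 0 on failure, so start must be non-zero.
 */
static uint64_t toy_find_in_range(const struct free_region *map, size_t n,
				  uint64_t start, uint64_t end,
				  uint64_t size, uint64_t align)
{
	uint64_t best = 0;

	assert(start > 0 && align && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t lo = map[i].base > start ? map[i].base : start;
		uint64_t hi = map[i].base + map[i].size;
		uint64_t cand;

		if (hi > end)
			hi = end;
		if (hi <= lo || hi - lo < size)
			continue;

		/* Highest aligned base that still ends at or below hi. */
		cand = (hi - size) & ~(align - 1);
		if (cand >= lo && cand > best)
			best = cand;
	}
	return best;
}

/* Try below 4G first; only if that fails, search the whole range. */
static uint64_t toy_reserve_crashkernel(const struct free_region *map,
					size_t n, uint64_t size)
{
	uint64_t base = toy_find_in_range(map, n, TOY_ALIGN, TOY_LOW_MAX,
					  size, TOY_ALIGN);

	if (!base)
		base = toy_find_in_range(map, n, TOY_ALIGN, TOY_HIGH_MAX,
					 size, TOY_ALIGN);
	return base;
}
```

In this model a request that fits below 4G never lands in high memory; the second search runs only when the first one returns 0.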
> For documentation, I will send another patch to improve the description.
>
> Thanks,
> Pingfan
>
> On Mon, Feb 25, 2019 at 7:30 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Feb 25, 2019 at 07:12:16PM +0800, Dave Young wrote:
> > > If we move to high as default, it will allocate 160M high + 256M low. It
> >
> > We won't move to high by default - we will *fall* back to high if the
> > default allocation fails.
> >
> > > To make the process less fragile maybe we can remove the 896M limitation
> > > and only try <4G then go to high.
> >
> > Sure, the more robust for the user, the better.
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr
2019-03-01 3:19 ` Pingfan Liu
@ 2019-03-22 8:22 ` Dave Young
0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2019-03-22 8:22 UTC (permalink / raw)
To: Pingfan Liu
Cc: Borislav Petkov, Joerg Roedel, Baoquan He, Jerry Hoemann, x86,
Randy Dunlap, kexec, LKML, Mike Rapoport, Andrew Morton,
Yinghai Lu, vgoyal, iommu, konrad.wilk
Hi Pingfan,
Thanks for the effort,
On 03/01/19 at 11:19am, Pingfan Liu wrote:
> On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > Hi Borislav,
> >
> > Do you think the following patch is good at present?
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 81f9d23..9213073 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -460,7 +460,7 @@ static void __init
> > memblock_x86_reserve_range_setup_data(void)
> > # define CRASH_ADDR_LOW_MAX (512 << 20)
> > # define CRASH_ADDR_HIGH_MAX (512 << 20)
> > #else
> > -# define CRASH_ADDR_LOW_MAX (896UL << 20)
> > +# define CRASH_ADDR_LOW_MAX (1ULL << 32)
> > # define CRASH_ADDR_HIGH_MAX MAXMEM
> > #endif
> >
> Or the patch looks like:
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..ed0def5 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -459,7 +459,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
> # define CRASH_ADDR_LOW_MAX (512 << 20)
> # define CRASH_ADDR_HIGH_MAX (512 << 20)
> #else
> -# define CRASH_ADDR_LOW_MAX (896UL << 20)
> +# define CRASH_ADDR_LOW_MAX (1ULL << 32)
> # define CRASH_ADDR_HIGH_MAX MAXMEM
> #endif
>
> @@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
> high ? CRASH_ADDR_HIGH_MAX
> : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> + * crashkernel=X reserve below 4G fails? Try MAXMEM
> + */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
>
> which tries 0-4G, then falls back to above 4G
This way looks good to me. I will do some testing with old kexec-tools.
Once testing is done I can take this up again and repost later with a
documentation update. I will also split it into 2 patches: one to drop the
old limitation, another for the fallback.
Thanks
Dave
end of thread, other threads:[~2019-03-22 8:22 UTC]
Thread overview: 18+ messages
[not found] <20190125140823.GC27998@zn.tnic>
[not found] ` <20190131075907.GB19091@dhcp-128-65.nay.redhat.com>
[not found] ` <20190131105732.GC6749@zn.tnic>
[not found] ` <20190131222732.GA946@anatevka>
[not found] ` <20190131234740.GO6749@zn.tnic>
[not found] ` <20190204223016.GB11986@anatevka>
[not found] ` <20190205081552.GG21801@zn.tnic>
[not found] ` <20190206120804.GC10062@dhcp-128-65.nay.redhat.com>
[not found] ` <20190211204816.GB21473@dhcp-128-65.nay.redhat.com>
[not found] ` <20190215102458.GD10433@zn.tnic>
[not found] ` <20190215102458.GD10433-Jj63ApZU6fQ@public.gmane.org>
2019-02-18 1:48 ` [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr Dave Young
[not found] ` <20190218014820.GA10711-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2019-02-20 7:38 ` Pingfan Liu
2019-02-20 8:32 ` Borislav Petkov
2019-02-20 9:41 ` Dave Young
2019-02-20 12:51 ` Pingfan Liu
2019-02-21 17:13 ` Borislav Petkov
2019-02-22 2:11 ` Dave Young
2019-02-22 8:42 ` Joerg Roedel
2019-02-22 13:00 ` Borislav Petkov
2019-02-24 13:25 ` Pingfan Liu
2019-02-25 1:53 ` Dave Young
2019-02-25 9:39 ` Borislav Petkov
2019-02-25 11:00 ` Joerg Roedel
2019-02-25 11:12 ` Dave Young
[not found] ` <20190225111216.GA9276-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2019-02-25 11:30 ` Borislav Petkov
2019-03-01 3:04 ` Pingfan Liu
2019-03-01 3:19 ` Pingfan Liu
2019-03-22 8:22 ` Dave Young