* Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
[not found] ` <CAAmzW4MrD75+Prw=fQ=d5uXKgGy3urBwmxnNtoNsw5M1m9xjYQ@mail.gmail.com>
@ 2020-05-04 14:03 ` Eric W. Biederman
2020-05-04 21:59 ` [RFC][PATCH] kexec: Teach indirect pages how to live in high memory Eric W. Biederman
2020-05-06 5:23 ` [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Joonsoo Kim
From: Eric W. Biederman @ 2020-05-04 14:03 UTC (permalink / raw)
To: Joonsoo Kim
Cc: kernel-team, Michal Hocko, Minchan Kim, Aneesh Kumar K . V,
Rik van Riel, Rafael J . Wysocki, LKML, Christian Koenig,
Christoph Hellwig, Linux Memory Management List, Huang Rui,
Kexec Mailing List, Pavel Machek, Johannes Weiner, Joonsoo Kim,
Andrew Morton, Laura Abbott, Mel Gorman, Roman Gushchin,
Vlastimil Babka
I have added in the kexec mailing list.
Looking at the patch we are discussing it appears that the kexec code
could be doing much better in highmem situations today but is not.
Joonsoo Kim <js1304@gmail.com> writes:
> 2020년 5월 1일 (금) 오후 11:06, Eric W. Biederman <ebiederm@xmission.com>님이 작성:
>>
>> js1304@gmail.com writes:
>>
>> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >
>> > Until now, PageHighMem() is used for two different cases. One is to check
>> > if there is a direct mapping for this page or not. The other is to check
>> > the zone of this page, that is, whether it is the highmem type zone or not.
>> >
>> > Now, we have separate functions, PageHighMem() and PageHighMemZone() for
>> > each case. Use the appropriate one.
>> >
>> > Note that there are some rules to determine the proper macro.
>> >
>> > 1. If PageHighMem() is called for checking if the direct mapping exists
>> > or not, use PageHighMem().
>> > 2. If PageHighMem() is used to predict the previous gfp_flags for
>> > this page, use PageHighMemZone(). The zone of the page is related to
>> > the gfp_flags.
>> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
>> > to interact with the system by using this count, use PageHighMemZone().
>> > This counter is usually used to calculate the available memory for a
>> > kernel allocation, and pages in the highmem zone cannot be available
>> > for a kernel allocation.
>> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
>> > is just a copy of the previous PageHighMem() implementation and won't
>> > be changed.
>> >
>> > I apply the rule #2 for this patch.
>>
>> Hmm.
>>
>> What happened to the notion of deprecating and reducing the usage of
>> highmem? I know that we have some embedded architectures where it is
>> still important but this feels like it flies in the face of that.
>
> AFAIK, deprecating highmem requires some more time and, before then,
> we need to support it.
But it at least makes sense to look at what we are doing with highmem
and ask if it makes sense.
>> This part of kexec would be much more maintainable if it had a proper
>> mm layer helper that tested to see if the page matched the passed in
>> gfp flags. That way the mm layer could keep changing and doing weird
>> gyrations and this code would not care.
>
> Good idea! I will do it.
>
>>
>> What would be really helpful is if there was a straightforward way to
>> allocate memory whose physical address fits in the native word size.
>>
>>
>> All I know for certain about this patch is that it takes a piece of code
>> that looked like it made sense, and transforms it into something I can
>> not easily verify, and can not maintain.
>
> Although I decided to make a helper as you described above, I don't
> understand why you think that the new code isn't maintainable. It is just
> the same thing with a different name. Could you elaborate more on why
> you think so?
Because the current code is already wrong. It does not handle
the general case of what it claims to handle. When the only distinction
that needs to be drawn is highmem or not highmem that is likely fine.
But now you are making it possible to draw more distinctions. At which
point I have no idea which distinction needs to be drawn.
The code and the logic are about 20 years old. When it was written I
don't recall us taking NUMA seriously, and the kernel only had 3 zones
as I recall (DMA, aka the now deprecated GFP_DMA, NORMAL, and HIGH).
The code attempts to work around limitations of those old zones and play
nice in a highmem world by allocating HIGH memory and not using
it if the memory was above 4G (on 32bit).
Looking at it now, the kernel has GFP_DMA32, so on 32bit with highmem we
should probably be using that when allocating memory.
Further, in dealing with this memory management situation there are only
two situations in which we call kimage_alloc_page.
For an indirect page which must have a valid page_address(page).
We could probably relax that if we cared to.
For a general kexec page to store the next kernel in until we switch.
The general pages can be in high memory.
In a highmem world all of those pages should be below 32bit.
Given that we fundamentally have two situations my sense is that we
should just refactor the code so that we never have to deal with:
/* The old page I have found cannot be a
* destination page, so return it if it's
* gfp_flags honor the ones passed in.
*/
if (!(gfp_mask & __GFP_HIGHMEM) &&
PageHighMem(old_page)) {
kimage_free_pages(old_page);
continue;
}
Either we teach kimage_add_entry how to work with high memory pages
(still 32bit accessible) or we teach kimage_alloc_page to notice it is
an indirect page allocation and to always skip trying to reuse the page
it found in that case.
That way the code does not need to know about forever changing mm internals.
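The second option can be sketched as a small userspace model. KIMAGE_NO_DEST mirrors the kernel's convention of passing "no fixed destination" for indirect page allocations; the helper name and everything else here are illustrative, not the real kernel code:

```c
#include <stdbool.h>

#define KIMAGE_NO_DEST (-1UL)

/*
 * Model of option two: kimage_alloc_page() notices it is performing an
 * indirect page allocation (destination == KIMAGE_NO_DEST) and always
 * skips trying to reuse the page it found, so the gfp-vs-zone check on
 * old_page is no longer needed anywhere.
 */
static bool may_swap_with_existing_page(unsigned long destination)
{
	if (destination == KIMAGE_NO_DEST)
		return false;	/* indirect page: keep the fresh page */
	return true;		/* source page: swapping is still fine */
}
```

With a predicate like this the reuse/swap path becomes unconditional for source pages and unreachable for indirect pages, which is what lets the highmem check disappear.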
We should probably investigate GFP_DMA32 at the same time, and switch to
that for 32bit rather than continuing to use GFP_HIGHUSER.
Eric
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
2020-05-04 14:03 ` [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Eric W. Biederman
@ 2020-05-04 21:59 ` Eric W. Biederman
2020-05-05 17:44 ` Hari Bathini
2020-05-06 5:23 ` [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Joonsoo Kim
From: Eric W. Biederman @ 2020-05-04 21:59 UTC (permalink / raw)
To: Joonsoo Kim
Cc: kernel-team, Michal Hocko, Minchan Kim, Aneesh Kumar K . V,
Rik van Riel, Rafael J . Wysocki, LKML, Christian Koenig,
Christoph Hellwig, Linux Memory Management List, Huang Rui,
Kexec Mailing List, Pavel Machek, Johannes Weiner, Joonsoo Kim,
Andrew Morton, Laura Abbott, Mel Gorman, Roman Gushchin,
Vlastimil Babka
Recently a patch was proposed to kimage_alloc_page to slightly alter
the logic of how pages allocated with incompatible flags were
detected. The logic was being altered because the semantics of the
page allocator were changing yet again.
Looking at that case I realized that there is no reason for it to even
exist. Either the indirect page allocations and the source page
allocations could be separated out, or I could do as I am doing now
and simply teach the indirect pages to live in high memory.
This patch replaces pointers of type kimage_entry_t * with a new type,
kimage_entry_pos_t. This new type holds the physical address of the
indirect page and the offset within that page of the next indirect
entry to write. A special constant KIMAGE_ENTRY_POS_INVALID is added
that kimage_entry_pos_t variables that don't currently have a valid
value may be set to.
Two new functions kimage_read_entry and kimage_write_entry have been
provided to write entries in a way that works if they live in high
memory.
The now unnecessary checks to see if a destination entry is non-zero
and to increment it if so have been removed. For safety, new indirect
pages are now cleared so we have a guarantee everything that has not
been used yet is zero. Along with this writing an extra trailing 0
entry has been removed, as it is known all trailing entries are now 0.
With highmem support implemented for indirect pages
kimage_alloc_page has been updated to always allocate
GFP_HIGHUSER pages, and handling of pages with different
gfp flags has been removed.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
I have not done more than compile test this but I think this will remove
that tricky case in the kexec highmem support.
Any comments? Does anyone have a 32bit highmem system where they can
test this code? I can probably do something with a 32bit x86 kernel
but it has been a few days.
Does anyone know how we can more effectively allocate memory below
whatever maximum limit kexec supports? Typically below
4G on 32bit and below 2^64 on 64bit.
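As a side note, the kimage_entry_pos_t encoding can be modeled in plain userspace C: the high bits of a position carry the pfn of the indirect page and the low bits carry the index of the next entry to write, which is why entry_pos++ simply advances one slot. The page-shift value and sample addresses below are illustrative:

```c
/*
 * Userspace model of the kimage_entry_pos_t encoding.  The entry index
 * lives in the low bits and can never overflow into the pfn bits,
 * because a page holds at most PAGE_SIZE / sizeof(kimage_entry_t)
 * entries (512 on a 4K-page 64bit system).
 */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK  (~(MODEL_PAGE_SIZE - 1))

typedef unsigned long model_pos_t;

static unsigned long pos_to_pfn(model_pos_t pos)
{
	return pos >> MODEL_PAGE_SHIFT;	/* which indirect page */
}

static unsigned long pos_to_index(model_pos_t pos)
{
	return pos & ~MODEL_PAGE_MASK;	/* which entry within that page */
}
```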
Eric
include/linux/kexec.h | 5 +-
kernel/kexec_core.c | 119 +++++++++++++++++++++++++-----------------
2 files changed, 73 insertions(+), 51 deletions(-)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2e43a4..6d3f6f4cb926 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -69,6 +69,8 @@
*/
typedef unsigned long kimage_entry_t;
+typedef unsigned long kimage_entry_pos_t;
+#define KIMAGE_ENTRY_POS_INVALID ((kimage_entry_pos_t)-2)
struct kexec_segment {
/*
@@ -243,8 +245,7 @@ int kexec_elf_probe(const char *buf, unsigned long len);
#endif
struct kimage {
kimage_entry_t head;
- kimage_entry_t *entry;
- kimage_entry_t *last_entry;
+ kimage_entry_pos_t entry_pos;
unsigned long start;
struct page *control_code_page;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..45862fda9e60 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -142,7 +142,6 @@ EXPORT_SYMBOL_GPL(kexec_crash_loaded);
#define PAGE_COUNT(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
static struct page *kimage_alloc_page(struct kimage *image,
- gfp_t gfp_mask,
unsigned long dest);
int sanity_check_segment_list(struct kimage *image)
@@ -261,8 +260,7 @@ struct kimage *do_kimage_alloc_init(void)
return NULL;
image->head = 0;
- image->entry = &image->head;
- image->last_entry = &image->head;
+ image->entry_pos = KIMAGE_ENTRY_POS_INVALID;
image->control_page = ~0; /* By default this does not apply */
image->type = KEXEC_TYPE_DEFAULT;
@@ -531,28 +529,56 @@ int kimage_crash_copy_vmcoreinfo(struct kimage *image)
return 0;
}
-static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
+static kimage_entry_t kimage_read_entry(kimage_entry_pos_t pos)
{
- if (*image->entry != 0)
- image->entry++;
+ kimage_entry_t *arr, entry;
+ struct page *page;
+ unsigned long off;
+
+ page = boot_pfn_to_page(pos >> PAGE_SHIFT);
+ off = pos & ~PAGE_MASK;
+ arr = kmap_atomic(page);
+ entry = arr[off];
+ kunmap_atomic(arr);
+
+ return entry;
+}
- if (image->entry == image->last_entry) {
- kimage_entry_t *ind_page;
+static void kimage_write_entry(kimage_entry_pos_t pos, kimage_entry_t entry)
+{
+ kimage_entry_t *arr;
+ struct page *page;
+ unsigned long off;
+
+ page = boot_pfn_to_page(pos >> PAGE_SHIFT);
+ off = pos & ~PAGE_MASK;
+ arr = kmap_atomic(page);
+ arr[off] = entry;
+ kunmap_atomic(arr);
+}
+
+#define LAST_KIMAGE_ENTRY ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1)
+static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
+{
+ if ((image->entry_pos == KIMAGE_ENTRY_POS_INVALID) ||
+ ((image->entry_pos & ~PAGE_MASK) == LAST_KIMAGE_ENTRY)) {
+ unsigned long ind_addr;
struct page *page;
- page = kimage_alloc_page(image, GFP_KERNEL, KIMAGE_NO_DEST);
+ page = kimage_alloc_page(image, KIMAGE_NO_DEST);
if (!page)
return -ENOMEM;
- ind_page = page_address(page);
- *image->entry = virt_to_boot_phys(ind_page) | IND_INDIRECTION;
- image->entry = ind_page;
- image->last_entry = ind_page +
- ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
+ ind_addr = page_to_boot_pfn(page) << PAGE_SHIFT;
+ kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
+
+ clear_highpage(page);
+
+ image->entry_pos = ind_addr;
}
- *image->entry = entry;
- image->entry++;
- *image->entry = 0;
+
+ kimage_write_entry(image->entry_pos, entry);
+ image->entry_pos++;
return 0;
}
@@ -597,16 +623,14 @@ int __weak machine_kexec_post_load(struct kimage *image)
void kimage_terminate(struct kimage *image)
{
- if (*image->entry != 0)
- image->entry++;
-
- *image->entry = IND_DONE;
+ kimage_write_entry(image->entry_pos, IND_DONE);
}
-#define for_each_kimage_entry(image, ptr, entry) \
- for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
- ptr = (entry & IND_INDIRECTION) ? \
- boot_phys_to_virt((entry & PAGE_MASK)) : ptr + 1)
+#define for_each_kimage_entry(image, pos, entry) \
+ for (entry = image->head, pos = KIMAGE_ENTRY_POS_INVALID; \
+ entry && !(entry & IND_DONE); \
+ pos = ((entry & IND_INDIRECTION) ? (entry & PAGE_MASK) : pos + 1), \
+ entry = kimage_read_entry(pos))
static void kimage_free_entry(kimage_entry_t entry)
{
@@ -618,8 +642,8 @@ static void kimage_free_entry(kimage_entry_t entry)
void kimage_free(struct kimage *image)
{
- kimage_entry_t *ptr, entry;
- kimage_entry_t ind = 0;
+ kimage_entry_t entry, ind = 0;
+ kimage_entry_pos_t pos;
if (!image)
return;
@@ -630,7 +654,7 @@ void kimage_free(struct kimage *image)
}
kimage_free_extra_pages(image);
- for_each_kimage_entry(image, ptr, entry) {
+ for_each_kimage_entry(image, pos, entry) {
if (entry & IND_INDIRECTION) {
/* Free the previous indirection page */
if (ind & IND_INDIRECTION)
@@ -662,27 +686,27 @@ void kimage_free(struct kimage *image)
kfree(image);
}
-static kimage_entry_t *kimage_dst_used(struct kimage *image,
- unsigned long page)
+static kimage_entry_pos_t kimage_dst_used(struct kimage *image,
+ unsigned long page)
{
- kimage_entry_t *ptr, entry;
unsigned long destination = 0;
+ kimage_entry_pos_t pos;
+ kimage_entry_t entry;
- for_each_kimage_entry(image, ptr, entry) {
+ for_each_kimage_entry(image, pos, entry) {
if (entry & IND_DESTINATION)
destination = entry & PAGE_MASK;
else if (entry & IND_SOURCE) {
if (page == destination)
- return ptr;
+ return pos;
destination += PAGE_SIZE;
}
}
- return NULL;
+ return KIMAGE_ENTRY_POS_INVALID;
}
static struct page *kimage_alloc_page(struct kimage *image,
- gfp_t gfp_mask,
unsigned long destination)
{
/*
@@ -719,10 +743,10 @@ static struct page *kimage_alloc_page(struct kimage *image,
}
page = NULL;
while (1) {
- kimage_entry_t *old;
+ kimage_entry_pos_t pos;
/* Allocate a page, if we run out of memory give up */
- page = kimage_alloc_pages(gfp_mask, 0);
+ page = kimage_alloc_pages(GFP_HIGHUSER, 0);
if (!page)
return NULL;
/* If the page cannot be used file it away */
@@ -747,26 +771,23 @@ static struct page *kimage_alloc_page(struct kimage *image,
* See if there is already a source page for this
* destination page. And if so swap the source pages.
*/
- old = kimage_dst_used(image, addr);
- if (old) {
+ pos = kimage_dst_used(image, addr);
+ if (pos != KIMAGE_ENTRY_POS_INVALID) {
/* If so move it */
+ kimage_entry_t old, replacement;
unsigned long old_addr;
struct page *old_page;
- old_addr = *old & PAGE_MASK;
+ old = kimage_read_entry(pos);
+ old_addr = old & PAGE_MASK;
old_page = boot_pfn_to_page(old_addr >> PAGE_SHIFT);
copy_highpage(page, old_page);
- *old = addr | (*old & ~PAGE_MASK);
+ replacement = addr | (old & ~PAGE_MASK);
+ kimage_write_entry(pos, replacement);
/* The old page I have found cannot be a
- * destination page, so return it if it's
- * gfp_flags honor the ones passed in.
+ * destination page, so return it.
*/
- if (!(gfp_mask & __GFP_HIGHMEM) &&
- PageHighMem(old_page)) {
- kimage_free_pages(old_page);
- continue;
- }
addr = old_addr;
page = old_page;
break;
@@ -805,7 +826,7 @@ static int kimage_load_normal_segment(struct kimage *image,
char *ptr;
size_t uchunk, mchunk;
- page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
+ page = kimage_alloc_page(image, maddr);
if (!page) {
result = -ENOMEM;
goto out;
--
2.25.0
* Re: [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
2020-05-04 21:59 ` [RFC][PATCH] kexec: Teach indirect pages how to live in high memory Eric W. Biederman
@ 2020-05-05 17:44 ` Hari Bathini
2020-05-05 18:39 ` Eric W. Biederman
From: Hari Bathini @ 2020-05-05 17:44 UTC (permalink / raw)
To: Eric W. Biederman, Joonsoo Kim
Cc: Michal Hocko, Andrew Morton, Johannes Weiner, Rik van Riel,
Aneesh Kumar K . V, Minchan Kim, Rafael J . Wysocki, LKML,
Roman Gushchin, Christoph Hellwig, kernel-team, Huang Rui,
Linux Memory Management List, Pavel Machek, Kexec Mailing List,
Joonsoo Kim, Laura Abbott, Mel Gorman, Christian Koenig,
Vlastimil Babka
On 05/05/20 3:29 am, Eric W. Biederman wrote:
>
> Recently a patch was proposed to kimage_alloc_page to slightly alter
> the logic of how pages allocated with incompatible flags were
> detected. The logic was being altered because the semantics of the
> page allocator were changing yet again.
>
> Looking at that case I realized that there is no reason for it to even
> exist. Either the indirect page allocations and the source page
> allocations could be separated out, or I could do as I am doing now
> and simply teach the indirect pages to live in high memory.
>
> This patch replaces pointers of type kimage_entry_t * with a new type,
> kimage_entry_pos_t. This new type holds the physical address of the
> indirect page and the offset within that page of the next indirect
> entry to write. A special constant KIMAGE_ENTRY_POS_INVALID is added
> that kimage_entry_pos_t variables that don't currently have a valid
> value may be set to.
>
> Two new functions kimage_read_entry and kimage_write_entry have been
> provided to write entries in a way that works if they live in high
> memory.
>
> The now unnecessary checks to see if a destination entry is non-zero
> and to increment it if so have been removed. For safety, new indirect
> pages are now cleared so we have a guarantee everything that has not
> been used yet is zero. Along with this writing an extra trailing 0
> entry has been removed, as it is known all trailing entries are now 0.
>
> With highmem support implemented for indirect pages
> kimage_alloc_page has been updated to always allocate
> GFP_HIGHUSER pages, and handling of pages with different
> gfp flags has been removed.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric, the patch failed with a data access exception on ppc64. Using the
below patch on top got me going...
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 45862fd..bef52f1 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -570,7 +570,12 @@ static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
return -ENOMEM;
ind_addr = page_to_boot_pfn(page) << PAGE_SHIFT;
- kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
+
+ /* If it is the first entry, handle it here */
+ if (!image->head)
+ image->head = ind_addr | IND_INDIRECTION;
+ else
+ kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
clear_highpage(page);
@@ -623,7 +628,11 @@ int __weak machine_kexec_post_load(struct kimage *image)
void kimage_terminate(struct kimage *image)
{
- kimage_write_entry(image->entry_pos, IND_DONE);
+ /* This could be the only entry in case of kdump */
+ if (!image->head)
+ image->head = IND_DONE;
+ else
+ kimage_write_entry(image->entry_pos, IND_DONE);
}
#define for_each_kimage_entry(image, pos, entry) \
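A way to see why the guard is needed: image->entry_pos starts out as KIMAGE_ENTRY_POS_INVALID, so when the first indirect page is linked in there is no valid position to write through yet; the only place the link can go is image->head itself. A simplified userspace sketch of the guarded write (field and constant names follow the patch; the fake slot stands in for kimage_write_entry()):

```c
#define MODEL_POS_INVALID     ((unsigned long)-2)
#define MODEL_IND_INDIRECTION 0x2UL

struct model_kimage {
	unsigned long head;	 /* the first entry lives in the struct */
	unsigned long entry_pos; /* where the next entry will be written */
};

/* Link a new indirect page in, guarding the very first entry. */
static void model_link_indirect_page(struct model_kimage *image,
				     unsigned long ind_addr,
				     unsigned long *slot)
{
	unsigned long entry = ind_addr | MODEL_IND_INDIRECTION;

	if (!image->head)
		image->head = entry;	/* first entry: no page to write to */
	else
		*slot = entry;		/* stands in for kimage_write_entry() */
	image->entry_pos = ind_addr;	/* future entries go in the new page */
}

/* Returns 1 when the model behaves as the fix intends. */
static int model_selftest(void)
{
	struct model_kimage image = { 0, MODEL_POS_INVALID };
	unsigned long slot = 0;

	model_link_indirect_page(&image, 0x1000UL, &slot);
	if (image.head != (0x1000UL | MODEL_IND_INDIRECTION) || slot != 0)
		return 0;
	model_link_indirect_page(&image, 0x2000UL, &slot);
	if (slot != (0x2000UL | MODEL_IND_INDIRECTION))
		return 0;
	return 1;
}
```

kimage_terminate() needs the same guard for the kdump case, where IND_DONE can be the only entry, as the second hunk of the fix shows.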
Thanks
Hari
* Re: [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
2020-05-05 17:44 ` Hari Bathini
@ 2020-05-05 18:39 ` Eric W. Biederman
2020-10-09 1:35 ` Joonsoo Kim
From: Eric W. Biederman @ 2020-05-05 18:39 UTC (permalink / raw)
To: Hari Bathini
Cc: Michal Hocko, Andrew Morton, Joonsoo Kim, Rik van Riel,
Aneesh Kumar K . V, Minchan Kim, Rafael J . Wysocki, LKML,
Roman Gushchin, Christoph Hellwig, kernel-team, Huang Rui,
Johannes Weiner, Linux Memory Management List, Pavel Machek,
Kexec Mailing List, Joonsoo Kim, Laura Abbott, Mel Gorman,
Christian Koenig, Vlastimil Babka
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 05/05/20 3:29 am, Eric W. Biederman wrote:
>>
>> Recently a patch was proposed to kimage_alloc_page to slightly alter
>> the logic of how pages allocated with incompatible flags were
>> detected. The logic was being altered because the semantics of the
>> page allocator were changing yet again.
>>
>> Looking at that case I realized that there is no reason for it to even
>> exist. Either the indirect page allocations and the source page
>> allocations could be separated out, or I could do as I am doing now
>> and simply teach the indirect pages to live in high memory.
>>
>> This patch replaces pointers of type kimage_entry_t * with a new type,
>> kimage_entry_pos_t. This new type holds the physical address of the
>> indirect page and the offset within that page of the next indirect
>> entry to write. A special constant KIMAGE_ENTRY_POS_INVALID is added
>> that kimage_entry_pos_t variables that don't currently have a valid
>> value may be set to.
>>
>> Two new functions kimage_read_entry and kimage_write_entry have been
>> provided to write entries in a way that works if they live in high
>> memory.
>>
>> The now unnecessary checks to see if a destination entry is non-zero
>> and to increment it if so have been removed. For safety, new indirect
>> pages are now cleared so we have a guarantee everything that has not
>> been used yet is zero. Along with this writing an extra trailing 0
>> entry has been removed, as it is known all trailing entries are now 0.
>>
>> With highmem support implemented for indirect pages
>> kimage_alloc_page has been updated to always allocate
>> GFP_HIGHUSER pages, and handling of pages with different
>> gfp flags has been removed.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Eric, the patch failed with a data access exception on ppc64. Using the
> below patch on top got me going...
Doh! Somehow I thought I had put that logic or something equivalent
into kimage_write_entry and it appears I did not. I will see if I can
respin the patch.
Thank you very much for testing.
Eric
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 45862fd..bef52f1 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -570,7 +570,12 @@ static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
> return -ENOMEM;
>
> ind_addr = page_to_boot_pfn(page) << PAGE_SHIFT;
> - kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
> +
> + /* If it is the first entry, handle it here */
> + if (!image->head)
> + image->head = ind_addr | IND_INDIRECTION;
> + else
> + kimage_write_entry(image->entry_pos, ind_addr | IND_INDIRECTION);
>
> clear_highpage(page);
>
> @@ -623,7 +628,11 @@ int __weak machine_kexec_post_load(struct kimage *image)
>
> void kimage_terminate(struct kimage *image)
> {
> - kimage_write_entry(image->entry_pos, IND_DONE);
> + /* This could be the only entry in case of kdump */
> + if (!image->head)
> + image->head = IND_DONE;
> + else
> + kimage_write_entry(image->entry_pos, IND_DONE);
> }
>
> #define for_each_kimage_entry(image, pos, entry) \
>
>
> Thanks
> Hari
* Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
2020-05-04 14:03 ` [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case Eric W. Biederman
2020-05-04 21:59 ` [RFC][PATCH] kexec: Teach indirect pages how to live in high memory Eric W. Biederman
@ 2020-05-06 5:23 ` Joonsoo Kim
From: Joonsoo Kim @ 2020-05-06 5:23 UTC (permalink / raw)
To: Eric W. Biederman
Cc: kernel-team, Michal Hocko, Minchan Kim, Aneesh Kumar K . V,
Rik van Riel, Rafael J . Wysocki, LKML, Christian Koenig,
Christoph Hellwig, Linux Memory Management List, Huang Rui,
Kexec Mailing List, Pavel Machek, Johannes Weiner, Andrew Morton,
Laura Abbott, Mel Gorman, Roman Gushchin, Vlastimil Babka
On Mon, May 04, 2020 at 09:03:56AM -0500, Eric W. Biederman wrote:
>
> I have added in the kexec mailing list.
>
> Looking at the patch we are discussing it appears that the kexec code
> could be doing much better in highmem situations today but is not.
Sound great!
>
>
> Joonsoo Kim <js1304@gmail.com> writes:
>
> > 2020년 5월 1일 (금) 오후 11:06, Eric W. Biederman <ebiederm@xmission.com>님이 작성:
> >>
> >> js1304@gmail.com writes:
> >>
> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> >
> >> > Until now, PageHighMem() is used for two different cases. One is to check
> >> > if there is a direct mapping for this page or not. The other is to check
> >> > the zone of this page, that is, whether it is the highmem type zone or not.
> >> >
> >> > Now, we have separate functions, PageHighMem() and PageHighMemZone() for
> >> > each case. Use the appropriate one.
> >> >
> >> > Note that there are some rules to determine the proper macro.
> >> >
> >> > 1. If PageHighMem() is called for checking if the direct mapping exists
> >> > or not, use PageHighMem().
> >> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> >> > this page, use PageHighMemZone(). The zone of the page is related to
> >> > the gfp_flags.
> >> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
> >> > to interact with the system by using this count, use PageHighMemZone().
> >> > This counter is usually used to calculate the available memory for a
> >> > kernel allocation, and pages in the highmem zone cannot be available
> >> > for a kernel allocation.
> >> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> >> > is just a copy of the previous PageHighMem() implementation and won't
> >> > be changed.
> >> >
> >> > I apply the rule #2 for this patch.
> >>
> >> Hmm.
> >>
> >> What happened to the notion of deprecating and reducing the usage of
> >> highmem? I know that we have some embedded architectures where it is
> >> still important but this feels like it flies in the face of that.
> >
> > AFAIK, deprecating highmem requires some more time and, before then,
> > we need to support it.
>
> But it at least makes sense to look at what we are doing with highmem
> and ask if it makes sense.
>
> >> This part of kexec would be much more maintainable if it had a proper
> >> mm layer helper that tested to see if the page matched the passed in
> >> gfp flags. That way the mm layer could keep changing and doing weird
> >> gyrations and this code would not care.
> >
> > Good idea! I will do it.
> >
> >>
> >> What would be really helpful is if there was a straightforward way to
> >> allocate memory whose physical address fits in the native word size.
> >>
> >>
> >> All I know for certain about this patch is that it takes a piece of code
> >> that looked like it made sense, and transforms it into something I can
> >> not easily verify, and can not maintain.
> >
> > Although I decided to make a helper as you described above, I don't
> > understand why you think that the new code isn't maintainable. It is just
> > the same thing with a different name. Could you elaborate more on why
> > you think so?
>
> Because the current code is already wrong. It does not handle
> the general case of what it claims to handle. When the only distinction
> that needs to be drawn is highmem or not highmem that is likely fine.
> But now you are making it possible to draw more distinctions. At which
> point I have no idea which distinction needs to be drawn.
>
>
> The code and the logic are about 20 years old. When it was written I
> don't recall us taking NUMA seriously, and the kernel only had 3 zones
> as I recall (DMA, aka the now deprecated GFP_DMA, NORMAL, and HIGH).
>
> The code attempts to work around limitations of those old zones and play
> nice in a highmem world by allocating HIGH memory and not using
> it if the memory was above 4G (on 32bit).
>
> Looking at it now, the kernel has GFP_DMA32, so on 32bit with highmem we
> should probably be using that when allocating memory.
>
From a quick investigation, unfortunately, ZONE_DMA32 isn't available on
x86 32bit now, so using GFP_DMA32 to allocate memory below 4G would not
work. Enabling ZONE_DMA32 on x86 32bit would not be simple, so, IMHO, it
would be better to leave the code as it is.
>
>
> Further, in dealing with this memory management situation there are only
> two situations in which we call kimage_alloc_page.
>
> For an indirect page which must have a valid page_address(page).
> We could probably relax that if we cared to.
>
> For a general kexec page to store the next kernel in until we switch.
> The general pages can be in high memory.
>
> In a highmem world all of those pages should be below 32bit.
>
>
>
> Given that we fundamentally have two situations my sense is that we
> should just refactor the code so that we never have to deal with:
>
>
> /* The old page I have found cannot be a
> * destination page, so return it if it's
> * gfp_flags honor the ones passed in.
> */
> if (!(gfp_mask & __GFP_HIGHMEM) &&
> PageHighMem(old_page)) {
> kimage_free_pages(old_page);
> continue;
> }
>
> Either we teach kimage_add_entry how to work with high memory pages
> (still 32bit accessible) or we teach kimage_alloc_page to notice it is
> an indirect page allocation and to always skip trying to reuse the page
> it found in that case.
>
> That way the code does not need to know about forever changing mm internals.
Nice! I have already seen your patch and found that the above two lines
related to HIGHMEM are removed. Thanks for your help.
Thanks.
* Re: [RFC][PATCH] kexec: Teach indirect pages how to live in high memory
2020-05-05 18:39 ` Eric W. Biederman
@ 2020-10-09 1:35 ` Joonsoo Kim
From: Joonsoo Kim @ 2020-10-09 1:35 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Michal Hocko, Johannes Weiner, Rik van Riel, Vlastimil Babka,
Aneesh Kumar K . V, Minchan Kim, Rafael J . Wysocki, LKML,
Roman Gushchin, Christoph Hellwig, kernel-team, Huang Rui,
Linux Memory Management List, Pavel Machek, Kexec Mailing List,
Andrew Morton, Laura Abbott, Mel Gorman, Christian Koenig,
Hari Bathini
On Tue, May 05, 2020 at 01:39:16PM -0500, Eric W. Biederman wrote:
> Hari Bathini <hbathini@linux.ibm.com> writes:
>
> > On 05/05/20 3:29 am, Eric W. Biederman wrote:
> >>
> >> Recently a patch was proposed to kimage_alloc_page to slightly alter
> >> the logic of how pages allocated with incompatible flags were
> >> detected. The logic was being altered because the semantics of the
> >> page allocator were changing yet again.
> >>
> >> Looking at that case I realized that there is no reason for it to even
> >> exist. Either the indirect page allocations and the source page
> >> allocations could be separated out, or I could do as I am doing now
> >> and simply teach the indirect pages to live in high memory.
> >>
> >> This patch replaces pointers of type kimage_entry_t * with a new type,
> >> kimage_entry_pos_t. This new type holds the physical address of the
> >> indirect page and the offset within that page of the next indirect
> >> entry to write. A special constant KIMAGE_ENTRY_POS_INVALID is added
> >> that kimage_entry_pos_t variables that don't currently have a valid
> >> value may be set to.
> >>
> >> Two new functions kimage_read_entry and kimage_write_entry have been
> >> provided to write entries in a way that works if they live in high
> >> memory.
> >>
> >> The now unnecessary checks to see if a destination entry is non-zero
> >> and to increment it if so have been removed. For safety, new indirect
> >> pages are now cleared so we have a guarantee everything that has not
> >> been used yet is zero. Along with this writing an extra trailing 0
> >> entry has been removed, as it is known all trailing entries are now 0.
> >>
> >> With highmem support implemented for indirect pages
> >> kimage_alloc_page has been updated to always allocate
> >> GFP_HIGHUSER pages, and handling of pages with different
> >> gfp flags has been removed.
> >>
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >
> > Eric, the patch failed with a data access exception on ppc64. Using the
> > below patch on top got me going...
>
> Doh! Somehow I thought I had put that logic or something equivalent
> into kimage_write_entry and it appears I did not. I will see if I can
> respin the patch.
>
> Thank you very much for testing.
Hello, Eric.
It seems that this patch hasn't been upstreamed.
Could you respin the patch?
I've tested this one on x86_32 (highmem enabled) and it works well.
Thanks.