* [PATCH v3 1/3] resource: Add __adjust_resource() for internal use
2013-04-10 17:16 [PATCH v3 0/3] Support memory hot-delete to boot memory Toshi Kani
@ 2013-04-10 17:16 ` Toshi Kani
2013-04-10 17:17 ` [PATCH v3 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
2013-04-10 17:17 ` [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
2 siblings, 0 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-10 17:16 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, rientjes, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu, Toshi Kani
Added __adjust_resource(), which is called by adjust_resource()
internally after the resource_lock is held. There is no interface
change to adjust_resource(). This change allows other functions
to call __adjust_resource() internally while the resource_lock is
held.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
---
kernel/resource.c | 35 ++++++++++++++++++++++-------------
1 file changed, 22 insertions(+), 13 deletions(-)
diff --git a/kernel/resource.c b/kernel/resource.c
index 73f35d4..ae246f9 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -706,24 +706,13 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
write_unlock(&resource_lock);
}
-/**
- * adjust_resource - modify a resource's start and size
- * @res: resource to modify
- * @start: new start value
- * @size: new size
- *
- * Given an existing resource, change its start and size to match the
- * arguments. Returns 0 on success, -EBUSY if it can't fit.
- * Existing children of the resource are assumed to be immutable.
- */
-int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size)
+static int __adjust_resource(struct resource *res, resource_size_t start,
+ resource_size_t size)
{
struct resource *tmp, *parent = res->parent;
resource_size_t end = start + size - 1;
int result = -EBUSY;
- write_lock(&resource_lock);
-
if (!parent)
goto skip;
@@ -751,6 +740,26 @@ skip:
result = 0;
out:
+ return result;
+}
+
+/**
+ * adjust_resource - modify a resource's start and size
+ * @res: resource to modify
+ * @start: new start value
+ * @size: new size
+ *
+ * Given an existing resource, change its start and size to match the
+ * arguments. Returns 0 on success, -EBUSY if it can't fit.
+ * Existing children of the resource are assumed to be immutable.
+ */
+int adjust_resource(struct resource *res, resource_size_t start,
+ resource_size_t size)
+{
+ int result;
+
+ write_lock(&resource_lock);
+ result = __adjust_resource(res, start, size);
write_unlock(&resource_lock);
return result;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-10 17:16 [PATCH v3 0/3] Support memory hot-delete to boot memory Toshi Kani
2013-04-10 17:16 ` [PATCH v3 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
@ 2013-04-10 17:17 ` Toshi Kani
2013-04-10 21:44 ` Andrew Morton
2013-04-10 17:17 ` [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
2 siblings, 1 reply; 13+ messages in thread
From: Toshi Kani @ 2013-04-10 17:17 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, rientjes, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu, Toshi Kani
Added release_mem_region_adjustable(), which releases a requested
region from a currently busy memory resource. This interface
adjusts the matched memory resource accordingly even if the
requested region does not match exactly but still fits into.
This new interface is intended for memory hot-delete. During
bootup, memory resources are inserted from the boot descriptor
table, such as EFI Memory Table and e820. Each memory resource
entry usually covers the whole contigous memory range. Memory
hot-delete request, on the other hand, may target to a particular
range of memory resource, and its size can be much smaller than
the whole contiguous memory. Since the existing release interfaces
like __release_region() require a requested region to be exactly
matched to a resource entry, they do not allow a partial resource
to be released.
This new interface is restrictive (i.e. release under certain
conditions), which is consistent with other release interfaces,
__release_region() and __release_resource(). Additional release
conditions, such as an overlapping region to a resource entry,
can be supported after they are confirmed as valid cases.
There is no change to the existing interfaces since their restriction
is valid for I/O resources.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
include/linux/ioport.h | 4 ++
kernel/resource.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 104 insertions(+)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 85ac9b9b..961d4dc 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -192,6 +192,10 @@ extern struct resource * __request_region(struct resource *,
extern int __check_region(struct resource *, resource_size_t, resource_size_t);
extern void __release_region(struct resource *, resource_size_t,
resource_size_t);
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern int release_mem_region_adjustable(struct resource *, resource_size_t,
+ resource_size_t);
+#endif
static inline int __deprecated check_region(resource_size_t s,
resource_size_t n)
diff --git a/kernel/resource.c b/kernel/resource.c
index ae246f9..08791c8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1021,6 +1021,106 @@ void __release_region(struct resource *parent, resource_size_t start,
}
EXPORT_SYMBOL(__release_region);
+#ifdef CONFIG_MEMORY_HOTPLUG
+/**
+ * release_mem_region_adjustable - release a previously reserved memory region
+ * @parent: parent resource descriptor
+ * @start: resource start address
+ * @size: resource region size
+ *
+ * This interface is intended for memory hot-delete. The requested region
+ * is released from a currently busy memory resource. The requested region
+ * must either match exactly or fit into a single busy resource entry. In
+ * the latter case, the remaining resource is adjusted accordingly.
+ * Existing children of the busy memory resource must be immutable in the
+ * request.
+ *
+ * Note:
+ * - Additional release conditions, such as overlapping region, can be
+ * supported after they are confirmed as valid cases.
+ * - When a busy memory resource gets split into two entries, the code
+ * assumes that all children remain in the lower address entry for
+ * simplicity. Enhance this logic when necessary.
+ */
+int release_mem_region_adjustable(struct resource *parent,
+ resource_size_t start, resource_size_t size)
+{
+ struct resource **p;
+ struct resource *res, *new;
+ resource_size_t end;
+ int ret = -EINVAL;
+
+ end = start + size - 1;
+ if ((start < parent->start) || (end > parent->end))
+ return ret;
+
+ p = &parent->child;
+ write_lock(&resource_lock);
+
+ while ((res = *p)) {
+ if (res->start >= end)
+ break;
+
+ /* look for the next resource if it does not fit into */
+ if (res->start > start || res->end < end) {
+ p = &res->sibling;
+ continue;
+ }
+
+ if (!(res->flags & IORESOURCE_MEM))
+ break;
+
+ if (!(res->flags & IORESOURCE_BUSY)) {
+ p = &res->child;
+ continue;
+ }
+
+ /* found the target resource; let's adjust accordingly */
+ if (res->start == start && res->end == end) {
+ /* free the whole entry */
+ *p = res->sibling;
+ kfree(res);
+ ret = 0;
+ } else if (res->start == start && res->end != end) {
+ /* adjust the start */
+ ret = __adjust_resource(res, end + 1,
+ res->end - end);
+ } else if (res->start != start && res->end == end) {
+ /* adjust the end */
+ ret = __adjust_resource(res, res->start,
+ start - res->start);
+ } else {
+ /* split into two entries */
+ new = kzalloc(sizeof(struct resource), GFP_KERNEL);
+ if (!new) {
+ ret = -ENOMEM;
+ break;
+ }
+ new->name = res->name;
+ new->start = end + 1;
+ new->end = res->end;
+ new->flags = res->flags;
+ new->parent = res->parent;
+ new->sibling = res->sibling;
+ new->child = NULL;
+
+ ret = __adjust_resource(res, res->start,
+ start - res->start);
+ if (ret) {
+ kfree(new);
+ break;
+ }
+ res->sibling = new;
+ }
+
+ break;
+ }
+
+ write_unlock(&resource_lock);
+ return ret;
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
/*
* Managed region resource
*/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-10 17:17 ` [PATCH v3 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
@ 2013-04-10 21:44 ` Andrew Morton
2013-04-10 21:49 ` Toshi Kani
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2013-04-10 21:44 UTC (permalink / raw)
To: Toshi Kani
Cc: linux-mm, linux-kernel, rientjes, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 10 Apr 2013 11:17:00 -0600 Toshi Kani <toshi.kani@hp.com> wrote:
> Added release_mem_region_adjustable(), which releases a requested
> region from a currently busy memory resource. This interface
> adjusts the matched memory resource accordingly even if the
> requested region does not match exactly but still fits into.
>
> This new interface is intended for memory hot-delete. During
> bootup, memory resources are inserted from the boot descriptor
> table, such as EFI Memory Table and e820. Each memory resource
> entry usually covers the whole contigous memory range. Memory
> hot-delete request, on the other hand, may target to a particular
> range of memory resource, and its size can be much smaller than
> the whole contiguous memory. Since the existing release interfaces
> like __release_region() require a requested region to be exactly
> matched to a resource entry, they do not allow a partial resource
> to be released.
>
> This new interface is restrictive (i.e. release under certain
> conditions), which is consistent with other release interfaces,
> __release_region() and __release_resource(). Additional release
> conditions, such as an overlapping region to a resource entry,
> can be supported after they are confirmed as valid cases.
>
> There is no change to the existing interfaces since their restriction
> is valid for I/O resources.
>
> ...
>
> +int release_mem_region_adjustable(struct resource *parent,
> + resource_size_t start, resource_size_t size)
> +{
> + struct resource **p;
> + struct resource *res, *new;
> + resource_size_t end;
> + int ret = -EINVAL;
> +
> + end = start + size - 1;
> + if ((start < parent->start) || (end > parent->end))
> + return ret;
> +
> + p = &parent->child;
> + write_lock(&resource_lock);
> +
> + while ((res = *p)) {
> + if (res->start >= end)
> + break;
> +
> + /* look for the next resource if it does not fit into */
> + if (res->start > start || res->end < end) {
> + p = &res->sibling;
> + continue;
> + }
> +
> + if (!(res->flags & IORESOURCE_MEM))
> + break;
> +
> + if (!(res->flags & IORESOURCE_BUSY)) {
> + p = &res->child;
> + continue;
> + }
> +
> + /* found the target resource; let's adjust accordingly */
> + if (res->start == start && res->end == end) {
> + /* free the whole entry */
> + *p = res->sibling;
> + kfree(res);
> + ret = 0;
> + } else if (res->start == start && res->end != end) {
> + /* adjust the start */
> + ret = __adjust_resource(res, end + 1,
> + res->end - end);
> + } else if (res->start != start && res->end == end) {
> + /* adjust the end */
> + ret = __adjust_resource(res, res->start,
> + start - res->start);
> + } else {
> + /* split into two entries */
> + new = kzalloc(sizeof(struct resource), GFP_KERNEL);
Nope, we can't perform a GFP_KERNEL allocation under write_lock().
Was this code path runtime tested? If no, please try
to find a way to test it. If yes, please see
Documentation/SubmitChecklist section 12 and use that in the future.
I'll switch it to GFP_ATOMIC. Which is horridly lame but the
allocation is small and alternatives are unobvious.
> + if (!new) {
> + ret = -ENOMEM;
> + break;
> + }
> + new->name = res->name;
> + new->start = end + 1;
> + new->end = res->end;
> + new->flags = res->flags;
> + new->parent = res->parent;
> + new->sibling = res->sibling;
> + new->child = NULL;
> +
> + ret = __adjust_resource(res, res->start,
> + start - res->start);
> + if (ret) {
> + kfree(new);
> + break;
> + }
> + res->sibling = new;
> + }
> +
> + break;
> + }
> +
> + write_unlock(&resource_lock);
> + return ret;
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-10 21:44 ` Andrew Morton
@ 2013-04-10 21:49 ` Toshi Kani
2013-04-10 22:08 ` David Rientjes
0 siblings, 1 reply; 13+ messages in thread
From: Toshi Kani @ 2013-04-10 21:49 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, rientjes, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 2013-04-10 at 14:44 -0700, Andrew Morton wrote:
> On Wed, 10 Apr 2013 11:17:00 -0600 Toshi Kani <toshi.kani@hp.com> wrote:
>
> > Added release_mem_region_adjustable(), which releases a requested
> > region from a currently busy memory resource. This interface
> > adjusts the matched memory resource accordingly even if the
> > requested region does not match exactly but still fits into.
> >
> > This new interface is intended for memory hot-delete. During
> > bootup, memory resources are inserted from the boot descriptor
> > table, such as EFI Memory Table and e820. Each memory resource
> > entry usually covers the whole contigous memory range. Memory
> > hot-delete request, on the other hand, may target to a particular
> > range of memory resource, and its size can be much smaller than
> > the whole contiguous memory. Since the existing release interfaces
> > like __release_region() require a requested region to be exactly
> > matched to a resource entry, they do not allow a partial resource
> > to be released.
> >
> > This new interface is restrictive (i.e. release under certain
> > conditions), which is consistent with other release interfaces,
> > __release_region() and __release_resource(). Additional release
> > conditions, such as an overlapping region to a resource entry,
> > can be supported after they are confirmed as valid cases.
> >
> > There is no change to the existing interfaces since their restriction
> > is valid for I/O resources.
> >
> > ...
> >
> > +int release_mem_region_adjustable(struct resource *parent,
> > + resource_size_t start, resource_size_t size)
> > +{
> > + struct resource **p;
> > + struct resource *res, *new;
> > + resource_size_t end;
> > + int ret = -EINVAL;
> > +
> > + end = start + size - 1;
> > + if ((start < parent->start) || (end > parent->end))
> > + return ret;
> > +
> > + p = &parent->child;
> > + write_lock(&resource_lock);
> > +
> > + while ((res = *p)) {
> > + if (res->start >= end)
> > + break;
> > +
> > + /* look for the next resource if it does not fit into */
> > + if (res->start > start || res->end < end) {
> > + p = &res->sibling;
> > + continue;
> > + }
> > +
> > + if (!(res->flags & IORESOURCE_MEM))
> > + break;
> > +
> > + if (!(res->flags & IORESOURCE_BUSY)) {
> > + p = &res->child;
> > + continue;
> > + }
> > +
> > + /* found the target resource; let's adjust accordingly */
> > + if (res->start == start && res->end == end) {
> > + /* free the whole entry */
> > + *p = res->sibling;
> > + kfree(res);
> > + ret = 0;
> > + } else if (res->start == start && res->end != end) {
> > + /* adjust the start */
> > + ret = __adjust_resource(res, end + 1,
> > + res->end - end);
> > + } else if (res->start != start && res->end == end) {
> > + /* adjust the end */
> > + ret = __adjust_resource(res, res->start,
> > + start - res->start);
> > + } else {
> > + /* split into two entries */
> > + new = kzalloc(sizeof(struct resource), GFP_KERNEL);
>
> Nope, we can't perform a GFP_KERNEL allocation under write_lock().
>
> Was this code path runtime tested? If no, please try
> to find a way to test it. If yes, please see
> Documentation/SubmitChecklist section 12 and use that in the future.
Yes, I tested all cases. But I did not test with all the config options
described in the document. I will make sure to test with the options
next time. Thanks a lot for the pointer!
> I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> allocation is small and alternatives are unobvious.
Great! Again, thanks for the update!
-Toshi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-10 21:49 ` Toshi Kani
@ 2013-04-10 22:08 ` David Rientjes
2013-04-10 22:24 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2013-04-10 22:08 UTC (permalink / raw)
To: Toshi Kani
Cc: Andrew Morton, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 10 Apr 2013, Toshi Kani wrote:
> > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > allocation is small and alternatives are unobvious.
>
> Great! Again, thanks for the update!
release_mem_region_adjustable() allocates at most one struct resource, so
why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
resource_lock and then testing whether it's NULL or not when splitting?
It unnecessarily allocates memory when there's no split, but
__remove_pages() shouldn't be a hotpath.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-10 22:08 ` David Rientjes
@ 2013-04-10 22:24 ` Andrew Morton
2013-04-11 16:30 ` Toshi Kani
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2013-04-10 22:24 UTC (permalink / raw)
To: David Rientjes
Cc: Toshi Kani, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> On Wed, 10 Apr 2013, Toshi Kani wrote:
>
> > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > allocation is small and alternatives are unobvious.
> >
> > Great! Again, thanks for the update!
>
> release_mem_region_adjustable() allocates at most one struct resource, so
> why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> resource_lock and then testing whether it's NULL or not when splitting?
> It unnecessarily allocates memory when there's no split, but
> __remove_pages() shouldn't be a hotpath.
yup.
--- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
+++ a/kernel/resource.c
@@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
resource_size_t start, resource_size_t size)
{
struct resource **p;
- struct resource *res, *new;
+ struct resource *res;
+ struct resource *new_res;
resource_size_t end;
int ret = -EINVAL;
@@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
if ((start < parent->start) || (end > parent->end))
return ret;
+ /* The kzalloc() result gets checked later */
+ new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+
p = &parent->child;
write_lock(&resource_lock);
@@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
start - res->start);
} else {
/* split into two entries */
- new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
- if (!new) {
+ if (!new_res) {
ret = -ENOMEM;
break;
}
- new->name = res->name;
- new->start = end + 1;
- new->end = res->end;
- new->flags = res->flags;
- new->parent = res->parent;
- new->sibling = res->sibling;
- new->child = NULL;
+ new_res->name = res->name;
+ new_res->start = end + 1;
+ new_res->end = res->end;
+ new_res->flags = res->flags;
+ new_res->parent = res->parent;
+ new_res->sibling = res->sibling;
+ new_res->child = NULL;
ret = __adjust_resource(res, res->start,
start - res->start);
if (ret) {
- kfree(new);
+ kfree(new_res);
break;
}
- res->sibling = new;
+ res->sibling = new_res;
+ new_res = NULL;
}
break;
}
write_unlock(&resource_lock);
+ kfree(new_res);
return ret;
}
#endif /* CONFIG_MEMORY_HOTPLUG */
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-10 22:24 ` Andrew Morton
@ 2013-04-11 16:30 ` Toshi Kani
2013-04-24 8:42 ` Ram Pai
0 siblings, 1 reply; 13+ messages in thread
From: Toshi Kani @ 2013-04-11 16:30 UTC (permalink / raw)
To: Andrew Morton
Cc: David Rientjes, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 2013-04-10 at 15:24 -0700, Andrew Morton wrote:
> On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
>
> > On Wed, 10 Apr 2013, Toshi Kani wrote:
> >
> > > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > > allocation is small and alternatives are unobvious.
> > >
> > > Great! Again, thanks for the update!
> >
> > release_mem_region_adjustable() allocates at most one struct resource, so
> > why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> > resource_lock and then testing whether it's NULL or not when splitting?
> > It unnecessarily allocates memory when there's no split, but
> > __remove_pages() shouldn't be a hotpath.
>
> yup.
>
> --- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
> +++ a/kernel/resource.c
> @@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
> resource_size_t start, resource_size_t size)
> {
> struct resource **p;
> - struct resource *res, *new;
> + struct resource *res;
> + struct resource *new_res;
> resource_size_t end;
> int ret = -EINVAL;
>
> @@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
> if ((start < parent->start) || (end > parent->end))
> return ret;
>
> + /* The kzalloc() result gets checked later */
> + new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> +
> p = &parent->child;
> write_lock(&resource_lock);
>
> @@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
> start - res->start);
> } else {
> /* split into two entries */
> - new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> - if (!new) {
> + if (!new_res) {
> ret = -ENOMEM;
> break;
> }
> - new->name = res->name;
> - new->start = end + 1;
> - new->end = res->end;
> - new->flags = res->flags;
> - new->parent = res->parent;
> - new->sibling = res->sibling;
> - new->child = NULL;
> + new_res->name = res->name;
> + new_res->start = end + 1;
> + new_res->end = res->end;
> + new_res->flags = res->flags;
> + new_res->parent = res->parent;
> + new_res->sibling = res->sibling;
> + new_res->child = NULL;
>
> ret = __adjust_resource(res, res->start,
> start - res->start);
> if (ret) {
> - kfree(new);
> + kfree(new_res);
> break;
> }
The kfree() in the if-statement above is not necessary since kfree() is
called before the return at the end. That is, the if-statement needs to
be:
if (ret)
break;
With this change, I confirmed that all my test cases passed (with all
the config debug options this time :). With the change:
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Thanks!
-Toshi
> - res->sibling = new;
> + res->sibling = new_res;
> + new_res = NULL;
> }
>
> break;
> }
>
> write_unlock(&resource_lock);
> + kfree(new_res);
> return ret;
> }
> #endif /* CONFIG_MEMORY_HOTPLUG */
> _
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-11 16:30 ` Toshi Kani
@ 2013-04-24 8:42 ` Ram Pai
2013-04-24 14:43 ` Toshi Kani
0 siblings, 1 reply; 13+ messages in thread
From: Ram Pai @ 2013-04-24 8:42 UTC (permalink / raw)
To: Toshi Kani
Cc: Andrew Morton, David Rientjes, linux-mm, linux-kernel, guz.fnst,
tmac, isimatu.yasuaki, wency, tangchen, jiang.liu
On Thu, Apr 11, 2013 at 10:30:02AM -0600, Toshi Kani wrote:
> On Wed, 2013-04-10 at 15:24 -0700, Andrew Morton wrote:
> > On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> >
> > > On Wed, 10 Apr 2013, Toshi Kani wrote:
> > >
> > > > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > > > allocation is small and alternatives are unobvious.
> > > >
> > > > Great! Again, thanks for the update!
> > >
> > > release_mem_region_adjustable() allocates at most one struct resource, so
> > > why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> > > resource_lock and then testing whether it's NULL or not when splitting?
> > > It unnecessarily allocates memory when there's no split, but
> > > __remove_pages() shouldn't be a hotpath.
> >
> > yup.
> >
> > --- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
> > +++ a/kernel/resource.c
> > @@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
> > resource_size_t start, resource_size_t size)
> > {
> > struct resource **p;
> > - struct resource *res, *new;
> > + struct resource *res;
> > + struct resource *new_res;
> > resource_size_t end;
> > int ret = -EINVAL;
> >
> > @@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
> > if ((start < parent->start) || (end > parent->end))
> > return ret;
> >
> > + /* The kzalloc() result gets checked later */
> > + new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > +
> > p = &parent->child;
> > write_lock(&resource_lock);
> >
> > @@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
> > start - res->start);
> > } else {
> > /* split into two entries */
> > - new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> > - if (!new) {
> > + if (!new_res) {
> > ret = -ENOMEM;
> > break;
> > }
> > - new->name = res->name;
> > - new->start = end + 1;
> > - new->end = res->end;
> > - new->flags = res->flags;
> > - new->parent = res->parent;
> > - new->sibling = res->sibling;
> > - new->child = NULL;
> > + new_res->name = res->name;
> > + new_res->start = end + 1;
> > + new_res->end = res->end;
> > + new_res->flags = res->flags;
> > + new_res->parent = res->parent;
> > + new_res->sibling = res->sibling;
> > + new_res->child = NULL;
> >
> > ret = __adjust_resource(res, res->start,
> > start - res->start);
> > if (ret) {
> > - kfree(new);
> > + kfree(new_res);
> > break;
> > }
>
> The kfree() in the if-statement above is not necessary since kfree() is
> called before the return at the end. That is, the if-statement needs to
> be:
> if (ret)
> break;
>
> With this change, I confirmed that all my test cases passed (with all
> the config debug options this time :). With the change:
>
> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
I am not confortable witht the assumption, that when a split takes
place, the children are assumed to be in the lower entry. Probably a
warning to that effect, would help quickly
nail down the problem, if such a case does encounter ?
Otherwise this looks fine. Sorry for the delayed reply. Was out.
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] resource: Add release_mem_region_adjustable()
2013-04-24 8:42 ` Ram Pai
@ 2013-04-24 14:43 ` Toshi Kani
0 siblings, 0 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-24 14:43 UTC (permalink / raw)
To: Ram Pai
Cc: Andrew Morton, David Rientjes, linux-mm, linux-kernel, guz.fnst,
tmac, isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 2013-04-24 at 16:42 +0800, Ram Pai wrote:
> On Thu, Apr 11, 2013 at 10:30:02AM -0600, Toshi Kani wrote:
> > On Wed, 2013-04-10 at 15:24 -0700, Andrew Morton wrote:
> > > On Wed, 10 Apr 2013 15:08:29 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> > >
> > > > On Wed, 10 Apr 2013, Toshi Kani wrote:
> > > >
> > > > > > I'll switch it to GFP_ATOMIC. Which is horridly lame but the
> > > > > > allocation is small and alternatives are unobvious.
> > > > >
> > > > > Great! Again, thanks for the update!
> > > >
> > > > release_mem_region_adjustable() allocates at most one struct resource, so
> > > > why not do kmalloc(sizeof(struct resource), GFP_KERNEL) before taking
> > > > resource_lock and then testing whether it's NULL or not when splitting?
> > > > It unnecessarily allocates memory when there's no split, but
> > > > __remove_pages() shouldn't be a hotpath.
> > >
> > > yup.
> > >
> > > --- a/kernel/resource.c~resource-add-release_mem_region_adjustable-fix-fix
> > > +++ a/kernel/resource.c
> > > @@ -1046,7 +1046,8 @@ int release_mem_region_adjustable(struct
> > > resource_size_t start, resource_size_t size)
> > > {
> > > struct resource **p;
> > > - struct resource *res, *new;
> > > + struct resource *res;
> > > + struct resource *new_res;
> > > resource_size_t end;
> > > int ret = -EINVAL;
> > >
> > > @@ -1054,6 +1055,9 @@ int release_mem_region_adjustable(struct
> > > if ((start < parent->start) || (end > parent->end))
> > > return ret;
> > >
> > > + /* The kzalloc() result gets checked later */
> > > + new_res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > > +
> > > p = &parent->child;
> > > write_lock(&resource_lock);
> > >
> > > @@ -1091,32 +1095,33 @@ int release_mem_region_adjustable(struct
> > > start - res->start);
> > > } else {
> > > /* split into two entries */
> > > - new = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> > > - if (!new) {
> > > + if (!new_res) {
> > > ret = -ENOMEM;
> > > break;
> > > }
> > > - new->name = res->name;
> > > - new->start = end + 1;
> > > - new->end = res->end;
> > > - new->flags = res->flags;
> > > - new->parent = res->parent;
> > > - new->sibling = res->sibling;
> > > - new->child = NULL;
> > > + new_res->name = res->name;
> > > + new_res->start = end + 1;
> > > + new_res->end = res->end;
> > > + new_res->flags = res->flags;
> > > + new_res->parent = res->parent;
> > > + new_res->sibling = res->sibling;
> > > + new_res->child = NULL;
> > >
> > > ret = __adjust_resource(res, res->start,
> > > start - res->start);
> > > if (ret) {
> > > - kfree(new);
> > > + kfree(new_res);
> > > break;
> > > }
> >
> > The kfree() in the if-statement above is not necessary since kfree() is
> > called before the return at the end. That is, the if-statement needs to
> > be:
> > if (ret)
> > break;
> >
> > With this change, I confirmed that all my test cases passed (with all
> > the config debug options this time :). With the change:
> >
> > Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>
> I am not confortable witht the assumption, that when a split takes
> place, the children are assumed to be in the lower entry. Probably a
> warning to that effect, would help quickly
> nail down the problem, if such a case does encounter ?
Yes, __adjust_resource() fails with -EBUSY when such condition happens.
Hence, release_mem_region_adjustable() returns with -EBUSY, and
__remove_pages() emits a warning message per patch 3/3. So, it can be
quickly nailed down as this restriction is documented in the comment as
well.
At this point, the children are only used for Kernel code/data/bss as
follows. Hot-removable memory ranges are located at higher ranges than
them. So, I decided to simplify the implementation for this initial
version. We can always enhance it when needed.
# cat /proc/iomem
:
00100000-defa57ff : System RAM
01000000-0162f8d1 : Kernel code
0162f8d2-01ce52bf : Kernel data
01df1000-01fa5fff : Kernel bss
:
100000000-31fffffff : System RAM
> Otherwise this looks fine. Sorry for the delayed reply. Was out.
>
> Reviewed-by: Ram Pai <linuxram@us.ibm.com>
No problem. Thanks for reviewing!
-Toshi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()
2013-04-10 17:16 [PATCH v3 0/3] Support memory hot-delete to boot memory Toshi Kani
2013-04-10 17:16 ` [PATCH v3 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
2013-04-10 17:17 ` [PATCH v3 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
@ 2013-04-10 17:17 ` Toshi Kani
2013-04-10 22:13 ` David Rientjes
2 siblings, 1 reply; 13+ messages in thread
From: Toshi Kani @ 2013-04-10 17:17 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, rientjes, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu, Toshi Kani
Changed __remove_pages() to call release_mem_region_adjustable().
This allows a requested memory range to be released from
the iomem_resource table even if it does not match exactly to
an resource entry but still fits into. The resource entries
initialized at bootup usually cover the whole contiguous
memory ranges and may not necessarily match with the size of
memory hot-delete requests.
If release_mem_region_adjustable() failed, __remove_pages() emits
a warning message and continues to proceed as it was the case
with release_mem_region(). release_mem_region(), which is defined
to __release_region(), emits a warning message and returns no error
since a void function.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
mm/memory_hotplug.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 57decb2..c916582 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -705,8 +705,10 @@ EXPORT_SYMBOL_GPL(__add_pages);
int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
unsigned long nr_pages)
{
- unsigned long i, ret = 0;
+ unsigned long i;
int sections_to_remove;
+ resource_size_t start, size;
+ int ret = 0;
/*
* We can only remove entire sections
@@ -714,7 +716,12 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
BUG_ON(nr_pages % PAGES_PER_SECTION);
- release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+ start = phys_start_pfn << PAGE_SHIFT;
+ size = nr_pages * PAGE_SIZE;
+ ret = release_mem_region_adjustable(&iomem_resource, start, size);
+ if (ret)
+ pr_warn("Unable to release resource <%016llx-%016llx> (%d)\n",
+ start, start + size - 1, ret);
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()
2013-04-10 17:17 ` [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
@ 2013-04-10 22:13 ` David Rientjes
2013-04-11 20:30 ` Toshi Kani
0 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2013-04-10 22:13 UTC (permalink / raw)
To: Toshi Kani
Cc: akpm, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 10 Apr 2013, Toshi Kani wrote:
> Changed __remove_pages() to call release_mem_region_adjustable().
> This allows a requested memory range to be released from
> the iomem_resource table even if it does not match exactly to
> an resource entry but still fits into. The resource entries
> initialized at bootup usually cover the whole contiguous
> memory ranges and may not necessarily match with the size of
> memory hot-delete requests.
>
> If release_mem_region_adjustable() failed, __remove_pages() emits
> a warning message and continues to proceed as it was the case
> with release_mem_region(). release_mem_region(), which is defined
> to __release_region(), emits a warning message and returns no error
> since a void function.
>
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()
2013-04-10 22:13 ` David Rientjes
@ 2013-04-11 20:30 ` Toshi Kani
0 siblings, 0 replies; 13+ messages in thread
From: Toshi Kani @ 2013-04-11 20:30 UTC (permalink / raw)
To: David Rientjes
Cc: akpm, linux-mm, linux-kernel, linuxram, guz.fnst, tmac,
isimatu.yasuaki, wency, tangchen, jiang.liu
On Wed, 2013-04-10 at 15:13 -0700, David Rientjes wrote:
> On Wed, 10 Apr 2013, Toshi Kani wrote:
>
> > Changed __remove_pages() to call release_mem_region_adjustable().
> > This allows a requested memory range to be released from
> > the iomem_resource table even if it does not match exactly to
> > an resource entry but still fits into. The resource entries
> > initialized at bootup usually cover the whole contiguous
> > memory ranges and may not necessarily match with the size of
> > memory hot-delete requests.
> >
> > If release_mem_region_adjustable() failed, __remove_pages() emits
> > a warning message and continues to proceed as it was the case
> > with release_mem_region(). release_mem_region(), which is defined
> > to __release_region(), emits a warning message and returns no error
> > since a void function.
> >
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> Acked-by: David Rientjes <rientjes@google.com>
Thanks David!
-Toshi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread