* [PATCH 1/3] mm: correct return value of migrate_pages()
[not found] <Yes>
@ 2012-07-16 16:14 ` Joonsoo Kim
2012-07-16 16:14 ` [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
` (4 more replies)
2012-07-17 12:33 ` [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
` (6 subsequent siblings)
7 siblings, 5 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-16 16:14 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter
migrate_pages() should return the number of pages not migrated or an error code.
When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
initialising nr_failed, so nr_failed is over-counted.
This patch corrects it by initialising nr_failed in the outer loop.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..294d52a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
for(pass = 0; pass < 10 && retry; pass++) {
retry = 0;
+ nr_failed = 0;
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();
--
1.7.9.5
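For context, here is a simplified sketch of the loop being patched, paraphrased
from mm/migrate.c of that era rather than quoted verbatim (the -ENOMEM path and
cleanup are omitted). retry is reset on every pass, but nr_failed previously was
not, so counts accumulated across passes:

	int retry = 1;
	int nr_failed = 0;	/* previously initialised only here, once */
	int pass, rc;
	struct page *page, *page2;

	for (pass = 0; pass < 10 && retry; pass++) {
		retry = 0;
		nr_failed = 0;	/* the fix: reset together with retry */

		list_for_each_entry_safe(page, page2, from, lru) {
			cond_resched();
			rc = unmap_and_move(get_new_page, private, page,
					    pass > 2, offlining, mode);
			switch (rc) {
			case -EAGAIN:
				retry++;	/* page stays on the list */
				break;
			case 0:
				break;		/* migrated */
			default:
				nr_failed++;	/* permanent failure */
				break;
			}
		}
	}
	return nr_failed + retry;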
* [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
@ 2012-07-16 16:14 ` Joonsoo Kim
2012-07-16 17:26 ` Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 16:14 ` [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
` (3 subsequent siblings)
4 siblings, 2 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-16 16:14 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Sasha Levin,
Christoph Lameter
do_migrate_pages() can return the number of pages not migrated.
Because the migrate_pages() syscall returns this value directly,
the syscall may return the number of pages not migrated.
In the failure case, the migrate_pages() syscall should return an error value,
so change err to -EIO.
Additionally, correct the comment above do_migrate_pages().
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..f7df271 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
* Move pages between the two nodesets so as to preserve the physical
* layout as much as possible.
*
- * Returns the number of page that could not be moved.
+ * Returns error or the number of pages not migrated.
*/
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
const nodemask_t *to, int flags)
@@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
err = do_migrate_pages(mm, old, new,
capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
+ if (err > 0)
+ err = -EIO;
mmput(mm);
out:
--
1.7.9.5
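To illustrate why a positive return is easy to misread from userspace, here is
a minimal, hypothetical caller of the raw syscall (the node numbers are made up
and a single-word nodemask is assumed); a program that only checks ret < 0 would
treat a partial failure as complete success:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	int main(void)
	{
		/* Hypothetical two-node box: move our pages from node 0 to node 1. */
		unsigned long old_nodes = 1UL << 0;
		unsigned long new_nodes = 1UL << 1;
		long ret;

		ret = syscall(SYS_migrate_pages, getpid(),
			      8 * sizeof(unsigned long),	/* bits in the masks */
			      &old_nodes, &new_nodes);
		if (ret < 0)
			perror("migrate_pages");
		else if (ret > 0)
			printf("%ld pages were not migrated\n", ret);
		else
			printf("all pages migrated\n");
		return 0;
	}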
* [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
2012-07-16 16:14 ` [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
@ 2012-07-16 16:14 ` Joonsoo Kim
2012-07-16 17:29 ` Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 17:14 ` [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
` (2 subsequent siblings)
4 siblings, 2 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-16 16:14 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
Marek Szyprowski, Minchan Kim, Christoph Lameter
migrate_pages() can return a positive value in some failure cases,
so 'ret > 0 ? 0 : ret' may be wrong.
This fixes it and removes one dead statement.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
tries = 0;
} else if (++tries == 5) {
- ret = ret < 0 ? ret : -EBUSY;
break;
}
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
putback_lru_pages(&cc.migratepages);
- return ret > 0 ? 0 : ret;
+ return ret <= 0 ? ret : -EBUSY;
}
/*
--
1.7.9.5
* [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
2012-07-16 16:14 ` [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-16 16:14 ` [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
@ 2012-07-16 17:14 ` Joonsoo Kim
2012-07-16 17:30 ` Christoph Lameter
2012-07-16 17:23 ` [PATCH 1/3] mm: correct return value of migrate_pages() Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
4 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-16 17:14 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, Joonsoo Kim, Brice Goglin,
Christoph Lameter, Minchan Kim
The move_pages() syscall may return success in the case where
do_move_page_to_node_array() returns a positive value, which means migration failed.
This patch changes do_move_page_to_node_array() so that it no longer
returns a positive value, which fixes the problem.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Brice Goglin <brice@myri.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan@kernel.org>
diff --git a/mm/migrate.c b/mm/migrate.c
index 294d52a..adabaf4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1171,7 +1171,7 @@ set_status:
}
up_read(&mm->mmap_sem);
- return err;
+ return err > 0 ? -EIO : err;
}
/*
--
1.7.9.5
* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
` (2 preceding siblings ...)
2012-07-16 17:14 ` [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
@ 2012-07-16 17:23 ` Christoph Lameter
2012-07-16 17:32 ` JoonSoo Kim
2012-07-16 17:40 ` Michal Nazarewicz
4 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:23 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm
On Tue, 17 Jul 2012, Joonsoo Kim wrote:
> migrate_pages() should return the number of pages not migrated or an error code.
> When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
> initialising nr_failed, so nr_failed is over-counted.
The intention of nr_failed was only to give an indication as to how
many attempts were made. The failed pages were on a separate queue that
seems to have vanished.
> This patch corrects it by initialising nr_failed in the outer loop.
Well, yeah, it makes sense since retry is initialized there as well.
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-16 16:14 ` [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
@ 2012-07-16 17:26 ` Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
1 sibling, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:26 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin
On Tue, 17 Jul 2012, Joonsoo Kim wrote:
> do_migrate_pages() can return the number of pages not migrated.
> Because the migrate_pages() syscall returns this value directly,
> the syscall may return the number of pages not migrated.
> In the failure case, the migrate_pages() syscall should return an error value,
> so change err to -EIO.
Pages are not migrated because they are busy, not because there is an
error. So let's return EBUSY.
* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
2012-07-16 16:14 ` [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
@ 2012-07-16 17:29 ` Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
1 sibling, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:29 UTC (permalink / raw)
To: Joonsoo Kim
Cc: akpm, linux-kernel, linux-mm, Michal Nazarewicz, Marek Szyprowski,
Minchan Kim
On Tue, 17 Jul 2012, Joonsoo Kim wrote:
> migrate_pages() can return a positive value in some failure cases,
> so 'ret > 0 ? 0 : ret' may be wrong.
> This fixes it and removes one dead statement.
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall
2012-07-16 17:14 ` [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
@ 2012-07-16 17:30 ` Christoph Lameter
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:30 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim
On Tue, 17 Jul 2012, Joonsoo Kim wrote:
> The move_pages() syscall may return success in the case where
> do_move_page_to_node_array() returns a positive value, which means migration failed.
> This patch changes do_move_page_to_node_array() so that it no longer
> returns a positive value, which fixes the problem.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Brice Goglin <brice@myri.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan@kernel.org>
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 294d52a..adabaf4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1171,7 +1171,7 @@ set_status:
> }
>
> up_read(&mm->mmap_sem);
> - return err;
> + return err > 0 ? -EIO : err;
> }
Please use EBUSY.
* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
2012-07-16 17:23 ` [PATCH 1/3] mm: correct return value of migrate_pages() Christoph Lameter
@ 2012-07-16 17:32 ` JoonSoo Kim
2012-07-16 17:37 ` Christoph Lameter
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-16 17:32 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm
2012/7/17 Christoph Lameter <cl@linux.com>:
> On Tue, 17 Jul 2012, Joonsoo Kim wrote:
>
>> migrate_pages() should return the number of pages not migrated or an error code.
>> When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
>> initialising nr_failed, so nr_failed is over-counted.
>
> The intention of nr_failed was only to give an indication as to how
> many attempts were made. The failed pages were on a separate queue that
> seems to have vanished.
>
>> This patch corrects it by initialising nr_failed in the outer loop.
>
> Well, yeah, it makes sense since retry is initialized there as well.
>
> Acked-by: Christoph Lameter <cl@linux.com>
Thanks for the comment.
Additionally, I find that migrate_huge_pages() needs the identical fix
as migrate_pages():
@@ -1029,6 +1030,7 @@ int migrate_huge_pages(struct list_head *from,
for (pass = 0; pass < 10 && retry; pass++) {
retry = 0;
+ nr_failed = 0;
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();
When I resend with this, could I include "Acked-by: Christoph Lameter
<cl@linux.com>"?
* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
2012-07-16 17:32 ` JoonSoo Kim
@ 2012-07-16 17:37 ` Christoph Lameter
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:37 UTC (permalink / raw)
To: JoonSoo Kim; +Cc: akpm, linux-kernel, linux-mm
On Tue, 17 Jul 2012, JoonSoo Kim wrote:
>
> for (pass = 0; pass < 10 && retry; pass++) {
> retry = 0;
> + nr_failed = 0;
>
> list_for_each_entry_safe(page, page2, from, lru) {
> cond_resched();
>
> When I resend with this, could I include "Acked-by: Christoph Lameter
> <cl@linux.com>"?
Sure.
* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
` (3 preceding siblings ...)
2012-07-16 17:23 ` [PATCH 1/3] mm: correct return value of migrate_pages() Christoph Lameter
@ 2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 17:57 ` JoonSoo Kim
4 siblings, 1 reply; 95+ messages in thread
From: Michal Nazarewicz @ 2012-07-16 17:40 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Christoph Lameter
Joonsoo Kim <js1304@gmail.com> writes:
> migrate_pages() should return the number of pages not migrated or an error code.
> When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
> initialising nr_failed, so nr_failed is over-counted.
>
> This patch corrects it by initialising nr_failed in the outer loop.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Actually, it makes me wonder if there is any code that uses this
information. If not, it would be best in my opinion to make it return
zero or a negative error code, but that would have to be checked.
> diff --git a/mm/migrate.c b/mm/migrate.c
> index be26d5c..294d52a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
>
> for(pass = 0; pass < 10 && retry; pass++) {
> retry = 0;
> + nr_failed = 0;
>
> list_for_each_entry_safe(page, page2, from, lru) {
> cond_resched();
--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-<mina86-mina86.com>-<jid:mina86-jabber.org>--ooO--(_)--Ooo--
* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-16 16:14 ` [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-16 17:26 ` Christoph Lameter
@ 2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 17:59 ` JoonSoo Kim
1 sibling, 1 reply; 95+ messages in thread
From: Michal Nazarewicz @ 2012-07-16 17:40 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Christoph Lameter
Joonsoo Kim <js1304@gmail.com> writes:
> do_migrate_pages() can return the number of pages not migrated.
> Because the migrate_pages() syscall returns this value directly,
> the syscall may return the number of pages not migrated.
> In the failure case, the migrate_pages() syscall should return an error value,
> so change err to -EIO.
>
> Additionally, correct the comment above do_migrate_pages().
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Sasha Levin <levinsasha928@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 1d771e4..f7df271 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
> * Move pages between the two nodesets so as to preserve the physical
> * layout as much as possible.
> *
> - * Returns the number of page that could not be moved.
> + * Returns error or the number of pages not migrated.
> */
> int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
> const nodemask_t *to, int flags)
> @@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
>
> err = do_migrate_pages(mm, old, new,
> capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
> + if (err > 0)
> + err = -EIO;
>
> mmput(mm);
> out:
--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-<mina86-mina86.com>-<jid:mina86-jabber.org>--ooO--(_)--Ooo--
* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
2012-07-16 16:14 ` [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-16 17:29 ` Christoph Lameter
@ 2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 18:40 ` JoonSoo Kim
1 sibling, 1 reply; 95+ messages in thread
From: Michal Nazarewicz @ 2012-07-16 17:40 UTC (permalink / raw)
To: Joonsoo Kim
Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
Christoph Lameter
Joonsoo Kim <js1304@gmail.com> writes:
> migrate_pages() can return a positive value in some failure cases,
> so 'ret > 0 ? 0 : ret' may be wrong.
> This fixes it and removes one dead statement.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
Have you actually encountered this problem? If migrate_pages() fails
with a positive value, the code that you are removing kicks in and
-EBUSY is assigned to ret (now that I look at it, I think that in the
current code the "return ret > 0 ? 0 : ret;" statement could be reduced
to "return ret;"). Your code seems to be cleaner, but the commit
message does not look accurate to me.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4403009..02d4519 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> }
> tries = 0;
> } else if (++tries == 5) {
> - ret = ret < 0 ? ret : -EBUSY;
> break;
> }
>
> @@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> }
>
> putback_lru_pages(&cc.migratepages);
> - return ret > 0 ? 0 : ret;
> + return ret <= 0 ? ret : -EBUSY;
> }
>
> /*
--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +-<mina86-mina86.com>-<jid:mina86-jabber.org>--ooO--(_)--Ooo--
* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
2012-07-16 17:40 ` Michal Nazarewicz
@ 2012-07-16 17:57 ` JoonSoo Kim
2012-07-16 18:05 ` Christoph Lameter
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-16 17:57 UTC (permalink / raw)
To: Michal Nazarewicz; +Cc: akpm, linux-kernel, linux-mm, Christoph Lameter
2012/7/17 Michal Nazarewicz <mina86@tlen.pl>:
> Acked-by: Michal Nazarewicz <mina86@mina86.com>
Thanks.
> Actually, it makes me wonder if there is any code that uses this
> information. If not, it would be best in my opinion to make it return
> zero or a negative error code, but that would have to be checked.
I think that, too.
I looked at every call site of migrate_pages() and there is no place
which really needs the fail count.
The current return value sometimes makes callers error-prone,
so I think changing it is preferable.
What do you think, Christoph?
* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-16 17:40 ` Michal Nazarewicz
@ 2012-07-16 17:59 ` JoonSoo Kim
2012-07-17 13:02 ` Michal Nazarewicz
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-16 17:59 UTC (permalink / raw)
To: Michal Nazarewicz
Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Christoph Lameter
2012/7/17 Michal Nazarewicz <mina86@tlen.pl>:
> Joonsoo Kim <js1304@gmail.com> writes:
>> do_migrate_pages() can return the number of pages not migrated.
>> Because the migrate_pages() syscall returns this value directly,
>> the syscall may return the number of pages not migrated.
>> In the failure case, the migrate_pages() syscall should return an error value,
>> so change err to -EIO.
>>
>> Additionally, correct the comment above do_migrate_pages().
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> Cc: Sasha Levin <levinsasha928@gmail.com>
>> Cc: Christoph Lameter <cl@linux.com>
>
> Acked-by: Michal Nazarewicz <mina86@mina86.com>
Thanks.
When I resend with -EIO changed to -EBUSY,
could I include "Acked-by: Michal Nazarewicz <mina86@mina86.com>"?
* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
2012-07-16 17:57 ` JoonSoo Kim
@ 2012-07-16 18:05 ` Christoph Lameter
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-16 18:05 UTC (permalink / raw)
To: JoonSoo Kim; +Cc: Michal Nazarewicz, akpm, linux-kernel, linux-mm
On Tue, 17 Jul 2012, JoonSoo Kim wrote:
> > Actually, it makes me wonder if there is any code that uses this
> > information. If not, it would be best in my opinion to make it return
> > zero or a negative error code, but that would have to be checked.
>
> I think that, too.
> I looked at every call site of migrate_pages() and there is no place
> which really needs the fail count.
> The current return value sometimes makes callers error-prone,
> so I think changing it is preferable.
>
> What do you think, Christoph?
We could do that. I am not aware of anything using that information
either. However, the condition in which some pages were migrated and
others were not is not like a classic error. In many situations the moving
of the pages is done for performance reasons. This just means that the
best-performing memory locations could not be used for some pages. A
situation like that may be OK for an application.
* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
2012-07-16 17:40 ` Michal Nazarewicz
@ 2012-07-16 18:40 ` JoonSoo Kim
2012-07-17 13:16 ` Michal Nazarewicz
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-16 18:40 UTC (permalink / raw)
To: Michal Nazarewicz
Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
Christoph Lameter
2012/7/17 Michal Nazarewicz <mina86@mina86.com>:
> Joonsoo Kim <js1304@gmail.com> writes:
>
>> migrate_pages() can return a positive value in some failure cases,
>> so 'ret > 0 ? 0 : ret' may be wrong.
>> This fixes it and removes one dead statement.
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> Cc: Michal Nazarewicz <mina86@mina86.com>
>> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Christoph Lameter <cl@linux.com>
>
> Have you actually encountered this problem? If migrate_pages() fails
> with a positive value, the code that you are removing kicks in and
> -EBUSY is assigned to ret (now that I look at it, I think that in the
> current code the "return ret > 0 ? 0 : ret;" statement could be reduced
> to "return ret;"). Your code seems to be cleaner, but the commit
> message does not look accurate to me.
>
I haven't encountered this problem yet.
If migrate_pages() with offlining false meets a KSM page, the migration fails.
In this case, the failed page is removed from the cc.migratepages list and
the failed count is returned.
So it is possible to exit the loop without testing ++tries == 5 while
ret is above zero.
Is there any point I am missing?
Is there any possible scenario where "migrate_pages() returns > 0 and
cc.migratepages is empty"?
I'm not an MM expert, so please comment on my humble opinion.
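To make the scenario concrete, here is a condensed sketch of
__alloc_contig_migrate_range()'s loop, paraphrased rather than quoted verbatim:
if migrate_pages() drops every failed page from the list and returns a positive
count, the next iteration can find both pfn >= end and an empty list, so the
loop exits before "++tries == 5" is ever tested and the positive count escapes:

	while (pfn < end || !list_empty(&cc.migratepages)) {
		if (list_empty(&cc.migratepages)) {
			/* refill the list by isolating more pages */
			pfn = isolate_migratepages_range(cc.zone, &cc,
							 pfn, end);
			tries = 0;
		} else if (++tries == 5) {
			/* old code: ret = ret < 0 ? ret : -EBUSY; */
			break;
		}

		/* may empty the list yet return a positive failure count,
		 * e.g. for a KSM page with offlining == false */
		ret = migrate_pages(&cc.migratepages, ...);
	}
	putback_lru_pages(&cc.migratepages);
	return ret <= 0 ? ret : -EBUSY;		/* the proposed fix */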
* [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages()
[not found] <Yes>
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
@ 2012-07-17 12:33 ` Joonsoo Kim
2012-07-17 12:33 ` [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
` (2 more replies)
2012-07-27 17:55 ` [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
` (5 subsequent siblings)
7 siblings, 3 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter
migrate_pages() should return the number of pages not migrated or an error code.
When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
initialising nr_failed, so nr_failed is over-counted.
This patch corrects it by initialising nr_failed in the outer loop.
migrate_huge_pages() is an identical case to migrate_pages().
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..f495c58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
for(pass = 0; pass < 10 && retry; pass++) {
retry = 0;
+ nr_failed = 0;
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();
@@ -1029,6 +1030,7 @@ int migrate_huge_pages(struct list_head *from,
for (pass = 0; pass < 10 && retry; pass++) {
retry = 0;
+ nr_failed = 0;
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();
--
1.7.9.5
* [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-17 12:33 ` [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
@ 2012-07-17 12:33 ` Joonsoo Kim
2012-07-17 14:28 ` Christoph Lameter
2012-07-17 12:33 ` [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-17 12:33 ` [PATCH 4/4 v2] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Sasha Levin,
Christoph Lameter
do_migrate_pages() can return the number of pages not migrated.
Because the migrate_pages() syscall returns this value directly,
the syscall may return the number of pages not migrated.
In the failure case, the migrate_pages() syscall should return an error value,
so change err to -EBUSY.
Additionally, correct the comment above do_migrate_pages().
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..0732729 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
* Move pages between the two nodesets so as to preserve the physical
* layout as much as possible.
*
- * Returns the number of page that could not be moved.
+ * Returns error or the number of pages not migrated.
*/
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
const nodemask_t *to, int flags)
@@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
err = do_migrate_pages(mm, old, new,
capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
+ if (err > 0)
+ err = -EBUSY;
mmput(mm);
out:
--
1.7.9.5
* [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range()
2012-07-17 12:33 ` [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-17 12:33 ` [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
@ 2012-07-17 12:33 ` Joonsoo Kim
2012-07-17 13:25 ` Michal Nazarewicz
2012-07-17 12:33 ` [PATCH 4/4 v2] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
Marek Szyprowski, Minchan Kim, Christoph Lameter
migrate_pages() can return a positive value in some failure cases,
so 'ret > 0 ? 0 : ret' may be wrong.
This fixes it and removes one dead statement.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
tries = 0;
} else if (++tries == 5) {
- ret = ret < 0 ? ret : -EBUSY;
break;
}
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
putback_lru_pages(&cc.migratepages);
- return ret > 0 ? 0 : ret;
+ return ret <= 0 ? ret : -EBUSY;
}
/*
--
1.7.9.5
* [PATCH 4/4 v2] mm: fix possible incorrect return value of move_pages() syscall
2012-07-17 12:33 ` [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-17 12:33 ` [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-17 12:33 ` [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
@ 2012-07-17 12:33 ` Joonsoo Kim
2 siblings, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, Joonsoo Kim, Brice Goglin,
Christoph Lameter, Minchan Kim
The move_pages() syscall may return success in the case where
do_move_page_to_node_array() returns a positive value, which means migration failed.
This patch changes do_move_page_to_node_array() so that it no longer
returns a positive value, which fixes the problem.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Brice Goglin <brice@myri.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan@kernel.org>
diff --git a/mm/migrate.c b/mm/migrate.c
index f495c58..eeaf409 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1172,7 +1172,7 @@ set_status:
}
up_read(&mm->mmap_sem);
- return err;
+ return err > 0 ? -EBUSY : err;
}
/*
--
1.7.9.5
* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-16 17:59 ` JoonSoo Kim
@ 2012-07-17 13:02 ` Michal Nazarewicz
0 siblings, 0 replies; 95+ messages in thread
From: Michal Nazarewicz @ 2012-07-17 13:02 UTC (permalink / raw)
To: Michal Nazarewicz, JoonSoo Kim
Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Christoph Lameter
On Mon, 16 Jul 2012 19:59:18 +0200, JoonSoo Kim <js1304@gmail.com> wrote:
> 2012/7/17 Michal Nazarewicz <mina86@tlen.pl>:
>> Joonsoo Kim <js1304@gmail.com> writes:
>>> do_migrate_pages() can return the number of pages not migrated.
>>> Because the migrate_pages() syscall returns this value directly,
>>> the syscall may return the number of pages not migrated.
>>> In the failure case, the migrate_pages() syscall should return an error value,
>>> so change err to -EIO.
>>>
>>> Additionally, correct the comment above do_migrate_pages().
>>>
>>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>> Cc: Sasha Levin <levinsasha928@gmail.com>
>>> Cc: Christoph Lameter <cl@linux.com>
>>
>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
>
> Thanks.
>
> When I resend with -EIO changed to -EBUSY,
> could I include "Acked-by: Michal Nazarewicz <mina86@mina86.com>"?
Sure thing.
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--
* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
2012-07-16 18:40 ` JoonSoo Kim
@ 2012-07-17 13:16 ` Michal Nazarewicz
0 siblings, 0 replies; 95+ messages in thread
From: Michal Nazarewicz @ 2012-07-17 13:16 UTC (permalink / raw)
To: JoonSoo Kim
Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
Christoph Lameter
On Mon, 16 Jul 2012 20:40:56 +0200, JoonSoo Kim <js1304@gmail.com> wrote:
> 2012/7/17 Michal Nazarewicz <mina86@mina86.com>:
>> Joonsoo Kim <js1304@gmail.com> writes:
>>
>> migrate_pages() can return a positive value in some failure cases,
>> so 'ret > 0 ? 0 : ret' may be wrong.
>> This fixes it and removes one dead statement.
>>>
>>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>> Cc: Michal Nazarewicz <mina86@mina86.com>
>>> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Christoph Lameter <cl@linux.com>
>>
>> Have you actually encountered this problem? If migrate_pages() fails
>> with a positive value, the code that you are removing kicks in and
>> -EBUSY is assigned to ret (now that I look at it, I think that in the
>> current code the "return ret > 0 ? 0 : ret;" statement could be reduced
>> to "return ret;"). Your code seems to be cleaner, but the commit
>> message does not look accurate to me.
>>
>
> I haven't encountered this problem yet.
>
> If migrate_pages() with offlining false meets a KSM page, the migration fails.
> In this case, the failed page is removed from the cc.migratepages list and
> the failed count is returned.
Good point.
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--
* Re: [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range()
2012-07-17 12:33 ` [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
@ 2012-07-17 13:25 ` Michal Nazarewicz
2012-07-17 15:45 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Michal Nazarewicz @ 2012-07-17 13:25 UTC (permalink / raw)
To: akpm, Joonsoo Kim
Cc: linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
Christoph Lameter
On Tue, 17 Jul 2012 14:33:34 +0200, Joonsoo Kim <js1304@gmail.com> wrote:
> migrate_pages() can return a positive value in some failure cases,
> so 'ret > 0 ? 0 : ret' may be wrong.
> This fixes it and removes one dead statement.
How about the following message:
------------------- >8 ---------------------------------------------------
migrate_pages() can return a positive value while at the same time emptying
the list of pages it was called with. Such a situation means that it went
through all the pages on the list, some of which failed to be migrated.
If that happens, __alloc_contig_migrate_range()'s loop may finish without
"++tries == 5" ever being checked. This in turn means that at the end
of the function, ret may have a positive value, which should be treated
as an error.
This patch changes __alloc_contig_migrate_range() so that the return
statement converts a positive ret value into an -EBUSY error.
------------------- >8 ---------------------------------------------------
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Acked-by: Christoph Lameter <cl@linux.com>
In fact, now that I look at it, I think that __alloc_contig_migrate_range()
should be changed even further. I'll take a closer look at it and send
a patch (possibly through Marek ;) ).
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4403009..02d4519 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> }
> tries = 0;
> } else if (++tries == 5) {
> - ret = ret < 0 ? ret : -EBUSY;
> break;
> }
>@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> }
> putback_lru_pages(&cc.migratepages);
> - return ret > 0 ? 0 : ret;
> + return ret <= 0 ? ret : -EBUSY;
> }
> /*
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--
* Re: [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-17 12:33 ` [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
@ 2012-07-17 14:28 ` Christoph Lameter
2012-07-17 15:41 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-07-17 14:28 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin
On Tue, 17 Jul 2012, Joonsoo Kim wrote:
> @@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
>
> err = do_migrate_pages(mm, old, new,
> capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
> + if (err > 0)
> + err = -EBUSY;
>
> mmput(mm);
> out:
Why not have do_migrate_pages() return EBUSY if we do not need the number
of failed/retried pages?
* Re: [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-17 14:28 ` Christoph Lameter
@ 2012-07-17 15:41 ` JoonSoo Kim
0 siblings, 0 replies; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-17 15:41 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin
2012/7/17 Christoph Lameter <cl@linux.com>:
> On Tue, 17 Jul 2012, Joonsoo Kim wrote:
>
>> @@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
>>
>> err = do_migrate_pages(mm, old, new,
>> capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
>> + if (err > 0)
>> + err = -EBUSY;
>>
>> mmput(mm);
>> out:
>
> Why not have do_migrate_pages() return EBUSY if we do not need the number
> of failed/retried pages?
There is no serious reason.
do_migrate_pages() has two call sites, although the other one doesn't
use the return value.
do_migrate_pages() is commented "Return the number of page ...".
And my focus is fixing the possible error in the migrate_pages() syscall,
so I kept returning the number of failed/retried pages.
If we really think the number of failed/retried pages is useless, then
instead of having do_migrate_pages() return EBUSY, we can make migrate_pages()
return EBUSY. I think it is better to fix all the related code in one go.
* Re: [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range()
2012-07-17 13:25 ` Michal Nazarewicz
@ 2012-07-17 15:45 ` JoonSoo Kim
2012-07-17 15:49 ` [PATCH 3/4 v3] " Joonsoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-17 15:45 UTC (permalink / raw)
To: Michal Nazarewicz
Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
Christoph Lameter
2012/7/17 Michal Nazarewicz <mina86@mina86.com>:
> On Tue, 17 Jul 2012 14:33:34 +0200, Joonsoo Kim <js1304@gmail.com> wrote:
>>
>> migrate_pages() can return a positive value in some failure cases,
>> so 'ret > 0 ? 0 : ret' may be wrong.
>> This fixes it and removes one dead statement.
>
>
> How about the following message:
>
> ------------------- >8 ---------------------------------------------------
> migrate_pages() can return a positive value while at the same time emptying
> the list of pages it was called with. Such a situation means that it went
> through all the pages on the list, some of which failed to be migrated.
>
> If that happens, __alloc_contig_migrate_range()'s loop may finish without
> "++tries == 5" ever being checked. This in turn means that at the end
> of the function, ret may have a positive value, which should be treated
> as an error.
>
> This patch changes __alloc_contig_migrate_range() so that the return
> statement converts a positive ret value into an -EBUSY error.
> ------------------- >8 ---------------------------------------------------
It's good.
I will resend the patch with my commit message replaced by yours.
Thanks for the help.
* [PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range()
2012-07-17 15:45 ` JoonSoo Kim
@ 2012-07-17 15:49 ` Joonsoo Kim
0 siblings, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-17 15:49 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
Marek Szyprowski, Minchan Kim, Christoph Lameter
migrate_pages() can return a positive value while at the same time emptying
the list of pages it was called with. Such a situation means that it went
through all the pages on the list, some of which failed to be migrated.
If that happens, __alloc_contig_migrate_range()'s loop may finish without
"++tries == 5" ever being checked. This in turn means that at the end
of the function, ret may have a positive value, which should be treated
as an error.
This patch changes __alloc_contig_migrate_range() so that the return
statement converts a positive ret value into an -EBUSY error.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
tries = 0;
} else if (++tries == 5) {
- ret = ret < 0 ? ret : -EBUSY;
break;
}
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
putback_lru_pages(&cc.migratepages);
- return ret > 0 ? 0 : ret;
+ return ret <= 0 ? ret : -EBUSY;
}
/*
--
1.7.9.5
* [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages()
[not found] <Yes>
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
2012-07-17 12:33 ` [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
@ 2012-07-27 17:55 ` Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
` (2 more replies)
2012-08-24 16:05 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Joonsoo Kim
` (4 subsequent siblings)
7 siblings, 3 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter
migrate_pages() should return the number of pages not migrated or an error code.
When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
initialising nr_failed, so nr_failed is over-counted.
This patch corrects it by initialising nr_failed in the outer loop.
migrate_huge_pages() is an identical case to migrate_pages().
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
---
[Patch 2/4]: add "Acked-by: Michal Nazarewicz <mina86@mina86.com>"
[Patch 3/4]: the commit log is changed according to Michal Nazarewicz's suggestion.
There are no other changes from v2.
Just resending as a ping for Andrew.
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..f495c58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
for(pass = 0; pass < 10 && retry; pass++) {
retry = 0;
+ nr_failed = 0;
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();
@@ -1029,6 +1030,7 @@ int migrate_huge_pages(struct list_head *from,
for (pass = 0; pass < 10 && retry; pass++) {
retry = 0;
+ nr_failed = 0;
list_for_each_entry_safe(page, page2, from, lru) {
cond_resched();
--
1.7.9.5
* [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-27 17:55 ` [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
@ 2012-07-27 17:55 ` Joonsoo Kim
2012-07-27 20:57 ` Christoph Lameter
2012-07-27 17:55 ` [RESEND PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Sasha Levin,
Christoph Lameter
do_migrate_pages() can return the number of pages not migrated.
Because the migrate_pages() syscall returns this value directly,
the syscall may return the number of pages not migrated.
In the failure case, the migrate_pages() syscall should return an error value,
so change err to -EBUSY.
Additionally, correct the comment above do_migrate_pages().
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..0732729 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
* Move pages between the two nodesets so as to preserve the physical
* layout as much as possible.
*
- * Returns the number of page that could not be moved.
+ * Returns error or the number of pages not migrated.
*/
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
const nodemask_t *to, int flags)
@@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
err = do_migrate_pages(mm, old, new,
capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
+ if (err > 0)
+ err = -EBUSY;
mmput(mm);
out:
--
1.7.9.5
* [RESEND PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range()
2012-07-27 17:55 ` [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
@ 2012-07-27 17:55 ` Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2 siblings, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
Marek Szyprowski, Minchan Kim, Christoph Lameter
migrate_pages() can return a positive value while at the same time emptying
the list of pages it was called with. Such a situation means that it went
through all the pages on the list, some of which failed to be migrated.
If that happens, __alloc_contig_migrate_range()'s loop may finish without
"++tries == 5" ever being checked. This in turn means that at the end
of the function, ret may have a positive value, which should be treated
as an error.
This patch changes __alloc_contig_migrate_range() so that the return
statement converts a positive ret value into an -EBUSY error.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
tries = 0;
} else if (++tries == 5) {
- ret = ret < 0 ? ret : -EBUSY;
break;
}
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
}
putback_lru_pages(&cc.migratepages);
- return ret > 0 ? 0 : ret;
+ return ret <= 0 ? ret : -EBUSY;
}
/*
--
1.7.9.5
* [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-07-27 17:55 ` [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
@ 2012-07-27 17:55 ` Joonsoo Kim
2012-07-27 20:54 ` Christoph Lameter
2 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
To: akpm
Cc: linux-kernel, linux-mm, Joonsoo Kim, Brice Goglin,
Christoph Lameter, Minchan Kim
The move_pages() syscall may return success in the case where
do_move_page_to_node_array() returns a positive value, which means migration failed.
This patch changes do_move_page_to_node_array() so that it no longer
returns a positive value, which fixes the problem.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Brice Goglin <brice@myri.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan@kernel.org>
diff --git a/mm/migrate.c b/mm/migrate.c
index f495c58..eeaf409 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1172,7 +1172,7 @@ set_status:
}
up_read(&mm->mmap_sem);
- return err;
+ return err > 0 ? -EBUSY : err;
}
/*
--
1.7.9.5
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-07-27 17:55 ` [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
@ 2012-07-27 20:54 ` Christoph Lameter
2012-07-28 6:09 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-07-27 20:54 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim
On Sat, 28 Jul 2012, Joonsoo Kim wrote:
> move_pages() syscall may return success in case that
> do_move_page_to_node_array return positive value which means migration failed.
Nope. It only means that the migration for some pages has failed. This may
still be considered successful for the app if it moves 10000 pages and one
fails.
This patch would break the move_pages() syscall because an error code
returned from do_move_page_to_node_array() will cause the status byte for
each page move to no longer be updated. The application will not be able to
tell which pages were successfully moved and which were not.
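For context, a minimal user-space sketch of the pattern described above
(assumes a NUMA-enabled kernel and libnuma's <numaif.h>; link with -lnuma;
the page is moved within the calling process):
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *pages[1];
	int nodes[1] = { 0 };	/* request a move to node 0 */
	int status[1];
	if (posix_memalign(&pages[0], psz, psz))
		return 1;
	((char *)pages[0])[0] = 1;	/* touch the page so it is populated */
	/* pid 0 = calling process; per-page results land in status[] */
	long ret = move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE);
	printf("ret=%ld status[0]=%d\n", ret, status[0]);
	return 0;
}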
* Re: [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-27 17:55 ` [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
@ 2012-07-27 20:57 ` Christoph Lameter
2012-07-28 6:16 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-07-27 20:57 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin
On Sat, 28 Jul 2012, Joonsoo Kim wrote:
> do_migrate_pages() can return the number of pages not migrated.
> Because migrate_pages() syscall return this value directly,
> migrate_pages() syscall may return the number of pages not migrated.
> In fail case in migrate_pages() syscall, we should return error value.
> So change err to -EBUSY
Let's leave this alone. This would change the migrate_pages() semantics
because a successful move of N out of M pages would be reported as a
total failure even though pages were in fact moved.
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-07-27 20:54 ` Christoph Lameter
@ 2012-07-28 6:09 ` JoonSoo Kim
2012-07-30 19:29 ` Christoph Lameter
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-28 6:09 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim
2012/7/28 Christoph Lameter <cl@linux.com>:
> On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>
>> move_pages() syscall may return success in case that
>> do_move_page_to_node_array return positive value which means migration failed.
>
> Nope. It only means that the migration for some pages has failed. This may
> still be considered successful for the app if it moves 10000 pages and one
> failed.
>
> This patch would break the move_pages() syscall because an error code
> return from do_move_pages_to_node_array() will cause the status byte for
> each page move to not be updated anymore. Application will not be able to
> tell anymore which pages were successfully moved and which are not.
According to the man page, a valid status array is not required when
a non-zero value is returned.
So this patch would not break the move_pages() syscall.
But I agree that a positive return value only means that the
migration of some pages has failed.
This was my mistake, so please drop this patch.
Thanks for the review.
* Re: [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-27 20:57 ` Christoph Lameter
@ 2012-07-28 6:16 ` JoonSoo Kim
2012-07-30 19:30 ` Christoph Lameter
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-28 6:16 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin
2012/7/28 Christoph Lameter <cl@linux.com>:
> On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>
>> do_migrate_pages() can return the number of pages not migrated.
>> Because migrate_pages() syscall return this value directly,
>> migrate_pages() syscall may return the number of pages not migrated.
>> In fail case in migrate_pages() syscall, we should return error value.
>> So change err to -EBUSY
>
> Lets leave this alone. This would change the migrate_pages semantics
> because a successful move of N out of M pages would be marked as a
> total failure although pages were in fact moved.
>
Okay.
Then, do we need to fix the man page of the migrate_pages() syscall?
According to the man page, only returning 0 or -1 is valid.
Without this patch, it can return a positive value.
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-07-28 6:09 ` JoonSoo Kim
@ 2012-07-30 19:29 ` Christoph Lameter
2012-07-31 3:34 ` JoonSoo Kim
2012-08-01 5:15 ` Michael Kerrisk
0 siblings, 2 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-30 19:29 UTC (permalink / raw)
To: JoonSoo Kim
Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim,
Michael Kerrisk
On Sat, 28 Jul 2012, JoonSoo Kim wrote:
> 2012/7/28 Christoph Lameter <cl@linux.com>:
> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
> >
> >> move_pages() syscall may return success in case that
> >> do_move_page_to_node_array return positive value which means migration failed.
> >
> > Nope. It only means that the migration for some pages has failed. This may
> > still be considered successful for the app if it moves 10000 pages and one
> > failed.
> >
> > This patch would break the move_pages() syscall because an error code
> > return from do_move_pages_to_node_array() will cause the status byte for
> > each page move to not be updated anymore. Application will not be able to
> > tell anymore which pages were successfully moved and which are not.
>
> In case of returning non-zero, valid status is not required according
> to man page.
Cannot find a statement like that in the man page. The return code
description is incorrect. It should say that the call returns the number of
pages not moved, otherwise an error code (Michael, please fix the manpage).
> So, this patch would not break the move_pages() syscall.
It changes the way the system call behaves right now.
* Re: [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
2012-07-28 6:16 ` JoonSoo Kim
@ 2012-07-30 19:30 ` Christoph Lameter
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-30 19:30 UTC (permalink / raw)
To: JoonSoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Michael Kerrisk
On Sat, 28 Jul 2012, JoonSoo Kim wrote:
> 2012/7/28 Christoph Lameter <cl@linux.com>:
> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
> >
> >> do_migrate_pages() can return the number of pages not migrated.
> >> Because migrate_pages() syscall return this value directly,
> >> migrate_pages() syscall may return the number of pages not migrated.
> >> In fail case in migrate_pages() syscall, we should return error value.
> >> So change err to -EBUSY
> >
> > Lets leave this alone. This would change the migrate_pages semantics
> > because a successful move of N out of M pages would be marked as a
> > total failure although pages were in fact moved.
> >
>
> Okay.
> Then, do we need to fix man-page of migrate_pages() syscall?
> According to man-page, only returning 0 or -1 is valid.
> Without this patch, it can return positive value.
Yes, the manpage needs updating to say that it can return the number of
pages not migrated.
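As a concrete illustration of the semantics under discussion, a minimal
user-space sketch (assumes a machine with at least two NUMA nodes; link
with -lnuma):
#include <numaif.h>
#include <stdio.h>
int main(void)
{
	unsigned long old_nodes = 1UL << 0;	/* node 0 */
	unsigned long new_nodes = 1UL << 1;	/* node 1 */
	/* pid 0 = calling process; maxnode = highest node number + 1 */
	long ret = migrate_pages(0, 2, &old_nodes, &new_nodes);
	if (ret < 0)
		perror("migrate_pages");
	else if (ret > 0)
		printf("%ld pages could not be moved\n", ret);
	else
		printf("all pages moved\n");
	return 0;
}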
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-07-30 19:29 ` Christoph Lameter
@ 2012-07-31 3:34 ` JoonSoo Kim
2012-07-31 14:04 ` Christoph Lameter
2012-08-01 5:15 ` Michael Kerrisk
1 sibling, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-07-31 3:34 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim,
Michael Kerrisk
2012/7/31 Christoph Lameter <cl@linux.com>:
> On Sat, 28 Jul 2012, JoonSoo Kim wrote:
>
>> 2012/7/28 Christoph Lameter <cl@linux.com>:
>> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>> >
>> >> move_pages() syscall may return success in case that
>> >> do_move_page_to_node_array return positive value which means migration failed.
>> >
>> > Nope. It only means that the migration for some pages has failed. This may
>> > still be considered successful for the app if it moves 10000 pages and one
>> > failed.
>> >
>> > This patch would break the move_pages() syscall because an error code
>> > return from do_move_pages_to_node_array() will cause the status byte for
>> > each page move to not be updated anymore. Application will not be able to
>> > tell anymore which pages were successfully moved and which are not.
>>
>> In case of returning non-zero, valid status is not required according
>> to man page.
>
> Cannot find a statement like that in the man page. The return code
> description is incorrect. It should that that is returns the number of
> pages not moved otherwise an error code (Michael please fix the manpage).
The man page contains the following statement:
"status is an array of integers that return the status of each page. The array
only contains valid values if move_pages() did not return an error."
And the current implementation of the move_pages() syscall doesn't return the
number of pages not moved; it just returns 0 when it encounters some failed pages.
So, if you want to fix the man page, you should fix do_pages_move() first.
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-07-31 3:34 ` JoonSoo Kim
@ 2012-07-31 14:04 ` Christoph Lameter
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-07-31 14:04 UTC (permalink / raw)
To: JoonSoo Kim
Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim,
Michael Kerrisk
On Tue, 31 Jul 2012, JoonSoo Kim wrote:
> In man page, there is following statement.
> "status is an array of integers that return the status of each page. The array
> only contains valid values if move_pages() did not return an error."
> And current implementation of move_pages() syscall doesn't return the number
> of pages not moved, just return 0 when it encounter some failed pages.
> So, if u want to fix the man page, u should fix do_pages_move() first.
Hmm... yeah, actually that is sufficient, since the status is readily
obtainable from the status array. It would be better, though, if the
function returned the number of pages not moved, in the same way as
migrate_pages() does.
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-07-30 19:29 ` Christoph Lameter
2012-07-31 3:34 ` JoonSoo Kim
@ 2012-08-01 5:15 ` Michael Kerrisk
2012-08-01 18:00 ` Christoph Lameter
1 sibling, 1 reply; 95+ messages in thread
From: Michael Kerrisk @ 2012-08-01 5:15 UTC (permalink / raw)
To: Christoph Lameter
Cc: JoonSoo Kim, akpm, linux-kernel, linux-mm, Brice Goglin,
Minchan Kim
On Mon, Jul 30, 2012 at 9:29 PM, Christoph Lameter <cl@linux.com> wrote:
> On Sat, 28 Jul 2012, JoonSoo Kim wrote:
>
>> 2012/7/28 Christoph Lameter <cl@linux.com>:
>> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>> >
>> >> move_pages() syscall may return success in case that
>> >> do_move_page_to_node_array return positive value which means migration failed.
>> >
>> > Nope. It only means that the migration for some pages has failed. This may
>> > still be considered successful for the app if it moves 10000 pages and one
>> > failed.
>> >
>> > This patch would break the move_pages() syscall because an error code
>> > return from do_move_pages_to_node_array() will cause the status byte for
>> > each page move to not be updated anymore. Application will not be able to
>> > tell anymore which pages were successfully moved and which are not.
>>
>> In case of returning non-zero, valid status is not required according
>> to man page.
>
> Cannot find a statement like that in the man page. The return code
> description is incorrect. It should that that is returns the number of
> pages not moved otherwise an error code (Michael please fix the manpage).
Hi Christoph,
Is the patch below acceptable? (I've attached the complete page as well.)
See you in San Diego (?),
Michael
--- a/man2/migrate_pages.2
+++ b/man2/migrate_pages.2
@@ -29,7 +29,7 @@ migrate_pages \- move all pages in a process to
another set of nodes
Link with \fI\-lnuma\fP.
.SH DESCRIPTION
.BR migrate_pages ()
-moves all pages of the process
+attempts to move all pages of the process
.I pid
that are in memory nodes
.I old_nodes
@@ -87,7 +87,8 @@ privilege.
.SH "RETURN VALUE"
On success
.BR migrate_pages ()
-returns zero.
+returns the number of pages that could not be moved
+(i.e., a return of zero means that all pages were successfully moved).
On error, it returns \-1, and sets
.I errno
to indicate the error.
--
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
[-- Attachment #2: migrate_pages.2 --]
[-- Type: application/octet-stream, Size: 4084 bytes --]
.\" Hey Emacs! This file is -*- nroff -*- source.
.\"
.\" Copyright 2009 Intel Corporation
.\" Author: Andi Kleen
.\" Based on the move_pages manpage which was
.\" This manpage is Copyright (C) 2006 Silicon Graphics, Inc.
.\" Christoph Lameter
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.TH MIGRATE_PAGES 2 2012-08-01 "Linux" "Linux Programmer's Manual"
.SH NAME
migrate_pages \- move all pages in a process to another set of nodes
.SH SYNOPSIS
.nf
.B #include <numaif.h>
.sp
.BI "long migrate_pages(int " pid ", unsigned long " maxnode,
.BI " const unsigned long *" old_nodes,
.BI " const unsigned long *" new_nodes);
.fi
.sp
Link with \fI\-lnuma\fP.
.SH DESCRIPTION
.BR migrate_pages ()
attempts to move all pages of the process
.I pid
that are in memory nodes
.I old_nodes
to the memory nodes in
.IR new_nodes .
Pages not located in any node in
.I old_nodes
will not be migrated.
As far as possible,
the kernel maintains the relative topology relationship inside
.I old_nodes
during the migration to
.IR new_nodes .
The
.I old_nodes
and
.I new_nodes
arguments are pointers to bit masks of node numbers, with up to
.I maxnode
bits in each mask.
These masks are maintained as arrays of unsigned
.I long
integers (in the last
.I long
integer, the bits beyond those specified by
.I maxnode
are ignored).
The
.I maxnode
argument is the maximum node number in the bit mask plus one (this is the same
as in
.BR mbind (2),
but different from
.BR select (2)).
The
.I pid
argument is the ID of the process whose pages are to be moved.
To move pages in another process,
the caller must be privileged
.RB ( CAP_SYS_NICE )
or the real or effective user ID of the calling process must match the
real or saved-set user ID of the target process.
If
.I pid
is 0, then
.BR migrate_pages ()
moves pages of the calling process.
Pages shared with another process will only be moved if the initiating
process has the
.B CAP_SYS_NICE
privilege.
.SH "RETURN VALUE"
On success
.BR migrate_pages ()
returns the number of pages that could not be moved
(i.e., a return of zero means that all pages were successfully moved).
On error, it returns \-1, and sets
.I errno
to indicate the error.
.SH ERRORS
.TP
.B EPERM
Insufficient privilege
.RB ( CAP_SYS_NICE )
to move pages of the process specified by
.IR pid ,
or insufficient privilege
.RB ( CAP_SYS_NICE )
to access the specified target nodes.
.TP
.B ESRCH
No process matching
.I pid
could be found.
.\" FIXME There are other errors
.SH VERSIONS
The
.BR migrate_pages ()
system call first appeared on Linux in version 2.6.16.
.SH CONFORMING TO
This system call is Linux-specific.
.SH "NOTES"
For information on library support, see
.BR numa (7).
Use
.BR get_mempolicy (2)
with the
.B MPOL_F_MEMS_ALLOWED
flag to obtain the set of nodes that are allowed by
the calling process's cpuset.
Note that this information is subject to change at any
time by manual or automatic reconfiguration of the cpuset.
Use of
.BR migrate_pages ()
may result in pages whose location
(node) violates the memory policy established for the
specified addresses (see
.BR mbind (2))
and/or the specified process (see
.BR set_mempolicy (2)).
That is, memory policy does not constrain the destination
nodes used by
.BR migrate_pages ().
The
.I <numaif.h>
header is not included with glibc, but requires installing
.I libnuma-devel
or a similar package.
.SH "SEE ALSO"
.BR get_mempolicy (2),
.BR mbind (2),
.BR set_mempolicy (2),
.BR numa (3),
.BR numa_maps (5),
.BR cpuset (7),
.BR numa (7),
.BR migratepages (8),
.BR numa_stat (8);
.br
the kernel source file
.IR Documentation/vm/page_migration .
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-08-01 5:15 ` Michael Kerrisk
@ 2012-08-01 18:00 ` Christoph Lameter
2012-08-02 5:52 ` Michael Kerrisk
0 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-08-01 18:00 UTC (permalink / raw)
To: Michael Kerrisk
Cc: JoonSoo Kim, akpm, linux-kernel, linux-mm, Brice Goglin,
Minchan Kim
On Wed, 1 Aug 2012, Michael Kerrisk wrote:
> Is the patch below acceptable? (I've attached the complete page as well.)
Yes looks good.
> See you in San Diego (?),
Yup. I will be there too.
* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
2012-08-01 18:00 ` Christoph Lameter
@ 2012-08-02 5:52 ` Michael Kerrisk
0 siblings, 0 replies; 95+ messages in thread
From: Michael Kerrisk @ 2012-08-02 5:52 UTC (permalink / raw)
To: Christoph Lameter
Cc: JoonSoo Kim, akpm, linux-kernel, linux-mm, Brice Goglin,
Minchan Kim
On Wed, Aug 1, 2012 at 8:00 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 1 Aug 2012, Michael Kerrisk wrote:
>
>> Is the patch below acceptable? (I've attached the complete page as well.)
>
> Yes looks good.
Thanks for checking it!
>> See you in San Diego (?),
>
> Yup. I will be there too.
See you then!
Cheers,
Michael
--
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
* [PATCH 1/2] slub: rename cpu_partial to max_cpu_object
[not found] <Yes>
` (2 preceding siblings ...)
2012-07-27 17:55 ` [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
@ 2012-08-24 16:05 ` Joonsoo Kim
2012-08-24 16:05 ` [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node Joonsoo Kim
2012-08-24 16:12 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Christoph Lameter
2012-08-25 14:11 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Joonsoo Kim
` (3 subsequent siblings)
7 siblings, 2 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-08-24 16:05 UTC (permalink / raw)
To: Pekka Enberg; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter
cpu_partial of the kmem_cache struct is a bit awkward.
It means the maximum number of objects kept in the per cpu slab
and cpu partial lists of a processor. However, the current name
seems to represent objects kept in the cpu partial lists only.
So this patch renames it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index df448ad..9130e6b 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -84,7 +84,7 @@ struct kmem_cache {
int size; /* The size of an object including meta data */
int object_size; /* The size of an object without meta data */
int offset; /* Free pointer offset. */
- int cpu_partial; /* Number of per cpu partial objects to keep around */
+ int max_cpu_object; /* Number of per cpu objects to keep around */
struct kmem_cache_order_objects oo;
/* Allocation and freeing of slabs */
diff --git a/mm/slub.c b/mm/slub.c
index c67bd0a..d597530 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1565,7 +1565,7 @@ static void *get_partial_node(struct kmem_cache *s,
available = put_cpu_partial(s, page, 0);
stat(s, CPU_PARTIAL_NODE);
}
- if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
+ if (kmem_cache_debug(s) || available > s->max_cpu_object / 2)
break;
}
@@ -1953,7 +1953,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
if (oldpage) {
pobjects = oldpage->pobjects;
pages = oldpage->pages;
- if (drain && pobjects > s->cpu_partial) {
+ if (drain && pobjects > s->max_cpu_object) {
unsigned long flags;
/*
* partial array is full. Move the existing
@@ -3073,8 +3073,8 @@ static int kmem_cache_open(struct kmem_cache *s,
set_min_partial(s, ilog2(s->size) / 2);
/*
- * cpu_partial determined the maximum number of objects kept in the
- * per cpu partial lists of a processor.
+ * max_cpu_object determined the maximum number of objects kept in the
+ * per cpu slab and cpu partial lists of a processor.
*
* Per cpu partial lists mainly contain slabs that just have one
* object freed. If they are used for allocation then they can be
@@ -3085,20 +3085,20 @@ static int kmem_cache_open(struct kmem_cache *s,
*
* A) The number of objects from per cpu partial slabs dumped to the
* per node list when we reach the limit.
- * B) The number of objects in cpu partial slabs to extract from the
- * per node list when we run out of per cpu objects. We only fetch 50%
- * to keep some capacity around for frees.
+ * B) The number of objects in cpu slab and cpu partial lists to
+ * extract from the per node list when we run out of per cpu objects.
+ * We only fetch 50% to keep some capacity around for frees.
*/
if (kmem_cache_debug(s))
- s->cpu_partial = 0;
+ s->max_cpu_object = 0;
else if (s->size >= PAGE_SIZE)
- s->cpu_partial = 2;
+ s->max_cpu_object = 2;
else if (s->size >= 1024)
- s->cpu_partial = 6;
+ s->max_cpu_object = 6;
else if (s->size >= 256)
- s->cpu_partial = 13;
+ s->max_cpu_object = 13;
else
- s->cpu_partial = 30;
+ s->max_cpu_object = 30;
s->refcount = 1;
#ifdef CONFIG_NUMA
@@ -4677,12 +4677,12 @@ static ssize_t min_partial_store(struct kmem_cache *s, const char *buf,
}
SLAB_ATTR(min_partial);
-static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
+static ssize_t max_cpu_object_show(struct kmem_cache *s, char *buf)
{
- return sprintf(buf, "%u\n", s->cpu_partial);
+ return sprintf(buf, "%u\n", s->max_cpu_object);
}
-static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
+static ssize_t max_cpu_object_store(struct kmem_cache *s, const char *buf,
size_t length)
{
unsigned long objects;
@@ -4694,11 +4694,11 @@ static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
if (objects && kmem_cache_debug(s))
return -EINVAL;
- s->cpu_partial = objects;
+ s->max_cpu_object = objects;
flush_all(s);
return length;
}
-SLAB_ATTR(cpu_partial);
+SLAB_ATTR(max_cpu_object);
static ssize_t ctor_show(struct kmem_cache *s, char *buf)
{
@@ -5103,7 +5103,7 @@ static struct attribute *slab_attrs[] = {
&objs_per_slab_attr.attr,
&order_attr.attr,
&min_partial_attr.attr,
- &cpu_partial_attr.attr,
+ &max_cpu_object_attr.attr,
&objects_attr.attr,
&objects_partial_attr.attr,
&partial_attr.attr,
--
1.7.9.5
* [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
2012-08-24 16:05 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Joonsoo Kim
@ 2012-08-24 16:05 ` Joonsoo Kim
2012-08-24 16:15 ` Christoph Lameter
2012-08-24 16:12 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Christoph Lameter
1 sibling, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-08-24 16:05 UTC (permalink / raw)
To: Pekka Enberg; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter
In get_partial_node(), we want to refill the cpu slab and cpu partial slabs
until the number of objects kept in the per cpu slab and cpu partial lists
of a processor reaches max_cpu_object.
However, the current implementation does not achieve this.
See the following code in get_partial_node():
if (!object) {
c->page = page;
stat(s, ALLOC_FROM_PARTIAL);
object = t;
available = page->objects - page->inuse;
} else {
available = put_cpu_partial(s, page, 0);
stat(s, CPU_PARTIAL_NODE);
}
if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
break;
In the case of !object (available = page->objects - page->inuse),
"available" means the number of objects in the cpu slab.
At this point, we don't have any cpu partial slabs, so "available" implies
the number of objects available to the cpu without locking.
This is what we want.
But look at the other "available" (available = put_cpu_partial(s, page, 0)).
This "available" doesn't include the number of objects in the cpu slab.
It only includes the number of objects in the cpu partial slabs.
So it doesn't imply the number of objects available to the cpu without locking.
This isn't what we want.
Therefore, fix it so that "available" has the same meaning in both cases,
and rename "available" to "cpu_slab_objects" for readability.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
diff --git a/mm/slub.c b/mm/slub.c
index d597530..c96e0e4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1538,6 +1538,7 @@ static void *get_partial_node(struct kmem_cache *s,
{
struct page *page, *page2;
void *object = NULL;
+ int cpu_slab_objects = 0, pobjects = 0;
/*
* Racy check. If we mistakenly see no partial slabs then we
@@ -1551,7 +1552,6 @@ static void *get_partial_node(struct kmem_cache *s,
spin_lock(&n->list_lock);
list_for_each_entry_safe(page, page2, &n->partial, lru) {
void *t = acquire_slab(s, n, page, object == NULL);
- int available;
if (!t)
break;
@@ -1560,12 +1560,13 @@ static void *get_partial_node(struct kmem_cache *s,
c->page = page;
stat(s, ALLOC_FROM_PARTIAL);
object = t;
- available = page->objects - page->inuse;
+ cpu_slab_objects = page->objects - page->inuse;
} else {
- available = put_cpu_partial(s, page, 0);
+ pobjects = put_cpu_partial(s, page, 0);
stat(s, CPU_PARTIAL_NODE);
}
- if (kmem_cache_debug(s) || available > s->max_cpu_object / 2)
+ if (kmem_cache_debug(s)
+ || cpu_slab_objects + pobjects > s->max_cpu_object / 2)
break;
}
--
1.7.9.5
* Re: [PATCH 1/2] slub: rename cpu_partial to max_cpu_object
2012-08-24 16:05 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Joonsoo Kim
2012-08-24 16:05 ` [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node Joonsoo Kim
@ 2012-08-24 16:12 ` Christoph Lameter
1 sibling, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-08-24 16:12 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm
On Sat, 25 Aug 2012, Joonsoo Kim wrote:
> cpu_partial of kmem_cache struct is a bit awkward.
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
2012-08-24 16:05 ` [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node Joonsoo Kim
@ 2012-08-24 16:15 ` Christoph Lameter
2012-08-24 16:28 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-08-24 16:15 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm
On Sat, 25 Aug 2012, Joonsoo Kim wrote:
> index d597530..c96e0e4 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1538,6 +1538,7 @@ static void *get_partial_node(struct kmem_cache *s,
> {
> struct page *page, *page2;
> void *object = NULL;
> + int cpu_slab_objects = 0, pobjects = 0;
We really need to be clear here.
One counter is for the number of objects in the per cpu slab and the other
for the objects in the per cpu partial lists.
So I think the first name is ok. The second should be something similar:
cpu_partial_objects?
* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
2012-08-24 16:15 ` Christoph Lameter
@ 2012-08-24 16:28 ` JoonSoo Kim
2012-08-24 16:31 ` Christoph Lameter
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-08-24 16:28 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm
2012/8/25 Christoph Lameter <cl@linux.com>:
> On Sat, 25 Aug 2012, Joonsoo Kim wrote:
>
>> index d597530..c96e0e4 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1538,6 +1538,7 @@ static void *get_partial_node(struct kmem_cache *s,
>> {
>> struct page *page, *page2;
>> void *object = NULL;
>> + int cpu_slab_objects = 0, pobjects = 0;
>
> We really need be clear here.
>
> One counter is for the numbe of objects in the per cpu slab and the other
> for the objects in tbhe per cpu partial lists.
>
> So I think the first name is ok. Second should be similar
>
> cpu_partial_objects?
>
Okay! It looks good.
But when using "cpu_partial_objects", I have a coding style problem:
if (kmem_cache_debug(s)
|| cpu_slab_objects + cpu_partial_objects
> s->max_cpu_object / 2)
Do you have any good ideas?
* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
2012-08-24 16:28 ` JoonSoo Kim
@ 2012-08-24 16:31 ` Christoph Lameter
2012-08-24 16:40 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-08-24 16:31 UTC (permalink / raw)
To: JoonSoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm
On Sat, 25 Aug 2012, JoonSoo Kim wrote:
> But, when using "cpu_partial_objects", I have a coding style problem.
>
> if (kmem_cache_debug(s)
> || cpu_slab_objects + cpu_partial_objects
> > s->max_cpu_object / 2)
>
> Do you have any good idea?
Not sure what the problem is? The line wrap?
Reduce the tabs for the third line?
* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
2012-08-24 16:31 ` Christoph Lameter
@ 2012-08-24 16:40 ` JoonSoo Kim
0 siblings, 0 replies; 95+ messages in thread
From: JoonSoo Kim @ 2012-08-24 16:40 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm
2012/8/25 Christoph Lameter <cl@linux.com>:
> On Sat, 25 Aug 2012, JoonSoo Kim wrote:
>
>> But, when using "cpu_partial_objects", I have a coding style problem.
>>
>> if (kmem_cache_debug(s)
>> || cpu_slab_objects + cpu_partial_objects
>> > s->max_cpu_object / 2)
>>
>> Do you have any good idea?
>
> Not sure what the problem is? The line wrap?
Yes! The line wrap.
if (kmem_cache_debug(s)
|| cpu_slab_objects + cpu_partial_objects >
s->max_cpu_object / 2)
break;
The example above uses 82 columns: the line-wrapping problem.
if (kmem_cache_debug(s) ||
cpu_slab_objects + cpu_partial_objects > s->max_cpu_object / 2)
break;
This one uses 79 columns, but it is somewhat ugly
because the second line starts at the same column as the line above it.
Is it okay?
if (kmem_cache_debug(s)
|| cpu_slab_objects + cpu_partial_objects
> s->max_cpu_object / 2)
break;
Is this the best?
It uses 72 columns.
Let me know what the best method is for this situation.
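For what it's worth, one further option that keeps the condition under
80 columns is to hoist the sum into a local variable (the name "cached"
here is hypothetical):
	int cached = cpu_slab_objects + cpu_partial_objects;
	if (kmem_cache_debug(s) || cached > s->max_cpu_object / 2)
		break;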
Thanks!
* [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab
[not found] <Yes>
` (3 preceding siblings ...)
2012-08-24 16:05 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Joonsoo Kim
@ 2012-08-25 14:11 ` Joonsoo Kim
2012-08-25 14:11 ` [PATCH 2/2] slab: fix starting index for finding another object Joonsoo Kim
2012-09-03 10:08 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
2012-10-20 15:48 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Joonsoo Kim
` (2 subsequent siblings)
7 siblings, 2 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-08-25 14:11 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman,
Christoph Lameter
Currently, we just do ClearSlabPfmemalloc() for the first page of a slab
when we clear the SlabPfmemalloc flag. This is a problem because in
__ac_put_obj() we sometimes test the flag of a page which is not the first
page of the slab.
So add code to do ClearSlabPfmemalloc() for all pages of the slab.
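To make the failure mode concrete, a commented sketch (kernel-flavored
pseudocode; the gfporder value is hypothetical):
/*
 * With cachep->gfporder == 2, a slab spans 4 contiguous pages and the
 * flag was cleared only on page 0:
 *
 *	[P0 cleared][P1 set][P2 set][P3 set]
 *
 * __ac_put_obj() does virt_to_page(objp); for an object sitting in P2
 * it still sees PageSlabPfmemalloc() and wrongly marks the object as
 * pfmemalloc, even though the flag was meant to be cleared for the
 * whole slab.
 */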
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
---
This patch is based on Pekka's slab/next tree
diff --git a/mm/slab.c b/mm/slab.c
index 3b4587b..45cf59a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -992,8 +992,11 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
*/
l3 = cachep->nodelists[numa_mem_id()];
if (!list_empty(&l3->slabs_free) && force_refill) {
- struct slab *slabp = virt_to_slab(objp);
- ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
+ int i, nr_pages = (1 << cachep->gfporder);
+ struct page *page = virt_to_head_page(objp);
+
+ for (i = 0; i < nr_pages; i++)
+ ClearPageSlabPfmemalloc(page + i);
clear_obj_pfmemalloc(&objp);
recheck_pfmemalloc_active(cachep, ac);
return objp;
--
1.7.9.5
* [PATCH 2/2] slab: fix starting index for finding another object
2012-08-25 14:11 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Joonsoo Kim
@ 2012-08-25 14:11 ` Joonsoo Kim
2012-09-03 10:08 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
1 sibling, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-08-25 14:11 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman,
Christoph Lameter
In the array cache, there is a valid object at index 0 as well, so the
scan for another object should start from index 0, not index 1.
Fix it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
diff --git a/mm/slab.c b/mm/slab.c
index 45cf59a..eb74bf5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -976,7 +976,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
}
/* The caller cannot use PFMEMALLOC objects, find another one */
- for (i = 1; i < ac->avail; i++) {
+ for (i = 0; i < ac->avail; i++) {
/* If a !PFMEMALLOC object is found, swap them */
if (!is_obj_pfmemalloc(ac->entry[i])) {
objp = ac->entry[i];
--
1.7.9.5
* Re: [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab
2012-08-25 14:11 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Joonsoo Kim
2012-08-25 14:11 ` [PATCH 2/2] slab: fix starting index for finding another object Joonsoo Kim
@ 2012-09-03 10:08 ` Mel Gorman
1 sibling, 0 replies; 95+ messages in thread
From: Mel Gorman @ 2012-09-03 10:08 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm, Christoph Lameter
It took me a while to get around to reviewing this due to attending
kernel summit. Sorry about that.
On Sat, Aug 25, 2012 at 11:11:10PM +0900, Joonsoo Kim wrote:
> Now, we just do ClearSlabPfmemalloc() for first page of slab
> when we clear SlabPfmemalloc flag. It is a problem because we sometimes
> test flag of page which is not first page of slab in __ac_put_obj().
>
Well spotted.
The impact is marginal as far as pfmemalloc protection is concerned. I do not
believe that any of the slabs that use high-order allocations are used in
the swap-over-network paths. It would be unfortunate if that ever changed.
> So add code to do ClearSlabPfmemalloc for all pages of slab.
>
I would prefer it if the pfmemalloc information were kept on the head page.
Would the following patch also address your concerns?
diff --git a/mm/slab.c b/mm/slab.c
index 811af03..d34a903 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
l3 = cachep->nodelists[numa_mem_id()];
if (!list_empty(&l3->slabs_free) && force_refill) {
struct slab *slabp = virt_to_slab(objp);
- ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
+ ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
clear_obj_pfmemalloc(&objp);
recheck_pfmemalloc_active(cachep, ac);
return objp;
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
{
if (unlikely(pfmemalloc_active)) {
/* Some pfmemalloc slabs exist, check if this is one */
- struct page *page = virt_to_page(objp);
+ struct page *page = virt_to_head_page(objp);
if (PageSlabPfmemalloc(page))
set_obj_pfmemalloc(&objp);
}
--
Mel Gorman
SUSE Labs
* [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions
[not found] <Yes>
` (4 preceding siblings ...)
2012-08-25 14:11 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Joonsoo Kim
@ 2012-10-20 15:48 ` Joonsoo Kim
2012-10-20 15:48 ` [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA Joonsoo Kim
2012-10-24 8:05 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Pekka Enberg
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
7 siblings, 2 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-20 15:48 UTC (permalink / raw)
To: Pekka Enberg, Andrew Morton
Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter
kmalloc() and kmalloc_node() are always inlined into generic code.
However, there is a mistake in the implementation of the SLUB.
In kmalloc() and kmalloc_node() of the SLUB,
we try to compare kmalloc_caches[index] with NULL.
As this cannot be known at compile time,
the comparison is inserted into the generic code invoking kmalloc*.
This may decrease system performance, so we should fix it.
Below is the result of "size vmlinux"; text size is decreased by roughly 20KB.
Before:
text data bss dec hex filename
10044177 1443168 5722112 17209457 1069871 vmlinux
After:
text data bss dec hex filename
10022627 1443136 5722112 17187875 1064423 vmlinux
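A stand-alone toy (mock names shadowing the kernel's, table heavily
truncated) showing why indexing by a compile-time constant lets the check
fold away, while testing the cache pointer itself requires a runtime load:
#include <stdio.h>
static void *kmalloc_caches[16];	/* mock; the kernel fills this at boot */
static inline int kmalloc_index(unsigned long size)
{
	if (!size)
		return 0;
	if (size <= 8)
		return 3;
	if (size <= 16)
		return 4;
	return 5;	/* truncated for the sketch */
}
int main(void)
{
	int index = kmalloc_index(16);	/* constant-folds to 4 */
	/* "if (!index)" is decided at compile time; comparing
	 * kmalloc_caches[index] against NULL would be a runtime load. */
	if (!index)
		return 1;
	printf("index=%d cache=%p\n", index, kmalloc_caches[index]);
	return 0;
}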
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
---
With Christoph's patchset (common kmalloc caches:
'[15/15] Common Kmalloc cache determination'), which is not merged into mainline yet,
this issue will be fixed.
As that will take some time, I am sending this patch for v3.7.
This patch is based on v3.7-rc1.
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index df448ad..4c75f2b 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -271,9 +271,10 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
return kmalloc_large(size, flags);
if (!(flags & SLUB_DMA)) {
- struct kmem_cache *s = kmalloc_slab(size);
+ int index = kmalloc_index(size);
+ struct kmem_cache *s = kmalloc_caches[index];
- if (!s)
+ if (!index)
return ZERO_SIZE_PTR;
return kmem_cache_alloc_trace(s, flags, size);
@@ -304,9 +305,10 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
if (__builtin_constant_p(size) &&
size <= SLUB_MAX_SIZE && !(flags & SLUB_DMA)) {
- struct kmem_cache *s = kmalloc_slab(size);
+ int index = kmalloc_index(size);
+ struct kmem_cache *s = kmalloc_caches[index];
- if (!s)
+ if (!index)
return ZERO_SIZE_PTR;
return kmem_cache_alloc_node_trace(s, flags, node, size);
--
1.7.9.5
* [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
2012-10-20 15:48 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Joonsoo Kim
@ 2012-10-20 15:48 ` Joonsoo Kim
2012-10-22 14:31 ` Christoph Lameter
2012-10-24 8:05 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Pekka Enberg
1 sibling, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-20 15:48 UTC (permalink / raw)
To: Pekka Enberg, Andrew Morton
Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter
kmalloc() and kmalloc_node() of the SLUB aren't inlined when @flags = __GFP_DMA.
This patch optimizes this case,
so that when @flags = __GFP_DMA, they will be inlined into generic code.
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 4c75f2b..4adf50b 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -147,6 +147,7 @@ struct kmem_cache {
* 2^x bytes of allocations.
*/
extern struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];
+extern struct kmem_cache *kmalloc_dma_caches[SLUB_PAGE_SHIFT];
/*
* Sorry that the following has to be that ugly but some versions of GCC
@@ -266,19 +267,24 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
+ struct kmem_cache *s;
+ int index;
+
if (__builtin_constant_p(size)) {
if (size > SLUB_MAX_SIZE)
return kmalloc_large(size, flags);
- if (!(flags & SLUB_DMA)) {
- int index = kmalloc_index(size);
- struct kmem_cache *s = kmalloc_caches[index];
-
- if (!index)
- return ZERO_SIZE_PTR;
+ index = kmalloc_index(size);
+ if (!index)
+ return ZERO_SIZE_PTR;
+#ifdef CONFIG_ZONE_DMA
+ if (unlikely(flags & SLUB_DMA)) {
+ s = kmalloc_dma_caches[index];
+ } else
+#endif
+ s = kmalloc_caches[index];
- return kmem_cache_alloc_trace(s, flags, size);
- }
+ return kmem_cache_alloc_trace(s, flags, size);
}
return __kmalloc(size, flags);
}
@@ -303,13 +309,19 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s,
static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
- if (__builtin_constant_p(size) &&
- size <= SLUB_MAX_SIZE && !(flags & SLUB_DMA)) {
- int index = kmalloc_index(size);
- struct kmem_cache *s = kmalloc_caches[index];
+ struct kmem_cache *s;
+ int index;
+ if (__builtin_constant_p(size) && size <= SLUB_MAX_SIZE) {
+ index = kmalloc_index(size);
if (!index)
return ZERO_SIZE_PTR;
+#ifdef CONFIG_ZONE_DMA
+ if (unlikely(flags & SLUB_DMA)) {
+ s = kmalloc_dma_caches[index];
+ } else
+#endif
+ s = kmalloc_caches[index];
return kmem_cache_alloc_node_trace(s, flags, node, size);
}
diff --git a/mm/slub.c b/mm/slub.c
index a0d6984..a94533c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3222,7 +3222,8 @@ struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];
EXPORT_SYMBOL(kmalloc_caches);
#ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_dma_caches[SLUB_PAGE_SHIFT];
+struct kmem_cache *kmalloc_dma_caches[SLUB_PAGE_SHIFT];
+EXPORT_SYMBOL(kmalloc_dma_caches);
#endif
static int __init setup_slub_min_order(char *str)
--
1.7.9.5
* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
2012-10-20 15:48 ` [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA Joonsoo Kim
@ 2012-10-22 14:31 ` Christoph Lameter
2012-10-23 2:29 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Christoph Lameter @ 2012-10-22 14:31 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Pekka Enberg, Andrew Morton, linux-kernel, linux-mm
On Sun, 21 Oct 2012, Joonsoo Kim wrote:
> kmalloc() and kmalloc_node() of the SLUB isn't inlined when @flags = __GFP_DMA.
> This patch optimize this case,
> so when @flags = __GFP_DMA, it will be inlined into generic code.
__GFP_DMA is a rarely used flag for kmalloc allocators, and so far it was
not considered worth supporting directly in the inlining code.
* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
2012-10-22 14:31 ` Christoph Lameter
@ 2012-10-23 2:29 ` JoonSoo Kim
2012-10-23 6:16 ` Eric Dumazet
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-10-23 2:29 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Pekka Enberg, Andrew Morton, linux-kernel, linux-mm
2012/10/22 Christoph Lameter <cl@linux.com>:
> On Sun, 21 Oct 2012, Joonsoo Kim wrote:
>
>> kmalloc() and kmalloc_node() of the SLUB isn't inlined when @flags = __GFP_DMA.
>> This patch optimize this case,
>> so when @flags = __GFP_DMA, it will be inlined into generic code.
>
> __GFP_DMA is a rarely used flag for kmalloc allocators and so far it was
> not considered that it is worth to directly support it in the inlining
> code.
>
>
Hmm... but the SLAB already does that optimization for __GFP_DMA.
Almost every kmalloc() is invoked with a constant flags value,
so I think the overhead from this patch may be negligible.
With this patch, the code size of vmlinux is reduced slightly.
* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
2012-10-23 2:29 ` JoonSoo Kim
@ 2012-10-23 6:16 ` Eric Dumazet
2012-10-23 16:12 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Eric Dumazet @ 2012-10-23 6:16 UTC (permalink / raw)
To: JoonSoo Kim
Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, linux-kernel,
linux-mm
On Tue, 2012-10-23 at 11:29 +0900, JoonSoo Kim wrote:
> 2012/10/22 Christoph Lameter <cl@linux.com>:
> > On Sun, 21 Oct 2012, Joonsoo Kim wrote:
> >
> >> kmalloc() and kmalloc_node() of the SLUB isn't inlined when @flags = __GFP_DMA.
> >> This patch optimize this case,
> >> so when @flags = __GFP_DMA, it will be inlined into generic code.
> >
> > __GFP_DMA is a rarely used flag for kmalloc allocators and so far it was
> > not considered that it is worth to directly support it in the inlining
> > code.
> >
> >
>
> Hmm... but, the SLAB already did that optimization for __GFP_DMA.
> Almost every kmalloc() is invoked with constant flags value,
> so I think that overhead from this patch may be negligible.
> With this patch, code size of vmlinux is reduced slightly.
Only because you built an allyesconfig.
GFP_DMA is used for less than 0.1 % of kmalloc() calls, for legacy
hardware (from the last century).
In fact, if you want to shrink your vmlinux even more, you could test
if (__builtin_constant_p(flags) && (flags & SLUB_DMA))
return kmem_cache_alloc_trace(s, flags, size);
to force the call to out-of-line code.
* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
2012-10-23 6:16 ` Eric Dumazet
@ 2012-10-23 16:12 ` JoonSoo Kim
0 siblings, 0 replies; 95+ messages in thread
From: JoonSoo Kim @ 2012-10-23 16:12 UTC (permalink / raw)
To: Eric Dumazet
Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, linux-kernel,
linux-mm
Hi, Eric.
2012/10/23 Eric Dumazet <eric.dumazet@gmail.com>:
> On Tue, 2012-10-23 at 11:29 +0900, JoonSoo Kim wrote:
>> 2012/10/22 Christoph Lameter <cl@linux.com>:
>> > On Sun, 21 Oct 2012, Joonsoo Kim wrote:
>> >
>> >> kmalloc() and kmalloc_node() of the SLUB isn't inlined when @flags = __GFP_DMA.
>> >> This patch optimize this case,
>> >> so when @flags = __GFP_DMA, it will be inlined into generic code.
>> >
>> > __GFP_DMA is a rarely used flag for kmalloc allocators and so far it was
>> > not considered that it is worth to directly support it in the inlining
>> > code.
>> >
>> >
>>
>> Hmm... but, the SLAB already did that optimization for __GFP_DMA.
>> Almost every kmalloc() is invoked with constant flags value,
>> so I think that overhead from this patch may be negligible.
>> With this patch, code size of vmlinux is reduced slightly.
>
> Only because you asked a allyesconfig
>
> GFP_DMA is used for less than 0.1 % of kmalloc() calls, for legacy
> hardware (from last century)
I'm not using an allyesconfig,
but a localmodconfig on my Ubuntu desktop system.
On my system, 700 bytes of vmlinux text is removed,
which means there may be more than 100 call sites with GFP_DMA.
> In fact if you want to reduce even more your vmlinux, you could test
>
> if (__builtin_constant_p(flags) && (flags & SLUB_DMA))
> return kmem_cache_alloc_trace(s, flags, size);
>
> to force the call to out of line code.
The reason I mention code size is that I want to say it may
be good for performance, although the impact is small.
I'm not interested in reducing code size :)
Thanks for the comment.
* Re: [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions
2012-10-20 15:48 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Joonsoo Kim
2012-10-20 15:48 ` [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA Joonsoo Kim
@ 2012-10-24 8:05 ` Pekka Enberg
2012-10-24 13:36 ` Christoph Lameter
1 sibling, 1 reply; 95+ messages in thread
From: Pekka Enberg @ 2012-10-24 8:05 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Christoph Lameter
On Sat, Oct 20, 2012 at 6:48 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> kmalloc() and kmalloc_node() is always inlined into generic code.
> However, there is a mistake in implemention of the SLUB.
>
> In kmalloc() and kmalloc_node() of the SLUB,
> we try to compare kmalloc_caches[index] with NULL.
> As it cannot be known at compile time,
> this comparison is inserted into generic code invoking kmalloc*.
> This may decrease system performance, so we should fix it.
>
> Below is the result of "size vmlinux"
> text size is decreased roughly 20KB
>
> Before:
> text data bss dec hex filename
> 10044177 1443168 5722112 17209457 1069871 vmlinux
> After:
> text data bss dec hex filename
> 10022627 1443136 5722112 17187875 1064423 vmlinux
>
> Cc: Christoph Lameter <cl@linux.com>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> ---
> With Christoph's patchset(common kmalloc caches:
> '[15/15] Common Kmalloc cache determination') which is not merged into mainline yet,
> this issue will be fixed.
> As it takes some time, I send this patch for v3.7
>
> This patch is based on v3.7-rc1
Looks reasonable to me. Christoph, any objections?
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index df448ad..4c75f2b 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -271,9 +271,10 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
> return kmalloc_large(size, flags);
>
> if (!(flags & SLUB_DMA)) {
> - struct kmem_cache *s = kmalloc_slab(size);
> + int index = kmalloc_index(size);
> + struct kmem_cache *s = kmalloc_caches[index];
>
> - if (!s)
> + if (!index)
> return ZERO_SIZE_PTR;
>
> return kmem_cache_alloc_trace(s, flags, size);
> @@ -304,9 +305,10 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
> {
> if (__builtin_constant_p(size) &&
> size <= SLUB_MAX_SIZE && !(flags & SLUB_DMA)) {
> - struct kmem_cache *s = kmalloc_slab(size);
> + int index = kmalloc_index(size);
> + struct kmem_cache *s = kmalloc_caches[index];
>
> - if (!s)
> + if (!index)
> return ZERO_SIZE_PTR;
>
> return kmem_cache_alloc_node_trace(s, flags, node, size);
> --
> 1.7.9.5
>
* Re: [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions
2012-10-24 8:05 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Pekka Enberg
@ 2012-10-24 13:36 ` Christoph Lameter
0 siblings, 0 replies; 95+ messages in thread
From: Christoph Lameter @ 2012-10-24 13:36 UTC (permalink / raw)
To: Pekka Enberg; +Cc: Joonsoo Kim, Andrew Morton, linux-kernel, linux-mm
On Wed, 24 Oct 2012, Pekka Enberg wrote:
> Looks reasonable to me. Christoph, any objections?
I am fine with it. It's going to be short-lived because my latest patchset
will do the same. Can we merge this for 3.7?
* [PATCH 0/5] minor clean-up and optimize highmem related code
[not found] <Yes>
` (5 preceding siblings ...)
2012-10-20 15:48 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Joonsoo Kim
@ 2012-10-28 19:12 ` Joonsoo Kim
2012-10-28 19:12 ` [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
` (5 more replies)
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
7 siblings, 6 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim
This patchset cleans up and optimizes highmem-related code.
[1] is just a clean-up and doesn't introduce any functional change.
[2-3] are for clean-up and optimization.
These eliminate a useless lock operation and list management.
[4-5] are for optimization related to flush_all_zero_pkmaps().
Joonsoo Kim (5):
mm, highmem: use PKMAP_NR() to calculate an index of pkmap
mm, highmem: remove useless pool_lock
mm, highmem: remove page_address_pool list
mm, highmem: makes flush_all_zero_pkmaps() return index of last
flushed entry
mm, highmem: get virtual address of the page using PKMAP_ADDR()
include/linux/highmem.h | 1 +
mm/highmem.c | 102 ++++++++++++++++++++---------------------------
2 files changed, 45 insertions(+), 58 deletions(-)
--
1.7.9.5
* [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
@ 2012-10-28 19:12 ` Joonsoo Kim
2012-10-29 1:48 ` Minchan Kim
2012-10-28 19:12 ` [PATCH 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
` (4 subsequent siblings)
5 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman
To calculate the index of a pkmap, using PKMAP_NR() is more understandable
and maintainable, so change it.
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
diff --git a/mm/highmem.c b/mm/highmem.c
index d517cd1..b3b3d68 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -99,7 +99,7 @@ struct page *kmap_to_page(void *vaddr)
unsigned long addr = (unsigned long)vaddr;
if (addr >= PKMAP_ADDR(0) && addr <= PKMAP_ADDR(LAST_PKMAP)) {
- int i = (addr - PKMAP_ADDR(0)) >> PAGE_SHIFT;
+ int i = PKMAP_NR(addr);
return pte_page(pkmap_page_table[i]);
}
--
1.7.9.5
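
As a sanity check of the equivalence the patch relies on, PKMAP_NR() and the
open-coded expression can be compared in isolation. A toy user-space sketch
with assumed values for PAGE_SHIFT, LAST_PKMAP and PKMAP_BASE (the real ones
come from asm/highmem.h and vary by architecture):

#include <assert.h>
#include <stdio.h>

/* Assumed toy values; the real ones come from asm/highmem.h */
#define PAGE_SHIFT	12
#define LAST_PKMAP	1024
#define PKMAP_BASE	0xff800000UL

#define PKMAP_ADDR(nr)	(PKMAP_BASE + ((unsigned long)(nr) << PAGE_SHIFT))
#define PKMAP_NR(virt)	(((virt) - PKMAP_BASE) >> PAGE_SHIFT)

int main(void)
{
	for (unsigned long i = 0; i < LAST_PKMAP; i++) {
		unsigned long addr = PKMAP_ADDR(i);

		/* PKMAP_NR() is the inverse of PKMAP_ADDR() ... */
		assert(PKMAP_NR(addr) == i);
		/* ... and agrees with the open-coded form it replaces */
		assert(((addr - PKMAP_ADDR(0)) >> PAGE_SHIFT) == i);
	}
	printf("both forms agree for all %d slots\n", LAST_PKMAP);
	return 0;
}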
* [PATCH 2/5] mm, highmem: remove useless pool_lock
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
2012-10-28 19:12 ` [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
@ 2012-10-28 19:12 ` Joonsoo Kim
2012-10-29 1:52 ` Minchan Kim
2012-10-30 21:31 ` Andrew Morton
2012-10-28 19:12 ` [PATCH 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
` (3 subsequent siblings)
5 siblings, 2 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim
The pool_lock protects the page_address_pool from concurrent access.
But access to the page_address_pool is already protected by kmap_lock.
So remove it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
diff --git a/mm/highmem.c b/mm/highmem.c
index b3b3d68..017bad1 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -328,7 +328,6 @@ struct page_address_map {
* page_address_map freelist, allocated from page_address_maps.
*/
static struct list_head page_address_pool; /* freelist */
-static spinlock_t pool_lock; /* protects page_address_pool */
/*
* Hash table bucket
@@ -395,11 +394,9 @@ void set_page_address(struct page *page, void *virtual)
if (virtual) { /* Add */
BUG_ON(list_empty(&page_address_pool));
- spin_lock_irqsave(&pool_lock, flags);
pam = list_entry(page_address_pool.next,
struct page_address_map, list);
list_del(&pam->list);
- spin_unlock_irqrestore(&pool_lock, flags);
pam->page = page;
pam->virtual = virtual;
@@ -413,9 +410,7 @@ void set_page_address(struct page *page, void *virtual)
if (pam->page == page) {
list_del(&pam->list);
spin_unlock_irqrestore(&pas->lock, flags);
- spin_lock_irqsave(&pool_lock, flags);
list_add_tail(&pam->list, &page_address_pool);
- spin_unlock_irqrestore(&pool_lock, flags);
goto done;
}
}
@@ -438,7 +433,6 @@ void __init page_address_init(void)
INIT_LIST_HEAD(&page_address_htable[i].lh);
spin_lock_init(&page_address_htable[i].lock);
}
- spin_lock_init(&pool_lock);
}
#endif /* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */
--
1.7.9.5
* [PATCH 3/5] mm, highmem: remove page_address_pool list
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
2012-10-28 19:12 ` [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
2012-10-28 19:12 ` [PATCH 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
@ 2012-10-28 19:12 ` Joonsoo Kim
2012-10-29 1:57 ` Minchan Kim
2012-10-28 19:12 ` [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry Joonsoo Kim
` (2 subsequent siblings)
5 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim
We can find a free page_address_map instance without the page_address_pool.
So remove it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
diff --git a/mm/highmem.c b/mm/highmem.c
index 017bad1..731cf9a 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -323,11 +323,7 @@ struct page_address_map {
void *virtual;
struct list_head list;
};
-
-/*
- * page_address_map freelist, allocated from page_address_maps.
- */
-static struct list_head page_address_pool; /* freelist */
+static struct page_address_map page_address_maps[LAST_PKMAP];
/*
* Hash table bucket
@@ -392,12 +388,7 @@ void set_page_address(struct page *page, void *virtual)
pas = page_slot(page);
if (virtual) { /* Add */
- BUG_ON(list_empty(&page_address_pool));
-
- pam = list_entry(page_address_pool.next,
- struct page_address_map, list);
- list_del(&pam->list);
-
+ pam = &page_address_maps[PKMAP_NR((unsigned long)virtual)];
pam->page = page;
pam->virtual = virtual;
@@ -410,7 +401,6 @@ void set_page_address(struct page *page, void *virtual)
if (pam->page == page) {
list_del(&pam->list);
spin_unlock_irqrestore(&pas->lock, flags);
- list_add_tail(&pam->list, &page_address_pool);
goto done;
}
}
@@ -420,15 +410,10 @@ done:
return;
}
-static struct page_address_map page_address_maps[LAST_PKMAP];
-
void __init page_address_init(void)
{
int i;
- INIT_LIST_HEAD(&page_address_pool);
- for (i = 0; i < ARRAY_SIZE(page_address_maps); i++)
- list_add(&page_address_maps[i].list, &page_address_pool);
for (i = 0; i < ARRAY_SIZE(page_address_htable); i++) {
INIT_LIST_HEAD(&page_address_htable[i].lh);
spin_lock_init(&page_address_htable[i].lock);
--
1.7.9.5
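
The idea behind dropping the freelist can be shown in miniature: once each
pkmap slot owns exactly one page_address_map entry, the entry for a given
virtual address can be computed directly, so nothing needs to be allocated
from or recycled through a list. A toy sketch with invented names and assumed
constants:

#include <assert.h>
#include <stddef.h>

/* Assumed toy values; the real ones come from asm/highmem.h */
#define PAGE_SHIFT	12
#define LAST_PKMAP	1024
#define PKMAP_BASE	0xff800000UL
#define PKMAP_NR(virt)	(((virt) - PKMAP_BASE) >> PAGE_SHIFT)

struct map_entry {		/* stand-in for struct page_address_map */
	const void *page;
	unsigned long virtual;
};

/* Exactly one entry per pkmap slot, so a freelist adds nothing */
static struct map_entry maps[LAST_PKMAP];

static struct map_entry *entry_for(unsigned long virtual)
{
	/* direct indexing replaces list_del() on a shared pool */
	return &maps[PKMAP_NR(virtual)];
}

int main(void)
{
	unsigned long va = PKMAP_BASE + (7UL << PAGE_SHIFT);

	assert(entry_for(va) == &maps[7]);
	return 0;
}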
* [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
` (2 preceding siblings ...)
2012-10-28 19:12 ` [PATCH 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
@ 2012-10-28 19:12 ` Joonsoo Kim
2012-10-29 2:06 ` Minchan Kim
2012-10-28 19:12 ` [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
2012-10-29 2:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Minchan Kim
5 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim
In the current code, after flush_all_zero_pkmaps() is invoked,
we re-iterate over all pkmaps. This can be optimized if flush_all_zero_pkmaps()
returns the index of a flushed entry. With this index,
we can immediately map a highmem page to the virtual address represented by that index.
So change the return type of flush_all_zero_pkmaps()
and return the index of the last flushed entry.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ef788b5..0683869 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
#ifdef CONFIG_HIGHMEM
#include <asm/highmem.h>
+#define PKMAP_INDEX_INVAL (-1)
/* declarations for linux/mm/highmem.c */
unsigned int nr_free_highpages(void);
diff --git a/mm/highmem.c b/mm/highmem.c
index 731cf9a..65beb9a 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
return virt_to_page(addr);
}
-static void flush_all_zero_pkmaps(void)
+static int flush_all_zero_pkmaps(void)
{
int i;
- int need_flush = 0;
+ int index = PKMAP_INDEX_INVAL;
flush_cache_kmaps();
@@ -141,10 +141,12 @@ static void flush_all_zero_pkmaps(void)
&pkmap_page_table[i]);
set_page_address(page, NULL);
- need_flush = 1;
+ index = i;
}
- if (need_flush)
+ if (index != PKMAP_INDEX_INVAL)
flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+
+ return index;
}
/**
@@ -160,6 +162,7 @@ void kmap_flush_unused(void)
static inline unsigned long map_new_virtual(struct page *page)
{
unsigned long vaddr;
+ int index = PKMAP_INDEX_INVAL;
int count;
start:
@@ -168,40 +171,45 @@ start:
for (;;) {
last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
if (!last_pkmap_nr) {
- flush_all_zero_pkmaps();
- count = LAST_PKMAP;
+ index = flush_all_zero_pkmaps();
+ if (index != PKMAP_INDEX_INVAL)
+ break; /* Found a usable entry */
}
- if (!pkmap_count[last_pkmap_nr])
+ if (!pkmap_count[last_pkmap_nr]) {
+ index = last_pkmap_nr;
break; /* Found a usable entry */
- if (--count)
- continue;
-
- /*
- * Sleep for somebody else to unmap their entries
- */
- {
- DECLARE_WAITQUEUE(wait, current);
-
- __set_current_state(TASK_UNINTERRUPTIBLE);
- add_wait_queue(&pkmap_map_wait, &wait);
- unlock_kmap();
- schedule();
- remove_wait_queue(&pkmap_map_wait, &wait);
- lock_kmap();
-
- /* Somebody else might have mapped it while we slept */
- if (page_address(page))
- return (unsigned long)page_address(page);
-
- /* Re-start */
- goto start;
}
+ if (--count == 0)
+ break;
}
- vaddr = PKMAP_ADDR(last_pkmap_nr);
+
+ /*
+ * Sleep for somebody else to unmap their entries
+ */
+ if (index == PKMAP_INDEX_INVAL) {
+ DECLARE_WAITQUEUE(wait, current);
+
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ add_wait_queue(&pkmap_map_wait, &wait);
+ unlock_kmap();
+ schedule();
+ remove_wait_queue(&pkmap_map_wait, &wait);
+ lock_kmap();
+
+ /* Somebody else might have mapped it while we slept */
+ vaddr = (unsigned long)page_address(page);
+ if (vaddr)
+ return vaddr;
+
+ /* Re-start */
+ goto start;
+ }
+
+ vaddr = PKMAP_ADDR(index);
set_pte_at(&init_mm, vaddr,
- &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
+ &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
- pkmap_count[last_pkmap_nr] = 1;
+ pkmap_count[index] = 1;
set_page_address(page, (void *)vaddr);
return vaddr;
--
1.7.9.5
* [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR()
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
` (3 preceding siblings ...)
2012-10-28 19:12 ` [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry Joonsoo Kim
@ 2012-10-28 19:12 ` Joonsoo Kim
2012-10-29 2:09 ` Minchan Kim
2012-10-29 2:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Minchan Kim
5 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim
In flush_all_zero_pkmaps(), we have the index of the pkmap associated with the page.
Using this index, we can simply get the virtual address of the page.
So change it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
diff --git a/mm/highmem.c b/mm/highmem.c
index 65beb9a..1417f4f 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -137,8 +137,7 @@ static int flush_all_zero_pkmaps(void)
* So no dangers, even with speculative execution.
*/
page = pte_page(pkmap_page_table[i]);
- pte_clear(&init_mm, (unsigned long)page_address(page),
- &pkmap_page_table[i]);
+ pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
set_page_address(page, NULL);
index = i;
--
1.7.9.5
* Re: [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap
2012-10-28 19:12 ` [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
@ 2012-10-29 1:48 ` Minchan Kim
0 siblings, 0 replies; 95+ messages in thread
From: Minchan Kim @ 2012-10-29 1:48 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman
On Mon, Oct 29, 2012 at 04:12:52AM +0900, Joonsoo Kim wrote:
> To calculate the index of a pkmap, using PKMAP_NR() is more understandable
> and maintainable, so change it.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
--
Kind regards,
Minchan Kim
* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
2012-10-28 19:12 ` [PATCH 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
@ 2012-10-29 1:52 ` Minchan Kim
2012-10-30 21:31 ` Andrew Morton
1 sibling, 0 replies; 95+ messages in thread
From: Minchan Kim @ 2012-10-29 1:52 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm
On Mon, Oct 29, 2012 at 04:12:53AM +0900, Joonsoo Kim wrote:
> The pool_lock protects the page_address_pool from concurrent access.
> But access to the page_address_pool is already protected by kmap_lock.
> So remove it.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Looks good to me.
Just a nitpick:
please write a comment about the locking rule, like below.
>
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b3b3d68..017bad1 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -328,7 +328,6 @@ struct page_address_map {
> * page_address_map freelist, allocated from page_address_maps.
> */
/* page_address_pool is protected by kmap_lock */
> static struct list_head page_address_pool; /* freelist */
> -static spinlock_t pool_lock; /* protects page_address_pool */
>
--
Kind regards,
Minchan Kim
* Re: [PATCH 3/5] mm, highmem: remove page_address_pool list
2012-10-28 19:12 ` [PATCH 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
@ 2012-10-29 1:57 ` Minchan Kim
0 siblings, 0 replies; 95+ messages in thread
From: Minchan Kim @ 2012-10-29 1:57 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm
On Mon, Oct 29, 2012 at 04:12:54AM +0900, Joonsoo Kim wrote:
> We can find a free page_address_map instance without the page_address_pool.
> So remove it.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
See the nitpick below. :)
>
> diff --git a/mm/highmem.c b/mm/highmem.c
> index 017bad1..731cf9a 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -323,11 +323,7 @@ struct page_address_map {
> void *virtual;
> struct list_head list;
> };
> -
Let's leave a blank line.
--
Kind regards,
Minchan Kim
* Re: [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry
2012-10-28 19:12 ` [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry Joonsoo Kim
@ 2012-10-29 2:06 ` Minchan Kim
2012-10-29 13:12 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Minchan Kim @ 2012-10-29 2:06 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm
On Mon, Oct 29, 2012 at 04:12:55AM +0900, Joonsoo Kim wrote:
> In the current code, after flush_all_zero_pkmaps() is invoked,
> we re-iterate over all pkmaps. This can be optimized if flush_all_zero_pkmaps()
> returns the index of a flushed entry. With this index,
> we can immediately map a highmem page to the virtual address represented by that index.
> So change the return type of flush_all_zero_pkmaps()
> and return the index of the last flushed entry.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index ef788b5..0683869 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>
> #ifdef CONFIG_HIGHMEM
> #include <asm/highmem.h>
> +#define PKMAP_INDEX_INVAL (-1)
How about this?
#define PKMAP_INVALID_INDEX (-1)
>
> /* declarations for linux/mm/highmem.c */
> unsigned int nr_free_highpages(void);
> diff --git a/mm/highmem.c b/mm/highmem.c
> index 731cf9a..65beb9a 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> return virt_to_page(addr);
> }
>
> -static void flush_all_zero_pkmaps(void)
> +static int flush_all_zero_pkmaps(void)
> {
> int i;
> - int need_flush = 0;
> + int index = PKMAP_INDEX_INVAL;
>
> flush_cache_kmaps();
>
> @@ -141,10 +141,12 @@ static void flush_all_zero_pkmaps(void)
> &pkmap_page_table[i]);
>
> set_page_address(page, NULL);
> - need_flush = 1;
> + index = i;
How about returning the first free index instead of the last one,
and updating last_pkmap_nr to it?
> }
> - if (need_flush)
> + if (index != PKMAP_INDEX_INVAL)
> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +
> + return index;
> }
>
> /**
> @@ -160,6 +162,7 @@ void kmap_flush_unused(void)
> static inline unsigned long map_new_virtual(struct page *page)
> {
> unsigned long vaddr;
> + int index = PKMAP_INDEX_INVAL;
> int count;
>
> start:
> @@ -168,40 +171,45 @@ start:
> for (;;) {
> last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
> if (!last_pkmap_nr) {
> - flush_all_zero_pkmaps();
> - count = LAST_PKMAP;
> + index = flush_all_zero_pkmaps();
> + if (index != PKMAP_INDEX_INVAL)
> + break; /* Found a usable entry */
> }
> - if (!pkmap_count[last_pkmap_nr])
> + if (!pkmap_count[last_pkmap_nr]) {
> + index = last_pkmap_nr;
> break; /* Found a usable entry */
> - if (--count)
> - continue;
> -
> - /*
> - * Sleep for somebody else to unmap their entries
> - */
> - {
> - DECLARE_WAITQUEUE(wait, current);
> -
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> - add_wait_queue(&pkmap_map_wait, &wait);
> - unlock_kmap();
> - schedule();
> - remove_wait_queue(&pkmap_map_wait, &wait);
> - lock_kmap();
> -
> - /* Somebody else might have mapped it while we slept */
> - if (page_address(page))
> - return (unsigned long)page_address(page);
> -
> - /* Re-start */
> - goto start;
> }
> + if (--count == 0)
> + break;
> }
> - vaddr = PKMAP_ADDR(last_pkmap_nr);
> +
> + /*
> + * Sleep for somebody else to unmap their entries
> + */
> + if (index == PKMAP_INDEX_INVAL) {
> + DECLARE_WAITQUEUE(wait, current);
> +
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> + add_wait_queue(&pkmap_map_wait, &wait);
> + unlock_kmap();
> + schedule();
> + remove_wait_queue(&pkmap_map_wait, &wait);
> + lock_kmap();
> +
> + /* Somebody else might have mapped it while we slept */
> + vaddr = (unsigned long)page_address(page);
> + if (vaddr)
> + return vaddr;
> +
> + /* Re-start */
> + goto start;
> + }
> +
> + vaddr = PKMAP_ADDR(index);
> set_pte_at(&init_mm, vaddr,
> - &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
> + &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
>
> - pkmap_count[last_pkmap_nr] = 1;
> + pkmap_count[index] = 1;
> set_page_address(page, (void *)vaddr);
>
> return vaddr;
> --
> 1.7.9.5
>
--
Kind regards,
Minchan Kim
* Re: [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR()
2012-10-28 19:12 ` [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
@ 2012-10-29 2:09 ` Minchan Kim
0 siblings, 0 replies; 95+ messages in thread
From: Minchan Kim @ 2012-10-29 2:09 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm
On Mon, Oct 29, 2012 at 04:12:56AM +0900, Joonsoo Kim wrote:
> In flush_all_zero_pkmaps(), we have the index of the pkmap associated with the page.
> Using this index, we can simply get the virtual address of the page.
> So change it.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
--
Kind regards,
Minchan Kim
* Re: [PATCH 0/5] minor clean-up and optimize highmem related code
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
` (4 preceding siblings ...)
2012-10-28 19:12 ` [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
@ 2012-10-29 2:12 ` Minchan Kim
2012-10-29 13:15 ` JoonSoo Kim
5 siblings, 1 reply; 95+ messages in thread
From: Minchan Kim @ 2012-10-29 2:12 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Peter Zijlstra
Hi Joonsoo,
On Mon, Oct 29, 2012 at 04:12:51AM +0900, Joonsoo Kim wrote:
> This patchset cleans up and optimizes highmem-related code.
>
> [1] is just a clean-up and doesn't introduce any functional change.
> [2-3] are for clean-up and optimization.
> These eliminate a useless lock operation and list management.
> [4-5] are for optimization related to flush_all_zero_pkmaps().
>
> Joonsoo Kim (5):
> mm, highmem: use PKMAP_NR() to calculate an index of pkmap
> mm, highmem: remove useless pool_lock
> mm, highmem: remove page_address_pool list
> mm, highmem: makes flush_all_zero_pkmaps() return index of last
> flushed entry
> mm, highmem: get virtual address of the page using PKMAP_ADDR()
This patchset looks awesome to me.
If you have a plan to respin, please CC Peter.
--
Kind regards,
Minchan Kim
* Re: [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry
2012-10-29 2:06 ` Minchan Kim
@ 2012-10-29 13:12 ` JoonSoo Kim
0 siblings, 0 replies; 95+ messages in thread
From: JoonSoo Kim @ 2012-10-29 13:12 UTC (permalink / raw)
To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, linux-mm
2012/10/29 Minchan Kim <minchan@kernel.org>:
> On Mon, Oct 29, 2012 at 04:12:55AM +0900, Joonsoo Kim wrote:
>> In the current code, after flush_all_zero_pkmaps() is invoked,
>> we re-iterate over all pkmaps. This can be optimized if flush_all_zero_pkmaps()
>> returns the index of a flushed entry. With this index,
>> we can immediately map a highmem page to the virtual address represented by that index.
>> So change the return type of flush_all_zero_pkmaps()
>> and return the index of the last flushed entry.
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index ef788b5..0683869 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>>
>> #ifdef CONFIG_HIGHMEM
>> #include <asm/highmem.h>
>> +#define PKMAP_INDEX_INVAL (-1)
>
> How about this?
>
> #define PKMAP_INVALID_INDEX (-1)
Okay.
>>
>> /* declarations for linux/mm/highmem.c */
>> unsigned int nr_free_highpages(void);
>> diff --git a/mm/highmem.c b/mm/highmem.c
>> index 731cf9a..65beb9a 100644
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> return virt_to_page(addr);
>> }
>>
>> -static void flush_all_zero_pkmaps(void)
>> +static int flush_all_zero_pkmaps(void)
>> {
>> int i;
>> - int need_flush = 0;
>> + int index = PKMAP_INDEX_INVAL;
>>
>> flush_cache_kmaps();
>>
>> @@ -141,10 +141,12 @@ static void flush_all_zero_pkmaps(void)
>> &pkmap_page_table[i]);
>>
>> set_page_address(page, NULL);
>> - need_flush = 1;
>> + index = i;
>
> How about returning the first free index instead of the last one,
> and updating last_pkmap_nr to it?
Okay. That will be even better.
Thanks.
* Re: [PATCH 0/5] minor clean-up and optimize highmem related code
2012-10-29 2:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Minchan Kim
@ 2012-10-29 13:15 ` JoonSoo Kim
2012-10-31 17:11 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-10-29 13:15 UTC (permalink / raw)
To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Peter Zijlstra
Hi, Minchan.
2012/10/29 Minchan Kim <minchan@kernel.org>:
> Hi Joonsoo,
>
> On Mon, Oct 29, 2012 at 04:12:51AM +0900, Joonsoo Kim wrote:
>> This patchset cleans up and optimizes highmem-related code.
>>
>> [1] is just a clean-up and doesn't introduce any functional change.
>> [2-3] are for clean-up and optimization.
>> These eliminate a useless lock operation and list management.
>> [4-5] are for optimization related to flush_all_zero_pkmaps().
>>
>> Joonsoo Kim (5):
>> mm, highmem: use PKMAP_NR() to calculate an index of pkmap
>> mm, highmem: remove useless pool_lock
>> mm, highmem: remove page_address_pool list
>> mm, highmem: makes flush_all_zero_pkmaps() return index of last
>> flushed entry
>> mm, highmem: get virtual address of the page using PKMAP_ADDR()
>
> This patchset looks awesome to me.
> If you have a plan to respin, please CC Peter.
Thanks for the review.
I will wait for more reviews and respin the day after tomorrow.
Version 2 will include a fix for your comment.
* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
2012-10-28 19:12 ` [PATCH 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
2012-10-29 1:52 ` Minchan Kim
@ 2012-10-30 21:31 ` Andrew Morton
2012-10-31 5:14 ` Minchan Kim
2012-10-31 15:01 ` JoonSoo Kim
1 sibling, 2 replies; 95+ messages in thread
From: Andrew Morton @ 2012-10-30 21:31 UTC (permalink / raw)
To: Joonsoo Kim; +Cc: linux-kernel, linux-mm
On Mon, 29 Oct 2012 04:12:53 +0900
Joonsoo Kim <js1304@gmail.com> wrote:
> The pool_lock protects the page_address_pool from concurrent access.
> But access to the page_address_pool is already protected by kmap_lock.
> So remove it.
Well, there's a set_page_address() call in mm/page_alloc.c which
doesn't have lock_kmap(). It doesn't *need* lock_kmap() because it's
init-time code and we're running single-threaded there. I hope!
But this exception should be double-checked and mentioned in the
changelog, please. And it's a reason why we can't add
assert_spin_locked(&kmap_lock) to set_page_address(), which is
unfortunate.
The irq-disabling in this code is odd. If ARCH_NEEDS_KMAP_HIGH_GET=n,
we wouldn't need irq-safe locking in set_page_address(). I guess we'll
need to retain it in page_address() - I expect some callers have IRQs
disabled.
ARCH_NEEDS_KMAP_HIGH_GET is a nasty looking thing. It's ARM:
/*
* The reason for kmap_high_get() is to ensure that the currently kmap'd
* page usage count does not decrease to zero while we're using its
* existing virtual mapping in an atomic context. With a VIVT cache this
* is essential to do, but with a VIPT cache this is only an optimization
* so not to pay the price of establishing a second mapping if an existing
* one can be used. However, on platforms without hardware TLB maintenance
* broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
* the locking involved must also disable IRQs which is incompatible with
* the IPI mechanism used by global TLB operations.
*/
#define ARCH_NEEDS_KMAP_HIGH_GET
#if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6)
#undef ARCH_NEEDS_KMAP_HIGH_GET
#if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT)
#error "The sum of features in your kernel config cannot be supported together"
#endif
#endif
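
For reference, the bucket lock in question is also taken irq-safely on the
lookup side; an abridged sketch of page_address() as it stands in
mm/highmem.c around this time (trimmed for brevity, not a verbatim copy):

void *page_address(const struct page *page)
{
	unsigned long flags;
	void *ret = NULL;
	struct page_address_slot *pas;

	if (!PageHighMem(page))
		return lowmem_page_address(page);

	pas = page_slot(page);
	/* irq-safe: some callers run with IRQs disabled or in irq context */
	spin_lock_irqsave(&pas->lock, flags);
	if (!list_empty(&pas->lh)) {
		struct page_address_map *pam;

		list_for_each_entry(pam, &pas->lh, list) {
			if (pam->page == page) {
				ret = pam->virtual;
				break;
			}
		}
	}
	spin_unlock_irqrestore(&pas->lock, flags);
	return ret;
}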
* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
2012-10-30 21:31 ` Andrew Morton
@ 2012-10-31 5:14 ` Minchan Kim
2012-10-31 15:01 ` JoonSoo Kim
1 sibling, 0 replies; 95+ messages in thread
From: Minchan Kim @ 2012-10-31 5:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: Joonsoo Kim, linux-kernel, linux-mm
Hi Andrew,
On Tue, Oct 30, 2012 at 02:31:07PM -0700, Andrew Morton wrote:
> On Mon, 29 Oct 2012 04:12:53 +0900
> Joonsoo Kim <js1304@gmail.com> wrote:
>
> > The pool_lock protects the page_address_pool from concurrent access.
> > But access to the page_address_pool is already protected by kmap_lock.
> > So remove it.
>
> Well, there's a set_page_address() call in mm/page_alloc.c which
> doesn't have lock_kmap(). It doesn't *need* lock_kmap() because it's
> init-time code and we're running single-threaded there. I hope!
>
> But this exception should be double-checked and mentioned in the
> changelog, please. And it's a reason why we can't add
> assert_spin_locked(&kmap_lock) to set_page_address(), which is
> unfortunate.
>
The exception is valid only on m68k and sparc, and they use not
set_page_address() of highmem.c but page->virtual. So I think we can add
such a lock check in set_page_address() in highmem.c.
But I'm not sure we really need it, because set_page_address() is used in
few places; isn't it enough to just add a comment to avoid unnecessary
overhead?
/* NOTE: Caller should hold kmap_lock via lock_kmap() */
>
> The irq-disabling in this code is odd. If ARCH_NEEDS_KMAP_HIGH_GET=n,
> we wouldn't need irq-safe locking in set_page_address(). I guess we'll
Which lock do you mean in set_page_address()?
We have two locks in there, pool_lock and pas->lock.
With this patchset, we don't need pool_lock any more.
The remaining one is pas->lock.
If we make that lock irq-unsafe, it could deadlock with page_address()
if it is called in irq context. Currently, page_address() is used in
lots of places and I'm not sure it's called only in process context.
Is there any rule that page_address() must be used only in
process context?
> need to retain it in page_address() - I expect some callers have IRQs
> disabled.
>
>
> ARCH_NEEDS_KMAP_HIGH_GET is a nasty looking thing. It's ARM:
>
> /*
> * The reason for kmap_high_get() is to ensure that the currently kmap'd
> * page usage count does not decrease to zero while we're using its
> * existing virtual mapping in an atomic context. With a VIVT cache this
> * is essential to do, but with a VIPT cache this is only an optimization
> * so not to pay the price of establishing a second mapping if an existing
> * one can be used. However, on platforms without hardware TLB maintenance
> * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
> * the locking involved must also disable IRQs which is incompatible with
> * the IPI mechanism used by global TLB operations.
> */
> #define ARCH_NEEDS_KMAP_HIGH_GET
> #if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6)
> #undef ARCH_NEEDS_KMAP_HIGH_GET
> #if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT)
> #error "The sum of features in your kernel config cannot be supported together"
> #endif
> #endif
>
--
Kind regards,
Minchan Kim
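
A schematic of the deadlock described above (an interleaving sketch, not
runnable code), assuming pas->lock were taken without disabling IRQs:

/* CPU 0, process context:
 *   set_page_address(...)
 *     spin_lock(&pas->lock);            // lock taken, IRQs still enabled
 *       <interrupt arrives on CPU 0>
 *       irq handler:
 *         page_address(page)
 *           spin_lock(&pas->lock);      // spins forever: the lock is already
 *                                       // held by the interrupted task on
 *                                       // this same CPU -> deadlock
 */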
* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
2012-10-30 21:31 ` Andrew Morton
2012-10-31 5:14 ` Minchan Kim
@ 2012-10-31 15:01 ` JoonSoo Kim
1 sibling, 0 replies; 95+ messages in thread
From: JoonSoo Kim @ 2012-10-31 15:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
Hello, Andrew.
2012/10/31 Andrew Morton <akpm@linux-foundation.org>:
> On Mon, 29 Oct 2012 04:12:53 +0900
> Joonsoo Kim <js1304@gmail.com> wrote:
>
>> The pool_lock protects the page_address_pool from concurrent access.
>> But access to the page_address_pool is already protected by kmap_lock.
>> So remove it.
>
> Well, there's a set_page_address() call in mm/page_alloc.c which
> doesn't have lock_kmap(). It doesn't *need* lock_kmap() because it's
> init-time code and we're running single-threaded there. I hope!
>
> But this exception should be double-checked and mentioned in the
> changelog, please. And it's a reason why we can't add
> assert_spin_locked(&kmap_lock) to set_page_address(), which is
> unfortunate.
set_page_address() in mm/page_alloc.c is invoked only when
WANT_PAGE_VIRTUAL is defined.
And in this case, set_page_address()'s definition is not in highmem.c,
but in include/linux/mm.h.
So we don't need to worry about the set_page_address() call in mm/page_alloc.c.
> The irq-disabling in this code is odd. If ARCH_NEEDS_KMAP_HIGH_GET=n,
> we wouldn't need irq-safe locking in set_page_address(). I guess we'll
> need to retain it in page_address() - I expect some callers have IRQs
> disabled.
As Minchan described, if we don't disable IRQs when we take pas->lock,
it could deadlock with page_address().
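
For context, the WANT_PAGE_VIRTUAL variant referred to above is roughly the
following macro pair in include/linux/mm.h (quoted from memory, so treat the
exact form as approximate); it stores the address directly in struct page and
takes no locks at all:

#if defined(WANT_PAGE_VIRTUAL)
#define page_address(page) ((page)->virtual)
#define set_page_address(page, address)			\
	do {						\
		(page)->virtual = (address);		\
	} while(0)
#define page_address_init()  do { } while(0)
#endif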
* [PATCH v2 0/5] minor clean-up and optimize highmem related code
[not found] <Yes>
` (6 preceding siblings ...)
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
@ 2012-10-31 16:56 ` Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
` (4 more replies)
7 siblings, 5 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim
This patchset cleans up and optimizes highmem-related code.
Changes from v1:
Rebased on v3.7-rc3.
[4] Instead of returning the index of the last flushed entry, return the first index,
and update last_pkmap_nr to this index to optimize further.
Summary for v1:
[1] is just a clean-up and doesn't introduce any functional change.
[2-3] are for clean-up and optimization.
These eliminate a useless lock operation and list management.
[4-5] are for optimization related to flush_all_zero_pkmaps().
Joonsoo Kim (5):
mm, highmem: use PKMAP_NR() to calculate an index of pkmap
mm, highmem: remove useless pool_lock
mm, highmem: remove page_address_pool list
mm, highmem: makes flush_all_zero_pkmaps() return index of first
flushed entry
mm, highmem: get virtual address of the page using PKMAP_ADDR()
include/linux/highmem.h | 1 +
mm/highmem.c | 108 ++++++++++++++++++++++-------------------------
2 files changed, 51 insertions(+), 58 deletions(-)
--
1.7.9.5
* [PATCH v2 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
@ 2012-10-31 16:56 ` Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
` (3 subsequent siblings)
4 siblings, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra
To calculate the index of a pkmap, using PKMAP_NR() is more understandable
and maintainable, so change it.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
diff --git a/mm/highmem.c b/mm/highmem.c
index d517cd1..b3b3d68 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -99,7 +99,7 @@ struct page *kmap_to_page(void *vaddr)
unsigned long addr = (unsigned long)vaddr;
if (addr >= PKMAP_ADDR(0) && addr <= PKMAP_ADDR(LAST_PKMAP)) {
- int i = (addr - PKMAP_ADDR(0)) >> PAGE_SHIFT;
+ int i = PKMAP_NR(addr);
return pte_page(pkmap_page_table[i]);
}
--
1.7.9.5
* [PATCH v2 2/5] mm, highmem: remove useless pool_lock
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
@ 2012-10-31 16:56 ` Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
` (2 subsequent siblings)
4 siblings, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra
The pool_lock protects the page_address_pool from concurrent access.
But access to the page_address_pool is already protected by kmap_lock.
So remove it.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
diff --git a/mm/highmem.c b/mm/highmem.c
index b3b3d68..017bad1 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -328,7 +328,6 @@ struct page_address_map {
* page_address_map freelist, allocated from page_address_maps.
*/
static struct list_head page_address_pool; /* freelist */
-static spinlock_t pool_lock; /* protects page_address_pool */
/*
* Hash table bucket
@@ -395,11 +394,9 @@ void set_page_address(struct page *page, void *virtual)
if (virtual) { /* Add */
BUG_ON(list_empty(&page_address_pool));
- spin_lock_irqsave(&pool_lock, flags);
pam = list_entry(page_address_pool.next,
struct page_address_map, list);
list_del(&pam->list);
- spin_unlock_irqrestore(&pool_lock, flags);
pam->page = page;
pam->virtual = virtual;
@@ -413,9 +410,7 @@ void set_page_address(struct page *page, void *virtual)
if (pam->page == page) {
list_del(&pam->list);
spin_unlock_irqrestore(&pas->lock, flags);
- spin_lock_irqsave(&pool_lock, flags);
list_add_tail(&pam->list, &page_address_pool);
- spin_unlock_irqrestore(&pool_lock, flags);
goto done;
}
}
@@ -438,7 +433,6 @@ void __init page_address_init(void)
INIT_LIST_HEAD(&page_address_htable[i].lh);
spin_lock_init(&page_address_htable[i].lock);
}
- spin_lock_init(&pool_lock);
}
#endif /* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */
--
1.7.9.5
* [PATCH v2 3/5] mm, highmem: remove page_address_pool list
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
@ 2012-10-31 16:56 ` Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
4 siblings, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra
We can find a free page_address_map instance without the page_address_pool.
So remove it.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
diff --git a/mm/highmem.c b/mm/highmem.c
index 017bad1..d98b0a9 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -324,10 +324,7 @@ struct page_address_map {
struct list_head list;
};
-/*
- * page_address_map freelist, allocated from page_address_maps.
- */
-static struct list_head page_address_pool; /* freelist */
+static struct page_address_map page_address_maps[LAST_PKMAP];
/*
* Hash table bucket
@@ -392,12 +389,7 @@ void set_page_address(struct page *page, void *virtual)
pas = page_slot(page);
if (virtual) { /* Add */
- BUG_ON(list_empty(&page_address_pool));
-
- pam = list_entry(page_address_pool.next,
- struct page_address_map, list);
- list_del(&pam->list);
-
+ pam = &page_address_maps[PKMAP_NR((unsigned long)virtual)];
pam->page = page;
pam->virtual = virtual;
@@ -410,7 +402,6 @@ void set_page_address(struct page *page, void *virtual)
if (pam->page == page) {
list_del(&pam->list);
spin_unlock_irqrestore(&pas->lock, flags);
- list_add_tail(&pam->list, &page_address_pool);
goto done;
}
}
@@ -420,15 +411,10 @@ done:
return;
}
-static struct page_address_map page_address_maps[LAST_PKMAP];
-
void __init page_address_init(void)
{
int i;
- INIT_LIST_HEAD(&page_address_pool);
- for (i = 0; i < ARRAY_SIZE(page_address_maps); i++)
- list_add(&page_address_maps[i].list, &page_address_pool);
for (i = 0; i < ARRAY_SIZE(page_address_htable); i++) {
INIT_LIST_HEAD(&page_address_htable[i].lh);
spin_lock_init(&page_address_htable[i].lock);
--
1.7.9.5
* [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
` (2 preceding siblings ...)
2012-10-31 16:56 ` [PATCH v2 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
@ 2012-10-31 16:56 ` Joonsoo Kim
2012-11-01 5:03 ` Minchan Kim
2012-10-31 16:56 ` [PATCH v2 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
4 siblings, 1 reply; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra,
Minchan Kim
In the current code, after flush_all_zero_pkmaps() is invoked,
we re-iterate over all pkmaps. This can be optimized if flush_all_zero_pkmaps()
returns the index of the first flushed entry. With this index,
we can immediately map a highmem page to the virtual address represented by that index.
So change the return type of flush_all_zero_pkmaps()
and return the index of the first flushed entry.
Additionally, update last_pkmap_nr to this index.
It is certain that every entry below this index is occupied by another mapping,
therefore updating last_pkmap_nr to this index is a reasonable optimization.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ef788b5..97ad208 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
#ifdef CONFIG_HIGHMEM
#include <asm/highmem.h>
+#define PKMAP_INVALID_INDEX (LAST_PKMAP)
/* declarations for linux/mm/highmem.c */
unsigned int nr_free_highpages(void);
diff --git a/mm/highmem.c b/mm/highmem.c
index d98b0a9..b365f7b 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
return virt_to_page(addr);
}
-static void flush_all_zero_pkmaps(void)
+static unsigned int flush_all_zero_pkmaps(void)
{
int i;
- int need_flush = 0;
+ unsigned int index = PKMAP_INVALID_INDEX;
flush_cache_kmaps();
@@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
&pkmap_page_table[i]);
set_page_address(page, NULL);
- need_flush = 1;
+ if (index == PKMAP_INVALID_INDEX)
+ index = i;
}
- if (need_flush)
+ if (index != PKMAP_INVALID_INDEX)
flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+
+ return index;
}
/**
@@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
*/
void kmap_flush_unused(void)
{
+ unsigned int index;
+
lock_kmap();
- flush_all_zero_pkmaps();
+ index = flush_all_zero_pkmaps();
+ if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
+ last_pkmap_nr = index;
unlock_kmap();
}
static inline unsigned long map_new_virtual(struct page *page)
{
unsigned long vaddr;
+ unsigned int index = PKMAP_INVALID_INDEX;
int count;
start:
@@ -168,40 +176,45 @@ start:
for (;;) {
last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
if (!last_pkmap_nr) {
- flush_all_zero_pkmaps();
- count = LAST_PKMAP;
+ index = flush_all_zero_pkmaps();
+ break;
}
- if (!pkmap_count[last_pkmap_nr])
+ if (!pkmap_count[last_pkmap_nr]) {
+ index = last_pkmap_nr;
break; /* Found a usable entry */
- if (--count)
- continue;
-
- /*
- * Sleep for somebody else to unmap their entries
- */
- {
- DECLARE_WAITQUEUE(wait, current);
-
- __set_current_state(TASK_UNINTERRUPTIBLE);
- add_wait_queue(&pkmap_map_wait, &wait);
- unlock_kmap();
- schedule();
- remove_wait_queue(&pkmap_map_wait, &wait);
- lock_kmap();
-
- /* Somebody else might have mapped it while we slept */
- if (page_address(page))
- return (unsigned long)page_address(page);
-
- /* Re-start */
- goto start;
}
+ if (--count == 0)
+ break;
}
- vaddr = PKMAP_ADDR(last_pkmap_nr);
+
+ /*
+ * Sleep for somebody else to unmap their entries
+ */
+ if (index == PKMAP_INVALID_INDEX) {
+ DECLARE_WAITQUEUE(wait, current);
+
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ add_wait_queue(&pkmap_map_wait, &wait);
+ unlock_kmap();
+ schedule();
+ remove_wait_queue(&pkmap_map_wait, &wait);
+ lock_kmap();
+
+ /* Somebody else might have mapped it while we slept */
+ vaddr = (unsigned long)page_address(page);
+ if (vaddr)
+ return vaddr;
+
+ /* Re-start */
+ goto start;
+ }
+
+ vaddr = PKMAP_ADDR(index);
set_pte_at(&init_mm, vaddr,
- &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
+ &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
- pkmap_count[last_pkmap_nr] = 1;
+ pkmap_count[index] = 1;
+ last_pkmap_nr = index;
set_page_address(page, (void *)vaddr);
return vaddr;
--
1.7.9.5
* [PATCH v2 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR()
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
` (3 preceding siblings ...)
2012-10-31 16:56 ` [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry Joonsoo Kim
@ 2012-10-31 16:56 ` Joonsoo Kim
4 siblings, 0 replies; 95+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra
In flush_all_zero_pkmaps(), we have the index of the pkmap associated with the page.
Using this index, we can simply get the virtual address of the page.
So change it.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
diff --git a/mm/highmem.c b/mm/highmem.c
index b365f7b..675ec97 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -137,8 +137,7 @@ static unsigned int flush_all_zero_pkmaps(void)
* So no dangers, even with speculative execution.
*/
page = pte_page(pkmap_page_table[i]);
- pte_clear(&init_mm, (unsigned long)page_address(page),
- &pkmap_page_table[i]);
+ pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
set_page_address(page, NULL);
if (index == PKMAP_INVALID_INDEX)
--
1.7.9.5
* Re: [PATCH 0/5] minor clean-up and optimize highmem related code
2012-10-29 13:15 ` JoonSoo Kim
@ 2012-10-31 17:11 ` JoonSoo Kim
0 siblings, 0 replies; 95+ messages in thread
From: JoonSoo Kim @ 2012-10-31 17:11 UTC (permalink / raw)
To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Peter Zijlstra
Hello, Andrew.
2012/10/29 JoonSoo Kim <js1304@gmail.com>:
> Hi, Minchan.
>
> 2012/10/29 Minchan Kim <minchan@kernel.org>:
>> Hi Joonsoo,
>>
>> On Mon, Oct 29, 2012 at 04:12:51AM +0900, Joonsoo Kim wrote:
>>> This patchset cleans up and optimizes highmem-related code.
>>>
>>> [1] is just a clean-up and doesn't introduce any functional change.
>>> [2-3] are for clean-up and optimization.
>>> These eliminate a useless lock operation and list management.
>>> [4-5] are for optimization related to flush_all_zero_pkmaps().
>>>
>>> Joonsoo Kim (5):
>>> mm, highmem: use PKMAP_NR() to calculate an index of pkmap
>>> mm, highmem: remove useless pool_lock
>>> mm, highmem: remove page_address_pool list
>>> mm, highmem: makes flush_all_zero_pkmaps() return index of last
>>> flushed entry
>>> mm, highmem: get virtual address of the page using PKMAP_ADDR()
>>
>> This patchset looks awesome to me.
>> If you have a plan to respin, please CC Peter.
>
> Thanks for the review.
> I will wait for more reviews and respin the day after tomorrow.
> Version 2 will include a fix for your comment.
Could you pick up the second version of this patchset?
[3] is changed to leave one blank line.
[4] is changed to optimize further, according to Minchan's comment:
it returns the first index of a flushed entry, instead of the last index.
The others are unchanged.
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-10-31 16:56 ` [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry Joonsoo Kim
@ 2012-11-01 5:03 ` Minchan Kim
2012-11-02 19:07 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Minchan Kim @ 2012-11-01 5:03 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> In the current code, after flush_all_zero_pkmaps() is invoked,
> we re-iterate over all pkmaps. This can be optimized if flush_all_zero_pkmaps()
> returns the index of the first flushed entry. With this index,
> we can immediately map a highmem page to the virtual address represented by that index.
> So change the return type of flush_all_zero_pkmaps()
> and return the index of the first flushed entry.
>
> Additionally, update last_pkmap_nr to this index.
> It is certain that every entry below this index is occupied by another mapping,
> therefore updating last_pkmap_nr to this index is a reasonable optimization.
>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index ef788b5..97ad208 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>
> #ifdef CONFIG_HIGHMEM
> #include <asm/highmem.h>
> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>
> /* declarations for linux/mm/highmem.c */
> unsigned int nr_free_highpages(void);
> diff --git a/mm/highmem.c b/mm/highmem.c
> index d98b0a9..b365f7b 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> return virt_to_page(addr);
> }
>
> -static void flush_all_zero_pkmaps(void)
> +static unsigned int flush_all_zero_pkmaps(void)
> {
> int i;
> - int need_flush = 0;
> + unsigned int index = PKMAP_INVALID_INDEX;
>
> flush_cache_kmaps();
>
> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> &pkmap_page_table[i]);
>
> set_page_address(page, NULL);
> - need_flush = 1;
> + if (index == PKMAP_INVALID_INDEX)
> + index = i;
> }
> - if (need_flush)
> + if (index != PKMAP_INVALID_INDEX)
> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +
> + return index;
> }
>
> /**
> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> */
> void kmap_flush_unused(void)
> {
> + unsigned int index;
> +
> lock_kmap();
> - flush_all_zero_pkmaps();
> + index = flush_all_zero_pkmaps();
> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> + last_pkmap_nr = index;
I don't know whether kmap_flush_unused is really a fast path, so I'm not
sure how much my nitpick matters. Anyway,
what problem would happen if we did the following?
lock()
index = flush_all_zero_pkmaps();
if (index != PKMAP_INVALID_INDEX)
last_pkmap_nr = index;
unlock();
Normally, last_pkmap_nr is advanced while searching for an empty slot in
map_new_virtual. So I expect the return value of flush_all_zero_pkmaps
in kmap_flush_unused to normally be either less than last_pkmap_nr
or last_pkmap_nr + 1.
> unlock_kmap();
> }
>
> static inline unsigned long map_new_virtual(struct page *page)
> {
> unsigned long vaddr;
> + unsigned int index = PKMAP_INVALID_INDEX;
> int count;
>
> start:
> @@ -168,40 +176,45 @@ start:
> for (;;) {
> last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
> if (!last_pkmap_nr) {
> - flush_all_zero_pkmaps();
> - count = LAST_PKMAP;
> + index = flush_all_zero_pkmaps();
> + break;
> }
> - if (!pkmap_count[last_pkmap_nr])
> + if (!pkmap_count[last_pkmap_nr]) {
> + index = last_pkmap_nr;
> break; /* Found a usable entry */
> - if (--count)
> - continue;
> -
> - /*
> - * Sleep for somebody else to unmap their entries
> - */
> - {
> - DECLARE_WAITQUEUE(wait, current);
> -
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> - add_wait_queue(&pkmap_map_wait, &wait);
> - unlock_kmap();
> - schedule();
> - remove_wait_queue(&pkmap_map_wait, &wait);
> - lock_kmap();
> -
> - /* Somebody else might have mapped it while we slept */
> - if (page_address(page))
> - return (unsigned long)page_address(page);
> -
> - /* Re-start */
> - goto start;
> }
> + if (--count == 0)
> + break;
> }
> - vaddr = PKMAP_ADDR(last_pkmap_nr);
> +
> + /*
> + * Sleep for somebody else to unmap their entries
> + */
> + if (index == PKMAP_INVALID_INDEX) {
> + DECLARE_WAITQUEUE(wait, current);
> +
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> + add_wait_queue(&pkmap_map_wait, &wait);
> + unlock_kmap();
> + schedule();
> + remove_wait_queue(&pkmap_map_wait, &wait);
> + lock_kmap();
> +
> + /* Somebody else might have mapped it while we slept */
> + vaddr = (unsigned long)page_address(page);
> + if (vaddr)
> + return vaddr;
> +
> + /* Re-start */
> + goto start;
> + }
> +
> + vaddr = PKMAP_ADDR(index);
> set_pte_at(&init_mm, vaddr,
> - &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
> + &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
>
> - pkmap_count[last_pkmap_nr] = 1;
> + pkmap_count[index] = 1;
> + last_pkmap_nr = index;
> set_page_address(page, (void *)vaddr);
>
> return vaddr;
> --
> 1.7.9.5
--
Kind regards,
Minchan Kim
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-01 5:03 ` Minchan Kim
@ 2012-11-02 19:07 ` JoonSoo Kim
2012-11-02 22:42 ` Minchan Kim
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-11-02 19:07 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
Hello, Minchan.
2012/11/1 Minchan Kim <minchan@kernel.org>:
> On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> In current code, after flush_all_zero_pkmaps() is invoked,
>> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> return index of first flushed entry. With this index,
>> we can immediately map highmem page to virtual address represented by index.
>> So change return type of flush_all_zero_pkmaps()
>> and return index of first flushed entry.
>>
>> Additionally, update last_pkmap_nr to this index.
>> It is certain that entry which is below this index is occupied by other mapping,
>> therefore updating last_pkmap_nr to this index is reasonable optimization.
>>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index ef788b5..97ad208 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>>
>> #ifdef CONFIG_HIGHMEM
>> #include <asm/highmem.h>
>> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>>
>> /* declarations for linux/mm/highmem.c */
>> unsigned int nr_free_highpages(void);
>> diff --git a/mm/highmem.c b/mm/highmem.c
>> index d98b0a9..b365f7b 100644
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> return virt_to_page(addr);
>> }
>>
>> -static void flush_all_zero_pkmaps(void)
>> +static unsigned int flush_all_zero_pkmaps(void)
>> {
>> int i;
>> - int need_flush = 0;
>> + unsigned int index = PKMAP_INVALID_INDEX;
>>
>> flush_cache_kmaps();
>>
>> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> &pkmap_page_table[i]);
>>
>> set_page_address(page, NULL);
>> - need_flush = 1;
>> + if (index == PKMAP_INVALID_INDEX)
>> + index = i;
>> }
>> - if (need_flush)
>> + if (index != PKMAP_INVALID_INDEX)
>> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> +
>> + return index;
>> }
>>
>> /**
>> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> */
>> void kmap_flush_unused(void)
>> {
>> + unsigned int index;
>> +
>> lock_kmap();
>> - flush_all_zero_pkmaps();
>> + index = flush_all_zero_pkmaps();
>> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> + last_pkmap_nr = index;
>
> I don't know how kmap_flush_unused is really fast path so how my nitpick
> is effective. Anyway,
> What problem happens if we do following as?
>
> lock()
> index = flush_all_zero_pkmaps();
> if (index != PKMAP_INVALID_INDEX)
> last_pkmap_nr = index;
> unlock();
>
> Normally, last_pkmap_nr is increased with searching empty slot in
> map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> in kmap_flush_unused normally become either less than last_pkmap_nr
> or last_pkmap_nr + 1.
There is a case where the return value of flush_all_zero_pkmaps() is larger
than last_pkmap_nr.
Look at the following example.
Assume LAST_PKMAP = 20, indexes 1-9 and 11-19 are kmapped, and 10 is kunmapped.
do kmap_flush_unused() => flush index 10 => last_pkmap_nr = 10
do kunmap() with index 17
do kmap_flush_unused() => flush index 17
So, a slightly dirty implementation is needed.
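To make this concrete, here is a compilable userspace sketch of the trace above. It is just a model, not kernel code: pkmap_count is reduced to 2 == mapped, 1 == kunmapped but not yet flushed, 0 == free, and the cache/TLB flushes are elided.

#include <stdio.h>

#define LAST_PKMAP              20
#define PKMAP_INVALID_INDEX     LAST_PKMAP

static int pkmap_count[LAST_PKMAP];
static unsigned int last_pkmap_nr;

static unsigned int flush_all_zero_pkmaps(void)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int i;

        for (i = 0; i < LAST_PKMAP; i++) {
                if (pkmap_count[i] != 1)
                        continue;
                pkmap_count[i] = 0;             /* slot flushed, now free */
                if (index == PKMAP_INVALID_INDEX)
                        index = i;              /* first flushed entry */
        }
        /* flush_tlb_kernel_range() would go here */
        return index;
}

static void kmap_flush_unused(void)
{
        unsigned int index = flush_all_zero_pkmaps();

        if (index != PKMAP_INVALID_INDEX && index < last_pkmap_nr)
                last_pkmap_nr = index;          /* conditional update */
}

int main(void)
{
        int i;

        last_pkmap_nr = LAST_PKMAP - 1;         /* search position near the end */
        for (i = 1; i < LAST_PKMAP; i++)
                pkmap_count[i] = 2;             /* 1-19 mapped */
        pkmap_count[10] = 1;                    /* 10 was kunmapped */

        kmap_flush_unused();                    /* flushes 10 */
        printf("last_pkmap_nr = %u\n", last_pkmap_nr);  /* 10 */

        pkmap_count[17] = 1;                    /* kunmap index 17 */
        kmap_flush_unused();                    /* flushes 17, returns 17 */
        printf("last_pkmap_nr = %u\n", last_pkmap_nr);  /* still 10 */
        return 0;
}

With the conditional update, the second flush (which returns 17, larger than last_pkmap_nr) does not move last_pkmap_nr forward.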
Thanks.
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-02 19:07 ` JoonSoo Kim
@ 2012-11-02 22:42 ` Minchan Kim
2012-11-13 0:30 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Minchan Kim @ 2012-11-02 22:42 UTC (permalink / raw)
To: JoonSoo Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
Hi Joonsoo,
On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> Hello, Minchan.
>
> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> In current code, after flush_all_zero_pkmaps() is invoked,
> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> >> return index of first flushed entry. With this index,
> >> we can immediately map highmem page to virtual address represented by index.
> >> So change return type of flush_all_zero_pkmaps()
> >> and return index of first flushed entry.
> >>
> >> Additionally, update last_pkmap_nr to this index.
> >> It is certain that entry which is below this index is occupied by other mapping,
> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
> >>
> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> Cc: Minchan Kim <minchan@kernel.org>
> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >>
> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> index ef788b5..97ad208 100644
> >> --- a/include/linux/highmem.h
> >> +++ b/include/linux/highmem.h
> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >>
> >> #ifdef CONFIG_HIGHMEM
> >> #include <asm/highmem.h>
> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >>
> >> /* declarations for linux/mm/highmem.c */
> >> unsigned int nr_free_highpages(void);
> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> index d98b0a9..b365f7b 100644
> >> --- a/mm/highmem.c
> >> +++ b/mm/highmem.c
> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >> return virt_to_page(addr);
> >> }
> >>
> >> -static void flush_all_zero_pkmaps(void)
> >> +static unsigned int flush_all_zero_pkmaps(void)
> >> {
> >> int i;
> >> - int need_flush = 0;
> >> + unsigned int index = PKMAP_INVALID_INDEX;
> >>
> >> flush_cache_kmaps();
> >>
> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >> &pkmap_page_table[i]);
> >>
> >> set_page_address(page, NULL);
> >> - need_flush = 1;
> >> + if (index == PKMAP_INVALID_INDEX)
> >> + index = i;
> >> }
> >> - if (need_flush)
> >> + if (index != PKMAP_INVALID_INDEX)
> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> +
> >> + return index;
> >> }
> >>
> >> /**
> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >> */
> >> void kmap_flush_unused(void)
> >> {
> >> + unsigned int index;
> >> +
> >> lock_kmap();
> >> - flush_all_zero_pkmaps();
> >> + index = flush_all_zero_pkmaps();
> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> + last_pkmap_nr = index;
> >
> > I don't know how kmap_flush_unused is really fast path so how my nitpick
> > is effective. Anyway,
> > What problem happens if we do following as?
> >
> > lock()
> > index = flush_all_zero_pkmaps();
> > if (index != PKMAP_INVALID_INDEX)
> > last_pkmap_nr = index;
> > unlock();
> >
> > Normally, last_pkmap_nr is increased with searching empty slot in
> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> > in kmap_flush_unused normally become either less than last_pkmap_nr
> > or last_pkmap_nr + 1.
>
> There is a case that return value of kmap_flush_unused() is larger
> than last_pkmap_nr.
I see, but why is it a problem? kmap_flush_unused returning a larger value
than last_pkmap_nr means that there is no free slot below that value.
So an unconditional last_pkmap_nr update is valid.
> Look at the following example.
>
> Assume last_pkmap = 20 and index 1-9, 11-19 is kmapped. 10 is kunmapped.
>
> do kmap_flush_unused() => flush index 10 => last_pkmap = 10;
> do kunmap() with index 17
> do kmap_flush_unused() => flush index 17
>
> So, little dirty implementation is needed.
>
> Thanks.
--
Kind Regards,
Minchan Kim
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-02 22:42 ` Minchan Kim
@ 2012-11-13 0:30 ` JoonSoo Kim
2012-11-13 12:49 ` Minchan Kim
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-11-13 0:30 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
2012/11/3 Minchan Kim <minchan@kernel.org>:
> Hi Joonsoo,
>
> On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> Hello, Minchan.
>>
>> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> In current code, after flush_all_zero_pkmaps() is invoked,
>> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> >> return index of first flushed entry. With this index,
>> >> we can immediately map highmem page to virtual address represented by index.
>> >> So change return type of flush_all_zero_pkmaps()
>> >> and return index of first flushed entry.
>> >>
>> >> Additionally, update last_pkmap_nr to this index.
>> >> It is certain that entry which is below this index is occupied by other mapping,
>> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
>> >>
>> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >>
>> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> index ef788b5..97ad208 100644
>> >> --- a/include/linux/highmem.h
>> >> +++ b/include/linux/highmem.h
>> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >>
>> >> #ifdef CONFIG_HIGHMEM
>> >> #include <asm/highmem.h>
>> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >>
>> >> /* declarations for linux/mm/highmem.c */
>> >> unsigned int nr_free_highpages(void);
>> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> index d98b0a9..b365f7b 100644
>> >> --- a/mm/highmem.c
>> >> +++ b/mm/highmem.c
>> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >> return virt_to_page(addr);
>> >> }
>> >>
>> >> -static void flush_all_zero_pkmaps(void)
>> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >> {
>> >> int i;
>> >> - int need_flush = 0;
>> >> + unsigned int index = PKMAP_INVALID_INDEX;
>> >>
>> >> flush_cache_kmaps();
>> >>
>> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >> &pkmap_page_table[i]);
>> >>
>> >> set_page_address(page, NULL);
>> >> - need_flush = 1;
>> >> + if (index == PKMAP_INVALID_INDEX)
>> >> + index = i;
>> >> }
>> >> - if (need_flush)
>> >> + if (index != PKMAP_INVALID_INDEX)
>> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> +
>> >> + return index;
>> >> }
>> >>
>> >> /**
>> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >> */
>> >> void kmap_flush_unused(void)
>> >> {
>> >> + unsigned int index;
>> >> +
>> >> lock_kmap();
>> >> - flush_all_zero_pkmaps();
>> >> + index = flush_all_zero_pkmaps();
>> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> + last_pkmap_nr = index;
>> >
>> > I don't know how kmap_flush_unused is really fast path so how my nitpick
>> > is effective. Anyway,
>> > What problem happens if we do following as?
>> >
>> > lock()
>> > index = flush_all_zero_pkmaps();
>> > if (index != PKMAP_INVALID_INDEX)
>> > last_pkmap_nr = index;
>> > unlock();
>> >
>> > Normally, last_pkmap_nr is increased with searching empty slot in
>> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
>> > in kmap_flush_unused normally become either less than last_pkmap_nr
>> > or last_pkmap_nr + 1.
>>
>> There is a case that return value of kmap_flush_unused() is larger
>> than last_pkmap_nr.
>
> I see but why it's problem? kmap_flush_unused returns larger value than
> last_pkmap_nr means that there is no free slot at below the value.
> So unconditional last_pkmap_nr update is vaild.
I think that this is not true.
Look at a slightly different example.
Assume LAST_PKMAP = 20, indexes 1-9 and 12-19 are kmapped, and 10, 11 are kunmapped.
do kmap_flush_unused() => flush indexes 10, 11 => last_pkmap_nr = 10
do kunmap() with index 17
do kmap_flush_unused() => flush index 17 => last_pkmap_nr = 17?
In this case, the unconditional last_pkmap_nr update skips over a free index (11).
So, the conditional update is needed.
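To make the skipped slot concrete, here is a small runnable sketch (same simplified pkmap_count convention as before: 2 mapped, 1 kunmapped, 0 free) that applies the unconditional update to this exact sequence:

#include <stdio.h>

#define LAST_PKMAP              20
#define PKMAP_INVALID_INDEX     LAST_PKMAP

static int pkmap_count[LAST_PKMAP];

static unsigned int flush_all_zero_pkmaps(void)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int i;

        for (i = 0; i < LAST_PKMAP; i++) {
                if (pkmap_count[i] != 1)
                        continue;
                pkmap_count[i] = 0;
                if (index == PKMAP_INVALID_INDEX)
                        index = i;
        }
        return index;
}

int main(void)
{
        unsigned int last_pkmap_nr;
        unsigned int index;
        int i;

        for (i = 1; i < LAST_PKMAP; i++)
                pkmap_count[i] = 2;
        pkmap_count[10] = pkmap_count[11] = 1;  /* 10, 11 kunmapped */

        index = flush_all_zero_pkmaps();        /* frees 10 and 11 */
        last_pkmap_nr = index;                  /* unconditional: 10 */

        pkmap_count[17] = 1;                    /* kunmap index 17 */
        index = flush_all_zero_pkmaps();        /* returns 17 */
        last_pkmap_nr = index;                  /* unconditional: 17 */

        /* Slot 11 is still free (pkmap_count[11] == 0) but now sits
         * below last_pkmap_nr, so the next search starts past it. */
        printf("last_pkmap_nr = %u, pkmap_count[11] = %d\n",
               last_pkmap_nr, pkmap_count[11]);
        return 0;
}

pkmap_count[11] ends up 0 while last_pkmap_nr has jumped to 17, so the next search starts past the free slot.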
>> Look at the following example.
>>
>> Assume last_pkmap = 20 and index 1-9, 11-19 is kmapped. 10 is kunmapped.
>>
>> do kmap_flush_unused() => flush index 10 => last_pkmap = 10;
>> do kunmap() with index 17
>> do kmap_flush_unused() => flush index 17
>>
>> So, little dirty implementation is needed.
>>
>> Thanks.
>
> --
> Kind Regards,
> Minchan Kim
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-13 0:30 ` JoonSoo Kim
@ 2012-11-13 12:49 ` Minchan Kim
2012-11-13 14:12 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Minchan Kim @ 2012-11-13 12:49 UTC (permalink / raw)
To: JoonSoo Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
> 2012/11/3 Minchan Kim <minchan@kernel.org>:
> > Hi Joonsoo,
> >
> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> >> Hello, Minchan.
> >>
> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> >> In current code, after flush_all_zero_pkmaps() is invoked,
> >> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> >> >> return index of first flushed entry. With this index,
> >> >> we can immediately map highmem page to virtual address represented by index.
> >> >> So change return type of flush_all_zero_pkmaps()
> >> >> and return index of first flushed entry.
> >> >>
> >> >> Additionally, update last_pkmap_nr to this index.
> >> >> It is certain that entry which is below this index is occupied by other mapping,
> >> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
> >> >>
> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> >> Cc: Minchan Kim <minchan@kernel.org>
> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >> >>
> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> >> index ef788b5..97ad208 100644
> >> >> --- a/include/linux/highmem.h
> >> >> +++ b/include/linux/highmem.h
> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >> >>
> >> >> #ifdef CONFIG_HIGHMEM
> >> >> #include <asm/highmem.h>
> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >> >>
> >> >> /* declarations for linux/mm/highmem.c */
> >> >> unsigned int nr_free_highpages(void);
> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> >> index d98b0a9..b365f7b 100644
> >> >> --- a/mm/highmem.c
> >> >> +++ b/mm/highmem.c
> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >> >> return virt_to_page(addr);
> >> >> }
> >> >>
> >> >> -static void flush_all_zero_pkmaps(void)
> >> >> +static unsigned int flush_all_zero_pkmaps(void)
> >> >> {
> >> >> int i;
> >> >> - int need_flush = 0;
> >> >> + unsigned int index = PKMAP_INVALID_INDEX;
> >> >>
> >> >> flush_cache_kmaps();
> >> >>
> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >> >> &pkmap_page_table[i]);
> >> >>
> >> >> set_page_address(page, NULL);
> >> >> - need_flush = 1;
> >> >> + if (index == PKMAP_INVALID_INDEX)
> >> >> + index = i;
> >> >> }
> >> >> - if (need_flush)
> >> >> + if (index != PKMAP_INVALID_INDEX)
> >> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> >> +
> >> >> + return index;
> >> >> }
> >> >>
> >> >> /**
> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >> >> */
> >> >> void kmap_flush_unused(void)
> >> >> {
> >> >> + unsigned int index;
> >> >> +
> >> >> lock_kmap();
> >> >> - flush_all_zero_pkmaps();
> >> >> + index = flush_all_zero_pkmaps();
> >> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> >> + last_pkmap_nr = index;
> >> >
> >> > I don't know how kmap_flush_unused is really fast path so how my nitpick
> >> > is effective. Anyway,
> >> > What problem happens if we do following as?
> >> >
> >> > lock()
> >> > index = flush_all_zero_pkmaps();
> >> > if (index != PKMAP_INVALID_INDEX)
> >> > last_pkmap_nr = index;
> >> > unlock();
> >> >
> >> > Normally, last_pkmap_nr is increased with searching empty slot in
> >> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> >> > in kmap_flush_unused normally become either less than last_pkmap_nr
> >> > or last_pkmap_nr + 1.
> >>
> >> There is a case that return value of kmap_flush_unused() is larger
> >> than last_pkmap_nr.
> >
> > I see but why it's problem? kmap_flush_unused returns larger value than
> > last_pkmap_nr means that there is no free slot at below the value.
> > So unconditional last_pkmap_nr update is vaild.
>
> I think that this is not true.
> Look at the slightly different example.
>
> Assume last_pkmap = 20 and index 1-9, 12-19 is kmapped. 10, 11 is kunmapped.
>
> do kmap_flush_unused() => flush index 10,11 => last_pkmap = 10;
> do kunmap() with index 17
> do kmap_flush_unused() => flush index 17 => last_pkmap = 17?
>
> In this case, unconditional last_pkmap_nr update skip one kunmapped index.
> So, conditional update is needed.
Thanks for pointing that out, Joonsoo.
You're right. I misunderstood your flush_all_zero_pkmaps change.
With your change, flush_all_zero_pkmaps returns the first *flushed* free slot index.
What's the benefit of returning the first flushed free slot index rather than the first free slot index?
I think flush_all_zero_pkmaps should return the first free slot, because the caller of
flush_all_zero_pkmaps doesn't care whether a slot was just flushed or not.
What it wants to know is simply whether a slot is free. In that case, we can remove the check above,
and it makes flush_all_zero_pkmaps more intuitive.
--
Kind Regards,
Minchan Kim
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-13 12:49 ` Minchan Kim
@ 2012-11-13 14:12 ` JoonSoo Kim
2012-11-13 15:01 ` Minchan Kim
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-11-13 14:12 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
2012/11/13 Minchan Kim <minchan@kernel.org>:
> On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
>> 2012/11/3 Minchan Kim <minchan@kernel.org>:
>> > Hi Joonsoo,
>> >
>> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> >> Hello, Minchan.
>> >>
>> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> >> In current code, after flush_all_zero_pkmaps() is invoked,
>> >> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> >> >> return index of first flushed entry. With this index,
>> >> >> we can immediately map highmem page to virtual address represented by index.
>> >> >> So change return type of flush_all_zero_pkmaps()
>> >> >> and return index of first flushed entry.
>> >> >>
>> >> >> Additionally, update last_pkmap_nr to this index.
>> >> >> It is certain that entry which is below this index is occupied by other mapping,
>> >> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
>> >> >>
>> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >> >>
>> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> >> index ef788b5..97ad208 100644
>> >> >> --- a/include/linux/highmem.h
>> >> >> +++ b/include/linux/highmem.h
>> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >> >>
>> >> >> #ifdef CONFIG_HIGHMEM
>> >> >> #include <asm/highmem.h>
>> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >> >>
>> >> >> /* declarations for linux/mm/highmem.c */
>> >> >> unsigned int nr_free_highpages(void);
>> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> >> index d98b0a9..b365f7b 100644
>> >> >> --- a/mm/highmem.c
>> >> >> +++ b/mm/highmem.c
>> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >> >> return virt_to_page(addr);
>> >> >> }
>> >> >>
>> >> >> -static void flush_all_zero_pkmaps(void)
>> >> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >> >> {
>> >> >> int i;
>> >> >> - int need_flush = 0;
>> >> >> + unsigned int index = PKMAP_INVALID_INDEX;
>> >> >>
>> >> >> flush_cache_kmaps();
>> >> >>
>> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >> >> &pkmap_page_table[i]);
>> >> >>
>> >> >> set_page_address(page, NULL);
>> >> >> - need_flush = 1;
>> >> >> + if (index == PKMAP_INVALID_INDEX)
>> >> >> + index = i;
>> >> >> }
>> >> >> - if (need_flush)
>> >> >> + if (index != PKMAP_INVALID_INDEX)
>> >> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> >> +
>> >> >> + return index;
>> >> >> }
>> >> >>
>> >> >> /**
>> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >> >> */
>> >> >> void kmap_flush_unused(void)
>> >> >> {
>> >> >> + unsigned int index;
>> >> >> +
>> >> >> lock_kmap();
>> >> >> - flush_all_zero_pkmaps();
>> >> >> + index = flush_all_zero_pkmaps();
>> >> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> >> + last_pkmap_nr = index;
>> >> >
>> >> > I don't know how kmap_flush_unused is really fast path so how my nitpick
>> >> > is effective. Anyway,
>> >> > What problem happens if we do following as?
>> >> >
>> >> > lock()
>> >> > index = flush_all_zero_pkmaps();
>> >> > if (index != PKMAP_INVALID_INDEX)
>> >> > last_pkmap_nr = index;
>> >> > unlock();
>> >> >
>> >> > Normally, last_pkmap_nr is increased with searching empty slot in
>> >> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
>> >> > in kmap_flush_unused normally become either less than last_pkmap_nr
>> >> > or last_pkmap_nr + 1.
>> >>
>> >> There is a case that return value of kmap_flush_unused() is larger
>> >> than last_pkmap_nr.
>> >
>> > I see but why it's problem? kmap_flush_unused returns larger value than
>> > last_pkmap_nr means that there is no free slot at below the value.
>> > So unconditional last_pkmap_nr update is vaild.
>>
>> I think that this is not true.
>> Look at the slightly different example.
>>
>> Assume last_pkmap = 20 and index 1-9, 12-19 is kmapped. 10, 11 is kunmapped.
>>
>> do kmap_flush_unused() => flush index 10,11 => last_pkmap = 10;
>> do kunmap() with index 17
>> do kmap_flush_unused() => flush index 17 => last_pkmap = 17?
>>
>> In this case, unconditional last_pkmap_nr update skip one kunmapped index.
>> So, conditional update is needed.
>
> Thanks for pouinting out, Joonsoo.
> You're right. I misunderstood your flush_all_zero_pkmaps change.
> As your change, flush_all_zero_pkmaps returns first *flushed* free slot index.
> What's the benefit returning flushed flushed free slot index rather than free slot index?
If flush_all_zero_pkmaps() returned the first free slot index rather than the first
flushed free slot,
we would need another comparison, such as 'if (pkmap_count[i] == 0)', and
another local variable to determine whether a flush occurred or not.
I want to minimize this overhead and the churn in the code, although
they are negligible.
> I think flush_all_zero_pkmaps should return first free slot because customer of
> flush_all_zero_pkmaps doesn't care whether it's just flushed or not.
> What he want is just free or not. In such case, we can remove above check and it makes
> flusha_all_zero_pkmaps more intuitive.
Yes, it is more intuitive, but as I mentioned above, it needs another comparison,
and with that, the benefit of not re-iterating when there is no
free slot may disappear.
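To show what I mean, here is a sketch of that alternative. This is hypothetical code, not a posted patch: the kernel calls are reduced to comments, and pkmap_count uses the same simplified 2/1/0 convention as my earlier sketch.

#include <stdio.h>

#define LAST_PKMAP              20
#define PKMAP_INVALID_INDEX     LAST_PKMAP

static int pkmap_count[LAST_PKMAP];

static unsigned int flush_all_zero_pkmaps_first_free(void)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int need_flush = 0;     /* the extra local variable */
        int i;

        /* flush_cache_kmaps() would go here */
        for (i = 0; i < LAST_PKMAP; i++) {
                /* the extra comparison: also track already-free slots */
                if (!pkmap_count[i] && index == PKMAP_INVALID_INDEX)
                        index = i;
                if (pkmap_count[i] != 1)
                        continue;
                pkmap_count[i] = 0;
                /* pte_clear() / set_page_address(page, NULL) would go here */
                if (index == PKMAP_INVALID_INDEX)
                        index = i;
                need_flush = 1;
        }
        if (need_flush) {
                /* flush_tlb_kernel_range(...) would go here */
        }
        return index;
}

int main(void)
{
        int i;

        for (i = 0; i < LAST_PKMAP; i++)
                pkmap_count[i] = 2;     /* everything mapped... */
        pkmap_count[3] = 0;             /* ...except 3, already free */
        pkmap_count[5] = 1;             /* ...and 5, kunmapped */

        printf("first free slot = %u\n", flush_all_zero_pkmaps_first_free());
        return 0;
}

The per-slot free check and the separate need_flush flag are exactly the extra comparison and extra local variable mentioned above.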
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-13 14:12 ` JoonSoo Kim
@ 2012-11-13 15:01 ` Minchan Kim
2012-11-14 17:09 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Minchan Kim @ 2012-11-13 15:01 UTC (permalink / raw)
To: JoonSoo Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
> 2012/11/13 Minchan Kim <minchan@kernel.org>:
> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
> >> > Hi Joonsoo,
> >> >
> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> >> >> Hello, Minchan.
> >> >>
> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> >> >> In current code, after flush_all_zero_pkmaps() is invoked,
> >> >> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> >> >> >> return index of first flushed entry. With this index,
> >> >> >> we can immediately map highmem page to virtual address represented by index.
> >> >> >> So change return type of flush_all_zero_pkmaps()
> >> >> >> and return index of first flushed entry.
> >> >> >>
> >> >> >> Additionally, update last_pkmap_nr to this index.
> >> >> >> It is certain that entry which is below this index is occupied by other mapping,
> >> >> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
> >> >> >>
> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >> >> >>
> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> >> >> index ef788b5..97ad208 100644
> >> >> >> --- a/include/linux/highmem.h
> >> >> >> +++ b/include/linux/highmem.h
> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >> >> >>
> >> >> >> #ifdef CONFIG_HIGHMEM
> >> >> >> #include <asm/highmem.h>
> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >> >> >>
> >> >> >> /* declarations for linux/mm/highmem.c */
> >> >> >> unsigned int nr_free_highpages(void);
> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> >> >> index d98b0a9..b365f7b 100644
> >> >> >> --- a/mm/highmem.c
> >> >> >> +++ b/mm/highmem.c
> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >> >> >> return virt_to_page(addr);
> >> >> >> }
> >> >> >>
> >> >> >> -static void flush_all_zero_pkmaps(void)
> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
> >> >> >> {
> >> >> >> int i;
> >> >> >> - int need_flush = 0;
> >> >> >> + unsigned int index = PKMAP_INVALID_INDEX;
> >> >> >>
> >> >> >> flush_cache_kmaps();
> >> >> >>
> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >> >> >> &pkmap_page_table[i]);
> >> >> >>
> >> >> >> set_page_address(page, NULL);
> >> >> >> - need_flush = 1;
> >> >> >> + if (index == PKMAP_INVALID_INDEX)
> >> >> >> + index = i;
> >> >> >> }
> >> >> >> - if (need_flush)
> >> >> >> + if (index != PKMAP_INVALID_INDEX)
> >> >> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> >> >> +
> >> >> >> + return index;
> >> >> >> }
> >> >> >>
> >> >> >> /**
> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >> >> >> */
> >> >> >> void kmap_flush_unused(void)
> >> >> >> {
> >> >> >> + unsigned int index;
> >> >> >> +
> >> >> >> lock_kmap();
> >> >> >> - flush_all_zero_pkmaps();
> >> >> >> + index = flush_all_zero_pkmaps();
> >> >> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> >> >> + last_pkmap_nr = index;
> >> >> >
> >> >> > I don't know how kmap_flush_unused is really fast path so how my nitpick
> >> >> > is effective. Anyway,
> >> >> > What problem happens if we do following as?
> >> >> >
> >> >> > lock()
> >> >> > index = flush_all_zero_pkmaps();
> >> >> > if (index != PKMAP_INVALID_INDEX)
> >> >> > last_pkmap_nr = index;
> >> >> > unlock();
> >> >> >
> >> >> > Normally, last_pkmap_nr is increased with searching empty slot in
> >> >> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> >> >> > in kmap_flush_unused normally become either less than last_pkmap_nr
> >> >> > or last_pkmap_nr + 1.
> >> >>
> >> >> There is a case that return value of kmap_flush_unused() is larger
> >> >> than last_pkmap_nr.
> >> >
> >> > I see but why it's problem? kmap_flush_unused returns larger value than
> >> > last_pkmap_nr means that there is no free slot at below the value.
> >> > So unconditional last_pkmap_nr update is vaild.
> >>
> >> I think that this is not true.
> >> Look at the slightly different example.
> >>
> >> Assume last_pkmap = 20 and index 1-9, 12-19 is kmapped. 10, 11 is kunmapped.
> >>
> >> do kmap_flush_unused() => flush index 10,11 => last_pkmap = 10;
> >> do kunmap() with index 17
> >> do kmap_flush_unused() => flush index 17 => last_pkmap = 17?
> >>
> >> In this case, unconditional last_pkmap_nr update skip one kunmapped index.
> >> So, conditional update is needed.
> >
> > Thanks for pouinting out, Joonsoo.
> > You're right. I misunderstood your flush_all_zero_pkmaps change.
> > As your change, flush_all_zero_pkmaps returns first *flushed* free slot index.
> > What's the benefit returning flushed flushed free slot index rather than free slot index?
>
> If flush_all_zero_pkmaps() return free slot index rather than first
> flushed free slot,
> we need another comparison like as 'if pkmap_count[i] == 0' and
> need another local variable for determining whether flush is occurred or not.
> I want to minimize these overhead and churning of the code, although
> they are negligible.
>
> > I think flush_all_zero_pkmaps should return first free slot because customer of
> > flush_all_zero_pkmaps doesn't care whether it's just flushed or not.
> > What he want is just free or not. In such case, we can remove above check and it makes
> > flusha_all_zero_pkmaps more intuitive.
>
> Yes, it is more intuitive, but as I mentioned above, it need another comparison,
> so with that, a benefit which prevent to re-iterate when there is no
> free slot, may be disappeared.
If you're so keen on performance, why do you have code like this?
You could remove the branch below if you were really keen on performance.
diff --git a/mm/highmem.c b/mm/highmem.c
index c8be376..44a88dd 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
flush_cache_kmaps();
- for (i = 0; i < LAST_PKMAP; i++) {
+ for (i = LAST_PKMAP - 1; i >= 0; i--) {
struct page *page;
/*
@@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
set_page_address(page, NULL);
- if (index == PKMAP_INVALID_INDEX)
- index = i;
+ index = i;
}
if (index != PKMAP_INVALID_INDEX)
flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
Anyway, if you are concerned about performance, okay, let's give up on making the code clearer,
although I haven't seen any report about kmap performance. Instead, please consider the
optimization above, because you have already broken the principle you mentioned.
If we can't make the function clearer, another option is to add a function comment. Please do.
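To be concrete, both loop directions report the same first flushed slot; here is a userspace sketch of just the index bookkeeping (the flush side effects are elided, and 1 stands for "flushable"):

#include <assert.h>
#include <stdio.h>

#define LAST_PKMAP              20
#define PKMAP_INVALID_INDEX     LAST_PKMAP

static unsigned int forward(const int *count)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int i;

        for (i = 0; i < LAST_PKMAP; i++) {
                if (count[i] != 1)
                        continue;
                if (index == PKMAP_INVALID_INDEX)       /* branch per hit */
                        index = i;
        }
        return index;
}

static unsigned int backward(const int *count)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int i;

        for (i = LAST_PKMAP - 1; i >= 0; i--) {
                if (count[i] != 1)
                        continue;
                index = i;      /* last assignment wins: the lowest i */
        }
        return index;
}

int main(void)
{
        int count[LAST_PKMAP] = { 0 };

        count[10] = count[11] = count[17] = 1;
        assert(forward(count) == backward(count));
        printf("first flushed slot = %u\n", forward(count));
        return 0;
}

The backward loop trades the per-hit branch for overwriting index on every flushable slot, ending with the lowest one.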
--
Kind Regards,
Minchan Kim
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-13 15:01 ` Minchan Kim
@ 2012-11-14 17:09 ` JoonSoo Kim
2012-11-19 23:46 ` Minchan Kim
0 siblings, 1 reply; 95+ messages in thread
From: JoonSoo Kim @ 2012-11-14 17:09 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
Hi, Minchan.
2012/11/14 Minchan Kim <minchan@kernel.org>:
> On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
>> 2012/11/13 Minchan Kim <minchan@kernel.org>:
>> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
>> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
>> >> > Hi Joonsoo,
>> >> >
>> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> >> >> Hello, Minchan.
>> >> >>
>> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> >> >> In current code, after flush_all_zero_pkmaps() is invoked,
>> >> >> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> >> >> >> return index of first flushed entry. With this index,
>> >> >> >> we can immediately map highmem page to virtual address represented by index.
>> >> >> >> So change return type of flush_all_zero_pkmaps()
>> >> >> >> and return index of first flushed entry.
>> >> >> >>
>> >> >> >> Additionally, update last_pkmap_nr to this index.
>> >> >> >> It is certain that entry which is below this index is occupied by other mapping,
>> >> >> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
>> >> >> >>
>> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >> >> >>
>> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> >> >> index ef788b5..97ad208 100644
>> >> >> >> --- a/include/linux/highmem.h
>> >> >> >> +++ b/include/linux/highmem.h
>> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >> >> >>
>> >> >> >> #ifdef CONFIG_HIGHMEM
>> >> >> >> #include <asm/highmem.h>
>> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >> >> >>
>> >> >> >> /* declarations for linux/mm/highmem.c */
>> >> >> >> unsigned int nr_free_highpages(void);
>> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> >> >> index d98b0a9..b365f7b 100644
>> >> >> >> --- a/mm/highmem.c
>> >> >> >> +++ b/mm/highmem.c
>> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >> >> >> return virt_to_page(addr);
>> >> >> >> }
>> >> >> >>
>> >> >> >> -static void flush_all_zero_pkmaps(void)
>> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >> >> >> {
>> >> >> >> int i;
>> >> >> >> - int need_flush = 0;
>> >> >> >> + unsigned int index = PKMAP_INVALID_INDEX;
>> >> >> >>
>> >> >> >> flush_cache_kmaps();
>> >> >> >>
>> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >> &pkmap_page_table[i]);
>> >> >> >>
>> >> >> >> set_page_address(page, NULL);
>> >> >> >> - need_flush = 1;
>> >> >> >> + if (index == PKMAP_INVALID_INDEX)
>> >> >> >> + index = i;
>> >> >> >> }
>> >> >> >> - if (need_flush)
>> >> >> >> + if (index != PKMAP_INVALID_INDEX)
>> >> >> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> >> >> +
>> >> >> >> + return index;
>> >> >> >> }
>> >> >> >>
>> >> >> >> /**
>> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >> */
>> >> >> >> void kmap_flush_unused(void)
>> >> >> >> {
>> >> >> >> + unsigned int index;
>> >> >> >> +
>> >> >> >> lock_kmap();
>> >> >> >> - flush_all_zero_pkmaps();
>> >> >> >> + index = flush_all_zero_pkmaps();
>> >> >> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> >> >> + last_pkmap_nr = index;
>> >> >> >
>> >> >> > I don't know how kmap_flush_unused is really fast path so how my nitpick
>> >> >> > is effective. Anyway,
>> >> >> > What problem happens if we do following as?
>> >> >> >
>> >> >> > lock()
>> >> >> > index = flush_all_zero_pkmaps();
>> >> >> > if (index != PKMAP_INVALID_INDEX)
>> >> >> > last_pkmap_nr = index;
>> >> >> > unlock();
>> >> >> >
>> >> >> > Normally, last_pkmap_nr is increased with searching empty slot in
>> >> >> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
>> >> >> > in kmap_flush_unused normally become either less than last_pkmap_nr
>> >> >> > or last_pkmap_nr + 1.
>> >> >>
>> >> >> There is a case that return value of kmap_flush_unused() is larger
>> >> >> than last_pkmap_nr.
>> >> >
>> >> > I see but why it's problem? kmap_flush_unused returns larger value than
>> >> > last_pkmap_nr means that there is no free slot at below the value.
>> >> > So unconditional last_pkmap_nr update is vaild.
>> >>
>> >> I think that this is not true.
>> >> Look at the slightly different example.
>> >>
>> >> Assume last_pkmap = 20 and index 1-9, 12-19 is kmapped. 10, 11 is kunmapped.
>> >>
>> >> do kmap_flush_unused() => flush index 10,11 => last_pkmap = 10;
>> >> do kunmap() with index 17
>> >> do kmap_flush_unused() => flush index 17 => last_pkmap = 17?
>> >>
>> >> In this case, unconditional last_pkmap_nr update skip one kunmapped index.
>> >> So, conditional update is needed.
>> >
>> > Thanks for pouinting out, Joonsoo.
>> > You're right. I misunderstood your flush_all_zero_pkmaps change.
>> > As your change, flush_all_zero_pkmaps returns first *flushed* free slot index.
>> > What's the benefit returning flushed flushed free slot index rather than free slot index?
>>
>> If flush_all_zero_pkmaps() return free slot index rather than first
>> flushed free slot,
>> we need another comparison like as 'if pkmap_count[i] == 0' and
>> need another local variable for determining whether flush is occurred or not.
>> I want to minimize these overhead and churning of the code, although
>> they are negligible.
>>
>> > I think flush_all_zero_pkmaps should return first free slot because customer of
>> > flush_all_zero_pkmaps doesn't care whether it's just flushed or not.
>> > What he want is just free or not. In such case, we can remove above check and it makes
>> > flusha_all_zero_pkmaps more intuitive.
>>
>> Yes, it is more intuitive, but as I mentioned above, it need another comparison,
>> so with that, a benefit which prevent to re-iterate when there is no
>> free slot, may be disappeared.
>
> If you're very keen on the performance, why do you have such code?
> You can remove below branch if you were keen on the performance.
>
> diff --git a/mm/highmem.c b/mm/highmem.c
> index c8be376..44a88dd 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
>
> flush_cache_kmaps();
>
> - for (i = 0; i < LAST_PKMAP; i++) {
> + for (i = LAST_PKMAP - 1; i >= 0; i--) {
> struct page *page;
>
> /*
> @@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
> pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
>
> set_page_address(page, NULL);
> - if (index == PKMAP_INVALID_INDEX)
> - index = i;
> + index = i;
> }
> if (index != PKMAP_INVALID_INDEX)
> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>
>
> Anyway, if you have the concern of performance, Okay let's give up making code clear
> although I didn't see any report about kmap perfomance. Instead, please consider above
> optimization because you have already broken what you mentioned.
> If we can't make function clear, another method for it is to add function comment. Please.
Yes, I also haven't seen any report about kmap performance.
Through your review comments, I eventually concluded that this patch will not
give any benefit.
So how about dropping it?
Thanks for the review.
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-14 17:09 ` JoonSoo Kim
@ 2012-11-19 23:46 ` Minchan Kim
2012-11-27 15:01 ` JoonSoo Kim
0 siblings, 1 reply; 95+ messages in thread
From: Minchan Kim @ 2012-11-19 23:46 UTC (permalink / raw)
To: JoonSoo Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra
Hi Joonsoo,
Sorry for the delay.
On Thu, Nov 15, 2012 at 02:09:04AM +0900, JoonSoo Kim wrote:
> Hi, Minchan.
>
> 2012/11/14 Minchan Kim <minchan@kernel.org>:
> > On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
> >> 2012/11/13 Minchan Kim <minchan@kernel.org>:
> >> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
> >> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
> >> >> > Hi Joonsoo,
> >> >> >
> >> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> >> >> >> Hello, Minchan.
> >> >> >>
> >> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> >> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> >> >> >> In current code, after flush_all_zero_pkmaps() is invoked,
> >> >> >> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> >> >> >> >> return index of first flushed entry. With this index,
> >> >> >> >> we can immediately map highmem page to virtual address represented by index.
> >> >> >> >> So change return type of flush_all_zero_pkmaps()
> >> >> >> >> and return index of first flushed entry.
> >> >> >> >>
> >> >> >> >> Additionally, update last_pkmap_nr to this index.
> >> >> >> >> It is certain that entry which is below this index is occupied by other mapping,
> >> >> >> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
> >> >> >> >>
> >> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
> >> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >> >> >> >>
> >> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> >> >> >> index ef788b5..97ad208 100644
> >> >> >> >> --- a/include/linux/highmem.h
> >> >> >> >> +++ b/include/linux/highmem.h
> >> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >> >> >> >>
> >> >> >> >> #ifdef CONFIG_HIGHMEM
> >> >> >> >> #include <asm/highmem.h>
> >> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >> >> >> >>
> >> >> >> >> /* declarations for linux/mm/highmem.c */
> >> >> >> >> unsigned int nr_free_highpages(void);
> >> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> >> >> >> index d98b0a9..b365f7b 100644
> >> >> >> >> --- a/mm/highmem.c
> >> >> >> >> +++ b/mm/highmem.c
> >> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >> >> >> >> return virt_to_page(addr);
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> -static void flush_all_zero_pkmaps(void)
> >> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
> >> >> >> >> {
> >> >> >> >> int i;
> >> >> >> >> - int need_flush = 0;
> >> >> >> >> + unsigned int index = PKMAP_INVALID_INDEX;
> >> >> >> >>
> >> >> >> >> flush_cache_kmaps();
> >> >> >> >>
> >> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >> >> >> >> &pkmap_page_table[i]);
> >> >> >> >>
> >> >> >> >> set_page_address(page, NULL);
> >> >> >> >> - need_flush = 1;
> >> >> >> >> + if (index == PKMAP_INVALID_INDEX)
> >> >> >> >> + index = i;
> >> >> >> >> }
> >> >> >> >> - if (need_flush)
> >> >> >> >> + if (index != PKMAP_INVALID_INDEX)
> >> >> >> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> >> >> >> +
> >> >> >> >> + return index;
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> /**
> >> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >> >> >> >> */
> >> >> >> >> void kmap_flush_unused(void)
> >> >> >> >> {
> >> >> >> >> + unsigned int index;
> >> >> >> >> +
> >> >> >> >> lock_kmap();
> >> >> >> >> - flush_all_zero_pkmaps();
> >> >> >> >> + index = flush_all_zero_pkmaps();
> >> >> >> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> >> >> >> + last_pkmap_nr = index;
> >> >> >> >
> >> >> >> > I don't know how kmap_flush_unused is really fast path so how my nitpick
> >> >> >> > is effective. Anyway,
> >> >> >> > What problem happens if we do following as?
> >> >> >> >
> >> >> >> > lock()
> >> >> >> > index = flush_all_zero_pkmaps();
> >> >> >> > if (index != PKMAP_INVALID_INDEX)
> >> >> >> > last_pkmap_nr = index;
> >> >> >> > unlock();
> >> >> >> >
> >> >> >> > Normally, last_pkmap_nr is increased with searching empty slot in
> >> >> >> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> >> >> >> > in kmap_flush_unused normally become either less than last_pkmap_nr
> >> >> >> > or last_pkmap_nr + 1.
> >> >> >>
> >> >> >> There is a case that return value of kmap_flush_unused() is larger
> >> >> >> than last_pkmap_nr.
> >> >> >
> >> >> > I see but why it's problem? kmap_flush_unused returns larger value than
> >> >> > last_pkmap_nr means that there is no free slot at below the value.
> >> >> > So unconditional last_pkmap_nr update is vaild.
> >> >>
> >> >> I think that this is not true.
> >> >> Look at the slightly different example.
> >> >>
> >> >> Assume last_pkmap = 20 and index 1-9, 12-19 is kmapped. 10, 11 is kunmapped.
> >> >>
> >> >> do kmap_flush_unused() => flush index 10,11 => last_pkmap = 10;
> >> >> do kunmap() with index 17
> >> >> do kmap_flush_unused() => flush index 17 => last_pkmap = 17?
> >> >>
> >> >> In this case, unconditional last_pkmap_nr update skip one kunmapped index.
> >> >> So, conditional update is needed.
> >> >
> >> > Thanks for pouinting out, Joonsoo.
> >> > You're right. I misunderstood your flush_all_zero_pkmaps change.
> >> > As your change, flush_all_zero_pkmaps returns first *flushed* free slot index.
> >> > What's the benefit returning flushed flushed free slot index rather than free slot index?
> >>
> >> If flush_all_zero_pkmaps() return free slot index rather than first
> >> flushed free slot,
> >> we need another comparison like as 'if pkmap_count[i] == 0' and
> >> need another local variable for determining whether flush is occurred or not.
> >> I want to minimize these overhead and churning of the code, although
> >> they are negligible.
> >>
> >> > I think flush_all_zero_pkmaps should return first free slot because customer of
> >> > flush_all_zero_pkmaps doesn't care whether it's just flushed or not.
> >> > What he want is just free or not. In such case, we can remove above check and it makes
> >> > flusha_all_zero_pkmaps more intuitive.
> >>
> >> Yes, it is more intuitive, but as I mentioned above, it need another comparison,
> >> so with that, a benefit which prevent to re-iterate when there is no
> >> free slot, may be disappeared.
> >
> > If you're very keen on the performance, why do you have such code?
> > You can remove below branch if you were keen on the performance.
> >
> > diff --git a/mm/highmem.c b/mm/highmem.c
> > index c8be376..44a88dd 100644
> > --- a/mm/highmem.c
> > +++ b/mm/highmem.c
> > @@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
> >
> > flush_cache_kmaps();
> >
> > - for (i = 0; i < LAST_PKMAP; i++) {
> > + for (i = LAST_PKMAP - 1; i >= 0; i--) {
> > struct page *page;
> >
> > /*
> > @@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
> > pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
> >
> > set_page_address(page, NULL);
> > - if (index == PKMAP_INVALID_INDEX)
> > - index = i;
> > + index = i;
> > }
> > if (index != PKMAP_INVALID_INDEX)
> > flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >
> >
> > Anyway, if you have the concern of performance, Okay let's give up making code clear
> > although I didn't see any report about kmap perfomance. Instead, please consider above
> > optimization because you have already broken what you mentioned.
> > If we can't make function clear, another method for it is to add function comment. Please.
>
> Yes, I also didn't see any report about kmap performance.
> By your reviewing comment, I eventually reach that this patch will not
> give any benefit.
> So how about to drop it?
Personally, I'd prefer to proceed, but if you aren't confident about the gain,
there's no problem with dropping it.
Thanks.
>
> Thanks for review.
--
Kind regards,
Minchan Kim
* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
2012-11-19 23:46 ` Minchan Kim
@ 2012-11-27 15:01 ` JoonSoo Kim
0 siblings, 0 replies; 95+ messages in thread
From: JoonSoo Kim @ 2012-11-27 15:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, Mel Gorman, Minchan Kim
Hello, Andrew.
2012/11/20 Minchan Kim <minchan@kernel.org>:
> Hi Joonsoo,
> Sorry for the delay.
>
> On Thu, Nov 15, 2012 at 02:09:04AM +0900, JoonSoo Kim wrote:
>> Hi, Minchan.
>>
>> 2012/11/14 Minchan Kim <minchan@kernel.org>:
>> > On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
>> >> 2012/11/13 Minchan Kim <minchan@kernel.org>:
>> >> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
>> >> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
>> >> >> > Hi Joonsoo,
>> >> >> >
>> >> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> >> >> >> Hello, Minchan.
>> >> >> >>
>> >> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> >> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> >> >> >> In the current code, after flush_all_zero_pkmaps() is invoked,
>> >> >> >> >> we re-iterate all pkmaps. This can be optimized if flush_all_zero_pkmaps()
>> >> >> >> >> returns the index of the first flushed entry. With this index,
>> >> >> >> >> we can immediately map a highmem page to the virtual address represented by it.
>> >> >> >> >> So change the return type of flush_all_zero_pkmaps()
>> >> >> >> >> and return the index of the first flushed entry.
>> >> >> >> >>
>> >> >> >> >> Additionally, update last_pkmap_nr to this index.
>> >> >> >> >> It is certain that every entry below this index is occupied by another mapping,
>> >> >> >> >> so updating last_pkmap_nr to this index is a reasonable optimization.
>> >> >> >> >>
>> >> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >> >> >> >>
>> >> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> >> >> >> index ef788b5..97ad208 100644
>> >> >> >> >> --- a/include/linux/highmem.h
>> >> >> >> >> +++ b/include/linux/highmem.h
>> >> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >> >> >> >>
>> >> >> >> >> #ifdef CONFIG_HIGHMEM
>> >> >> >> >> #include <asm/highmem.h>
>> >> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >> >> >> >>
>> >> >> >> >> /* declarations for linux/mm/highmem.c */
>> >> >> >> >> unsigned int nr_free_highpages(void);
>> >> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> >> >> >> index d98b0a9..b365f7b 100644
>> >> >> >> >> --- a/mm/highmem.c
>> >> >> >> >> +++ b/mm/highmem.c
>> >> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >> >> >> >> return virt_to_page(addr);
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >> -static void flush_all_zero_pkmaps(void)
>> >> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >> >> >> >> {
>> >> >> >> >> int i;
>> >> >> >> >> - int need_flush = 0;
>> >> >> >> >> + unsigned int index = PKMAP_INVALID_INDEX;
>> >> >> >> >>
>> >> >> >> >> flush_cache_kmaps();
>> >> >> >> >>
>> >> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >> >> &pkmap_page_table[i]);
>> >> >> >> >>
>> >> >> >> >> set_page_address(page, NULL);
>> >> >> >> >> - need_flush = 1;
>> >> >> >> >> + if (index == PKMAP_INVALID_INDEX)
>> >> >> >> >> + index = i;
>> >> >> >> >> }
>> >> >> >> >> - if (need_flush)
>> >> >> >> >> + if (index != PKMAP_INVALID_INDEX)
>> >> >> >> >> flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> >> >> >> +
>> >> >> >> >> + return index;
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >> /**
>> >> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >> >> */
>> >> >> >> >> void kmap_flush_unused(void)
>> >> >> >> >> {
>> >> >> >> >> + unsigned int index;
>> >> >> >> >> +
>> >> >> >> >> lock_kmap();
>> >> >> >> >> - flush_all_zero_pkmaps();
>> >> >> >> >> + index = flush_all_zero_pkmaps();
>> >> >> >> >> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> >> >> >> + last_pkmap_nr = index;
>> >> >> >> >
>> >> >> >> > I don't know whether kmap_flush_unused is really a fast path, so I'm not sure
>> >> >> >> > how effective my nitpick is. Anyway,
>> >> >> >> > what problem happens if we do the following?
>> >> >> >> >
>> >> >> >> > lock()
>> >> >> >> > index = flush_all_zero_pkmaps();
>> >> >> >> > if (index != PKMAP_INVALID_INDEX)
>> >> >> >> >         last_pkmap_nr = index;
>> >> >> >> > unlock();
>> >> >> >> >
>> >> >> >> > Normally, last_pkmap_nr is increased while searching for an empty slot in
>> >> >> >> > map_new_virtual. So I expect the return value of flush_all_zero_pkmaps
>> >> >> >> > in kmap_flush_unused to normally be either less than last_pkmap_nr
>> >> >> >> > or equal to last_pkmap_nr + 1.
>> >> >> >>
>> >> >> >> There is a case where the return value of flush_all_zero_pkmaps() is larger
>> >> >> >> than last_pkmap_nr.
>> >> >> >
>> >> >> > I see, but why is it a problem? flush_all_zero_pkmaps() returning a value larger
>> >> >> > than last_pkmap_nr means that there is no free slot below that value.
>> >> >> > So an unconditional last_pkmap_nr update is valid.
>> >> >>
>> >> >> I think that this is not true.
>> >> >> Look at a slightly different example.
>> >> >>
>> >> >> Assume LAST_PKMAP = 20, indexes 1-9 and 12-19 are kmapped, and 10, 11 are kunmapped.
>> >> >>
>> >> >> do kmap_flush_unused() => flushes indexes 10, 11 => last_pkmap_nr = 10
>> >> >> do kunmap() on index 17
>> >> >> do kmap_flush_unused() => flushes index 17 => last_pkmap_nr = 17?
>> >> >>
>> >> >> In this case, an unconditional last_pkmap_nr update skips the free indexes 10 and 11.
>> >> >> So a conditional update is needed.
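>> >> >>
>> >> >> To make this concrete, here is a minimal userspace model of the scenario
>> >> >> (just my sketch, not the kernel code; pkmap_count semantics simplified to
>> >> >> 0 = free, 1 = unused but still mapped, 2 = in use):
>> >> >>
>> >> >> #include <stdio.h>
>> >> >>
>> >> >> #define LAST_PKMAP 20
>> >> >> #define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >> >>
>> >> >> static int pkmap_count[LAST_PKMAP];
>> >> >> static int last_pkmap_nr = LAST_PKMAP - 1;
>> >> >>
>> >> >> /* Model of flush_all_zero_pkmaps(): zero the unused entries and
>> >> >>    return the index of the first flushed one. */
>> >> >> static int flush_model(void)
>> >> >> {
>> >> >> 	int i, index = PKMAP_INVALID_INDEX;
>> >> >>
>> >> >> 	for (i = 0; i < LAST_PKMAP; i++) {
>> >> >> 		if (pkmap_count[i] != 1)
>> >> >> 			continue;
>> >> >> 		pkmap_count[i] = 0;		/* "flush" the entry */
>> >> >> 		if (index == PKMAP_INVALID_INDEX)
>> >> >> 			index = i;
>> >> >> 	}
>> >> >> 	return index;
>> >> >> }
>> >> >>
>> >> >> int main(void)
>> >> >> {
>> >> >> 	int i, index;
>> >> >>
>> >> >> 	for (i = 1; i < LAST_PKMAP; i++)	/* indexes 1-19 mapped */
>> >> >> 		pkmap_count[i] = 2;
>> >> >> 	pkmap_count[10] = pkmap_count[11] = 1;	/* 10, 11 kunmapped */
>> >> >>
>> >> >> 	index = flush_model();			/* returns 10 */
>> >> >> 	if (index != PKMAP_INVALID_INDEX && index < last_pkmap_nr)
>> >> >> 		last_pkmap_nr = index;		/* now 10 */
>> >> >>
>> >> >> 	pkmap_count[17] = 1;			/* kunmap index 17 */
>> >> >> 	index = flush_model();			/* returns 17 */
>> >> >> 	/* An unconditional update would set last_pkmap_nr = 17 here,
>> >> >> 	   skipping the free slots 10 and 11. */
>> >> >> 	if (index != PKMAP_INVALID_INDEX && index < last_pkmap_nr)
>> >> >> 		last_pkmap_nr = index;		/* stays 10 */
>> >> >>
>> >> >> 	printf("last_pkmap_nr = %d\n", last_pkmap_nr);	/* prints 10 */
>> >> >> 	return 0;
>> >> >> }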
>> >> >
>> >> > Thanks for pointing that out, Joonsoo.
>> >> > You're right. I misunderstood your flush_all_zero_pkmaps change.
>> >> > With your change, flush_all_zero_pkmaps returns the first *flushed* free slot index.
>> >> > What's the benefit of returning the first flushed free slot index rather than the first free slot index?
>> >>
>> >> If flush_all_zero_pkmaps() returned a free slot index rather than the first
>> >> flushed free slot,
>> >> we would need another comparison, such as 'if (pkmap_count[i] == 0)', and
>> >> another local variable to determine whether a flush occurred.
>> >> I want to minimize this overhead and the churn in the code, although
>> >> they are negligible.
>> >>
>> >> > I think flush_all_zero_pkmaps should return the first free slot, because the caller of
>> >> > flush_all_zero_pkmaps doesn't care whether it was just flushed or not.
>> >> > All the caller wants to know is whether it is free. In that case, we can remove the
>> >> > check above, and it makes flush_all_zero_pkmaps more intuitive.
>> >>
>> >> Yes, it is more intuitive, but as I mentioned above, it needs another comparison,
>> >> so with that, the benefit of avoiding re-iteration when there is no
>> >> free slot may disappear.
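>> >>
>> >> For reference, here is a rough sketch of how the first-free-slot variant
>> >> might look (untested, based on the current loop body; the pte_none()
>> >> sanity check and other details are elided), just to show the extra
>> >> comparison and the extra local variable:
>> >>
>> >> static unsigned int flush_all_zero_pkmaps(void)
>> >> {
>> >> 	int i;
>> >> 	unsigned int index = PKMAP_INVALID_INDEX;
>> >> 	int need_flush = 0;			/* the extra local variable */
>> >>
>> >> 	flush_cache_kmaps();
>> >>
>> >> 	for (i = 0; i < LAST_PKMAP; i++) {
>> >> 		struct page *page;
>> >>
>> >> 		if (pkmap_count[i] == 0) {	/* the extra comparison */
>> >> 			if (index == PKMAP_INVALID_INDEX)
>> >> 				index = i;	/* already-free slot */
>> >> 			continue;
>> >> 		}
>> >> 		if (pkmap_count[i] != 1)
>> >> 			continue;
>> >>
>> >> 		/* flush this unused entry, as in the current code */
>> >> 		pkmap_count[i] = 0;
>> >> 		page = pte_page(pkmap_page_table[i]);
>> >> 		pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
>> >> 		set_page_address(page, NULL);
>> >> 		need_flush = 1;
>> >> 		if (index == PKMAP_INVALID_INDEX)
>> >> 			index = i;
>> >> 	}
>> >> 	if (need_flush)
>> >> 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >>
>> >> 	return index;
>> >> }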
>> >
>> > If you're so keen on performance, why do you have such code?
>> > You can remove the branch below if you really care about performance.
>> >
>> > diff --git a/mm/highmem.c b/mm/highmem.c
>> > index c8be376..44a88dd 100644
>> > --- a/mm/highmem.c
>> > +++ b/mm/highmem.c
>> > @@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
>> >
>> > flush_cache_kmaps();
>> >
>> > - for (i = 0; i < LAST_PKMAP; i++) {
>> > + for (i = LAST_PKMAP - 1; i >= 0; i--) {
>> > struct page *page;
>> >
>> > /*
>> > @@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
>> > pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
>> >
>> > set_page_address(page, NULL);
>> > - if (index == PKMAP_INVALID_INDEX)
>> > - index = i;
>> > + index = i;
>> > }
>> > if (index != PKMAP_INVALID_INDEX)
>> > flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >
>> >
>> > Anyway, if you are concerned about performance, okay, let's give up making the code clear,
>> > although I haven't seen any report about kmap performance. Instead, please consider the
>> > optimization above, because you have already broken the property you mentioned.
>> > If we can't make the function clear, another option is to add a function comment. Please.
>>
>> Yes, I also haven't seen any report about kmap performance.
>> Following your review comments, I have concluded that this patch will not
>> give any benefit.
>> So how about dropping it?
>
> Personally, I prefer to proceed, but if you aren't confident about the gain,
> there is no problem with dropping it.
> Thanks.
>
>>
>> Thanks for review.
During the review, I concluded that this patch has no gain,
and that it churns the code too much.
So I would like to drop this patch from your tree.
Sorry for the late notification.
Thread overview: 95+ messages
[not found] <Yes>
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
2012-07-16 16:14 ` [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-16 17:26 ` Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 17:59 ` JoonSoo Kim
2012-07-17 13:02 ` Michal Nazarewicz
2012-07-16 16:14 ` [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-16 17:29 ` Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 18:40 ` JoonSoo Kim
2012-07-17 13:16 ` Michal Nazarewicz
2012-07-16 17:14 ` [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2012-07-16 17:30 ` Christoph Lameter
2012-07-16 17:23 ` [PATCH 1/3] mm: correct return value of migrate_pages() Christoph Lameter
2012-07-16 17:32 ` JoonSoo Kim
2012-07-16 17:37 ` Christoph Lameter
2012-07-16 17:40 ` Michal Nazarewicz
2012-07-16 17:57 ` JoonSoo Kim
2012-07-16 18:05 ` Christoph Lameter
2012-07-17 12:33 ` [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-17 12:33 ` [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-17 14:28 ` Christoph Lameter
2012-07-17 15:41 ` JoonSoo Kim
2012-07-17 12:33 ` [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-17 13:25 ` Michal Nazarewicz
2012-07-17 15:45 ` JoonSoo Kim
2012-07-17 15:49 ` [PATCH 3/4 v3] " Joonsoo Kim
2012-07-17 12:33 ` [PATCH 4/4 v2] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-27 20:57 ` Christoph Lameter
2012-07-28 6:16 ` JoonSoo Kim
2012-07-30 19:30 ` Christoph Lameter
2012-07-27 17:55 ` [RESEND PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2012-07-27 20:54 ` Christoph Lameter
2012-07-28 6:09 ` JoonSoo Kim
2012-07-30 19:29 ` Christoph Lameter
2012-07-31 3:34 ` JoonSoo Kim
2012-07-31 14:04 ` Christoph Lameter
2012-08-01 5:15 ` Michael Kerrisk
2012-08-01 18:00 ` Christoph Lameter
2012-08-02 5:52 ` Michael Kerrisk
2012-08-24 16:05 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Joonsoo Kim
2012-08-24 16:05 ` [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node Joonsoo Kim
2012-08-24 16:15 ` Christoph Lameter
2012-08-24 16:28 ` JoonSoo Kim
2012-08-24 16:31 ` Christoph Lameter
2012-08-24 16:40 ` JoonSoo Kim
2012-08-24 16:12 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Christoph Lameter
2012-08-25 14:11 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Joonsoo Kim
2012-08-25 14:11 ` [PATCH 2/2] slab: fix starting index for finding another object Joonsoo Kim
2012-09-03 10:08 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
2012-10-20 15:48 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Joonsoo Kim
2012-10-20 15:48 ` [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA Joonsoo Kim
2012-10-22 14:31 ` Christoph Lameter
2012-10-23 2:29 ` JoonSoo Kim
2012-10-23 6:16 ` Eric Dumazet
2012-10-23 16:12 ` JoonSoo Kim
2012-10-24 8:05 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Pekka Enberg
2012-10-24 13:36 ` Christoph Lameter
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
2012-10-28 19:12 ` [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
2012-10-29 1:48 ` Minchan Kim
2012-10-28 19:12 ` [PATCH 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
2012-10-29 1:52 ` Minchan Kim
2012-10-30 21:31 ` Andrew Morton
2012-10-31 5:14 ` Minchan Kim
2012-10-31 15:01 ` JoonSoo Kim
2012-10-28 19:12 ` [PATCH 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
2012-10-29 1:57 ` Minchan Kim
2012-10-28 19:12 ` [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry Joonsoo Kim
2012-10-29 2:06 ` Minchan Kim
2012-10-29 13:12 ` JoonSoo Kim
2012-10-28 19:12 ` [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
2012-10-29 2:09 ` Minchan Kim
2012-10-29 2:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Minchan Kim
2012-10-29 13:15 ` JoonSoo Kim
2012-10-31 17:11 ` JoonSoo Kim
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
2012-10-31 16:56 ` [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry Joonsoo Kim
2012-11-01 5:03 ` Minchan Kim
2012-11-02 19:07 ` JoonSoo Kim
2012-11-02 22:42 ` Minchan Kim
2012-11-13 0:30 ` JoonSoo Kim
2012-11-13 12:49 ` Minchan Kim
2012-11-13 14:12 ` JoonSoo Kim
2012-11-13 15:01 ` Minchan Kim
2012-11-14 17:09 ` JoonSoo Kim
2012-11-19 23:46 ` Minchan Kim
2012-11-27 15:01 ` JoonSoo Kim
2012-10-31 16:56 ` [PATCH v2 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim