* Re: Need to take mmap_sem lock in move_pages. [not found] <28631E6913C8074E95A698E8AC93D091B21561@caexch1.virident.info> @ 2009-02-04 9:36 ` KAMEZAWA Hiroyuki 2009-02-04 9:40 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-04 9:36 UTC (permalink / raw) To: Swamy Gowda; +Cc: linux-kernel, cl, Brice.Goglin On Wed, 4 Feb 2009 01:02:35 -0800 "Swamy Gowda" <swamy@virident.com> wrote: > Hi, > > > > I believe that migrate_pages related race conditions were fixed as part > of rcu_read_lock in unmap_and_move. But it seems we are still taking the > mmap_sem lock in do_move_pages function, is this really required? If it > is so why it is not needed in hot remove path? > Hmm , CC: to Christoph and Brice. My understanding is following. 1. do_move_page_to_node_array() at el. needs mmap_sem (read-side) because it scans page table and vma. While we have to scan vmas to find pages for migration, it needs mmap_sem. So, this part, == do_move_page_to_node_array() 903 err = 0; 904 if (!list_empty(&pagelist)) 905 err = migrate_pages(&pagelist, new_page_node, 906 (unsigned long)pm); 907 908 up_read(&mm->mmap_sem); == can be == 903 err = 0; 904 up_read(&mm->mmap_sem); 905 if (!list_empty(&pagelist)) 906 err = migrate_pages(&pagelist, new_page_node, 907 (unsigned long)pm); 908 == ? But, by above move of semaphore, reliability of sys_migrate_page() goes down. Assume 2 threads. Thread1 calls sys_move_pages(), thread2 does page fault, touches pte under migraton. == Thread 1 Thread2 up_read(&mm->mmap_sem) page fault => map new page. end of migration. == What a user expects is move all pages within [start, end) moves to nodes specified. And, if the page doesn't exist, "err = -ENOENT" is set to status buffer. But, in above case, the page exists but not in place the user expected. So, there are trade offs. Pros. up_read(&mmap_sem) before migration will reduce period of lock. Cons. sys_migrate_pages() at el. are not atomic anymore and return code is not reliable. 2. memory hotplug's migrate_page() finds the page by physical memory's memmap. NO scans to any mm_struct or vmas. So, not necessary to take mmap_sem. (The logic is same as to vmscan.c's logic) It also moves pages which is not mapped. Thanks, -Kame ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Need to take mmap_sem lock in move_pages. 2009-02-04 9:36 ` Need to take mmap_sem lock in move_pages KAMEZAWA Hiroyuki @ 2009-02-04 9:40 ` KAMEZAWA Hiroyuki 2009-02-04 9:55 ` KAMEZAWA Hiroyuki 2009-02-04 15:34 ` Christoph Lameter 0 siblings, 2 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-04 9:40 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: Swamy Gowda, linux-kernel, cl, Brice.Goglin On Wed, 4 Feb 2009 18:36:00 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Wed, 4 Feb 2009 01:02:35 -0800 > "Swamy Gowda" <swamy@virident.com> wrote: > > > Hi, > > > > > > > > I believe that migrate_pages related race conditions were fixed as part > > of rcu_read_lock in unmap_and_move. But it seems we are still taking the > > mmap_sem lock in do_move_pages function, is this really required? If it > > is so why it is not needed in hot remove path? > > > > Hmm , CC: to Christoph and Brice. > > My understanding is following. > > 1. do_move_page_to_node_array() at el. needs mmap_sem (read-side) because it > scans page table and vma. While we have to scan vmas to find pages for > migration, it needs mmap_sem. > > So, this part, > == do_move_page_to_node_array() > 903 err = 0; > 904 if (!list_empty(&pagelist)) > 905 err = migrate_pages(&pagelist, new_page_node, > 906 (unsigned long)pm); > 907 > 908 up_read(&mm->mmap_sem); > == > can be > == > 903 err = 0; > 904 up_read(&mm->mmap_sem); > 905 if (!list_empty(&pagelist)) > 906 err = migrate_pages(&pagelist, new_page_node, > 907 (unsigned long)pm); > 908 > == > ? > > But, by above move of semaphore, reliability of sys_migrate_page() goes down. > Assume 2 threads. Thread1 calls sys_move_pages(), thread2 does page fault, > touches pte under migraton. > == > Thread 1 Thread2 > > up_read(&mm->mmap_sem) > page fault => map new page. > end of migration. > == > please ignore following ;( Sigh.. Maybe up_read() can be moved before do_migrate_pages(), I think. Sorry. -kame ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. 2009-02-04 9:40 ` KAMEZAWA Hiroyuki @ 2009-02-04 9:55 ` KAMEZAWA Hiroyuki 2009-02-04 15:34 ` Christoph Lameter 1 sibling, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-04 9:55 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Swamy Gowda, linux-kernel, cl, Brice.Goglin, linux-mm@kvack.org On Wed, 4 Feb 2009 18:40:28 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Wed, 4 Feb 2009 18:36:00 +0900 > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > Maybe up_read() can be moved before do_migrate_pages(), I think. > How about this ? == mmap_sem can be released after page table walk ends. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: mmotm-2.6.29-Feb03/mm/migrate.c =================================================================== --- mmotm-2.6.29-Feb03.orig/mm/migrate.c +++ mmotm-2.6.29-Feb03/mm/migrate.c @@ -875,13 +875,13 @@ put_and_set: set_status: pp->status = err; } + up_read(&mm->mmap_sem); err = 0; if (!list_empty(&pagelist)) err = migrate_pages(&pagelist, new_page_node, (unsigned long)pm); - up_read(&mm->mmap_sem); return err; } ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. @ 2009-02-04 9:55 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-04 9:55 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Swamy Gowda, linux-kernel, cl, Brice.Goglin, linux-mm@kvack.org On Wed, 4 Feb 2009 18:40:28 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Wed, 4 Feb 2009 18:36:00 +0900 > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > Maybe up_read() can be moved before do_migrate_pages(), I think. > How about this ? == mmap_sem can be released after page table walk ends. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: mmotm-2.6.29-Feb03/mm/migrate.c =================================================================== --- mmotm-2.6.29-Feb03.orig/mm/migrate.c +++ mmotm-2.6.29-Feb03/mm/migrate.c @@ -875,13 +875,13 @@ put_and_set: set_status: pp->status = err; } + up_read(&mm->mmap_sem); err = 0; if (!list_empty(&pagelist)) err = migrate_pages(&pagelist, new_page_node, (unsigned long)pm); - up_read(&mm->mmap_sem); return err; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. 2009-02-04 9:55 ` KAMEZAWA Hiroyuki @ 2009-02-04 15:39 ` Christoph Lameter -1 siblings, 0 replies; 13+ messages in thread From: Christoph Lameter @ 2009-02-04 15:39 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Swamy Gowda, linux-kernel, Brice.Goglin, linux-mm@kvack.org On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: > mmap_sem can be released after page table walk ends. No. read lock on mmap_sem must be held since the migrate functions manipulate page table entries. Concurrent large scale changes to the page tables (splitting vmas, remapping etc) must not be possible. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. @ 2009-02-04 15:39 ` Christoph Lameter 0 siblings, 0 replies; 13+ messages in thread From: Christoph Lameter @ 2009-02-04 15:39 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Swamy Gowda, linux-kernel, Brice.Goglin, linux-mm@kvack.org On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: > mmap_sem can be released after page table walk ends. No. read lock on mmap_sem must be held since the migrate functions manipulate page table entries. Concurrent large scale changes to the page tables (splitting vmas, remapping etc) must not be possible. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. 2009-02-04 15:39 ` Christoph Lameter @ 2009-02-05 1:15 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-05 1:15 UTC (permalink / raw) To: Christoph Lameter Cc: Swamy Gowda, linux-kernel, Brice.Goglin, linux-mm@kvack.org On Wed, 4 Feb 2009 10:39:19 -0500 (EST) Christoph Lameter <cl@linux-foundation.org> wrote: > On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: > > > mmap_sem can be released after page table walk ends. > > No. read lock on mmap_sem must be held since the migrate functions > manipulate page table entries. Concurrent large scale changes to the page > tables (splitting vmas, remapping etc) must not be possible. > Just for clarification: 1. changes in page table is not problem from the viewpoint of kernel. (means no panic, no leak,...) 2. But this loses "atomic" aspect of migration and will allow unexpected behaviors. (means the page-mapping status after sys_move may not be what user expects.) Thanks, -Kame ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. @ 2009-02-05 1:15 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-05 1:15 UTC (permalink / raw) To: Christoph Lameter Cc: Swamy Gowda, linux-kernel, Brice.Goglin, linux-mm@kvack.org On Wed, 4 Feb 2009 10:39:19 -0500 (EST) Christoph Lameter <cl@linux-foundation.org> wrote: > On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: > > > mmap_sem can be released after page table walk ends. > > No. read lock on mmap_sem must be held since the migrate functions > manipulate page table entries. Concurrent large scale changes to the page > tables (splitting vmas, remapping etc) must not be possible. > Just for clarification: 1. changes in page table is not problem from the viewpoint of kernel. (means no panic, no leak,...) 2. But this loses "atomic" aspect of migration and will allow unexpected behaviors. (means the page-mapping status after sys_move may not be what user expects.) Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. 2009-02-05 1:15 ` KAMEZAWA Hiroyuki @ 2009-02-05 12:23 ` Swamy Gowda -1 siblings, 0 replies; 13+ messages in thread From: Swamy Gowda @ 2009-02-05 12:23 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Christoph Lameter, linux-kernel, Brice.Goglin, linux-mm@kvack.org KAMEZAWA Hiroyuki wrote: > On Wed, 4 Feb 2009 10:39:19 -0500 (EST) > Christoph Lameter <cl@linux-foundation.org> wrote: > >> On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: >> >> > mmap_sem can be released after page table walk ends. >> >> No. read lock on mmap_sem must be held since the migrate functions >> manipulate page table entries. Concurrent large scale changes to the >> page >> tables (splitting vmas, remapping etc) must not be possible. >> > Just for clarification: > > 1. changes in page table is not problem from the viewpoint of kernel. > (means no panic, no leak,...) > 2. But this loses "atomic" aspect of migration and will allow unexpected > behaviors. > (means the page-mapping status after sys_move may not be what user > expects.) > > > Thanks, > -Kame > > But I can't understand how user can see different page->mapping , since new page->mapping still holds the anon_vma pointer which should still contain the changes in the vma list( due to split vma etc). But, considering it as a problem how is it avoided in case of hotremove? --Swamy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. @ 2009-02-05 12:23 ` Swamy Gowda 0 siblings, 0 replies; 13+ messages in thread From: Swamy Gowda @ 2009-02-05 12:23 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Christoph Lameter, linux-kernel, Brice.Goglin, linux-mm@kvack.org KAMEZAWA Hiroyuki wrote: > On Wed, 4 Feb 2009 10:39:19 -0500 (EST) > Christoph Lameter <cl@linux-foundation.org> wrote: > >> On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: >> >> > mmap_sem can be released after page table walk ends. >> >> No. read lock on mmap_sem must be held since the migrate functions >> manipulate page table entries. Concurrent large scale changes to the >> page >> tables (splitting vmas, remapping etc) must not be possible. >> > Just for clarification: > > 1. changes in page table is not problem from the viewpoint of kernel. > (means no panic, no leak,...) > 2. But this loses "atomic" aspect of migration and will allow unexpected > behaviors. > (means the page-mapping status after sys_move may not be what user > expects.) > > > Thanks, > -Kame > > But I can't understand how user can see different page->mapping , since new page->mapping still holds the anon_vma pointer which should still contain the changes in the vma list( due to split vma etc). But, considering it as a problem how is it avoided in case of hotremove? --Swamy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. 2009-02-05 12:23 ` Swamy Gowda @ 2009-02-05 13:23 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-05 13:23 UTC (permalink / raw) To: Swamy Gowda Cc: KAMEZAWA Hiroyuki, Christoph Lameter, linux-kernel, brice.goglin, linux-mm@kvack.org Swamy Gowda wrote: > KAMEZAWA Hiroyuki wrote: >> On Wed, 4 Feb 2009 10:39:19 -0500 (EST) >> Christoph Lameter <cl@linux-foundation.org> wrote: >> >>> On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: >>> >>> > mmap_sem can be released after page table walk ends. >>> >>> No. read lock on mmap_sem must be held since the migrate functions >>> manipulate page table entries. Concurrent large scale changes to the >>> page >>> tables (splitting vmas, remapping etc) must not be possible. >>> >> Just for clarification: >> >> 1. changes in page table is not problem from the viewpoint of kernel. >> (means no panic, no leak,...) >> 2. But this loses "atomic" aspect of migration and will allow unexpected >> behaviors. >> (means the page-mapping status after sys_move may not be what user >> expects.) >> >> >> Thanks, >> -Kame >> >> > But I can't understand how user can see different page->mapping , since > new page->mapping still holds the anon_vma pointer which should still > contain the changes in the vma list( due to split vma etc). But, > considering it as a problem how is it avoided in case of hotremove? > I'm sorry page-mapping in my text is not page->mapping. Just means process's memory map. In my point of view, no problems (I wrote no problem in the kernel.) One big difference between sys_move_pages and hot remove is hot-remove retries many times but sys_move_pages() doesn't. So, race/contention in migrate_page() will dramatically decrease success-rate of page migration by system call. In user side, sys_move_pages(), we may have to think more. I wonder that there may be much more contentions of pte_lock and page_lock() etc... if we remove mmap_sem. The good point of mmap_sem is the waiter can sleep without any troubles and nest of locks. Thanks, -Kame ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC][PATCH] release mmap_sem before starting migration (Was Re: Need to take mmap_sem lock in move_pages. @ 2009-02-05 13:23 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-05 13:23 UTC (permalink / raw) To: Swamy Gowda Cc: KAMEZAWA Hiroyuki, Christoph Lameter, linux-kernel, brice.goglin, linux-mm@kvack.org Swamy Gowda wrote: > KAMEZAWA Hiroyuki wrote: >> On Wed, 4 Feb 2009 10:39:19 -0500 (EST) >> Christoph Lameter <cl@linux-foundation.org> wrote: >> >>> On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: >>> >>> > mmap_sem can be released after page table walk ends. >>> >>> No. read lock on mmap_sem must be held since the migrate functions >>> manipulate page table entries. Concurrent large scale changes to the >>> page >>> tables (splitting vmas, remapping etc) must not be possible. >>> >> Just for clarification: >> >> 1. changes in page table is not problem from the viewpoint of kernel. >> (means no panic, no leak,...) >> 2. But this loses "atomic" aspect of migration and will allow unexpected >> behaviors. >> (means the page-mapping status after sys_move may not be what user >> expects.) >> >> >> Thanks, >> -Kame >> >> > But I can't understand how user can see different page->mapping , since > new page->mapping still holds the anon_vma pointer which should still > contain the changes in the vma list( due to split vma etc). But, > considering it as a problem how is it avoided in case of hotremove? > I'm sorry page-mapping in my text is not page->mapping. Just means process's memory map. In my point of view, no problems (I wrote no problem in the kernel.) One big difference between sys_move_pages and hot remove is hot-remove retries many times but sys_move_pages() doesn't. So, race/contention in migrate_page() will dramatically decrease success-rate of page migration by system call. In user side, sys_move_pages(), we may have to think more. I wonder that there may be much more contentions of pte_lock and page_lock() etc... if we remove mmap_sem. The good point of mmap_sem is the waiter can sleep without any troubles and nest of locks. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Need to take mmap_sem lock in move_pages. 2009-02-04 9:40 ` KAMEZAWA Hiroyuki 2009-02-04 9:55 ` KAMEZAWA Hiroyuki @ 2009-02-04 15:34 ` Christoph Lameter 1 sibling, 0 replies; 13+ messages in thread From: Christoph Lameter @ 2009-02-04 15:34 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: Swamy Gowda, linux-kernel, Brice.Goglin On Wed, 4 Feb 2009, KAMEZAWA Hiroyuki wrote: > > 1. do_move_page_to_node_array() at el. needs mmap_sem (read-side) because it > > scans page table and vma. While we have to scan vmas to find pages for > > migration, it needs mmap_sem. Right. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-02-05 13:24 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <28631E6913C8074E95A698E8AC93D091B21561@caexch1.virident.info>
2009-02-04 9:36 ` Need to take mmap_sem lock in move_pages KAMEZAWA Hiroyuki
2009-02-04 9:40 ` KAMEZAWA Hiroyuki
2009-02-04 9:55 ` [RFC][PATCH] release mmap_sem before starting migration (Was " KAMEZAWA Hiroyuki
2009-02-04 9:55 ` KAMEZAWA Hiroyuki
2009-02-04 15:39 ` Christoph Lameter
2009-02-04 15:39 ` Christoph Lameter
2009-02-05 1:15 ` KAMEZAWA Hiroyuki
2009-02-05 1:15 ` KAMEZAWA Hiroyuki
2009-02-05 12:23 ` Swamy Gowda
2009-02-05 12:23 ` Swamy Gowda
2009-02-05 13:23 ` KAMEZAWA Hiroyuki
2009-02-05 13:23 ` KAMEZAWA Hiroyuki
2009-02-04 15:34 ` Christoph Lameter
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.