* migrate_misplaced_transhuge_page: no page_count check? @ 2012-12-20 4:52 Hugh Dickins 2012-12-20 16:49 ` Mel Gorman 0 siblings, 1 reply; 4+ messages in thread From: Hugh Dickins @ 2012-12-20 4:52 UTC (permalink / raw) To: Mel Gorman, Ingo Molnar; +Cc: linux-kernel, linux-mm Mel, Ingo, I want to raise again a question I raised (in offline mail with Mel) a couple of weeks ago. I see only a page_mapcount check in migrate_misplaced_transhuge_page, and don't understand how migration can be safe against the possibility of an earlier call to get_user_pages or get_user_pages_fast (intended to pin a part of the THP) without a page_count check. (I'm also still somewhat worried about unidentified attempts to pin the page concurrently; but since I don't have an example to give, and concurrent get_user_pages or get_user_pages_fast wouldn't get past the pmd_numa, let's not worry too much about my unidentified anxiety ;) migrate_page_move_mapping and migrate_huge_page_move_mapping check page_count, but migrate_misplaced_transhuge_page doesn't use those. __collapse_huge_page_isolate and khugepaged_scan_pmd (over in huge_memory.c) take commented care to check page_count lest GUP. I can see that page_count might often be raised by concurrent faults on the same pmd_numa, waiting on the lock_page in do_huge_pmd_numa_page. That's unfortunate, and maybe you can find a clever way to discount those. But safety must come first: don't we need to check page_count? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: migrate_misplaced_transhuge_page: no page_count check? 2012-12-20 4:52 migrate_misplaced_transhuge_page: no page_count check? Hugh Dickins @ 2012-12-20 16:49 ` Mel Gorman 2012-12-21 18:38 ` Hugh Dickins 0 siblings, 1 reply; 4+ messages in thread From: Mel Gorman @ 2012-12-20 16:49 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, linux-kernel, linux-mm, Andrea Arcangeli On Wed, Dec 19, 2012 at 08:52:37PM -0800, Hugh Dickins wrote: > Mel, Ingo, > > I want to raise again a question I raised (in offline mail with Mel) > a couple of weeks ago. > It's a good question and thanks for kicking me on this again because I had not followed up properly. > I see only a page_mapcount check in migrate_misplaced_transhuge_page, > and don't understand how migration can be safe against the possibility > of an earlier call to get_user_pages or get_user_pages_fast (intended > to pin a part of the THP) without a page_count check. > It would be hard to trigger bugs in relation to it but it's not fundamentally safe either and just happens to work due to limitations of THP as much as anything else. Adding Andrea to cc in case I'm lacking imagination. IMO, the GUP would need to be relatively long-lived to trigger a bug so the realistic candidate is direct IO. it. It would look something like; 1. pin for direct IO 2. PTE scanner run, mark pte_numa 3. Incur a fault on a remote node, migrate the page 4. direct IO completes on old page and is lost Steps 1 and 2 can happen in either order. I am reasonably sure specjbb, autonumabench and friends are not exercising such paths so I would not have encountered it. The end result would look like a read or write corruption to the application but I also expect that applications that are accessing pages under direct IO are already buggy and it'd be hard to tell the difference. It's a completely different situation than what khugepaged has to deal with. The migration concerns for base pages are also much more involvedi. > (I'm also still somewhat worried about unidentified attempts to > pin the page concurrently; but since I don't have an example to give, > and concurrent get_user_pages or get_user_pages_fast wouldn't get past > the pmd_numa, let's not worry too much about my unidentified anxiety ;) > You are right to be worried. > migrate_page_move_mapping and migrate_huge_page_move_mapping check > page_count, but migrate_misplaced_transhuge_page doesn't use those. > __collapse_huge_page_isolate and khugepaged_scan_pmd (over in > huge_memory.c) take commented care to check page_count lest GUP. > > I can see that page_count might often be raised by concurrent faults > on the same pmd_numa, waiting on the lock_page in do_huge_pmd_numa_page. > That's unfortunate, and maybe you can find a clever way to discount > those. But safety must come first: don't we need to check page_count? > We do and I'm not super-worried about the concurrent faults as a brief check indicated that only 2% of migration attempts failed due to parallel faults elevating the count when running autonumabench. The impact is that the migration is delayed until the next PTE scan which is bad, but not bad enough to delay the obvious safety fix first. How about this? ---8<--- mm: migrate: Check page_count of THP before migrating Hugh Dickins poined out that migrate_misplaced_transhuge_page() does not check page_count before migrating like base page migration and khugepage. He could not see why this was safe and he is right. It happens to work for the most part. The page_mapcount() check ensures that only a single address space is using this page and as THPs are typically private it should not be possible for another address space to fault it in parallel. If the address space has one associated task then it's difficult to have both a GUP pin and be referencing the page at the same time. If there are multiple tasks then a buggy scenario requires that another thread be accessing the page while the direct IO is in flight. This is dodgy behaviour as there is a possibility of corruption with or without THP migration. It would be difficult to identify the corruption as being a migration bug. While we happen to be ok for THP migration versus GUP it is shoddy to depend on such "safety" so this patch checks the page count similar to anonymous pages. Note that this does not mean that the page_mapcount() check can go away. If we were to remove the page_mapcount() check then the THP would have to be unmapped from all referencing PTEs, replaced with migration PTEs and restored properly afterwards. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/migrate.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 3b676b0..7636b90 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1679,7 +1679,13 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, page_xchg_last_nid(new_page, page_last_nid(page)); isolated = numamigrate_isolate_page(pgdat, page); - if (!isolated) { + + /* + * Failing to isolate or a GUP pin prevents migration. The expected + * page count is 2. 1 for anonymous pages without a mapping and 1 + * for the callers pin + */ + if (!isolated || page_count(page) != 2) { count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR); put_page(new_page); goto out_keep_locked; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: migrate_misplaced_transhuge_page: no page_count check? 2012-12-20 16:49 ` Mel Gorman @ 2012-12-21 18:38 ` Hugh Dickins 2012-12-21 20:35 ` Mel Gorman 0 siblings, 1 reply; 4+ messages in thread From: Hugh Dickins @ 2012-12-21 18:38 UTC (permalink / raw) To: Mel Gorman; +Cc: Ingo Molnar, linux-kernel, linux-mm, Andrea Arcangeli On Thu, 20 Dec 2012, Mel Gorman wrote: > On Wed, Dec 19, 2012 at 08:52:37PM -0800, Hugh Dickins wrote: > > Mel, Ingo, > > > > I want to raise again a question I raised (in offline mail with Mel) > > a couple of weeks ago. > > > > It's a good question and thanks for kicking me on this again because I > had not followed up properly. > > > I see only a page_mapcount check in migrate_misplaced_transhuge_page, > > and don't understand how migration can be safe against the possibility > > of an earlier call to get_user_pages or get_user_pages_fast (intended > > to pin a part of the THP) without a page_count check. > > > > It would be hard to trigger bugs in relation to it but it's not fundamentally > safe either and just happens to work due to limitations of THP as much as > anything else. Adding Andrea to cc in case I'm lacking imagination. IMO, > the GUP would need to be relatively long-lived to trigger a bug so the > realistic candidate is direct IO. it. It would look something like; > > 1. pin for direct IO > 2. PTE scanner run, mark pte_numa > 3. Incur a fault on a remote node, migrate the page > 4. direct IO completes on old page and is lost > > Steps 1 and 2 can happen in either order. I am reasonably sure specjbb, > autonumabench and friends are not exercising such paths so I would not > have encountered it. > > The end result would look like a read or write corruption to the application > but I also expect that applications that are accessing pages under direct > IO are already buggy and it'd be hard to tell the difference. It's a > completely different situation than what khugepaged has to deal with. > The migration concerns for base pages are also much more involvedi. > > > (I'm also still somewhat worried about unidentified attempts to > > pin the page concurrently; but since I don't have an example to give, > > and concurrent get_user_pages or get_user_pages_fast wouldn't get past > > the pmd_numa, let's not worry too much about my unidentified anxiety ;) > > > > You are right to be worried. > > > migrate_page_move_mapping and migrate_huge_page_move_mapping check > > page_count, but migrate_misplaced_transhuge_page doesn't use those. > > __collapse_huge_page_isolate and khugepaged_scan_pmd (over in > > huge_memory.c) take commented care to check page_count lest GUP. > > > > I can see that page_count might often be raised by concurrent faults > > on the same pmd_numa, waiting on the lock_page in do_huge_pmd_numa_page. > > That's unfortunate, and maybe you can find a clever way to discount > > those. But safety must come first: don't we need to check page_count? > > > > We do and I'm not super-worried about the concurrent faults as a brief > check indicated that only 2% of migration attempts failed due to parallel > faults elevating the count when running autonumabench. The impact is that > the migration is delayed until the next PTE scan which is bad, but not > bad enough to delay the obvious safety fix first. How about this? > > ---8<--- > mm: migrate: Check page_count of THP before migrating > > Hugh Dickins poined out that migrate_misplaced_transhuge_page() does not > check page_count before migrating like base page migration and khugepage. He > could not see why this was safe and he is right. > > It happens to work for the most part. The page_mapcount() check ensures that > only a single address space is using this page and as THPs are typically > private it should not be possible for another address space to fault it in > parallel. If the address space has one associated task then it's difficult to > have both a GUP pin and be referencing the page at the same time. If there > are multiple tasks then a buggy scenario requires that another thread be > accessing the page while the direct IO is in flight. This is dodgy behaviour > as there is a possibility of corruption with or without THP migration. It > would be difficult to identify the corruption as being a migration bug. > > While we happen to be ok for THP migration versus GUP it is shoddy to > depend on such "safety" so this patch checks the page count similar to > anonymous pages. Note that this does not mean that the page_mapcount() > check can go away. If we were to remove the page_mapcount() check then > the THP would have to be unmapped from all referencing PTEs, replaced with > migration PTEs and restored properly afterwards. > > Reported-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Mel Gorman <mgorman@suse.de> Thanks very much for responding so quickly on this, Mel. It pains me that I cannot yet say acked-by, because I need to spend more time checking it, and cannot do so today. I like where you've placed the check, that's just right. But I'm worried that perhaps there's a putback_lru_page missing, and wonder if it's missing even without this additional patch. It would not be immediately obvious if it's missing, the pages wouldn't leak, but more and more would become unreclaimable until freed. Hugh > --- > mm/migrate.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index 3b676b0..7636b90 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1679,7 +1679,13 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, > page_xchg_last_nid(new_page, page_last_nid(page)); > > isolated = numamigrate_isolate_page(pgdat, page); > - if (!isolated) { > + > + /* > + * Failing to isolate or a GUP pin prevents migration. The expected > + * page count is 2. 1 for anonymous pages without a mapping and 1 > + * for the callers pin > + */ > + if (!isolated || page_count(page) != 2) { > count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR); > put_page(new_page); > goto out_keep_locked; > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: migrate_misplaced_transhuge_page: no page_count check? 2012-12-21 18:38 ` Hugh Dickins @ 2012-12-21 20:35 ` Mel Gorman 0 siblings, 0 replies; 4+ messages in thread From: Mel Gorman @ 2012-12-21 20:35 UTC (permalink / raw) To: Hugh Dickins; +Cc: Ingo Molnar, linux-kernel, linux-mm, Andrea Arcangeli On Fri, Dec 21, 2012 at 10:38:45AM -0800, Hugh Dickins wrote: > > While we happen to be ok for THP migration versus GUP it is shoddy to > > depend on such "safety" so this patch checks the page count similar to > > anonymous pages. Note that this does not mean that the page_mapcount() > > check can go away. If we were to remove the page_mapcount() check then > > the THP would have to be unmapped from all referencing PTEs, replaced with > > migration PTEs and restored properly afterwards. > > > > Reported-by: Hugh Dickins <hughd@google.com> > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > Thanks very much for responding so quickly on this, Mel. It pains > me that I cannot yet say acked-by, because I need to spend more time > checking it, and cannot do so today. > > I like where you've placed the check, that's just right. But I'm > worried that perhaps there's a putback_lru_page missing, and wonder > if it's missing even without this additional patch. *hangs head* Failing to migrate due to GUP needs to put the page back on the LRU. After this patch; 1. fail to isolate from LRU, old page is still on LRU 2. fail due to GUP or parallel fault, old page is put back on LRU 3. fail because PMD changed while PTL was released, old page put back on LRU 4. Successful migration, new page added to LRU by page_add_new_anon_rmap()->lru_cache_add_lru() ---8<--- mm: migrate: Check page_count of THP before migrating Hugh Dickins poined out that migrate_misplaced_transhuge_page() does not check page_count before migrating like base page migration and khugepage. He could not see why this was safe and he is right. It happens to work for the most part. The page_mapcount() check ensures that only a single address space is using this page and as THPs are typically private it should not be possible for another address space to fault it in parallel. If the address space has one associated task then it's difficult to have both a GUP pin and be referencing the page at the same time. If there are multiple tasks then a buggy scenario requires that another thread be accessing the page while the direct IO is in flight. This is dodgy behaviour as there is a possibility of corruption with or without THP migration. It would be difficult to identify the corruption as being a migration bug. While we happen to be ok for THP migration versus GUP it is shoddy to depend on such "safety" so this patch checks the page count similar to anonymous pages. Note that this does not mean that the page_mapcount() check can go away. If we were to remove the page_mapcount() check the the THP would have to be unmapped from all referencing PTEs, replaced with migration PTEs and restored properly afterwards. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/migrate.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 3b676b0..f466827 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1679,9 +1679,18 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, page_xchg_last_nid(new_page, page_last_nid(page)); isolated = numamigrate_isolate_page(pgdat, page); - if (!isolated) { + + /* + * Failing to isolate or a GUP pin prevents migration. The expected + * page count is 2. 1 for anonymous pages without a mapping and 1 + * for the callers pin. If the page was isolated, the page will + * need to be put back on the LRU. + */ + if (!isolated || page_count(page) != 2) { count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR); put_page(new_page); + if (isolated) + putback_lru_page(page); goto out_keep_locked; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-12-21 20:36 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-20 4:52 migrate_misplaced_transhuge_page: no page_count check? Hugh Dickins 2012-12-20 16:49 ` Mel Gorman 2012-12-21 18:38 ` Hugh Dickins 2012-12-21 20:35 ` Mel Gorman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).