* Is this a racing bug in page_make_sharable()? [not found] <4F14345B.4040807@gmail.com> @ 2012-01-16 14:43 ` Nai Xia 2012-01-17 10:53 ` Tim Deegan 0 siblings, 1 reply; 8+ messages in thread From: Nai Xia @ 2012-01-16 14:43 UTC (permalink / raw) To: Grzegorz Milos; +Cc: xen-devel Hi Grzegorz, As I understand, the purpose of the code in page_make_sharable() checking the ref count is to ensure that nobody unexpected is working on the page, and so we can migrate it to dom_cow, right? ==== /* Check if the ref count is 2. The first from PGT_allocated, and * the second from get_page_and_type at the top of this function */ if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) { /* Return type count back to zero */ put_page_and_type(page); spin_unlock(&d->page_alloc_lock); return -E2BIG; } ==== However, it seems to me that this ref check and the following page migration is not atomic( although the operations for type_info ref check seems good) i.e. it's possible that it passed this ref check but just before it goes to dom_cow, someone else gets this page? As far as I know, in Linux kernel's similar scenarios, they do ref-freezing tricks. And if we don't need to worry about this racing case, then what is this check trying to ensure? Thanks, Nai ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Is this a racing bug in page_make_sharable()? 2012-01-16 14:43 ` Is this a racing bug in page_make_sharable()? Nai Xia @ 2012-01-17 10:53 ` Tim Deegan 2012-12-27 15:35 ` Nai Xia 0 siblings, 1 reply; 8+ messages in thread From: Tim Deegan @ 2012-01-17 10:53 UTC (permalink / raw) To: Nai Xia; +Cc: xen-devel, Grzegorz Milos At 22:43 +0800 on 16 Jan (1326753834), Nai Xia wrote: > Hi Grzegorz, > > As I understand, the purpose of the code in page_make_sharable() > checking the ref count is to ensure that nobody unexpected is working > on the page, and so we can migrate it to dom_cow, right? > > ==== > /* Check if the ref count is 2. The first from PGT_allocated, and > * the second from get_page_and_type at the top of this function */ > if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) > { > /* Return type count back to zero */ > put_page_and_type(page); > spin_unlock(&d->page_alloc_lock); > return -E2BIG; > } > ==== > > However, it seems to me that this ref check and the following page > migration is not atomic( although the operations for type_info ref > check seems good) i.e. it's possible that it passed this ref > check but just before it goes to dom_cow, someone else gets this page? Yes, there are a number of races around the p2m code; Anders Lagar-Cavilla has been working on fixing the locking around p2m lookups to take care of this. Tim. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Is this a racing bug in page_make_sharable()? 2012-01-17 10:53 ` Tim Deegan @ 2012-12-27 15:35 ` Nai Xia 2013-01-10 13:00 ` Tim Deegan 0 siblings, 1 reply; 8+ messages in thread From: Nai Xia @ 2012-12-27 15:35 UTC (permalink / raw) To: Tim Deegan Cc: Lixiuchang, Xiaowei Yang, Luohao (brian), xen-devel, Grzegorz Milos Hi all, Last time I reported this bug. And I now see some changes in your Xen git master branch. However, I think the problem still remains for ref-checking in page migration to dom_cow. I think I can construct a bug by interleaving the two code paths: in guest_remove_page() | in page_make_sharable() ------------------------------------------------------------------------------------------------------------------------------ if ( p2m_is_shared(p2mt) ) ..... ... ..... page = mfn_to_page(mfn); ..... ..... if ( !get_page_and_type(page, d, PGT_shared_page) ) // success ......... if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) ) // also pass if ( unlikely(!get_page(page, d)) ) /* go on to remove page */ /* go on to add page to cow domain */ ------------------------------------------------------------------------------------------------------------------------------------- is there anything that can already prevent such racing or is this really can happen? Thanks, Nai Xia On 2012年01月17日 18:53, Tim Deegan wrote: > At 22:43 +0800 on 16 Jan (1326753834), Nai Xia wrote: >> Hi Grzegorz, >> >> As I understand, the purpose of the code in page_make_sharable() >> checking the ref count is to ensure that nobody unexpected is working >> on the page, and so we can migrate it to dom_cow, right? >> >> ==== >> /* Check if the ref count is 2. The first from PGT_allocated, and >> * the second from get_page_and_type at the top of this function */ >> if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) >> { >> /* Return type count back to zero */ >> put_page_and_type(page); >> spin_unlock(&d->page_alloc_lock); >> return -E2BIG; >> } >> ==== >> >> However, it seems to me that this ref check and the following page >> migration is not atomic( although the operations for type_info ref >> check seems good) i.e. it's possible that it passed this ref >> check but just before it goes to dom_cow, someone else gets this page? > > Yes, there are a number of races around the p2m code; Anders > Lagar-Cavilla has been working on fixing the locking around p2m lookups > to take care of this. > > Tim. > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Is this a racing bug in page_make_sharable()? 2012-12-27 15:35 ` Nai Xia @ 2013-01-10 13:00 ` Tim Deegan 2013-01-10 16:36 ` Andres Lagar-Cavilla 0 siblings, 1 reply; 8+ messages in thread From: Tim Deegan @ 2013-01-10 13:00 UTC (permalink / raw) To: Nai Xia Cc: Xiaowei Yang, Andres Lagar-Cavilla, Luohao (brian), Lixiuchang, xen-devel Hi, At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote: > I think I can construct a bug by interleaving the two code paths: > > in guest_remove_page() | in page_make_sharable() > ------------------------------------------------------------------------------------------------------------------------------ > if ( p2m_is_shared(p2mt) ) ..... > ... ..... > page = mfn_to_page(mfn); ..... > ..... > > if ( > !get_page_and_type(page, > d, PGT_shared_page) ) > // success > > ......... > if ( page->count_info != > (PGC_allocated | (2 + > expected_refcnt)) ) // > also pass > > > if ( unlikely(!get_page(page, d)) ) > > /* go on to remove page */ /* go on to add page to > cow domain */ > ------------------------------------------------------------------------------------------------------------------------------------- > > > is there anything that can already prevent such racing or is this really > can happen? I think this race can happen. I'm not sure exactly what the effect is, though. I guess the page ends up belonging to dom_cow, but without the PGC_allocated bit set. So when it becomes unshared again, it's immediately freed. :( Andres, what do you think? Tim. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Is this a racing bug in page_make_sharable()? 2013-01-10 13:00 ` Tim Deegan @ 2013-01-10 16:36 ` Andres Lagar-Cavilla 2013-01-10 17:25 ` Tim Deegan 0 siblings, 1 reply; 8+ messages in thread From: Andres Lagar-Cavilla @ 2013-01-10 16:36 UTC (permalink / raw) To: Tim Deegan Cc: Luohao (brian), Lixiuchang, Andres Lagar-Cavilla, Xiaowei Yang, Nai Xia, xen-devel Hi there, thanks for the report. Sorry I didn't respond earlier, fell through the cracks. Having said that... On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote: > Hi, > > At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote: >> I think I can construct a bug by interleaving the two code paths: >> >> in guest_remove_page() | in page_make_sharable() >> ------------------------------------------------------------------------------------------------------------------------------ >> if ( p2m_is_shared(p2mt) ) ..... >> ... ..... >> page = mfn_to_page(mfn); ..... >> ..... >> >> if ( >> !get_page_and_type(page, >> d, PGT_shared_page) ) >> // success >> >> ......... >> if ( page->count_info != >> (PGC_allocated | (2 + >> expected_refcnt)) ) // >> also pass >> >> >> if ( unlikely(!get_page(page, d)) ) >> >> /* go on to remove page */ /* go on to add page to >> cow domain */ >> ------------------------------------------------------------------------------------------------------------------------------------- >> >> >> is there anything that can already prevent such racing or is this really >> can happen? > > I think this race can happen. I beg to disagree. Both page_make_sharable and page_make_private are protected by the per-mfn page lock (see __grab_shared_page). This lock will serialize strictly page_make_sharable against the page_make_private in quest_remove_page -> mem_sharing_unshare_page -> page_make_private. So page_make_sharable will execute atomically, add to cow, and then unshare will come in, make the page private again, remove from cow and hand over to the domain for whom the page is being removed. Or, the page has other shared references in which case page_make_sharable is a no-op, and page_make_private will generate a new different page (mfn). Or, and this is the tricky case you refer to, page_make_private will complete before page_make_sharable kicks in. The question then is: how did page_make_sharable even got to be called? The mfn is being nominated for sharing. How? Through a p2m entry in a domain. Is this the same domain as the one for which quest_remove_page is executing? Then all is serialized through the p2m lock, no race. Is this a different domain? Then the page is already shared, no race as per above. Finally, is this a different domain and the page is not shared? This Should Not Be! As in, never happens with the current APIs, and IMHO is borderline uncheckable for. Hope that helps, Andres > I'm not sure exactly what the effect > is, though. I guess the page ends up belonging to dom_cow, but > without the PGC_allocated bit set. So when it becomes unshared again, > it's immediately freed. :( > > Andres, what do you think? > > Tim. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Is this a racing bug in page_make_sharable()? 2013-01-10 16:36 ` Andres Lagar-Cavilla @ 2013-01-10 17:25 ` Tim Deegan 2013-01-11 2:46 ` Nai Xia 0 siblings, 1 reply; 8+ messages in thread From: Tim Deegan @ 2013-01-10 17:25 UTC (permalink / raw) To: Andres Lagar-Cavilla Cc: Lixiuchang, Xiaowei Yang, Nai Xia, Luohao (brian), xen-devel At 11:36 -0500 on 10 Jan (1357817806), Andres Lagar-Cavilla wrote: > Hi there, thanks for the report. Sorry I didn't respond earlier, fell through the cracks. > > Having said that... > On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote: > > > Hi, > > > > At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote: > >> I think I can construct a bug by interleaving the two code paths: > >> > >> in guest_remove_page() | in page_make_sharable() > >> ------------------------------------------------------------------------------------------------------------------------------ > >> if ( p2m_is_shared(p2mt) ) ..... > >> ... ..... > >> page = mfn_to_page(mfn); ..... > >> ..... > >> > >> if ( > >> !get_page_and_type(page, > >> d, PGT_shared_page) ) > >> // success > >> > >> ......... > >> if ( page->count_info != > >> (PGC_allocated | (2 + > >> expected_refcnt)) ) // > >> also pass > >> > >> > >> if ( unlikely(!get_page(page, d)) ) > >> > >> /* go on to remove page */ /* go on to add page to > >> cow domain */ > >> ------------------------------------------------------------------------------------------------------------------------------------- > >> > >> > >> is there anything that can already prevent such racing or is this really > >> can happen? > > > > I think this race can happen. > > Through a p2m entry in a domain. Is this the same domain as the one > for which quest_remove_page is executing? Then all is serialized > through the p2m lock, no race. Right, that's what I was missing. Because guest_remove_page has called get_gfn_query on the gfn, and not yet called put_gfn(), page_make_sharable() can't be running. All is well. :) Thanks, Tim. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Is this a racing bug in page_make_sharable()? 2013-01-10 17:25 ` Tim Deegan @ 2013-01-11 2:46 ` Nai Xia 0 siblings, 0 replies; 8+ messages in thread From: Nai Xia @ 2013-01-11 2:46 UTC (permalink / raw) To: Tim Deegan Cc: Lixiuchang, Andres Lagar-Cavilla, Xiaowei Yang, Luohao (brian), xen-devel On 2013年01月11日 01:25, Tim Deegan wrote: > At 11:36 -0500 on 10 Jan (1357817806), Andres Lagar-Cavilla wrote: >> Hi there, thanks for the report. Sorry I didn't respond earlier, fell through the cracks. >> >> Having said that... >> On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote: >> >>> Hi, >>> >>> At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote: >>>> I think I can construct a bug by interleaving the two code paths: >>>> >>>> in guest_remove_page() | in page_make_sharable() >>>> ------------------------------------------------------------------------------------------------------------------------------ >>>> if ( p2m_is_shared(p2mt) ) ..... >>>> ... ..... >>>> page = mfn_to_page(mfn); ..... >>>> ..... >>>> >>>> if ( >>>> !get_page_and_type(page, >>>> d, PGT_shared_page) ) >>>> // success >>>> >>>> ......... >>>> if ( page->count_info != >>>> (PGC_allocated | (2 + >>>> expected_refcnt)) ) // >>>> also pass >>>> >>>> >>>> if ( unlikely(!get_page(page, d)) ) >>>> >>>> /* go on to remove page */ /* go on to add page to >>>> cow domain */ >>>> ------------------------------------------------------------------------------------------------------------------------------------- >>>> >>>> >>>> is there anything that can already prevent such racing or is this really >>>> can happen? >>> >>> I think this race can happen. >> >> Through a p2m entry in a domain. Is this the same domain as the one >> for which quest_remove_page is executing? Then all is serialized >> through the p2m lock, no race. > > Right, that's what I was missing. Because guest_remove_page has called > get_gfn_query on the gfn, and not yet called put_gfn(), > page_make_sharable() can't be running. All is well. :) I see. Good to see everything working well. :) Thanks, Nai > > Thanks, > > Tim. > > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 8+ messages in thread
* Is this a racing bug in page_make_sharable()? @ 2012-01-16 14:51 Nai Xia 0 siblings, 0 replies; 8+ messages in thread From: Nai Xia @ 2012-01-16 14:51 UTC (permalink / raw) To: xen-devel Hi, As I understand, the purpose of the code in page_make_sharable() checking the ref count is to ensure that nobody unexpected is working on the page, and so we can migrate it to dom_cow, right? ==== /* Check if the ref count is 2. The first from PGT_allocated, and * the second from get_page_and_type at the top of this function */ if(page->count_info != (PGC_allocated | (2 + expected_refcnt))) { /* Return type count back to zero */ put_page_and_type(page); spin_unlock(&d->page_alloc_lock); return -E2BIG; } ==== However, it seems to me that this ref check and the following page migration is not atomic( although the operations for type_info ref check seems good) i.e. it's possible that it passed this ref check but just before it goes to dom_cow, someone else gets this page? As far as I know, in Linux kernel's similar scenarios, they do ref-freezing tricks. And if we don't need to worry about this racing case, then what is this check trying to ensure? Thanks, Nai ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-01-11 2:46 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <4F14345B.4040807@gmail.com> 2012-01-16 14:43 ` Is this a racing bug in page_make_sharable()? Nai Xia 2012-01-17 10:53 ` Tim Deegan 2012-12-27 15:35 ` Nai Xia 2013-01-10 13:00 ` Tim Deegan 2013-01-10 16:36 ` Andres Lagar-Cavilla 2013-01-10 17:25 ` Tim Deegan 2013-01-11 2:46 ` Nai Xia 2012-01-16 14:51 Nai Xia
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).