From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 34A638D0040 for ; Fri, 25 Mar 2011 01:00:22 -0400 (EDT) Received: from wpaz17.hot.corp.google.com (wpaz17.hot.corp.google.com [172.24.198.81]) by smtp-out.google.com with ESMTP id p2P50Kk5012040 for ; Thu, 24 Mar 2011 22:00:20 -0700 Received: from gyf3 (gyf3.prod.google.com [10.243.50.67]) by wpaz17.hot.corp.google.com with ESMTP id p2P4xtLi008990 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Thu, 24 Mar 2011 22:00:19 -0700 Received: by gyf3 with SMTP id 3so420384gyf.17 for ; Thu, 24 Mar 2011 22:00:17 -0700 (PDT) MIME-Version: 1.0 Date: Thu, 24 Mar 2011 22:00:16 -0700 Message-ID: Subject: get_page() vs __split_huge_page_refcount() From: Michel Lespinasse Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli , linux-mm Hi, I am getting up to speed with mainline THP code and was wondering what's going on with reference counts within __split_huge_page_refcount(): for (i = 1; i < HPAGE_PMD_NR; i++) { struct page *page_tail = page + i; /* tail_page->_count cannot change */ atomic_sub(atomic_read(&page_tail->_count), &page->_count); BUG_ON(page_count(page) <= 0); ... A look at get_page() gave a partial answer. First, the page refcount is incremented, then, if this was a tail page, the head page is looked up and its refcount is incremented too. __split_huge_page_refcount() preserves the refcount of tail pages but substracts it from the head page, as it'll be an independent page after the split. However this comment lead to more head scratching: /* * This is safe only because * __split_huge_page_refcount can't run under * get_page(). */ As I can see, follow_page() with a FOLL_GET flag is careful when it encounters huge pages. It tests the _PAGE_SPLITTING bit in the pmd (under protection of page_table_lock) to avoid racing with __split_huge_page_refcount(). Then, it can safely call get_page() and not worry about both refcounts updates being visible at once. My question is this: After someone obtains a page reference using get_user_pages(), what prevents them from getting additional references with get_page() ? I always thought it was legal to duplicate references that way, but now I don't see how it'd be safe doing so on anon pages with THP enabled. -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id 8435C8D0040 for ; Fri, 25 Mar 2011 12:48:53 -0400 (EDT) Date: Fri, 25 Mar 2011 17:48:47 +0100 From: Andrea Arcangeli Subject: Re: get_page() vs __split_huge_page_refcount() Message-ID: <20110325164847.GD431@random.random> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Michel Lespinasse Cc: linux-mm Hi Michel, On Thu, Mar 24, 2011 at 10:00:16PM -0700, Michel Lespinasse wrote: > My question is this: After someone obtains a page reference using > get_user_pages(), what prevents them from getting additional > references with get_page() ? I always thought it was legal to > duplicate references that way, but now I don't see how it'd be safe > doing so on anon pages with THP enabled. It's not legal anymore as you noticed, but I'm not aware of anything doing that. I don't see an useful case where a driver could need to take one extra refcount after GUP returned. The normal API is GUP/put_page. We could make it legal again by taking the compound_lock after a PageCompound check though. I hope it's not needed though. It's unavoidable in put_page because put_page will run out of order with regard to __split_huge_page_refcount. But serializing get_page in GUP against __split_huge_page_refcount is automatic through the pmd_trans_splitting bit and needed for all page table walkers anyway. Maybe it's good idea to add a comment to transhuge.txt about that? I don't think I added it. Grepping for get_page in drivers doesn't show too many, they mostly run through the vm_ops->fault handler. Most important I can't see how possibly it could be useful to run a get_page after get_user_pages(FOLL_GET) returns. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org