All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Naoya Horiguchi <nao.horiguchi@gmail.com>
Subject: Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
Date: Thu, 3 Jul 2014 14:41:00 +0300	[thread overview]
Message-ID: <20140703114100.GA27140@node.dhcp.inet.fi> (raw)
In-Reply-To: <20140702043057.GA19813@nhori.redhat.com>

On Wed, Jul 02, 2014 at 12:30:57AM -0400, Naoya Horiguchi wrote:
> On Tue, Jul 01, 2014 at 11:15:40PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 01, 2014 at 02:50:21PM -0400, Naoya Horiguchi wrote:
> > > On Tue, Jul 01, 2014 at 09:07:39PM +0300, Kirill A. Shutemov wrote:
> > > > Why do we need this special case for hugetlb page ->index? Why not use
> > > > PAGE_SIZE units there too? Or I miss something?
> > > 
> > > hugetlb pages are never split, so we use larger page cache size for
> > > hugetlbfs file (to avoid large sparse page cache tree.)
> > 
> > For transparent huge page cache I would like to have native support in
> > page cache radix-tree: since huge pages are always naturally aligned we
> > can create a leaf node for it several (RADIX_TREE_MAP_SHIFT -
> > HPAGE_PMD_ORDER) levels up by tree, which would cover all indexes in the
> > range the huge page represents. This approach should fit hugetlb too. And
> > -1 special case for hugetlb.
> > But I'm not sure when I'll get time to play with this...
> 
> So I'm OK that hugetlb page should have ->index in PAGE_CACHE_SIZE
> when transparent huge page is merged. I may try to write patches
> on top of your tree after I've done a few series in my work queue.
> 
> In order to fix the current problem, I suggest a page_to_pgoff() as a
> short-term workaround. I found a few other call sites which can call
> on hugepage, so this function help us track such callers.
> The similar function seems to be introduced in your transparent huge
> page cache tree (page_cache_index()). So this function will be finally
> overwritten with it.
> 
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Tue, 1 Jul 2014 21:38:22 -0400
> Subject: [PATCH v2] rmap: fix pgoff calculation to handle hugepage correctly
> 
> I triggered VM_BUG_ON() in vma_address() when I try to migrate an anonymous
> hugepage with mbind() in the kernel v3.16-rc3. This is because pgoff's
> calculation in rmap_walk_anon() fails to consider compound_order() only to
> have an incorrect value.
> 
> This patch introduces page_to_pgoff(), which gets the page's offset in
> PAGE_CACHE_SIZE. Kirill pointed out that page cache tree should natively
> handle hugepages, and in order to make hugetlbfs fit it, page->index of
> hugetlbfs page should be in PAGE_CACHE_SIZE. This is beyond this patch,
> but page_to_pgoff() contains the point to be fixed in a single function.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Naoya Horiguchi <nao.horiguchi@gmail.com>
Subject: Re: [PATCH] rmap: fix pgoff calculation to handle hugepage correctly
Date: Thu, 3 Jul 2014 14:41:00 +0300	[thread overview]
Message-ID: <20140703114100.GA27140@node.dhcp.inet.fi> (raw)
In-Reply-To: <20140702043057.GA19813@nhori.redhat.com>

On Wed, Jul 02, 2014 at 12:30:57AM -0400, Naoya Horiguchi wrote:
> On Tue, Jul 01, 2014 at 11:15:40PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 01, 2014 at 02:50:21PM -0400, Naoya Horiguchi wrote:
> > > On Tue, Jul 01, 2014 at 09:07:39PM +0300, Kirill A. Shutemov wrote:
> > > > Why do we need this special case for hugetlb page ->index? Why not use
> > > > PAGE_SIZE units there too? Or I miss something?
> > > 
> > > hugetlb pages are never split, so we use larger page cache size for
> > > hugetlbfs file (to avoid large sparse page cache tree.)
> > 
> > For transparent huge page cache I would like to have native support in
> > page cache radix-tree: since huge pages are always naturally aligned we
> > can create a leaf node for it several (RADIX_TREE_MAP_SHIFT -
> > HPAGE_PMD_ORDER) levels up by tree, which would cover all indexes in the
> > range the huge page represents. This approach should fit hugetlb too. And
> > -1 special case for hugetlb.
> > But I'm not sure when I'll get time to play with this...
> 
> So I'm OK that hugetlb page should have ->index in PAGE_CACHE_SIZE
> when transparent huge page is merged. I may try to write patches
> on top of your tree after I've done a few series in my work queue.
> 
> In order to fix the current problem, I suggest a page_to_pgoff() as a
> short-term workaround. I found a few other call sites which can call
> on hugepage, so this function help us track such callers.
> The similar function seems to be introduced in your transparent huge
> page cache tree (page_cache_index()). So this function will be finally
> overwritten with it.
> 
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Tue, 1 Jul 2014 21:38:22 -0400
> Subject: [PATCH v2] rmap: fix pgoff calculation to handle hugepage correctly
> 
> I triggered VM_BUG_ON() in vma_address() when I try to migrate an anonymous
> hugepage with mbind() in the kernel v3.16-rc3. This is because pgoff's
> calculation in rmap_walk_anon() fails to consider compound_order() only to
> have an incorrect value.
> 
> This patch introduces page_to_pgoff(), which gets the page's offset in
> PAGE_CACHE_SIZE. Kirill pointed out that page cache tree should natively
> handle hugepages, and in order to make hugetlbfs fit it, page->index of
> hugetlbfs page should be in PAGE_CACHE_SIZE. This is beyond this patch,
> but page_to_pgoff() contains the point to be fixed in a single function.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

  reply	other threads:[~2014-07-03 11:42 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-01 14:46 [PATCH] rmap: fix pgoff calculation to handle hugepage correctly Naoya Horiguchi
2014-07-01 14:46 ` Naoya Horiguchi
2014-07-01 17:42 ` Rik van Riel
2014-07-01 17:42   ` Rik van Riel
2014-07-01 18:07 ` Kirill A. Shutemov
2014-07-01 18:07   ` Kirill A. Shutemov
2014-07-01 18:50   ` Naoya Horiguchi
2014-07-01 18:50     ` Naoya Horiguchi
2014-07-01 20:15     ` Kirill A. Shutemov
2014-07-01 20:15       ` Kirill A. Shutemov
2014-07-02  4:30       ` Naoya Horiguchi
2014-07-02  4:30         ` Naoya Horiguchi
2014-07-03 11:41         ` Kirill A. Shutemov [this message]
2014-07-03 11:41           ` Kirill A. Shutemov
2014-07-07 19:39         ` Andrew Morton
2014-07-07 19:39           ` Andrew Morton
2014-07-15 16:41           ` [PATCH -mm] mm: refactor page index/offset getters Naoya Horiguchi
2014-07-15 16:41             ` Naoya Horiguchi
2014-07-23 21:39             ` Andrew Morton
2014-07-23 21:39               ` Andrew Morton
2014-07-23 21:45               ` Naoya Horiguchi
2014-07-23 21:45                 ` Naoya Horiguchi
2014-07-28 20:29             ` Johannes Weiner
2014-07-28 20:29               ` Johannes Weiner
2014-07-29  0:42               ` Naoya Horiguchi
2014-07-29  0:42                 ` Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140703114100.GA27140@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=akpm@linux-foundation.org \
    --cc=dhillf@gmail.com \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=nao.horiguchi@gmail.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.