From: Matthew Wilcox <willy@linux.intel.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Matthew Wilcox <willy@linux.intel.com>
Subject: Re: [PATCH v1 5/7] dax: Add huge page fault support
Date: Thu, 9 Oct 2014 16:47:16 -0400 [thread overview]
Message-ID: <20141009204716.GQ5098@wil.cx> (raw)
In-Reply-To: <20141008201100.GB9232@node.dhcp.inet.fi>
On Wed, Oct 08, 2014 at 11:11:00PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 08, 2014 at 09:25:27AM -0400, Matthew Wilcox wrote:
> > + pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + if (pgoff >= size)
> > + return VM_FAULT_SIGBUS;
> > + /* If the PMD would cover blocks out of the file */
> > + if ((pgoff | PG_PMD_COLOUR) >= size)
> > + return VM_FAULT_FALLBACK;
>
> IIUC, zero pading would work too.
The blocks after this file might be allocated to another file already.
I suppose we could ask the filesystem if it wants to allocate them to
this file.
Dave, Jan, is it acceptable to call get_block() for blocks that extend
beyond the current i_size?
> > +
> > + memset(&bh, 0, sizeof(bh));
> > + block = ((sector_t)pgoff & ~PG_PMD_COLOUR) << (PAGE_SHIFT - blkbits);
> > +
> > + /* Start by seeing if we already have an allocated block */
> > + bh.b_size = PMD_SIZE;
> > + length = get_block(inode, block, &bh, 0);
>
> This makes me confused. get_block() return zero on success, right?
> Why the var called 'lenght'?
Historical reasons. I can go back and change the name of the variable.
> > + sector = bh.b_blocknr << (blkbits - 9);
> > + length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn, bh.b_size);
> > + if (length < 0)
> > + goto sigbus;
> > + if (length < PMD_SIZE)
> > + goto fallback;
> > + if (pfn & PG_PMD_COLOUR)
> > + goto fallback; /* not aligned */
>
> So, are you rely on pure luck to make get_block() allocate 2M aligned pfn?
> Not really productive. You would need assistance from fs and
> arch_get_unmapped_area() sides.
Certainly ext4 and XFS will align their allocations; if you ask it for a
2MB block, it will try to allocate a 2MB block aligned on a 2MB boundary.
I started looking into the get_unampped_area (and have the code sitting
around to align specially marked files on special boundaries), but when
I mentioned it to the author of the NVM Library, he said "Oh, I'll just
pick a 1GB aligned area to request it be mapped at", so I haven't taken
it any further.
The upshot is that (confirmed with debugging code), when the tests run,
they pretty much always get a correctly aligned block.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Matthew Wilcox <willy@linux.intel.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Matthew Wilcox <willy@linux.intel.com>
Subject: Re: [PATCH v1 5/7] dax: Add huge page fault support
Date: Thu, 9 Oct 2014 16:47:16 -0400 [thread overview]
Message-ID: <20141009204716.GQ5098@wil.cx> (raw)
In-Reply-To: <20141008201100.GB9232@node.dhcp.inet.fi>
On Wed, Oct 08, 2014 at 11:11:00PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 08, 2014 at 09:25:27AM -0400, Matthew Wilcox wrote:
> > + pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > + size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + if (pgoff >= size)
> > + return VM_FAULT_SIGBUS;
> > + /* If the PMD would cover blocks out of the file */
> > + if ((pgoff | PG_PMD_COLOUR) >= size)
> > + return VM_FAULT_FALLBACK;
>
> IIUC, zero pading would work too.
The blocks after this file might be allocated to another file already.
I suppose we could ask the filesystem if it wants to allocate them to
this file.
Dave, Jan, is it acceptable to call get_block() for blocks that extend
beyond the current i_size?
> > +
> > + memset(&bh, 0, sizeof(bh));
> > + block = ((sector_t)pgoff & ~PG_PMD_COLOUR) << (PAGE_SHIFT - blkbits);
> > +
> > + /* Start by seeing if we already have an allocated block */
> > + bh.b_size = PMD_SIZE;
> > + length = get_block(inode, block, &bh, 0);
>
> This makes me confused. get_block() return zero on success, right?
> Why the var called 'lenght'?
Historical reasons. I can go back and change the name of the variable.
> > + sector = bh.b_blocknr << (blkbits - 9);
> > + length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn, bh.b_size);
> > + if (length < 0)
> > + goto sigbus;
> > + if (length < PMD_SIZE)
> > + goto fallback;
> > + if (pfn & PG_PMD_COLOUR)
> > + goto fallback; /* not aligned */
>
> So, are you rely on pure luck to make get_block() allocate 2M aligned pfn?
> Not really productive. You would need assistance from fs and
> arch_get_unmapped_area() sides.
Certainly ext4 and XFS will align their allocations; if you ask it for a
2MB block, it will try to allocate a 2MB block aligned on a 2MB boundary.
I started looking into the get_unampped_area (and have the code sitting
around to align specially marked files on special boundaries), but when
I mentioned it to the author of the NVM Library, he said "Oh, I'll just
pick a 1GB aligned area to request it be mapped at", so I haven't taken
it any further.
The upshot is that (confirmed with debugging code), when the tests run,
they pretty much always get a correctly aligned block.
next prev parent reply other threads:[~2014-10-09 20:47 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-08 13:25 [PATCH v1 0/7] Huge page support for DAX Matthew Wilcox
2014-10-08 13:25 ` Matthew Wilcox
2014-10-08 13:25 ` [PATCH v1 1/7] thp: vma_adjust_trans_huge(): adjust file-backed VMA too Matthew Wilcox
2014-10-08 13:25 ` Matthew Wilcox
2014-10-08 13:25 ` [PATCH v1 2/7] mm: Prepare for DAX huge pages Matthew Wilcox
2014-10-08 13:25 ` Matthew Wilcox
2014-10-08 15:21 ` Kirill A. Shutemov
2014-10-08 15:57 ` Matthew Wilcox
2014-10-08 19:43 ` Kirill A. Shutemov
2014-10-08 19:43 ` Kirill A. Shutemov
2014-10-09 20:40 ` Matthew Wilcox
2014-10-09 20:40 ` Matthew Wilcox
2014-10-13 20:36 ` Kirill A. Shutemov
2014-10-13 20:36 ` Kirill A. Shutemov
2014-10-08 13:25 ` [PATCH v1 3/7] mm: Add vm_insert_pfn_pmd() Matthew Wilcox
2014-10-08 13:25 ` [PATCH v1 4/7] mm: Add a pmd_fault handler Matthew Wilcox
2014-10-08 13:25 ` Matthew Wilcox
2014-10-08 13:25 ` [PATCH v1 5/7] dax: Add huge page fault support Matthew Wilcox
2014-10-08 13:25 ` Matthew Wilcox
2014-10-08 20:11 ` Kirill A. Shutemov
2014-10-08 20:11 ` Kirill A. Shutemov
2014-10-09 20:47 ` Matthew Wilcox [this message]
2014-10-09 20:47 ` Matthew Wilcox
2014-10-13 1:13 ` Dave Chinner
2014-10-13 1:13 ` Dave Chinner
2014-10-08 13:25 ` [PATCH v1 6/7] ext2: Huge " Matthew Wilcox
2014-10-08 13:25 ` Matthew Wilcox
2014-10-08 13:25 ` [PATCH v1 7/7] ext4: " Matthew Wilcox
2014-10-08 13:25 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141009204716.GQ5098@wil.cx \
--to=willy@linux.intel.com \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=matthew.r.wilcox@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.