From: Matthew Wilcox <willy@linux.intel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v11 04/21] mm: Allow page fault handlers to perform the COW
Date: Thu, 16 Oct 2014 15:48:15 -0400
Message-ID: <20141016194815.GD11522@wil.cx>
In-Reply-To: <20141016091136.GC19075@thinkos.etherlink>

On Thu, Oct 16, 2014 at 11:12:22AM +0200, Mathieu Desnoyers wrote:
> On 25-Sep-2014 04:33:21 PM, Matthew Wilcox wrote:
> > Currently COW of an XIP file is done by first bringing in a read-only
> > mapping, then retrying the fault and copying the page. It is much more
> > efficient to tell the fault handler that a COW is being attempted (by
> > passing in the pre-allocated page in the vm_fault structure), and allow
> > the handler to perform the COW operation itself.
> >
> > The handler cannot insert the page itself if there is already a read-only
> > mapping at that address, so allow the handler to return VM_FAULT_LOCKED
> > and set the fault_page to be NULL. This indicates to the MM code that
> > the i_mmap_mutex is held instead of the page lock.
>
> Why test the value of fault_page pointer rather than just test return
> flags to detect in which state the callee left i_mmap_mutex ?

Maybe my changelog isn't clear enough to a non-mm expert. Which would
include me. Usually page fault handlers return with the page lock
held and VM_FAULT_LOCKED set. This patch adds the ability to return
with VM_FAULT_LOCKED set and a NULL page. This indicates to the VM the
new possibility that the i_mmap_mutex is held instead of the page lock
(since there is no page, we cannot possibly be holding the page lock).

But we have to hold some kind of lock here, or we run the risk of a
truncate operation coming in and removing the page from the file that we
just found. The i_mmap_mutex is not ideal (since it may become heavily
contended), but it does fix the race, and some people have interesting
ideas on how to fix the scalability problem.

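The convention described above can be modelled in a few lines of
userspace C. This is only a sketch, not the patch's code: every mock_*
type, the MOCK_VM_FAULT_LOCKED value, and both helper functions are
hypothetical stand-ins invented for illustration. It shows how a caller
could tell which lock the handler left held by looking at the return
flags together with the page pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; not the kernel's definitions or values. */
#define MOCK_VM_FAULT_LOCKED 0x0200u

struct mock_page { int locked; };
struct mock_mapping { int i_mmap_mutex_held; };

struct mock_vm_fault {
	struct mock_page *page;	/* NULL when the handler did the COW itself */
};

enum held_lock { HELD_NONE, HELD_PAGE_LOCK, HELD_I_MMAP_MUTEX };

/* Work out which lock the handler left held from its return flags and page. */
static enum held_lock lock_held_after_fault(unsigned int ret,
					    const struct mock_vm_fault *vmf)
{
	if (!(ret & MOCK_VM_FAULT_LOCKED))
		return HELD_NONE;
	/* No page means no page lock, so it must be the mapping's mutex. */
	return vmf->page ? HELD_PAGE_LOCK : HELD_I_MMAP_MUTEX;
}

/* Drop whichever lock is held, which lets a pending truncate proceed. */
static void release_after_fault(unsigned int ret, struct mock_vm_fault *vmf,
				struct mock_mapping *mapping)
{
	switch (lock_held_after_fault(ret, vmf)) {
	case HELD_PAGE_LOCK:
		vmf->page->locked = 0;
		break;
	case HELD_I_MMAP_MUTEX:
		mapping->i_mmap_mutex_held = 0;
		break;
	case HELD_NONE:
		break;
	}
}
```

The point of the model is that the NULL page carries information the
flag alone does not: VM_FAULT_LOCKED says "something is locked", and
the pointer says what.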
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 8981cc8..0a47817 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -208,6 +208,7 @@ struct vm_fault {
> >  	pgoff_t pgoff;			/* Logical page offset based on vma */
> >  	void __user *virtual_address;	/* Faulting virtual address */
> >  
> > +	struct page *cow_page;		/* Handler may choose to COW */
>
> The page fault handler being very much performance sensitive, I'm
> wondering if it would not be better to move cow_page near the end of
> struct vm_fault, so that the "page" field can stay on the first
> cache line.

I think your mental arithmetic has an "off by double" there:

struct vm_fault {
        unsigned int               flags;                /*     0     4 */

        /* XXX 4 bytes hole, try to pack */

        long unsigned int          pgoff;                /*     8     8 */
        void *                     virtual_address;      /*    16     8 */
        struct page *              cow_page;             /*    24     8 */
        struct page *              page;                 /*    32     8 */
        long unsigned int          max_pgoff;            /*    40     8 */
        pte_t *                    pte;                  /*    48     8 */

        /* size: 56, cachelines: 1, members: 7 */
        /* sum members: 52, holes: 1, sum holes: 4 */
        /* last cacheline: 56 bytes */
};

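The same arithmetic can be checked without pahole using offsetof(). A
minimal sketch, assuming an LP64 target and 64-byte cache lines: the
mock_vm_fault struct below copies the field order from the layout
above, with void * standing in for pte_t * and an opaque mock_page for
struct page. The names and the ASSUMED_CACHELINE constant are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct mock_page;			/* opaque stand-in for struct page */

/* Same field order as the layout above; types are userspace stand-ins. */
struct mock_vm_fault {
	unsigned int flags;
	unsigned long pgoff;
	void *virtual_address;
	struct mock_page *cow_page;
	struct mock_page *page;
	unsigned long max_pgoff;
	void *pte;			/* stand-in for pte_t * */
};

#define ASSUMED_CACHELINE 64u		/* assumption; varies by CPU */

/* Non-zero if the "page" field ends within the first cache line. */
static int page_in_first_cacheline(void)
{
	return offsetof(struct mock_vm_fault, page) +
	       sizeof(struct mock_page *) <= ASSUMED_CACHELINE;
}

/* Non-zero if the whole structure fits in one cache line. */
static int whole_struct_in_one_cacheline(void)
{
	return sizeof(struct mock_vm_fault) <= ASSUMED_CACHELINE;
}
```

With 8-byte pointers and longs, adding cow_page moves "page" from
offset 24 to offset 32, which is still well inside a 64-byte line.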
> > @@ -2000,6 +2000,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> >  	vmf.pgoff = page->index;
> >  	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
> >  	vmf.page = page;
> > +	vmf.cow_page = NULL;
>
> Could we add a FAULT_FLAG_COW_PAGE to vmf.flags, so we don't have to set
> cow_page to NULL in the common case (when it is not used) ?

I don't think we're short on bits, so I'm not opposed. Any MM people
want to weigh in before I make this change?
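
For anyone weighing in, here is roughly what the flag-based variant
could look like, as a userspace sketch. Nothing here is from the patch:
FAULT_FLAG_COW_PAGE is only the name proposed in the review, and the
MOCK_* bit values, the mock_* types and the fault_cow_page() helper are
hypothetical. The idea is that a handler consults the flag first, so
callers that never COW can leave cow_page uninitialised.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; not taken from the kernel headers. */
#define MOCK_FAULT_FLAG_WRITE		0x01u
#define MOCK_FAULT_FLAG_MKWRITE		0x04u
#define MOCK_FAULT_FLAG_COW_PAGE	0x100u	/* hypothetical new flag */

struct mock_page { int id; };

struct mock_vm_fault {
	unsigned int flags;
	struct mock_page *page;
	struct mock_page *cow_page;	/* only meaningful if the flag is set */
};

/* Return the pre-allocated COW page, or NULL if the caller supplied none. */
static struct mock_page *fault_cow_page(const struct mock_vm_fault *vmf)
{
	if (!(vmf->flags & MOCK_FAULT_FLAG_COW_PAGE))
		return NULL;	/* cow_page may hold stale data; ignore it */
	return vmf->cow_page;
}
```

The trade-off is one extra bit test in handlers that care about COW, in
exchange for dropping the NULL store from every caller that does not.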