From: Randy Dunlap <rdunlap@infradead.org>
To: Matthew Wilcox <matthew.r.wilcox@intel.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH v5 10/22] Remove get_xip_mem
Date: Wed, 15 Jan 2014 17:46:28 -0800
Message-ID: <52D739F4.8060108@infradead.org>
In-Reply-To: <557203b474f633a59f32fee1f624a5239effcab7.1389779961.git.matthew.r.wilcox@intel.com>

On 01/15/2014 05:24 PM, Matthew Wilcox wrote:
> All callers of get_xip_mem() are now gone.  Remove checks for it,
> initialisers of it, documentation of it and the only implementation of it.
> 
> Add documentation for the new way of writing an XIP filesystem.
> 
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> ---
>  Documentation/filesystems/Locking |   3 -
>  Documentation/filesystems/xip.txt | 116 +++++++++++++++++++++-----------------
>  fs/exofs/inode.c                  |   1 -
>  fs/ext2/inode.c                   |   1 -
>  fs/ext2/xip.c                     |  37 ------------
>  fs/ext2/xip.h                     |   3 -
>  fs/open.c                         |   5 +-
>  include/linux/fs.h                |   2 -
>  mm/fadvise.c                      |   6 +-
>  mm/madvise.c                      |   2 +-
>  10 files changed, 70 insertions(+), 106 deletions(-)
> 
> diff --git a/Documentation/filesystems/xip.txt b/Documentation/filesystems/xip.txt
> index b62eabf..520e73a 100644
> --- a/Documentation/filesystems/xip.txt
> +++ b/Documentation/filesystems/xip.txt
> @@ -3,69 +3,81 @@ Execute-in-place for file mappings
>  
>  Motivation
>  ----------
> -File mappings are performed by mapping page cache pages to userspace. In
> -addition, read&write type file operations also transfer data from/to the page
> -cache.
> -
> -For memory backed storage devices that use the block device interface, the page
> -cache pages are in fact copies of the original storage. Various approaches
> -exist to work around the need for an extra copy. The ramdisk driver for example
> -does read the data into the page cache, keeps a reference, and discards the
> -original data behind later on.
> -
> -Execute-in-place solves this issue the other way around: instead of keeping
> -data in the page cache, the need to have a page cache copy is eliminated
> -completely. With execute-in-place, read&write type operations are performed
> -directly from/to the memory backed storage device. For file mappings, the
> -storage device itself is mapped directly into userspace.
> -
> -This implementation was initially written for shared memory segments between
> -different virtual machines on s390 hardware to allow multiple machines to
> -share the same binaries and libraries.
> -
> -Implementation
> ---------------
> -Execute-in-place is implemented in three steps: block device operation,
> -address space operation, and file operations.
> -
> -A block device operation named direct_access is used to translate the
> -block device sector number to a page frame number (pfn) that identifies
> -the physical page for the memory.  It also returns a kernel virtual
> -address that can be used to access the memory.
> +
> +File mappings are usually performed by mapping page cache pages to
> +userspace.  In addition, read & write file operations also transfer data
> +between the page cache and storage.
> +
> +For memory backed storage devices that use the block device interface,
> +the page cache pages are just copies of the original storage.  The
> +execute-in-place code removes the extra copy by performing reads and
> +writes directly on the memory backed storage device.  For file mappings,
> +the storage device itself is mapped directly into userspace.
> +
> +
> +Implementation Tips for Block Driver Writers
> +--------------------------------------------
> +
> +To support XIP in your block driver, implement the 'direct_access'
> +block device operation.  It is used to translate the sector number
> +(expressed in units of 512-byte sectors) to a page frame number (pfn)
> +that identifies the physical page for the memory.  It also returns a
> +kernel virtual address that can be used to access the memory.
>  
>  The direct_access method takes a 'size' parameter that indicates the
>  number of bytes being requested.  The function should return the number
>  of bytes that it can provide, although it must not exceed the number of
>  bytes requested.  It may also return a negative errno if an error occurs.
>  
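
For readers who want the return-value contract spelled out, here is a
minimal userspace sketch of it.  It is not kernel code and not the
signature from this patch series: struct fake_dev and fake_direct_access()
are hypothetical names, the pfn output is left out, and the capacity is
assumed to be a whole number of 512-byte sectors.

```c
#include <errno.h>
#include <stddef.h>

#define SECTOR_SIZE 512L

/* Hypothetical stand-in for a memory-backed device; not a kernel structure. */
struct fake_dev {
	char *base;	/* start of the directly addressable memory */
	long bytes;	/* capacity in bytes, assumed a multiple of SECTOR_SIZE */
};

/*
 * Model of the documented contract: return how many bytes are usable
 * starting at 'sector', never more than 'size', or a negative errno.
 * On success *kaddr points at the first usable byte.
 */
static long fake_direct_access(struct fake_dev *dev, long sector,
			       void **kaddr, long size)
{
	long offset, avail;

	/* Reject requests that start outside the device. */
	if (sector < 0 || size <= 0 || sector >= dev->bytes / SECTOR_SIZE)
		return -ERANGE;

	offset = sector * SECTOR_SIZE;
	avail = dev->bytes - offset;
	*kaddr = dev->base + offset;

	/* Clamp: the caller may get less than it asked for, never more. */
	return size < avail ? size : avail;
}
```

A caller therefore has to cope with a short return and call again for the
remainder; only a negative value is an error.
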
> -The block device operation is optional, these block devices support it as of
> -today:
> +In order to support this method, the storage must be byte-accessable by

                                                        byte-accessible

> +the CPU at all times.  If your device uses paging techniques to expose
> +a large amount of memory through a smaller window, then you cannot
> +implement direct_access.  Equally, if your device can occasionally
> +stall the CPU for an extended period, you should also not attempt to
> +implement direct_access.
> +
> +These block devices may be used for inspiration:
> +- axonram: Axon DDR2 device driver
> +- brd: RAM backed block device driver
>  - dcssblk: s390 dcss block device driver
>  
> -An address space operation named get_xip_mem is used to retrieve references
> -to a page frame number and a kernel address. To obtain these values a reference
> -to an address_space is provided. This function assigns values to the kmem and
> -pfn parameters. The third argument indicates whether the function should allocate
> -blocks if needed.
>  
> -This address space operation is mutually exclusive with readpage&writepage that
> -do page cache read/write operations.
> -The following filesystems support it as of today:
> -- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt
> +Implementation Tips for Filesystem Writers
> +------------------------------------------
> +
> +Filesystem support consists of
> +- adding support to mark inodes as being XIP by setting the S_XIP flag in
> +  i_flags
> +- implementing the direct_IO address space operation, and calling
> +  xip_do_io() instead of blockdev_direct_IO() if S_XIP is set
> +- implementing an mmap file operation for XIP files which sets the
> +  VM_MIXEDMAP flag on the VMA, and setting the vm_ops to include handlers
> +  for fault and page_mkwrite (which should probably call xip_fault() and
> +  xip_mkwrite(), passing the appropriate get_block() callback)
> +- calling xip_truncate_page() instead of block_truncate_page() for XIP files
> +- ensuring that there is sufficient locking between reads, writes,
> +  truncates and page faults

     truncates, and
(serial comma), but that's up to you and your editor/proofreader etc.  :)

> +
> +The get_block() callback passed to xip_do_io(), xip_fault(), xip_mkwrite()
> +and xip_truncate_page() must not return uninitialised extents.  It must zero
> +any blocks that it returns, and it must ensure that simultaneous calls to
> +get_block() (for example by a page-fault racing with a read() or a write())
> +work correctly.
>  
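
The "must zero any blocks that it returns" rule could be illustrated with
a toy userspace model along these lines.  Everything here (struct toy_fs,
toy_get_block(), the sizes) is hypothetical and only mimics the idea; it is
not the kernel's get_block_t interface, and the locking needed for
simultaneous callers is deliberately not shown.

```c
#include <string.h>

#define BLOCK_SIZE 4096
#define NR_BLOCKS  4

/* Hypothetical toy filesystem state; names are illustrative only. */
struct toy_fs {
	char data[NR_BLOCKS][BLOCK_SIZE];	/* stands in for the storage */
	int  allocated[NR_BLOCKS];		/* 1 once a block is handed out */
};

/*
 * Return a pointer to block 'blk', allocating it when 'create' is set.
 * A newly allocated block is zeroed before the caller can see it, so
 * stale storage contents never reach a read() or a mapping.
 * Returns NULL for a hole (unallocated and !create) or a bad index.
 */
static char *toy_get_block(struct toy_fs *fs, int blk, int create)
{
	if (blk < 0 || blk >= NR_BLOCKS)
		return NULL;
	if (!fs->allocated[blk]) {
		if (!create)
			return NULL;
		memset(fs->data[blk], 0, BLOCK_SIZE);
		fs->allocated[blk] = 1;
	}
	return fs->data[blk];
}
```

Since XIP hands the storage itself to the caller with no page cache copy
in between, zeroing at allocation time is the only point where old
contents can be hidden.
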
> -A set of file operations that do utilize get_xip_page can be found in
> -mm/filemap_xip.c . The following file operation implementations are provided:
> -- aio_read/aio_write
> -- readv/writev
> -- sendfile
> +These filesystems may be used for inspiration:
> +- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt
>  
> -The generic file operations do_sync_read/do_sync_write can be used to implement
> -classic synchronous IO calls.
>  
>  Shortcomings
>  ------------
> -This implementation is limited to storage devices that are cpu addressable at
> -all times (no highmem or such). It works well on rom/ram, but enhancements are
> -needed to make it work with flash in read+write mode.
> -Putting the Linux kernel and/or its modules on a xip filesystem does not mean
> -they are not copied.
> +
> +Even if the kernel or its modules are stored on an filesystem that supports

                                                   a

> +XIP on a block device that supports XIP, they will still be copied into RAM.
> +
> +Calling get_user_pages() on a range of user memory that has been mmaped
> +from an XIP file will fail as there are no 'struct page' to describe
> +those pages.  This problem is being worked on.  That means that O_DIRECT
> +reads/writes to those memory ranges from a non-XIP file will fail (note
> +that O_DIRECT reads/writes _of an XIP file_ do work, it is the memory
> +that is being accessed that is key here).  Other things that will not
> +work include RDMA, sendfile() and splice().

                      sendfile(),
same comment as above.



-- 
~Randy


