Re: [PATCH] ext4: Add XIP functionality

From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: andreas.dilger@intel.com, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] ext4: Add XIP functionality
Date: Mon, 2 Dec 2013 09:34:59 +0100
Message-ID: <20131202083459.GA2305@quack.suse.cz>
In-Reply-To: <1384811492-37040-1-git-send-email-ross.zwisler@linux.intel.com>

  Hello,

On Mon 18-11-13 14:51:32, Ross Zwisler wrote:
> This is a port of the XIP functionality found in the current version of
> ext2.  This patch set is intended to achieve feature parity with XIP in
> ext2 rather than non-XIP in ext4.  In particular, it lacks support for
> splice and AIO.  We'll be submitting patches in the future to add that
> functionality, but we think this is a good start.
> 
> There are also a couple of bugs that also appear in ext2 around handling
> of the xip mount option; we're currently investigating and will submit
> patches to fix both in ext2 and ext4, but didn't want to delay getting
> this patch out for comment.
> 
> The motivation behind this work is that we believe that the XIP feature
> will begin to find new uses as various persistent memory devices and
> technologies come on to the market.  Having direct, byte-addressable
> access to persistent memory without having an additional copy in the
> page cache can be a win in terms of I/O latency and overall memory
> usage.
  Yes, I believe implementing XIP in ext4 is desirable. It is the only
ext2 feature I'm aware of that is missing from ext4.

> This patch applies cleanly to v3.12, and was tested using brd as our
> block driver.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
> ---
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e274e9c..dea66bb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
...
> @@ -4645,11 +4673,19 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
>  			} else
>  				ext4_wait_for_tail_page_commit(inode);
>  		}
> -		/*
> -		 * Truncate pagecache after we've waited for commit
> -		 * in data=journal mode to make pages freeable.
> -		 */
> +
> +		if (mapping_is_xip(inode->i_mapping)) {
> +			error = xip_truncate_page(inode->i_mapping,
> +						  inode->i_size);
> +			if (error)
> +				goto err_out;
> +		} else {
> +			/*
> +			 * Truncate pagecache after we've waited for commit
> +			 * in data=journal mode to make pages freeable.
> +			 */
>  			truncate_pagecache(inode, inode->i_size);
> +		}
>  	}
>  	/*
>  	 * We want to call ext4_truncate() even if attr->ia_size ==
  Umm, a much more logical place for this would be in ext4_truncate(), at
the place where we call ext4_block_truncate_page(), because
xip_truncate_page() does what ext4_block_truncate_page() does.

Also, having thought about it for a while: you must call
truncate_pagecache() in XIP mode as well, to unmap the PTEs for the range
removed by the truncate. In ext2 this is hidden in the truncate_setsize()
call...

Also, you seem to be missing hole punching support entirely. For that
you'd need to modify xip_truncate_page() to accept not only the offset but
also the length of the truncated area (a separate patch, please). You will
then need to call that function from ext4_punch_hole() at the place where
ext4_zero_partial_blocks() is used.
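
A minimal userspace sketch of what range-based zeroing might look like,
purely as an illustration. The name xip_zero_partial_model() and the flat
"dev" buffer are assumptions standing in for directly addressable storage;
this is not the kernel's xip_truncate_page(). It zeroes only the partial
blocks at the two edges of a byte range and leaves whole blocks for the
caller to deallocate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model only: "dev" stands in for directly addressable
 * storage, and xip_zero_partial_model() is a hypothetical name, not a
 * kernel function. It zeroes the partial blocks at the edges of the
 * byte range [offset, offset + length); whole blocks inside the range
 * are left untouched so the caller can deallocate them.
 */
static void xip_zero_partial_model(unsigned char *dev, size_t blocksize,
				   size_t offset, size_t length)
{
	size_t end, first_full, last_full;

	assert(dev != NULL && blocksize != 0);
	if (length == 0)
		return;

	end = offset + length;
	/* Start of the first block boundary at or after offset. */
	first_full = (offset + blocksize - 1) / blocksize * blocksize;
	/* Last block boundary at or before end. */
	last_full = end / blocksize * blocksize;

	/* Range lies strictly inside one block: zero just that part. */
	if (first_full > last_full) {
		memset(dev + offset, 0, length);
		return;
	}
	/* Partial head block, if offset is not block aligned. */
	if (offset < first_full)
		memset(dev + offset, 0, first_full - offset);
	/* Partial tail block, if end is not block aligned. */
	if (last_full < end)
		memset(dev + last_full, 0, end - last_full);
}
```

With offset and length both passed in, a plain truncate is just the
special case where the range runs from the new size to the end of its
block.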

Finally, as Matthew Wilcox pointed out
(http://www.spinics.net/lists/linux-fsdevel/msg70582.html), there's a race
between truncate and mmap in the xip support because xip is missing
serialization on page locks. So I believe we should solve that now that we
are growing XIP support in another filesystem... Using mmap_sem for that
is probably viable, but I have to try it.

> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2c2e6cb..18e70d2 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
...
> @@ -3525,11 +3532,19 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  		}
>  		if (test_opt(sb, DELALLOC))
>  			clear_opt(sb, DELALLOC);
> +		if (test_opt(sb, XIP)) {
> +			ext4_msg(sb, KERN_ERR, "can't mount with "
> +				 "both data=journal and xip");
> +			goto failed_mount;
> +		}
>  	}
>  
>  	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
>  		(test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);
>  
> +	ext4_xip_verify_sb(sb); /* see if bdev supports xip, unset
> +				    EXT4_MOUNT_XIP if not */
> +
  I don't like clearing the flag inside this function. Just open-code the
function here, please (I don't think the other call site in ext4_remount()
makes sense at all).
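
A rough userspace model of the open-coded check, as a sketch only. The
struct model_sb, MODEL_MOUNT_XIP and model_fill_super() names are
hypothetical stand-ins; the behaviour (drop the xip option when the block
device has no direct_access method) is taken from the comment in the
quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures and flag value. */
#define MODEL_MOUNT_XIP 0x1u

struct model_sb {
	unsigned int mount_opt;       /* models sbi->s_mount_opt */
	bool bdev_has_direct_access;  /* models a non-NULL ->direct_access */
};

/*
 * Models the relevant part of a fill_super path with the check written
 * inline, so the place where the xip option is dropped is visible at
 * the call site instead of being hidden in a helper.
 * Returns 0; the mount continues without xip when it is unsupported.
 */
static int model_fill_super(struct model_sb *sb)
{
	assert(sb != NULL);

	if ((sb->mount_opt & MODEL_MOUNT_XIP) && !sb->bdev_has_direct_access) {
		fprintf(stderr,
			"warning: ignoring xip option, not supported by bdev\n");
		sb->mount_opt &= ~MODEL_MOUNT_XIP;
	}
	return 0;
}
```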

>  	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
>  	    (EXT4_HAS_COMPAT_FEATURE(sb, ~0U) ||
>  	     EXT4_HAS_RO_COMPAT_FEATURE(sb, ~0U) ||
> @@ -3576,6 +3591,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  		goto failed_mount;
>  	}
>  
> +	if (ext4_use_xip(sb) && blocksize != PAGE_SIZE) {
> +		if (!silent)
> +			ext4_msg(sb, KERN_ERR,
> +				"error: unsupported blocksize for xip");
> +		goto failed_mount;
> +	}
> +
>  	if (sb->s_blocksize != blocksize) {
>  		/* Validate the filesystem blocksize */
>  		if (!sb_set_blocksize(sb, blocksize)) {
> @@ -4707,6 +4729,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
>  	struct ext4_super_block *es;
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
>  	unsigned long old_sb_flags;
> +	unsigned long old_mount_opt = sbi->s_mount_opt;
>  	struct ext4_mount_options old_opts;
>  	int enable_quota = 0;
>  	ext4_group_t g;
> @@ -4773,7 +4796,23 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
>  	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
>  		(test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);
>  
> +	ext4_xip_verify_sb(sb); /* see if bdev supports xip, unset
> +				    EXT4_MOUNT_XIP if not */
> +
> +	if (ext4_use_xip(sb) && sb->s_blocksize != PAGE_SIZE) {
> +		ext4_msg(sb, KERN_WARNING,
> +			"warning: unsupported blocksize for xip");
> +		err = -EINVAL;
> +		goto restore_opts;
> +	}
> +
>  	es = sbi->s_es;
> +	if ((sbi->s_mount_opt ^ old_mount_opt) & EXT4_MOUNT_XIP) {
> +		ext4_msg(sb, KERN_WARNING, "warning: refusing change of "
> +			 "xip flag with busy inodes while remounting");
> +		sbi->s_mount_opt &= ~EXT4_MOUNT_XIP;
> +		sbi->s_mount_opt |= old_mount_opt & EXT4_MOUNT_XIP;
> +	}
  So why do you bother with ext4_xip_verify_sb() and the other checks when
you don't allow remount to change the xip flag anyway (which I think makes
sense)?
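
To illustrate the point with a small userspace model (model_remount() and
MODEL_MOUNT_XIP are hypothetical names, not kernel symbols): if the xip
bit can never differ between the old and new options, comparing the two
option words is the only check remount needs, and the device and block
size tests cannot trigger there.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MOUNT_XIP 0x1u  /* hypothetical flag value */

/*
 * Models the remount handling of the xip flag. A requested change of
 * the xip bit is refused and the old setting restored, as the quoted
 * hunk does; all other option bits are taken from new_opt.
 * *refused is set to 1 when a change was refused, 0 otherwise.
 */
static unsigned int model_remount(unsigned int old_opt, unsigned int new_opt,
				  int *refused)
{
	assert(refused != NULL);

	*refused = ((new_opt ^ old_opt) & MODEL_MOUNT_XIP) != 0;
	if (*refused) {
		new_opt &= ~MODEL_MOUNT_XIP;
		new_opt |= old_opt & MODEL_MOUNT_XIP;
	}
	return new_opt;
}
```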

>  	if (sbi->s_journal) {
>  		ext4_init_journal_params(sb, sbi->s_journal);
> diff --git a/fs/ext4/xip.c b/fs/ext4/xip.c
> new file mode 100644
> index 0000000..e0a430a
> --- /dev/null
> +++ b/fs/ext4/xip.c
> @@ -0,0 +1,91 @@
> +/*
> + *  linux/fs/ext4/xip.c
> + *
> + * Copyright (C) 2005 IBM Corporation
> + * Author: Carsten Otte (cotte@de.ibm.com)
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/fs.h>
> +#include <linux/genhd.h>
> +#include <linux/buffer_head.h>
> +#include <linux/blkdev.h>
> +#include "ext4.h"
> +#include "xip.h"
> +
> +static inline int
> +__inode_direct_access(struct inode *inode, sector_t block,
> +		      void **kaddr, unsigned long *pfn)
> +{
> +	struct block_device *bdev = inode->i_sb->s_bdev;
> +	const struct block_device_operations *ops = bdev->bd_disk->fops;
> +	sector_t sector;
> +
> +	sector = block * (PAGE_SIZE / 512); /* ext4 block to bdev sector */
> +
> +	BUG_ON(!ops->direct_access);
> +	return ops->direct_access(bdev, sector, kaddr, pfn);
> +}
> +
> +static inline int
> +__ext4_get_block(struct inode *inode, pgoff_t pgoff, int create,
> +		   sector_t *result)
> +{
> +	struct buffer_head tmp;
> +	int rc;
> +
> +	memset(&tmp, 0, sizeof(struct buffer_head));
> +	tmp.b_size = inode->i_sb->s_blocksize;
> +	rc = ext4_get_block(inode, pgoff, &tmp, create);
> +	*result = tmp.b_blocknr;
  Please use ext4_map_blocks() directly. There's no need to go via
ext4_get_block() with its suboptimal buffer_head interface...
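
A userspace sketch of the shape this could take, under stated assumptions:
struct model_map_blocks is a simplified stand-in for struct
ext4_map_blocks, and model_map_lookup() is a mock that only imitates the
return convention as I understand it (> 0 blocks mapped, 0 for a hole,
< 0 on error). Block allocation (the create case) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ext4_map_blocks. */
struct model_map_blocks {
	unsigned long long m_pblk; /* physical block, filled in on success */
	unsigned int m_lblk;       /* logical block to look up */
	unsigned int m_len;        /* number of blocks requested */
};

/*
 * Mock lookup over a tiny table: logical block i maps to table[i],
 * with 0 meaning a hole. Returns > 0 for the number of blocks mapped,
 * 0 for a hole, and a negative errno on error.
 */
static int model_map_lookup(const unsigned long long *table, size_t nblocks,
			    struct model_map_blocks *map)
{
	if (map->m_lblk >= nblocks)
		return -EIO;
	if (table[map->m_lblk] == 0)
		return 0;
	map->m_pblk = table[map->m_lblk];
	return 1;
}

/* Translate a page offset to a physical block; a hole becomes -ENODATA. */
static int model_xip_get_block(const unsigned long long *table, size_t nblocks,
			       unsigned int pgoff, unsigned long long *result)
{
	struct model_map_blocks map = { .m_lblk = pgoff, .m_len = 1 };
	int rc;

	assert(table != NULL && result != NULL);

	rc = model_map_lookup(table, nblocks, &map);
	if (rc < 0)
		return rc;
	if (rc == 0)
		return -ENODATA;
	*result = map.m_pblk;
	return 0;
}
```

The hole case is read from the return value rather than inferred from a
zero block number in a temporary buffer_head.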

> +	/* did we get a sparse block (hole in the file)? */
> +	if (!tmp.b_blocknr && !rc) {
> +		BUG_ON(create);
> +		rc = -ENODATA;
> +	}
> +
> +	return rc;
> +}
> +

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR