linux-ext4.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: ross.zwisler@linux.intel.com
Cc: akpm@linux-foundation.org, andreas.dilger@intel.com,
	axboe@kernel.dk, boaz@plexistor.com, david@fromorbit.com,
	hch@lst.de, jack@suse.cz, kirill.shutemov@linux.intel.com,
	mathieu.desnoyers@efficios.com, matthew.r.wilcox@intel.com,
	rdunlap@infradead.org, tytso@mit.edu, mm-commits@vger.kernel.org,
	linux-ext4@vger.kernel.org
Subject: Re: + ext4-add-dax-functionality.patch added to -mm tree
Date: Thu, 15 Jan 2015 13:41:06 +0100
Message-ID: <20150115124106.GF12739@quack.suse.cz>
In-Reply-To: <54b45495.+RptMlNQorYE9TTf%akpm@linux-foundation.org>

On Mon 12-01-15 15:11:17, Andrew Morton wrote:
> From: Ross Zwisler <ross.zwisler@linux.intel.com>
> Subject: ext4: add DAX functionality
> 
> This is a port of the DAX functionality found in the current version of
> ext2.
> 
> [matthew.r.wilcox@intel.com: heavily tweaked]
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> Cc: Boaz Harrosh <boaz@plexistor.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  Documentation/filesystems/dax.txt  |    1 
>  Documentation/filesystems/ext4.txt |    4 +
>  fs/ext4/ext4.h                     |    6 +
>  fs/ext4/file.c                     |   50 ++++++++++++++-
>  fs/ext4/indirect.c                 |   18 +++--
>  fs/ext4/inode.c                    |   89 ++++++++++++++++++---------
>  fs/ext4/namei.c                    |   10 ++-
>  fs/ext4/super.c                    |   39 +++++++++++
>  8 files changed, 180 insertions(+), 37 deletions(-)
> 
> diff -puN Documentation/filesystems/dax.txt~ext4-add-dax-functionality Documentation/filesystems/dax.txt
> --- a/Documentation/filesystems/dax.txt~ext4-add-dax-functionality
> +++ a/Documentation/filesystems/dax.txt
> @@ -73,6 +73,7 @@ or a write()) work correctly.
>  
>  These filesystems may be used for inspiration:
>  - ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt
> +- ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt
>  
>  
>  Shortcomings
> diff -puN Documentation/filesystems/ext4.txt~ext4-add-dax-functionality Documentation/filesystems/ext4.txt
> --- a/Documentation/filesystems/ext4.txt~ext4-add-dax-functionality
> +++ a/Documentation/filesystems/ext4.txt
> @@ -386,6 +386,10 @@ max_dir_size_kb=n	This limits the size o
>  i_version		Enable 64-bit inode version support. This option is
>  			off by default.
>  
> +dax			Use direct access (no page cache).  See
> +			Documentation/filesystems/dax.txt.  Note that
> +			this option is incompatible with data=journal.
> +
>  Data Mode
>  =========
>  There are 3 different data modes:
> diff -puN fs/ext4/ext4.h~ext4-add-dax-functionality fs/ext4/ext4.h
> --- a/fs/ext4/ext4.h~ext4-add-dax-functionality
> +++ a/fs/ext4/ext4.h
> @@ -965,6 +965,11 @@ struct ext4_inode_info {
>  #define EXT4_MOUNT_ERRORS_MASK		0x00070
>  #define EXT4_MOUNT_MINIX_DF		0x00080	/* Mimics the Minix statfs */
>  #define EXT4_MOUNT_NOLOAD		0x00100	/* Don't use existing journal*/
> +#ifdef CONFIG_FS_DAX
> +#define EXT4_MOUNT_DAX			0x00200	/* Direct Access */
> +#else
> +#define EXT4_MOUNT_DAX			0
> +#endif
  Again, why do you make the definition of EXT4_MOUNT_DAX dependent on
CONFIG_FS_DAX?
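
For readers following the question above, here is a minimal userspace sketch of what the #else branch does in practice. It is an illustration only, not kernel code: DEMO_CONFIG_FS_DAX, DEMO_MOUNT_DAX and demo_test_opt() are hypothetical stand-ins, and the patch itself does not state this motivation. When the flag is defined as 0, every bit test against it is constant false, so a compiler can typically drop the code it guards.

```c
#include <assert.h>

/* Hypothetical stand-in for a Kconfig symbol; remove it to model a
 * kernel built without DAX support. */
#define DEMO_CONFIG_FS_DAX 1

#ifdef DEMO_CONFIG_FS_DAX
#define DEMO_MOUNT_DAX 0x00200u	/* same bit value as in the quoted hunk */
#else
#define DEMO_MOUNT_DAX 0u	/* any "& flag" test becomes constant 0 */
#endif

/* Illustrative helper in the style of a mount-option test; the name and
 * signature are assumptions made for this sketch. */
static int demo_test_opt(unsigned int mount_opt, unsigned int flag)
{
	return (mount_opt & flag) != 0;
}

/* Returns 1 if the DAX path would be taken for these mount options. */
static int demo_uses_dax(unsigned int mount_opt)
{
	return demo_test_opt(mount_opt, DEMO_MOUNT_DAX);
}
```

The trade-off is that the option bit silently becomes a no-op when the config symbol is off, which is presumably why the dependency is being questioned here.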

> diff -puN fs/ext4/file.c~ext4-add-dax-functionality fs/ext4/file.c
> --- a/fs/ext4/file.c~ext4-add-dax-functionality
> +++ a/fs/ext4/file.c
> @@ -95,7 +95,7 @@ ext4_file_write_iter(struct kiocb *iocb,
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  	struct mutex *aio_mutex = NULL;
>  	struct blk_plug plug;
> -	int o_direct = file->f_flags & O_DIRECT;
> +	int o_direct = io_is_direct(file);
>  	int overwrite = 0;
>  	size_t length = iov_iter_count(from);
>  	ssize_t ret;
> @@ -191,6 +191,27 @@ errout:
>  	return ret;
>  }
>  
> +#ifdef CONFIG_FS_DAX
> +static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> +	return dax_fault(vma, vmf, ext4_get_block);
> +					/* Is this the right get_block? */
  You can remove the comment. It is the right get_block function.

...
> diff -puN fs/ext4/inode.c~ext4-add-dax-functionality fs/ext4/inode.c
> --- a/fs/ext4/inode.c~ext4-add-dax-functionality
> +++ a/fs/ext4/inode.c
> @@ -657,6 +657,18 @@ has_zeroout:
>  	return retval;
>  }
>  
> +static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate)
> +{
> +	struct inode *inode = bh->b_assoc_map->host;
> +	/* XXX: breaks on 32-bit > 16GB. Is that even supported? */
  That should be 16 TB if I'm doing the math right - 32-bit block number *
block size (4k) = 16 TB. And that's the max limit of ext4 (as the logical file
offset in blocks has to fit in 32 bits for ext4). So I think you can just
remove the comment. But also see comment below.
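
The arithmetic above can be checked with a small userspace sketch. The function names are hypothetical, and the explicit uint32_t truncation only models what a 32-bit uintptr_t could hold; it is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable when the logical block number is limited to 32 bits
 * and each block is (1 << blkbits) bytes. */
static uint64_t demo_max_file_bytes(unsigned int blkbits)
{
	return (uint64_t)1 << (32 + blkbits);
}

/* Models the patch's cast chain on a 32-bit machine: the block number is
 * squeezed through a 32-bit value, then widened and shifted to a byte
 * offset. */
static uint64_t demo_block_to_offset32(uint64_t iblock, unsigned int blkbits)
{
	uint32_t stored = (uint32_t)iblock;	/* what a 32-bit pointer holds */

	return (uint64_t)stored << blkbits;
}
```

With 4k blocks (blkbits = 12) the limit is 2^44 bytes, i.e. 16 TB in the binary sense used above, and any block number that fits in 32 bits survives the round trip through the pointer-sized field unchanged.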

> +	loff_t offset = (loff_t)(uintptr_t)bh->b_private << inode->i_blkbits;
> +	int err;
> +	if (!uptodate)
> +		return;
> +	WARN_ON(!buffer_unwritten(bh));
> +	err = ext4_convert_unwritten_extents(NULL, inode, offset, bh->b_size);
> +}
> +
>  /* Maximum number of blocks we map for direct IO at once. */
>  #define DIO_MAX_BLOCKS 4096
>  
> @@ -694,6 +706,11 @@ static int _ext4_get_block(struct inode
>  
>  		map_bh(bh, inode->i_sb, map.m_pblk);
>  		bh->b_state = (bh->b_state & ~EXT4_MAP_FLAGS) | map.m_flags;
> +		if (IS_DAX(inode) && buffer_unwritten(bh) && !io_end) {
> +			bh->b_assoc_map = inode->i_mapping;
> +			bh->b_private = (void *)(unsigned long)iblock;
> +			bh->b_end_io = ext4_end_io_unwritten;
> +		}
  So why is this needed? It would deserve a comment. It confuses me in
particular because:
1) This is often a phony bh used just as a container for passed data and
   b_end_io is just ignored.
2) Even if it was a real bh attached to a page, for DAX we don't do any
   writeback and thus ->b_end_io will never get called?
3) And if it does get called, you certainly cannot call
   ext4_convert_unwritten_extents() from softirq context where ->b_end_io
   gets called.
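
To illustrate point 3: the usual way around this kind of restriction is for the completion callback to only record the work and let a worker in process context do anything that may block. Below is a single-threaded userspace sketch of that split. All names are hypothetical, there is no locking, and it is not how the patch or ext4 implements it; in the kernel a workqueue would typically play the worker's role.

```c
#include <assert.h>
#include <stddef.h>

/* One pending "convert this range" request. */
struct demo_work {
	struct demo_work *next;
	long long offset;
	long long len;
};

/* FIFO of pending requests plus a counter of completed conversions. */
struct demo_queue {
	struct demo_work *head;
	struct demo_work *tail;
	int converted;
};

/* Completion side: only links the request into the queue. It does nothing
 * that could block, mirroring what is safe in a softirq-like context. */
static void demo_end_io(struct demo_queue *q, struct demo_work *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Worker side: runs later in a context that may block. The real extent
 * conversion would happen where the counter is bumped. Returns the number
 * of requests processed. */
static int demo_run_worker(struct demo_queue *q)
{
	int n = 0;

	while (q->head) {
		struct demo_work *w = q->head;

		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
		q->converted++;	/* placeholder for the blocking conversion */
		n++;
	}
	return n;
}
```

Nothing is converted at completion time; the conversions only happen once demo_run_worker() is called.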

>  		if (io_end && io_end->flag & EXT4_IO_END_UNWRITTEN)
>  			set_buffer_defer_completion(bh);
>  		bh->b_size = inode->i_sb->s_blocksize * map.m_len;

								Honza
