* [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO
@ 2016-02-19 15:59 Jan Kara
2016-02-19 15:59 ` [PATCH 1/6] ext4: Pack ioend structure better Jan Kara
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Jan Kara @ 2016-02-19 15:59 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
Hello,
Recently I had a look at how io_ends are handled for AIO DIO. When doing AIO
DIO we allocate io_end and attach it to inode. ext4_ext_map_blocks() then
fetches io_end from inode and operates on it. This scheme is somewhat fragile
because it relies on the fact that we hold i_mutex whenever we need io_end
which is non-obvious dependency (if it was not for io_end handling, i_data_sem
would be enough). dioread_nolock mode broke this assumption and was corrupting
io_end pointers until recently which somewhat demostrates my point.
This patch set converts ext4 to allocate io_end only once we spot unwritten
extent during direct IO write and uses bh->b_private and dio->private
to store it in the same fashion as XFS. That way passing of io_end pointer
happens via IO-local context and is thus independent of any locks.
The core of the change is in patch 5/6, the rest are related cleanups and
preparations for the change.
Patches pass xfstests in normal and dioread_nolock mode. Patch 5/6 conflicts
with changes of DIO completion on error however resolving these conflicts
should be straightforward.
Honza
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/6] ext4: Pack ioend structure better
2016-02-19 15:59 [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO Jan Kara
@ 2016-02-19 15:59 ` Jan Kara
2016-03-09 3:39 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 2/6] ext4: Use i_mutex to serialize unaligned AIO DIO Jan Kara
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2016-02-19 15:59 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
On 64-bit architectures we have two 4-byte holes in struct ext4_io_end.
Order entries better to avoid this and thus make the structure occupy
64 instead of 72 bytes for 64-bit architectures.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0662b285dc8a..1046621ef64d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -182,9 +182,9 @@ typedef struct ext4_io_end {
struct bio *bio; /* Linked list of completed
* bios covering the extent */
unsigned int flag; /* unwritten or not */
+ atomic_t count; /* reference counter */
loff_t offset; /* offset in the file */
ssize_t size; /* size of the extent */
- atomic_t count; /* reference counter */
} ext4_io_end_t;
struct ext4_io_submit {
--
2.6.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/6] ext4: Use i_mutex to serialize unaligned AIO DIO
2016-02-19 15:59 [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO Jan Kara
2016-02-19 15:59 ` [PATCH 1/6] ext4: Pack ioend structure better Jan Kara
@ 2016-02-19 15:59 ` Jan Kara
2016-03-09 3:53 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 3/6] ext4: Rename and split get blocks functions Jan Kara
` (3 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2016-02-19 15:59 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
However the code cleanups that happened after 2011 when the lock was
introduced made aio_mutex acquired at almost the same places where we
already have exclusion using i_mutex. So just use i_mutex for the
exclusion of unaligned AIO DIO.
The change moves waiting for pending unwritten extent conversion under
i_mutex. That makes special handling of O_APPEND writes unnecessary and
also avoids possible livelocking of unaligned AIO DIO with aligned one
(nothing was preventing contiguous stream of aligned AIO DIOs to let
unaligned AIO DIO wait forever).
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 3 ---
fs/ext4/file.c | 32 +++++++++++++-------------------
fs/ext4/super.c | 5 +----
3 files changed, 14 insertions(+), 26 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1046621ef64d..b02b2e58805a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3284,10 +3284,7 @@ static inline void ext4_inode_resume_unlocked_dio(struct inode *inode)
#define EXT4_WQ_HASH_SZ 37
#define ext4_ioend_wq(v) (&ext4__ioend_wq[((unsigned long)(v)) %\
EXT4_WQ_HASH_SZ])
-#define ext4_aio_mutex(v) (&ext4__aio_mutex[((unsigned long)(v)) %\
- EXT4_WQ_HASH_SZ])
extern wait_queue_head_t ext4__ioend_wq[EXT4_WQ_HASH_SZ];
-extern struct mutex ext4__aio_mutex[EXT4_WQ_HASH_SZ];
#define EXT4_RESIZING 0
extern int ext4_resize_begin(struct super_block *sb);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 1126436dada1..7d5fb122fd26 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -93,31 +93,29 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(iocb->ki_filp);
- struct mutex *aio_mutex = NULL;
struct blk_plug plug;
int o_direct = iocb->ki_flags & IOCB_DIRECT;
+ int unaligned_aio = 0;
int overwrite = 0;
ssize_t ret;
+ inode_lock(inode);
+ ret = generic_write_checks(iocb, from);
+ if (ret <= 0)
+ goto out;
+
/*
- * Unaligned direct AIO must be serialized; see comment above
- * In the case of O_APPEND, assume that we must always serialize
+ * Unaligned direct AIO must be serialized among each other as zeroing
+ * of partial blocks of two competing unaligned AIOs can result in data
+ * corruption.
*/
- if (o_direct &&
- ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
+ if (o_direct && ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
!is_sync_kiocb(iocb) &&
- (iocb->ki_flags & IOCB_APPEND ||
- ext4_unaligned_aio(inode, from, iocb->ki_pos))) {
- aio_mutex = ext4_aio_mutex(inode);
- mutex_lock(aio_mutex);
+ ext4_unaligned_aio(inode, from, iocb->ki_pos)) {
+ unaligned_aio = 1;
ext4_unwritten_wait(inode);
}
- inode_lock(inode);
- ret = generic_write_checks(iocb, from);
- if (ret <= 0)
- goto out;
-
/*
* If we have encountered a bitmap-format file, the size limit
* is smaller than s_maxbytes, which is for extent-mapped files.
@@ -139,7 +137,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
blk_start_plug(&plug);
/* check whether we do a DIO overwrite or not */
- if (ext4_should_dioread_nolock(inode) && !aio_mutex &&
+ if (ext4_should_dioread_nolock(inode) && !unaligned_aio &&
!file->f_mapping->nrpages && pos + length <= i_size_read(inode)) {
struct ext4_map_blocks map;
unsigned int blkbits = inode->i_blkbits;
@@ -181,14 +179,10 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (o_direct)
blk_finish_plug(&plug);
- if (aio_mutex)
- mutex_unlock(aio_mutex);
return ret;
out:
inode_unlock(inode);
- if (aio_mutex)
- mutex_unlock(aio_mutex);
return ret;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3ed01ec011d7..6d8a01b4f535 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5321,7 +5321,6 @@ MODULE_ALIAS_FS("ext4");
/* Shared across all ext4 file systems */
wait_queue_head_t ext4__ioend_wq[EXT4_WQ_HASH_SZ];
-struct mutex ext4__aio_mutex[EXT4_WQ_HASH_SZ];
static int __init ext4_init_fs(void)
{
@@ -5334,10 +5333,8 @@ static int __init ext4_init_fs(void)
/* Build-time check for flags consistency */
ext4_check_flag_values();
- for (i = 0; i < EXT4_WQ_HASH_SZ; i++) {
- mutex_init(&ext4__aio_mutex[i]);
+ for (i = 0; i < EXT4_WQ_HASH_SZ; i++)
init_waitqueue_head(&ext4__ioend_wq[i]);
- }
err = ext4_init_es();
if (err)
--
2.6.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/6] ext4: Rename and split get blocks functions
2016-02-19 15:59 [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO Jan Kara
2016-02-19 15:59 ` [PATCH 1/6] ext4: Pack ioend structure better Jan Kara
2016-02-19 15:59 ` [PATCH 2/6] ext4: Use i_mutex to serialize unaligned AIO DIO Jan Kara
@ 2016-02-19 15:59 ` Jan Kara
2016-03-09 4:09 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 4/6] ext4: Move trans handling and completion deferal out of _ext4_get_block Jan Kara
` (2 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2016-02-19 15:59 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better
describe what it does. Also split out get blocks functions for direct
IO. Later we move functionality from _ext4_get_blocks() there. There's no
functional change in this patch.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 8 +++--
fs/ext4/indirect.c | 10 +++---
fs/ext4/inline.c | 7 ++--
fs/ext4/inode.c | 96 ++++++++++++++++++++++++++++++++++--------------------
4 files changed, 74 insertions(+), 47 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b02b2e58805a..80cc3914b6ad 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2505,12 +2505,14 @@ extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);
int ext4_inode_is_fast_symlink(struct inode *inode);
struct buffer_head *ext4_getblk(handle_t *, struct inode *, ext4_lblk_t, int);
struct buffer_head *ext4_bread(handle_t *, struct inode *, ext4_lblk_t, int);
-int ext4_get_block_write(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create);
+int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create);
int ext4_dax_mmap_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
int ext4_get_block(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create);
+ struct buffer_head *bh_result, int create);
+int ext4_dio_get_block(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create);
int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create);
int ext4_walk_page_buffers(handle_t *handle,
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 355ef9c36c87..1655ff195e0e 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -693,21 +693,21 @@ retry:
}
if (IS_DAX(inode))
ret = dax_do_io(iocb, inode, iter, offset,
- ext4_get_block, NULL, 0);
+ ext4_dio_get_block, NULL, 0);
else
ret = __blockdev_direct_IO(iocb, inode,
inode->i_sb->s_bdev, iter,
- offset, ext4_get_block, NULL,
- NULL, 0);
+ offset, ext4_dio_get_block,
+ NULL, NULL, 0);
inode_dio_end(inode);
} else {
locked:
if (IS_DAX(inode))
ret = dax_do_io(iocb, inode, iter, offset,
- ext4_get_block, NULL, DIO_LOCKING);
+ ext4_dio_get_block, NULL, DIO_LOCKING);
else
ret = blockdev_direct_IO(iocb, inode, iter, offset,
- ext4_get_block);
+ ext4_dio_get_block);
if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
loff_t isize = i_size_read(inode);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index dfe3b9bafc0d..36d8cc9327f5 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -581,9 +581,10 @@ retry:
if (ret)
goto out;
- if (ext4_should_dioread_nolock(inode))
- ret = __block_write_begin(page, from, to, ext4_get_block_write);
- else
+ if (ext4_should_dioread_nolock(inode)) {
+ ret = __block_write_begin(page, from, to,
+ ext4_get_block_unwritten);
+ } else
ret = __block_write_begin(page, from, to, ext4_get_block);
if (!ret && ext4_should_journal_data(inode)) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ee8ca1ff4023..47b11fbd1417 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -741,6 +741,60 @@ int ext4_get_block(struct inode *inode, sector_t iblock,
}
/*
+ * Get block function used when preparing for buffered write if we require
+ * creating an unwritten extent if blocks haven't been allocated. The extent
+ * will be converted to written after the IO is complete.
+ */
+int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create)
+{
+ ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n",
+ inode->i_ino, create);
+ return _ext4_get_block(inode, iblock, bh_result,
+ EXT4_GET_BLOCKS_IO_CREATE_EXT);
+}
+
+/* Get block function for DIO reads and writes to inodes without extents */
+int ext4_dio_get_block(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh, int create)
+{
+ return _ext4_get_block(inode, iblock, bh,
+ create ? EXT4_GET_BLOCKS_CREATE : 0);
+}
+
+/*
+ * Get block function for DIO writes when we create unwritten extent if
+ * blocks are not allocated yet. The extent will be converted to written
+ * after IO is complete.
+ */
+static int ext4_dio_get_block_unwritten(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create)
+{
+ ext4_debug("ext4_dio_get_block_unwritten: inode %lu, create flag %d\n",
+ inode->i_ino, create);
+ return _ext4_get_block(inode, iblock, bh_result,
+ EXT4_GET_BLOCKS_IO_CREATE_EXT);
+}
+
+static int ext4_dio_get_block_overwrite(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create)
+{
+ int ret;
+
+ ext4_debug("ext4_dio_get_block_overwrite: inode %lu, create flag %d\n",
+ inode->i_ino, create);
+ ret = _ext4_get_block(inode, iblock, bh_result, 0);
+ /*
+ * Blocks should have been preallocated! ext4_file_write_iter() checks
+ * that.
+ */
+ WARN_ON_ONCE(!buffer_mapped(bh_result));
+
+ return ret;
+}
+
+
+/*
* `handle' can be NULL if create is zero
*/
struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode,
@@ -1051,13 +1105,14 @@ retry_journal:
#ifdef CONFIG_EXT4_FS_ENCRYPTION
if (ext4_should_dioread_nolock(inode))
ret = ext4_block_write_begin(page, pos, len,
- ext4_get_block_write);
+ ext4_get_block_unwritten);
else
ret = ext4_block_write_begin(page, pos, len,
ext4_get_block);
#else
if (ext4_should_dioread_nolock(inode))
- ret = __block_write_begin(page, pos, len, ext4_get_block_write);
+ ret = __block_write_begin(page, pos, len,
+ ext4_get_block_unwritten);
else
ret = __block_write_begin(page, pos, len, ext4_get_block);
#endif
@@ -3056,37 +3111,6 @@ static int ext4_releasepage(struct page *page, gfp_t wait)
return try_to_free_buffers(page);
}
-/*
- * ext4_get_block used when preparing for a DIO write or buffer write.
- * We allocate an uinitialized extent if blocks haven't been allocated.
- * The extent will be converted to initialized after the IO is complete.
- */
-int ext4_get_block_write(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- ext4_debug("ext4_get_block_write: inode %lu, create flag %d\n",
- inode->i_ino, create);
- return _ext4_get_block(inode, iblock, bh_result,
- EXT4_GET_BLOCKS_IO_CREATE_EXT);
-}
-
-static int ext4_get_block_overwrite(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- int ret;
-
- ext4_debug("ext4_get_block_overwrite: inode %lu, create flag %d\n",
- inode->i_ino, create);
- ret = _ext4_get_block(inode, iblock, bh_result, 0);
- /*
- * Blocks should have been preallocated! ext4_file_write_iter() checks
- * that.
- */
- WARN_ON_ONCE(!buffer_mapped(bh_result));
-
- return ret;
-}
-
#ifdef CONFIG_FS_DAX
int ext4_dax_mmap_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
@@ -3254,7 +3278,7 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
*/
iocb->private = NULL;
if (overwrite) {
- get_block_func = ext4_get_block_overwrite;
+ get_block_func = ext4_dio_get_block_overwrite;
} else {
ext4_inode_aio_set(inode, NULL);
if (!is_sync_kiocb(iocb)) {
@@ -3276,7 +3300,7 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
*/
ext4_inode_aio_set(inode, io_end);
}
- get_block_func = ext4_get_block_write;
+ get_block_func = ext4_dio_get_block_unwritten;
dio_flags = DIO_LOCKING;
}
#ifdef CONFIG_EXT4_FS_ENCRYPTION
@@ -5470,7 +5494,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
unlock_page(page);
/* OK, we need to fill the hole... */
if (ext4_should_dioread_nolock(inode))
- get_block = ext4_get_block_write;
+ get_block = ext4_get_block_unwritten;
else
get_block = ext4_get_block;
retry_alloc:
--
2.6.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/6] ext4: Move trans handling and completion deferal out of _ext4_get_block
2016-02-19 15:59 [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO Jan Kara
` (2 preceding siblings ...)
2016-02-19 15:59 ` [PATCH 3/6] ext4: Rename and split get blocks functions Jan Kara
@ 2016-02-19 15:59 ` Jan Kara
2016-03-09 4:27 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 5/6] ext4: Simplify io_end handling for AIO DIO Jan Kara
2016-02-19 15:59 ` [PATCH 6/6] ext4: Remove i_ioend_count Jan Kara
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2016-02-19 15:59 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
There is no need to handle starting of a transaction and deferal of DIO
completion in _ext4_get_block() function. We can move this out to get
block functions for direct IO that need it. That way we can add stricter
checks verifying things work as we expect.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 92 ++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 59 insertions(+), 33 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 47b11fbd1417..506180729faf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -686,50 +686,25 @@ out_sem:
return retval;
}
-/* Maximum number of blocks we map for direct IO at once. */
-#define DIO_MAX_BLOCKS 4096
-
static int _ext4_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int flags)
{
- handle_t *handle = ext4_journal_current_handle();
struct ext4_map_blocks map;
- int ret = 0, started = 0;
- int dio_credits;
+ int ret = 0;
if (ext4_has_inline_data(inode))
return -ERANGE;
map.m_lblk = iblock;
map.m_len = bh->b_size >> inode->i_blkbits;
-
- if (flags && !handle) {
- /* Direct IO write... */
- if (map.m_len > DIO_MAX_BLOCKS)
- map.m_len = DIO_MAX_BLOCKS;
- dio_credits = ext4_chunk_trans_blocks(inode, map.m_len);
- handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
- dio_credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- return ret;
- }
- started = 1;
- }
-
- ret = ext4_map_blocks(handle, inode, &map, flags);
+ ret = ext4_map_blocks(ext4_journal_current_handle(), inode, &map,
+ flags);
if (ret > 0) {
- ext4_io_end_t *io_end = ext4_inode_aio(inode);
-
map_bh(bh, inode->i_sb, map.m_pblk);
bh->b_state = (bh->b_state & ~EXT4_MAP_FLAGS) | map.m_flags;
- if (io_end && io_end->flag & EXT4_IO_END_UNWRITTEN)
- set_buffer_defer_completion(bh);
bh->b_size = inode->i_sb->s_blocksize * map.m_len;
ret = 0;
}
- if (started)
- ext4_journal_stop(handle);
return ret;
}
@@ -754,12 +729,42 @@ int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
EXT4_GET_BLOCKS_IO_CREATE_EXT);
}
+/* Maximum number of blocks we map for direct IO at once. */
+#define DIO_MAX_BLOCKS 4096
+
+static handle_t *start_dio_trans(struct inode *inode,
+ struct buffer_head *bh_result)
+{
+ int dio_credits;
+
+ /* Trim mapping request to maximum we can map at once for DIO */
+ if (bh_result->b_size >> inode->i_blkbits > DIO_MAX_BLOCKS)
+ bh_result->b_size = DIO_MAX_BLOCKS << inode->i_blkbits;
+ dio_credits = ext4_chunk_trans_blocks(inode,
+ bh_result->b_size >> inode->i_blkbits);
+ return ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS, dio_credits);
+}
+
/* Get block function for DIO reads and writes to inodes without extents */
int ext4_dio_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create)
{
- return _ext4_get_block(inode, iblock, bh,
- create ? EXT4_GET_BLOCKS_CREATE : 0);
+ handle_t *handle;
+ int ret;
+
+ /* We don't expect handle for direct IO */
+ WARN_ON_ONCE(ext4_journal_current_handle());
+
+ if (create) {
+ handle = start_dio_trans(inode, bh);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+ }
+ ret = _ext4_get_block(inode, iblock, bh,
+ create ? EXT4_GET_BLOCKS_CREATE : 0);
+ if (create)
+ ext4_journal_stop(handle);
+ return ret;
}
/*
@@ -770,10 +775,28 @@ int ext4_dio_get_block(struct inode *inode, sector_t iblock,
static int ext4_dio_get_block_unwritten(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
+ handle_t *handle;
+ int ret;
+
ext4_debug("ext4_dio_get_block_unwritten: inode %lu, create flag %d\n",
inode->i_ino, create);
- return _ext4_get_block(inode, iblock, bh_result,
- EXT4_GET_BLOCKS_IO_CREATE_EXT);
+ /* We don't expect handle for direct IO */
+ WARN_ON_ONCE(ext4_journal_current_handle());
+
+ handle = start_dio_trans(inode, bh_result);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+ ret = _ext4_get_block(inode, iblock, bh_result,
+ EXT4_GET_BLOCKS_IO_CREATE_EXT);
+ ext4_journal_stop(handle);
+ if (!ret && buffer_unwritten(bh_result)) {
+ ext4_io_end_t *io_end = ext4_inode_aio(inode);
+
+ set_buffer_defer_completion(bh_result);
+ WARN_ON_ONCE(io_end && !(io_end->flag & EXT4_IO_END_UNWRITTEN));
+ }
+
+ return ret;
}
static int ext4_dio_get_block_overwrite(struct inode *inode, sector_t iblock,
@@ -783,12 +806,15 @@ static int ext4_dio_get_block_overwrite(struct inode *inode, sector_t iblock,
ext4_debug("ext4_dio_get_block_overwrite: inode %lu, create flag %d\n",
inode->i_ino, create);
+ /* We don't expect handle for direct IO */
+ WARN_ON_ONCE(ext4_journal_current_handle());
+
ret = _ext4_get_block(inode, iblock, bh_result, 0);
/*
* Blocks should have been preallocated! ext4_file_write_iter() checks
* that.
*/
- WARN_ON_ONCE(!buffer_mapped(bh_result));
+ WARN_ON_ONCE(!buffer_mapped(bh_result) || buffer_unwritten(bh_result));
return ret;
}
--
2.6.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 5/6] ext4: Simplify io_end handling for AIO DIO
2016-02-19 15:59 [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO Jan Kara
` (3 preceding siblings ...)
2016-02-19 15:59 ` [PATCH 4/6] ext4: Move trans handling and completion deferal out of _ext4_get_block Jan Kara
@ 2016-02-19 15:59 ` Jan Kara
2016-03-09 4:38 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 6/6] ext4: Remove i_ioend_count Jan Kara
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2016-02-19 15:59 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
When mapping blocks for direct IO, we allocate io_end structure before
mapping blocks and store pointer to it in the inode. This creates a
requirement that any AIO DIO using io_end must be protected by i_mutex.
This created problems in the past with dioread_nolock mode which was
corrupting io_end pointers. Also io_end is allocated unnecessarily in
case where we don't need to convert any extents (which is a common case
for example when overwriting file).
We fix the problem by allocating io_end only once we return unwritten
extent from block mapping function for AIO DIO (so we can save some
pointless io_end allocations) and we pass pointer to it in bh->b_private
which generic DIO code later passes to our end IO callback. That way we
remove any need for global pointer to io_end structure and thus fix the
races.
The downside of this change is that the checking for unwritten IO in
flight in ext4_extents_can_be_merged() is more racy since we now
increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
i_data_sem. However the check has been racy already before because
ext4_writepages() already increment i_unwritten after dropping
i_data_sem and reserved blocks save us from hitting ENOSPC in the worst
case.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 10 ----
fs/ext4/extents.c | 35 +++-----------
fs/ext4/inode.c | 133 ++++++++++++++++++++++++++++--------------------------
3 files changed, 74 insertions(+), 104 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 80cc3914b6ad..29d51505a9d2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1513,16 +1513,6 @@ static inline void ext4_set_io_unwritten_flag(struct inode *inode,
}
}
-static inline ext4_io_end_t *ext4_inode_aio(struct inode *inode)
-{
- return inode->i_private;
-}
-
-static inline void ext4_inode_aio_set(struct inode *inode, ext4_io_end_t *io)
-{
- inode->i_private = io;
-}
-
/*
* Inode dynamic state flags
*/
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0ffabaf90aa5..1742e60e229c 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1736,6 +1736,12 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
*/
if (ext1_ee_len + ext2_ee_len > EXT_INIT_MAX_LEN)
return 0;
+ /*
+ * The check for IO to unwritten extent is somewhat racy as we
+ * increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after
+ * dropping i_data_sem. But reserved blocks should save us in that
+ * case.
+ */
if (ext4_ext_is_unwritten(ex1) &&
(ext4_test_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN) ||
atomic_read(&EXT4_I(inode)->i_unwritten) ||
@@ -4007,7 +4013,6 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path = *ppath;
int ret = 0;
int err = 0;
- ext4_io_end_t *io = ext4_inode_aio(inode);
ext_debug("ext4_ext_handle_unwritten_extents: inode %lu, logical "
"block %llu, max_blocks %u, flags %x, allocated %u\n",
@@ -4030,15 +4035,6 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
flags | EXT4_GET_BLOCKS_CONVERT);
if (ret <= 0)
goto out;
- /*
- * Flag the inode(non aio case) or end_io struct (aio case)
- * that this IO needs to conversion to written when IO is
- * completed
- */
- if (io)
- ext4_set_io_unwritten_flag(inode, io);
- else
- ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
map->m_flags |= EXT4_MAP_UNWRITTEN;
goto out;
}
@@ -4283,9 +4279,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
unsigned int allocated = 0, offset = 0;
unsigned int allocated_clusters = 0;
struct ext4_allocation_request ar;
- ext4_io_end_t *io = ext4_inode_aio(inode);
ext4_lblk_t cluster_offset;
- int set_unwritten = 0;
bool map_from_cluster = false;
ext_debug("blocks %u/%u requested for inode %lu\n",
@@ -4482,15 +4476,6 @@ got_allocated_blocks:
if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT){
ext4_ext_mark_unwritten(&newex);
map->m_flags |= EXT4_MAP_UNWRITTEN;
- /*
- * io_end structure was created for every IO write to an
- * unwritten extent. To avoid unnecessary conversion,
- * here we flag the IO that really needs the conversion.
- * For non asycn direct IO case, flag the inode state
- * that we need to perform conversion when IO is done.
- */
- if (flags & EXT4_GET_BLOCKS_PRE_IO)
- set_unwritten = 1;
}
err = 0;
@@ -4501,14 +4486,6 @@ got_allocated_blocks:
err = ext4_ext_insert_extent(handle, inode, &path,
&newex, flags);
- if (!err && set_unwritten) {
- if (io)
- ext4_set_io_unwritten_flag(inode, io);
- else
- ext4_set_inode_state(inode,
- EXT4_STATE_DIO_UNWRITTEN);
- }
-
if (err && free_on_err) {
int fb_flags = flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE ?
EXT4_FREE_BLOCKS_NO_QUOT_UPDATE : 0;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 506180729faf..ef411c4f11f8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -768,18 +768,16 @@ int ext4_dio_get_block(struct inode *inode, sector_t iblock,
}
/*
- * Get block function for DIO writes when we create unwritten extent if
+ * Get block function for AIO DIO writes when we create unwritten extent if
* blocks are not allocated yet. The extent will be converted to written
* after IO is complete.
*/
-static int ext4_dio_get_block_unwritten(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
+static int ext4_dio_get_block_unwritten_async(struct inode *inode,
+ sector_t iblock, struct buffer_head *bh_result, int create)
{
handle_t *handle;
int ret;
- ext4_debug("ext4_dio_get_block_unwritten: inode %lu, create flag %d\n",
- inode->i_ino, create);
/* We don't expect handle for direct IO */
WARN_ON_ONCE(ext4_journal_current_handle());
@@ -789,16 +787,62 @@ static int ext4_dio_get_block_unwritten(struct inode *inode, sector_t iblock,
ret = _ext4_get_block(inode, iblock, bh_result,
EXT4_GET_BLOCKS_IO_CREATE_EXT);
ext4_journal_stop(handle);
- if (!ret && buffer_unwritten(bh_result)) {
- ext4_io_end_t *io_end = ext4_inode_aio(inode);
+ /*
+ * When doing DIO using unwritten extents, we need io_end to convert
+ * unwritten extents to written on IO completion. We allocate io_end
+ * once we spot unwritten extent and store it in b_private. Generic
+ * DIO code keeps b_private set and furthermore passes the value to
+ * our completion callback in 'private' argument.
+ */
+ if (!ret && buffer_unwritten(bh_result)) {
+ if (!bh_result->b_private) {
+ ext4_io_end_t *io_end;
+
+ io_end = ext4_init_io_end(inode, GFP_KERNEL);
+ if (!io_end)
+ return -ENOMEM;
+ bh_result->b_private = io_end;
+ ext4_set_io_unwritten_flag(inode, io_end);
+ }
set_buffer_defer_completion(bh_result);
- WARN_ON_ONCE(io_end && !(io_end->flag & EXT4_IO_END_UNWRITTEN));
}
return ret;
}
+/*
+ * Get block function for non-AIO DIO writes when we create unwritten extent if
+ * blocks are not allocated yet. The extent will be converted to written
+ * after IO is complete from ext4_ext_direct_IO() function.
+ */
+static int ext4_dio_get_block_unwritten_sync(struct inode *inode,
+ sector_t iblock, struct buffer_head *bh_result, int create)
+{
+ handle_t *handle;
+ int ret;
+
+ /* We don't expect handle for direct IO */
+ WARN_ON_ONCE(ext4_journal_current_handle());
+
+ handle = start_dio_trans(inode, bh_result);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+ ret = _ext4_get_block(inode, iblock, bh_result,
+ EXT4_GET_BLOCKS_IO_CREATE_EXT);
+ ext4_journal_stop(handle);
+
+ /*
+ * Mark inode as having pending DIO writes to unwritten extents.
+ * ext4_ext_direct_IO() checks this flag and converts extents to
+ * written.
+ */
+ if (!ret && buffer_unwritten(bh_result))
+ ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
+
+ return ret;
+}
+
static int ext4_dio_get_block_overwrite(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
@@ -3214,7 +3258,7 @@ out:
static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
ssize_t size, void *private)
{
- ext4_io_end_t *io_end = iocb->private;
+ ext4_io_end_t *io_end = private;
/* if not async direct IO just return */
if (!io_end)
@@ -3222,10 +3266,8 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
ext_debug("ext4_end_io_dio(): io_end 0x%p "
"for inode %lu, iocb 0x%p, offset %llu, size %zd\n",
- iocb->private, io_end->inode->i_ino, iocb, offset,
- size);
+ io_end, io_end->inode->i_ino, iocb, offset, size);
- iocb->private = NULL;
io_end->offset = offset;
io_end->size = size;
ext4_put_io_end(io_end);
@@ -3261,7 +3303,6 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
get_block_t *get_block_func = NULL;
int dio_flags = 0;
loff_t final_size = offset + count;
- ext4_io_end_t *io_end = NULL;
/* Use the old path for reads and writes beyond i_size. */
if (iov_iter_rw(iter) != WRITE || final_size > inode->i_size)
@@ -3286,16 +3327,17 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
/*
* We could direct write to holes and fallocate.
*
- * Allocated blocks to fill the hole are marked as
- * unwritten to prevent parallel buffered read to expose
- * the stale data before DIO complete the data IO.
+ * Allocated blocks to fill the hole are marked as unwritten to prevent
+ * parallel buffered read to expose the stale data before DIO complete
+ * the data IO.
*
- * As to previously fallocated extents, ext4 get_block will
- * just simply mark the buffer mapped but still keep the
- * extents unwritten.
+ * As to previously fallocated extents, ext4 get_block will just simply
+ * mark the buffer mapped but still keep the extents unwritten.
*
- * For non AIO case, we will convert those unwritten extents
- * to written after return back from blockdev_direct_IO.
+ * For non AIO case, we will convert those unwritten extents to written
+ * after return back from blockdev_direct_IO. That way we save us from
+ * allocating io_end structure and also the overhead of offloading
+ * the extent convertion to a workqueue.
*
* For async DIO, the conversion needs to be deferred when the
* IO is completed. The ext4 end_io callback function will be
@@ -3303,30 +3345,13 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
* case, we allocate an io_end structure to hook to the iocb.
*/
iocb->private = NULL;
- if (overwrite) {
+ if (overwrite)
get_block_func = ext4_dio_get_block_overwrite;
+ else if (is_sync_kiocb(iocb)) {
+ get_block_func = ext4_dio_get_block_unwritten_sync;
+ dio_flags = DIO_LOCKING;
} else {
- ext4_inode_aio_set(inode, NULL);
- if (!is_sync_kiocb(iocb)) {
- io_end = ext4_init_io_end(inode, GFP_NOFS);
- if (!io_end) {
- ret = -ENOMEM;
- goto retake_lock;
- }
- /*
- * Grab reference for DIO. Will be dropped in
- * ext4_end_io_dio()
- */
- iocb->private = ext4_get_io_end(io_end);
- /*
- * we save the io structure for current async direct
- * IO, so that later ext4_map_blocks() could flag the
- * io structure whether there is a unwritten extents
- * needs to be converted when IO is completed.
- */
- ext4_inode_aio_set(inode, io_end);
- }
- get_block_func = ext4_dio_get_block_unwritten;
+ get_block_func = ext4_dio_get_block_unwritten_async;
dio_flags = DIO_LOCKING;
}
#ifdef CONFIG_EXT4_FS_ENCRYPTION
@@ -3341,27 +3366,6 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
get_block_func,
ext4_end_io_dio, NULL, dio_flags);
- /*
- * Put our reference to io_end. This can free the io_end structure e.g.
- * in sync IO case or in case of error. It can even perform extent
- * conversion if all bios we submitted finished before we got here.
- * Note that in that case iocb->private can be already set to NULL
- * here.
- */
- if (io_end) {
- ext4_inode_aio_set(inode, NULL);
- ext4_put_io_end(io_end);
- /*
- * When no IO was submitted ext4_end_io_dio() was not
- * called so we have to put iocb's reference.
- */
- if (ret <= 0 && ret != -EIOCBQUEUED && iocb->private) {
- WARN_ON(iocb->private != io_end);
- WARN_ON(io_end->flag & EXT4_IO_END_UNWRITTEN);
- ext4_put_io_end(io_end);
- iocb->private = NULL;
- }
- }
if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
EXT4_STATE_DIO_UNWRITTEN)) {
int err;
@@ -3376,7 +3380,6 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
ext4_clear_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
}
-retake_lock:
if (iov_iter_rw(iter) == WRITE)
inode_dio_end(inode);
/* take i_mutex locking again if we do a ovewrite dio */
--
2.6.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 6/6] ext4: Remove i_ioend_count
2016-02-19 15:59 [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO Jan Kara
` (4 preceding siblings ...)
2016-02-19 15:59 ` [PATCH 5/6] ext4: Simplify io_end handling for AIO DIO Jan Kara
@ 2016-02-19 15:59 ` Jan Kara
2016-03-09 4:53 ` Theodore Ts'o
5 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2016-02-19 15:59 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
Remove counter of pending io ends as it is unused.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 7 +------
fs/ext4/inode.c | 3 ---
fs/ext4/page-io.c | 4 ----
fs/ext4/super.c | 1 -
4 files changed, 1 insertion(+), 14 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 29d51505a9d2..00fed23dedf5 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1024,13 +1024,8 @@ struct ext4_inode_info {
* transaction reserved
*/
struct list_head i_rsv_conversion_list;
- /*
- * Completed IOs that need unwritten extents handling and don't have
- * transaction reserved
- */
- atomic_t i_ioend_count; /* Number of outstanding io_end structs */
- atomic_t i_unwritten; /* Nr. of inflight conversions pending */
struct work_struct i_rsv_conversion_work;
+ atomic_t i_unwritten; /* Nr. of inflight conversions pending */
spinlock_t i_block_reservation_lock;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ef411c4f11f8..c74c8e7c377b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -216,7 +216,6 @@ void ext4_evict_inode(struct inode *inode)
}
truncate_inode_pages_final(&inode->i_data);
- WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
goto no_delete;
}
@@ -228,8 +227,6 @@ void ext4_evict_inode(struct inode *inode)
ext4_begin_ordered_truncate(inode, 0);
truncate_inode_pages_final(&inode->i_data);
- WARN_ON(atomic_read(&EXT4_I(inode)->i_ioend_count));
-
/*
* Protect us against freezing - iput() caller didn't have to have any
* protection against it
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 090b3498638e..349d7aa04fe7 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -128,9 +128,6 @@ static void ext4_release_io_end(ext4_io_end_t *io_end)
BUG_ON(io_end->flag & EXT4_IO_END_UNWRITTEN);
WARN_ON(io_end->handle);
- if (atomic_dec_and_test(&EXT4_I(io_end->inode)->i_ioend_count))
- wake_up_all(ext4_ioend_wq(io_end->inode));
-
for (bio = io_end->bio; bio; bio = next_bio) {
next_bio = bio->bi_private;
ext4_finish_bio(bio);
@@ -265,7 +262,6 @@ ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
{
ext4_io_end_t *io = kmem_cache_zalloc(io_end_cachep, flags);
if (io) {
- atomic_inc(&EXT4_I(inode)->i_ioend_count);
io->inode = inode;
INIT_LIST_HEAD(&io->list);
atomic_set(&io->count, 1);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6d8a01b4f535..e3872174875a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -944,7 +944,6 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
spin_lock_init(&ei->i_completed_io_lock);
ei->i_sync_tid = 0;
ei->i_datasync_tid = 0;
- atomic_set(&ei->i_ioend_count, 0);
atomic_set(&ei->i_unwritten, 0);
INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
#ifdef CONFIG_EXT4_FS_ENCRYPTION
--
2.6.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/6] ext4: Pack ioend structure better
2016-02-19 15:59 ` [PATCH 1/6] ext4: Pack ioend structure better Jan Kara
@ 2016-03-09 3:39 ` Theodore Ts'o
0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2016-03-09 3:39 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Fri, Feb 19, 2016 at 04:59:37PM +0100, Jan Kara wrote:
> On 64-bit architectures we have two 4-byte holes in struct ext4_io_end.
> Order entries better to avoid this and thus make the structure occupy
> 64 instead of 72 bytes for 64-bit architectures.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Applied, thanks.
- Ted
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/6] ext4: Use i_mutex to serialize unaligned AIO DIO
2016-02-19 15:59 ` [PATCH 2/6] ext4: Use i_mutex to serialize unaligned AIO DIO Jan Kara
@ 2016-03-09 3:53 ` Theodore Ts'o
0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2016-03-09 3:53 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Fri, Feb 19, 2016 at 04:59:38PM +0100, Jan Kara wrote:
> Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
> However the code cleanups that happened after 2011 when the lock was
> introduced made aio_mutex acquired at almost the same places where we
> already have exclusion using i_mutex. So just use i_mutex for the
> exclusion of unaligned AIO DIO.
>
> The change moves waiting for pending unwritten extent conversion under
> i_mutex. That makes special handling of O_APPEND writes unnecessary and
> also avoids possible livelocking of unaligned AIO DIO with aligned one
> (nothing was preventing contiguous stream of aligned AIO DIOs to let
> unaligned AIO DIO wait forever).
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Thanks, applied.
- Ted
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/6] ext4: Rename and split get blocks functions
2016-02-19 15:59 ` [PATCH 3/6] ext4: Rename and split get blocks functions Jan Kara
@ 2016-03-09 4:09 ` Theodore Ts'o
0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2016-03-09 4:09 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Fri, Feb 19, 2016 at 04:59:39PM +0100, Jan Kara wrote:
> Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better
> describe what it does. Also split out get blocks functions for direct
> IO. Later we move functionality from _ext4_get_blocks() there. There's no
> functional change in this patch.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Thanks, applied.
- Ted
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 4/6] ext4: Move trans handling and completion deferal out of _ext4_get_block
2016-02-19 15:59 ` [PATCH 4/6] ext4: Move trans handling and completion deferal out of _ext4_get_block Jan Kara
@ 2016-03-09 4:27 ` Theodore Ts'o
0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2016-03-09 4:27 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Fri, Feb 19, 2016 at 04:59:40PM +0100, Jan Kara wrote:
> There is no need to handle starting of a transaction and deferal of DIO
> completion in _ext4_get_block() function. We can move this out to get
> block functions for direct IO that need it. That way we can add stricter
> checks verifying things work as we expect.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Thanks, applied.
- Ted
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 5/6] ext4: Simplify io_end handling for AIO DIO
2016-02-19 15:59 ` [PATCH 5/6] ext4: Simplify io_end handling for AIO DIO Jan Kara
@ 2016-03-09 4:38 ` Theodore Ts'o
0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2016-03-09 4:38 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Fri, Feb 19, 2016 at 04:59:41PM +0100, Jan Kara wrote:
> When mapping blocks for direct IO, we allocate io_end structure before
> mapping blocks and store pointer to it in the inode. This creates a
> requirement that any AIO DIO using io_end must be protected by i_mutex.
> This created problems in the past with dioread_nolock mode which was
> corrupting io_end pointers. Also io_end is allocated unnecessarily in
> case where we don't need to convert any extents (which is a common case
> for example when overwriting file).
>
> We fix the problem by allocating io_end only once we return unwritten
> extent from block mapping function for AIO DIO (so we can save some
> pointless io_end allocations) and we pass pointer to it in bh->b_private
> which generic DIO code later passes to our end IO callback. That way we
> remove any need for global pointer to io_end structure and thus fix the
> races.
>
> The downside of this change is that the checking for unwritten IO in
> flight in ext4_extents_can_be_merged() is more racy since we now
> increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
> i_data_sem. However the check has been racy already before because
> ext4_writepages() already increment i_unwritten after dropping
> i_data_sem and reserved blocks save us from hitting ENOSPC in the worst
> case.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Thanks, applied.
- Ted
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 6/6] ext4: Remove i_ioend_count
2016-02-19 15:59 ` [PATCH 6/6] ext4: Remove i_ioend_count Jan Kara
@ 2016-03-09 4:53 ` Theodore Ts'o
0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2016-03-09 4:53 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Fri, Feb 19, 2016 at 04:59:42PM +0100, Jan Kara wrote:
> Remove counter of pending io ends as it is unused.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Thanks, applied.
(I'm kicking off a full gce-xfstests run just to make sure there are
no regression across all of the various file system configs that I
normally test.)
- Ted
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-03-09 4:53 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-19 15:59 [PATCH 0/6] ext4: Clean up io_end handling for AIO DIO Jan Kara
2016-02-19 15:59 ` [PATCH 1/6] ext4: Pack ioend structure better Jan Kara
2016-03-09 3:39 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 2/6] ext4: Use i_mutex to serialize unaligned AIO DIO Jan Kara
2016-03-09 3:53 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 3/6] ext4: Rename and split get blocks functions Jan Kara
2016-03-09 4:09 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 4/6] ext4: Move trans handling and completion deferal out of _ext4_get_block Jan Kara
2016-03-09 4:27 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 5/6] ext4: Simplify io_end handling for AIO DIO Jan Kara
2016-03-09 4:38 ` Theodore Ts'o
2016-02-19 15:59 ` [PATCH 6/6] ext4: Remove i_ioend_count Jan Kara
2016-03-09 4:53 ` Theodore Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).