public inbox for linux-ext4@vger.kernel.org
From: Dmitry Monakhov <dmonakhov@openvz.org>
To: linux-ext4@vger.kernel.org
Cc: jack@suse.cz, tytso@mit.edu
Subject: Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
Date: Mon, 23 Nov 2015 19:37:56 +0300
Message-ID: <87a8q4k6xn.fsf@openvz.org>
In-Reply-To: <1448294568-20892-1-git-send-email-dmonakhov@openvz.org>


Dmitry Monakhov <dmonakhov@openvz.org> writes:

> After freeze_fs was revoked (from Jan Kara) page write-back completion
> is deferred before unwritten conversion, so explicit flush_unwritten_io()
> was removed here: c724585b62411
> But we still may face deferred conversion for aio-dio case
> # Trivial testcase
> for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
>     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> NOTE: A sane testcase should be integrated into xfstests, but it requires
> changes in common/* code, so let's use this test at the moment.
>
> In order to fix this race we have to guard journal transaction with explicit
> sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5
To be fair, I'm not very happy with this fix, because it continues the bad
practice of ad hoc fixes for generic journal vs. freeze synchronization.
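
For readers who want to see the guard pattern from the quoted patch in
isolation, here is a minimal single-threaded userspace model. It is only a
sketch: every model_* name is hypothetical and does not exist in the
kernel, and where the real sb_start_intwrite() and the freeze path sleep,
the model simply reports failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_sb {
	int  intwriters;	/* guarded sections currently open */
	bool frozen;		/* true once a freeze has completed */
};

/*
 * Models sb_start_intwrite(). The real call sleeps while the fs is
 * frozen; the model returns false instead to stay single-threaded.
 */
static bool model_start_intwrite(struct model_sb *sb)
{
	if (sb->frozen)
		return false;
	sb->intwriters++;
	return true;
}

/* Models sb_end_intwrite(). */
static void model_end_intwrite(struct model_sb *sb)
{
	assert(sb->intwriters > 0);
	sb->intwriters--;
}

/*
 * Models the freeze path. The real code waits for writers to drain;
 * the model refuses while any guarded section is still open.
 */
static bool model_freeze(struct model_sb *sb)
{
	if (sb->intwriters > 0)
		return false;
	sb->frozen = true;
	return true;
}

static void model_thaw(struct model_sb *sb)
{
	sb->frozen = false;
}

/* Models the guarded conversion: journal work runs only under the guard. */
static bool model_convert_unwritten(struct model_sb *sb)
{
	if (!model_start_intwrite(sb))
		return false;
	/* ... start handle, convert extents, stop handle ... */
	model_end_intwrite(sb);
	return true;
}
```

In this model a freeze cannot complete while a conversion is inside the
guard, and a conversion cannot start a transaction on a frozen fs, which
is the property the patch is after.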

The ideal fix would be to move sb_start_intwrite()/sb_end_intwrite() into
ext4_journal_start()/ext4_journal_stop(), but this is not possible due to
limitations introduced by nojournal mode (described here: 8e8ad8a5).
So let's fix nojournal instead. To do that, we somehow have to store
a ref count and a pointer to the sb inside the nojournal handle.
There are two possible ways to do that.
1) Embed a second journal-related field in task_struct and guard it with
   a compile-time config macro.
void *journal_info;
+ #ifdef CONFIG_EXTRA_JOURNAL_INFO
+   void *journal_info2;
+ #endif

2) Encode the ref count and the sb into a single long. This can be done by
   aligning the ext4_sb_info pointer to 4096, so we can embed the ref count
   in the lower bits, as follows.
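#define EXT4_NOJOURNAL_SHIFT		12
/* Bit 0 marks a nojournal handle, bits 1..11 hold the ref count */
#define EXT4_NOJOURNAL_MAX_REF_COUNT	(1UL << (EXT4_NOJOURNAL_SHIFT - 1))
#define EXT4_NOJOURNAL_MASK		((1UL << EXT4_NOJOURNAL_SHIFT) - 1)
#define NOJOURNAL_SB(handle) \
	((struct ext4_sb_info *)((unsigned long)(handle) & ~EXT4_NOJOURNAL_MASK))
#define NOJOURNAL_REF(handle) \
	(((unsigned long)(handle) & EXT4_NOJOURNAL_MASK) >> 1)

static int ext4_handle_valid(handle_t *handle)
{
        /* A real journal handle is an aligned pointer, so bit 0 is clear */
        return !((unsigned long)handle & 0x1);
}

static handle_t *get_nojournal_handle(struct super_block *sb)
{
        handle_t *handle = current->journal_info;
        struct ext4_sb_info *old_sbi = NOJOURNAL_SB(handle);
        unsigned long ref_cnt = NOJOURNAL_REF(handle);

        BUG_ON(old_sbi && old_sbi != EXT4_SB(sb));
        BUG_ON(ref_cnt + 1 >= EXT4_NOJOURNAL_MAX_REF_COUNT);
        ref_cnt++;
        /* Re-encode sbi pointer, new ref count and the nojournal flag */
        handle = (handle_t *)((unsigned long)EXT4_SB(sb) |
                              (ref_cnt << 1) | 0x1);
        current->journal_info = handle;
        return handle;
}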

What do you think about this? Is there any better way to fix this?
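
The encoding from option 2 can be tried out in userspace. The following is
a minimal sketch, assuming a 4096-byte aligned object stands in for
ext4_sb_info; the nj_* names and NJ_* macros are hypothetical and exist
only for this illustration. Bit 0 flags a nojournal handle and bits 1..11
carry the ref count, as in the sketch above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NJ_SHIFT	12
#define NJ_MASK		(((uintptr_t)1 << NJ_SHIFT) - 1)
#define NJ_FLAG		((uintptr_t)1)	/* bit 0: nojournal handle marker */
#define NJ_MAX_REF	(NJ_MASK >> 1)	/* bits 1..11: at most 2047 refs */

/* Pack a 4096-aligned pointer and a ref count into one word. */
static uintptr_t nj_pack(void *sbi, uintptr_t ref)
{
	assert(((uintptr_t)sbi & NJ_MASK) == 0);
	assert(ref <= NJ_MAX_REF);
	return (uintptr_t)sbi | (ref << 1) | NJ_FLAG;
}

/* Decode the pointer, the ref count and the flag from a packed word. */
static void *nj_sb(uintptr_t h)
{
	return (void *)(h & ~NJ_MASK);
}

static uintptr_t nj_ref(uintptr_t h)
{
	return (h & NJ_MASK) >> 1;
}

static int nj_is_nojournal(uintptr_t h)
{
	return (h & NJ_FLAG) != 0;
}

/* Take one more reference; h == 0 means "no handle yet". */
static uintptr_t nj_get(uintptr_t h, void *sbi)
{
	if (h == 0)
		return nj_pack(sbi, 1);
	assert(nj_sb(h) == sbi);
	return nj_pack(sbi, nj_ref(h) + 1);
}

/* Drop one reference; returns 0 once the last one is gone. */
static uintptr_t nj_put(uintptr_t h)
{
	uintptr_t ref = nj_ref(h);

	assert(ref > 0);
	return ref == 1 ? 0 : nj_pack(nj_sb(h), ref - 1);
}
```

A caller would obtain the aligned object with something like
aligned_alloc(4096, 4096), call nj_get() on entry and nj_put() on exit,
and treat a result of 0 from nj_put() as the point where the freeze guard
can be released.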

>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3a6197a..4cba944 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -5040,6 +5040,12 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	max_blocks = ((EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) -
>  		      map.m_lblk);
>  	/*
> +	 * Protect us against freezing - AIO-DIO case. Caller didn't have to
> +	 * have any protection against it
> +	 */
> +	sb_start_intwrite(inode->i_sb);
> +
> +	/*
>  	 * This is somewhat ugly but the idea is clear: When transaction is
>  	 * reserved, everything goes into it. Otherwise we rather start several
>  	 * smaller transactions for conversion of each extent separately.
> @@ -5083,6 +5089,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	}
>  	if (!credits)
>  		ret2 = ext4_journal_stop(handle);
> +	sb_end_intwrite(inode->i_sb);
>  	return ret > 0 ? ret2 : ret;
>  }
>  
> -- 
> 1.7.1


