From: Dave Chinner <david@fromorbit.com>
To: Jens Axboe <axboe@fb.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>, Theodore Ts'o <tytso@mit.edu>,
"Elliott, Robert (Server Storage)" <elliott@hp.com>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
Date: Thu, 16 Apr 2015 08:36:20 +1000
Message-ID: <20150415223620.GU13731@dastard>
In-Reply-To: <1429135298-17153-2-git-send-email-axboe@fb.com>
On Wed, Apr 15, 2015 at 04:01:36PM -0600, Jens Axboe wrote:
> do_blockdev_direct_IO() increments and decrements the inode
> ->i_dio_count for each IO operation. It does this to protect against
> truncate of a file. Block devices don't need this sort of protection.
>
> For a capable multiqueue setup, this atomic int is the only shared
> state between applications accessing the device for O_DIRECT, and it
> presents a scaling wall for that. In my testing, as much as 30% of
> system time is spent incrementing and decrementing this value. A mixed
> read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
> better latencies too. Before:
.....
> diff --git a/fs/inode.c b/fs/inode.c
> index f00b16f45507..c4901c40ad65 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1946,18 +1946,31 @@ void inode_dio_wait(struct inode *inode)
> EXPORT_SYMBOL(inode_dio_wait);
>
> /*
> - * inode_dio_done - signal finish of a direct I/O requests
> + * inode_dio_begin - signal start of a direct I/O requests
> * @inode: inode the direct I/O happens on
> *
> * This is called once we've finished processing a direct I/O request,
> * and is used to wake up callers waiting for direct I/O to be quiesced.
> */
> -void inode_dio_done(struct inode *inode)
> +void inode_dio_inc(struct inode *inode)
The function name (inode_dio_inc) does not match the docbook comment
(inode_dio_begin)....
> +{
> + atomic_inc(&inode->i_dio_count);
> +}
> +EXPORT_SYMBOL(inode_dio_inc);
> +
> +/*
> + * inode_dio_dec - signal finish of a direct I/O requests
> + * @inode: inode the direct I/O happens on
> + *
> + * This is called once we've finished processing a direct I/O request,
> + * and is used to wake up callers waiting for direct I/O to be quiesced.
> + */
> +void inode_dio_dec(struct inode *inode)
> {
> if (atomic_dec_and_test(&inode->i_dio_count))
> wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
> }
> -EXPORT_SYMBOL(inode_dio_done);
> +EXPORT_SYMBOL(inode_dio_dec);
Bikeshedding: I think this would be better suited to inode_dio_begin()
and inode_dio_end() because now we are trying to say "this is where
the DIO starts, and this is where it ends". It's not really a
"reference counting" interface; we're trying to annotate the
boundaries of where DIO is protected against truncate....
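To illustrate the naming, here is a minimal userspace sketch of what a begin/end pair could look like. It is not the kernel implementation: struct sketch_inode and the dio_waiter_woken flag are hypothetical, C11 atomics stand in for atomic_t, and the flag stands in for the wake_up_bit() call in the quoted patch.

```c
/* Userspace sketch only: C11 atomics stand in for the kernel's atomic_t,
 * and the wake_up_bit() call is reduced to setting a flag. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_inode {
	atomic_int i_dio_count;		/* number of DIOs in flight */
	bool dio_waiter_woken;		/* stand-in for wake_up_bit() */
};

/* Mark the start of a DIO that must be protected against truncate. */
static void inode_dio_begin(struct sketch_inode *inode)
{
	atomic_fetch_add(&inode->i_dio_count, 1);
}

/* Mark the end of that DIO; the last one out wakes any waiter. */
static void inode_dio_end(struct sketch_inode *inode)
{
	int prev = atomic_fetch_sub(&inode->i_dio_count, 1);

	assert(prev > 0);	/* an end without a matching begin is a bug */
	if (prev == 1)
		inode->dio_waiter_woken = true;
}
```

The bodies are the same inc and dec-and-test as in the patch; only the names change, so that a caller reads as marking the protected region rather than taking a reference.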
And, realistically, if we are pushing this up into the filesystems
again, we should push it up into *all* filesystems and get rid of it
completely from the DIO layer. That way no new twisty passages in
the direct IO code are needed.
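As a rough userspace sketch of that layering, assuming the counting lives entirely in the filesystem: every name below (sketch_inode, sketch_dio_submit, sketch_fs_direct_io, sketch_blkdev_direct_io) is hypothetical and none of it is existing kernel code.

```c
/* Userspace sketch only; every name here is hypothetical. */
#include <assert.h>
#include <stdatomic.h>

struct sketch_inode {
	atomic_int i_dio_count;	/* DIOs in flight, guarded against truncate */
};

/* Generic DIO layer: does the IO and never modifies i_dio_count. */
static int sketch_dio_submit(struct sketch_inode *inode)
{
	/* Report how many DIOs the caller has marked as in flight. */
	return atomic_load(&inode->i_dio_count);
}

/* Filesystem path: the filesystem itself brackets the DIO. */
static int sketch_fs_direct_io(struct sketch_inode *inode)
{
	int ret;

	atomic_fetch_add(&inode->i_dio_count, 1);	/* DIO begins */
	ret = sketch_dio_submit(inode);
	assert(ret >= 1);	/* our own DIO is counted while in flight */
	atomic_fetch_sub(&inode->i_dio_count, 1);	/* DIO ends */
	return ret;
}

/* Block device path: no truncate to guard against, so no counting. */
static int sketch_blkdev_direct_io(struct sketch_inode *inode)
{
	return sketch_dio_submit(inode);
}
```

With the counting done by the caller, the generic layer needs no flag or special case to tell filesystems and block devices apart.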
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com