From: Jens Axboe <axboe@fb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>, "Theodore Ts'o" <tytso@mit.edu>,
"Elliott, Robert (Server Storage)" <elliott@hp.com>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems
Date: Wed, 15 Apr 2015 16:57:41 -0600
Message-ID: <552EECE5.30605@fb.com>
In-Reply-To: <20150415223620.GU13731@dastard>
On 04/15/2015 04:36 PM, Dave Chinner wrote:
> On Wed, Apr 15, 2015 at 04:01:36PM -0600, Jens Axboe wrote:
>> do_blockdev_direct_IO() increments and decrements the inode
>> ->i_dio_count for each IO operation. It does this to protect against
>> truncate of a file. Block devices don't need this sort of protection.
>>
>> For a capable multiqueue setup, this atomic int is the only shared
>> state between applications accessing the device for O_DIRECT, and it
>> presents a scaling wall for that. In my testing, as much as 30% of
>> system time is spent incrementing and decrementing this value. A mixed
>> read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
>> better latencies too. Before:
> .....
>> diff --git a/fs/inode.c b/fs/inode.c
>> index f00b16f45507..c4901c40ad65 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -1946,18 +1946,31 @@ void inode_dio_wait(struct inode *inode)
>> EXPORT_SYMBOL(inode_dio_wait);
>>
>> /*
>> - * inode_dio_done - signal finish of a direct I/O requests
>> + * inode_dio_begin - signal start of a direct I/O requests
>> * @inode: inode the direct I/O happens on
>> *
>> * This is called once we've finished processing a direct I/O request,
>> * and is used to wake up callers waiting for direct I/O to be quiesced.
>> */
>> -void inode_dio_done(struct inode *inode)
>> +void inode_dio_inc(struct inode *inode)
>
> function name does not match docbook comment....
Oops, will fix that up.
>> +{
>> + atomic_inc(&inode->i_dio_count);
>> +}
>> +EXPORT_SYMBOL(inode_dio_inc);
>> +
>> +/*
>> + * inode_dio_dec - signal finish of a direct I/O requests
>> + * @inode: inode the direct I/O happens on
>> + *
>> + * This is called once we've finished processing a direct I/O request,
>> + * and is used to wake up callers waiting for direct I/O to be quiesced.
>> + */
>> +void inode_dio_dec(struct inode *inode)
>> {
>> if (atomic_dec_and_test(&inode->i_dio_count))
>> wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
>> }
>> -EXPORT_SYMBOL(inode_dio_done);
>> +EXPORT_SYMBOL(inode_dio_dec);
>
> Bikeshedding: I think this would be better suited to inode_dio_begin()
> and inode_dio_end() because now we are trying to say "this is where
> the DIO starts, and this is where it ends". It's not really a
> "reference counting" interface, we're trying to annotate the
> boundaries of where DIO is protected against truncate....
I don't really care, if people like begin/end more than inc/dec, I'm
happy with that.
> And, realistically, if we are pushing this up into the filesystems
> again, we should push it up into *all* filesystems and get rid of it
> completely from the DIO layer. That way no new twisty passages in
> the direct IO code are needed.
Let's please keep that for a potential round 2. It's not like I'm piling
lots of hacks on, it's two one-liner changes. It's not adding a lot to
the entropy of direct-io.c. I've been carrying this patch for years now,
I really don't want to sign up for futzing around in direct-io.c, nor is
that a reasonable requirement imho.
--
Jens Axboe