From: Jens Axboe <axboe@fb.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-fsdevel@vger.kernel.org>, <hch@infradead.org>,
	<viro@zeniv.linux.org.uk>, <linux-kernel@vger.kernel.org>,
	<clm@fb.com>
Subject: Re: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems
Date: Wed, 15 Apr 2015 14:08:39 -0600	[thread overview]
Message-ID: <552EC547.8070606@fb.com> (raw)
In-Reply-To: <20150415124658.7ef9bde65965cc3031239522@linux-foundation.org>

On 04/15/2015 01:46 PM, Andrew Morton wrote:
> On Wed, 15 Apr 2015 13:26:51 -0600 Jens Axboe <axboe@fb.com> wrote:
>
>>> Is there similar impact to direct-io-to-file?  It would be nice to fix
>>> that up also.  Many filesystems do something along the lines of
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	atomic_dec(i_dio_count);
>>> 	__blockdev_direct_IO(...);
>>>
>>> and with your patch I think we could change them to
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	__blockdev_direct_IO(..., flags|DIO_IGNORE_TRUNCATE);
>>> 	atomic_dec(i_dio_count);
>>>
>>> which would halve the atomic op load.
>>
>> I haven't checked the pure file case, but for non-extending writes I
>> suspect we'd see similar benefits there. In any case, it'd make sense
>> to do: doing the atomic inc/dec twice is just a bad idea in general if
>> we can get rid of it.
>>
>> A quick grep doesn't show this use case, or I'm just blind. Where do you
>> see that?
>
> btrfs_direct_IO() holds i_dio_count across its call to
> __blockdev_direct_IO() for writes.  That makes the i_dio_count
> manipulation in do_blockdev_direct_IO() unneeded?  ext4 is similar.

You are right, I even modified those... Yes, that looks like it's
unnecessary. Chris?
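
To make it concrete, the write path in those filesystems ends up looking
roughly like the sketch below, where the second inc/dec is the one buried
inside do_blockdev_direct_IO(). This is only an illustration; the flag
name follows your earlier suggestion, not necessarily what the final
patch will call it:

	/* today: i_dio_count is taken twice per direct write */
	atomic_inc(&inode->i_dio_count);	/* filesystem's own reference */
	/* ... filesystem-specific setup ... */
	ret = __blockdev_direct_IO(...);	/* takes and drops it again */
	inode_dio_done(inode);			/* drops the fs reference */

	/* with the flag: the filesystem's reference is the only one */
	atomic_inc(&inode->i_dio_count);
	/* ... filesystem-specific setup ... */
	ret = __blockdev_direct_IO(..., flags | DIO_IGNORE_TRUNCATE);
	inode_dio_done(inode);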

> Reducing from 4 ops to 2 probably won't make as much difference as
> reducing from 2 to 0 - most of the cost comes from initially grabbing that
> cacheline from a different CPU.

It won't be a full 50% reduction, but it really depends on timing. With
enough threads banging on it, it could get close to 50%.


>>> But that's piling hack on top of hack.  Can we change the
>>
>> I'd view it more as reducing the hack; the real hack is the way we
>> manually do atomic_inc() on i_dio_count and then call a magic
>> inode_dio_done() that decrements it again. It's not very pretty; I'm
>> just reducing the scope of the hack :-)
>
> A magic flag which says "you don't need to do this in that case because
> I know this inode is special".  direct-io already has too much of this :(

Well, outside of rewriting the dio code, that's what we get. The flags 
already exist, and I don't disagree with you that the situation is 
generally a mess.
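
Inside do_blockdev_direct_IO() itself the change stays small; roughly
something like the sketch below, where the generic code only touches
i_dio_count when the caller hasn't said it already holds a reference.
Again just a sketch, with the flag name taken from your earlier mail:

	if (!(dio->flags & DIO_IGNORE_TRUNCATE))
		atomic_inc(&inode->i_dio_count);

	/* ... build and submit the bios, wait for completion ... */

	if (!(dio->flags & DIO_IGNORE_TRUNCATE))
		inode_dio_done(inode);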

>>> do_blockdev_direct_IO() interface to "caller shall hold i_mutex, or
>>> increment i_dio_count"?  ie: exclusion against truncate is wholly the
>>> caller's responsibility.  That way, this awkward sharing of
>>> responsibility between caller and callee gets cleaned up and
>>> DIO_IGNORE_TRUNCATE goes away.
>>
>> That would clean it up, at the expense of touching more churn. I'd be
>> fine with doing it that way.
>
> OK, could be done later I suppose.

It could easily be layered on top; it doesn't have to be part of the
initial patch. Once the flag exists, callers can handle the exclusion
themselves and set the flag.
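
If we do go that way later, the contract could simply be spelled out at
the entry point; roughly something like this (wording is only a sketch):

	/*
	 * Callers of do_blockdev_direct_IO() must provide their own
	 * exclusion against truncate: either hold i_mutex across the
	 * call, or hold a reference on inode->i_dio_count for its
	 * duration.  The dio code itself then never touches i_dio_count.
	 */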

>>> inode_dio_begin() would be a good place to assert that i_mutex is held,
>>> btw.
>>
>> How would you check it? If the i_dio_count is bumped, then you'd not
>> need to hold i_mutex.
>
> 	if (atomic_add_return() == 1)
> 		assert()

That's not a 100% catch, but probably good enough.
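
For what it's worth, the check would look roughly like the sketch below
in the new helper. It only fires when the count goes from 0 to 1, and
mutex_is_locked() doesn't check the owner either, so it's a heuristic at
best:

	static inline void inode_dio_begin(struct inode *inode)
	{
		if (atomic_add_return(1, &inode->i_dio_count) == 1)
			WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex));
	}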

> I guess.  It was just a thought.  Having wandered around the code, I'm
> not 100% confident that everyone is holding i_mutex - it's not all
> obviously correct.

Personally, I'd just prefer not to dive deeper into this for the
purposes of this patch.

> otoh, the caller doesn't *have* to choose i_mutex for the external
> exclusion, and perhaps some callers have used something else.
>


-- 
Jens Axboe

