All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@fb.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-fsdevel@vger.kernel.org>, <hch@infradead.org>,
	<viro@zeniv.linux.org.uk>, <linux-kernel@vger.kernel.org>,
	<clm@fb.com>
Subject: Re: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems
Date: Wed, 15 Apr 2015 14:08:39 -0600	[thread overview]
Message-ID: <552EC547.8070606@fb.com> (raw)
In-Reply-To: <20150415124658.7ef9bde65965cc3031239522@linux-foundation.org>

On 04/15/2015 01:46 PM, Andrew Morton wrote:
> On Wed, 15 Apr 2015 13:26:51 -0600 Jens Axboe <axboe@fb.com> wrote:
>
>>> Is there similar impact to direct-io-to-file?  It would be nice to fix
>>> that up also.  Many filesystems do something along the lines of
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	atomic_dev(i_dio_count);
>>> 	__blockdev_direct_IO(...);
>>>
>>> and with your patch I think we could change them to
>>>
>>> 	atomic_inc(i_dio_count);
>>> 	wibble()
>>> 	__blockdev_direct_IO(..., flags|DIO_IGNORE_TRUNCATE);
>>> 	atomic_dev(i_dio_count);
>>>
>>> which would halve the atomic op load.
>>
>> I haven't checked pure file, but without extending, I suspect that we
>> should see similar benefits there. In any case, it'd make sense doing,
>> having twice the atomic inc/dec is just a bad idea in general, if we can
>> get rid of it.
>>
>> A quick grep doesn't show this use case, or I'm just blind. Where do you
>> see that?
>
> btrfs_direct_IO() holds i_dio_count across its call to
> __blockdev_direct_IO() for writes.  That makes the dio_bio_count
> manipulation in do_blockdev_direct_IO() unneeded?  ext4 is similar.

You are right, I even modified those.. Yes, that looks like it's 
unnecessary. Chris?

> Reducing from 4 ops to 2 probably won't make as much difference as
> reducing from 2 to 0 - most of the cost comes from initially grabbing that
> cacheline from a different CPU.

It wont be a 50% reduction, but it all really depends a lot on timing. 
If you have enough banging on it, it could be close to a 50% reduction.


>>> But that's piling hack on top of hack.  Can we change the
>>
>> I'd more view it as reducing the hack, the real hack is the way that we
>> manually do atomic_inc() on i_dio_count, and then call a magic
>> inode_dio_done() that decrements it again. It's not very pretty, I'm
>> just reducing the scope of the hack :-)
>
> A magic flag which says "you don't need to do this in that case because
> I know this inode is special".  direct-io already has too much of this :(

Well, outside of rewriting the dio code, that's what we get. The flags 
already exist, and I don't disagree with you that the situation is 
generally a mess.

>>> do_blockdev_direct_IO() interface to "caller shall hold i_mutex, or
>>> increment i_dio_count"?  ie: exclusion against truncate is wholly the
>>> caller's responsibility.  That way, this awkward sharing of
>>> responsibility between caller and callee gets cleaned up and
>>> DIO_IGNORE_TRUNCATE goes away.
>>
>> That would clean it up, at the expense of touching more churn. I'd be
>> fine with doing it that way.
>
> OK, could be done later I suppose.

It could easily be layered on top, doesn't have to be part of the 
initial patch. Once the flag is there, callers can do what they need and 
add the flag.

>>> inode_dio_begin() would be a good place to assert that i_mutex is held,
>>> btw.
>>
>> How would you check it? If the i_dio_count is bumped, then you'd not
>> need to hold i_mutex.
>
> 	if (atomic_add_return() == 1)
> 		assert()

That's not a 100% catch, but probably good enough.

> I guess.  It was just a thought.  Having wandered around the code, I'm
> not 100% confident that everyone is holding i_mutex - it's not all
> obviously correct.

Personally I'd just prefer not having to dive deeper into this for the 
benefit of this patch.

> otoh, the caller doesn't *have* to choose i_mutex for the external
> exclusion, and perhaps some callers have used something else.
>


-- 
Jens Axboe


  reply	other threads:[~2015-04-15 20:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-15 18:22 [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems Jens Axboe
2015-04-15 18:56 ` Andrew Morton
2015-04-15 19:26   ` Jens Axboe
2015-04-15 19:46     ` Andrew Morton
2015-04-15 20:08       ` Jens Axboe [this message]
2015-04-15 22:25   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=552EC547.8070606@fb.com \
    --to=axboe@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=clm@fb.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.