From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jens Axboe Subject: Re: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems Date: Wed, 15 Apr 2015 14:08:39 -0600 Message-ID: <552EC547.8070606@fb.com> References: <20150415182256.GA30339@kernel.dk> <20150415115653.f69d970e4b155deea98a0829@linux-foundation.org> <552EBB7B.4030107@fb.com> <20150415124658.7ef9bde65965cc3031239522@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Cc: , , , , To: Andrew Morton Return-path: Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:42263 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752019AbbDOUJF (ORCPT ); Wed, 15 Apr 2015 16:09:05 -0400 In-Reply-To: <20150415124658.7ef9bde65965cc3031239522@linux-foundation.org> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 04/15/2015 01:46 PM, Andrew Morton wrote: > On Wed, 15 Apr 2015 13:26:51 -0600 Jens Axboe wrote: > >>> Is there similar impact to direct-io-to-file? It would be nice to fix >>> that up also. Many filesystems do something along the lines of >>> >>> atomic_inc(i_dio_count); >>> wibble() >>> atomic_dev(i_dio_count); >>> __blockdev_direct_IO(...); >>> >>> and with your patch I think we could change them to >>> >>> atomic_inc(i_dio_count); >>> wibble() >>> __blockdev_direct_IO(..., flags|DIO_IGNORE_TRUNCATE); >>> atomic_dev(i_dio_count); >>> >>> which would halve the atomic op load. >> >> I haven't checked pure file, but without extending, I suspect that we >> should see similar benefits there. In any case, it'd make sense doing, >> having twice the atomic inc/dec is just a bad idea in general, if we can >> get rid of it. >> >> A quick grep doesn't show this use case, or I'm just blind. Where do you >> see that? > > btrfs_direct_IO() holds i_dio_count across its call to > __blockdev_direct_IO() for writes. That makes the dio_bio_count > manipulation in do_blockdev_direct_IO() unneeded? ext4 is similar. You are right, I even modified those.. Yes, that looks like it's unnecessary. Chris? > Reducing from 4 ops to 2 probably won't make as much difference as > reducing from 2 to 0 - most of the cost comes from initially grabbing that > cacheline from a different CPU. It wont be a 50% reduction, but it all really depends a lot on timing. If you have enough banging on it, it could be close to a 50% reduction. >>> But that's piling hack on top of hack. Can we change the >> >> I'd more view it as reducing the hack, the real hack is the way that we >> manually do atomic_inc() on i_dio_count, and then call a magic >> inode_dio_done() that decrements it again. It's not very pretty, I'm >> just reducing the scope of the hack :-) > > A magic flag which says "you don't need to do this in that case because > I know this inode is special". direct-io already has too much of this :( Well, outside of rewriting the dio code, that's what we get. The flags already exist, and I don't disagree with you that the situation is generally a mess. >>> do_blockdev_direct_IO() interface to "caller shall hold i_mutex, or >>> increment i_dio_count"? ie: exclusion against truncate is wholly the >>> caller's responsibility. That way, this awkward sharing of >>> responsibility between caller and callee gets cleaned up and >>> DIO_IGNORE_TRUNCATE goes away. >> >> That would clean it up, at the expense of touching more churn. I'd be >> fine with doing it that way. > > OK, could be done later I suppose. It could easily be layered on top, doesn't have to be part of the initial patch. Once the flag is there, callers can do what they need and add the flag. >>> inode_dio_begin() would be a good place to assert that i_mutex is held, >>> btw. >> >> How would you check it? If the i_dio_count is bumped, then you'd not >> need to hold i_mutex. > > if (atomic_add_return() == 1) > assert() That's not a 100% catch, but probably good enough. > I guess. It was just a thought. Having wandered around the code, I'm > not 100% confident that everyone is holding i_mutex - it's not all > obviously correct. Personally I'd just prefer not having to dive deeper into this for the benefit of this patch. > otoh, the caller doesn't *have* to choose i_mutex for the external > exclusion, and perhaps some callers have used something else. > -- Jens Axboe