From: Jens Axboe <axboe@kernel.dk>
To: "Elliott, Robert (Server Storage)" <Elliott@hp.com>,
Christoph Hellwig <hch@lst.de>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
Mike Snitzer <snitzer@redhat.com>
Cc: "linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
"boaz@plexistor.com" <boaz@plexistor.com>,
"Kani, Toshimitsu" <toshi.kani@hp.com>,
"Knippers, Linda" <linda.knippers@hp.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: pmem and i_dio_count overhead
Date: Wed, 15 Apr 2015 12:27:33 -0600
Message-ID: <552EAD95.2080707@kernel.dk>
In-Reply-To: <94D0CD8314A33A4D9D801C0FE68B40295A858600@G9W0745.americas.hpqcorp.net>
On 04/03/2015 03:35 PM, Elliott, Robert (Server Storage) wrote:
> Jens, one of your patches from October 2013 never made it
> to the kernel, but would be beneficial for pmem. It helps
> IOPS about 15%.
>
> Original patch: https://lkml.org/lkml/2013/10/24/130
>
>> From Jens Axboe
>> Subject [PATCH 05/11] direct-io: only inc/dec inode->i_dio_count for file systems
>> Date Thu, 24 Oct 2013 10:25:58 +0100
>>
>> We don't need truncate protection for block devices, so add a flag
>> bypassing this cache line dirtying twice for every IO. This easily
>> contributes to 5-10% of the CPU time on high IOPS O_DIRECT testing.
>
> Here are perf top results while running fio to pmem devices
> using memcpy with non-temporal load and store instructions:
>
>  20.54%  [pmem]    [k] pmem_do_bvec.isra.6   <- the memcpy function
>  10.13%  [kernel]  [k] do_blockdev_direct_IO
>   5.93%  [kernel]  [k] inode_dio_done
>   4.46%  [kernel]  [k] bio_endio
>   3.07%  fio       [.] get_io_u
>   2.08%  fio       [.] do_io
>
> Inside do_blockdev_direct_IO (10%), 60% of the time is spent
> atomically incrementing i_dio_count:
>
> │ static inline void atomic_inc(atomic_t *v)
> │ {
> │ asm volatile(LOCK_PREFIX "incl %0"
> 0.06 │ 225: lock incl 0x134(%r14)
> │ atomic_inc(&inode->i_dio_count);
> │
> │ retval = 0;
> │ sdio.blkbits = blkbits;
> │ sdio.blkfactor = i_blkbits - blkbits;
> │ sdio.block_in_file = offset >> blkbits;
> 60.31 │ mov -0x1d0(%rbp),%rdx
> 0.16 │ mov %r12d,%ecx
> │ */
> │ atomic_inc(&inode->i_dio_count);
> │
> │ retval = 0;
> │ sdio.blkbits = blkbits;
> │ sdio.blkfactor = i_blkbits - blkbits;
> 0.00 │ sub %r12d,%ebx
> │ * Will be decremented at I/O completion time.
> │ */
> │ atomic_inc(&inode->i_dio_count);
>
> inode_dio_done is taking all of its 5.9% time doing the
> corresponding atomic_dec.
>
> So, they're combining for 11.8% of the overall CPU time.
> The problem is more atomic contention than cache line dirtying.
>
> Applying your patch (changing the bitmask from 0x04 to
> 0x08, since 0x04 is taken now) eliminates those
> instructions from perf top and improves the high IOPS
> results by 5 to 15%.
>
> Attr  Copy              Read IOPS  Write IOPS
> ====  ===============   =========  ==========
> UC    NT rd,wr              513 K       326 K
>       with the patch:       510 K       325 K
>
> WB    NT rd,wr              3.3 M       3.5 M
>       with the patch:       3.8 M       3.9 M
>
> WC    NT rd,wr              3.0 M       3.9 M
>       with the patch:       3.1 M       4.1 M
>
> WT    NT rd,wr              3.3 M       2.1 M
>       with the patch:       3.7 M       3.7 M
>
> (there is some other test environment inconsistency
> with WT writes - I don't think this change really
> helped by 76%)
Just re-posted a cleaned-up variant, but forgot to CC you... You've got
it in private email as well.
Yes, let's finally get this in! Andrew, we ended up bikeshedding on this
patch a lot last time, which is ultimately why it got dropped on the
floor. I CC'ed you on the new submission as well.
--
Jens Axboe
Thread overview:
2015-04-03 21:35 pmem and i_dio_count overhead Elliott, Robert (Server Storage)
2015-04-15 18:27 ` Jens Axboe [this message]