From: Jens Axboe <axboe@kernel.dk>
To: "Elliott, Robert (Server Storage)" <Elliott@hp.com>,
	Christoph Hellwig <hch@lst.de>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	Mike Snitzer <snitzer@redhat.com>
Cc: "linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"boaz@plexistor.com" <boaz@plexistor.com>,
	"Kani, Toshimitsu" <toshi.kani@hp.com>,
	"Knippers, Linda" <linda.knippers@hp.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: pmem and i_dio_count overhead
Date: Wed, 15 Apr 2015 12:27:33 -0600	[thread overview]
Message-ID: <552EAD95.2080707@kernel.dk> (raw)
In-Reply-To: <94D0CD8314A33A4D9D801C0FE68B40295A858600@G9W0745.americas.hpqcorp.net>

On 04/03/2015 03:35 PM, Elliott, Robert (Server Storage) wrote:
> Jens, one of your patches from October 2013 never made it
> to the kernel, but would be beneficial for pmem.  It improves
> IOPS by about 15%.
>
> Original patch: https://lkml.org/lkml/2013/10/24/130
>
>>  From Jens Axboe
>> Subject [PATCH 05/11] direct-io: only inc/dec inode->i_dio_count for file systems
>> Date Thu, 24 Oct 2013 10:25:58 +0100
>>
>> We don't need truncate protection for block devices, so add a flag
>> bypassing this cache line dirtying twice for every IO. This easily
>> contributes to 5-10% of the CPU time on high IOPS O_DIRECT testing.
>
> Here are perf top results while running fio to pmem devices
> using memcpy with non-temporal load and store instructions:
>
>   20.54%  [pmem]                   [k] pmem_do_bvec.isra.6   <the memcpy function>
>   10.13%  [kernel]                 [k] do_blockdev_direct_IO
>    5.93%  [kernel]                 [k] inode_dio_done
>    4.46%  [kernel]                 [k] bio_endio
>    3.07%  fio                      [.] get_io_u
>    2.08%  fio                      [.] do_io
>
> Inside do_blockdev_direct_IO (10%), 60% of the time is spent
> atomically incrementing i_dio_count:
>
>         │      static inline void atomic_inc(atomic_t *v)
>         │      {
>         │              asm volatile(LOCK_PREFIX "incl %0"
>    0.06 │ 225:   lock   incl   0x134(%r14)
>         │              atomic_inc(&inode->i_dio_count);
>         │
>         │              retval = 0;
>         │              sdio.blkbits = blkbits;
>         │              sdio.blkfactor = i_blkbits - blkbits;
>         │              sdio.block_in_file = offset >> blkbits;
>   60.31 │        mov    -0x1d0(%rbp),%rdx
>    0.16 │        mov    %r12d,%ecx
>         │               */
>         │              atomic_inc(&inode->i_dio_count);
>         │
>         │              retval = 0;
>         │              sdio.blkbits = blkbits;
>         │              sdio.blkfactor = i_blkbits - blkbits;
>    0.00 │        sub    %r12d,%ebx
>         │               * Will be decremented at I/O completion time.
>         │               */
>         │              atomic_inc(&inode->i_dio_count);
>
> inode_dio_done spends all of its 5.8% on the
> corresponding atomic_dec.
>
> Together they account for 11.8% of the overall CPU time.
> The problem is more atomic contention than cache line dirtying.
>
> Applying your patch (changing the bitmask from 0x04 to
> 0x08, since 0x04 is taken now) eliminates those
> instructions from perf top and improves the high IOPS
> results by 5 to 15%.
>
> Attr  Copy             Read IOPS   Write IOPS
> ====  ====             =========   ==========
> UC    NT rd,wr         513 K       326 K
>       with the patch:  510 K       325 K
>
> WB    NT rd,wr         3.3 M       3.5 M
>       with the patch:  3.8 M       3.9 M
>
> WC    NT rd,wr         3.0 M       3.9 M
>       with the patch:  3.1 M       4.1 M
>
> WT    NT rd,wr         3.3 M       2.1 M
>       with the patch:  3.7 M       3.7 M
>
> (There is some other test environment inconsistency
> with the WT writes; I don't think this change really
> improved them by 76%.)

I just re-posted a cleaned-up variant, but forgot to CC you... You've
got it in private email as well.

Yes, let's finally get this in! Andrew, we ended up bikeshedding on this
patch a lot last time, which is ultimately why it got dropped on the
floor. I CC'ed you on the new submission as well.

-- 
Jens Axboe

