linux-nvme.lists.infradead.org archive mirror
From: willy@linux.intel.com (Matthew Wilcox)
Subject: [PATCH] NVMe: Avoid calculating CQ head doorbell in nvme_process_cq()
Date: Wed, 4 Sep 2013 12:51:42 -0400
Message-ID: <20130904165142.GA20931@linux.intel.com> (raw)
In-Reply-To: <5226FA47.10205@huawei.com>

On Wed, Sep 04, 2013 at 05:15:51PM +0800, Haiyan Hu wrote:
> On 2013/9/3 23:53, Matthew Wilcox wrote:
> > Have you been able to measure a difference?  If so, can you share
> > relative numbers?
> 
> Sorry for my inaccurate description.
> Our test data shows that this patch does not change cmd process latency.
> But I think it can improve code efficiency, maybe a little.

Right, so it's a trade-off.  You want to add 8 bytes to the queue data
structure in order to save a few instructions from being executed at
queue processing time.  We need to quantify the savings (since we know
the costs).  I approximated the code by replacing:

-       writel(head, nvmeq->q_db + (1 << nvmeq->dev->db_stride));
+       writel(head, nvmeq->q_db);

Here's the code before:

     2ca:       48 8b 43 08             mov    0x8(%rbx),%rax
     2ce:       48 8b 93 88 00 00 00    mov    0x88(%rbx),%rdx
     2d5:       8b 48 40                mov    0x40(%rax),%ecx
     2d8:       b8 01 00 00 00          mov    $0x1,%eax
     2dd:       d3 e0                   shl    %cl,%eax
     2df:       48 98                   cltq   
     2e1:       48 8d 14 82             lea    (%rdx,%rax,4),%rdx
     2e5:       41 0f b7 c4             movzwl %r12w,%eax
     2e9:       89 02                   mov    %eax,(%rdx)

Here's the code after:

     2ca:       48 8b 93 88 00 00 00    mov    0x88(%rbx),%rdx
     2d1:       41 0f b7 c4             movzwl %r12w,%eax
     2d5:       89 02                   mov    %eax,(%rdx)

That's 6 instructions saved, and to be fair they have some pretty tight
dependencies.  Still, on a 3GHz processor, that's maybe 2 nanoseconds.
If we're doing a million IOPS, we could save 2ms/s, or 0.2%.  Pretty tough
to measure, let alone justify.

Here's an alternative that doesn't require adding 8 bytes to the
nvme_queue.  Instead of storing the shift in db_stride, we can store the
actual stride.  Then there is less calculation to be done at
completion time, and the code looks like this:

-       writel(head, nvmeq->q_db + (1 << nvmeq->dev->db_stride));
+       writel(head, nvmeq->q_db + nvmeq->dev->db_stride);

     2ca:       48 8b 43 08             mov    0x8(%rbx),%rax
     2ce:       48 63 50 40             movslq 0x40(%rax),%rdx
     2d2:       48 8b 83 88 00 00 00    mov    0x88(%rbx),%rax
     2d9:       48 8d 14 90             lea    (%rax,%rdx,4),%rdx
     2dd:       41 0f b7 c4             movzwl %r12w,%eax
     2e1:       89 02                   mov    %eax,(%rdx)

That saves half the instructions, for no increased data structure usage.

One other microoptimisation we can do is change the type of db_stride from
int to unsigned.  Then the compiler produces:

     2ca:       48 8b 43 08             mov    0x8(%rbx),%rax
     2ce:       8b 50 40                mov    0x40(%rax),%edx
     2d1:       48 8b 83 88 00 00 00    mov    0x88(%rbx),%rax
     2d8:       48 8d 14 90             lea    (%rax,%rdx,4),%rdx
     2dc:       41 0f b7 c4             movzwl %r12w,%eax
     2e0:       89 02                   mov    %eax,(%rdx)

which uses one fewer byte (at address 2ce, it uses mov instead of movslq).


Would you like to send a patch which changes the type of db_stride to
unsigned and changes the value stored there to be 1 << the current value?

Or you can try to persuade me again that the tradeoff is worth adding
the extra 8 bytes to the data structure :-)


Thread overview: 7+ messages
     [not found] <mailman.3257.1377730793.1059.linux-nvme@lists.infradead.org>
2013-09-02  6:49 ` [PATCH] NVMe: Avoid calculating CQ head doorbell in nvme_process_cq() Huhaiyan (haiyan)
2013-09-03 15:53   ` Matthew Wilcox
2013-09-04  9:15     ` Haiyan Hu
2013-09-04 16:51       ` Matthew Wilcox [this message]
2013-09-05  8:55         ` Haiyan Hu
2013-09-05 19:28           ` Matthew Wilcox
     [not found] <A103C806EC8D6E46A0D950DAA4D118D1400E0067@szxeml558-mbx.china.huawei.com>
2013-09-02  7:00 ` Qiang Huang
