[PATCH] NVMe: Avoid caculate cq head doorbel in nvme_process_cq()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: willy@linux.intel.com (Matthew Wilcox)
Subject: [PATCH] NVMe: Avoid caculate cq head doorbel in nvme_process_cq()
Date: Wed, 4 Sep 2013 12:51:42 -0400	[thread overview]
Message-ID: <20130904165142.GA20931@linux.intel.com> (raw)
In-Reply-To: <5226FA47.10205@huawei.com>

On Wed, Sep 04, 2013@05:15:51PM +0800, Haiyan Hu wrote:
> On 2013/9/3 23:53, Matthew Wilcox wrote:
> > Have you been able to measure a difference?  If so, can you share
> > relative numbers?
> 
> Sorry for my inaccurate description.
> Our test data shows that this patch does not change cmd process latency.
> But I think it can improve code efficiency, maybe a little.

Right, so it's a trade-off.  You want to add 8 bytes to the queue data
structure in order to save a few instructions from being executed at
queue processing time.  We need to quantify the savings (since we know
the costs).  I approximated the code by replacing:

-       writel(head, nvmeq->q_db + (1 << nvmeq->dev->db_stride));
+       writel(head, nvmeq->q_db);

Here's the code before:

     2ca:       48 8b 43 08             mov    0x8(%rbx),%rax
     2ce:       48 8b 93 88 00 00 00    mov    0x88(%rbx),%rdx
     2d5:       8b 48 40                mov    0x40(%rax),%ecx
     2d8:       b8 01 00 00 00          mov    $0x1,%eax
     2dd:       d3 e0                   shl    %cl,%eax
     2df:       48 98                   cltq   
     2e1:       48 8d 14 82             lea    (%rdx,%rax,4),%rdx
     2e5:       41 0f b7 c4             movzwl %r12w,%eax
     2e9:       89 02                   mov    %eax,(%rdx)

Here's the code after:

     2ca:       48 8b 93 88 00 00 00    mov    0x88(%rbx),%rdx
     2d1:       41 0f b7 c4             movzwl %r12w,%eax
     2d5:       89 02                   mov    %eax,(%rdx)

That's 6 instructions sved, and to be fair they have some pretty tight
dependencies.  Still, on a 3GHz processor, that's maybe 2 nanoseconds.
If we're doing a million IOPS, we could save 2ms/s, or 0.2%.  Pretty tough
to measure, let alone justify.

Here's an alternative that doesn't require adding 8 bytes to the
nvme_queue.  Instead of storing the shift in db_stride, we can store the
actual stride.  Then there is less calculation to be done at
completion time, and the code looks like this:

-       writel(head, nvmeq->q_db + (1 << nvmeq->dev->db_stride));
+       writel(head, nvmeq->q_db + nvmeq->dev->db_stride);

     2ca:       48 8b 43 08             mov    0x8(%rbx),%rax
     2ce:       48 63 50 40             movslq 0x40(%rax),%rdx
     2d2:       48 8b 83 88 00 00 00    mov    0x88(%rbx),%rax
     2d9:       48 8d 14 90             lea    (%rax,%rdx,4),%rdx
     2dd:       41 0f b7 c4             movzwl %r12w,%eax
     2e1:       89 02                   mov    %eax,(%rdx)

That saves half the instructions, for no increased data structure usage.

One other microoptimisation we can do is change the type of db_stride from
int to unsigned.  Then the compiler produces:

     2ca:       48 8b 43 08             mov    0x8(%rbx),%rax
     2ce:       8b 50 40                mov    0x40(%rax),%edx
     2d1:       48 8b 83 88 00 00 00    mov    0x88(%rbx),%rax
     2d8:       48 8d 14 90             lea    (%rax,%rdx,4),%rdx
     2dc:       41 0f b7 c4             movzwl %r12w,%eax
     2e0:       89 02                   mov    %eax,(%rdx)

which uses one fewer byte (at address 2ce, it uses mov instead of movslq).

Would you like to send a patch which changes the type of db_stride to
unsigned and changes the value stored there to be 1 << the current value?

Or you can try to persuade me again that the tradeoff is worth adding
the extra 8 bytes to the data structure :-)

next prev parent reply	other threads:[~2013-09-04 16:51 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <mailman.3257.1377730793.1059.linux-nvme@lists.infradead.org>
2013-09-02  6:49 ` [PATCH] NVMe: Avoid caculate cq head doorbel in nvme_process_cq() Huhaiyan (haiyan)
2013-09-03 15:53   ` Matthew Wilcox
2013-09-04  9:15     ` Haiyan Hu
2013-09-04 16:51       ` Matthew Wilcox [this message]
2013-09-05  8:55         ` Haiyan Hu
2013-09-05 19:28           ` Matthew Wilcox
     [not found] <A103C806EC8D6E46A0D950DAA4D118D1400E0067@szxeml558-mbx.china.huawei.com>
2013-09-02  7:00 ` Qiang Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130904165142.GA20931@linux.intel.com \
    --to=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.