All of lore.kernel.org
 help / color / mirror / Atom feed
From: willy@linux.intel.com (Matthew Wilcox)
Subject: [PATCH] NVMe: Remove superfluous cqe_seen
Date: Thu, 19 Jun 2014 12:59:57 -0400	[thread overview]
Message-ID: <20140619165957.GK12025@linux.intel.com> (raw)
In-Reply-To: <80B89753B40C5141A3E2D53FE7A2A8A9943A7ADC@NTXBOIMBX02.micron.com>

On Thu, May 22, 2014@12:10:19AM +0000, Sam Bradshaw (sbradshaw) wrote:
> Performance problem, though not very easily measured.  At very high iops
> rates, most if not all cqe's are processed via nvme_process_cq() in 
> make_request(), leaving nvme_irq() with no work to do.  Nevertheless, it
> always writes cqe_seen, which invalidates a very hot cacheline.  This
> is somewhat exacerbated when IO submissions originate on a remote node
> relative to the cpu handling the irq.

I was thinking "Hey, we should move cqe_seen to a different cacheline".
So I looked at the cacheline assignments for the different variables,
and cqe_seen is on the same cacheline as cq_head and cq_phase, so that
cacheline is already being dirtied.  Indeed, it's in the same Dword as
cq_phase, so I'd be amazed if the CPU didn't coalesce the two writes.
That might be a more fruitful patch ... rearrange nvme_queue to put
cq_head, cq_phase and cqe_seen in the same Dword, and expect the CPU to
optimise the three assignments into a single Dword store.

I'll let you try it out since you have the setup to benchmark it.  Right now,
this is the layout I see:

        /* --- cacheline 3 boundary (192 bytes) --- */
        u32 *                      q_db;                 /*   192     8 */
        u16                        q_depth;              /*   200     2 */
        u16                        cq_vector;            /*   202     2 */
        u16                        sq_head;              /*   204     2 */
        u16                        sq_tail;              /*   206     2 */
        u16                        cq_head;              /*   208     2 */
        u16                        qid;                  /*   210     2 */
        u8                         cq_phase;             /*   212     1 */
        u8                         cqe_seen;             /*   213     1 */
        u8                         q_suspended;          /*   214     1 */

I notice a 4-byte hole after q_lock, so moving cq_head, cq_phase and
cqe_seen into that space would probably be a good idea (since that
cacheline is definitely dirty).  I really haven't tried to optimise the
frequently-updated parts of the data structure into the same cacheline,
and it should really help your bizarre setup :-).

      parent reply	other threads:[~2014-06-19 16:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-21 18:14 [PATCH] NVMe: Remove superfluous cqe_seen Sam Bradshaw
2014-05-21 23:17 ` Matthew Wilcox
2014-05-22  0:10   ` Sam Bradshaw (sbradshaw)
2014-05-23 14:55     ` Matthew Wilcox
2014-06-19 16:59     ` Matthew Wilcox [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140619165957.GK12025@linux.intel.com \
    --to=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.