Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

From: Johannes Thumshirn <jthumshirn@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Jens Axboe <axboe@kernel.dk>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	linux-block@vger.kernel.org, Linux-scsi@vger.kernel.org,
	linux-nvme@lists.infradead.org,
	Christoph Hellwig <hch@infradead.org>,
	Keith Busch <keith.busch@intel.com>
Subject: Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers
Date: Fri, 20 Jan 2017 13:22:54 +0100
Message-ID: <20170120122254.GA5947@linux-x5ow.site>
In-Reply-To: <f10b8508-22a2-2959-c369-2dd29f271d2c@grimberg.me>

On Tue, Jan 17, 2017 at 05:45:53PM +0200, Sagi Grimberg wrote:
> 
> >--
> >[1]
> >queue = b'nvme0q1'
> >     usecs               : count     distribution
> >         0 -> 1          : 7310 |****************************************|
> >         2 -> 3          : 11       |      |
> >         4 -> 7          : 10       |      |
> >         8 -> 15         : 20       |      |
> >        16 -> 31         : 0        |      |
> >        32 -> 63         : 0        |      |
> >        64 -> 127        : 1        |      |
> >
> >[2]
> >queue = b'nvme0q1'
> >     usecs               : count     distribution
> >         0 -> 1          : 7309 |****************************************|
> >         2 -> 3          : 14       |      |
> >         4 -> 7          : 7        |      |
> >         8 -> 15         : 17       |      |
> >
> 
> Rrr, email made the histograms look funky (tabs vs. spaces...)
> The count is what's important anyways...
> 
> Just adding that I used an Intel P3500 nvme device.
> 
> >We can see that most of the time our latency is pretty good (<1ns) but with
> >huge tail latencies (some 8-15 ns and even one in 32-63 ns).
> 
> Obviously is micro-seconds and not nano-seconds (I wish...)

So to share yesterday's (and today's) findings:

On AHCI I see only one completion polled as well.

This is probably because, in contrast to networking (with NAPI), the block
layer does have a link between submission and completion, whereas in networking
RX and TX are decoupled. So if we send out one request, we get the
completion for it.

What we'd need is a link to know "we've sent 10 requests out, now poll for the
10 completions after the 1st IRQ". So basically what NVMe already did by
calling __nvme_process_cq() after submission. Maybe we should even disable
IRQs when submitting and re-enable them afterwards, so the submission path
doesn't get preempted by a completion.
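
To make the "sent 10, poll for 10 after the 1st IRQ" idea a bit more concrete,
here is a rough userspace toy model. It is not kernel code: ToyQueue, the
budget parameter and the assumption that every poll iteration finds exactly one
new completion are all made up for illustration. It only counts how many
interrupts each strategy would take.

```python
from collections import deque


class ToyQueue:
    """Toy model of one hardware queue (hypothetical, not a kernel structure)."""

    def __init__(self):
        self.outstanding = 0   # requests submitted but not yet reaped
        self.cq = deque()      # completion entries posted by the "device"

    def submit(self, n):
        self.outstanding += n

    def device_complete(self, n=1):
        """Device posts up to n completions, bounded by what is in flight."""
        for _ in range(min(n, self.outstanding - len(self.cq))):
            self.cq.append(object())

    def reap(self):
        """Consume everything currently sitting in the completion queue."""
        done = len(self.cq)
        self.cq.clear()
        self.outstanding -= done
        return done


def irq_per_completion(q, n):
    """Baseline: every single completion raises its own interrupt."""
    q.submit(n)
    irqs = 0
    while q.outstanding:
        q.device_complete()    # one completion arrives ...
        irqs += 1              # ... and interrupts us
        q.reap()
    return irqs


def poll_after_first_irq(q, n, budget=64):
    """Take one IRQ, then poll for the remaining completions up to `budget`."""
    q.submit(n)
    irqs = 0
    while q.outstanding:
        q.device_complete()
        irqs += 1              # first completion raises the IRQ
        q.reap()
        polls = 0
        # We know how many requests are still in flight, so keep polling
        # (IRQs assumed masked here) instead of waiting for the next interrupt.
        while q.outstanding and polls < budget:
            q.device_complete()   # idealized: one new completion per poll
            q.reap()
            polls += 1
    return irqs
```

With 10 requests the baseline takes one interrupt per completion, while the
polling variant takes a single interrupt as long as the budget covers the
outstanding requests; a smaller budget falls back to additional interrupts.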

Does this make sense?

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

