SGL support of driver

* SGL support of driver
From: sheng qiu @ 2015-07-27 16:22 UTC


Hi,

May I ask whether the existing kernel driver supports SGL DMA setups?

Thanks,
Sheng

* SGL support of driver
From: Christoph Hellwig @ 2015-07-27 16:31 UTC


On Mon, Jul 27, 2015 at 09:22:51AM -0700, sheng qiu wrote:
> Hi,
> 
> May I ask whether the existing kernel driver supports SGL DMA setups?

It doesn't.  There have been patches but they weren't in shape to be
included yet.

* SGL support of driver
From: Keith Busch @ 2015-07-27 16:50 UTC

On Mon, 27 Jul 2015, Christoph Hellwig wrote:
> On Mon, Jul 27, 2015 at 09:22:51AM -0700, sheng qiu wrote:
>> Hi,
>>
>> May I ask whether the existing kernel driver supports SGL DMA setups?
>
> It doesn't.  There have been patches but they weren't in shape to be
> included yet.

We also need data that shows SGL is an improvement over PRP. H/w should
be able to process a PRP faster than an SGL in most cases.

SGL may be faster with a physically contiguous range across multiple
pages, or if a request payload can't be PRP mapped, requiring a bounce
buffer.

The Linux driver never receives an I/O vector that requires double
buffering, though, and the vectors it does receive are usually not
physically contiguous. I'm not sure there is a case for supporting SGL
in the linux-nvme driver. Maybe if it's tied to block integrity
extensions with interleaved metadata formats, but it'd be odd to find a
device that implements SGL but not separate metadata.
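
To make the contiguous-range case concrete, here is a minimal Python
sketch (a model, not driver code) of how many entries each scheme needs
for one physically contiguous buffer. It assumes that a PRP entry points
into exactly one memory page and that a single SGL data block descriptor
can describe an arbitrary contiguous byte range. The function names and
the example address are hypothetical.

```python
def prp_entries(addr: int, length: int, page_size: int = 4096) -> int:
    """Number of PRP entries needed for one physically contiguous range.

    Each PRP entry points into a single memory page, so the count equals
    the number of pages the range touches. Only the first entry may
    carry a non-zero offset into its page.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    offset = addr % page_size
    return (offset + length + page_size - 1) // page_size


def sgl_descriptors(addr: int, length: int) -> int:
    """One SGL data block descriptor covers one contiguous range."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 1


# 128 KiB, physically contiguous and page aligned, with 4 KiB pages.
# The address is made up for illustration.
print(prp_entries(0x100000, 128 * 1024))      # 32
print(sgl_descriptors(0x100000, 128 * 1024))  # 1
```

A smaller entry count does not by itself show that SGL is faster; that
still needs the measurements asked for above.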

* SGL support of driver
From: Christoph Hellwig @ 2015-07-27 16:57 UTC

On Mon, Jul 27, 2015 at 04:50:25PM +0000, Keith Busch wrote:
> The Linux driver never receives an IO vector that requires double
> buffering though, and usually not physically contiguous. I'm not sure
> there is a case to support SGL in the linux-nvme driver. Maybe if it's
> tied to block integrity extensions with interleaved metadata formats,
> but it'd be odd to find a device that implements SGL but not separate
> metadata.

That's because it asks never to get one.

Vectored direct I/O is a case where NVMe currently has to split
while most other block devices can handle it in a single I/O.  For
some database workloads this does make a significant difference, and
it's even more interesting for providing atomicity to userspace using
a future O_ATOMIC.

It also requires pagecache I/O to be split into multiple commands
where other block devices can handle it a lot more efficiently.

* SGL support of driver
From: Keith Busch @ 2015-07-27 17:13 UTC


On Mon, 27 Jul 2015, Christoph Hellwig wrote:
> On Mon, Jul 27, 2015 at 04:50:25PM +0000, Keith Busch wrote:
>> The Linux driver never receives an IO vector that requires double
>> buffering though, and usually not physically contiguous. I'm not sure
>> there is a case to support SGL in the linux-nvme driver. Maybe if it's
>> tied to block integrity extensions with interleaved metadata formats,
>> but it'd be odd to find a device that implements SGL but not separate
>> metadata.
>
> That's because it asks never to get one..
>
> Vectored direct I/O is a case where NVMe currently has to split
> while most other block devices can handle it in a single I/O.  For
> some database workloads this does make a significant difference, and
> it's even more interesting for providing atomicity to userspace using
> a future O_ATOMIC.

Even vectored I/O requires each vector to be a block size multiple.
Splitting isn't as bad as double buffering.

Still, if a PRP split is less efficient than an unsplit SGL, that ought
to be compelling enough to apply.

> It also requires pagecache I/O to be split into multiple commands
> where other block devices can handle it a lot more efficiently.

* SGL support of driver
From: Matthew Wilcox @ 2015-07-28 14:35 UTC


On Mon, Jul 27, 2015 at 09:57:56AM -0700, Christoph Hellwig wrote:
> It also requires pagecache I/O to be split into multiple commands
> where other block devices can handle it a lot more efficiently.

Wait, what?  pagecache I/O is page aligned and page sized.  That can
always be represented by a PRP list.

* SGL support of driver
From: Christoph Hellwig @ 2015-07-29  7:55 UTC

On Tue, Jul 28, 2015 at 10:35:00AM -0400, Matthew Wilcox wrote:
> On Mon, Jul 27, 2015 at 09:57:56AM -0700, Christoph Hellwig wrote:
> > It also requires pagecache I/O to be split into multiple commands
> > where other block devices can handle it a lot more efficiently.
> 
> Wait, what?  pagecache I/O is page aligned and page sized.  That can
> always be represented by a PRP list.

Pagecache I/O is not necessarily page aligned.

Think of this case, which I remember clearly because it exposed a bug
a few years ago:

64k page size, 4k file system block size, raid 0 with a stripe size of
8k and two legs.

The typical SGL fed to the hardware driver for streaming I/O will be:

page A, offset 0, len 8k
page A, offset 16k, len 8k
page A, offset 32k, len 8k
page A, offset 48k, len 8k
page B, offset 0, len 8k

This is clearly something a PRP list will not handle well.  Note that I'm
not necessarily saying this is something to optimize for, the more
interesting case for NVMe really is the vectored direct I/O case.
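
The example above can be worked through with a small Python model (not
kernel code). It assumes the PRP constraint that every segment except
the first must start on a memory-page boundary and every segment except
the last must end on one, and it takes the memory page size to be the
64k host page size. The physical addresses of pages A and B and the
helper names are made up for illustration.

```python
KIB = 1024


def prp_mappable(segments, page_size):
    """True if the (addr, length) segments fit one PRP-described transfer.

    Assumed rule: every segment except the first starts on a page
    boundary, and every segment except the last ends on one.
    """
    last = len(segments) - 1
    for i, (addr, length) in enumerate(segments):
        if i > 0 and addr % page_size:
            return False
        if i < last and (addr + length) % page_size:
            return False
    return True


def count_commands(segments, page_size):
    """Greedily split the segments into runs that are each PRP mappable."""
    commands, run = 0, []
    for seg in segments:
        if run and not prp_mappable(run + [seg], page_size):
            commands += 1   # close the current run as one command
            run = []
        run.append(seg)
    return commands + (1 if run else 0)


# Hypothetical 64k-aligned physical addresses for page A and page B.
PAGE_A, PAGE_B = 0x100000, 0x200000
segments = [(PAGE_A + off * KIB, 8 * KIB) for off in (0, 16, 32, 48)]
segments.append((PAGE_B, 8 * KIB))

print(count_commands(segments, 64 * KIB))  # 5
```

Under these assumptions each 8k piece ends in the middle of a 64k page,
so every piece becomes its own command, whereas a scatter/gather list
could describe all five pieces at once.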

* SGL support of driver
From: Keith Busch @ 2015-07-29 14:28 UTC

On Wed, 29 Jul 2015, Christoph Hellwig wrote:
> 64k page size, 4k file system block size, raid 0 with a stripe size of
> 8k and two legs.
>
> The typical SGL fed to the hardware driver for streaming I/O will be:
>
> page A, offset 0, len 8k
> page A, offset 16k, len 8k
> page A, offset 32k, len 8k
> page A, offset 48k, len 8k
> page B, offset 0, len 8k

I think we can easily optimize for this. It would work w/ PRP if the
device's page size is set to <= 8k, which I believe all nvme controllers
support. We just need to make a small change to bvec_gap_to_prev()
to use the device's page size instead of PAGE_SIZE.
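
A rough Python model of that idea, for illustration only: the gap test
below is parameterized by page size instead of hard-coding the host
PAGE_SIZE. It is loosely inspired by what a check like
bvec_gap_to_prev() has to decide, but it does not reproduce the kernel
helper's signature or code, and `gap_to_prev` and `count_gaps` are
hypothetical names. Offsets are relative to the start of the host page,
and the page size is assumed to be a power of two no larger than the
host page size.

```python
KIB = 1024


def gap_to_prev(prev_offset, prev_len, next_offset, page_size):
    """True if two adjacent vectors cannot share one PRP list.

    There is a gap when the next vector does not start on a page
    boundary or the previous one does not end on one.
    """
    mask = page_size - 1
    return bool((next_offset & mask) or ((prev_offset + prev_len) & mask))


def count_gaps(vectors, page_size):
    """Count adjacent (offset, length) pairs that would force a split."""
    return sum(
        gap_to_prev(p_off, p_len, n_off, page_size)
        for (p_off, p_len), (n_off, _) in zip(vectors, vectors[1:])
    )


# The five 8k pieces from the RAID 0 example, as (offset in page, length).
vectors = [(0, 8 * KIB), (16 * KIB, 8 * KIB), (32 * KIB, 8 * KIB),
           (48 * KIB, 8 * KIB), (0, 8 * KIB)]

print(count_gaps(vectors, 64 * KIB))  # 4
print(count_gaps(vectors, 8 * KIB))   # 0
```

With the 64k host page size every boundary is a gap; with an 8k device
page size none is, so in this model the whole list fits one PRP list.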

> This is clearly something a PRP list will not handle well.  Note that I'm
> not necessarily saying this is something to optimize for, the more
> interesting case for NVMe really is the vectored direct I/O case.

Vectored direct I/O is never split when the block size >= page size.

512-byte sectors are still common in NVMe though, so yeah, those may need
separate requests.

* SGL support of driver
From: Matthew Wilcox @ 2015-07-30 18:38 UTC


On Wed, Jul 29, 2015 at 12:55:57AM -0700, Christoph Hellwig wrote:
> 64k page size, 4k file system block size, raid 0 with a stripe size of
> 8k and two legs.
> 
> The typical SGL fed to the hardware driver for streaming I/O will be:
> 
> page A, offset 0, len 8k
> page A, offset 16k, len 8k
> page A, offset 32k, len 8k
> page A, offset 48k, len 8k
> page B, offset 0, len 8k
> 
> This is clearly something a PRP list will not handle well.  Note that I'm
> not necessarily saying this is something to optimize for, the more
> interesting case for NVMe really is the vectored direct I/O case.

I'm not really interested in optimising for badly configured RAID setups :-)

But point taken, RAID (and other things) can mess with I/Os between the
page cache and the driver.

* SGL support of driver
From: Christoph Hellwig @ 2015-07-31  7:19 UTC


On Thu, Jul 30, 2015 at 02:38:28PM -0400, Matthew Wilcox wrote:
> I'm not really interested in optimising for badly configured RAID setups :-)
> 
> But point taken, RAID (and other things) can mess with I/Os between the
> page cache and the driver.

With blocksize < pagesize you can also get other non-aligned I/O,
although rarely in a way where you could fit several of these into one
SGL.
