* SGL support of driver
From: sheng qiu @ 2015-07-27 16:22 UTC

Hi,

may I ask if the existing kernel driver supports SGL DMA setups?

Thanks,
Sheng
* SGL support of driver
From: Christoph Hellwig @ 2015-07-27 16:31 UTC

On Mon, Jul 27, 2015 at 09:22:51AM -0700, sheng qiu wrote:
> Hi,
>
> may I ask if the existing kernel driver supports SGL DMA setups?

It doesn't. There have been patches, but they weren't in shape to be
included yet.
* SGL support of driver
From: Keith Busch @ 2015-07-27 16:50 UTC

On Mon, 27 Jul 2015, Christoph Hellwig wrote:
> On Mon, Jul 27, 2015 at 09:22:51AM -0700, sheng qiu wrote:
>> Hi,
>>
>> may I ask if the existing kernel driver supports SGL DMA setups?
>
> It doesn't. There have been patches, but they weren't in shape to be
> included yet.

We also need data that shows SGL is an improvement over PRP. H/w should
be able to process a PRP faster than an SGL in most cases. SGL may be
faster with a physically contiguous range across multiple pages, or if a
request payload can't be PRP mapped, requiring a bounce buffer.

The Linux driver never receives an IO vector that requires double
buffering though, and the vectors it does receive are usually not
physically contiguous. I'm not sure there is a case to support SGL in
the linux-nvme driver. Maybe if it's tied to block integrity extensions
with interleaved metadata formats, but it'd be odd to find a device that
implements SGL but not separate metadata.
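As a rough illustration of the "physically contiguous range across multiple pages" case mentioned above, the sketch below counts the descriptors each scheme needs. It assumes that a PRP entry addresses at most one memory page, while one SGL data block descriptor can cover a whole contiguous range (up to the descriptor's length limit). The function names and the example address are hypothetical, and the count says nothing about how quickly a given controller processes either format.

```python
def prp_entry_count(addr: int, length: int, page_size: int = 4096) -> int:
    """Number of PRP entries needed for one physically contiguous range.

    Each PRP entry points into a single memory page, so the count equals
    the number of pages the range touches.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    offset = addr % page_size
    return (offset + length + page_size - 1) // page_size


def sgl_descriptor_count(segments) -> int:
    """One SGL data block descriptor per physically contiguous segment."""
    return len(segments)


# Hypothetical 1 MiB buffer that is physically contiguous and page aligned.
segments = [(0x100000, 1 << 20)]
prp_total = sum(prp_entry_count(addr, length) for addr, length in segments)
print(prp_total, sgl_descriptor_count(segments))
```

With 4k pages the buffer needs 256 PRP entries but only one SGL descriptor. Whether that difference matters in practice is exactly the data the message above asks for.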
* SGL support of driver
From: Christoph Hellwig @ 2015-07-27 16:57 UTC

On Mon, Jul 27, 2015 at 04:50:25PM +0000, Keith Busch wrote:
> The Linux driver never receives an IO vector that requires double
> buffering though, and usually not physically contiguous. I'm not sure
> there is a case to support SGL in the linux-nvme driver. Maybe if it's
> tied to block integrity extensions with interleaved metadata formats,
> but it'd be odd to find a device that implements SGL but not separate
> metadata.

That's because it asks never to get one.

Vectored direct I/O is a case where NVMe currently has to split
while most other block devices can handle it in a single I/O. For
some database workloads this does make a significant difference, and
it's even more interesting for providing atomicity to userspace using
a future O_ATOMIC.

It also requires pagecache I/O to be split into multiple commands
where other block devices can handle it a lot more efficiently.
* SGL support of driver
From: Keith Busch @ 2015-07-27 17:13 UTC

On Mon, 27 Jul 2015, Christoph Hellwig wrote:
> On Mon, Jul 27, 2015 at 04:50:25PM +0000, Keith Busch wrote:
>> The Linux driver never receives an IO vector that requires double
>> buffering though, and usually not physically contiguous. I'm not sure
>> there is a case to support SGL in the linux-nvme driver. Maybe if it's
>> tied to block integrity extensions with interleaved metadata formats,
>> but it'd be odd to find a device that implements SGL but not separate
>> metadata.
>
> That's because it asks never to get one.
>
> Vectored direct I/O is a case where NVMe currently has to split
> while most other block devices can handle it in a single I/O. For
> some database workloads this does make a significant difference, and
> it's even more interesting for providing atomicity to userspace using
> a future O_ATOMIC.

Even vectored IO requires each vector to be a block size multiple.
Splitting isn't as bad as double buffering. Still, if a PRP split is
less efficient than an unsplit SGL, that ought to be compelling enough
to apply.

> It also requires pagecache I/O to be split into multiple commands
> where other block devices can handle it a lot more efficiently.
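To make the splitting cost discussed above concrete, here is a minimal sketch that groups the buffers of a vectored I/O into commands a PRP list could describe. It assumes the usual PRP constraint that only the first buffer may start mid-page and only the last may end mid-page. The function name and all addresses are hypothetical, and the sketch ignores transfer size limits and the merging of physically adjacent buffers.

```python
def split_for_prp(segments, page_size=4096):
    """Greedily group (address, length) segments into PRP-mappable commands.

    A new command starts whenever a segment does not begin on a page
    boundary or the previous segment does not end on one, since a PRP
    list cannot describe a gap in the middle of a transfer.
    """
    commands = []
    for addr, length in segments:
        if commands:
            prev_addr, prev_len = commands[-1][-1]
            gap = addr % page_size != 0 or (prev_addr + prev_len) % page_size != 0
            if not gap:
                commands[-1].append((addr, length))
                continue
        commands.append([(addr, length)])
    return commands


# Hypothetical iovec: three 512-byte buffers at unaligned addresses.
unaligned = [(0x10200, 512), (0x20600, 512), (0x30A00, 512)]
# Hypothetical iovec: three page-sized, page-aligned buffers.
aligned = [(0x10000, 4096), (0x20000, 4096), (0x30000, 4096)]

print(len(split_for_prp(unaligned)), len(split_for_prp(aligned)))
```

The unaligned vector ends up as three separate commands, while the aligned one fits in a single command. An SGL could in principle describe either vector in one command, which is the comparison that would need measuring.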
* SGL support of driver
From: Matthew Wilcox @ 2015-07-28 14:35 UTC

On Mon, Jul 27, 2015 at 09:57:56AM -0700, Christoph Hellwig wrote:
> It also requires pagecache I/O to be split into multiple commands
> where other block devices can handle it a lot more efficiently.

Wait, what? Pagecache I/O is page aligned and page sized. That can
always be represented by a PRP list.
* SGL support of driver
From: Christoph Hellwig @ 2015-07-29 7:55 UTC

On Tue, Jul 28, 2015 at 10:35:00AM -0400, Matthew Wilcox wrote:
> On Mon, Jul 27, 2015 at 09:57:56AM -0700, Christoph Hellwig wrote:
> > It also requires pagecache I/O to be split into multiple commands
> > where other block devices can handle it a lot more efficiently.
>
> Wait, what? Pagecache I/O is page aligned and page sized. That can
> always be represented by a PRP list.

Pagecache I/O is not necessarily page aligned. Think of this case that
I remember clearly because it exposed a bug a few years ago:

64k page size, 4k file system block size, raid 0 with a stripe size of
8k and two legs.

The typical SGL fed to the hardware driver for streaming I/O will be:

	page A, offset 0, len 8k
	page A, offset 16k, len 8k
	page A, offset 32k, len 8k
	page A, offset 48k, len 8k
	page B, offset 0, len 8k

This is clearly something a PRP list will not handle well. Note that I'm
not necessarily saying this is something to optimize for; the more
interesting case for NVMe really is the vectored direct I/O case.
* SGL support of driver
From: Keith Busch @ 2015-07-29 14:28 UTC

On Wed, 29 Jul 2015, Christoph Hellwig wrote:
> 64k page size, 4k file system block size, raid 0 with a stripe size of
> 8k and two legs.
>
> The typical SGL fed to the hardware driver for streaming I/O will be:
>
> 	page A, offset 0, len 8k
> 	page A, offset 16k, len 8k
> 	page A, offset 32k, len 8k
> 	page A, offset 48k, len 8k
> 	page B, offset 0, len 8k

I think we can easily optimize for this. It would work w/ PRP if the
device's page size is set to <= 8k, which I believe all nvme controllers
support. We just need to make a small change to bvec_gap_to_prev() to
use the device's page size instead of PAGE_SIZE.

> This is clearly something a PRP list will not handle well. Note that I'm
> not necessarily saying this is something to optimize for; the more
> interesting case for NVMe really is the vectored direct I/O case.

Vectored direct I/O is never split when the block size >= page size.
512b sectors are still common in NVMe though, so yeah, those may need
separate req's.
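The page size argument above can be checked against the scatter list from the RAID 0 example. The sketch below is not the kernel's bvec_gap_to_prev(); it is a hypothetical stand-in that applies the same kind of rule (every segment but the first starts on a page boundary, every segment but the last ends on one) with the page size passed in as a parameter. The physical addresses chosen for page A and page B are assumptions.

```python
KIB = 1024


def has_prp_gap(segments, page_size):
    """Return True if a PRP list cannot describe the segments in one command.

    Every segment but the first must start on a page boundary, and every
    segment but the last must end on one.
    """
    for (prev_addr, prev_len), (addr, _) in zip(segments, segments[1:]):
        if addr % page_size or (prev_addr + prev_len) % page_size:
            return True
    return False


# Hypothetical physical addresses for the two 64k host pages.
PAGE_A = 0x100000
PAGE_B = 0x200000

# The scatter list from the RAID 0 example: 8k chunks at 16k strides.
segments = [
    (PAGE_A + 0 * KIB, 8 * KIB),
    (PAGE_A + 16 * KIB, 8 * KIB),
    (PAGE_A + 32 * KIB, 8 * KIB),
    (PAGE_A + 48 * KIB, 8 * KIB),
    (PAGE_B + 0 * KIB, 8 * KIB),
]

print(has_prp_gap(segments, 64 * KIB))  # checked against the 64k host page size
print(has_prp_gap(segments, 8 * KIB))   # checked against an 8k device page size
```

Checked against the 64k host page size the list has gaps, but with an 8k device page size every chunk starts and ends on a page boundary, so a single PRP list with five entries would cover it.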
* SGL support of driver
From: Matthew Wilcox @ 2015-07-30 18:38 UTC

On Wed, Jul 29, 2015 at 12:55:57AM -0700, Christoph Hellwig wrote:
> 64k page size, 4k file system block size, raid 0 with a stripe size of
> 8k and two legs.
>
> The typical SGL fed to the hardware driver for streaming I/O will be:
>
> 	page A, offset 0, len 8k
> 	page A, offset 16k, len 8k
> 	page A, offset 32k, len 8k
> 	page A, offset 48k, len 8k
> 	page B, offset 0, len 8k
>
> This is clearly something a PRP list will not handle well. Note that I'm
> not necessarily saying this is something to optimize for; the more
> interesting case for NVMe really is the vectored direct I/O case.

I'm not really interested in optimising for badly configured RAID setups :-)

But point taken, RAID (and other things) can mess with I/Os between the
page cache and the driver.
* SGL support of driver
From: Christoph Hellwig @ 2015-07-31 7:19 UTC

On Thu, Jul 30, 2015 at 02:38:28PM -0400, Matthew Wilcox wrote:
> I'm not really interested in optimising for badly configured RAID setups :-)
>
> But point taken, RAID (and other things) can mess with I/Os between the
> page cache and the driver.

With blocksize < pagesize you can also get other non-aligned I/O,
although rarely in a way where you could fit multiple of these into an
SGL.