From mboxrd@z Thu Jan 1 00:00:00 1970
From: axboe@kernel.dk (Jens Axboe)
Date: Thu, 13 Nov 2014 18:29:37 -0700
Subject: [PATCH] NVMe: Add rw_page support
In-Reply-To: <1415923538-18760-1-git-send-email-keith.busch@intel.com>
References: <1415923538-18760-1-git-send-email-keith.busch@intel.com>
Message-ID: <54655B01.9090206@kernel.dk>

On 2014-11-13 17:05, Keith Busch wrote:
> This adds the rw_page entry point to the nvme driver so a page can be
> written/read without going through the block layer and not requiring
> additional allocations.
>
> Just because we implement this doesn't mean we want to use it. I only see
> a performance win on some types of work, like swap where I see about 15%
> reduction in system time (compared to 20% prior to blk-mq when we didn't
> allocate a request to get a command id). Even then, system time accounts
> for very little of the real time anyway, and it's only an over-all win
> if the device has very low-latency. But the driver doesn't know this
> nor if the expected workload will even benefit from using page io,
> so I added a queue flag that a user can toggle on/off.

Of the reduced system time, where was that spent?

> The other benefit besides reduced system time is that we can swap
> pages in/out without having to allocate anything since everything is
> preallocated in this path.

I have an iod prealloc patch that gets rid of the last kmalloc/kfree in
the IO path for nvme, for smaller IO (defaults to 8KB, or 2 segments),
making the generic IO path contain no allocations/frees for smaller IO.
Probably that would get us pretty close to rw_page.

For direct/sync IO, we'll go direct to ->queue_rq mostly, too.

The downside I see is that this is an OOB IO path. Once we start adding
IO scheduling for those that need that, then this will completely bypass
that.

-- 
Jens Axboe