From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from bedivere.hansenpartnership.com ([66.63.167.143]:51420
	"EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751299AbcB2Pqd (ORCPT );
	Mon, 29 Feb 2016 10:46:33 -0500
Message-ID: <1456760790.2390.8.camel@HansenPartnership.com>
Subject: Re: [Lsf-pc] [LSF/MM ATTEND] block: multipage bvecs
From: James Bottomley
To: Boaz Harrosh, Christoph Hellwig
Cc: linux-block@vger.kernel.org, Ming Lei, Linux FS Devel,
	lsf-pc@lists.linuxfoundation.org
Date: Mon, 29 Feb 2016 07:46:30 -0800
In-Reply-To: <56D41A8E.2070405@plexistor.com>
References: <56D2D757.2000204@plexistor.com>
	<20160228160801.GB12881@infradead.org>
	<56D41A8E.2070405@plexistor.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Mon, 2016-02-29 at 12:16 +0200, Boaz Harrosh wrote:
> On 02/28/2016 06:08 PM, Christoph Hellwig wrote:
> > On Sun, Feb 28, 2016 at 01:17:43PM +0200, Boaz Harrosh wrote:
> > > I don't know if you ever tried it but I did. If I take a regular
> > > SSD disk or a PCIE flash card that I have in my machine and
> > > I stick a pointer to a page and bv_len = PAGE_SIZE * 8 and call
> > > submit_bio, I get 8 pages worth of IO with a single bvec and it
> > > all just works.
> >
> > No, it will break in all kinds of places. Also you really should
> > never just setup bvecs yourself, please always use bio_add_page!
> >
>
> Guys when did you ever stop playing and became so serious? Of course
> I never do that in submitted code. But I do like to experiment from
> time to time and play around, I like it when my VM crashes ;-)
>
> That said when did you last look at bio_add_page() it will just work
> as well.
> (Specially lately since the limits are checked later ever
> since bios can split)
>
> So if you have a real hard stair you'll see that we consider bv_len
> everywhere and the PAGE_SIZE assumption is more when allocating array
> sizes and things like that. Again it will break on SW drivers like
> brd and scsi_debug. But will currently work on anything going through
> sg-lists and DMA mapping.

No, it won't. The segment mappers won't break up anything with
bv_len > max segment size, so it gets incorrectly mapped as an sglist
entry which is too long. The max segment size can come from two places:
it can be a driver limitation, like IDE (the segment length register
only has 16 bits in older cards), or it can come from the platform
IOMMU descriptors. Running on virtual hardware with no IOMMU isn't
testing any of this.

James
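
To make the segment size point concrete, here is a minimal sketch of the
splitting a mapper would have to do for a bvec whose bv_len exceeds the
max segment size. It is plain userspace C, not kernel code: struct seg
and split_bvec() are hypothetical names made up for this illustration,
and nothing here is taken from the actual block layer or DMA mapping
code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatter-gather entry; this is NOT the
 * kernel's struct scatterlist. */
struct seg {
    size_t offset;  /* byte offset into the range described by the bvec */
    size_t len;     /* length of this entry, never above max_seg */
};

/*
 * Split one contiguous range of bv_len bytes into entries of at most
 * max_seg bytes each. Writes at most cap entries into out and returns
 * the number of entries required, which may be larger than cap (in that
 * case only the first cap entries were written).
 */
static size_t split_bvec(size_t bv_len, size_t max_seg,
                         struct seg *out, size_t cap)
{
    size_t n = 0, off = 0;

    assert(max_seg > 0);
    while (off < bv_len) {
        size_t len = bv_len - off;

        if (len > max_seg)
            len = max_seg;      /* clamp to the device/IOMMU limit */
        if (n < cap) {
            out[n].offset = off;
            out[n].len = len;
        }
        n++;
        off += len;
    }
    return n;
}
```

With a 64 KiB limit (an assumed value for a 16-bit length register), a
256 KiB bvec needs four entries under this scheme, while a 32 KiB one
still fits in a single entry. Without a split of this kind, the whole
bv_len ends up in one entry that the hardware cannot describe.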