From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57154) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XYrq1-0002p3-Mg for qemu-devel@nongnu.org; Tue, 30 Sep 2014 03:27:18 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XYrpm-0000TH-TD for qemu-devel@nongnu.org; Tue, 30 Sep 2014 03:27:09 -0400 Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:36806 helo=mx01.kamp.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XYrpm-0000Sx-KJ for qemu-devel@nongnu.org; Tue, 30 Sep 2014 03:26:54 -0400 Message-ID: <542A5B33.20707@kamp.de> Date: Tue, 30 Sep 2014 09:26:43 +0200 From: Peter Lieven MIME-Version: 1.0 References: <541AE887.9050607@redhat.com> <541C3095.4040405@redhat.com> <541FEF3A.30909@kamp.de> <54207349.2000107@redhat.com> <54210FEB.50900@kamp.de> <20140923085917.GB3871@noname.str.redhat.com> <54213E3B.5040302@kamp.de> <20140923094729.GD3871@noname.str.redhat.com> <542142D7.3040006@kamp.de> <20140923100517.GE3871@noname.str.redhat.com> In-Reply-To: <20140923100517.GE3871@noname.str.redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: Paolo Bonzini , ronniesahlberg@gmail.com, qemu-devel@nongnu.org, stefanha@redhat.com, mreitz@redhat.com On 23.09.2014 12:05, Kevin Wolf wrote: > Am 23.09.2014 um 11:52 hat Peter Lieven geschrieben: >> On 23.09.2014 11:47, Kevin Wolf wrote: >>> Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben: >>>> On 23.09.2014 10:59, Kevin Wolf wrote: >>>>> Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben: >>>>>> On 22.09.2014 21:06, Paolo Bonzini wrote: >>>>>>> Il 22/09/2014 11:43, Peter Lieven ha scritto: >>>>>>>> This series aims not at touching default behaviour. The default for max_transfer_length >>>>>>>> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request >>>>>>>> will fail. And Patch 2 aims at catching this fail earlier in the stack. >>>>>>> Understood. But the right fix is to avoid that backend limits transpire >>>>>>> into guest ABI, not to catch the limits earlier. So the right fix would >>>>>>> be to implement request splitting. >>>>>> Since you proposed to add traces for this would you leave those in? >>>>>> And since iSCSI is the only user of this at the moment would you >>>>>> go for implementing this check in the iSCSI block layer? >>>>>> >>>>>> As for the split logic would you think it is enough to build new qiov's >>>>>> out of the too big iov without copying the contents? This would work >>>>>> as long as a single iov inside the qiov is not bigger the max_transfer_length. >>>>>> Otherwise I would need to allocate temporary buffers and copy around. >>>>> You can split single iovs, too. There are functions that make this very >>>>> easy, they copy a sub-qiov with a byte granularity offset and length >>>>> (qemu_iovec_concat and friends). qcow2 uses the same to split requests >>>>> at (fragmented) cluster boundaries. >>>> Might it be as easy as this? >>> This is completely untested, right? :-) >> Yes :-) >> I was just unsure if the general approach is right. > Looks alright to me. > >>> But ignoring bugs, the principle looks right. >>> >>>> static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, >>>> int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, >>>> BdrvRequestFlags flags) >>>> { >>>> if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) { >>>> return -EINVAL; >>>> } >>>> >>>> if (bs->bl.max_transfer_length && >>>> nb_sectors > bs->bl.max_transfer_length) { >>>> int ret = 0; >>>> QEMUIOVector *qiov2 = NULL; >>> Make it "QEMUIOVector qiov2;" on the stack. >>> >>>> size_t soffset = 0; >>>> >>>> trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors, >>>> bs->bl.max_transfer_length); >>>> >>>> qemu_iovec_init(qiov2, qiov->niov); >>> And &qiov2 here, then this doesn't crash with a NULL dereference. >>> >>>> while (nb_sectors > bs->bl.max_transfer_length && !ret) { >>>> qemu_iovec_reset(qiov2); >>>> qemu_iovec_concat(qiov2, qiov, soffset, >>>> bs->bl.max_transfer_length << BDRV_SECTOR_BITS); >>>> ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS, >>>> bs->bl.max_transfer_length << BDRV_SECTOR_BITS, >>>> qiov2, flags); >>>> soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS; >>>> sector_num += bs->bl.max_transfer_length; >>>> nb_sectors -= bs->bl.max_transfer_length; >>>> } >>>> qemu_iovec_destroy(qiov2); >>>> if (ret) { >>>> return ret; >>>> } >>> The error check needs to be immediately after the assignment of ret, >>> otherwise the next loop iteration can overwrite an error with a success >>> (and if it didn't, it would still do useless I/O because the request as >>> a whole would fail anyway). >> There is a && !ret in the loop condition. I wanted to avoid copying the destroy part. > Ah, yes, clever. I missed that. Maybe too clever then. ;-) > >> BTW, is it !ret or ret < 0 ? > It only ever returns 0 or negative, so both are equivalent. I > prefer checks for ret < 0, but that's a matter of style rather than > correctness. > > Another problem I just noticed is that this is not the only caller of > bdrv_co_do_preadv(), so you're not splitting all requests. The > synchronous bdrv_read/write/pread/pwrite/pwritev functions all don't get > the functionality this way. > > Perhaps you should be doing it inside bdrv_co_do_preadv(), before the > call to bdrv_aligned_preadv(). Might even be more correct if it can > happen that the alignment adjustment increases a request too much to fit > in bl.max_transfer_length. If I do it this way can I use the same req Object for all splitted requests? Peter