From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DFA25C433E0 for ; Wed, 17 Mar 2021 06:56:24 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6824864DF2 for ; Wed, 17 Mar 2021 06:56:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6824864DF2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=dm-devel-bounces@redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615964183; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:list-id:list-help: list-unsubscribe:list-subscribe:list-post; bh=CXgoTpUJr25KRooyX+MzqlWwMaB9/s/TGJbGpdSrVRU=; b=PB9MM5psJYnRLOLVI/fLuLznQ5EQeqeWQ6Dc4MONYhnKwJ+13V9lG/LiDhqdTHK/qUrjQW rfMW29p/KLDhVu1/aznYXYc+D4EluX+yuI4lRRaOi3S9cbCVK+FsU/OoihNi6yA4e5edoA MlnRWftJWI2yaXRQj3P1T5ss4bmWTQQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-454-GVFOORhDOjSY4FEerBvP0g-1; Wed, 17 Mar 2021 02:56:21 -0400 X-MC-Unique: GVFOORhDOjSY4FEerBvP0g-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7D9323E741; Wed, 17 Mar 2021 06:56:16 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 337BA6C333; Wed, 17 Mar 2021 06:56:15 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 152FB1800657; Wed, 17 Mar 2021 06:56:12 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 12H6tCCK029722 for ; Wed, 17 Mar 2021 02:55:12 -0400 Received: by smtp.corp.redhat.com (Postfix) id 777E25D74F; Wed, 17 Mar 2021 06:55:12 +0000 (UTC) Received: from T590 (ovpn-13-221.pek2.redhat.com [10.72.13.221]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 16B215D747; Wed, 17 Mar 2021 06:54:59 +0000 (UTC) Date: Wed, 17 Mar 2021 14:54:55 +0800 From: Ming Lei To: JeffleXu Message-ID: References: <20210316031523.864506-1-ming.lei@redhat.com> <20210316031523.864506-9-ming.lei@redhat.com> <3848e80d-e7ad-9372-c96f-d32bfb9f0ae5@linux.alibaba.com> <06bd2e9e-68c9-f74b-f59f-b565a557c47d@linux.alibaba.com> MIME-Version: 1.0 In-Reply-To: <06bd2e9e-68c9-f74b-f59f-b565a557c47d@linux.alibaba.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-loop: dm-devel@redhat.com Cc: Jens Axboe , linux-block@vger.kernel.org, dm-devel@redhat.com, Christoph Hellwig , Mike Snitzer Subject: Re: [dm-devel] [RFC PATCH 08/11] block: use per-task poll context to implement bio based io poll X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=dm-devel-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On Wed, Mar 17, 2021 at 11:53:12AM +0800, JeffleXu wrote: > > > On 3/17/21 10:54 AM, Ming Lei wrote: > > On Tue, Mar 16, 2021 at 04:52:36PM +0800, JeffleXu wrote: > >> > >> > >> On 3/16/21 3:17 PM, Ming Lei wrote: > >>> On Tue, Mar 16, 2021 at 02:46:08PM +0800, JeffleXu wrote: > >>>> It is a giant progress to gather all split bios that need to be polled > >>>> in a per-task queue. Still some comments below. > >>>> > >>>> > >>>> On 3/16/21 11:15 AM, Ming Lei wrote: > >>>>> Currently bio based IO poll needs to poll all hw queue blindly, this way > >>>>> is very inefficient, and the big reason is that we can't pass bio > >>>>> submission result to io poll task. > >>>>> > >>>>> In IO submission context, store associated underlying bios into the > >>>>> submission queue and save 'cookie' poll data in bio->bi_iter.bi_private_data, > >>>>> and return current->pid to caller of submit_bio() for any DM or bio based > >>>>> driver's IO, which is submitted from FS. > >>>>> > >>>>> In IO poll context, the passed cookie tells us the PID of submission > >>>>> context, and we can find the bio from that submission context. Moving > >>>>> bio from submission queue to poll queue of the poll context, and keep > >>>>> polling until these bios are ended. Remove bio from poll queue if the > >>>>> bio is ended. Add BIO_DONE and BIO_END_BY_POLL for such purpose. > >>>>> > >>>>> Usually submission shares context with io poll. The per-task poll context > >>>>> is just like stack variable, and it is cheap to move data between the two > >>>>> per-task queues. > >>>>> > >>>>> Signed-off-by: Ming Lei > >>>>> --- > >>>>> block/bio.c | 5 ++ > >>>>> block/blk-core.c | 74 +++++++++++++++++- > >>>>> block/blk-mq.c | 156 +++++++++++++++++++++++++++++++++++++- > >>>>> include/linux/blk_types.h | 3 + > >>>>> 4 files changed, 235 insertions(+), 3 deletions(-) > >>>>> > >>>>> diff --git a/block/bio.c b/block/bio.c > >>>>> index a1c4d2900c7a..bcf5eca0e8e3 100644 > >>>>> --- a/block/bio.c > >>>>> +++ b/block/bio.c > >>>>> @@ -1402,6 +1402,11 @@ static inline bool bio_remaining_done(struct bio *bio) > >>>>> **/ > >>>>> void bio_endio(struct bio *bio) > >>>>> { > >>>>> + /* BIO_END_BY_POLL has to be set before calling submit_bio */ > >>>>> + if (bio_flagged(bio, BIO_END_BY_POLL)) { > >>>>> + bio_set_flag(bio, BIO_DONE); > >>>>> + return; > >>>>> + } > >>>>> again: > >>>>> if (!bio_remaining_done(bio)) > >>>>> return; > >>>>> diff --git a/block/blk-core.c b/block/blk-core.c > >>>>> index a082bbc856fb..970b23fa2e6e 100644 > >>>>> --- a/block/blk-core.c > >>>>> +++ b/block/blk-core.c > >>>>> @@ -854,6 +854,40 @@ static inline void blk_bio_poll_preprocess(struct request_queue *q, > >>>>> bio->bi_opf |= REQ_TAG; > >>>>> } > >>>>> > >>>>> +static bool blk_bio_poll_prep_submit(struct io_context *ioc, struct bio *bio) > >>>>> +{ > >>>>> + struct blk_bio_poll_data data = { > >>>>> + .bio = bio, > >>>>> + }; > >>>>> + struct blk_bio_poll_ctx *pc = ioc->data; > >>>>> + unsigned int queued; > >>>>> + > >>>>> + /* lock is required if there is more than one writer */ > >>>>> + if (unlikely(atomic_read(&ioc->nr_tasks) > 1)) { > >>>>> + spin_lock(&pc->lock); > >>>>> + queued = kfifo_put(&pc->sq, data); > >>>>> + spin_unlock(&pc->lock); > >>>>> + } else { > >>>>> + queued = kfifo_put(&pc->sq, data); > >>>>> + } > >>>>> + > >>>>> + /* > >>>>> + * Now the bio is added per-task fifo, mark it as END_BY_POLL, > >>>>> + * so we can save cookie into this bio after submit_bio(). > >>>>> + */ > >>>>> + if (queued) > >>>>> + bio_set_flag(bio, BIO_END_BY_POLL); > >>>>> + else > >>>>> + bio->bi_opf &= ~(REQ_HIPRI | REQ_TAG); > >>>>> + > >>>>> + return queued; > >>>>> +} > >>>> > >>>> The size of kfifo is limited, and it seems that once the sq of kfifio is > >>>> full, REQ_HIPRI flag is cleared and the corresponding bio is actually > >>>> enqueued into the default hw queue, which is IRQ driven. > >>> > >>> Yeah, this patch starts with 64 queue depth, and we can increase it to > >>> 128, which should cover most of cases. > >> > >> It seems that the queue depth of kfifo will affect the performance as I > >> did a fast test. > >> > >> > >> > >> Test Result: > >> > >> BLK_BIO_POLL_SQ_SZ | iodepth | IOPS > >> ------------------ | ------- | ---- > >> 64 | 128 | 301k (IRQ) -> 340k (iopoll) > >> 64 | 16 | 304k (IRQ) -> 392k (iopoll) > >> 128 | 128 | 204k (IRQ) -> 317k (iopoll) > >> 256 | 128 | 241k (IRQ) -> 391k (iopoll) > >> > >> It seems that BLK_BIO_POLL_SQ_SZ need to be increased accordingly when > >> iodepth is quite large. But I don't know why the performance in IRQ mode > >> decreases when BLK_BIO_POLL_SQ_SZ is increased. > > > > This patchset is supposed to not affect IRQ mode because HIPRI isn't set > > at IRQ mode. Or you mean '--hipri' & io_uring is setup but setting > > nvme.poll_queues as 0 at your 'IRQ' mode test? > > > > Thanks for starting to run performance test, and so far I just run test > > in KVM, not start performance test yet. > > > > 'IRQ' means 'hipri=0' of fio configuration. 'hipri=0' isn't supposed to be affected by this patchset. thanks, Ming -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC5FDC433E6 for ; Wed, 17 Mar 2021 06:55:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8B40264F9E for ; Wed, 17 Mar 2021 06:55:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231495AbhCQGzZ (ORCPT ); Wed, 17 Mar 2021 02:55:25 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:29904 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231411AbhCQGzQ (ORCPT ); Wed, 17 Mar 2021 02:55:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615964115; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=cBNnF1VkbT4zWHtzQzP4RpRu5UyOA6tHQT1xgUMS3Gs=; b=B7j94W8RRzJ3rfFlk98wLX5M/zlAAZ4egX/k5L7ojOibLvZJFlMIJAN5fAPRD9byGZhUqG WeKSzm+ZA7p2Yt1h9oOwcyPmK+YHM+WGjC+KUYjFNv+zXML1nPhtb7tHvWtbn4iCWGt8Eu xfr/ldOztJE/9oERD5fKG0R0S8ScZt0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-576-rK8u71CIOxWQ0Kh4Ep6kaQ-1; Wed, 17 Mar 2021 02:55:13 -0400 X-MC-Unique: rK8u71CIOxWQ0Kh4Ep6kaQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 79FD03E741; Wed, 17 Mar 2021 06:55:12 +0000 (UTC) Received: from T590 (ovpn-13-221.pek2.redhat.com [10.72.13.221]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 16B215D747; Wed, 17 Mar 2021 06:54:59 +0000 (UTC) Date: Wed, 17 Mar 2021 14:54:55 +0800 From: Ming Lei To: JeffleXu Cc: Jens Axboe , linux-block@vger.kernel.org, Christoph Hellwig , Mike Snitzer , dm-devel@redhat.com Subject: Re: [RFC PATCH 08/11] block: use per-task poll context to implement bio based io poll Message-ID: References: <20210316031523.864506-1-ming.lei@redhat.com> <20210316031523.864506-9-ming.lei@redhat.com> <3848e80d-e7ad-9372-c96f-d32bfb9f0ae5@linux.alibaba.com> <06bd2e9e-68c9-f74b-f59f-b565a557c47d@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <06bd2e9e-68c9-f74b-f59f-b565a557c47d@linux.alibaba.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On Wed, Mar 17, 2021 at 11:53:12AM +0800, JeffleXu wrote: > > > On 3/17/21 10:54 AM, Ming Lei wrote: > > On Tue, Mar 16, 2021 at 04:52:36PM +0800, JeffleXu wrote: > >> > >> > >> On 3/16/21 3:17 PM, Ming Lei wrote: > >>> On Tue, Mar 16, 2021 at 02:46:08PM +0800, JeffleXu wrote: > >>>> It is a giant progress to gather all split bios that need to be polled > >>>> in a per-task queue. Still some comments below. > >>>> > >>>> > >>>> On 3/16/21 11:15 AM, Ming Lei wrote: > >>>>> Currently bio based IO poll needs to poll all hw queue blindly, this way > >>>>> is very inefficient, and the big reason is that we can't pass bio > >>>>> submission result to io poll task. > >>>>> > >>>>> In IO submission context, store associated underlying bios into the > >>>>> submission queue and save 'cookie' poll data in bio->bi_iter.bi_private_data, > >>>>> and return current->pid to caller of submit_bio() for any DM or bio based > >>>>> driver's IO, which is submitted from FS. > >>>>> > >>>>> In IO poll context, the passed cookie tells us the PID of submission > >>>>> context, and we can find the bio from that submission context. Moving > >>>>> bio from submission queue to poll queue of the poll context, and keep > >>>>> polling until these bios are ended. Remove bio from poll queue if the > >>>>> bio is ended. Add BIO_DONE and BIO_END_BY_POLL for such purpose. > >>>>> > >>>>> Usually submission shares context with io poll. The per-task poll context > >>>>> is just like stack variable, and it is cheap to move data between the two > >>>>> per-task queues. > >>>>> > >>>>> Signed-off-by: Ming Lei > >>>>> --- > >>>>> block/bio.c | 5 ++ > >>>>> block/blk-core.c | 74 +++++++++++++++++- > >>>>> block/blk-mq.c | 156 +++++++++++++++++++++++++++++++++++++- > >>>>> include/linux/blk_types.h | 3 + > >>>>> 4 files changed, 235 insertions(+), 3 deletions(-) > >>>>> > >>>>> diff --git a/block/bio.c b/block/bio.c > >>>>> index a1c4d2900c7a..bcf5eca0e8e3 100644 > >>>>> --- a/block/bio.c > >>>>> +++ b/block/bio.c > >>>>> @@ -1402,6 +1402,11 @@ static inline bool bio_remaining_done(struct bio *bio) > >>>>> **/ > >>>>> void bio_endio(struct bio *bio) > >>>>> { > >>>>> + /* BIO_END_BY_POLL has to be set before calling submit_bio */ > >>>>> + if (bio_flagged(bio, BIO_END_BY_POLL)) { > >>>>> + bio_set_flag(bio, BIO_DONE); > >>>>> + return; > >>>>> + } > >>>>> again: > >>>>> if (!bio_remaining_done(bio)) > >>>>> return; > >>>>> diff --git a/block/blk-core.c b/block/blk-core.c > >>>>> index a082bbc856fb..970b23fa2e6e 100644 > >>>>> --- a/block/blk-core.c > >>>>> +++ b/block/blk-core.c > >>>>> @@ -854,6 +854,40 @@ static inline void blk_bio_poll_preprocess(struct request_queue *q, > >>>>> bio->bi_opf |= REQ_TAG; > >>>>> } > >>>>> > >>>>> +static bool blk_bio_poll_prep_submit(struct io_context *ioc, struct bio *bio) > >>>>> +{ > >>>>> + struct blk_bio_poll_data data = { > >>>>> + .bio = bio, > >>>>> + }; > >>>>> + struct blk_bio_poll_ctx *pc = ioc->data; > >>>>> + unsigned int queued; > >>>>> + > >>>>> + /* lock is required if there is more than one writer */ > >>>>> + if (unlikely(atomic_read(&ioc->nr_tasks) > 1)) { > >>>>> + spin_lock(&pc->lock); > >>>>> + queued = kfifo_put(&pc->sq, data); > >>>>> + spin_unlock(&pc->lock); > >>>>> + } else { > >>>>> + queued = kfifo_put(&pc->sq, data); > >>>>> + } > >>>>> + > >>>>> + /* > >>>>> + * Now the bio is added per-task fifo, mark it as END_BY_POLL, > >>>>> + * so we can save cookie into this bio after submit_bio(). > >>>>> + */ > >>>>> + if (queued) > >>>>> + bio_set_flag(bio, BIO_END_BY_POLL); > >>>>> + else > >>>>> + bio->bi_opf &= ~(REQ_HIPRI | REQ_TAG); > >>>>> + > >>>>> + return queued; > >>>>> +} > >>>> > >>>> The size of kfifo is limited, and it seems that once the sq of kfifio is > >>>> full, REQ_HIPRI flag is cleared and the corresponding bio is actually > >>>> enqueued into the default hw queue, which is IRQ driven. > >>> > >>> Yeah, this patch starts with 64 queue depth, and we can increase it to > >>> 128, which should cover most of cases. > >> > >> It seems that the queue depth of kfifo will affect the performance as I > >> did a fast test. > >> > >> > >> > >> Test Result: > >> > >> BLK_BIO_POLL_SQ_SZ | iodepth | IOPS > >> ------------------ | ------- | ---- > >> 64 | 128 | 301k (IRQ) -> 340k (iopoll) > >> 64 | 16 | 304k (IRQ) -> 392k (iopoll) > >> 128 | 128 | 204k (IRQ) -> 317k (iopoll) > >> 256 | 128 | 241k (IRQ) -> 391k (iopoll) > >> > >> It seems that BLK_BIO_POLL_SQ_SZ need to be increased accordingly when > >> iodepth is quite large. But I don't know why the performance in IRQ mode > >> decreases when BLK_BIO_POLL_SQ_SZ is increased. > > > > This patchset is supposed to not affect IRQ mode because HIPRI isn't set > > at IRQ mode. Or you mean '--hipri' & io_uring is setup but setting > > nvme.poll_queues as 0 at your 'IRQ' mode test? > > > > Thanks for starting to run performance test, and so far I just run test > > in KVM, not start performance test yet. > > > > 'IRQ' means 'hipri=0' of fio configuration. 'hipri=0' isn't supposed to be affected by this patchset. thanks, Ming