From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 20 Dec 2020 17:02:48 +0200
From: Kamal Heib
To: Bernard Metzler
Cc: linux-rdma@vger.kernel.org, jgg@nvidia.com,
	linux-nvme@lists.infradead.org, leon@kernel.org, Yi Zhang,
	kernel test robot
Subject: Re: [PATCH v3] RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
Message-ID: <20201220150248.GA37725@kheib-workstation>
References: <20201216110000.611-1-bmt@zurich.ibm.com>
In-Reply-To: <20201216110000.611-1-bmt@zurich.ibm.com>

On Wed, Dec 16, 2020 at 12:00:00PM +0100, Bernard Metzler wrote:
> During connection setup, the application may choose to zero-size
> inbound and outbound READ queues, as well as the Receive queue.
> This patch fixes the handling of zero-sized queues, but does not
> prevent it.
> 
> Kamal Heib says in an initial error report:
> When running blktests over siw, the following shift-out-of-bounds is
> reported. It happens because the IRD or ORD passed from the ULP can
> be zero, which leads to undefined behavior when calling
> roundup_pow_of_two(). Fix that by blocking zero values of ORD or IRD.
> 
> UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
> shift exponent 64 is too large for 64-bit type 'long unsigned int'
> CPU: 20 PID: 3957 Comm: kworker/u64:13 Tainted: G S  5.10.0-rc6 #2
> Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> Call Trace:
>  dump_stack+0x99/0xcb
>  ubsan_epilogue+0x5/0x40
>  __ubsan_handle_shift_out_of_bounds.cold.11+0xb4/0xf3
>  ? down_write+0x183/0x3d0
>  siw_qp_modify.cold.8+0x2d/0x32 [siw]
>  ? __local_bh_enable_ip+0xa5/0xf0
>  siw_accept+0x906/0x1b60 [siw]
>  ? xa_load+0x147/0x1f0
>  ? siw_connect+0x17a0/0x17a0 [siw]
>  ? lock_downgrade+0x700/0x700
>  ? siw_get_base_qp+0x1c2/0x340 [siw]
>  ? _raw_spin_unlock_irqrestore+0x39/0x40
>  iw_cm_accept+0x1f4/0x430 [iw_cm]
>  rdma_accept+0x3fa/0xb10 [rdma_cm]
>  ? check_flush_dependency+0x410/0x410
>  ? cma_rep_recv+0x570/0x570 [rdma_cm]
>  nvmet_rdma_queue_connect+0x1a62/0x2680 [nvmet_rdma]
>  ? nvmet_rdma_alloc_cmds+0xce0/0xce0 [nvmet_rdma]
>  ? lock_release+0x56e/0xcc0
>  ? lock_downgrade+0x700/0x700
>  ? lock_downgrade+0x700/0x700
>  ? __xa_alloc_cyclic+0xef/0x350
>  ? __xa_alloc+0x2d0/0x2d0
>  ? rdma_restrack_add+0xbe/0x2c0 [ib_core]
>  ? __ww_mutex_die+0x190/0x190
>  cma_cm_event_handler+0xf2/0x500 [rdma_cm]
>  iw_conn_req_handler+0x910/0xcb0 [rdma_cm]
>  ? _raw_spin_unlock_irqrestore+0x39/0x40
>  ? trace_hardirqs_on+0x1c/0x150
>  ? cma_ib_handler+0x8a0/0x8a0 [rdma_cm]
>  ? __kasan_kmalloc.constprop.7+0xc1/0xd0
>  cm_work_handler+0x121c/0x17a0 [iw_cm]
>  ? iw_cm_reject+0x190/0x190 [iw_cm]
>  ? trace_hardirqs_on+0x1c/0x150
>  process_one_work+0x8fb/0x16c0
>  ? pwq_dec_nr_in_flight+0x320/0x320
>  worker_thread+0x87/0xb40
>  ? __kthread_parkme+0xd1/0x1a0
>  ? process_one_work+0x16c0/0x16c0
>  kthread+0x35f/0x430
>  ? kthread_mod_delayed_work+0x180/0x180
>  ret_from_fork+0x22/0x30
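
As a side note for anyone reproducing the splat outside the kernel: the
non-constant __roundup_pow_of_two() in include/linux/log2.h boils down
to 1UL << fls_long(n - 1), so n == 0 wraps to ULONG_MAX and requests
exactly the 64-bit shift UBSAN complains about. A minimal userspace
sketch (my own stand-in code, not the kernel's):

  #include <stdio.h>

  /* userspace sketch of __roundup_pow_of_two(): 1UL << fls_long(n - 1) */
  static unsigned long roundup_pow_of_two_sketch(unsigned long n)
  {
      /* fls_long(x): 1-based index of the most significant set bit */
      int fls = (n - 1) ? 64 - __builtin_clzl(n - 1) : 0;

      /* For n == 0, n - 1 wraps to ULONG_MAX, fls becomes 64, and
       * 1UL << 64 is the "shift exponent 64 is too large" undefined
       * behavior reported above.
       */
      return 1UL << fls;
  }

  int main(void)
  {
      printf("%lu\n", roundup_pow_of_two_sketch(5)); /* prints 8 */
      printf("%lu\n", roundup_pow_of_two_sketch(0)); /* UB trips here */
      return 0;
  }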
> 
> Fixes: a531975279f3 ("rdma/siw: main include file")
> Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
> Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
> Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
> Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
> Reported-by: Kamal Heib
> Reported-by: Yi Zhang
> Reported-by: kernel test robot
> Signed-off-by: Bernard Metzler

Tested-by: Kamal Heib

> ---
> v2 changes:
> - Fix uninitialized variable introduced in siw_qp_rx.c, as
>   Reported-by: kernel test robot
> - Add initial error report as
>   Reported-by: Kamal Heib
> 
> v3 changes:
> - correct patch changelog location
> - remove prints for failed queue mallocs, as pointed out by
>   Leon Romanovsky
> 
>  drivers/infiniband/sw/siw/siw.h       |  2 +-
>  drivers/infiniband/sw/siw/siw_qp.c    | 52 ++++++++++++++++-----------
>  drivers/infiniband/sw/siw/siw_qp_rx.c | 26 +++++++++-----
>  drivers/infiniband/sw/siw/siw_qp_tx.c |  4 +--
>  drivers/infiniband/sw/siw/siw_verbs.c | 20 +++++++----
>  5 files changed, 66 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
> index e9753831ac3f..6f17392f975a 100644
> --- a/drivers/infiniband/sw/siw/siw.h
> +++ b/drivers/infiniband/sw/siw/siw.h
> @@ -654,7 +654,7 @@ static inline struct siw_sqe *orq_get_free(struct siw_qp *qp)
>  {
>  	struct siw_sqe *orq_e = orq_get_tail(qp);
>  
> -	if (orq_e && READ_ONCE(orq_e->flags) == 0)
> +	if (READ_ONCE(orq_e->flags) == 0)
>  		return orq_e;
>  
>  	return NULL;
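
One remark on the siw.h hunk above: with the orq_e NULL test gone,
orq_get_free() touches the ORQ ring unconditionally, so the checking
burden moves to the callers. Every call site now follows the shape
below (a schematic restatement of the siw_qp.c hunk that follows, not
new code):

  struct siw_sqe *rreq = NULL;

  if (qp->attrs.orq_size)      /* ORQ negotiated non-zero */
      rreq = orq_get_free(qp); /* only then touch the ring */
  if (rreq) {
      /* ... consume the free ORQ slot ... */
  }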
> diff --git a/drivers/infiniband/sw/siw/siw_qp.c b/drivers/infiniband/sw/siw/siw_qp.c
> index 875d36d4b1c6..05fb289c1819 100644
> --- a/drivers/infiniband/sw/siw/siw_qp.c
> +++ b/drivers/infiniband/sw/siw/siw_qp.c
> @@ -199,26 +199,26 @@ void siw_qp_llp_write_space(struct sock *sk)
>  
>  static int siw_qp_readq_init(struct siw_qp *qp, int irq_size, int orq_size)
>  {
> -	irq_size = roundup_pow_of_two(irq_size);
> -	orq_size = roundup_pow_of_two(orq_size);
> -
> -	qp->attrs.irq_size = irq_size;
> -	qp->attrs.orq_size = orq_size;
> -
> -	qp->irq = vzalloc(irq_size * sizeof(struct siw_sqe));
> -	if (!qp->irq) {
> -		siw_dbg_qp(qp, "irq malloc for %d failed\n", irq_size);
> -		qp->attrs.irq_size = 0;
> -		return -ENOMEM;
> +	if (irq_size) {
> +		irq_size = roundup_pow_of_two(irq_size);
> +		qp->irq = vzalloc(irq_size * sizeof(struct siw_sqe));
> +		if (!qp->irq) {
> +			qp->attrs.irq_size = 0;
> +			return -ENOMEM;
> +		}
>  	}
> -	qp->orq = vzalloc(orq_size * sizeof(struct siw_sqe));
> -	if (!qp->orq) {
> -		siw_dbg_qp(qp, "orq malloc for %d failed\n", orq_size);
> -		qp->attrs.orq_size = 0;
> -		qp->attrs.irq_size = 0;
> -		vfree(qp->irq);
> -		return -ENOMEM;
> +	if (orq_size) {
> +		orq_size = roundup_pow_of_two(orq_size);
> +		qp->orq = vzalloc(orq_size * sizeof(struct siw_sqe));
> +		if (!qp->orq) {
> +			qp->attrs.orq_size = 0;
> +			qp->attrs.irq_size = 0;
> +			vfree(qp->irq);
> +			return -ENOMEM;
> +		}
>  	}
> +	qp->attrs.irq_size = irq_size;
> +	qp->attrs.orq_size = orq_size;
>  	siw_dbg_qp(qp, "ORD %d, IRD %d\n", orq_size, irq_size);
>  	return 0;
>  }
> @@ -288,13 +288,14 @@ int siw_qp_mpa_rts(struct siw_qp *qp, enum mpa_v2_ctrl ctrl)
>  	if (ctrl & MPA_V2_RDMA_WRITE_RTR)
>  		wqe->sqe.opcode = SIW_OP_WRITE;
>  	else if (ctrl & MPA_V2_RDMA_READ_RTR) {
> -		struct siw_sqe *rreq;
> +		struct siw_sqe *rreq = NULL;
>  
>  		wqe->sqe.opcode = SIW_OP_READ;
>  
>  		spin_lock(&qp->orq_lock);
>  
> -		rreq = orq_get_free(qp);
> +		if (qp->attrs.orq_size)
> +			rreq = orq_get_free(qp);
>  		if (rreq) {
>  			siw_read_to_orq(rreq, &wqe->sqe);
>  			qp->orq_put++;
> @@ -889,6 +890,9 @@ int siw_activate_tx(struct siw_qp *qp)
>  	struct siw_wqe *wqe = tx_wqe(qp);
>  	int rv = 1;
>  
> +	if (!qp->attrs.irq_size)
> +		goto no_irq;
> +
>  	irqe = &qp->irq[qp->irq_get % qp->attrs.irq_size];
>  
>  	if (irqe->flags & SIW_WQE_VALID) {
> @@ -933,6 +937,7 @@ int siw_activate_tx(struct siw_qp *qp)
>  
>  		goto out;
>  	}
> +no_irq:
>  	sqe = sq_get_next(qp);
>  	if (sqe) {
>  skip_irq:
> @@ -971,7 +976,7 @@ int siw_activate_tx(struct siw_qp *qp)
>  		}
>  		spin_lock(&qp->orq_lock);
>  
> -		if (!siw_orq_empty(qp)) {
> +		if (qp->attrs.orq_size && !siw_orq_empty(qp)) {
>  			qp->tx_ctx.orq_fence = 1;
>  			rv = 0;
>  		}
> @@ -981,6 +986,11 @@ int siw_activate_tx(struct siw_qp *qp)
>  		   wqe->sqe.opcode == SIW_OP_READ_LOCAL_INV) {
>  		struct siw_sqe *rreq;
>  
> +		if (unlikely(!qp->attrs.orq_size)) {
> +			/* We negotiated not to send READ req's */
> +			rv = -EINVAL;
> +			goto out;
> +		}
>  		wqe->sqe.num_sge = 1;
>  
>  		spin_lock(&qp->orq_lock);
> diff --git a/drivers/infiniband/sw/siw/siw_qp_rx.c b/drivers/infiniband/sw/siw/siw_qp_rx.c
> index 4bd1f1f84057..60116f20653c 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_rx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_rx.c
> @@ -680,6 +680,10 @@ static int siw_init_rresp(struct siw_qp *qp, struct siw_rx_stream *srx)
>  	}
>  	spin_lock_irqsave(&qp->sq_lock, flags);
>  
> +	if (unlikely(!qp->attrs.irq_size)) {
> +		run_sq = 0;
> +		goto error_irq;
> +	}
>  	if (tx_work->wr_status == SIW_WR_IDLE) {
>  		/*
>  		 * immediately schedule READ response w/o
> @@ -712,8 +716,9 @@ static int siw_init_rresp(struct siw_qp *qp, struct siw_rx_stream *srx)
>  		/* RRESP now valid as current TX wqe or placed into IRQ */
>  		smp_store_mb(resp->flags, SIW_WQE_VALID);
>  	} else {
> -		pr_warn("siw: [QP %u]: irq %d exceeded %d\n", qp_id(qp),
> -			qp->irq_put % qp->attrs.irq_size, qp->attrs.irq_size);
> +error_irq:
> +		pr_warn("siw: [QP %u]: IRQ exceeded or null, size %d\n",
> +			qp_id(qp), qp->attrs.irq_size);
>  
>  		siw_init_terminate(qp, TERM_ERROR_LAYER_RDMAP,
>  				   RDMAP_ETYPE_REMOTE_OPERATION,
> @@ -740,6 +745,9 @@ static int siw_orqe_start_rx(struct siw_qp *qp)
>  	struct siw_sqe *orqe;
>  	struct siw_wqe *wqe = NULL;
>  
> +	if (unlikely(!qp->attrs.orq_size))
> +		return -EPROTO;
> +
>  	/* make sure ORQ indices are current */
>  	smp_mb();
>  
> @@ -796,8 +804,8 @@ int siw_proc_rresp(struct siw_qp *qp)
>  	 */
>  	rv = siw_orqe_start_rx(qp);
>  	if (rv) {
> -		pr_warn("siw: [QP %u]: ORQ empty at idx %d\n",
> -			qp_id(qp), qp->orq_get % qp->attrs.orq_size);
> +		pr_warn("siw: [QP %u]: ORQ empty, size %d\n",
> +			qp_id(qp), qp->attrs.orq_size);
>  		goto error_term;
>  	}
>  	rv = siw_rresp_check_ntoh(srx, frx);
> @@ -1290,11 +1298,13 @@ static int siw_rdmap_complete(struct siw_qp *qp, int error)
>  				wc_status);
>  		siw_wqe_put_mem(wqe, SIW_OP_READ);
>  
> -		if (!error)
> +		if (!error) {
>  			rv = siw_check_tx_fence(qp);
> -		else
> -			/* Disable current ORQ eleement */
> -			WRITE_ONCE(orq_get_current(qp)->flags, 0);
> +		} else {
> +			/* Disable current ORQ element */
> +			if (qp->attrs.orq_size)
> +				WRITE_ONCE(orq_get_current(qp)->flags, 0);
> +		}
>  		break;
>  
>  	case RDMAP_RDMA_READ_REQ:
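
The error_irq label in the siw_qp_rx.c hunk lands inside the else
branch of siw_init_rresp(), which is legal C (labels have function
scope) and lets the zero-IRQ case reuse the existing warn-and-terminate
path. Stripped to its control flow (hypothetical names, illustration
only):

  /* sketch of the siw_init_rresp() error-path shape, not siw code */
  static int rresp_sketch(int irq_size, int irq_has_room)
  {
      if (!irq_size)
          goto error_irq; /* no IRQ at all: jump to shared error path */

      if (irq_has_room) {
          /* place the READ response into the IRQ */
          return 0;
      } else {
  error_irq:
          /* "IRQ exceeded or null": warn, then terminate connection */
          return -1;
      }
  }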
>  		 */
> -		if (tx_type == SIW_OP_READ ||
> -		    tx_type == SIW_OP_READ_LOCAL_INV) {
> +		if ((tx_type == SIW_OP_READ ||
> +		     tx_type == SIW_OP_READ_LOCAL_INV) && qp->attrs.orq_size) {
>  			/* Cleanup pending entry in ORQ */
>  			qp->orq_put--;
>  			qp->orq[qp->orq_put % qp->attrs.orq_size].flags = 0;
> diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
> index 7cf3242ffb41..fb25e8011f5a 100644
> --- a/drivers/infiniband/sw/siw/siw_verbs.c
> +++ b/drivers/infiniband/sw/siw/siw_verbs.c
> @@ -362,13 +362,23 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
>  	if (rv)
>  		goto err_out;
>  
> +	num_sqe = attrs->cap.max_send_wr;
> +	num_rqe = attrs->cap.max_recv_wr;
> +
>  	/* All queue indices are derived from modulo operations
>  	 * on a free running 'get' (consumer) and 'put' (producer)
>  	 * unsigned counter. Having queue sizes at power of two
>  	 * avoids handling counter wrap around.
>  	 */
> -	num_sqe = roundup_pow_of_two(attrs->cap.max_send_wr);
> -	num_rqe = roundup_pow_of_two(attrs->cap.max_recv_wr);
> +	if (num_sqe)
> +		num_sqe = roundup_pow_of_two(num_sqe);
> +	else {
> +		/* Zero sized SQ is not supported */
> +		rv = -EINVAL;
> +		goto err_out;
> +	}
> +	if (num_rqe)
> +		num_rqe = roundup_pow_of_two(num_rqe);
>  
>  	if (udata)
>  		qp->sendq = vmalloc_user(num_sqe * sizeof(struct siw_sqe));
> @@ -376,7 +386,6 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
>  		qp->sendq = vzalloc(num_sqe * sizeof(struct siw_sqe));
>  
>  	if (qp->sendq == NULL) {
> -		siw_dbg(base_dev, "SQ size %d alloc failed\n", num_sqe);
>  		rv = -ENOMEM;
>  		goto err_out_xa;
>  	}
> @@ -410,7 +419,6 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd,
>  		qp->recvq = vzalloc(num_rqe * sizeof(struct siw_rqe));
>  
>  	if (qp->recvq == NULL) {
> -		siw_dbg(base_dev, "RQ size %d alloc failed\n", num_rqe);
>  		rv = -ENOMEM;
>  		goto err_out_xa;
>  	}
> @@ -960,9 +968,9 @@ int siw_post_receive(struct ib_qp *base_qp, const struct ib_recv_wr *wr,
>  	unsigned long flags;
>  	int rv = 0;
>  
> -	if (qp->srq) {
> +	if (qp->srq || qp->attrs.rq_size == 0) {
>  		*bad_wr = wr;
> -		return -EOPNOTSUPP; /* what else from errno.h? */
> +		return -EINVAL;
>  	}
>  	if (!rdma_is_kernel_res(&qp->base_qp.res)) {
>  		siw_dbg_qp(qp, "no kernel post_recv for user mapped rq\n");
> -- 
> 2.17.2
> 
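
To close the loop on the comment quoted in siw_create_qp() above: the
queues are rings addressed by free-running get/put counters, and a
power-of-two size is what keeps "counter % size" correct across u32
wrap-around, since the size then divides 2^32 evenly. A size of zero
would make both the rounding (shift by 64) and the indexing (modulo by
zero) undefined, which is why the patch rejects a zero-sized SQ
outright and guards every IRQ/ORQ access with a size check. A
standalone sketch of the scheme, with made-up names:

  #include <stdbool.h>
  #include <stdint.h>

  #define RING_SIZE 8u /* must be a non-zero power of two */

  struct ring {
      uint32_t get;    /* consumer counter, free running, never reset */
      uint32_t put;    /* producer counter, free running, never reset */
      int slot[RING_SIZE];
  };

  static bool ring_push(struct ring *r, int v)
  {
      if (r->put - r->get == RING_SIZE)  /* wrap-safe fill level */
          return false;                  /* ring is full */
      r->slot[r->put++ % RING_SIZE] = v; /* modulo does the wrap */
      return true;
  }

  static bool ring_pop(struct ring *r, int *v)
  {
      if (r->put == r->get)              /* ring is empty */
          return false;
      *v = r->slot[r->get++ % RING_SIZE];
      return true;
  }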