Date: Sun, 2 Jul 2023 10:28:26 +0800
From: Zhu Yanjun
To: Bob Pearson, jgg@nvidia.com, zyjzyj2000@gmail.com, frank.zago@hpe.com,
 ian.ziemba@hpe.com, jhack@hpe.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next] RDMA/rxe: Fix freeing busy objects
Message-ID: <59531bbf-7842-3fea-debf-08785b59bf64@linux.dev>
References: <20230630163827.95373-1-rpearsonhpe@gmail.com>
 <83abcbad-64f4-6829-64e4-20ce55eb6ab4@gmail.com>
 <77743769-ae5b-c174-e6f7-bb96066a250d@linux.dev>
 <9519b3cd-ad46-3bb4-4bba-3df04644215e@gmail.com>
X-Mailing-List: linux-rdma@vger.kernel.org

On 2023/7/2 10:00, Zhu Yanjun wrote:
>
> On 2023/7/2 1:59, Bob Pearson wrote:
>> On 7/1/23 10:37, Zhu Yanjun wrote:
>>> On 2023/7/1 22:48, Bob Pearson wrote:
>>>> On 7/1/23 01:50, Zhu Yanjun wrote:
>>>>> On 2023/7/1 0:38, Bob Pearson wrote:
>>>>>> Currently the rxe driver calls wait_for_completion_timeout() in
>>>>>> rxe_complete() to wait until all the references to the object have
>>>>>> been freed before final cleanup and returning to the rdma-core
>>>>>> destroy verb. If this does not happen within the timeout interval
>>>>>> it prints a WARN_ON and returns to the 'destroy' verb caller
>>>>>> without any indication that there was an error. This is incorrect.
>>>>>>
>>>>>> A very heavily loaded system can take an arbitrarily long time to
>>>>>> complete the work needed before freeing all the references, with no
>>>>>> guarantee of performance within a specific time. This has been
>>>>>> observed in high-stress, high-scale testing of the rxe driver.
>>>>>>
>>>>>> Another frequent cause of these timeouts is ref counting bugs
>>>>>> introduced by changes in the driver, so it is helpful to continue
>>>>>> to report the timeouts.
>>>>>>
>>>>>> This patch puts the completion timeout call in a loop with a 10
>>>>>> second timeout and issues a WARN_ON each pass through the loop.
>>>>>> The case for objects that cannot sleep is treated similarly. It
>>>>>> also changes the type of the rxe_cleanup() subroutine to void and
>>>>>> fixes the calls to reflect this API change. This is better aligned
>>>>>> with the code in rdma-core, which sometimes fails to check the
>>>>>> return value of destroy verb calls, assuming they will always
>>>>>> succeed. Specifically, this is the case for kernel qp's.
>>>>> Hi, Bob
>>>>>
>>>>> You change the timeout to 10s in this commit. Based on
>>>>> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
>>>>>
>>>>> can you describe the pros and cons of this change?
>>>>> If there is data to support your change, even better.
>>>> The problem is that a heavily loaded system can take a long time to
>>>> complete work. This is not a bug per se, so it would be wrong to
>>>> terminate IO in progress. But on the other hand it is most likely
>>>> not going to recover, so it helps to report the condition.
>>>> This patch replaces an error exit with a wait that lasts as long as
>>>> necessary, or until someone intervenes and reboots the system. The
>>>> previous timeout code issued a WARN_ON after about 0.2 seconds and
>>>> then exited, leaving a messy situation that wasn't going to get
>>>> fixed. The new version issues a WARN_ON every 10 seconds until the
>>>> operation succeeds or the system is rebooted. This is long enough
>>>> that it won't clog the logs but often enough to get noticed.
>>>
>>> To fix this problem, you change too much.
>>>
>>> I have received training from different companies. In that training,
>>> an important coding principle is: modify the minimum amount of
>>> source code needed to fix a problem.
>>>
>>> Too many changes introduce risk. That applies to commercial software
>>> products; I am not sure whether it holds for an open source project
>>> or not.
>>>
>>> And if we do need to change this much, according to
>>> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes,
>>> it would be better to split the big commit into several commits.
>>>
>>> One commit should fix one problem.
>> This isn't a complicated patch. Basically, because rdma-core mostly
>> ignores return codes from ib_destroy_qp, I changed the rxe_cleanup
>> function to return void. This required changing the calls in
>> rxe_verbs.c to not check the return code, a trivial change. Finally,
>> since I can't return an error, I just wait in a loop calling
>> wait_for_completion_timeout(). That's it. There is one change; the
>> rest is required to adapt to it.
>
> As a viewer, I want to know:
>
> 1. Does just changing the timeout value from 0.2 seconds to 10
> seconds solve the problem? If yes, changing the timeout value alone
> can fix the problem.
>
> So I think that can be one commit.
>
> 2. Based on the first commit: because rdma-core mostly ignores return
> codes from ib_destroy_qp, the rxe_cleanup function is changed to
> return void.
>
> This required changing the calls in rxe_verbs.c to not check the
> return code, a trivial change. Can this be the 2nd commit?
>
> 3. Finally, since an error can't be returned, just wait in a loop
> calling wait_for_completion_timeout(). Can this be the 3rd commit?
>
> Just my 2 cents.
>
> Your commit can fix this problem. Thanks a lot.

I think this commit is OK. Let us wait for the comments from Jason.

Zhu Yanjun

> Reviewed-by: Zhu Yanjun
>
> Zhu Yanjun
>
>>> Just my 2 cents.
>>>
>>> Zhu Yanjun
>>>
>>>> If rdma-core handled error returns from ib_destroy_qp correctly, a
>>>> cleaner solution might be possible, but it doesn't. This approach
>>>> is similar to the soft lockup warnings from Linux.
>>>>
>>>> This change is a result of very high load stress testing causing
>>>> the current version to fail.
>>>>
>>>> Bob
>>>>> Thanks
>>>>> Zhu Yanjun
>>>>>
>>>>>> Fixes: 215d0a755e1b ("RDMA/rxe: Stop lookup of partially built objects")
>>>>>> Signed-off-by: Bob Pearson
>>>>>> ---
>>>>>>  drivers/infiniband/sw/rxe/rxe_pool.c  |  39 ++++------
>>>>>>  drivers/infiniband/sw/rxe/rxe_pool.h  |   2 +-
>>>>>>  drivers/infiniband/sw/rxe/rxe_verbs.c | 108 ++++++++++----------------
>>>>>>  3 files changed, 56 insertions(+), 93 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>>>>>> index 6215c6de3a84..819dc30a7a96 100644
>>>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>>>>>> @@ -6,7 +6,7 @@
>>>>>>  
>>>>>>  #include "rxe.h"
>>>>>>  
>>>>>> -#define RXE_POOL_TIMEOUT	(200)
>>>>>> +#define RXE_POOL_TIMEOUT	(10000)	/* 10 seconds */
>>>>>>  #define RXE_POOL_ALIGN		(16)
>>>>>>  
>>>>>>  static const struct rxe_type_info {
>>>>>> @@ -171,15 +171,16 @@ static void rxe_elem_release(struct kref *kref)
>>>>>>  {
>>>>>>  	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
>>>>>>  
>>>>>> -	complete(&elem->complete);
>>>>>> +	complete_all(&elem->complete);
>>>>>>  }
>>>>>>  
>>>>>> -int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>>>>>> +void __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>>>>>>  {
>>>>>>  	struct rxe_pool *pool = elem->pool;
>>>>>>  	struct xarray *xa = &pool->xa;
>>>>>> -	static int timeout = RXE_POOL_TIMEOUT;
>>>>>> -	int ret, err = 0;
>>>>>> +	int timeout = RXE_POOL_TIMEOUT;
>>>>>> +	unsigned long until;
>>>>>> +	int ret;
>>>>>>  	void *xa_ret;
>>>>>>  
>>>>>>  	if (sleepable)
>>>>>> @@ -202,39 +203,31 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
>>>>>>  	 * return to rdma-core
>>>>>>  	 */
>>>>>>  	if (sleepable) {
>>>>>> -		if (!completion_done(&elem->complete) && timeout) {
>>>>>> +		while (!completion_done(&elem->complete) && timeout) {
>>>>>>  			ret = wait_for_completion_timeout(&elem->complete,
>>>>>>  					timeout);
>>>>>> -
>>>>>> -			/* Shouldn't happen. There are still references to
>>>>>> -			 * the object but, rather than deadlock, free the
>>>>>> -			 * object or pass back to rdma-core.
>>>>>> -			 */
>>>>>> -			if (WARN_ON(!ret))
>>>>>> -				err = -EINVAL;
>>>>>> +			WARN_ON(!ret);
>>>>>>  		}
>>>>>>  	} else {
>>>>>> -		unsigned long until = jiffies + timeout;
>>>>>> -
>>>>>>  		/* AH objects are unique in that the destroy_ah verb
>>>>>>  		 * can be called in atomic context. This delay
>>>>>>  		 * replaces the wait_for_completion call above
>>>>>>  		 * when the destroy_ah call is not sleepable
>>>>>>  		 */
>>>>>> -		while (!completion_done(&elem->complete) &&
>>>>>> -				time_before(jiffies, until))
>>>>>> -			mdelay(1);
>>>>>> -
>>>>>> -		if (WARN_ON(!completion_done(&elem->complete)))
>>>>>> -			err = -EINVAL;
>>>>>> +		while (!completion_done(&elem->complete) && timeout) {
>>>>>> +			until = jiffies + timeout;
>>>>>> +			while (!completion_done(&elem->complete) &&
>>>>>> +			       time_before(jiffies, until)) {
>>>>>> +				mdelay(10);
>>>>>> +			}
>>>>>> +			WARN_ON(!completion_done(&elem->complete));
>>>>>> +		}
>>>>>>  	}
>>>>>>  
>>>>>>  	if (pool->cleanup)
>>>>>>  		pool->cleanup(elem);
>>>>>>  
>>>>>>  	atomic_dec(&pool->num_elem);
>>>>>> -
>>>>>> -	return err;
>>>>>>  }
>>>>>>  
>>>>>>  int __rxe_get(struct rxe_pool_elem *elem)
>>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
>>>>>> index b42e26427a70..14facdb45aad 100644
>>>>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.h
>>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.h
>>>>>> @@ -70,7 +70,7 @@ int __rxe_get(struct rxe_pool_elem *elem);
>>>>>>  int __rxe_put(struct rxe_pool_elem *elem);
>>>>>>  #define rxe_put(obj) __rxe_put(&(obj)->elem)
>>>>>>  
>>>>>> -int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable);
>>>>>> +void __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable);
>>>>>>  #define rxe_cleanup(obj) __rxe_cleanup(&(obj)->elem, true)
>>>>>>  #define rxe_cleanup_ah(obj, sleepable) __rxe_cleanup(&(obj)->elem, sleepable)
>>>>>>  
>>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
>>>>>> index f4321a172000..a5e639ee2217 100644
>>>>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
>>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
>>>>>> @@ -218,11 +218,8 @@ static int rxe_alloc_ucontext(struct ib_ucontext *ibuc, struct ib_udata *udata)
>>>>>>  static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
>>>>>>  {
>>>>>>  	struct rxe_ucontext *uc = to_ruc(ibuc);
>>>>>> -	int err;
>>>>>>  
>>>>>> -	err = rxe_cleanup(uc);
>>>>>> -	if (err)
>>>>>> -		rxe_err_uc(uc, "cleanup failed, err = %d", err);
>>>>>> +	rxe_cleanup(uc);
>>>>>>  }
>>>>>>  
>>>>>>  /* pd */
>>>>>> @@ -248,11 +245,8 @@ static int rxe_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
>>>>>>  static int rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
>>>>>>  {
>>>>>>  	struct rxe_pd *pd = to_rpd(ibpd);
>>>>>> -	int err;
>>>>>>  
>>>>>> -	err = rxe_cleanup(pd);
>>>>>> -	if (err)
>>>>>> -		rxe_err_pd(pd, "cleanup failed, err = %d", err);
>>>>>> +	rxe_cleanup(pd);
>>>>>>  
>>>>>>  	return 0;
>>>>>>  }
>>>>>> @@ -265,7 +259,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
>>>>>>  	struct rxe_dev *rxe = to_rdev(ibah->device);
>>>>>>  	struct rxe_ah *ah = to_rah(ibah);
>>>>>>  	struct rxe_create_ah_resp __user *uresp = NULL;
>>>>>> -	int err, cleanup_err;
>>>>>> +	int err;
>>>>>>  
>>>>>>  	if (udata) {
>>>>>>  		/* test if new user provider */
>>>>>> @@ -312,9 +306,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
>>>>>>  	return 0;
>>>>>>  
>>>>>>  err_cleanup:
>>>>>> -	cleanup_err = rxe_cleanup(ah);
>>>>>> -	if (cleanup_err)
>>>>>> -		rxe_err_ah(ah, "cleanup failed, err = %d", cleanup_err);
>>>>>> +	rxe_cleanup(ah);
>>>>>>  err_out:
>>>>>>  	rxe_err_ah(ah, "returned err = %d", err);
>>>>>>  	return err;
>>>>>> @@ -354,11 +346,8 @@ static int rxe_query_ah(struct ib_ah *ibah, struct rdma_ah_attr *attr)
>>>>>>  static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
>>>>>>  {
>>>>>>  	struct rxe_ah *ah = to_rah(ibah);
>>>>>> -	int err;
>>>>>>  
>>>>>> -	err = rxe_cleanup_ah(ah, flags & RDMA_DESTROY_AH_SLEEPABLE);
>>>>>> -	if (err)
>>>>>> -		rxe_err_ah(ah, "cleanup failed, err = %d", err);
>>>>>> +	rxe_cleanup_ah(ah, flags & RDMA_DESTROY_AH_SLEEPABLE);
>>>>>>  
>>>>>>  	return 0;
>>>>>>  }
>>>>>> @@ -371,12 +360,12 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
>>>>>>  	struct rxe_pd *pd = to_rpd(ibsrq->pd);
>>>>>>  	struct rxe_srq *srq = to_rsrq(ibsrq);
>>>>>>  	struct rxe_create_srq_resp __user *uresp = NULL;
>>>>>> -	int err, cleanup_err;
>>>>>> +	int err;
>>>>>>  
>>>>>>  	if (udata) {
>>>>>>  		if (udata->outlen < sizeof(*uresp)) {
>>>>>>  			err = -EINVAL;
>>>>>> -			rxe_err_dev(rxe, "malformed udata");
>>>>>> +			rxe_dbg_dev(rxe, "malformed udata");
>>>>>>  			goto err_out;
>>>>>>  		}
>>>>>>  		uresp = udata->outbuf;
>>>>>> @@ -413,9 +402,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
>>>>>>  	return 0;
>>>>>>  
>>>>>> err_cleanup:
>>>>>> -	cleanup_err = rxe_cleanup(srq);
>>>>>> -	if (cleanup_err)
>>>>>> -		rxe_err_srq(srq, "cleanup failed, err = %d", cleanup_err);
>>>>>> +	rxe_cleanup(srq);
>>>>>> err_out:
>>>>>>  	rxe_err_dev(rxe, "returned err = %d", err);
>>>>>>  	return err;
>>>>>> @@ -514,11 +501,8 @@ static int rxe_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
>>>>>>  static int rxe_destroy_srq(struct ib_srq *ibsrq, struct ib_udata *udata)
>>>>>>  {
>>>>>>  	struct rxe_srq *srq = to_rsrq(ibsrq);
>>>>>> -	int err;
>>>>>>  
>>>>>> -	err = rxe_cleanup(srq);
>>>>>> -	if (err)
>>>>>> -		rxe_err_srq(srq, "cleanup failed, err = %d", err);
>>>>>> +	rxe_cleanup(srq);
>>>>>>  
>>>>>>  	return 0;
>>>>>>  }
>>>>>> @@ -531,7 +515,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
>>>>>>  	struct rxe_pd *pd = to_rpd(ibqp->pd);
>>>>>>  	struct rxe_qp *qp = to_rqp(ibqp);
>>>>>>  	struct rxe_create_qp_resp __user *uresp = NULL;
>>>>>> -	int err, cleanup_err;
>>>>>> +	int err;
>>>>>>  
>>>>>>  	if (udata) {
>>>>>>  		if (udata->inlen) {
>>>>>> @@ -580,9 +564,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
>>>>>>  	return 0;
>>>>>>  
>>>>>> err_cleanup:
>>>>>> -	cleanup_err = rxe_cleanup(qp);
>>>>>> -	if (cleanup_err)
>>>>>> -		rxe_err_qp(qp, "cleanup failed, err = %d", cleanup_err);
>>>>>> +	rxe_cleanup(qp);
>>>>>> err_out:
>>>>>>  	rxe_err_dev(rxe, "returned err = %d", err);
>>>>>>  	return err;
>>>>>> @@ -648,9 +630,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
>>>>>>  		goto err_out;
>>>>>>  	}
>>>>>>  
>>>>>> -	err = rxe_cleanup(qp);
>>>>>> -	if (err)
>>>>>> -		rxe_err_qp(qp, "cleanup failed, err = %d", err);
>>>>>> +	rxe_cleanup(qp);
>>>>>>  
>>>>>>  	return 0;
>>>>>>  
>>>>>> @@ -675,12 +655,12 @@ static int validate_send_wr(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
>>>>>>  	do {
>>>>>>  		mask = wr_opcode_mask(ibwr->opcode, qp);
>>>>>>  		if (!mask) {
>>>>>> -			rxe_err_qp(qp, "bad wr opcode for qp type");
>>>>>> +			rxe_dbg_qp(qp, "bad wr opcode for qp type");
>>>>>>  			break;
>>>>>>  		}
>>>>>>  
>>>>>>  		if (num_sge > sq->max_sge) {
>>>>>> -			rxe_err_qp(qp, "num_sge > max_sge");
>>>>>> +			rxe_dbg_qp(qp, "num_sge > max_sge");
>>>>>>  			break;
>>>>>>  		}
>>>>>>  
>>>>>> @@ -689,27 +669,27 @@ static int validate_send_wr(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
>>>>>>  			length += ibwr->sg_list[i].length;
>>>>>>  
>>>>>>  		if (length > (1UL << 31)) {
>>>>>> -			rxe_err_qp(qp, "message length too long");
>>>>>> +			rxe_dbg_qp(qp, "message length too long");
>>>>>>  			break;
>>>>>>  		}
>>>>>>  
>>>>>>  		if (mask & WR_ATOMIC_MASK) {
>>>>>>  			if (length != 8) {
>>>>>> -				rxe_err_qp(qp, "atomic length != 8");
>>>>>> +				rxe_dbg_qp(qp, "atomic length != 8");
>>>>>>  				break;
>>>>>>  			}
>>>>>>  			if (atomic_wr(ibwr)->remote_addr & 0x7) {
>>>>>> -				rxe_err_qp(qp, "misaligned atomic address");
>>>>>> +				rxe_dbg_qp(qp, "misaligned atomic address");
>>>>>>  				break;
>>>>>>  			}
>>>>>>  		}
>>>>>>  		if (ibwr->send_flags & IB_SEND_INLINE) {
>>>>>>  			if (!(mask & WR_INLINE_MASK)) {
>>>>>> -				rxe_err_qp(qp, "opcode doesn't support inline data");
>>>>>> +				rxe_dbg_qp(qp, "opcode doesn't support inline data");
>>>>>>  				break;
>>>>>>  			}
>>>>>>  			if (length > sq->max_inline) {
>>>>>> -				rxe_err_qp(qp, "inline length too big");
>>>>>> +				rxe_dbg_qp(qp, "inline length too big");
>>>>>>  				break;
>>>>>>  			}
>>>>>>  		}
>>>>>> @@ -747,7 +727,7 @@ static int init_send_wr(struct rxe_qp *qp, struct rxe_send_wr *wr,
>>>>>>  		case IB_WR_SEND:
>>>>>>  			break;
>>>>>>  		default:
>>>>>> -			rxe_err_qp(qp, "bad wr opcode %d for UD/GSI QP",
>>>>>> +			rxe_dbg_qp(qp, "bad wr opcode %d for UD/GSI QP",
>>>>>>  					wr->opcode);
>>>>>>  			return -EINVAL;
>>>>>>  		}
>>>>>> @@ -795,7 +775,7 @@ static int init_send_wr(struct rxe_qp *qp, struct rxe_send_wr *wr,
>>>>>>  		case IB_WR_ATOMIC_WRITE:
>>>>>>  			break;
>>>>>>  		default:
>>>>>> -			rxe_err_qp(qp, "unsupported wr opcode %d",
>>>>>> +			rxe_dbg_qp(qp, "unsupported wr opcode %d",
>>>>>>  					wr->opcode);
>>>>>>  			return -EINVAL;
>>>>>>  			break;
>>>>>> @@ -871,7 +851,7 @@ static int post_one_send(struct rxe_qp *qp, const struct ib_send_wr *ibwr)
>>>>>>  
>>>>>>  	full = queue_full(sq->queue, QUEUE_TYPE_FROM_ULP);
>>>>>>  	if (unlikely(full)) {
>>>>>> -		rxe_err_qp(qp, "send queue full");
>>>>>> +		rxe_dbg_qp(qp, "send queue full");
>>>>>>  		return -ENOMEM;
>>>>>>  	}
>>>>>>  
>>>>>> @@ -923,14 +903,14 @@ static int rxe_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
>>>>>>  	/* caller has already called destroy_qp */
>>>>>>  	if (WARN_ON_ONCE(!qp->valid)) {
>>>>>>  		spin_unlock_irqrestore(&qp->state_lock, flags);
>>>>>> -		rxe_err_qp(qp, "qp has been destroyed");
>>>>>> +		rxe_dbg_qp(qp, "qp has been destroyed");
>>>>>>  		return -EINVAL;
>>>>>>  	}
>>>>>>  
>>>>>>  	if (unlikely(qp_state(qp) < IB_QPS_RTS)) {
>>>>>>  		spin_unlock_irqrestore(&qp->state_lock, flags);
>>>>>>  		*bad_wr = wr;
>>>>>> -		rxe_err_qp(qp, "qp not ready to send");
>>>>>> +		rxe_dbg_qp(qp, "qp not ready to send");
>>>>>>  		return -EINVAL;
>>>>>>  	}
>>>>>>  	spin_unlock_irqrestore(&qp->state_lock, flags);
>>>>>> @@ -997,7 +977,7 @@ static int post_one_recv(struct rxe_rq *rq, const struct ib_recv_wr *ibwr)
>>>>>>  	return 0;
>>>>>>  
>>>>>> err_out:
>>>>>> -	rxe_dbg("returned err = %d", err);
>>>>>> +	rxe_err("returned err = %d", err);
>>>>>>  	return err;
>>>>>>  }
>>>>>>  
>>>>>> @@ -1013,7 +993,7 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
>>>>>>  	/* caller has already called destroy_qp */
>>>>>>  	if (WARN_ON_ONCE(!qp->valid)) {
>>>>>>  		spin_unlock_irqrestore(&qp->state_lock, flags);
>>>>>> -		rxe_err_qp(qp, "qp has been destroyed");
>>>>>> +		rxe_dbg_qp(qp, "qp has been destroyed");
>>>>>>  		return -EINVAL;
>>>>>>  	}
>>>>>>  
>>>>>> @@ -1061,7 +1041,7 @@ static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
>>>>>>  	struct rxe_dev *rxe = to_rdev(dev);
>>>>>>  	struct rxe_cq *cq = to_rcq(ibcq);
>>>>>>  	struct rxe_create_cq_resp __user *uresp = NULL;
>>>>>> -	int err, cleanup_err;
>>>>>> +	int err;
>>>>>>  
>>>>>>  	if (udata) {
>>>>>>  		if (udata->outlen < sizeof(*uresp)) {
>>>>>> @@ -1100,9 +1080,7 @@ static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
>>>>>>  	return 0;
>>>>>>  
>>>>>> err_cleanup:
>>>>>> -	cleanup_err = rxe_cleanup(cq);
>>>>>> -	if (cleanup_err)
>>>>>> -		rxe_err_cq(cq, "cleanup failed, err = %d", cleanup_err);
>>>>>> +	rxe_cleanup(cq);
>>>>>> err_out:
>>>>>>  	rxe_err_dev(rxe, "returned err = %d", err);
>>>>>>  	return err;
>>>>>> @@ -1207,9 +1185,7 @@ static int rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
>>>>>>  		goto err_out;
>>>>>>  	}
>>>>>>  
>>>>>> -	err = rxe_cleanup(cq);
>>>>>> -	if (err)
>>>>>> -		rxe_err_cq(cq, "cleanup failed, err = %d", err);
>>>>>> +	rxe_cleanup(cq);
>>>>>>  
>>>>>>  	return 0;
>>>>>>  
>>>>>> @@ -1257,10 +1233,10 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start,
>>>>>>  	struct rxe_dev *rxe = to_rdev(ibpd->device);
>>>>>>  	struct rxe_pd *pd = to_rpd(ibpd);
>>>>>>  	struct rxe_mr *mr;
>>>>>> -	int err, cleanup_err;
>>>>>> +	int err;
>>>>>>  
>>>>>>  	if (access & ~RXE_ACCESS_SUPPORTED_MR) {
>>>>>> -		rxe_err_pd(pd, "access = %#x not supported (%#x)", access,
>>>>>> +		rxe_dbg_pd(pd, "access = %#x not supported (%#x)", access,
>>>>>>  				RXE_ACCESS_SUPPORTED_MR);
>>>>>>  		return ERR_PTR(-EOPNOTSUPP);
>>>>>>  	}
>>>>>> @@ -1289,9 +1265,7 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start,
>>>>>>  	return &mr->ibmr;
>>>>>>  
>>>>>> err_cleanup:
>>>>>> -	cleanup_err = rxe_cleanup(mr);
>>>>>> -	if (cleanup_err)
>>>>>> -		rxe_err_mr(mr, "cleanup failed, err = %d", cleanup_err);
>>>>>> +	rxe_cleanup(mr);
>>>>>> err_free:
>>>>>>  	kfree(mr);
>>>>>>  	rxe_err_pd(pd, "returned err = %d", err);
>>>>>> @@ -1311,7 +1285,7 @@ static struct ib_mr *rxe_rereg_user_mr(struct ib_mr *ibmr, int flags,
>>>>>>  	 * rereg_pd and rereg_access
>>>>>>  	 */
>>>>>>  	if (flags & ~RXE_MR_REREG_SUPPORTED) {
>>>>>> -		rxe_err_mr(mr, "flags = %#x not supported", flags);
>>>>>> +		rxe_dbg_mr(mr, "flags = %#x not supported", flags);
>>>>>>  		return ERR_PTR(-EOPNOTSUPP);
>>>>>>  	}
>>>>>>  
>>>>>> @@ -1323,7 +1297,7 @@ static struct ib_mr *rxe_rereg_user_mr(struct ib_mr *ibmr, int flags,
>>>>>>  
>>>>>>  	if (flags & IB_MR_REREG_ACCESS) {
>>>>>>  		if (access & ~RXE_ACCESS_SUPPORTED_MR) {
>>>>>> -			rxe_err_mr(mr, "access = %#x not supported", access);
>>>>>> +			rxe_dbg_mr(mr, "access = %#x not supported", access);
>>>>>>  			return ERR_PTR(-EOPNOTSUPP);
>>>>>>  		}
>>>>>>  		mr->access = access;
>>>>>> @@ -1338,7 +1312,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
>>>>>>  	struct rxe_dev *rxe = to_rdev(ibpd->device);
>>>>>>  	struct rxe_pd *pd = to_rpd(ibpd);
>>>>>>  	struct rxe_mr *mr;
>>>>>> -	int err, cleanup_err;
>>>>>> +	int err;
>>>>>>  
>>>>>>  	if (mr_type != IB_MR_TYPE_MEM_REG) {
>>>>>>  		err = -EINVAL;
>>>>>> @@ -1369,9 +1343,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
>>>>>>  	return &mr->ibmr;
>>>>>>  
>>>>>> err_cleanup:
>>>>>> -	cleanup_err = rxe_cleanup(mr);
>>>>>> -	if (cleanup_err)
>>>>>> -		rxe_err_mr(mr, "cleanup failed, err = %d", err);
>>>>>> +	rxe_cleanup(mr);
>>>>>> err_free:
>>>>>>  	kfree(mr);
>>>>>> err_out:
>>>>>> @@ -1382,7 +1354,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
>>>>>>  static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>>>>>>  {
>>>>>>  	struct rxe_mr *mr = to_rmr(ibmr);
>>>>>> -	int err, cleanup_err;
>>>>>> +	int err;
>>>>>>  
>>>>>>  	/* See IBA 10.6.7.2.6 */
>>>>>>  	if (atomic_read(&mr->num_mw) > 0) {
>>>>>> @@ -1391,9 +1363,7 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>>>>>>  		goto err_out;
>>>>>>  	}
>>>>>>  
>>>>>> -	cleanup_err = rxe_cleanup(mr);
>>>>>> -	if (cleanup_err)
>>>>>> -		rxe_err_mr(mr, "cleanup failed, err = %d", cleanup_err);
>>>>>> +	rxe_cleanup(mr);
>>>>>>  
>>>>>>  	kfree_rcu(mr);
>>>>>>  	return 0;
>>>>>> @@ -1524,7 +1494,7 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
>>>>>>  
>>>>>>  	err = ib_register_device(dev, ibdev_name, NULL);
>>>>>>  	if (err)
>>>>>> -		rxe_dbg_dev(rxe, "failed with error %d\n", err);
>>>>>> +		rxe_err_dev(rxe, "failed with error %d\n", err);
>>>>>>  
>>>>>>  	/*
>>>>>>  	 * Note that rxe may be invalid at this point if another thread
>>>>>>
>>>>>> base-commit: 5f004bcaee4cb552cf1b46a505f18f08777db7e5