Subject: Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
To: Kamal Heib
Cc: linux-rdma@vger.kernel.org, Doug Ledford, Jason Gunthorpe
References: <20200812111447.256822-1-kamalheib1@gmail.com> <9701a68d-c377-474a-5f65-c4e045a67e11@gmail.com> <20200816221236.GA821081@kheib-workstation>
In-Reply-To: <20200816221236.GA821081@kheib-workstation>
From: Zhu Yanjun
Date: Tue, 18 Aug 2020 09:48:43 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.1.1
X-Mailing-List: linux-rdma@vger.kernel.org

On 8/17/2020 6:12 AM, Kamal Heib wrote:
> On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
>> On 8/12/2020 7:14 PM, Kamal Heib wrote:
>>> To avoid the following kernel panic when calling kmem_cache_create()
>>> with a NULL pointer from pool_cache(),
>> What is the root cause of this kernel panic?
>>
> The kernel panic is triggered using the following command and it happens
> because the cache is not getting initialized.
>
> modprobe rdma_rxe add=eno1
>
> Thanks,
> Kamal
>
>> Zhu Yanjun
>>
>>> move the rxe_cache_init() to the
>>> context of device creation.
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
>>> PGD 0 P4D 0
>>> Oops: 0000 [#1] SMP NOPTI
>>> CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
>>> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
>>> RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
>>> Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
>>> RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
>>> RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
>>> RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
>>> R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
>>> R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
>>> FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
>>> Call Trace:
>>>  rxe_alloc+0xc8/0x160 [rdma_rxe]
>>>  rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
>>>  __ib_alloc_pd+0xcb/0x160 [ib_core]
>>>  ib_mad_init_device+0x296/0x8b0 [ib_core]
>>>  add_client_context+0x11a/0x160 [ib_core]
>>>  enable_device_and_get+0xdc/0x1d0 [ib_core]
>>>  ib_register_device+0x572/0x6b0 [ib_core]
>>>  ? crypto_create_tfm+0x32/0xe0
>>>  ? crypto_create_tfm+0x7a/0xe0
>>>  ? crypto_alloc_tfm+0x58/0xf0
>>>  rxe_register_device+0x19d/0x1c0 [rdma_rxe]
>>>  rxe_net_add+0x3d/0x70 [rdma_rxe]
>>>  ? dev_get_by_name_rcu+0x73/0x90
>>>  rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
>>>  parse_args+0x179/0x370
>>>  ? ref_module+0x1b0/0x1b0
>>>  load_module+0x135e/0x17e0
>>>  ? ref_module+0x1b0/0x1b0
>>>  ? __do_sys_init_module+0x13b/0x180
>>>  __do_sys_init_module+0x13b/0x180
>>>  do_syscall_64+0x5b/0x1a0
>>>  entry_SYSCALL_64_after_hwframe+0x65/0xca
>>> RIP: 0033:0x7f9137ed296e
>>>
>>> Fixes: 8700e3e7c485 ("Soft RoCE driver")
>>> Signed-off-by: Kamal Heib
>>> ---
>>>  drivers/infiniband/sw/rxe/rxe.c       | 14 +++++++-------
>>>  drivers/infiniband/sw/rxe/rxe_pool.c  |  3 +++
>>>  drivers/infiniband/sw/rxe/rxe_sysfs.c |  7 +++++++
>>>  3 files changed, 17 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>>> index 5642eefb4ba1..60d5086dd34d 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>>> @@ -318,6 +318,13 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>>>  		goto err;
>>>  	}
>>>
>>> +	/* initialize slab caches for managed objects */
>>> +	err = rxe_cache_init();
>>> +	if (err) {
>>> +		pr_err("unable to init object pools\n");
>>> +		goto err;
>>> +	}
>>> +
>>>  	err = rxe_net_add(ibdev_name, ndev);
>>>  	if (err) {
>>>  		pr_err("failed to add %s\n", ndev->name);
>>> @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
>>>  {
>>>  	int err;
>>>
>>> -	/* initialize slab caches for managed objects */
>>> -	err = rxe_cache_init();

When rdma_rxe is loaded with modprobe, rxe_module_init() should be called,
and rxe_cache_init() should then be called as well. Why does the above call
trace occur?

Zhu Yanjun

>>> -	if (err) {
>>> -		pr_err("unable to init object pools\n");
>>> -		return err;
>>> -	}
>>> -
>>>  	err = rxe_net_init();
>>>  	if (err)
>>>  		return err;
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>>> index fbcbac52290b..06c6d1f835b7 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>>> @@ -139,6 +139,9 @@ int rxe_cache_init(void)
>>>  	for (i = 0; i < RXE_NUM_TYPES; i++) {
>>>  		type = &rxe_type_info[i];
>>>  		size = ALIGN(type->size, RXE_POOL_ALIGN);
>>> +		if (type->cache)
>>> +			continue;
>>> +
>>>  		if (!(type->flags & RXE_POOL_NO_ALLOC)) {
>>>  			type->cache =
>>>  				kmem_cache_create(type->name, size,
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c
>>> index ccda5f5a3bc0..d0af48ba0110 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_sysfs.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c
>>> @@ -81,6 +81,13 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
>>>  		goto err;
>>>  	}
>>>
>>> +	/* initialize slab caches for managed objects */
>>> +	err = rxe_cache_init();
>>> +	if (err) {
>>> +		pr_err("unable to init object pools\n");
>>> +		goto err;
>>> +	}
>>> +
>>>  	err = rxe_net_add("rxe%d", ndev);
>>>  	if (err) {
>>>  		pr_err("failed to add %s\n", intf);
>>
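
One possible reading of the call trace, offered as a sketch rather than a
confirmed diagnosis: rxe_param_set_add() appears under parse_args() and
load_module(), which suggests the add= handler runs while the module
arguments are parsed, before rxe_module_init() has had a chance to call
rxe_cache_init(). The user-space C model below only illustrates that
ordering and the "skip if already created" guard from the patch. It is not
kernel code, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for type->cache; NULL until initialized. */
const char *model_cache;
/* Counts how many times the cache was really created. */
int model_cache_inits;

/* Idempotent init, mirroring the "if (type->cache) continue;" guard. */
int model_cache_init(void)
{
	if (model_cache)
		return 0;
	model_cache = "pool";
	model_cache_inits++;
	return 0;
}

/* Models an allocation that needs the cache; fails while it is NULL. */
bool model_alloc(void)
{
	return model_cache != NULL;
}

/*
 * Models the add= parameter handler. With patched == true it creates
 * the cache itself before allocating, as the patch proposes.
 */
bool model_param_set_add(bool patched)
{
	if (patched && model_cache_init())
		return false;
	return model_alloc();
}

/*
 * Models the assumed load order: parameters are parsed first, and only
 * then does the module init function run. Without the patch, the cache
 * is created in the init step, which is too late for the add= handler.
 */
bool model_load_module(bool has_add_param, bool patched)
{
	bool add_ok = true;

	model_cache = NULL;
	model_cache_inits = 0;

	if (has_add_param)
		add_ok = model_param_set_add(patched);
	if (!patched)
		model_cache_init();	/* unpatched module init step */

	return add_ok;
}
```

In this model, model_load_module(true, false) reports a failed add because
the allocation happens before the cache exists, while
model_load_module(true, true) succeeds, and a second call to
model_param_set_add(true) does not create the cache again.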