Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@fujitsu.com>,
	'Rain River' <rain.1986.08.12@gmail.com>
Cc: Zhu Yanjun <yanjun.zhu@intel.com>,
	"yi.zhang@redhat.com" <yi.zhang@redhat.com>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>,
	"leon@kernel.org" <leon@kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"zyjzyj2000@gmail.com" <zyjzyj2000@gmail.com>
Subject: Re: [PATCH 1/1] RDMA/rxe: Fix blktests srp lead kernel panic with 64k page size
Date: Wed, 18 Oct 2023 16:34:35 +0800	[thread overview]
Message-ID: <12bc6723-6048-4aa6-ba3e-ad6ff2403196@linux.dev> (raw)
In-Reply-To: <OS3PR01MB9865A90A3FD2B9D3096E6E2CE5D7A@OS3PR01MB9865.jpnprd01.prod.outlook.com>

在 2023/10/16 14:07, Daisuke Matsuda (Fujitsu) 写道:
> On Fri, Oct 13, 2023 10:44 PM Rain River <rain.1986.08.12@gmail.com> wrote:
>>> On Friday, October 13, 2023 9:29 PM Zhu Yanjun:
>>>> 在 2023/10/13 20:01, Daisuke Matsuda (Fujitsu) 写道:
>>>>> On Fri, Oct 13, 2023 10:18 AM Zhu Yanjun wrote:
>>>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>>>>
>>>>>> The page_size of mr is set in infiniband core originally. In the commit
>>>>>> 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"), the
>>>>>> page_size is also set. Sometime this will cause conflict.
>>>>>
>>>>> I appreciate your prompt action, but I do not think this commit deals with
>>>>> the root cause. I agree that the problem lies in rxe driver, but what is wrong
>>>>> with assigning actual page size to ibmr.page_size?
>>>>
>>>> Please check the source code. ibmr.page_size is assigned in
>>>> infiniband/core. And then it is assigned in rxe.
>>>> When the 2 are different, the problem will occur.
>>>
>>> In the first place, the two must always be equal.
>>> Is there any situation there are two different page sizes for a MR?
>>> I think I have explained the value assigned in the core layer is wrong
>>> when the PAGE_SIZE is bigger than 4k, and that is why they are inconsistent.
>>>
>>> As I have no environment to test the kernel panic, it remains speculative,
>>> but I have provided enough information for everyone to rebut my argument.
>>> It is possible I am wrong. I hope someone will tell me specifically where
>>> my guess is wrong, or Yi would kindly take the trouble to verify it.
>>
>> I made a quick test.
>>
>> With Zhu's patch, the problem fixed.
>> With your patch, this problem exists. And other problems occur. I do
>> not know why.
>> Will continue to make tests.
> 
> Thank you for your time and work.
> I will try to find out why there were two different page sizes from
> different perspectives. This may take a while because I am busy
> with other projects now. If anybody need the driver without the crash
> issue right now, I do not object to partially undoing "RDMA/rxe: Cleanup
> page variables in rxe_mr.c" as Zhu suggests, but that should be interim
> and not a final solution.

Sorry, it is late to reply.

This is a difficult and complicated problem because it involves rdma and 
block/blktests. This commit is based on the fact that the commit 
325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c") causes 
this problem.

So I delved into this commit. In this commit, the core part is to change 
the size of ibmr.page_size.
 From this, I checked the source code. I found that this page_size is 
also set in infiniband/core. So I try to remove the change of 
ibmr.page_size.

I will continue to delve into this problem and the source code to find 
the root cause that you also agree with.

Zhu Yanjun
> 
> Thanks,
> Daisuke
> 
>>
>>>
>>> Thanks,
>>> Daisuke Matsuda
>>>
>>>> Please add logs to infiniband/core and rxe to check them.
>>>>
>>>> Zhu Yanjun
>>>>
>>>>>
>>>>> IMO, the problem comes from the device attribute of rxe driver, which is used
>>>>> in ulp/srp layer to calculate the page_size.
>>>>> =====
>>>>> static int srp_add_one(struct ib_device *device)
>>>>> {
>>>>>           struct srp_device *srp_dev;
>>>>>           struct ib_device_attr *attr = &device->attrs;
>>>>> <...>
>>>>>           /*
>>>>>            * Use the smallest page size supported by the HCA, down to a
>>>>>            * minimum of 4096 bytes. We're unlikely to build large sglists
>>>>>            * out of smaller entries.
>>>>>            */
>>>>>           mr_page_shift           = max(12, ffs(attr->page_size_cap) - 1);
>>>>>           srp_dev->mr_page_size   = 1 << mr_page_shift;
>>>>> =====
>>>>> On initialization of srp driver, mr_page_size is calculated here.
>>>>> Note that the device attribute is used to calculate the value of page shift
>>>>> when the device is trying to use a page size larger than 4096. Since Yi specified
>>>>> CONFIG_ARM64_64K_PAGES, the system naturally met the condition.
>>>>>
>>>>> =====
>>>>> static int srp_map_finish_fr(struct srp_map_state *state,
>>>>>                                struct srp_request *req,
>>>>>                                struct srp_rdma_ch *ch, int sg_nents,
>>>>>                                unsigned int *sg_offset_p)
>>>>> {
>>>>>           struct srp_target_port *target = ch->target;
>>>>>           struct srp_device *dev = target->srp_host->srp_dev;
>>>>> <...>
>>>>>           n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, sg_offset_p,
>>>>>                            dev->mr_page_size);
>>>>> =====
>>>>> After that, mr_page_size is presumably passed to ib_core layer.
>>>>>
>>>>> =====
>>>>> int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
>>>>>                    unsigned int *sg_offset, unsigned int page_size)
>>>>> {
>>>>>           if (unlikely(!mr->device->ops.map_mr_sg))
>>>>>                   return -EOPNOTSUPP;
>>>>>
>>>>>           mr->page_size = page_size;
>>>>>
>>>>>           return mr->device->ops.map_mr_sg(mr, sg, sg_nents, sg_offset);
>>>>> }
>>>>> EXPORT_SYMBOL(ib_map_mr_sg);
>>>>> =====
>>>>> Consequently, the page size calculated in srp driver is set to ibmr.page_size.
>>>>>
>>>>> Coming back to rxe, the device attribute is set here:
>>>>> =====
>>>>> rxe.c
>>>>> <...>
>>>>> /* initialize rxe device parameters */
>>>>> static void rxe_init_device_param(struct rxe_dev *rxe)
>>>>> {
>>>>>           rxe->max_inline_data                    = RXE_MAX_INLINE_DATA;
>>>>>
>>>>>           rxe->attr.vendor_id                     = RXE_VENDOR_ID;
>>>>>           rxe->attr.max_mr_size                   = RXE_MAX_MR_SIZE;
>>>>>           rxe->attr.page_size_cap                 = RXE_PAGE_SIZE_CAP;
>>>>>           rxe->attr.max_qp                        = RXE_MAX_QP;
>>>>> ---
>>>>> rxe_param.h
>>>>> <...>
>>>>> /* default/initial rxe device parameter settings */
>>>>> enum rxe_device_param {
>>>>>           RXE_MAX_MR_SIZE                 = -1ull,
>>>>>           RXE_PAGE_SIZE_CAP               = 0xfffff000,
>>>>>           RXE_MAX_QP_WR                   = DEFAULT_MAX_VALUE,
>>>>> =====
>>>>> rxe_init_device_param() sets the attributes to rxe_dev->attr, and it is later copied
>>>>> to ib_device->attrs in setup_device()@core/device.c.
>>>>> See that the page size cap is hardcoded to 4096 bytes. I suspect this led to
>>>>> incorrect page_size being set to ibmr.page_size, resulting in the kernel crash.
>>>>>
>>>>> I think rxe driver is made to be able to handle arbitrary page sizes.
>>>>> Probably, we can just modify RXE_PAGE_SIZE_CAP to fix the issue.
>>>>> What do you guys think?
>>>>>
>>>>> Thanks,
>>>>> Daisuke Matsuda
>>>>>
>>>


  reply	other threads:[~2023-10-18  8:34 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-13  1:18 [PATCH 1/1] RDMA/rxe: Fix blktests srp lead kernel panic with 64k page size Zhu Yanjun
2023-10-13 12:01 ` Daisuke Matsuda (Fujitsu)
2023-10-13 12:28   ` Zhu Yanjun
2023-10-13 13:01     ` Daisuke Matsuda (Fujitsu)
2023-10-13 13:44       ` Rain River
2023-10-16  6:07         ` Daisuke Matsuda (Fujitsu)
2023-10-18  8:34           ` Zhu Yanjun [this message]
2023-10-20  3:47   ` Zhijian Li (Fujitsu)
2023-10-20  6:54     ` Zhijian Li (Fujitsu)
2023-10-20 16:21       ` Bart Van Assche
2023-10-23  0:58         ` Zhijian Li (Fujitsu)
2023-10-20 14:01     ` Jason Gunthorpe
2023-10-23  3:52       ` Zhijian Li (Fujitsu)
2023-10-23  6:08         ` Zhu Yanjun
2023-10-23 10:45         ` Yi Zhang
2023-10-24  8:15           ` Zhijian Li (Fujitsu)
2023-10-24  9:13             ` Zhijian Li (Fujitsu)
2023-10-26  9:05               ` Zhijian Li (Fujitsu)
2023-10-26 11:42                 ` Jason Gunthorpe
2023-10-26 12:59                   ` Zhu Yanjun
2023-10-26 23:23                     ` Jason Gunthorpe
2023-10-27  1:36                       ` Zhu Yanjun
2023-10-27  4:01                       ` Zhu Yanjun
2023-10-27 11:51                         ` Jason Gunthorpe
2023-10-26 13:28                 ` Bart Van Assche
2023-10-26 13:43                   ` Jason Gunthorpe
2023-10-26 21:47                     ` Bart Van Assche
2023-10-27  1:26                       ` Daisuke Matsuda (Fujitsu)
2023-10-27  1:39                       ` Zhu Yanjun
2023-10-27  5:43                       ` Zhijian Li (Fujitsu)
2023-10-31  1:36                 ` Zhu Yanjun
     [not found]                   ` <CAEz=LcuLCe7bhUohh6BcHdJ1_ocJdZq=eu07vWb3Md5_ZOGDBg@mail.gmail.com>
     [not found]                     ` <CAEz=LcuQ6fFpHqBPT1oTUgKABAHFJqYDC-AHidE-+n6OtzmCPQ@mail.gmail.com>
2023-10-31  8:14                       ` Greg Sword

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=12bc6723-6048-4aa6-ba3e-ad6ff2403196@linux.dev \
    --to=yanjun.zhu@linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=matsuda-daisuke@fujitsu.com \
    --cc=rain.1986.08.12@gmail.com \
    --cc=yanjun.zhu@intel.com \
    --cc=yi.zhang@redhat.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox