Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@fujitsu.com>,
	'Zhu Yanjun' <yanjun.zhu@intel.com>,
	"yi.zhang@redhat.com" <yi.zhang@redhat.com>
Cc: "jgg@ziepe.ca" <jgg@ziepe.ca>,
	"leon@kernel.org" <leon@kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"zyjzyj2000@gmail.com" <zyjzyj2000@gmail.com>
Subject: Re: [PATCH 1/1] RDMA/rxe: Fix blktests srp lead kernel panic with 64k page size
Date: Fri, 13 Oct 2023 20:28:39 +0800	[thread overview]
Message-ID: <6f155aa3-8e75-40c5-9686-cad9f9ac0d81@linux.dev> (raw)
In-Reply-To: <OS3PR01MB98651C7454C46841B8A78F11E5D2A@OS3PR01MB9865.jpnprd01.prod.outlook.com>

在 2023/10/13 20:01, Daisuke Matsuda (Fujitsu) 写道:
> On Fri, Oct 13, 2023 10:18 AM Zhu Yanjun wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> The page_size of mr is set in infiniband core originally. In the commit
>> 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"), the
>> page_size is also set. Sometime this will cause conflict.
> 
> I appreciate your prompt action, but I do not think this commit deals with
> the root cause. I agree that the problem lies in rxe driver, but what is wrong
> with assigning actual page size to ibmr.page_size?

Please check the source code. ibmr.page_size is assigned in 
infiniband/core. And then it is assigned in rxe.
When the 2 are different, the problem will occur.
Please add logs to infiniband/core and rxe to check them.

Zhu Yanjun

> 
> IMO, the problem comes from the device attribute of rxe driver, which is used
> in ulp/srp layer to calculate the page_size.
> =====
> static int srp_add_one(struct ib_device *device)
> {
>          struct srp_device *srp_dev;
>          struct ib_device_attr *attr = &device->attrs;
> <...>
>          /*
>           * Use the smallest page size supported by the HCA, down to a
>           * minimum of 4096 bytes. We're unlikely to build large sglists
>           * out of smaller entries.
>           */
>          mr_page_shift           = max(12, ffs(attr->page_size_cap) - 1);
>          srp_dev->mr_page_size   = 1 << mr_page_shift;
> =====
> On initialization of srp driver, mr_page_size is calculated here.
> Note that the device attribute is used to calculate the value of page shift
> when the device is trying to use a page size larger than 4096. Since Yi specified
> CONFIG_ARM64_64K_PAGES, the system naturally met the condition.
> 
> =====
> static int srp_map_finish_fr(struct srp_map_state *state,
>                               struct srp_request *req,
>                               struct srp_rdma_ch *ch, int sg_nents,
>                               unsigned int *sg_offset_p)
> {
>          struct srp_target_port *target = ch->target;
>          struct srp_device *dev = target->srp_host->srp_dev;
> <...>
>          n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, sg_offset_p,
>                           dev->mr_page_size);
> =====
> After that, mr_page_size is presumably passed to ib_core layer.
> 
> =====
> int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
>                   unsigned int *sg_offset, unsigned int page_size)
> {
>          if (unlikely(!mr->device->ops.map_mr_sg))
>                  return -EOPNOTSUPP;
> 
>          mr->page_size = page_size;
> 
>          return mr->device->ops.map_mr_sg(mr, sg, sg_nents, sg_offset);
> }
> EXPORT_SYMBOL(ib_map_mr_sg);
> =====
> Consequently, the page size calculated in srp driver is set to ibmr.page_size.
> 
> Coming back to rxe, the device attribute is set here:
> =====
> rxe.c
> <...>
> /* initialize rxe device parameters */
> static void rxe_init_device_param(struct rxe_dev *rxe)
> {
>          rxe->max_inline_data                    = RXE_MAX_INLINE_DATA;
> 
>          rxe->attr.vendor_id                     = RXE_VENDOR_ID;
>          rxe->attr.max_mr_size                   = RXE_MAX_MR_SIZE;
>          rxe->attr.page_size_cap                 = RXE_PAGE_SIZE_CAP;
>          rxe->attr.max_qp                        = RXE_MAX_QP;
> ---
> rxe_param.h
> <...>
> /* default/initial rxe device parameter settings */
> enum rxe_device_param {
>          RXE_MAX_MR_SIZE                 = -1ull,
>          RXE_PAGE_SIZE_CAP               = 0xfffff000,
>          RXE_MAX_QP_WR                   = DEFAULT_MAX_VALUE,
> =====
> rxe_init_device_param() sets the attributes to rxe_dev->attr, and it is later copied
> to ib_device->attrs in setup_device()@core/device.c.
> See that the page size cap is hardcoded to 4096 bytes. I suspect this led to
> incorrect page_size being set to ibmr.page_size, resulting in the kernel crash.
> 
> I think rxe driver is made to be able to handle arbitrary page sizes.
> Probably, we can just modify RXE_PAGE_SIZE_CAP to fix the issue.
> What do you guys think?
> 
> Thanks,
> Daisuke Matsuda
> 


  reply	other threads:[~2023-10-13 12:28 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-13  1:18 [PATCH 1/1] RDMA/rxe: Fix blktests srp lead kernel panic with 64k page size Zhu Yanjun
2023-10-13 12:01 ` Daisuke Matsuda (Fujitsu)
2023-10-13 12:28   ` Zhu Yanjun [this message]
2023-10-13 13:01     ` Daisuke Matsuda (Fujitsu)
2023-10-13 13:44       ` Rain River
2023-10-16  6:07         ` Daisuke Matsuda (Fujitsu)
2023-10-18  8:34           ` Zhu Yanjun
2023-10-20  3:47   ` Zhijian Li (Fujitsu)
2023-10-20  6:54     ` Zhijian Li (Fujitsu)
2023-10-20 16:21       ` Bart Van Assche
2023-10-23  0:58         ` Zhijian Li (Fujitsu)
2023-10-20 14:01     ` Jason Gunthorpe
2023-10-23  3:52       ` Zhijian Li (Fujitsu)
2023-10-23  6:08         ` Zhu Yanjun
2023-10-23 10:45         ` Yi Zhang
2023-10-24  8:15           ` Zhijian Li (Fujitsu)
2023-10-24  9:13             ` Zhijian Li (Fujitsu)
2023-10-26  9:05               ` Zhijian Li (Fujitsu)
2023-10-26 11:42                 ` Jason Gunthorpe
2023-10-26 12:59                   ` Zhu Yanjun
2023-10-26 23:23                     ` Jason Gunthorpe
2023-10-27  1:36                       ` Zhu Yanjun
2023-10-27  4:01                       ` Zhu Yanjun
2023-10-27 11:51                         ` Jason Gunthorpe
2023-10-26 13:28                 ` Bart Van Assche
2023-10-26 13:43                   ` Jason Gunthorpe
2023-10-26 21:47                     ` Bart Van Assche
2023-10-27  1:26                       ` Daisuke Matsuda (Fujitsu)
2023-10-27  1:39                       ` Zhu Yanjun
2023-10-27  5:43                       ` Zhijian Li (Fujitsu)
2023-10-31  1:36                 ` Zhu Yanjun
     [not found]                   ` <CAEz=LcuLCe7bhUohh6BcHdJ1_ocJdZq=eu07vWb3Md5_ZOGDBg@mail.gmail.com>
     [not found]                     ` <CAEz=LcuQ6fFpHqBPT1oTUgKABAHFJqYDC-AHidE-+n6OtzmCPQ@mail.gmail.com>
2023-10-31  8:14                       ` Greg Sword

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6f155aa3-8e75-40c5-9686-cad9f9ac0d81@linux.dev \
    --to=yanjun.zhu@linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=matsuda-daisuke@fujitsu.com \
    --cc=yanjun.zhu@intel.com \
    --cc=yi.zhang@redhat.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox