From: Krishnamraju Eraparaju <krishna2@chelsio.com>
To: Sagi Grimberg <sagi@grimberg.me>, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com,
	nirranjan@chelsio.com, hch@lst.de,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH for-rc] nvme-rdma/nvmet-rdma: Allocate sufficient RW ctxs to match hosts pgs len
Date: Thu, 27 Feb 2020 21:16:21 +0530
Message-ID: <20200227154220.GA3153@chelsio.com>
In-Reply-To: <b7a7abdc-574a-4ce9-ccf0-a51532f1ac58@grimberg.me>

Hi Sagi & Jason,
	
Thanks for the comments, please see inline.

On Wednesday, February 26, 2020 at 15:05:59 -0800, Sagi Grimberg wrote:
> 
> >Current nvmet-rdma code allocates MR pool budget based on host's SQ
> >size, assuming both host and target use the same "max_pages_per_mr"
> >count. But if host's max_pages_per_mr is greater than target's, then
> >target can run out of MRs while processing larger IO WRITEs.
> >
> >That is, say host's SQ size is 100, then the MR pool budget allocated
> >currently at target will also be 100 MRs. But 100 IO WRITE Requests
> >with 256 sg_count(IO size above 1MB) require 200 MRs when target's
> >"max_pages_per_mr" is 128.
> 
> The patch doesn't say if this is an actual bug you are seeing or
> theoretical.
	
I noticed this issue while running the following fio command:
fio --rw=randwrite --name=random --norandommap --ioengine=libaio \
    --size=16m --group_reporting --exitall --fsync_on_close=1 --invalidate=1 \
    --direct=1 --filename=/dev/nvme2n1 --iodepth=32 --numjobs=16 \
    --unit_base=1 --bs=4m --kb_base=1000

Note: here the NVMe host runs on SIW and the target on iw_cxgb4. The
max_pages_per_mr values supported by SIW and iw_cxgb4 are 255 and 128,
respectively.
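
To make the arithmetic from the commit message concrete, here is a rough
sketch in Python. It is illustrative only and is not kernel code: the helper
name mrs_needed() is made up, and it assumes one MR can cover at most
max_pages_per_mr pages of a request.

```python
def mrs_needed(num_requests, sg_count, target_max_pages_per_mr):
    """Estimate how many MRs the target needs for a batch of IO WRITEs.

    Hypothetical helper: assumes each request is split into
    ceil(sg_count / target_max_pages_per_mr) MRs on the target side.
    """
    # Integer ceiling division, avoids floating point.
    per_request = (sg_count + target_max_pages_per_mr - 1) // target_max_pages_per_mr
    return num_requests * per_request


# Example from the commit message: SQ size 100, so the pool budget is 100 MRs.
budget = 100
needed = mrs_needed(num_requests=100, sg_count=256, target_max_pages_per_mr=128)
print(needed, needed > budget)
```

With the SIW host limit of 255 pages per request, the same calculation still
yields two MRs per request on a target limited to 128, so a budget sized to
the SQ depth alone is not enough.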
	
Traces on Target:

# cat /sys/kernel/debug/tracing/trace_pipe | grep -v "status=0x0"
kworker/8:1H-2461  [008] .... 25476.995437: nvmet_req_complete: nvmet1:
disk=/dev/ram0, qid=1, cmdid=3, res=0xffff8b7f2ae534d0, status=0x6
kworker/8:1H-2461  [008] .... 25476.995467: nvmet_req_complete: nvmet1:
disk=/dev/ram0, qid=1, cmdid=4, res=0xffff8b7f2ae53700, status=0x6
kworker/8:1H-2461  [008] .... 25476.995511: nvmet_req_complete: nvmet1:
disk=/dev/ram0, qid=1, cmdid=1, res=0xffff8b7f2ae53980, status=0x6

> 
> >The proposed patch enables host to advertise the max_fr_pages(via
> >nvme_rdma_cm_req) such that target can allocate that many number of
> >RW ctxs(if host's max_fr_pages is higher than target's).
> 
> As mentioned by Jason, this is a non-compatible change, if you want to
> introduce this you need to go through the standard and update the
> cm private_data layout (would mean that the fmt needs to increment as
> well to be backward compatible).

Sure, I will initiate a discussion at the NVMe TWG about the CM private_data
format and will follow up here with the response soon.
> 
> 
> As a stop-gap, nvmet needs to limit the controller mdts to how much
> it can allocate based on the HCA capabilities
> (max_fast_reg_page_list_len).
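
As a rough illustration of that stop-gap (a sketch under stated assumptions,
not what nvmet does today): MDTS is reported as a power of two in units of
the minimum memory page size, so the target could derive it from the number
of pages it can register per request. The function names, the one-MR-per-
request default, and the 4 KiB page size below are all assumptions.

```python
def mdts_for_max_pages(max_pages_per_mr, mrs_per_request=1):
    """Largest MDTS exponent n such that 2**n pages fit in the MR budget.

    Hypothetical helper: max_pages_per_mr stands in for the HCA's
    max_fast_reg_page_list_len; mrs_per_request is an assumed budget.
    """
    max_pages = max_pages_per_mr * mrs_per_request
    if max_pages < 1:
        raise ValueError("need at least one page per request")
    # floor(log2(max_pages)) using integer arithmetic only.
    return max_pages.bit_length() - 1


def max_transfer_bytes(mdts, min_page_size=4096):
    """Transfer size implied by an MDTS exponent (assumes 4 KiB pages)."""
    return (1 << mdts) * min_page_size


mdts = mdts_for_max_pages(128)
print(mdts, max_transfer_bytes(mdts))
```

Under these assumptions a target limited to 128 pages per MR would advertise
an MDTS of 7, capping each command at 512 KiB, so the host would never issue
an IO that needs more than one MR at the target.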

