All of lore.kernel.org
 help / color / mirror / Atom feed
From: jitendra.bhivare@broadcom.com (Jitendra Bhivare)
Subject: Read request exceeding max_hw_sectors_kb
Date: Wed, 13 Jun 2018 16:11:56 +0530	[thread overview]
Message-ID: <0ba3cccb9d2ee7cba50a91bfc92b373a@mail.gmail.com> (raw)

Hi Christoph, Daniel,

On my NVMf setup using SPDK, MDTS gets configured to 4 for a target
subsystem.
This sets max_hw_sector_kb to 64KB for a remotely attached NS block device
on the initiator.
The NS is exported from Intel NVMe 750 SSD connected to target which has a
quirk of NVME_QUIRK_STRIPE_SIZE.
So SPDK is filling up noiob to 256 as per vendor specific controller data
vs[3] = 5. The corresponding NS on initiator
gets chunk_sectors configured to 256 (128KB) though max_hw_sectors_kb is
64KB.

This is causing an issue in block layer submitting a readahead request
exceeding max transfer size which SPDK fails causing
nvme-rdma controller recovery. Call trace when request gets submitted has
been pasted below.

The path which allows the request to go through is
blk_mq_make_request -> blk_queue_split -> blk_bio_segment_split ->
get_max_io_size.

Though target is sending chunk size greater than mdts, shouldn't nvme-core
set chunk_sectors appropriately?
In this case, the block layer too, didn't seem to honor the
max_hw_sectors_kb.

So something like resolves the issue:
static void nvme_set_chunk_size(struct nvme_ns *ns)
{
	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));

	chunk_size = rounddown_pow_of_two(chunk_size);
	chunk_size = min(ns->ctrl->max_hw_sectors, chunk_size);
	blk_queue_chunk_sectors(ns->queue,
rounddown_pow_of_two(chunk_size));
}

Please do let me know if this is the right approach or SPDK should set
noiob appropriately.

Thanks,

JB

[ 1798.507808] nvme nvme2: JB: ctrl ffff880220f942c0 max_hw_sectors 128
max_segments 17 page_size 4096
[ 1798.512752] CPU: 8 PID: 5749 Comm: systemd-udevd Tainted: G O 4.14.44
#2
[ 1798.512753] Hardware name: Dell Inc. PowerEdge T620/0658N7, BIOS 2.5.4
01/22/2016
[ 1798.512754] Call Trace:
[ 1798.512761] dump_stack+0x63/0x87
[ 1798.512766] nvme_rdma_queue_rq+0x5ee/0x670 [nvme_rdma]
[ 1798.512769] __blk_mq_try_issue_directly+0xde/0x140
[ 1798.512771] blk_mq_try_issue_directly+0x6f/0x80
[ 1798.512773] ? blk_account_io_start+0xf4/0x190
[ 1798.512774] blk_mq_make_request+0x32a/0x5f0
[ 1798.512776] generic_make_request+0x122/0x2f0
[ 1798.512777] submit_bio+0x73/0x150
[ 1798.512778] ? submit_bio+0x73/0x150
[ 1798.512781] ? guard_bio_eod+0x2c/0x100
[ 1798.512783] mpage_readpages+0x1aa/0x1f0
[ 1798.512784] ? I_BDEV+0x20/0x20
[ 1798.512787] ? alloc_pages_current+0x6a/0xe0
[ 1798.512788] blkdev_readpages+0x1d/0x20
[ 1798.512791] __do_page_cache_readahead+0x1be/0x2c0
[ 1798.512793] force_page_cache_readahead+0xb8/0x110
[ 1798.512794] ? force_page_cache_readahead+0xb8/0x110
[ 1798.512795] page_cache_sync_readahead+0x3f/0x50
[ 1798.512798] generic_file_read_iter+0x7eb/0xbb0
[ 1798.512800] ? page_cache_tree_insert+0xb0/0xb0
[ 1798.512801] blkdev_read_iter+0x35/0x40
[ 1798.512804] __vfs_read+0xf9/0x170
[ 1798.512806] vfs_read+0x93/0x130
[ 1798.512807] SyS_read+0x55/0xc0
[ 1798.512810] do_syscall_64+0x73/0x130
[ 1798.512813] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1798.512814] RIP: 0033:0x7f59a6260500
[ 1798.512815] RSP: 002b:00007ffe79114e58 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 1798.512817] RAX: ffffffffffffffda RBX: 000056152f7e88c0 RCX:
00007f59a6260500
[ 1798.512818] RDX: 0000000000040000 RSI: 000056152f7e88e8 RDI:
000000000000000f
[ 1798.512819] RBP: 000056152f7966b0 R08: 000056152f7e88c0 R09:
0000000000000000
[ 1798.512819] R10: 0000000000000000 R11: 0000000000000246 R12:
0000000000040000
[ 1798.512820] R13: 0000000000000000 R14: 000056152f796700 R15:
000056152f7e88d8
[ 1798.512824] nvme nvme2: JB: nvme_rdma_queue_rq: rq ffff88021da00000 op
0 data len 0x11000

WARNING: multiple messages have this Message-ID (diff)
From: Jitendra Bhivare <jitendra.bhivare at broadcom.com>
To: spdk@lists.01.org
Subject: [SPDK] Read request exceeding max_hw_sectors_kb
Date: Wed, 13 Jun 2018 16:11:56 +0530	[thread overview]
Message-ID: <0ba3cccb9d2ee7cba50a91bfc92b373a@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]

Hi Christoph, Daniel,

On my NVMf setup using SPDK, MDTS gets configured to 4 for a target
subsystem.
This sets max_hw_sector_kb to 64KB for a remotely attached NS block device
on the initiator.
The NS is exported from Intel NVMe 750 SSD connected to target which has a
quirk of NVME_QUIRK_STRIPE_SIZE.
So SPDK is filling up noiob to 256 as per vendor specific controller data
vs[3] = 5. The corresponding NS on initiator
gets chunk_sectors configured to 256 (128KB) though max_hw_sectors_kb is
64KB.

This is causing an issue in block layer submitting a readahead request
exceeding max transfer size which SPDK fails causing
nvme-rdma controller recovery. Call trace when request gets submitted has
been pasted below.

The path which allows the request to go through is
blk_mq_make_request -> blk_queue_split -> blk_bio_segment_split ->
get_max_io_size.

Though target is sending chunk size greater than mdts, shouldn't nvme-core
set chunk_sectors appropriately?
In this case, the block layer too, didn't seem to honor the
max_hw_sectors_kb.

So something like resolves the issue:
static void nvme_set_chunk_size(struct nvme_ns *ns)
{
	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));

	chunk_size = rounddown_pow_of_two(chunk_size);
	chunk_size = min(ns->ctrl->max_hw_sectors, chunk_size);
	blk_queue_chunk_sectors(ns->queue,
rounddown_pow_of_two(chunk_size));
}

Please do let me know if this is the right approach or SPDK should set
noiob appropriately.

Thanks,

JB

[ 1798.507808] nvme nvme2: JB: ctrl ffff880220f942c0 max_hw_sectors 128
max_segments 17 page_size 4096
[ 1798.512752] CPU: 8 PID: 5749 Comm: systemd-udevd Tainted: G O 4.14.44
#2
[ 1798.512753] Hardware name: Dell Inc. PowerEdge T620/0658N7, BIOS 2.5.4
01/22/2016
[ 1798.512754] Call Trace:
[ 1798.512761] dump_stack+0x63/0x87
[ 1798.512766] nvme_rdma_queue_rq+0x5ee/0x670 [nvme_rdma]
[ 1798.512769] __blk_mq_try_issue_directly+0xde/0x140
[ 1798.512771] blk_mq_try_issue_directly+0x6f/0x80
[ 1798.512773] ? blk_account_io_start+0xf4/0x190
[ 1798.512774] blk_mq_make_request+0x32a/0x5f0
[ 1798.512776] generic_make_request+0x122/0x2f0
[ 1798.512777] submit_bio+0x73/0x150
[ 1798.512778] ? submit_bio+0x73/0x150
[ 1798.512781] ? guard_bio_eod+0x2c/0x100
[ 1798.512783] mpage_readpages+0x1aa/0x1f0
[ 1798.512784] ? I_BDEV+0x20/0x20
[ 1798.512787] ? alloc_pages_current+0x6a/0xe0
[ 1798.512788] blkdev_readpages+0x1d/0x20
[ 1798.512791] __do_page_cache_readahead+0x1be/0x2c0
[ 1798.512793] force_page_cache_readahead+0xb8/0x110
[ 1798.512794] ? force_page_cache_readahead+0xb8/0x110
[ 1798.512795] page_cache_sync_readahead+0x3f/0x50
[ 1798.512798] generic_file_read_iter+0x7eb/0xbb0
[ 1798.512800] ? page_cache_tree_insert+0xb0/0xb0
[ 1798.512801] blkdev_read_iter+0x35/0x40
[ 1798.512804] __vfs_read+0xf9/0x170
[ 1798.512806] vfs_read+0x93/0x130
[ 1798.512807] SyS_read+0x55/0xc0
[ 1798.512810] do_syscall_64+0x73/0x130
[ 1798.512813] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1798.512814] RIP: 0033:0x7f59a6260500
[ 1798.512815] RSP: 002b:00007ffe79114e58 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 1798.512817] RAX: ffffffffffffffda RBX: 000056152f7e88c0 RCX:
00007f59a6260500
[ 1798.512818] RDX: 0000000000040000 RSI: 000056152f7e88e8 RDI:
000000000000000f
[ 1798.512819] RBP: 000056152f7966b0 R08: 000056152f7e88c0 R09:
0000000000000000
[ 1798.512819] R10: 0000000000000000 R11: 0000000000000246 R12:
0000000000040000
[ 1798.512820] R13: 0000000000000000 R14: 000056152f796700 R15:
000056152f7e88d8
[ 1798.512824] nvme nvme2: JB: nvme_rdma_queue_rq: rq ffff88021da00000 op
0 data len 0x11000

             reply	other threads:[~2018-06-13 10:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-13 10:41 Jitendra Bhivare [this message]
2018-06-13 10:41 ` [SPDK] Read request exceeding max_hw_sectors_kb Jitendra Bhivare
2018-06-13 11:47 ` Christoph Hellwig
2018-06-13 11:47   ` [SPDK] " Christoph Hellwig
2018-06-13 17:00   ` Daniel Verkamp
2018-06-13 17:00     ` [SPDK] " Daniel Verkamp
2018-06-14 10:16     ` Jitendra Bhivare
2018-06-14 10:16       ` [SPDK] " Jitendra Bhivare
2018-06-14 14:54 ` Keith Busch
2018-06-14 14:54   ` [SPDK] " Keith Busch
2018-06-15 13:58   ` Jitendra Bhivare
2018-06-15 13:58     ` [SPDK] " Jitendra Bhivare
2018-06-26 10:47   ` Jitendra Bhivare
2018-06-26 10:47     ` [SPDK] " Jitendra Bhivare
2018-06-26 15:07     ` Keith Busch
2018-06-26 15:07       ` [SPDK] " Keith Busch
2018-06-27 10:28       ` Jitendra Bhivare
2018-06-27 10:28         ` [SPDK] " Jitendra Bhivare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0ba3cccb9d2ee7cba50a91bfc92b373a@mail.gmail.com \
    --to=jitendra.bhivare@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.