Linux RDMA and InfiniBand development


* Re: [PATCH RFC v2 1/3] rdma_cm: add rdma_reject_msg() helper function
From: Sagi Grimberg @ 2016-10-21 21:43 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Hellwig'
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	sagig-NQWnxTmZq1alnMjI0IkVqw, axboe-b10kYP2dOMg
In-Reply-To: <005d01d22ba4$672effd0$358cff70$@opengridcomputing.com>

Looks good to me either way,

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH RFC v2 2/3] rdma_cm: add rdma_consumer_reject() helper function
From: Sagi Grimberg @ 2016-10-21 21:45 UTC (permalink / raw)
  To: Parav Pandit, Steve Wise
  Cc: Christoph Hellwig, Doug Ledford, Hefty, Sean, linux-rdma,
	Bart Van Assche, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	sagig-NQWnxTmZq1alnMjI0IkVqw, axboe-b10kYP2dOMg
In-Reply-To: <CAG53R5Xyp+n7KYj6zZF6PFuXke3XtqaMgaGTRmgd_uGXTFNDtw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


>>> Yes.  Also I'd just inline the ib_consumer_reject and iw_consumer_reject
>>> helpers here.
>>>
>>> Also wouldn't it be better named rdma_consumer_is_reject or similar
>>> given that we don't reject anything here, but check if the request
>>> has been rejected?
>>
>> How about rdma_rejected_by_consumer()?
>>
> How about rdma_reject_by_ulp()?
> We have ulp directory holding iser, srp etc.

I like consumer better.

* Re: [PATCH RFC v2 3/3] nvme-rdma: use rdma_reject_msg() to log connection rejects
From: Sagi Grimberg @ 2016-10-21 21:48 UTC (permalink / raw)
  To: Christoph Hellwig, Steve Wise
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	sagig-NQWnxTmZq1alnMjI0IkVqw, axboe-b10kYP2dOMg
In-Reply-To: <20161021122318.GB17325-jcswGhMUV9g@public.gmane.org>


> Given the nasty casting issues in the current RDMA/CM API maybe we should
> actually expand the scope of the rdma_consumer_reject helper to include
> the above check, e.g. check that there is a private data len and then
> return a pointer to the private data?
>
> Something like
>
> static int nvme_rdma_conn_rejected(struct nvme_rdma_queue *queue,
> 		struct rdma_cm_event *ev)
> {
> 	struct rdma_cm_id *cm_id = queue->cm_id;
> 	struct nvme_rdma_cm_rej *rej;
> 	short nvme_status = -1;
>
> 	rej = rdma_cm_reject_message(ev);
> 	if (rej)
> 		nvme_status = le16_to_cpu(rej->sts);
>

Looks nicer...

>>
>> +	dev_err(queue->ctrl->ctrl.device, "Connect rejected: status %d (%s) "
>> +		"nvme status %d.\n", rdma_status,
>> +		rdma_reject_msg(cm_id, rdma_status), nvme_status);
>
> And while we're pretty printing the rest it would be nice to pretty
> print the NVMe status here as well.

Would be nice...
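
A minimal sketch of the helper shape suggested above, written as standalone
C so it compiles outside the kernel. The struct layouts and the names
cm_event_sketch, nvme_rej_sketch, reject_private_data() and
nvme_status_from_reject() are hypothetical stand-ins, not the RDMA/CM API;
the only assumption taken from the thread is that the status is a
little-endian 16-bit field in the reject private data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of a CM event that carries reject data. */
struct cm_event_sketch {
	const void *private_data;	/* payload supplied by the rejecting side */
	uint8_t private_data_len;	/* 0 when no payload was sent */
};

/* Hypothetical reject payload: two little-endian 16-bit fields. */
struct nvme_rej_sketch {
	uint8_t recfmt[2];	/* record format, little-endian */
	uint8_t sts[2];		/* status, little-endian */
};

/* Return the private data only if the peer actually sent some. */
static const void *reject_private_data(const struct cm_event_sketch *ev,
				       uint8_t *len)
{
	if (!ev->private_data || ev->private_data_len == 0)
		return NULL;
	*len = ev->private_data_len;
	return ev->private_data;
}

/* Decode the status, or return -1 when the payload is missing or too short. */
static int nvme_status_from_reject(const struct cm_event_sketch *ev)
{
	uint8_t len = 0;
	const struct nvme_rej_sketch *rej = reject_private_data(ev, &len);

	if (!rej || len < sizeof(*rej))
		return -1;
	return rej->sts[0] | (rej->sts[1] << 8);
}
```

Keeping the length check inside the helper means the caller never has to
cast the event payload itself, which is the casting concern raised above.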

* Re: [PATCH RFC v2 1/3] rdma_cm: add rdma_reject_msg() helper function
From: Sagi Grimberg @ 2016-10-21 21:51 UTC (permalink / raw)
  To: Steve Wise, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: sagig-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, axboe-b10kYP2dOMg,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ, hch-jcswGhMUV9g
In-Reply-To: <1360f08b7c25f3befcd6836b47af81e2ecb51b75.1477003235.git.swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>


> +static const char * const ib_rej_reason_strs[] = {
> +	[IB_CM_REJ_NO_QP]			= "no qp",
> +	[IB_CM_REJ_NO_EEC]			= "no eec",
> +	[IB_CM_REJ_NO_RESOURCES]		= "no resources",
> +	[IB_CM_REJ_TIMEOUT]			= "timeout",
> +	[IB_CM_REJ_UNSUPPORTED]			= "unsupported",
> +	[IB_CM_REJ_INVALID_COMM_ID]		= "invalid comm id",
> +	[IB_CM_REJ_INVALID_COMM_INSTANCE]	= "invalid comm instance",
> +	[IB_CM_REJ_INVALID_SERVICE_ID]		= "invalid service id",
> +	[IB_CM_REJ_INVALID_TRANSPORT_TYPE]	= "invalid transport type",
> +	[IB_CM_REJ_STALE_CONN]			= "stale conn",
> +	[IB_CM_REJ_RDC_NOT_EXIST]		= "rdc not exist",
> +	[IB_CM_REJ_INVALID_GID]			= "invalid gid",
> +	[IB_CM_REJ_INVALID_LID]			= "invalid lid",
> +	[IB_CM_REJ_INVALID_SL]			= "invalid sl",
> +	[IB_CM_REJ_INVALID_TRAFFIC_CLASS]	= "invalid traffic class",
> +	[IB_CM_REJ_INVALID_HOP_LIMIT]		= "invalid hop limit",
> +	[IB_CM_REJ_INVALID_PACKET_RATE]		= "invalid packet rate",
> +	[IB_CM_REJ_INVALID_ALT_GID]		= "invalid alt gid",
> +	[IB_CM_REJ_INVALID_ALT_LID]		= "invalid alt lid",
> +	[IB_CM_REJ_INVALID_ALT_SL]		= "invalid alt sl",
> +	[IB_CM_REJ_INVALID_ALT_TRAFFIC_CLASS]	= "invalid alt traffic class",
> +	[IB_CM_REJ_INVALID_ALT_HOP_LIMIT]	= "invalid alt hop limit",
> +	[IB_CM_REJ_INVALID_ALT_PACKET_RATE]	= "invalid alt packet rate",
> +	[IB_CM_REJ_PORT_CM_REDIRECT]		= "port cm redirect",
> +	[IB_CM_REJ_PORT_REDIRECT]		= "port redirect",
> +	[IB_CM_REJ_INVALID_MTU]			= "invalid mtu",
> +	[IB_CM_REJ_INSUFFICIENT_RESP_RESOURCES]	= "insufficient resp resources",
> +	[IB_CM_REJ_CONSUMER_DEFINED]		= "consumer defined",
> +	[IB_CM_REJ_INVALID_RNR_RETRY]		= "invalid rnr retry",
> +	[IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID]	= "duplicate local comm id",
> +	[IB_CM_REJ_INVALID_CLASS_VERSION]	= "invalid class version",
> +	[IB_CM_REJ_INVALID_FLOW_LABEL]		= "invalid flow label",
> +	[IB_CM_REJ_INVALID_ALT_FLOW_LABEL]	= "invalid alt flow label",
> +};
> +
> +const char *__attribute_const__ ib_reject_msg(int reason)

Please call it ibcm_reject_msg()


> +const char *__attribute_const__ iw_reject_msg(int reason)

and this iwcm_reject_msg
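
For reference, a sparse string table like the one quoted above needs both
a range check and a NULL check on lookup, because designated initializers
leave unlisted slots NULL. The following standalone C sketch shows that
pattern; the enum, its values and the name reject_msg_sketch() are
illustrative assumptions and are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical subset of reject reasons; the values are illustrative only. */
enum rej_reason_sketch {
	REJ_NO_QP = 1,
	REJ_TIMEOUT = 4,
	REJ_CONSUMER_DEFINED = 28,
};

/* Designated initializers leave every unlisted slot NULL. */
static const char * const rej_reason_strs[] = {
	[REJ_NO_QP]		= "no qp",
	[REJ_TIMEOUT]		= "timeout",
	[REJ_CONSUMER_DEFINED]	= "consumer defined",
};

/* Map a reason code to text, guarding both the range and the gaps. */
static const char *reject_msg_sketch(int reason)
{
	size_t nr = sizeof(rej_reason_strs) / sizeof(rej_reason_strs[0]);

	if (reason >= 0 && (size_t)reason < nr && rej_reason_strs[reason])
		return rej_reason_strs[reason];
	return "unrecognized reason";
}
```

With this shape, a caller can pass the result straight to a log statement
without checking for NULL first.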

* Re: [PATCH RFC v2 0/3] connect reject event helpers
From: Sagi Grimberg @ 2016-10-21 21:53 UTC (permalink / raw)
  To: Steve Wise, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: sagig-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, axboe-b10kYP2dOMg,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ, hch-jcswGhMUV9g
In-Reply-To: <cover.1477003235.git.swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>


> While reviewing:
>
> http://lists.infradead.org/pipermail/linux-nvme/2016-October/006681.html
>
> I decided to propose transport-agnostic helper functions to better
> handle connection reject event information.  I've included a nvme_rdma
> patch to utilize the new helpers.
>
> Thoughts?

Hey Steve,

This looks nice and useful. Would be great if you can
also help other ULPs that use this (e.g. srp/srpt)

* RE: [PATCH RFC v2 0/3] connect reject event helpers
From: Steve Wise @ 2016-10-21 21:58 UTC (permalink / raw)
  To: 'Sagi Grimberg', dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: sagig-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, axboe-b10kYP2dOMg,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ, hch-jcswGhMUV9g
In-Reply-To: <a46f48f3-01b3-fd9b-b642-c20555759107-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

> 
> 
> > While reviewing:
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2016-October/006681.html
> >
> > I decided to propose transport-agnostic helper functions to better
> > handle connection reject event information.  I've included a nvme_rdma
> > patch to utilize the new helpers.
> >
> > Thoughts?
> 
> Hey Steve,
> 
> This looks nice and useful. Would be great if you can
> also help other ULPs that use this (e.g. srp/srpt)

can-do!



* [PATCH] IB/isert: do not ignore errors in dma_map_single()
From: Alexey Khoroshilov @ 2016-10-21 22:01 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Alexey Khoroshilov, Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma, target-devel, linux-kernel, ldv-project

There are several places where errors in dma_map_single() are
ignored. The patch fixes them.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f63238e..f0ba5f83b02c 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1851,6 +1851,8 @@ isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 		isert_cmd->pdu_buf_dma = ib_dma_map_single(ib_dev,
 				(void *)cmd->sense_buffer, pdu_len,
 				DMA_TO_DEVICE);
+		if (ib_dma_mapping_error(ib_dev, isert_cmd->pdu_buf_dma))
+			return -ENOMEM;
 
 		isert_cmd->pdu_buf_len = pdu_len;
 		tx_dsg->addr	= isert_cmd->pdu_buf_dma;
@@ -1978,6 +1980,8 @@ isert_put_reject(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
 	isert_cmd->pdu_buf_dma = ib_dma_map_single(ib_dev,
 			(void *)cmd->buf_ptr, ISCSI_HDR_LEN,
 			DMA_TO_DEVICE);
+	if (ib_dma_mapping_error(ib_dev, isert_cmd->pdu_buf_dma))
+		return -ENOMEM;
 	isert_cmd->pdu_buf_len = ISCSI_HDR_LEN;
 	tx_dsg->addr	= isert_cmd->pdu_buf_dma;
 	tx_dsg->length	= ISCSI_HDR_LEN;
@@ -2018,6 +2022,8 @@ isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
 
 		isert_cmd->pdu_buf_dma = ib_dma_map_single(ib_dev,
 				txt_rsp_buf, txt_rsp_len, DMA_TO_DEVICE);
+		if (ib_dma_mapping_error(ib_dev, isert_cmd->pdu_buf_dma))
+			return -ENOMEM;
 
 		isert_cmd->pdu_buf_len = txt_rsp_len;
 		tx_dsg->addr	= isert_cmd->pdu_buf_dma;
-- 
2.7.4


* Re: [PATCH] IB/isert: do not ignore errors in dma_map_single()
From: Sagi Grimberg @ 2016-10-21 22:03 UTC (permalink / raw)
  To: Alexey Khoroshilov
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, linux-rdma,
	target-devel, linux-kernel, ldv-project
In-Reply-To: <1477087281-26275-1-git-send-email-khoroshilov@ispras.ru>

Thanks Alexey!

Acked-by: Sagi Grimberg <sagi@grimberg.me>


* Re: NVMeoF Linux GIT repo
From: Sagi Grimberg @ 2016-10-21 22:19 UTC (permalink / raw)
  To: Robert Randall (rrandall), Keith Busch,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
  Cc: Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <f5eefaea1d4b4d24945fdfb12da5a6ab-ESsAEwT0rQfU3M7VfFl2o0EOCMrvLtNR@public.gmane.org>

Hey Robert,

> Sorry Keith, I'm back to the same question again.  I've tried using the released 4.8.2 kernel and I'm seeing errors in the Linux RDMA layer.  Log file is attached.  My guess is this may have been fixed already but since I'm not writing code on Linux it is difficult to keep up with which repo and which branch I should be using.
>
> It reports a syndrome 5 which appears to mean "work request flush error".
>
> Setup is stable 4.8.2 kernel with Mellanox RoCE v2.
>
> So, where do I grab the latest and greatest code these days?

From a quick look at the log, the FLUSH errors are
just side effects. Once a queue pair transitions to the
ERROR state, it flushes all pending work requests with
a FLUSH syndrome, so we should look at the first error, which
is:

mlx5_1:poll_soft_wc:647:(pid 3422): polled software generated completion on CQ 0x14

This seems to come from the GSI QP completion emulation from
Haggai (CC'd). CQ 0x14 is not the nvmet-rdma completion queue (from
the log it's 0x5d), so something went wrong, but it does not
seem to be nvmet-rdma's fault.

Haggai, any tips for Robert?
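
The triage rule described above (flush completions are a consequence of
the queue pair already being in the error state, so the interesting event
is the earliest failure that is not a flush) can be sketched in standalone
C. The types wc_sketch and wc_status_sketch and the function
first_real_error() are hypothetical and do not correspond to the verbs
API; they only illustrate the filtering step.

```c
#include <stddef.h>

/* Hypothetical completion status; only the flush/non-flush split matters. */
enum wc_status_sketch {
	WC_OK = 0,
	WC_FLUSH_ERR,	/* request flushed because the QP was already in error */
	WC_OTHER_ERR,	/* any failure that is not a flush */
};

/* Hypothetical completion record, in the order it was polled. */
struct wc_sketch {
	unsigned int cqn;
	enum wc_status_sketch status;
};

/*
 * Return the first completion that failed for a reason other than a flush,
 * or NULL if every failure in the batch is a flush.
 */
static const struct wc_sketch *first_real_error(const struct wc_sketch *wcs,
						size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (wcs[i].status != WC_OK && wcs[i].status != WC_FLUSH_ERR)
			return &wcs[i];
	}
	return NULL;
}
```

A NULL result means the batch contains only flush errors, in which case
the root cause has to be found elsewhere, as in the log below.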

Log output:
[   12.588248] mlx5_core 0000:06:00.0 enp6s0f0: Link up
[   12.735116] mlx5_core 0000:06:00.1 enp6s0f1: Link up
[   16.490224] e1000e: enp8s0 NIC Link is Up 1000 Mbps Full Duplex, Flow 
Control: Rx/Tx
[   20.005394] input: PS/2 Generic Mouse as 
/devices/platform/i8042/serio1/input/input4
[   33.138312] random: crng init done
[   57.710309] (0000:06:00.1): E-Switch: disable SRIOV: active vports(1) 
mode(0)
[   57.713889] (0000:06:00.1): E-Switch: cleanup
[   58.399819] (0000:06:00.0): E-Switch: disable SRIOV: active vports(1) 
mode(0)
[   58.401660] (0000:06:00.0): E-Switch: cleanup
[   59.134997] mlx5_core 0000:06:00.0: firmware version: 12.16.1020
[   59.855855] (0000:06:00.0): E-Switch: Total vports 9, l2 table 
size(65536), per vport: max uc(1024) max mc(16384)
[   59.857209] mlx5_core 0000:06:00.1: firmware version: 12.16.1020
[   60.563522] (0000:06:00.1): E-Switch: Total vports 9, l2 table 
size(65536), per vport: max uc(1024) max mc(16384)
[   60.566269] mlx5_core 0000:06:00.0: MLX5E: StrdRq(0) RqSz(1024) 
StrdSz(1) RxCqeCmprss(0)
[   60.737262] mlx5_core 0000:06:00.0 enp6s0f0: renamed from eth0
[   60.737617] mlx5_core 0000:06:00.1: MLX5E: StrdRq(0) RqSz(1024) 
StrdSz(1) RxCqeCmprss(0)
[   61.038325] mlx5_core 0000:06:00.0 enp6s0f0: Link up
[   61.041446] mlx5_core 0000:06:00.1 enp6s0f1: renamed from eth0
[   61.047290] mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 
(Feb 2014)
[   61.981595] mlx5_core 0000:06:00.1 enp6s0f1: Link up
[   63.159807] e1000e: enp8s0 NIC Link is Down
[   67.836775] e1000e: enp8s0 NIC Link is Up 1000 Mbps Full Duplex, Flow 
Control: Rx/Tx
[   74.745144] mlx5_core 0000:06:00.0 enp6s0f0: Link up
[   74.944733] nvmet: adding nsid 1 to subsystem ignite1
[   74.945148] nvmet: adding nsid 2 to subsystem ignite2
[   74.945510] nvmet: adding nsid 3 to subsystem ignite3
[   74.945864] nvmet: adding nsid 4 to subsystem ignite4
[   74.946747] nvmet_rdma: enabling port 2 (192.168.2.10:5150)
micron@fmslnx0:~$ tail -f /var/log/syslog
Oct 20 07:28:17 fmslnx0 systemd[1]: Reloaded OpenBSD Secure Shell server.
Oct 20 07:28:17 fmslnx0 kernel: [   74.745144] mlx5_core 0000:06:00.0 
enp6s0f0: Link up
Oct 20 07:28:17 fmslnx0 systemd[1]: Reloading OpenBSD Secure Shell server.
Oct 20 07:28:17 fmslnx0 systemd[1]: Reloaded OpenBSD Secure Shell server.
Oct 20 07:28:17 fmslnx0 systemd[1]: Started Raise network interfaces.
Oct 20 07:28:18 fmslnx0 kernel: [   74.944733] nvmet: adding nsid 1 to 
subsystem ignite1
Oct 20 07:28:18 fmslnx0 kernel: [   74.945148] nvmet: adding nsid 2 to 
subsystem ignite2
Oct 20 07:28:18 fmslnx0 kernel: [   74.945510] nvmet: adding nsid 3 to 
subsystem ignite3
Oct 20 07:28:18 fmslnx0 kernel: [   74.945864] nvmet: adding nsid 4 to 
subsystem ignite4
Oct 20 07:28:18 fmslnx0 kernel: [   74.946747] nvmet_rdma: enabling port 
2 (192.168.2.10:5150)
Oct 20 07:35:01 fmslnx0 CRON[3376]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
Oct 20 07:42:35 fmslnx0 systemd[1]: Starting Cleanup of Temporary 
Directories...
Oct 20 07:42:35 fmslnx0 systemd-tmpfiles[3381]: 
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", 
ignoring.
Oct 20 07:42:35 fmslnx0 systemd[1]: Started Cleanup of Temporary 
Directories.
Oct 20 07:45:01 fmslnx0 CRON[3395]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
Oct 20 07:50:48 fmslnx0 dhclient[3245]: DHCPREQUEST of 10.113.22.90 on 
enp8s0 to 10.113.22.46 port 67 (xid=0x19ae475f)
Oct 20 07:50:48 fmslnx0 dhclient[3245]: DHCPACK of 10.113.22.90 from 
10.113.22.46
Oct 20 07:50:48 fmslnx0 dhclient[3245]: bound to 10.113.22.90 -- renewal 
in 1615 seconds.
Oct 20 07:55:01 fmslnx0 CRON[3412]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
Oct 20 08:05:01 fmslnx0 CRON[3418]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.978705] nvmet_rdma: connect 
request (4): status 0 id ffff8e0aa2f2bc00
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.979382] nvmet_rdma: added mlx5_1.
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.980211] 
mlx5_1:mlx5_ib_create_cq:948:(pid 1442): cqn 0x5d
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.980269] 
mlx5_1:calc_sq_size:355:(pid 1442): wqe_size 640
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.981626] 
mlx5_1:mlx5_ib_create_qp:2041:(pid 1442): ib qpnum 0xf1, mlx qpn 0xf1, 
rcqn 0x5d, scqn 0x5d
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.982372] nvmet_rdma: 
nvmet_rdma_create_queue_ib: max_cqe= 63 max_sge= 32 sq_size = 51 cm_id= 
ffff8e0aa2f2bc00
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.984271] 
mlx5_1:poll_soft_wc:647:(pid 3422): polled software generated completion 
on CQ 0x14
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.985964] nvmet_rdma: established 
(9): status 0 id ffff8e0aa2f2bc00
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.987073] nvmet: ctrl 1 start 
keep-alive timer for 120 secs
Oct 20 08:13:15 fmslnx0 kernel: [ 2771.987122] nvmet: creating 
controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:01020304.
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.120199] nvmet_rdma: disconnected 
(10): status 0 id ffff8e0aa2f2bc00
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.120253] nvmet_rdma: cm_id= 
ffff8e0aa2f2bc00 queue->state= 1
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.121781] 
mlx5_1:poll_soft_wc:647:(pid 3422): polled software generated completion 
on CQ 0x14
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122046] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122105] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf5
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122315] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Requestor error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122374] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf5
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122428] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122480] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122534] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122586] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122639] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122690] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122742] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122794] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122846] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122897] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.122949] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123000] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123052] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123103] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123155] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123207] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123260] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123311] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123363] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123414] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123466] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123518] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123570] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123621] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123673] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123724] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123776] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123828] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123881] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123933] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.123991] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.124042] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.124433] 
mlx5_1:mlx5_poll_one:586:(pid 3422): Responder error cqe on cqn 0x5d:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.124488] 
mlx5_1:mlx5_poll_one:588:(pid 3422): syndrome 0x5, vendor syndrome 0xf9
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.124567] nvmet_rdma: freeing queue 0
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.124607] ------------[ cut here 
]------------
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.124651] WARNING: CPU: 0 PID: 1445 
at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.124710] Modules linked in: 
mlx5_ib mlx5_core rdma_ucm ib_uverbs ib_mthca nvmet_rdma nvmet 
nls_iso8859_1 intel_rapl sb_edac edac_core x86_pkg_temp_thermal 
intel_powerclamp coretemp dcdbas dell_smm_hwmon kvm_intel kvm irqbypass 
serio_raw snd_hda_codec_realtek snd_hda_codec_generic joydev input_leds 
snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep 
snd_pcm lpc_ich snd_timer snd soundcore mei_me shpchp mei ib_iser 
rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 
raid0 multipath linear hid_generic usbhid hid amdkfd amd_iommu_v2 radeon 
crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel ttm 
aesni_intel drm_kms_helper aes_x86_64 lrw gf128mul syscopyarea 
glue_helper sysfillrect ablk_helper sysimgblt cryptd fb_sys_fops psmouse 
e1000e isci ahci ptp drm libahci nvme libsas pps_core nvme_core 
scsi_transport_sas fjes jitterentropy_rng drbg ansi_cprng [last 
unloaded: mlx5_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.138833] CPU: 0 PID: 1445 Comm: 
kworker/0:29 Not tainted 4.8.2 #1
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.141298] Hardware name: Dell Inc. 
Precision T7600/0VHRW1, BIOS A12 09/29/2014
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.143795] Workqueue: ib_cm 
cm_work_handler [ib_cm]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.146279]  0000000000000086 
00000000d356260a ffff8e0ac44478f8 ffffffffb93dfce3
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.148788]  0000000000000000 
0000000000000000 ffff8e0ac4447938 ffffffffb907899b
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.151285]  00000096d356260a 
0000000000000200 ffff8e0abb9b0000 ffff8e0cae2a6494
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.153789] Call Trace:
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.156253]  [<ffffffffb93dfce3>] 
dump_stack+0x63/0x90
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.158692]  [<ffffffffb907899b>] 
__warn+0xcb/0xf0
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.161080]  [<ffffffffb9078acd>] 
warn_slowpath_null+0x1d/0x20
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.163444]  [<ffffffffb907e08b>] 
__local_bh_enable_ip+0x6b/0x80
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.165818]  [<ffffffffb975de6c>] 
ipv4_neigh_lookup+0xac/0x130
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.168184]  [<ffffffffc086c512>] 
addr_resolve_neigh+0xb2/0x2b0 [ib_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.170524]  [<ffffffffc086c91c>] 
addr_resolve+0x20c/0x280 [ib_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.172851]  [<ffffffffb93f5a2a>] ? 
find_next_zero_bit+0x1a/0x20
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.175142]  [<ffffffffb93e12f9>] ? 
idr_get_empty_slot+0x199/0x3b0
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.177413]  [<ffffffffb91e855c>] ? 
kmem_cache_alloc_trace+0xdc/0x1a0
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.179693]  [<ffffffffc086cc3c>] 
rdma_resolve_ip+0x18c/0x2c0 [ib_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.181975]  [<ffffffffc086c060>] ? 
rdma_addr_register_client+0x30/0x30 [ib_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.184272]  [<ffffffffc086d199>] 
rdma_addr_find_l2_eth_by_grh+0x139/0x240 [ib_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.186577]  [<ffffffffb902a77c>] ? 
__switch_to+0x2dc/0x700
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.188885]  [<ffffffffc086164d>] 
ib_init_ah_from_wc+0x19d/0x570 [ib_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.191212]  [<ffffffffb9034ec9>] ? 
sched_clock+0x9/0x10
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.193531]  [<ffffffffb90a8f3f>] ? 
sched_clock_cpu+0x8f/0xa0
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.195846]  [<ffffffffb90a2cb4>] ? 
check_preempt_curr+0x54/0x90
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.198162]  [<ffffffffb90ac453>] ? 
update_curr+0xf3/0x180
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.200471]  [<ffffffffc0861a59>] 
ib_create_ah_from_wc+0x39/0x70 [ib_core]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.202794]  [<ffffffffc0672fc7>] 
cm_alloc_response_msg.isra.33+0x37/0xb0 [ib_cm]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.205134]  [<ffffffffc0677da5>] 
cm_work_handler+0x11d5/0x16f2 [ib_cm]
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.207476]  [<ffffffffb9092c4b>] 
process_one_work+0x16b/0x480
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.209815]  [<ffffffffb9092fab>] 
worker_thread+0x4b/0x500
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.212157]  [<ffffffffb9092f60>] ? 
process_one_work+0x480/0x480
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.214506]  [<ffffffffb9099158>] 
kthread+0xd8/0xf0
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.216852]  [<ffffffffb9832e1f>] 
ret_from_fork+0x1f/0x40
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.219189]  [<ffffffffb9099080>] ? 
kthread_create_on_node+0x1a0/0x1a0
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.221540] ---[ end trace 
40812fc5b7bae90e ]---
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.223991] 
mlx5_1:poll_soft_wc:647:(pid 3422): polled software generated completion 
on CQ 0x14
Oct 20 08:13:15 fmslnx0 kernel: [ 2772.232114] nvmet: ctrl 1 stop keep-alive

* Re: [net-next,v2,7/9] net: use core MTU range checking in misc drivers
From: Sven Eckelmann @ 2016-10-22  7:17 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Stefan Richter, Faisal Latif, Cliff Whickman, Robin Holt,
	Jes Sorensen, Marek Lindner, Simon Wunderlich, Antonio Quartulli,
	Sathya Prakash, Chaitra P B, Suganath Prabu Subramani,
	MPT-FusionLinux.pdl-dY08KVG/lbpWk0Htik3J/w, Sebastian Reichel,
	Felipe Balbi, Arvid Brodin, Remi Denis-Courmont
In-Reply-To: <20161020175524.6184-8-jarod-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thursday, 20 October 2016 13:55:22 CEST Jarod Wilson wrote:
[...]
> batman-adv:
> - set max_mtu
> - remove batadv_interface_change_mtu
> - initialization is a little async, not 100% certain that max_mtu is set
>   in the optimal place, don't have hardware to test with

batman-adv is creating a virtual interface - so there are no
hardware requirements (ok, ethernet compatible hardware - even
when only virtual/emulated).

[...]
> diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
> index 49e16b6..112679d 100644
> --- a/net/batman-adv/soft-interface.c
> +++ b/net/batman-adv/soft-interface.c
> @@ -158,17 +158,6 @@ static int batadv_interface_set_mac_addr(struct net_device *dev, void *p)
>  	return 0;
>  }
>  
> -static int batadv_interface_change_mtu(struct net_device *dev, int new_mtu)
> -{
> -	/* check ranges */
> -	if ((new_mtu < 68) || (new_mtu > batadv_hardif_min_mtu(dev)))
> -		return -EINVAL;
> -
> -	dev->mtu = new_mtu;
> -
> -	return 0;
> -}
> -
>  /**
>   * batadv_interface_set_rx_mode - set the rx mode of a device
>   * @dev: registered network device to modify
> @@ -920,7 +909,6 @@ static const struct net_device_ops batadv_netdev_ops = {
>  	.ndo_vlan_rx_add_vid = batadv_interface_add_vid,
>  	.ndo_vlan_rx_kill_vid = batadv_interface_kill_vid,
>  	.ndo_set_mac_address = batadv_interface_set_mac_addr,
> -	.ndo_change_mtu = batadv_interface_change_mtu,
>  	.ndo_set_rx_mode = batadv_interface_set_rx_mode,
>  	.ndo_start_xmit = batadv_interface_tx,
>  	.ndo_validate_addr = eth_validate_addr,
> @@ -987,6 +975,7 @@ struct net_device *batadv_softif_create(struct net *net, const char *name)
>  	dev_net_set(soft_iface, net);
>  
>  	soft_iface->rtnl_link_ops = &batadv_link_ops;
> +	soft_iface->max_mtu = batadv_hardif_min_mtu(soft_iface);
>  
>  	ret = register_netdevice(soft_iface);
>  	if (ret < 0) {

This looks bogus to me. You are now setting max_mtu during initialization of
the virtual interface. But at this time no slave interfaces have been added to
the master batman-adv interface, so batadv_hardif_min_mtu() will not return the
correct value here, especially if you don't have fragmentation enabled.

So this change looks like a bug to me.
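
The ordering problem can be illustrated with a standalone C sketch. All
names here (soft_iface_sketch, min_slave_mtu(), create_iface_snapshot(),
slaves_changed()) and the 1500-byte fallback are assumptions made for the
example; this is not the batman-adv code, and recomputing on slave changes
is only one possible way to avoid the stale bound.

```c
#include <stddef.h>

#define DEFAULT_MTU_SKETCH 1500	/* assumed fallback when nothing is attached */

/* Smallest MTU among the attached lower interfaces, capped at the fallback. */
static int min_slave_mtu(const int *slave_mtus, size_t n)
{
	int min = DEFAULT_MTU_SKETCH;

	for (size_t i = 0; i < n; i++)
		if (slave_mtus[i] < min)
			min = slave_mtus[i];
	return min;
}

struct soft_iface_sketch {
	int max_mtu;	/* upper bound enforced on later MTU changes */
};

/* Problematic ordering: the bound is captured before any slave exists. */
static void create_iface_snapshot(struct soft_iface_sketch *iface)
{
	iface->max_mtu = min_slave_mtu(NULL, 0);
}

/* One possible alternative: recompute whenever the slave set changes. */
static void slaves_changed(struct soft_iface_sketch *iface,
			   const int *slave_mtus, size_t n)
{
	iface->max_mtu = min_slave_mtu(slave_mtus, n);
}
```

In the snapshot case the bound stays at the fallback value even after a
slave with a smaller MTU is attached, which is the stale value described
above.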

Kind regards,
	Sven


* Re: Introduction of libqedr to the Consolidated Userspace RDMA Library Repo
From: Amrani, Ram @ 2016-10-22  7:46 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Borundia, Rajesh
In-Reply-To: <30ac13b7-f9e7-e08f-5f12-e6517117f2ba-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

>I saw you fixed up the things Jason had referred to.  I merged your
>request, but there is still an outstanding build issue (I couldn't get
>to the travis logs to see it at the time, but Jason let me know it was a
>real issue, not an issue with Travis CI).  Please get that fixed up as
>soon as possible.  As soon as the build fix is available we need to get
>it merged in too.

The only outstanding "real" issue I'm aware of is the 32-bit compilation
failure reported by Travis CI. I have a patch ready for it but I cannot send it
right now (it's a holiday and I'm away). I guess I will be able to send it
tonight (in 12 hours) or tomorrow morning.

Do let me know if there's anything else ("but Jason let me know it was a
real issue, not an issue with Travis CI").

Thanks,
Ram
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 6/6] net: use core MTU range checking in misc drivers
From: Stefan Richter @ 2016-10-22  9:36 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Sabrina Dubroca, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Faisal Latif,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Cliff Whickman, Robin Holt,
	Jes Sorensen, Marek Lindner, Simon Wunderlich, Antonio Quartulli
In-Reply-To: <20161020031641.GJ18569-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Oct 19 Jarod Wilson wrote:
> On Thu, Oct 20, 2016 at 12:38:46AM +0200, Stefan Richter wrote:
> > On Oct 19 Sabrina Dubroca wrote:
> > > 2016-10-18, 22:33:33 -0400, Jarod Wilson wrote:
> > > [...]
> > > > diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c
> > > > index 309311b..b5f125c 100644
> > > > --- a/drivers/firewire/net.c
> > > > +++ b/drivers/firewire/net.c
> > > > @@ -1349,15 +1349,6 @@ static netdev_tx_t fwnet_tx(struct sk_buff *skb, struct net_device *net)
> > > >  	return NETDEV_TX_OK;
> > > >  }
> > > >  
> > > > -static int fwnet_change_mtu(struct net_device *net, int new_mtu)
> > > > -{
> > > > -	if (new_mtu < 68)
> > > > -		return -EINVAL;
> > > > -
> > > > -	net->mtu = new_mtu;
> > > > -	return 0;
> > > > -}
> > > > -  
> > > 
> > > This doesn't do any upper bound checking.
> > 
> > I need to check more closely, but I think the RFC 2734 encapsulation spec
> > and our implementation do not impose a particular upper limit.  Though I
> > guess it's bad to let userland set arbitrarily large values here.
> 
> In which case, that would suggest using IP_MAX_MTU (65535) here.

Probably.  I (or somebody) need to check the spec and the code once more.

[...]
> > > > @@ -1481,6 +1471,8 @@ static int fwnet_probe(struct fw_unit *unit,
> > > >  	max_mtu = (1 << (card->max_receive + 1))
> > > >  		  - sizeof(struct rfc2734_header) - IEEE1394_GASP_HDR_SIZE;
> > > >  	net->mtu = min(1500U, max_mtu);
> > > > +	net->min_mtu = ETH_MIN_MTU;
> > > > +	net->max_mtu = net->mtu;  
> > > 
> > > But that will now prevent increasing the MTU above the initial value?
> > 
> > Indeed, therefore NAK.
> 
> However, there's an explicit calculation for 'max_mtu' right there that I
> glazed right over. It would seem perhaps *that* should be used for
> net->max_mtu here, no?

No.  This 'max_mtu' here is not the absolute maximum.  It is only an
initial MTU which has the property that link fragmentation is not
going to happen (if all other peers are at least as capable as this
node).
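
For illustration, the computation in the quoted hunk can be sketched in
plain C. This is a minimal userspace sketch, not the driver code: the two
header sizes are assumed to be 8 bytes each, and unfragmented_payload() and
fwnet_initial_mtu() are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed sizes, for illustration only: in the driver these come from
 * sizeof(struct rfc2734_header) and IEEE1394_GASP_HDR_SIZE. */
#define SKETCH_RFC2734_HDR_SIZE 8u
#define SKETCH_GASP_HDR_SIZE    8u

/* Largest payload that fits into a single link-layer packet for a given
 * max_receive exponent: 2^(max_receive + 1) minus both headers. */
static uint32_t unfragmented_payload(unsigned int max_receive)
{
	return (1u << (max_receive + 1))
	       - SKETCH_RFC2734_HDR_SIZE - SKETCH_GASP_HDR_SIZE;
}

/* Hypothetical helper mirroring the quoted hunk: the initial MTU is
 * min(1500, unfragmented payload).  It is a starting value, not a cap. */
static uint32_t fwnet_initial_mtu(unsigned int max_receive)
{
	uint32_t payload = unfragmented_payload(max_receive);

	return payload < 1500u ? payload : 1500u;
}
```

Under these assumed sizes, a small max_receive yields an initial MTU below
1500, which is why treating the same value as an upper bound would prevent
raising the MTU later.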
-- 
Stefan Richter
-======----- =-=- =-==-
http://arcgraph.de/sr/

^ permalink raw reply

* Re: Introduction of libqedr to the Consolidated Userspace RDMA Library Repo
From: Amrani, Ram @ 2016-10-22 13:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Borundia, Rajesh
In-Reply-To: <SN1PR07MB2207608EF9B6869E265C85EAF8D70-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>


>The only outstanding "real" issue I'm aware of is compilation in 32-bit
>as reported by Travis CI. I have a patch ready for it but I cannot send it 
>right now (it's Holiday and I'm away). I guess I will be able to pull it off
>tonight (in 12 hours) or tomorrow morning.

I just created a pull request, earlier than I expected.
The commit was verified on a 32-bit RH6.8. Travis seems happy too.
(Is it creating reports only on pull requests?)

Thanks,
Ram


^ permalink raw reply

* Re: Introduction of libqedr to the Consolidated Userspace RDMA Library Repo
From: Doug Ledford @ 2016-10-22 14:28 UTC (permalink / raw)
  To: Amrani, Ram, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Borundia, Rajesh
In-Reply-To: <SN1PR07MB2207E025DDE2F38643270AA0F8D70-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>


On 10/22/2016 9:26 AM, Amrani, Ram wrote:
> 
>> The only outstanding "real" issue I'm aware of is compilation in 32-bit
>> as reported by Travis CI. I have a patch ready for it but I cannot send it 
>> right now (it's Holiday and I'm away). I guess I will be able to pull it off
>> tonight (in 12 hours) or tomorrow morning.
> 
> I just created a pull request, earlier than I expected.
> The commit was verified on a 32-bit RH6.8. Travis seems happy too.
> (Is it creating reports only on pull requests?)
> 
> Thanks,
> Ram
> 

Thanks, merged.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


^ permalink raw reply

* RE: [PATCH RFC v2 2/3] rdma_cm: add rdma_consumer_reject() helper function
From: Steve Wise @ 2016-10-22 15:57 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	sagig-NQWnxTmZq1alnMjI0IkVqw, axboe-b10kYP2dOMg
In-Reply-To: <20161021121428.GB17028-jcswGhMUV9g@public.gmane.org>

> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -114,6 +114,19 @@ const char *__attribute_const__ rdma_reject_msg(struct rdma_cm_id *id,
> >  }
> >  EXPORT_SYMBOL(rdma_reject_msg);
> >
> > +bool rdma_consumer_reject(struct rdma_cm_id *id, int reason)
> > +{
> > +	if (rdma_ib_or_roce(id->device, id->port_num))
> > +		return ib_consumer_reject(reason);
> > +
> > +	if (rdma_protocol_iwarp(id->device, id->port_num))
> > +		return iw_consumer_reject(reason);
> > +
> > +	/* FIXME should we WARN_ONCE() here? */
> > +	return false;
> 
> Yes.  Also I'd just inline the ib_consumer_reject and iw_consumer_reject
> helpers here.
>

Why is that preferred vs the static inline functions in ib_cm.h and iw_cm.h?



^ permalink raw reply

* RE: [PATCH RFC v2 3/3] nvme-rdma: use rdma_reject_msg() to log connection rejects
From: Steve Wise @ 2016-10-22 16:12 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	sagig-NQWnxTmZq1alnMjI0IkVqw, axboe-b10kYP2dOMg
In-Reply-To: <20161021122318.GB17325-jcswGhMUV9g@public.gmane.org>

> 
> On Thu, Oct 20, 2016 at 03:40:29PM -0700, Steve Wise wrote:
> > @@ -1237,18 +1237,22 @@ out_destroy_queue_ib:
> >  static int nvme_rdma_conn_rejected(struct nvme_rdma_queue *queue,
> >  		struct rdma_cm_event *ev)
> >  {
> > +	struct rdma_cm_id *cm_id = queue->cm_id;
> > +	int rdma_status = ev->status;
> > +	short nvme_status = -1;
> > +
> > +	if (rdma_consumer_reject(cm_id, rdma_status) &&
> > +	    ev->param.conn.private_data_len) {
> >  		struct nvme_rdma_cm_rej *rej =
> > 			(struct nvme_rdma_cm_rej *)ev->param.conn.private_data;
> 
> Given the nasty casting issues in the current RDMA/CM API maybe we should
> actually expand the scope of the rdma_consumer_reject helper to include
> the above check, e.g. check that there is a private data len and then
> return a pointer to the private data?

An application could reject and not provide private data, so I think we need
3 helpers (so far):

rdma_reject_msg() - protocol reject reason string
rdma_is_consumer_reject() - true if the peer consumer/ulp rejected
rdma_consumer_reject_data() - ptr to any private data

Sound good?
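
A rough sketch of how the three proposed helpers could fit together, as a
self-contained userspace mock. Everything here is hypothetical: the
sketch_* names, the event structure, and the placeholder reject code stand
in for the RDMA/CM types, and the real helpers may end up with different
signatures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the transport's "consumer defined" reject code. */
#define SKETCH_REJ_CONSUMER 28

/* Mock of the parts of a CM event that the helpers would look at. */
struct sketch_cm_event {
	int status;
	const void *private_data;
	unsigned char private_data_len;
};

/* Protocol reject reason as a string (two cases only in this mock). */
static const char *sketch_reject_msg(int reason)
{
	return reason == SKETCH_REJ_CONSUMER ? "consumer defined"
					     : "protocol reject";
}

/* True if the peer consumer/ULP rejected the connection. */
static bool sketch_is_consumer_reject(int reason)
{
	return reason == SKETCH_REJ_CONSUMER;
}

/* Pointer to the private data, or NULL if the peer supplied none.
 * On success, *len receives the private data length. */
static const void *sketch_consumer_reject_data(const struct sketch_cm_event *ev,
					       unsigned char *len)
{
	if (ev->private_data_len == 0)
		return NULL;
	*len = ev->private_data_len;
	return ev->private_data;
}
```

Keeping the "is it a consumer reject" test separate from the data accessor
covers the case mentioned above, where an application rejects without
providing private data.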



^ permalink raw reply

* Re: [PATCH net-next 6/6] net: use core MTU range checking in misc drivers
From: Stefan Richter @ 2016-10-22 18:51 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Sabrina Dubroca, linux-kernel, netdev, Faisal Latif, linux-rdma,
	Cliff Whickman, Robin Holt, Jes Sorensen, Marek Lindner,
	Simon Wunderlich, Antonio Quartulli
In-Reply-To: <20161022113607.55832988@kant>

On Oct 22 Stefan Richter wrote:
> On Oct 19 Jarod Wilson wrote:
> > On Thu, Oct 20, 2016 at 12:38:46AM +0200, Stefan Richter wrote:  
> > > On Oct 19 Sabrina Dubroca wrote:  
> > > > 2016-10-18, 22:33:33 -0400, Jarod Wilson wrote:
[...]
> > > > > @@ -1481,6 +1471,8 @@ static int fwnet_probe(struct fw_unit *unit,
> > > > >  	max_mtu = (1 << (card->max_receive + 1))
> > > > >  		  - sizeof(struct rfc2734_header) - IEEE1394_GASP_HDR_SIZE;
> > > > >  	net->mtu = min(1500U, max_mtu);
> > > > > +	net->min_mtu = ETH_MIN_MTU;
> > > > > +	net->max_mtu = net->mtu;    
> > > > 
> > > > But that will now prevent increasing the MTU above the initial value?  
> > > 
> > > Indeed, therefore NAK.  
> > 
> > However, there's an explicit calculation for 'max_mtu' right there that I
> > glazed right over. It would seem perhaps *that* should be used for
> > net->max_mtu here, no?  
> 
> No.  This 'max_mtu' here is not the absolute maximum.  It is only an
> initial MTU which has the property that link fragmentation is not
> going to happen (if all other peers are at least as capable as this
> node).

Besides, card->max_receive is about what the card can receive (at the IEEE
1394 link layer), not about what the card can send.
-- 
Stefan Richter
-======----- =-=- =-==-
http://arcgraph.de/sr/

^ permalink raw reply

* Re: [PATCH net-next v2 7/9] net: use core MTU range checking in misc drivers
From: Stefan Richter @ 2016-10-22 19:16 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, netdev, linux-rdma, Faisal Latif, Cliff Whickman,
	Robin Holt, Jes Sorensen, Marek Lindner, Simon Wunderlich,
	Antonio Quartulli, Sathya Prakash, Chaitra P B,
	Suganath Prabu Subramani, MPT-FusionLinux.pdl, Sebastian Reichel,
	Felipe Balbi, Arvid Brodin, Remi Denis-Courmont
In-Reply-To: <20161020175524.6184-8-jarod@redhat.com>

On Oct 20 Jarod Wilson wrote:
> firewire-net:
> - set min/max_mtu
> - remove fwnet_change_mtu
[...]
> --- a/drivers/firewire/net.c
> +++ b/drivers/firewire/net.c
[...]
> @@ -1478,9 +1467,10 @@ static int fwnet_probe(struct fw_unit *unit,
>  	 * Use the RFC 2734 default 1500 octets or the maximum payload
>  	 * as initial MTU
>  	 */
> -	max_mtu = (1 << (card->max_receive + 1))
> -		  - sizeof(struct rfc2734_header) - IEEE1394_GASP_HDR_SIZE;
> -	net->mtu = min(1500U, max_mtu);
> +	net->max_mtu = (1 << (card->max_receive + 1))
> +		       - sizeof(struct rfc2734_header) - IEEE1394_GASP_HDR_SIZE;
> +	net->mtu = min(1500U, net->max_mtu);
> +	net->min_mtu = ETH_MIN_MTU;
>  
>  	/* Set our hardware address while we're at it */
>  	ha = (union fwnet_hwaddr *)net->dev_addr;

Please preserve the current behavior, i.e. do not enforce any particular
upper bound.  (Especially none based on the local link layer controller's
max_receive parameter.)

BTW, after having read RFC 2734, RFC 3146, and the code, I am convinced
that net->mtu should be initialized to 1500, not less.  But such a change
should be done in a separate patch.
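
To make "no particular upper bound" concrete, here is a minimal sketch of a
min/max range check, assuming that a max_mtu of 0 means the upper bound is
not enforced. struct sketch_netdev and sketch_mtu_ok() are hypothetical
names for illustration, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MTU-related fields of a net device. */
struct sketch_netdev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;	/* 0: no upper bound enforced (assumption) */
};

/* Returns true if new_mtu passes the range check. */
static bool sketch_mtu_ok(const struct sketch_netdev *dev, int new_mtu)
{
	if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
		return false;
	if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
		return false;
	return true;
}
```

With min_mtu set to 68 and max_mtu left at 0, this accepts the same values
as the removed fwnet_change_mtu(), which only rejected MTUs below 68.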
-- 
Stefan Richter
-======----- =-=- =-==-
http://arcgraph.de/sr/

^ permalink raw reply

* Re: Introduction of libqedr to the Consolidated Userspace RDMA Library Repo
From: Jason Gunthorpe @ 2016-10-23 15:52 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Elior, Ariel, Kalderon, Michal, Borundia, Rajesh
In-Reply-To: <SN1PR07MB2207E025DDE2F38643270AA0F8D70-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

On Sat, Oct 22, 2016 at 01:26:51PM +0000, Amrani, Ram wrote:

> I just created a pull request, earlier than I expected.
> The commit was verified on a 32-bit RH6.8. Travis seems happy too.
> (Is it creating reports only on pull requests?)

Travis runs on pull requests and after each merge on the tip of tree (TOT).

It became horribly broken last Friday and could not report anything.

Jason

^ permalink raw reply

* Re: [patch 2/2] IB/hns: Fix some error handling in hns_roce_v1_query_qp()
From: oulijun @ 2016-10-24 12:29 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Wei Hu(Xavier), Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma, kernel-janitors
In-Reply-To: <20161014073441.GB15238@mwanda>

On 2016/10/14 15:34, Dan Carpenter wrote:
> to_ib_qp_state() returns IB_QPS_ERR (6) on error, it doesn't return -1.
> 
> Fixes: 9a4435375cd1 ('IB/hns: Add driver files for hns RoCE driver')
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
> index 58b150e..280abac 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
> @@ -2544,7 +2544,7 @@ int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
>  			       QP_CONTEXT_QPC_BYTES_144_QP_STATE_M,
>  			       QP_CONTEXT_QPC_BYTES_144_QP_STATE_S);
>  	tmp_qp_state = (int)to_ib_qp_state((enum hns_roce_qp_state)state);
> -	if (tmp_qp_state == -1) {
> +	if (tmp_qp_state == IB_QPS_ERR) {
>  		dev_err(dev, "to_ib_qp_state error\n");
>  		ret = -EINVAL;
>  		goto out;
> 
> .
> 
Hi Dan,
After checking your modification, I think it should perhaps be fixed like
this instead:

static enum ib_qp_state to_ib_qp_state(enum hns_roce_qp_state state)
{
	switch (state) {
	case HNS_ROCE_QP_STATE_RST:
		return IB_QPS_RESET;
	case HNS_ROCE_QP_STATE_INIT:
		return IB_QPS_INIT;
	case HNS_ROCE_QP_STATE_RTR:
		return IB_QPS_RTR;
	case HNS_ROCE_QP_STATE_RTS:
		return IB_QPS_RTS;
	case HNS_ROCE_QP_STATE_SQD:
		return IB_QPS_SQD;
	case HNS_ROCE_QP_STATE_ERR:
		return IB_QPS_ERR;
	default:
		return -1;
	}
}

because IB_QPS_ERR is a legal state, whereas tmp_qp_state should hold an
illegal value when to_ib_qp_state() takes the default branch.
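
A minimal sketch of the pattern described above: the converter returns a
sentinel outside the range of legal states, and the caller tests for that
sentinel rather than for the (legal) error state. All sketch_* names and
the HW_*/SK_* enumerators are hypothetical stand-ins for the driver and IB
core definitions.

```c
/* Stand-ins for the hardware and IB QP state enums. */
enum sketch_hw_state { HW_RST, HW_INIT, HW_RTR, HW_RTS, HW_SQD, HW_ERR };
enum sketch_ib_state { SK_RESET, SK_INIT, SK_RTR, SK_RTS, SK_SQD, SK_SQE, SK_ERR };

/* Sentinel that cannot collide with any legal state value. */
#define SKETCH_STATE_INVALID (-1)

/* Map a hardware state to an IB state, or return the sentinel. */
static int sketch_to_ib_state(int hw_state)
{
	switch (hw_state) {
	case HW_RST:	return SK_RESET;
	case HW_INIT:	return SK_INIT;
	case HW_RTR:	return SK_RTR;
	case HW_RTS:	return SK_RTS;
	case HW_SQD:	return SK_SQD;
	case HW_ERR:	return SK_ERR;	/* legal state, not a failure */
	default:	return SKETCH_STATE_INVALID;
	}
}

/* Caller: fail only on the sentinel.  Returns 0 on success, -1 on an
 * unknown hardware state (the driver would return -EINVAL). */
static int sketch_query_state(int hw_state, int *out)
{
	int state = sketch_to_ib_state(hw_state);

	if (state == SKETCH_STATE_INVALID)
		return -1;
	*out = state;
	return 0;
}
```

Comparing against the error state instead would make a query fail for
every QP that is legitimately in the error state.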

thanks
Lijun Ou




^ permalink raw reply

* Re: NVMe Over Fabrics - Random Crash with SoftROCE
From: Christoph Hellwig @ 2016-10-24 12:46 UTC (permalink / raw)
  To: Ripduman Sohan
  Cc: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CAGbt=A6q27wj79iTt=EDhCKo_6kzw29Aoz4QN6_yjTWUzKg9uQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi Ripduman,

please report all NVMe issues to the linux-nvme list.  I'm reading there
as well, but it will allow for more people to follow the issue.

I'm not even sure what the error is between all the traces, but maybe
someone understands the rxe traces better there or on the linux-rdma
list.

On Fri, Oct 21, 2016 at 10:30:15PM +0100, Ripduman Sohan wrote:
> Hi,
> 
> I'm trying to get NVMF going over SoftRoCE (rxe_rdma) and I get random
> crashes.  At the simplest reduction, if I connect the initiator to the
> target, on an idle system I will on occasion get the error below on the
> initiator (no data has been transferred between hosts at this point - and
> this happens randomly, sometimes it takes hours, sometimes it happens
> within 10 mins of boot).
> 
> I'll probably start to debug this in a couple of weeks, but I thought it
> might be worth passing it by you in case it's something you might have seen
> before/have some clues?
> 
> Thanks
> 
> Rip
> 
> 
> ---- log below ---- (initiator).
> 
> rdma_rxe: loaded
> rdma_rxe: set rxe0 active
> rdma_rxe: added rxe0 to eth4
> nvme nvme0: creating 8 I/O queues.
> nvme nvme0: new ctrl: NQN "ramdisk", addr 172.16.139.22:4420
> nvme nvme0: failed nvme_keep_alive_end_io error=16391
> nvme nvme0: reconnecting in 10 seconds
> nvme nvme0: Successfully reconnected
> 
> 1317: nvme nvme0: disconnected (10): status 0 id ffff8801389c6800
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff8801376d8000
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff8801369ee400
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff88013a9dc400
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff88013997d000
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff880137201c00
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff88013548f800
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff880138c0b800
> 1346: nvme nvme0: disconnect received - connection closed
> 1317: nvme nvme0: disconnected (10): status 0 id ffff880139936400
> 1346: nvme nvme0: disconnect received - connection closed
> 756: rdma_rxe: qp#26 state -> ERR
> 756: rdma_rxe: qp#26 state -> ERR
> 756: rdma_rxe: qp#26 state -> ERR
> 756: rdma_rxe: qp#27 state -> ERR
> 756: rdma_rxe: qp#27 state -> ERR
> 756: rdma_rxe: qp#27 state -> ERR
> 756: rdma_rxe: qp#28 state -> ERR
> 756: rdma_rxe: qp#28 state -> ERR
> 756: rdma_rxe: qp#28 state -> ERR
> 756: rdma_rxe: qp#29 state -> ERR
> 756: rdma_rxe: qp#29 state -> ERR
> 756: rdma_rxe: qp#29 state -> ERR
> 756: rdma_rxe: qp#30 state -> ERR
> 756: rdma_rxe: qp#30 state -> ERR
> 756: rdma_rxe: qp#30 state -> ERR
> 756: rdma_rxe: qp#31 state -> ERR
> 756: rdma_rxe: qp#31 state -> ERR
> 756: rdma_rxe: qp#31 state -> ERR
> 756: rdma_rxe: qp#32 state -> ERR
> 756: rdma_rxe: qp#32 state -> ERR
> 756: rdma_rxe: qp#32 state -> ERR
> 756: rdma_rxe: qp#33 state -> ERR
> 756: rdma_rxe: qp#33 state -> ERR
> 756: rdma_rxe: qp#33 state -> ERR
> 756: rdma_rxe: qp#25 state -> ERR
> 756: rdma_rxe: qp#25 state -> ERR
> 756: rdma_rxe: qp#25 state -> ERR
> 1317: nvme nvme0: address resolved (0): status 0 id ffff8801389c6800
> 302: rdma_rxe: qp#33 max_wr = 33, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#33 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff8801389c6800
> 730: rdma_rxe: qp#33 state -> INIT
> 698: rdma_rxe: qp#33 set resp psn = 0x7a0c05
> 704: rdma_rxe: qp#33 set min rnr timer = 0x0
> 736: rdma_rxe: qp#33 state -> RTR
> 684: rdma_rxe: qp#33 set retry count = 7
> 691: rdma_rxe: qp#33 set rnr retry count = 7
> 711: rdma_rxe: qp#33 set req psn = 0x2c631
> 741: rdma_rxe: qp#33 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff8801389c6800
> 1317: nvme nvme0: address resolved (0): status 0 id ffff88013a461800
> 302: rdma_rxe: qp#34 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#34 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff88013a461800
> 730: rdma_rxe: qp#34 state -> INIT
> 698: rdma_rxe: qp#34 set resp psn = 0x4e6c1c
> 704: rdma_rxe: qp#34 set min rnr timer = 0x0
> 736: rdma_rxe: qp#34 state -> RTR
> 684: rdma_rxe: qp#34 set retry count = 7
> 691: rdma_rxe: qp#34 set rnr retry count = 7
> 711: rdma_rxe: qp#34 set req psn = 0x186e10
> 741: rdma_rxe: qp#34 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff88013a461800
> 1317: nvme nvme0: address resolved (0): status 0 id ffff88013997dc00
> 302: rdma_rxe: qp#35 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#35 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff88013997dc00
> 730: rdma_rxe: qp#35 state -> INIT
> 698: rdma_rxe: qp#35 set resp psn = 0xd727f8
> 704: rdma_rxe: qp#35 set min rnr timer = 0x0
> 736: rdma_rxe: qp#35 state -> RTR
> 684: rdma_rxe: qp#35 set retry count = 7
> 691: rdma_rxe: qp#35 set rnr retry count = 7
> 711: rdma_rxe: qp#35 set req psn = 0xd8e512
> 741: rdma_rxe: qp#35 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff88013997dc00
> 1317: nvme nvme0: address resolved (0): status 0 id ffff880139d81000
> 302: rdma_rxe: qp#36 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#36 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff880139d81000
> 730: rdma_rxe: qp#36 state -> INIT
> 698: rdma_rxe: qp#36 set resp psn = 0x7978ee
> 704: rdma_rxe: qp#36 set min rnr timer = 0x0
> 736: rdma_rxe: qp#36 state -> RTR
> 684: rdma_rxe: qp#36 set retry count = 7
> 691: rdma_rxe: qp#36 set rnr retry count = 7
> 711: rdma_rxe: qp#36 set req psn = 0xc5b0ef
> 741: rdma_rxe: qp#36 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff880139d81000
> 1317: nvme nvme0: address resolved (0): status 0 id ffff880137201800
> 302: rdma_rxe: qp#37 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#37 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff880137201800
> 730: rdma_rxe: qp#37 state -> INIT
> 698: rdma_rxe: qp#37 set resp psn = 0x970dd5
> 704: rdma_rxe: qp#37 set min rnr timer = 0x0
> 736: rdma_rxe: qp#37 state -> RTR
> 684: rdma_rxe: qp#37 set retry count = 7
> 691: rdma_rxe: qp#37 set rnr retry count = 7
> 711: rdma_rxe: qp#37 set req psn = 0x71f2a2
> 741: rdma_rxe: qp#37 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff880137201800
> 1317: nvme nvme0: address resolved (0): status 0 id ffff880139e34c00
> 302: rdma_rxe: qp#38 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#38 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff880139e34c00
> 730: rdma_rxe: qp#38 state -> INIT
> 698: rdma_rxe: qp#38 set resp psn = 0x542d56
> 704: rdma_rxe: qp#38 set min rnr timer = 0x0
> 736: rdma_rxe: qp#38 state -> RTR
> 684: rdma_rxe: qp#38 set retry count = 7
> 691: rdma_rxe: qp#38 set rnr retry count = 7
> 711: rdma_rxe: qp#38 set req psn = 0x71fad4
> 741: rdma_rxe: qp#38 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff880139e34c00
> 1317: nvme nvme0: address resolved (0): status 0 id ffff880134e43800
> 302: rdma_rxe: qp#39 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#39 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff880134e43800
> 730: rdma_rxe: qp#39 state -> INIT
> 698: rdma_rxe: qp#39 set resp psn = 0xdbca4
> 704: rdma_rxe: qp#39 set min rnr timer = 0x0
> 736: rdma_rxe: qp#39 state -> RTR
> 684: rdma_rxe: qp#39 set retry count = 7
> 691: rdma_rxe: qp#39 set rnr retry count = 7
> 711: rdma_rxe: qp#39 set req psn = 0xd84ac0
> 741: rdma_rxe: qp#39 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff880134e43800
> 1317: nvme nvme0: address resolved (0): status 0 id ffff880138d15400
> 302: rdma_rxe: qp#40 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#40 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff880138d15400
> 730: rdma_rxe: qp#40 state -> INIT
> 698: rdma_rxe: qp#40 set resp psn = 0x6afd31
> 704: rdma_rxe: qp#40 set min rnr timer = 0x0
> 736: rdma_rxe: qp#40 state -> RTR
> 684: rdma_rxe: qp#40 set retry count = 7
> 691: rdma_rxe: qp#40 set rnr retry count = 7
> 711: rdma_rxe: qp#40 set req psn = 0xb917ed
> 741: rdma_rxe: qp#40 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff880138d15400
> 1317: nvme nvme0: address resolved (0): status 0 id ffff880134f45400
> 302: rdma_rxe: qp#41 max_wr = 129, max_sge = 1, wqe_size = 56
> 730: rdma_rxe: qp#41 state -> INIT
> 1317: nvme nvme0: route resolved  (2): status 0 id ffff880134f45400
> 730: rdma_rxe: qp#41 state -> INIT
> 698: rdma_rxe: qp#41 set resp psn = 0x8a6989
> 704: rdma_rxe: qp#41 set min rnr timer = 0x0
> 736: rdma_rxe: qp#41 state -> RTR
> 684: rdma_rxe: qp#41 set retry count = 7
> 691: rdma_rxe: qp#41 set rnr retry count = 7
> 711: rdma_rxe: qp#41 set req psn = 0x23c909
> 741: rdma_rxe: qp#41 state -> RTS
> 1317: nvme nvme0: established (9): status 0 id ffff880134f45400
> nvme nvme0: Successfully reconnected
> 
> -- 
> --rip
---end quoted text---

^ permalink raw reply

* [PATCH infiniband-diags v3] ibportstate: Fixed switch peer port probing when using DR routing
From: Knut Omang @ 2016-10-24 13:34 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Line Holen, Knut Omang,
	Dag Moxnes

From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

ibportstate queries to a remote peer port on a switch using direct
routing would result in timeouts. The reason for this is that the
DR path was not correctly constructed.

Signed-off-by: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Line Holen <line.holen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 src/ibportstate.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/src/ibportstate.c b/src/ibportstate.c
index cfb8be7..82cbcc2 100644
--- a/src/ibportstate.c
+++ b/src/ibportstate.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2004-2009 Voltaire Inc.  All rights reserved.
  * Copyright (c) 2010,2011 Mellanox Technologies LTD.  All rights reserved.
+ * Copyright (c) 2011,2016 Oracle and/or its affiliates. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -655,15 +656,22 @@ int main(int argc, char **argv)
 
 			/* Setup portid for peer port */
 			memcpy(&peerportid, &portid, sizeof(peerportid));
-			peerportid.drpath.cnt = 1;
-			peerportid.drpath.p[1] = (uint8_t) portnum;
-
-			/* Set DrSLID to local lid */
-			if (resolve_self(ibd_ca, ibd_ca_port, &selfportid,
-						&selfport, 0) < 0)
-				IBEXIT("could not resolve self");
-			peerportid.drpath.drslid = (uint16_t) selfportid.lid;
-			peerportid.drpath.drdlid = 0xffff;
+			if (portid.lid == 0) {
+				peerportid.drpath.cnt++;
+				if (peerportid.drpath.cnt == IB_SUBNET_PATH_HOPS_MAX) {
+					IBEXIT("Too many hops");
+				}
+			} else {
+				peerportid.drpath.cnt = 1;
+
+				/* Set DrSLID to local lid */
+				if (resolve_self(ibd_ca, ibd_ca_port, &selfportid,
+						         &selfport, 0) < 0)
+					IBEXIT("could not resolve self");
+				peerportid.drpath.drslid = selfportid.lid;
+				peerportid.drpath.drdlid = 0xffff;
+			}
+			peerportid.drpath.p[peerportid.drpath.cnt] = (uint8_t) portnum;
 
 			/* Get peer port NodeInfo to obtain peer port number */
 			is_peer_switch = get_node_info(&peerportid, data);

base-commit: 17e03b4738913365a3f947719c4897fcb92df32c
-- 
git-series 0.8.10

^ permalink raw reply related

* Re: [patch 2/2] IB/hns: Fix some error handling in hns_roce_v1_query_qp()
From: Dan Carpenter @ 2016-10-24 13:43 UTC (permalink / raw)
  To: oulijun
  Cc: Wei Hu(Xavier), Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <580DFEB0.20100-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Good catch.  Thanks for the review.  I will resend.

regards,
dan carpenter


^ permalink raw reply

* Re: [PATCH infiniband-diags v3] ibportstate: Fixed switch peer port probing when using DR routing
From: Hal Rosenstock @ 2016-10-24 13:50 UTC (permalink / raw)
  To: Knut Omang, Ira Weiny
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Line Holen, Dag Moxnes
In-Reply-To: <22b5963c36776d0b03eb0f6706323b4ba4bafac9.1477315879.git-series.knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On 10/24/2016 9:34 AM, Knut Omang wrote:
> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> 
> ibportstate queries to a remote peer port on a switch using direct
> routing would result in timeouts. The reason for this is that the
> DR path was not correctly constructed.
> 
> Signed-off-by: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Line Holen <line.holen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
>  src/ibportstate.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/src/ibportstate.c b/src/ibportstate.c
> index cfb8be7..82cbcc2 100644
> --- a/src/ibportstate.c
> +++ b/src/ibportstate.c
> @@ -1,6 +1,7 @@
>  /*
>   * Copyright (c) 2004-2009 Voltaire Inc.  All rights reserved.
>   * Copyright (c) 2010,2011 Mellanox Technologies LTD.  All rights reserved.
> + * Copyright (c) 2011,2016 Oracle and/or its affiliates. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -655,15 +656,22 @@ int main(int argc, char **argv)
>  
>  			/* Setup portid for peer port */
>  			memcpy(&peerportid, &portid, sizeof(peerportid));
> -			peerportid.drpath.cnt = 1;
> -			peerportid.drpath.p[1] = (uint8_t) portnum;
> -
> -			/* Set DrSLID to local lid */
> -			if (resolve_self(ibd_ca, ibd_ca_port, &selfportid,
> -						&selfport, 0) < 0)
> -				IBEXIT("could not resolve self");
> -			peerportid.drpath.drslid = (uint16_t) selfportid.lid;
> -			peerportid.drpath.drdlid = 0xffff;
> +			if (portid.lid == 0) {
> +				peerportid.drpath.cnt++;
> +				if (peerportid.drpath.cnt == IB_SUBNET_PATH_HOPS_MAX) {
> +					IBEXIT("Too many hops");
> +				}
> +			} else {
> +				peerportid.drpath.cnt = 1;
> +
> +				/* Set DrSLID to local lid */
> +				if (resolve_self(ibd_ca, ibd_ca_port, &selfportid,
> +						         &selfport, 0) < 0)
> +					IBEXIT("could not resolve self");
> +				peerportid.drpath.drslid = selfportid.lid;

Why was the cast of selfportid.lid to (uint16_t) dropped?

> +				peerportid.drpath.drdlid = 0xffff;
> +			}
> +			peerportid.drpath.p[peerportid.drpath.cnt] = (uint8_t) portnum;
>  
>  			/* Get peer port NodeInfo to obtain peer port number */
>  			is_peer_switch = get_node_info(&peerportid, data);
> 
> base-commit: 17e03b4738913365a3f947719c4897fcb92df32c
> 

^ permalink raw reply

* Re: [PATCH infiniband-diags v3] ibportstate: Fixed switch peer port probing when using DR routing
From: Dag Moxnes @ 2016-10-24 14:37 UTC (permalink / raw)
  To: Hal Rosenstock, Knut Omang, Ira Weiny
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Line Holen
In-Reply-To: <02e18a78-ba54-d7a2-c3fe-3e4d392f5593-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>



On 10/24/2016 03:50 PM, Hal Rosenstock wrote:
> On 10/24/2016 9:34 AM, Knut Omang wrote:
>> From: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>
>> ibportstate queries to a remote peer port on a switch using direct
>> routing would result in timeouts. The reason for this is that the
>> DR path was not correctly constructed.
>>
>> Signed-off-by: Dag Moxnes <dag.moxnes-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> Reviewed-by: Line Holen <line.holen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> ---
>>   src/ibportstate.c | 26 +++++++++++++++++---------
>>   1 file changed, 17 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/ibportstate.c b/src/ibportstate.c
>> index cfb8be7..82cbcc2 100644
>> --- a/src/ibportstate.c
>> +++ b/src/ibportstate.c
>> @@ -1,6 +1,7 @@
>>   /*
>>    * Copyright (c) 2004-2009 Voltaire Inc.  All rights reserved.
>>    * Copyright (c) 2010,2011 Mellanox Technologies LTD.  All rights reserved.
>> + * Copyright (c) 2011,2016 Oracle and/or its affiliates. All rights reserved.
>>    *
>>    * This software is available to you under a choice of one of two
>>    * licenses.  You may choose to be licensed under the terms of the GNU
>> @@ -655,15 +656,22 @@ int main(int argc, char **argv)
>>   
>>   			/* Setup portid for peer port */
>>   			memcpy(&peerportid, &portid, sizeof(peerportid));
>> -			peerportid.drpath.cnt = 1;
>> -			peerportid.drpath.p[1] = (uint8_t) portnum;
>> -
>> -			/* Set DrSLID to local lid */
>> -			if (resolve_self(ibd_ca, ibd_ca_port, &selfportid,
>> -						&selfport, 0) < 0)
>> -				IBEXIT("could not resolve self");
>> -			peerportid.drpath.drslid = (uint16_t) selfportid.lid;
>> -			peerportid.drpath.drdlid = 0xffff;
>> +			if (portid.lid == 0) {
>> +				peerportid.drpath.cnt++;
>> +				if (peerportid.drpath.cnt == IB_SUBNET_PATH_HOPS_MAX) {
>> +					IBEXIT("Too many hops");
>> +				}
>> +			} else {
>> +				peerportid.drpath.cnt = 1;
>> +
>> +				/* Set DrSLID to local lid */
>> +				if (resolve_self(ibd_ca, ibd_ca_port, &selfportid,
>> +						         &selfport, 0) < 0)
>> +					IBEXIT("could not resolve self");
>> +				peerportid.drpath.drslid = selfportid.lid;
> Why was the cast of selfportid.lid to (uint16_t) dropped?
That was not intentional. Thanks for pointing it out.

Regards,
-Dag
>
>> +				peerportid.drpath.drdlid = 0xffff;
>> +			}
>> +			peerportid.drpath.p[peerportid.drpath.cnt] = (uint8_t) portnum;
>>   
>>   			/* Get peer port NodeInfo to obtain peer port number */
>>   			is_peer_switch = get_node_info(&peerportid, data);
>>
>> base-commit: 17e03b4738913365a3f947719c4897fcb92df32c
>>
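
For readers following the thread, the peer-port path construction in the
quoted hunk can be sketched standalone as below. This is only an
illustration under stated assumptions, not the ibportstate code: the
struct types and SKETCH_PATH_HOPS_MAX are hypothetical stand-ins for the
libibmad ib_portid_t / ib_dr_path_t types and IB_SUBNET_PATH_HOPS_MAX,
build_peer_portid() is a made-up helper name, and the local LID is passed
in instead of being looked up with resolve_self(). The sketch keeps the
(uint16_t) cast discussed above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the libibmad types (assumption: the real
 * definitions come from <infiniband/mad.h>). */
#define SKETCH_PATH_HOPS_MAX 64

struct sketch_dr_path {
	int cnt;                          /* number of hops in p[1..cnt] */
	uint8_t p[SKETCH_PATH_HOPS_MAX];  /* exit port per hop, p[0] unused */
	uint16_t drslid;
	uint16_t drdlid;
};

struct sketch_portid {
	int lid;                          /* 0 means pure directed route */
	struct sketch_dr_path drpath;
};

/*
 * Build the portid of the peer behind port 'portnum' of the switch
 * addressed by 'port'. Returns 0 on success, -1 if the extended path
 * would not fit (the patch calls IBEXIT("Too many hops") here).
 */
static int build_peer_portid(struct sketch_portid *peer,
			     const struct sketch_portid *port,
			     int self_lid, int portnum)
{
	memcpy(peer, port, sizeof(*peer));

	if (port->lid == 0) {
		/* Switch was reached by directed route: append one hop. */
		if (peer->drpath.cnt + 1 >= SKETCH_PATH_HOPS_MAX)
			return -1;
		peer->drpath.cnt++;
	} else {
		/* Switch was reached by LID: start a one-hop DR path there. */
		peer->drpath.cnt = 1;
		peer->drpath.drslid = (uint16_t) self_lid;
		peer->drpath.drdlid = 0xffff;
	}
	/* In both cases the last hop leaves through the queried port. */
	peer->drpath.p[peer->drpath.cnt] = (uint8_t) portnum;
	return 0;
}
```

The point of the fix is visible in the first branch: with a directed-route
destination the existing hops are preserved and the peer hop is appended,
where the old code always overwrote the path with cnt = 1 and p[1].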

